
Departamento de Informática
Universidad de Oviedo

Doctoral Thesis

Soft computing and machine learning for image segmentation by deformable models

Dissertation presented by
Nicola Bova
to qualify for the degree of Doctor from the Universidad de Oviedo

September 2013

Advisors:
Óscar Cordón García
Óscar Ibáñez Panizo

Departamento de Informática

UNIVERSIDAD DE OVIEDO
Vicerrectorado de Internacionalización y Postgrado

AUTHORIZATION FOR THE SUBMISSION OF THE DOCTORAL THESIS
(form FOR-MAT-VOA-009-BIS)

Academic year: 2013/2014

1. Personal data of the author of the thesis
Surname: Bova. Given name: Nicola.
DNI/Passport/NIE: Y1424997-T. Telephone: 633211739.
E-mail: nicola.bova@gmail.com

2. Academic data
Doctoral program: Ingeniería Informática.
Responsible body: Comisión Académica del Programa de Doctorado.
Department/Institute where the thesis is submitted: Informática.
Definitive title of the thesis:
- Spanish/other language: Soft computing y aprendizaje automático para la segmentación de imágenes con modelos deformables.
- English: Soft computing and machine learning for image segmentation by deformable models.
Branch of knowledge: Inteligencia Artificial y Soft Computing.

3. Authorization of the thesis directors and tutor
Director: Óscar Cordón García. DNI/Passport/NIE: 45281118-Y.
Department/Institute: Universidad de Granada, Dpto. de Ciencias de la Computación e Inteligencia Artificial.
Director: Óscar Ibáñez Panizo. DNI/Passport/NIE: 44432787-F.
Department/Institute: European Centre for Soft Computing.
Tutor: Óscar Ibáñez Panizo. DNI/Passport/NIE: 44432787-F.
Department/Institute: European Centre for Soft Computing.

They authorize the submission of the doctoral thesis in compliance with Art. 29.1 of the Regulations for Doctoral Studies, approved by the Governing Council in its session of 21 July 2011 (BOPA of 25 August 2011).

In Mieres, 16 September 2013.

Thesis directors:
Signed: Óscar Cordón García
Signed: Óscar Ibáñez Panizo

Thesis tutor:
Signed: Óscar Ibáñez Panizo

TO THE HEAD OF THE DEPARTMENT OF INFORMÁTICA
TO THE CHAIR OF THE ACADEMIC COMMITTEE OF THE DOCTORAL PROGRAM IN INGENIERÍA INFORMÁTICA

To those I love

Acknowledgment
First of all, I would like to point out that this work has been funded by the European Commission under the MIBISOC project (Grant Agreement: 238819, within the Marie Curie Initial Training Network action of the Seventh Framework Programme, FP7) and is supported by the Spanish Ministerio de Economía
y Competitividad under project TIN2012-38525-C02-01 (SOCOVIFI2).
Then, I would like to gratefully and sincerely thank my advisors, Óscar Cordón and Óscar
Ibáñez, for letting me be a part of the MIBISOC project¹ and for the endless motivation, support, guidance, and friendship they gave me throughout all the phases of my doctorate. Their
mentorship was paramount in providing a well-rounded experience consistent with my long-term career goals.
I feel the need to express gratitude to Stefano Cagnoni, for giving me the opportunity to
start this journey.
I wish to thank Viktor Gal, for his friendship and for the help in developing the system
detailed in Chapter 3.
Moreover, I cannot forget Li Bai, who made it possible for me to go on a secondment at
her department in Nottingham University, Nottingham, UK, and for the long talks we had on
many interesting topics.
I express gratitude also to Olivier Colliot and Eric Bardinet, for letting me go on a secondment at the Brain and Spine Institute, La Pitié-Salpêtrière Hospital, Paris, France.
Besides, I would like to thank Krzysztof Trawiński, for his friendship and for the kind
suggestions about the application of ensemble classifiers.
I show gratitude to Andrea Valsecchi for his friendship and for the support in performing
the statistical analysis corresponding to Sec. 2.6.4.
I am also grateful to Arnaud Quirin, for his friendship and for his helpful suggestions
about clustering.
I wish to thank David Feltell, for his kind support and for providing a level set implementation.
I cannot forget Jorge Novo, as part of the code related to DE for TAN optimization (Sec.
2.6) was provided by him and the VARPA group, University of A Coruña, Spain.
Moreover, I am grateful to Luciano Sánchez, for his precious help in the understanding
of a machine learning method.
I also show gratitude to Sergio Damas, for having taken care of my needs during the time
I spent at the European Centre for Soft Computing.
¹ MIBISOC: Medical Imaging Using Bio-inspired and Soft Computing is a Marie Curie Initial Training Network granted by the European Commission within the Seventh Framework Programme (FP7-PEOPLE-ITN-2008).

I feel the need to thank the University of Oviedo and, above all, the European Centre for
Soft Computing, where I had the opportunity to learn part of the techniques that made this
dissertation possible while, at the same time, working in a happy and nice environment.
Besides, I cannot forget the colleagues and, above all, the friends of the MIBISOC project
with whom I shared many interesting moments. I wish you all the best throughout your
future careers.
Moreover, I would like to thank all the new friends I made during my three years in
Asturias. I really had an amazing time spending these moments with you. But I cannot
forget the many friends I have in Italy (and abroad). I really miss you all, guys.
Finally, and most importantly, I feel the need to thank all my family, those who are near,
those who are distant, and those who are too far away now. I love you all.
In particular, I am eternally grateful to my parents, for their endless love and because
what I am today is only because of them.
I would like to thank Angie and Ale, I am sure you are as proud of me as I am of you.
The last words are for Caterina. Her support, encouragement, quiet patience and unwavering love were the pillars upon which I built the past ten years. Thank you for being such
a sweet companion along this lengthy journey we call life.

Contents

Resumen  9
Abstract  11
Introduction  13
1  Preliminaries  21
   1.1.  Deformable models  21
      1.1.1.  Parametric deformable models  23
      1.1.2.  Geometric deformable models  35
   1.2.  Soft Computing  43
   1.3.  Evolutionary Computation  44
      1.3.1.  Conceptual Foundations of Evolutionary Computation  44
      1.3.2.  The scatter search template  46
   1.4.  Machine Learning  51
      1.4.1.  Applications  52
      1.4.2.  Methods  53
      1.4.3.  Random Forest  54
   1.5.  Visual descriptors  55
      1.5.1.  Haralick features  56
      1.5.2.  Gabor filters  60
      1.5.3.  Local binary patterns  61
      1.5.4.  Histogram of oriented gradients  62
2  New advances in Topological Active Nets: model extension and global optimization  65
   2.1.  Introduction  65
   2.2.  Background  66
      2.2.1.  Codification  67
      2.2.2.  Operators  67
      2.2.3.  Fitness Function  69
      2.2.4.  Critical Discussion  70
   2.3.  Extended Topological Active Net  71
      2.3.1.  Model  71
      2.3.2.  Optimization process  78
      2.3.3.  Complementary tasks  82
   2.4.  Extended Topological Active Net model performance study  84
      2.4.1.  Experimental design  84
      2.4.2.  Image datasets  86
      2.4.3.  Analysis of the obtained results  87
   2.5.  A Scatter Search Framework for Extended Topological Active Nets optimization  92
      2.5.1.  Motivation for the use of Scatter Search in ETAN optimization  93
      2.5.2.  Scatter Search-based framework overview  94
      2.5.3.  Objective function definition: internal energy terms  96
      2.5.4.  Diversity function  98
      2.5.5.  Diversification generation method  99
      2.5.6.  Solution combination method  100
   2.6.  Performance study for the optimization of Extended Topological Active Nets with Scatter Search  106
      2.6.1.  Experimental design  106
      2.6.2.  Image dataset  107
      2.6.3.  Analysis of the obtained results  108
      2.6.4.  Statistical analysis of the results  110
   2.7.  Conclusion  112
3  Deformable models supervised guidance: a novel paradigm for automatic segmentation  113
   3.1.  Introduction  113
   3.2.  Background  114
      3.2.1.  Machine learning-based deformable model initialization approaches  116
      3.2.2.  Machine learning-based deformable model energy term generation approaches  118
      3.2.3.  Machine learning and deformable models for image recognition and understanding  120
      3.2.4.  Machine learning-based deformable model guidance approaches  121
   3.3.  A general framework for deformable models supervised guidance  123
   3.4.  A specific implementation of the framework for medical image segmentation  127
      3.4.1.  Localizer  127
      3.4.2.  Deformable Model  129
      3.4.3.  Terms  130
      3.4.4.  Driver  132
      3.4.5.  Integration mechanism  133
   3.5.  Experiments  137
      3.5.1.  SCR database: lungs segmentation in chest radiographs  137
      3.5.2.  Allen Brain Atlas  140
   3.6.  Conclusion  145
4  Conclusions and future work  149
   4.1.  Concluding remarks  149
   4.2.  Future work  152
Conclusiones y trabajos futuros  155
   Conclusiones  155
   Trabajos futuros  158
Bibliography  161
Abbreviations  179

Resumen
La Segmentación de Imágenes (SI) es una tarea clave en Visión por Ordenador basada en dividir una imagen en sus regiones constituyentes, las cuales comparten ciertas características visuales. Entre las técnicas de SI, los Modelos Deformables (MDs) son enfoques prometedores que abordan el problema a través de la explotación de los datos de la imagen, junto con conocimiento a priori. En general, los MDs son curvas que detectan características de interés en imágenes por medio de la minimización de una función de energía, cuyo mínimo global debe corresponder a la posición ideal del MD. Por lo tanto, el proceso de ajuste del modelo se aborda como un problema de optimización numérica.
A pesar del éxito de los MDs, estas técnicas de SI aún presentan varios problemas. Algunos MDs tienen dificultades para manejar cambios topológicos avanzados y los métodos de optimización comunes pueden provocar imprecisiones, es decir, óptimos locales desde el punto de vista de optimización. Además, es posible que los MDs tengan problemas relacionados con la forma en la que se ajustan. De hecho, la definición adecuada de la función de energía es crítica para la SI. Expresar la correlación entre el mínimo global de la función y la segmentación ideal con una formulación matemática sólida es una tarea difícil, si no imposible, si queremos que dicha formulación proporcione resultados precisos en diferentes escenarios de SI. Estos inconvenientes reducen la aplicabilidad de técnicas de SI basadas en MDs.
En esta tesis, proponemos distintas soluciones a los problemas anteriores. Para ello, mejoramos las técnicas de SI basadas en MDs mediante la aplicación de métodos procedentes de dos áreas de investigación distintas: la Computación Evolutiva (CE) y el Aprendizaje Automático (AA). Mientras que la primera constituye una clase de métodos de optimización eficientes y bio-inspirados utilizados para proporcionar soluciones a problemas computacionalmente difíciles, la segunda engloba algoritmos que pueden aprender modelos a partir de los datos que describen el comportamiento de un sistema real. Por un lado, nos centramos en un MD específico, las Mallas Topológicas Activas (MTAs). Una MTA es un MD que integra capacidades de SI basadas tanto en regiones como en fronteras. Se define como un conjunto de nodos relacionados entre sí y dispuestos como una malla deformable. A pesar de sus características prometedoras, la aplicabilidad de las MTAs es reducida debido a limitaciones tanto en la gestión de los cambios topológicos como en el método de optimización empleado, que puede conducir a óptimos locales de la función de energía. Esta última cuestión se ha abordado mediante la incorporación de las MTAs en un marco de búsqueda global basado en CE. Sin embargo, mientras que las propuestas anteriores han sido eficaces en evitar los mínimos locales, han fracasado en diseñar operadores evolutivos adecuados para combinar de manera efectiva las mallas. En esta tesis, introducimos una versión extendida de la MTA con el fin de solucionar los problemas del modelo original y abordamos la tendencia de las MTAs a quedar atrapadas en mínimos locales incorporando el MD en un novedoso marco de búsqueda global basado en un algoritmo evolutivo avanzado, la Búsqueda Dispersa.
Por otra parte, proponemos un marco genérico y flexible para la segmentación automática de imágenes empleando técnicas de AA que ejercen un control directo sobre el ajuste del contorno del MD. Además, ofrecemos una implementación del sistema propuesto mediante la interacción de un conjunto específico de componentes elegidos con el objetivo de segmentar imágenes médicas de distinta naturaleza.
Los métodos propuestos se comparan con métodos de SI del estado del arte. En todos los casos nuestros modelos han demostrado ser competitivos e incluso mejores que los competidores en los conjuntos de imágenes empleados.
El trabajo realizado ha dado como resultado dos artículos de revistas JCR y dos artículos de congreso. Está previsto un tercer artículo que será sometido a una revista JCR próximamente.

Abstract
Among computer vision processes, Image Segmentation (IS) is a key task. It subdivides
an image into its constituent regions or objects sharing certain visual characteristics. Among
IS techniques, Deformable Models (DMs) are promising approaches that tackle the problem
by exploiting constraints derived from the image data together with specific a priori knowledge. Generally, DMs are curves that detect features of interest in images by means of the
minimization of an energy function. The function is defined such that its global minimum
corresponds to the ideal DM position. Therefore, the model adjustment process is tackled as
a numerical optimization problem.
Despite the extensive and successful use of DMs, these IS techniques are still affected
by several issues. First, some DMs have difficulties in managing advanced topological
changes, and common optimization methods can lead to inaccuracies, that is, local optima
in the sense of optimization. Besides, it is possible that DMs in general present problems
related to the way they are adjusted. In fact, since the global minimum of the energy
function should correspond to the ideal segmentation result, the proper definition of this
function is critical to get a good outcome of the segmentation process. However, expressing
this correlation with a robust mathematical formulation is often a daunting task, especially
when applied to different IS scenarios. These reasons greatly reduce the applicability of DMs
in IS.
In this dissertation we aim at providing solutions to the DM problems discussed so far. To
do so, we improve DM-based IS techniques by applying methods from both the Evolutionary
Algorithms (EAs) and Machine Learning (ML) fields. While the former constitutes a class of
efficient, bio-inspired optimization methods used to provide solutions to computationally
intractable problems, the latter concerns the study of algorithms that can learn models from
data.
On the one hand, we focus on a specific DM, Topological Active Nets (TANs). A TAN is a
DM integrating region-based and boundary-based segmentation capabilities. It is defined by
a set of interrelated nodes arranged as a deformable mesh. Despite their promising features,
TANs have experienced limited adoption due to difficulties in managing topological changes and to the employed local search-based optimization method, which
can lead to local optima of the energy function. The latter issue has been addressed
by embedding TANs in an EA-based global search framework able to consider multiple alternatives in the segmentation process. However, while previous EA-based proposals were
effective in avoiding local minima, they failed to design proper evolutionary operators able
to effectively combine nets, thus negating the main advantage of a global search approach.
To solve these issues, we introduce an extended version of the TAN model by providing a
solution to the problems of the original model. Then, we tackle the tendency of the TAN to
get stuck in local minima of the energy function by embedding the DM in a novel EA-based
global search framework.
On the other hand, we propose an accurate, flexible, and general-purpose system for automatic IS employing ML techniques to perform a direct guidance of the DM contour. We
provide an implementation of the proposed system designed as the interaction of a specific
set of components. They were chosen aiming at providing accurate results in different medical IS scenarios.
The proposed methods were compared with state-of-the-art IS techniques, proving
competitive with or even outperforming them on various image datasets.
The work carried out resulted in two JCR journal articles and two conference papers. A
third JCR journal article will be submitted in the near future.

Introduction
Artificial Intelligence (AI) is a branch of computer science and engineering dealing with
making intelligent machines [145]. It has become an essential part of the technology industry, and many of the most difficult problems in computer science have been tackled with AI
techniques. AI goals include reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects [138, 158, 181, 191]. The research carried
out in the field is highly technical and specialized. It is deeply divided into several subfields
focused on specific problems and applications. Interesting opportunities can arise from the
integration of the abilities and skills coming from different AI subfields.
Among other AI branches, Soft Computing (SC) [22, 236] is employed to tackle reasoning,
learning, and planning tasks. In particular, SC refers to a family of robust, intelligent systems
that, in contrast to precise, traditional modes of computation, are able to deal with computationally expensive tasks, such as NP-complete problems, even in the case of vague, noisy, and
partial knowledge.
Within SC, Evolutionary Algorithms (EAs) constitute a class of search and optimization
methods with the ability to efficiently provide a solution to difficult and often computationally intractable problems. EAs imitate the principles of natural evolution [82, 95]. Indeed, an
EA maintains and processes a population of individuals, each one describing a set of genetic
parameters that encode a candidate solution to the optimization problem. The fitness of an
individual with respect to the optimization task is described by a scalar objective function.
Genetic operators such as recombination and mutation are applied to the parents in order to
generate new candidate solutions. The result of this evolutionary cycle is a set of more and
more suitable solutions to the optimization problem, according to the Darwinian principle
of survival of the fittest.
Along with SC, Machine Learning (ML) is the branch of AI that concerns learning. It studies the design of systems that improve automatically through experience. Hence, ML concerns the construction and study of computer algorithms that can learn from data [5]. Core
aspects of ML are representation and generalization. In the first place, representation of data
instances and functions evaluated on these instances are part of all ML systems. Moreover,
generalization is a highly desirable property as it is related to the chance to perform well on
unseen data instances. The ability to represent and generalize knowledge to provide accurate
predictions over new data is the essence of learning.
Before applying reasoning and learning on a given set of information items, however, it
is necessary to deal with the techniques that perceive these items, that is, acquire and gather
them. Within the large group of sources of information, vision is probably the most important
one. It is the most advanced of human senses, so it is not surprising that images play the
single most important role in human perception [84].

Computer vision is the enterprise of automating and integrating a wide range of processes
and representations for vision perception [12]. It includes methods for acquiring, processing,
analyzing, and understanding images. When applied within AI, computer vision researches
methods aiming at duplicating the capabilities of human vision. It does so by electronically
perceiving and understanding an image [203] considering models constructed with the aid
of geometry, physics, statistics, and learning theory [74].
Unlike humans, who are limited to the visual band of the electromagnetic spectrum,
imaging machines cover almost the entire electromagnetic spectrum, ranging from gamma
to radio waves. They can operate on images generated by sources that humans are not accustomed to associating with images. Thus, digital image processing encompasses a wide and
varied field of applications.
There are no clear-cut boundaries in the continuum from image processing at one end to
computer vision at the other. However, one useful paradigm is to consider three types of computerized processes in this continuum: low-, mid-, and high-level processes [84]. Low-level
processes involve primitive operations such as image preprocessing to reduce noise, contrast
enhancement, and image sharpening. A low-level process is characterized by the fact that
both its inputs and outputs are images. Mid-level processing on images involves tasks such
as segmentation (partitioning an image into regions or objects), description of those objects to
reduce them to a form suitable for computer processing, and classification (recognition) of individual objects. A mid-level process is characterized by the fact that its inputs are generally
images, but its outputs are attributes extracted from those images (e.g., edges, contours, and
the identity of individual objects). Finally, higher-level processing involves making sense of
an ensemble of recognized objects, as in image analysis, and, at the far end of the continuum,
performing the cognitive functions normally associated with vision.
Among the numerous processes in the field, Image Segmentation (IS) [84, 198] is a key
task. It is a critical issue, as the quality of its outcomes has a strong influence on the subsequent image understanding task. IS subdivides an image into its constituent regions or objects
by assigning a label to every pixel such that pixels with the same label share certain visual
characteristics. Its goal is to simplify and/or change the representation of an image into
something that is more meaningful and easier to analyze [198]. Segmentation accuracy determines the eventual success or failure of computerized analysis procedures. For this reason,
considerable care should be taken to improve the probability of rugged segmentation [84].
Some of the practical applications of IS are:
Medical imaging, where it is employed for tasks such as tumor or other pathologies location, computer-guided surgery, and diagnosis.
Machine vision, where IS is used to provide imaging-based automatic inspection, analysis, process control, and robot guidance in the industry field.
Object detection, with targets from a broad range of fields, such as automotive (pedestrian, road, lights, etc.) and satellite sensing (roads, forests, crops, etc.).
Content-based image retrieval, where IS is used to search for digital images in large databases.
In this case the search analyzes the contents of the image (in terms of colors, shapes,
textures, or any other information that can be derived from the image itself) rather than
the metadata (such as keywords, tags, or descriptions associated with the image).

Video surveillance, where IS is employed to automatically monitor sensitive areas, but also
for demanding real-time applications such as traffic control systems.
Recognition, where, among other applications, IS is used for biometric authentication to perform the automated recognition of individuals on the basis of their biological and behavioral traits (fingerprint, face, iris, palmprint, retina, hand geometry, signature, and
gait).
In particular, medical IS is of immense practical importance in medical informatics. Medical images, such as Computed Axial Tomography (CAT), Magnetic Resonance Imaging (MRI),
Ultrasound, and X-Ray, are processed and analyzed to extract meaningful information such
as volume, shape, motion of organs; to detect abnormalities, and to quantify changes in
follow-up studies [72]. Manual segmentation of medical images by the radiologist is not
only a tedious and time-consuming process, but it is also not very accurate, especially with
the increasing number of medical imaging modalities and the unmanageable quantity of images that need
to be examined. Automated image segmentation [72], which aims at the automated extraction of
object boundary and region features, plays a fundamental role in understanding image content. This is a challenging task due to regions with boundary insufficiencies, lack of texture
contrast between Regions Of Interest (ROIs) and background, non-homogeneous intensities
within the same class of tissue, high complexity of anatomical structures, as well as a high
variability. Given its difficult nature and the large number of algorithms competing in the
field, medical imaging is an ideal test terrain for novel generic IS approaches.
On a general basis, the level to which the object subdivision is carried depends on the
problem being solved. That is, IS should stop when the objects of interest in the images
considered for a problem have been isolated. While some applications, such as automated
inspection in assembly lines, often only consider simple image primitives and make use of
structured environments to detect specific anomalies, segmentation of nontrivial
images is one of the most difficult tasks in image processing.
IS algorithms are generally based on one of the two basic properties of intensity values:
discontinuity and similarity [84]. In the first category, the approach is to partition an image
based on abrupt changes in intensity, such as edges in an image. The principal approaches in
the second category are based on partitioning an image into regions that are similar according to a set of predefined criteria. Thresholding, region growing, and region splitting and
merging are examples of methods in this group.
Among IS techniques, Deformable Models (DMs) [147] are promising and actively researched model-based approaches to tackle a huge variety of IS problems. Generally, DMs
are curves, surfaces or solids that detect features of interest in images by means of the minimization of an energy function. Ideally, the energy function is defined in such a way that
its value, when calculated for a DM correctly segmenting the target object, corresponds to
its global minimum. Therefore, the model adjustment process is tackled as a numerical optimization (energy minimization) problem. The widely recognized potency of DMs stems from
their ability to segment, match, and track images by exploiting constraints derived from the
image data (in a bottom-up fashion) together with a priori knowledge about the location,
size, and shape of these structures (following a top-down approach) [147].
DMs are split into two families, parametric and geometric DMs [154]. While the former
represent the model explicitly and are guided by acting on their control points, the latter
represent the model implicitly as a level set of a higher-dimensional scalar function. Both


families have advantages over each other. On the one hand, parametric DMs are quick and
easy to deform but have some issues regarding topological changes and adaptability to highly
complex shapes. On the other hand, geometric DMs can handle topological changes easily
but are slower than their parametric counterparts, and it is harder to enforce restrictions
on their shape.
Despite the extensive and successful use of DMs for IS in recent years [92, 147], they
are still affected by several issues. On the one hand, some DMs have difficulties in managing
advanced topological changes, heavy local deformations, and the definition of the energy
function. Moreover, employed optimization methods (often based on a local search) can lead
to result inaccuracies, that is, local optima in the sense of optimization. On the other hand,
it is possible that DMs present problems related to the way they are adjusted. As said,
the usual approach to develop IS using DMs is to adjust their models by minimizing the
value of an associated energy function. Usually, this function is based on image features,
prior knowledge, and the state of the model. Since the global minimum of the energy function should correspond to the ideal segmentation result, the proper definition of this function
is critical to the outcome of the segmentation process. Actually, in the case of a loose or suboptimal
correlation between the energy function and the segmentation results, even an ideal optimizer would
yield unsatisfactory outcomes. However, in the general case, expressing this correlation with
a robust mathematical formulation is a daunting task. In fact, energy definitions greatly vary
as a function of the application, image modality, type of DM in use, and target structure. In
particular, they generally use a large set of obscure weighting coefficients that are hard and
time-consuming to tune [194]. For these reasons, the need to define an appropriate function reflecting the energy-result correlation greatly reduces the applicability of optimization-based DM
segmentation techniques.
The aim of this dissertation is to provide solutions to the DM problems discussed so far
so that they can perform IS in a more accurate way. On the one hand, SC and, in particular,
EAs represent a very attractive technique for this task as they can provide effective solutions
within the standard optimization approach. In fact, they are able to consider multiple
solutions in the search space, thus having more chances to avoid local minima of the energy
function. On the other hand, ML provides a set of appealing techniques to tackle the adjustment of DMs from a different perspective. In fact, an alternative approach could be a novel
type of DM guidance process in which the necessary information is derived from a set of example data obtained from the images to be segmented, removing the need to define
the energy function a priori.
In order to do so, we will work with two different kinds of DMs, Topological Active Nets
(TANs) and Level Sets (LSs). While the former will be considered to develop solutions based
on the standard optimization-based approach, the latter will be used to illustrate the application of an alternative ML-based approach.
TANs [8, 9, 28] are attractive parametric DMs integrating region-based and boundary-based segmentation capabilities. The model is defined by a set of interrelated nodes arranged
as a deformable mesh. The nodes are divided into two subsets dealing with different tasks.
While the external nodes fit the target objects' boundaries, the internal ones adapt to the inner regions of the targets. Topological changes are handled by cutting links between nodes
and, consequently, by switching the type of the nodes as the model adapts to multiple objects. The TAN is adjusted according to a specific energy function taking into account image
features along with spatial relationships among neighboring nodes. Since each TAN configuration has an associated value of the energy function, the segmentation process is tackled as

an energy minimization problem.
TANs have been applied to different IS problems including iris location [162], stereo
matching [192], and medical CT image segmentation [102, 160, 163, 167] with successful results. However, despite their promising features, TANs have experienced a limited adoption
due to two strong limitations. On the one hand, the model is complex and has difficulties
in managing advanced topological changes, and it has issues related to the external
energy definition, mesh correctness and integrity, and heavy local deformations. None of
the works in the TAN literature has addressed these issues so far. On the other hand, the
employed local search-based optimization method can lead to local optima of the energy function. In this case, however, a small number of researchers have addressed this issue
by embedding TANs in a global search framework [102, 160, 163, 167], able to consider multiple alternatives in the segmentation process. In order to do so, they hybridized TANs with
an EA, achieving encouraging results. In fact, when dealing with embedding TANs in a
global search framework considering multiple solutions, EAs represent a very appropriate
approach. On the one hand, EA-based proposals [102, 160, 163, 167] are inherently able to
consider multiple segmentation candidates at the same time, significantly reducing the risk
to fall in local minima. On the other hand, evolutionary-based mechanisms provide tools to
combine different solutions, with the capability, in the ideal case, of coalescing meshes with
diverse, or even complementary, characteristics. This fact contributes to enhance the overall
quality of the solution while, at the same time, improving the convergence time. However,
while previous EA-based works were effective in avoiding local minima, they failed to design proper evolutionary operators able to effectively combine nets. As a consequence, they
required very large populations of solutions to operate, thus negating the main advantage of
a global search approach. For these reasons, TANs are ideal candidates for the application
of advanced, EA-based optimization techniques.
Differently from TANs, LSs are geometric DMs [31, 141] based on curve evolution theory [116, 117, 193] and the LS method [173, 195]. In the LS method, when segmenting an N-dimensional image, the curve is represented implicitly as a level set of an (N+1)-dimensional
scalar function. As a result, topology changes can be handled automatically. Thus, the guidance of the models does not require considering the topology while adapting the model, easing the whole process. Given this enticing capability, LSs have been extensively used in medical IS [68], a field that demands high accuracy but, at the same time, presents subjectivity in
the definition of the ideal segmentation depending on the particular clinician. For these reasons, LSs are
particularly attractive as the DM of choice for the exploration of novel, ML-based approaches
in the DM-based IS field.

Objectives
The aim of this dissertation is to improve DM-based IS techniques employing both standard and alternative design approaches. In particular, the contribution is two-fold. On the
one hand, we will focus on a specific parametric DM, TANs, and will aim to enhance the
established TAN model to overcome the intrinsic model limitations still unaddressed. Moreover, we will try to embed the TAN in a different global search framework with specifically
designed components to improve the whole optimization process in terms of effectiveness
and efficiency. On the other hand, we will approach DM-based IS solutions from a different
perspective by exploring ML methods as an alternative to optimization approaches to adapt

18

INTRODUCTION

the DM. In this case, we aim at proposing a general purpose, ML-based segmentation framework able to learn from data, which will be particularly tested considering the use of LSs.
Specifically, these objectives are divided into the following ones:
Study the state of the art in TAN optimization. We aim at reviewing all the relevant proposals in the field, concerning both the model definition and the optimization procedure.
We will analyze the discipline to point out current drawbacks in adjusting to complex
shapes, external energy definition, and mesh integrity. Moreover, we will detail the
existing EA-based proposals, focusing on the issues related to the population size and
the considered evolutionary operators.
Propose an extension of the TAN model. First of all, we aim at incorporating into the TAN an
external energy definition based on a recent technique performing the convolution of
vector fields. Moreover, we will try to improve the ability of the model to tackle topological changes, such as link cutting, net division, and hole segmentation, to better
deal with holes or complex shapes of the target objects. In addition, differently from the
original TAN model, we will always consider the integrity and correctness of the mesh
during the adjustment procedure. Finally, we aim at endowing the extended TAN with
a pre-segmentation phase to perform automatic initialization of the net.
Inclusion of the extended TAN in a convenient global search framework. We aim at addressing the population size issue of previous TAN evolutionary-based proposals. To do so,
we consider embedding TANs in a different EA-based global search framework relying on solution combinations and controlled randomizations. Moreover, we will try to
customize the framework components intending to develop specific evolutionary operators able to effectively deal with the IS problem at hand. To do so, we will consider
the use of the scatter search EA [80, 123], whose flexibility makes it an ideal candidate
to be used for the current global optimization task. In particular, we aim at introducing
proper solution combination operators able to coalesce meshes in an effective way, a
global search-suitable internal energy term, an appropriate solution diversity function,
and a frequency-memory population generator.
Study the state of the art in ML application to DMs. We aim at reviewing the relevant
proposals dealing with the application of ML techniques to the adjustment of DMs. We
will try to propose a taxonomy according to the different uses given to ML techniques.
Special attention will be put on those methods presenting an alternative approach to
the classical energy minimization configuration.
Propose an ML-based generic framework for IS using DMs. We aim at proposing an IS
framework able to learn a segmentation model from examples in an automatic fashion. Therefore, given a training set of reference images, the system should be able to
automatically locate the target ROI and segment it with minimal human intervention.
In particular, the framework will not rely on optimization techniques, removing the necessity to define an energy function. Finally, as the framework will be designed with
flexibility in mind, we intend to allow the use of any kind of DM.
Develop an ML-based IS framework implementation for medical imaging. To prove the feasibility of an ML-based IS framework, we aim at providing an effective implementation
tailored to the medical imaging field. We will choose and design specific components

of the generic framework so that the resulting model becomes competitive with state-of-the-art proposals performing the segmentation of different structures in different
medical image modalities. To do so, we will consider the LS DM.
Analyze the performance of the proposed methods. We aim at validating the designed IS
methods with a proper experimental study. In particular, we intend to compare them
with other state-of-the-art IS algorithms over various image sets of different types. To perform this comparison, we will select appropriate IS metrics from the field
to quantitatively measure the obtained performance. Finally, with the purpose of explaining and interpreting the obtained results, we intend to examine them in detail,
also employing statistical analysis when needed.

Structure
In order to achieve the previously described objectives, the current PhD dissertation is
organized in four chapters. The structure of each of them is briefly introduced as follows.
In Chapter 1 we introduce the preliminary background information on the wide range of
techniques required for a proper understanding of the work developed. First of all, we develop a survey of DMs with a particular interest toward those employed in this work. Then,
we introduce SC and evolutionary computation, focusing on the Scatter Search (SS) optimization framework which will be considered in one of our proposals. We also introduce the basis
of ML with a focus on the methods considered along this dissertation. Finally, we consider
the techniques to extract characteristic features from images by reviewing some image descriptors commonly adopted in literature.
Chapter 2 is devoted to the research work carried out to study, enhance and evaluate
our TAN-based segmentation algorithm. We start by reviewing the TAN state-of-the-art approaches. Then, we present an extended version of the TAN model by describing the novel
components in depth. We also propose a specific local search method to optimize the extended TAN model. We compare our proposal to a significant set of parametric and geometric DMs on a mix of both synthetic and real-world images. Later, we introduce a novel global
framework for the optimization of the extended TAN, providing motivation for the chosen
SS paradigm and detailing the customized components. Finally, we test the new SS-based
framework against the best performing TAN-based global search optimization proposals.
In Chapter 3 we present an ML-based procedure as an alternative to the classical optimization formulation for DM guidance. To do so, we first study the relevant literature and
categorize common existing approaches. Then, we present a general-purpose DM framework able to learn the segmentation model from examples extracted from training images.
Differently from optimization-based approaches, the proposed system directly guides the
DM evolution through common ML algorithms. To show the feasibility of our alternative
approach, we provide a reference implementation tailored to the medical imaging field using LSs as DM, an ensemble classifier as ML method, the part-based object detector introduced in [78] as localizer, and a proper set of image visual descriptors. We test it against a
large set of state-of-the-art segmentation algorithms over two image datasets with different
characteristics.
Finally, in Chapter 4 we draw some conclusions by summarizing the most relevant achievements. We also suggest future work for further improvements related to the DM-based IS
research line we dealt with in this dissertation.

CHAPTER 1

Preliminaries
In this chapter we introduce some concepts that we will use in the rest of this dissertation.
First, we deal with DMs, the leitmotif of this dissertation. They are a group of methods widely
employed in computer vision to perform IS. Then, we introduce SC, an umbrella of techniques
to provide inexact solutions to computationally hard tasks. These techniques are tolerant of
imprecision, uncertainty, partial truth, and approximation. In particular, we focus on Evolutionary Computation (EC), a
set of methods to tackle optimization problems in an efficient way. Subsequently, we present
ML, a branch of artificial intelligence that concerns the construction and study of systems able
to learn from data. Finally, we review a set of image descriptors widely employed to describe
image properties in computer vision. The size and diversity of this set of concepts, from the
AI and computer vision research fields, is due to the high degree of interdisciplinarity of this
work.

1.1. Deformable models

In several real-world applications, images are often corrupted by noise and sampling artifacts, which can cause considerable difficulties when applying classical segmentation techniques such as edge detection and thresholding. As a result, these techniques either fail
completely or require some kind of postprocessing step to remove invalid object boundaries
in the segmentation results. To address these difficulties, DMs have been extensively studied and widely used in medical image segmentation, with promising results. DMs [147] are
curves or surfaces defined within an image domain that can move under the influence of
internal forces and external forces. While the former forces are defined within the curve or
surface itself, the latter ones are computed from the image data. The internal forces are designed to keep the model smooth during deformation. The external forces are defined to
move the model toward an object boundary or other desired features within an image. By
constraining extracted boundaries to be smooth and incorporating other prior information
about the object shape, DMs offer robustness to both image noise and boundary gaps and allow integrating boundary elements into a coherent and consistent mathematical description.
Such a boundary description can then be readily used by subsequent applications. Moreover, since DMs are implemented on the continuum, the resulting boundary representation
can achieve subpixel accuracy, a highly desirable property for many image processing applications. Nowadays, DMs have grown to be one of the most active and successful research
areas in image segmentation [147]. Various names, such as snakes, active contours or surfaces, balloons, and deformable contours or surfaces, have been used in the literature to refer
to DMs. Fig. 1.1 shows an example of a segmentation task by a DM.

Figure 1.1: Example of segmentation of a medical image by a DM. (a) is the input image to be segmented, (b) is the final adjustment of the DM (in red), and (c) is the output image identifying two segments, the object (black) and the background (white).

There are basically two types of DMs: parametric DMs [7, 41, 114, 146] and geometric DMs
[31, 32, 141, 225].
Parametric DMs represent curves and surfaces explicitly in their parametric forms during deformation. This representation allows direct interaction with the model and can lead
to a compact representation for fast real-time implementation. Adaptations of the model
topology, such as splitting or merging parts during the deformation, can, however, be difficult using parametric models. Geometric DMs, on the other hand, can handle topological
changes naturally. These models, based on the theory of curve evolution [6, 116, 117, 193]
and the LS method [173], represent curves and surfaces implicitly as a level set of a higher-dimensional scalar function. Their parameterizations are computed only after complete deformation, thereby allowing topological adaptivity to be easily accommodated. Despite this
fundamental difference, the underlying principles of both methods are very similar.
In spite of the great similarities between the models, it is surprising to note the multiplicity of terms referring to practically the same concepts, distinguished in many cases by very
minor aspects: Deformable Models [210], Deformable Templates [107], Active Shape Models [46], Active Contour Models/Deformable Contours/Snakes [114], Deformable Surfaces
[40, 148, 154], Active Appearance Models [43], or Statistical Shape Models [93]. We preferred
to study DM properties and models by dividing them into two typologies: parametric (Section
1.1.1) and geometric (Section 1.1.2). Among them, we will pay special attention to Topological Active Nets (Section 1.1.1.3) and Level Sets (Section 1.1.2.2), as they are the two DMs
employed as parts of the proposals introduced in Chapter 2 and 3 of this dissertation, respectively.

1.1.1. Parametric deformable models

Two different types of formulations for parametric DMs exist: an energy minimization
formulation and a dynamic force formulation. Although these two formulations lead to similar results, the former has the advantage of simplicity, whereas the latter
has the flexibility of allowing the use of more general types of external forces. In this section, we concentrate on the first formulation, above all because this is the formulation used
in TANs. We then present several commonly used external forces that can effectively attract
DMs toward the desired image features. Finally, we review a set of notable parametric DMs
we employ or compare against throughout this PhD thesis.
1.1.1.1. Energy minimization formulation

The basic premise of the energy minimization formulation of deformable contours is to
find a parametrized curve that minimizes the weighted sum of internal energy and potential
energy. The internal energy specifies the tension or the smoothness of the contour. The
potential energy is defined over the image domain and typically possesses local minima at
the image intensity edges occurring at object boundaries.
To find the object boundary, parametric curves are initialized within the image domain,
and are forced to move toward the potential energy minima under the influence of both these
forces.
Mathematically, a deformable contour is a curve, defined as X(s) = (X(s), Y(s)), which
moves through the spatial domain of an image to minimize the following energy functional
[72]:

\mathcal{E}(\mathbf{X}) = \mathcal{S}(\mathbf{X}) + \mathcal{P}(\mathbf{X}).

The first term is the internal energy functional and is defined to be

\mathcal{S}(\mathbf{X}) = \frac{1}{2} \int_0^1 \alpha(s) \left| \frac{\partial \mathbf{X}}{\partial s} \right|^2 + \beta(s) \left| \frac{\partial^2 \mathbf{X}}{\partial s^2} \right|^2 \, ds.

The first-order derivative discourages stretching and makes the model behave like an elastic string. The second-order derivative discourages bending and makes the model behave like
a rigid rod. The weighting parameters α(s) and β(s) can be used to control the strength of
the model's tension and rigidity, respectively. In practice, α(s) and β(s) are often chosen to
be constants.
The second term is the potential energy functional and is computed by integrating a potential energy function P(x, y) along the contour X(s):

\mathcal{P}(\mathbf{X}) = \int_0^1 P(\mathbf{X}(s)) \, ds.

The potential energy function P(x, y) is derived from the image data and takes smaller values at object boundaries as well as other features of interest. Given a gray-level image I(x, y)
viewed as a function of continuous position variables (x, y), a typical potential energy function designed to lead a deformable contour toward step edges is

P(x, y) = -w_e \left| \nabla \left[ G_\sigma(x, y) * I(x, y) \right] \right|^2,

where w_e is a positive weighting parameter, G_σ(x, y) is a two-dimensional Gaussian function with standard deviation σ, ∇ is the gradient operator, and * is the 2D image convolution operator. If the desired image features are lines, then the appropriate potential energy function can be defined as follows:

P(x, y) = w_l \left[ G_\sigma(x, y) * I(x, y) \right],

where w_l is a weighting parameter. Positive w_l is used to find black lines on a white background, while negative w_l is used to find white lines on a black background. For both edge and line potential energies, increasing σ can broaden the attraction range.
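To make the preceding definitions concrete, the following minimal Python sketch (not part of the original text; the function names, weights, and scale values are illustrative choices) computes both potential energy functions using SciPy's Gaussian filtering:

import numpy as np
from scipy.ndimage import gaussian_filter

def edge_potential(image, sigma=2.0, w_e=1.0):
    # P(x, y) = -w_e * |grad(G_sigma * I)|^2: minima at step edges.
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    return -w_e * (gx ** 2 + gy ** 2)

def line_potential(image, sigma=2.0, w_l=1.0):
    # P(x, y) = w_l * (G_sigma * I): w_l > 0 attracts to dark (black) lines.
    return w_l * gaussian_filter(image.astype(float), sigma)

Here np.gradient approximates ∇ and gaussian_filter implements the convolution with G_σ.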
Regardless of the selection of the exact potential energy function, the procedure for minimizing the energy functional is the same. The problem of finding a curve X(s) that minimizes
the energy functional is known as a variational problem [52]. It has been shown that the
curve that minimizes \mathcal{E} must satisfy the following Euler-Lagrange equation [41, 114]:

\frac{\partial}{\partial s} \left( \alpha \frac{\partial \mathbf{X}}{\partial s} \right) - \frac{\partial^2}{\partial s^2} \left( \beta \frac{\partial^2 \mathbf{X}}{\partial s^2} \right) - \nabla P(\mathbf{X}) = 0.    (1.1)
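In practice, Eq. (1.1) is solved numerically after discretizing the contour into n points. For constant α and β, the internal energy terms yield a pentadiagonal matrix that can be inverted once, leading to the classic semi-implicit iteration of Kass et al. [114]. The following Python sketch is one possible rendering, under the assumptions of a closed contour and a precomputed potential image P; it is illustrative rather than a reference implementation:

import numpy as np

def internal_matrix(n, alpha, beta, tau):
    # Pentadiagonal circulant matrix A discretizing -alpha*X_ss + beta*X_ssss.
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * alpha + 6 * beta
        A[i, (i - 1) % n] = A[i, (i + 1) % n] = -(alpha + 4 * beta)
        A[i, (i - 2) % n] = A[i, (i + 2) % n] = beta
    return np.linalg.inv(np.eye(n) + tau * A)   # (I + tau*A)^-1, computed once

def evolve_contour(X, P, alpha=0.1, beta=0.01, tau=0.5, iters=200):
    # X: (n, 2) array of (x, y) contour points; P: 2D potential energy image.
    inv = internal_matrix(len(X), alpha, beta, tau)
    gy, gx = np.gradient(P)                      # dP/dy, dP/dx
    for _ in range(iters):
        ix = np.clip(np.rint(X[:, 0]).astype(int), 0, P.shape[1] - 1)
        iy = np.clip(np.rint(X[:, 1]).astype(int), 0, P.shape[0] - 1)
        grad_P = np.column_stack([gx[iy, ix], gy[iy, ix]])
        X = inv @ (X - tau * grad_P)             # X^{t+1} = (I + tau*A)^-1 (X^t - tau*grad P)
    return X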

1.1.1.2. External forces

In this section, we describe several kinds of external forces for DMs. These external forces
are applicable to every type of DM.
Multiscale Gaussian potential force  When using the Gaussian potential force described earlier, σ must be selected to have a small value in order for the DM to follow the boundary accurately. As a result, the Gaussian potential force can only attract the model toward the boundary when it is initialized nearby. To remedy this problem, Terzopoulos, Witkin, and Kass [114] proposed using Gaussian potential forces at different scales to broaden its attraction range while maintaining the model's boundary localization accuracy. The basic idea is to first use a large value of σ to create a potential energy function with a broad valley around the boundary. The coarse-scale Gaussian potential force attracts the deformable contour or surface toward the desired boundaries from a long range. When the contour or surface reaches equilibrium, the value of σ is then reduced to allow tracking of the boundary at a finer scale. This scheme effectively extends the attraction range of the Gaussian potential force. A weakness of this approach, however, is that there is no established theorem for how to schedule changes in σ. The available ad hoc scheduling schemes may therefore lead to unreliable results.
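For instance, a typical ad hoc coarse-to-fine schedule could be sketched as follows, reusing the illustrative edge_potential and evolve_contour functions above and assuming an input image and an initial contour X (the σ sequence is arbitrary):

for sigma in (8.0, 4.0, 2.0, 1.0):       # coarse scale first: broad attraction valley
    P = edge_potential(image, sigma=sigma)
    X = evolve_contour(X, P, iters=100)   # relax to equilibrium before refining sigma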
Pressure force  Cohen [41] proposed to increase the attraction range by using a pressure force together with the Gaussian potential force. The pressure force can either inflate or deflate the model. Hence, it removes the requirement to initialize the model near the desired object boundaries. DMs that use pressure forces are also known as balloons [41]. The pressure force is defined as

\mathbf{F}_p(\mathbf{X}) = w_p \mathbf{N}(\mathbf{X}),

where N(X) is the inward unit normal of the model at the point X and w_p is a constant weighting parameter. The sign of w_p determines whether to inflate or deflate the model and is typically chosen by the user. Later, region information has been used to define w_p with a spatially-varying sign based upon whether the model is inside or outside the desired object [182, 188]. The value of w_p determines the strength of the pressure force. It must be carefully selected so that the pressure force is slightly smaller than the Gaussian potential force at significant edges, but large enough to pass through weak or spurious edges. When the model deforms, the pressure force keeps inflating or deflating the model until it is stopped by the Gaussian potential force. A disadvantage in using pressure forces is that they may cause the DM to cross itself and form loops [209].
Distance potential force Another approach for extending the attraction range is to define the potential energy function using a distance map, as proposed by Cohen and Cohen [42]. The value of the distance map at each pixel is obtained by calculating the distance between the pixel and the closest boundary point, based either on the Euclidean distance [59] or the Chamfer distance [24]. By defining the potential energy function based on the distance map, one can obtain a potential force field that has a large attraction range. Given a computed distance map d(x, y), one way of defining a corresponding potential energy function, introduced in [42], is as follows:

$$P_d(x, y) = -w_d\, e^{-d(x, y)^2}.$$

The corresponding potential force field is given by $-\nabla P_d(x, y)$.
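A minimal sketch of this force, assuming SciPy's Euclidean distance transform and a boolean edge mask; the function name and the w_d default are illustrative:

import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_potential_force(edge_mask, w_d=1.0):
    # d(x, y): Euclidean distance to the closest edge pixel.
    # distance_transform_edt measures distance to the nearest zero element,
    # hence the mask inversion (edges must be the zeros).
    d = distance_transform_edt(~edge_mask)
    potential = -w_d * np.exp(-d ** 2)
    py, px = np.gradient(potential)
    return -px, -py  # force = -grad(P_d)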
Gradient vector flow In the DM literature, the traditional external force is derived from an edge map f(x, y), calculated from the image I(x, y), that has the property of being larger near the image edges. This edge map has three important properties. First, its gradient ∇f has vectors pointing toward the edges, which are normal to the edges at the edges. Second, these vectors generally have large magnitudes only in the immediate vicinity of the edges. Third, in homogeneous regions, where I(x, y) is nearly constant, ∇f is nearly zero. While the first property is desirable, the last two are not, because the capture range will be very small and homogeneous regions will have no external forces whatsoever.
In [231], Xu and Prince developed a new external force for active contours to solve the problems associated with initialization and poor convergence to concave boundaries.
They defined the Gradient Vector Flow (GVF) as the vector field

$$\mathbf{v}(x, y) = [u(x, y), v(x, y)]$$

that minimizes the energy functional

$$E = \iint \left[\mu\,(u_x^2 + u_y^2 + v_x^2 + v_y^2) + |\nabla f|^2\, |\mathbf{v} - \nabla f|^2\right] dx\, dy.$$

This variational formulation makes the result smooth when there is no data and keeps v nearly equal to the gradient of the edge map where it is large, while forcing the field to be slowly-varying in homogeneous regions. The parameter μ is a regularization parameter governing the trade-off between the first and second terms in the integrand and should be set according to the amount of noise present in the image.
Using the calculus of variations [52], it can be shown that the GVF field can be found by solving the following Euler equations:

$$\mu\, \nabla^2 u - (u - f_x)(f_x^2 + f_y^2) = 0,$$
$$\mu\, \nabla^2 v - (v - f_y)(f_x^2 + f_y^2) = 0,$$

where $\nabla^2$ is the Laplacian operator.

The snake using the GVF field provides a large capture range and the ability to capture
concavities by diffusing the gradient vectors of an edge map generated from the image. Although the GVF field has been widely used and improved in various models, it still shows
some disadvantages, such as high computational cost, noise sensitivity, parameter sensitivity,
and the ambiguous relationship between the capture range and parameters [129].
An example of the GVF force is shown in Fig. 1.2.

Figure 1.2: An example of distance potential force field. (a) A U-shaped object, (b) a close-up
of its boundary concavity, and (c) the distance potential force field within the concavity.
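The Euler equations above can be iterated numerically. The following sketch solves them by explicit gradient descent, one simple (neither the only nor the fastest) way to compute the GVF field; the step size and iteration count are illustrative, and np.roll wraps at the image borders, whereas a real implementation would handle boundary conditions explicitly:

import numpy as np

def gvf(f, mu=0.2, iterations=200, dt=0.25):
    # At convergence the field satisfies
    # mu * Lap(u) - (u - f_x) * (f_x^2 + f_y^2) = 0 (and likewise for v).
    fy, fx = np.gradient(f)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()

    def laplacian(a):
        return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

    for _ in range(iterations):
        u += dt * (mu * laplacian(u) - (u - fx) * mag2)
        v += dt * (mu * laplacian(v) - (v - fy) * mag2)
    return u, v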

Vector Field Convolution Trying to overcome the previous limitations, the Vector Field Convolution (VFC) [129] is calculated by convolving a vector field kernel with an edge map derived from the image. Given a gray-scale image I(x, y), its edge map is $f(x, y) = |\nabla(G_\sigma * I)|$, where G_σ is a 2-D Gaussian function with standard deviation σ and the operator ∗ denotes the linear convolution. The VFC is defined as

$$f_{vfc}(x, y) = [u_{vfc}(x, y), v_{vfc}(x, y)]$$

and is calculated by convolving the edge map with a vector field kernel

$$k(x, y) = m(x, y)\, n(x, y), \tag{1.2}$$

where m(x, y) is the magnitude of the vector at (x, y) and n(x, y) is the unit vector pointing to the kernel origin (0, 0):

$$n(x, y) = \left[-\frac{x}{r}, -\frac{y}{r}\right]$$

(except that n(0, 0) = [0, 0]), and where $r = \sqrt{x^2 + y^2}$ is the Euclidean distance from the kernel origin. The kernel field has the property that a free particle placed in the field will move to the kernel origin. If the kernel origin is considered as an edge point, the particle will move toward the edge.


The VFC depends on the magnitude term of the vector field kernel, m(x, y). In [129], it is defined as:

$$m(x, y) = \frac{1}{r^\gamma}.$$

This term makes the force field have a reduced influence on particles farther away from the edges. γ is a positive parameter controlling the decrease, and its appropriate value depends on the application: a higher value means that the force field will be less influenced by features of interest located far away.
An example with r = 5 and γ = 2 is shown in Fig. 1.3.

Figure 1.3: The vector field kernel.


The VFC external force is then given by:

$$f_{vfc}(x, y) = f(x, y) * k(x, y) = [f(x, y) * u_k(x, y),\; f(x, y) * v_k(x, y)].$$

Since the edge map is non-negative and larger near the edges, the edges contribute to the VFC external force more than homogeneous regions, and free particles will be attracted by the edges (in gray-scale images, the vectors of the field will be attracted by, and point towards, the brighter areas of the image). To enforce a wide attraction range for the edges, a proper value for the r parameter is around l/2, with l being the largest of the image dimensions.
A sample image, the gradient of its edge image, and the result of the VFC are depicted in Fig. 1.4.

Figure 1.4: (a) original image; (b) Sobel gradient of (a); (c) standard VFC of (a).
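A compact sketch of this computation, assuming NumPy/SciPy; the kernel support, function name and γ default are illustrative:

import numpy as np
from scipy.signal import fftconvolve

def vfc_field(edge_map, radius, gamma=2.0):
    # Build the kernel k = m * n on a (2*radius + 1)^2 support.
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    r = np.hypot(x, y)
    r[radius, radius] = 1.0              # avoid division by zero at the origin
    m = 1.0 / r ** gamma                 # magnitude m(x, y) = 1 / r^gamma
    uk, vk = -x / r * m, -y / r * m      # unit vector to the origin, scaled by m
    uk[radius, radius] = vk[radius, radius] = 0.0   # n(0, 0) = [0, 0]
    # Convolve the edge map with each kernel component.
    u = fftconvolve(edge_map, uk, mode='same')
    v = fftconvolve(edge_map, vk, mode='same')
    return u, v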
1.1.1.3. Topological active nets
The Active Nets model [211] was proposed as a DM that integrates features of region-based and boundary-based segmentation techniques. This way, the model detects the inner topology of the objects and fits their contours. The TANs [8, 9, 28] are an extension of the original active net model that solves some problems intrinsic to DMs, such as the initialization problem. It also has a dynamic behavior that allows local topological changes in order to perform accurate adjustments and find all the objects of interest in the scene.
A TAN is a discrete implementation of an elastic two-dimensional mesh with interrelated nodes [9]. The structure of a small TAN is depicted in Fig. 1.5. As this figure shows, the model has two kinds of nodes: internal and external. Each kind of node represents different features
of the objects: the external nodes fit the edges of the objects whereas the internal nodes model
the internal topology of the object. Therefore, this model allows the integration of information based on discontinuities and information based on regions in the segmentation process.
The former is associated to external nodes and the latter to internal nodes.

Figure 1.5: A 5×5 mesh, showing the external nodes, internal nodes, and links.


A TAN is defined parametrically as v(r, s) = (x(r, s), y(r, s)), where (r, s) ∈ [0, 1] × [0, 1]. The mesh deformations are controlled by an energy function defined as follows:

$$E(v(r, s)) = \int_0^1 \int_0^1 \left[E_{int}(v(r, s)) + E_{ext}(v(r, s))\right] dr\, ds,$$

where E_int and E_ext are the internal and the external energy of the TAN, respectively. The internal energy controls the shape and the structure of the mesh whereas the external energy represents the external forces which govern the adjustment process.
The internal energy depends on first and second order derivatives, which control contraction and bending, respectively. The internal energy term is defined by the following expression:

$$E_{int}(v(r, s)) = \alpha\left(|v_r(r, s)|^2 + |v_s(r, s)|^2\right) + \beta\left(|v_{rr}(r, s)|^2 + |v_{rs}(r, s)|^2 + |v_{ss}(r, s)|^2\right),$$

where subscripts represent partial derivatives, and α and β are coefficients that control the first and second order smoothness of the net. In order to calculate the energy, the parameter domain [0, 1] × [0, 1] is discretized as a regular grid defined by the internode spacing (k, l), and the first and second derivatives are estimated using the finite differences technique. On one hand, the first derivatives are computed using the following equations to avoid the central differences:

$$|v_r(r, s)|^2 = \frac{\left|d_r^+(r, s)\right|^2 + \left|d_r^-(r, s)\right|^2}{2}, \qquad |v_s(r, s)|^2 = \frac{\left|d_s^+(r, s)\right|^2 + \left|d_s^-(r, s)\right|^2}{2},$$

where d⁺ and d⁻ are the forward and backward differences respectively, which are computed as follows:

$$d_r^+(r, s) = \frac{v(r + k, s) - v(r, s)}{k}, \qquad d_r^-(r, s) = \frac{v(r, s) - v(r - k, s)}{k},$$
$$d_s^+(r, s) = \frac{v(r, s + l) - v(r, s)}{l}, \qquad d_s^-(r, s) = \frac{v(r, s) - v(r, s - l)}{l}.$$

On the other hand, the second derivatives are estimated by:

$$v_{rr}(r, s) = \frac{v(r - k, s) - 2v(r, s) + v(r + k, s)}{k^2},$$
$$v_{ss}(r, s) = \frac{v(r, s - l) - 2v(r, s) + v(r, s + l)}{l^2},$$
$$v_{rs}(r, s) = \frac{v(r - k, s) - v(r - k, s + l) - v(r, s) + v(r, s + l)}{kl}.$$
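For concreteness, a sketch of the internal energy under the above discretization, assuming the mesh is stored as an (N, M, 2) NumPy array of node positions; np.roll wraps around the borders, whereas an actual TAN implementation treats mesh boundaries explicitly:

import numpy as np

def tan_internal_energy(v, k=1.0, l=1.0, alpha=1.0, beta=1.0):
    # Forward and backward first differences along r (axis 0) and s (axis 1).
    dr_f = (np.roll(v, -1, 0) - v) / k          # d+_r
    dr_b = (v - np.roll(v, 1, 0)) / k           # d-_r
    ds_f = (np.roll(v, -1, 1) - v) / l          # d+_s
    ds_b = (v - np.roll(v, 1, 1)) / l           # d-_s
    first = ((dr_f ** 2 + dr_b ** 2).sum(-1) +
             (ds_f ** 2 + ds_b ** 2).sum(-1)) / 2.0
    # Second differences, following the expressions above.
    vrr = (np.roll(v, 1, 0) - 2 * v + np.roll(v, -1, 0)) / k ** 2
    vss = (np.roll(v, 1, 1) - 2 * v + np.roll(v, -1, 1)) / l ** 2
    vrs = (np.roll(v, 1, 0) - np.roll(np.roll(v, 1, 0), -1, 1)
           - v + np.roll(v, -1, 1)) / (k * l)
    second = (vrr ** 2).sum(-1) + (vrs ** 2).sum(-1) + (vss ** 2).sum(-1)
    return float((alpha * first + beta * second).sum())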

The external energy represents the features of the scene that guide the adjustment process. In the original proposal, it was defined by the following expression:

$$E_{ext}(v(r, s)) = \omega\, f[I(v(r, s))] + \frac{\rho}{|\aleph(r, s)|} \sum_{p \in \aleph(r, s)} \frac{f[I(v(p))]}{\|v(r, s) - v(p)\|},$$

where ω and ρ are weights, I(v(r, s)) is the intensity value of the original image in the position v(r, s), ℵ(r, s) is the neighborhood of the node (r, s), and f is a function, which is different for both types of nodes since the external nodes fit the edges whereas the internal nodes model the inner features of the objects. If the objects to detect are dark and the background is bright, the energy of an internal node will be minimum when it is on a point with a low gray level. On the other hand, the energy of an external node will be minimum when it is on a discontinuity and on a light point outside the object. In this situation, the function f is defined as:

$$f[I(v(r, s))] = \begin{cases} h[I(v(r, s))_n] & \text{for internal nodes} \\ h[I_{max} - I(v(r, s))_n + \xi\,(G_{max} - G(v(r, s)))] + \delta\, GD(v(r, s)) & \text{for external nodes,} \end{cases} \tag{1.3}$$

where ξ and δ are weighting terms; I_max and G_max are the maximum intensity values of the image I and the gradient image G, respectively; I(v(r, s)) and G(v(r, s)) are the intensity values of the original image and the gradient image in node position v(r, s); I(v(r, s))_n is the mean intensity in a n × n square; and h is an appropriate scaling function. The terms h[I(v(r, s))_n] and h[I_max − I(v(r, s))_n] are also called In-Out (IO) energy terms (for internal and external nodes, respectively) in the literature. Since it was proposed for one of the first TAN model extensions [102], the external energy also includes the Gradient Distance (GD) term, GD(v(r, s)), that is, the distance from the position v(r, s) to the nearest edge. This term introduces a continuous range in the external energy since its value diminishes as the node gets closer to an edge. This way, the gradient distance facilitates the adjustment of the external nodes to the object boundaries. All subsequent publications employed it in TAN models. Image representations of the gradient and the distance to gradient for every pixel, along with the original image, are shown in Figure 1.6.

Figure 1.6: The original image (a) and the representations of the gradient (b) and the distance to gradient (c) for every pixel.
Finally, in [232] the authors incorporated color similarity into the active net energy function. In particular, they devised a color model for a specific road sign and calculated the
color similarity to this model for each point in the original image, obtaining a specific energy term E_image. In their formulation, this term is set equal to the active net external energy. This
term was introduced for a specific road sign detection application and was not employed in
subsequent general purpose proposals.
Regardless of the specific energy terms employed, the adjustment process consists of minimizing these energy functions. In the original proposals [8, 9], the mesh is placed over the whole image and then the energy of each node is minimized using a Best Improvement Local Search (BILS) algorithm [97] (called greedy search in those papers). In each step of the algorithm, the energy of each node is computed in its current position and in its nearest neighborhood. The position with the lowest energy value is selected as the new position of the node. The algorithm stops when there is no node in the mesh that can move to a position with lower energy.
The BILS algorithm obtains good results in cases with a low presence of noise, since it takes the best local adjustment. Nevertheless, this local adjustment may not be the best global one in many images because of the presence of noise and/or artifacts. Moreover, the BILS algorithm does not consider any possible alternatives. This way, if the model reaches a wrong segmentation, a local minimum from the optimization point of view, it gets stuck in it.
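A sketch of this search, assuming a hypothetical node_energy(i, pos) callback that evaluates the energy terms described above for node i at a candidate position:

def bils(nodes, node_energy):
    # nodes: list of (x, y) integer positions.
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    improved = True
    while improved:                      # stop when no node can improve
        improved = False
        for i, (x, y) in enumerate(nodes):
            # Best position among the current one and its 8 nearest neighbors.
            best = min(((x + dx, y + dy) for dx, dy in moves),
                       key=lambda p: node_energy(i, p))
            if node_energy(i, best) < node_energy(i, (x, y)):
                nodes[i] = best
                improved = True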
Link cutting procedure The BILS algorithm can perform topological changes, that is, cuts of links between adjacent external nodes after the minimization process. First, the external nodes that are most distant to the object edges are identified using Tchebycheff's theorem. This way, an external node n is badly placed if its gradient distance, GD_vext(n), fulfills the inequality:

$$GD_{v_{ext}}(n) > \overline{GD}_{v_{ext}} + 3\,\sigma_{GD_{v_{ext}}},$$

where $\overline{GD}_{v_{ext}}$ and $\sigma_{GD_{v_{ext}}}$ represent the average and the standard deviation of the gradient distance of all the external nodes.
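This outlier test is straightforward to express; a short sketch, assuming the gradient distances of the external nodes are supplied as an array (the function name is ours):

import numpy as np

def badly_placed_external_nodes(gd_ext):
    # gd_ext: gradient distance of every external node.
    gd_ext = np.asarray(gd_ext, dtype=float)
    threshold = gd_ext.mean() + 3.0 * gd_ext.std()
    return np.nonzero(gd_ext > threshold)[0]   # indices of the outliers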
Once the outlier set is identified, the link to remove is selected: it is the link connecting the node with the highest gradient distance and its worst neighbor in the outlier set.
After the cutting, some internal nodes become external since they are now on the boundaries of the net, as Figure 1.7 shows. The increase in the number of external nodes allows a better adjustment to the object boundaries. An example of one cut and the corresponding transformation of internal nodes into external ones is shown in Figure 1.8.

Figure 1.7: Link cutting procedure. The figures show the TAN before and after the link cutting. After the cut, the neighboring internal nodes become external nodes.
With the link cutting procedure described above, it is possible to solve cases in which external nodes are located outside the edges of the object, as in Figure 1.8. However, in cases like the one shown in Figure 1.9, the problem is different, since all the external nodes are located on the edge of the object, so the link cutting procedure is unable to properly cut the links.
In order to solve this, Ibáñez et al. [103] developed a complementary cutting procedure, applied when the former one cannot be. It consists of cutting the link with the highest number of points on pixels which do not belong to the target segmentation object. If this cut is not allowed, because it could form threads (see Fig. 1.10), the procedure checks the next link, in decreasing order of number of points outside the object, whose cut is allowed. The algorithm only considers those links whose length is above a given threshold.
Automatic Net Division Since the link cutting process breaks the net topology to improve the adjustment, it is also desirable to divide the net in order to segment the different objects present in the image in case there is more than one object of interest. To this end, a net reconfiguration mechanism must be developed in order to perform multiple object detection and segmentation. The net division is performed by the link cutting algorithm. However, this algorithm cannot be applied directly to the automatic division. The problems arise in cases where a node has only two neighbors. In such a case, no other link can be removed in order to preserve the TAN topology. Thus, a thread will appear between two subnets, as Figure 1.10 shows.
However, this problem can be overcome if a direction in the cutting process is considered [28]. This way, a cutting priority is associated to each node whose connections are removed.
A higher priority is assigned to the nodes in the cut direction whereas a lower priority is assigned to the nodes involved in the cut. Figure 1.10(c) shows the recomputation of the node priorities after several cuts.

Figure 1.8: Image sequence showing the link cutting procedure based on Tchebycheff's node selection. Note how, after the cuts and the corresponding transformations of internal nodes into external ones, the latter, having more freedom due to the cut, end up on the edges of the object.
The cutting priority weights the gradient distance of each node. Thus, once the set of badly placed external nodes is obtained, the link to remove consists of two neighboring nodes within this set, n₁ and n₂, that fulfill:

$$GD_{v_{ext}}(n_1)\, P_{cut}(n_1) > GD_{v_{ext}}(n)\, P_{cut}(n), \quad \forall n \neq n_1,$$
$$GD_{v_{ext}}(n_2)\, P_{cut}(n_2) > GD_{v_{ext}}(m)\, P_{cut}(m), \quad \forall m \neq n_2,\ m \in \aleph(n_1),$$

where P_cut(x) is the cutting priority of node x, GD_vext(x) is the distance from the position of the external node x to the nearest edge, and ℵ(n₁) is the set of neighboring nodes of n₁.

Figure 1.9: Example of bad segmentation where the link cutting procedure based on Tchebycheff external node selection is not appropriate.

Figure 1.10: Threads and cutting priorities. (a) Image segmentation with threads. (b) If link "a" is removed, no other link can be removed in order to preserve the TAN topology. (c) Recomputation of cutting priorities. When a link is broken in a specific direction, the neighboring nodes in this direction increase their priorities.

Topological Active Volumes In [13], the authors introduced Topological Active Volumes (TAVs), a 3D extension of the TANs. TAVs are thus a DM focused on segmentation tasks by means of a volumetric distribution of nodes. Parametrically, a TAV is defined as v(r, s, t) = (x(r, s, t), y(r, s, t), z(r, s, t)), where (r, s, t) ∈ [0, 1] × [0, 1] × [0, 1]. Similarly to TANs, the state of the model is governed by an energy function, divided into internal and external energy terms. Being mostly a 3D extension of the TAN, this model has the same behavior, advantages and disadvantages as its 2D counterpart. Fig. 1.11 shows a representation of the TAV model, whose basic repeated structure is a cube.

Figure 1.11: A TAV grid, with external and internal nodes.
1.1.1.4. Other parametric models

Active Contour Models One of the first practical examples of parametric DMs, called snakes or Active Contour Models (ACMs), was proposed by Kass, Witkin and Terzopoulos [114], shortly after the seminal papers by Terzopoulos. An ACM is a variational method for detecting object boundaries in images: n points $C^0 = \{p_1^0, \ldots, p_n^0\}$ defining the initial closed contour are deformed, by minimizing a certain energy function, to lie along the object boundary.
Let X(p) be a parameterization of the contour C and I be the image intensity. Then the energy is

$$E(C) = \int \left(\alpha\, |X'(p)|^2 + \beta\, |X''(p)|^2 - \lambda\, |\nabla I(X(p))|\right) dp. \tag{1.4}$$

The first two terms represent the internal energy while the last term accounts for the external one. The internal energy is responsible for smoothness, while the external energy is in charge of attracting the contour toward the object boundary. α, β and λ are the free parameters of the system and are determined a priori. A smaller λ reduces the influence of noise but cannot capture the sharp corners, while a larger λ can effectively capture the boundary but is sensitive to noise. Also, α makes the snake more resistant to traction, while β makes it more resistant to bending. These two parameters prevent the snake from becoming discontinuous or breaking during the iteration process of the optimization problem. The total energy can be written as

$$E(C) = E_{internal}(C) + E_{external}(C). \tag{1.5}$$

Using the calculus of variations, it can be shown that the contour should satisfy Eq. 1.1.
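As a worked example, the classic semi-implicit scheme for evolving a discretized closed snake under an external force field (fx, fy) can be sketched as follows; this is the standard textbook discretization of Eq. 1.1, with illustrative parameter defaults, not necessarily the exact scheme of [114]:

import numpy as np

def snake_step(X, fx, fy, alpha=0.1, beta=0.1, gamma=1.0):
    # Semi-implicit iteration: (I + gamma*A) X_new = X + gamma*F(X), where A
    # is the cyclic pentadiagonal matrix discretizing -alpha*X'' + beta*X''''.
    n = len(X)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * alpha + 6 * beta
        A[i, (i - 1) % n] = A[i, (i + 1) % n] = -alpha - 4 * beta
        A[i, (i - 2) % n] = A[i, (i + 2) % n] = beta
    # Sample the external force field at the rounded contour positions.
    ix = np.clip(np.round(X[:, 0]).astype(int), 0, fx.shape[1] - 1)
    iy = np.clip(np.round(X[:, 1]).astype(int), 0, fx.shape[0] - 1)
    F = np.column_stack([fx[iy, ix], fy[iy, ix]])
    return np.linalg.solve(np.eye(n) + gamma * A, X + gamma * F)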
Active Shape Models Active Shape Models (ASMs) [49] add more prior knowledge to DMs.
These shape models derive a Point Distribution Model (PDM) from sets of labeled points
(landmarks) selected by an expert in a training set of images. In each image, a point, or set

of points, is placed on the part of the object corresponding to its label. The model considers the points' average positions and the main modes of variation found in the training set, so the shape models are parameterized in such a way as to allow only legal configurations. While this kind of model has problems with unexpected shapes, since an instance of the model can only take into account deformations which appear in the training set, it is robust with respect to noise and image artifacts, like missing or damaged parts. Therefore, in ASMs, Principal Component Analysis (PCA) is usually considered to construct, over a set of landmark points extracted from training shapes, a PDM and an Allowable Shape Domain (ASD).
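A sketch of such a PCA-based PDM construction, assuming pre-aligned landmark vectors; the retained-variance criterion and the ±3√λᵢ clamping used for the ASD are common conventions rather than the only possible choices:

import numpy as np

def build_pdm(shapes, variance_kept=0.98):
    # shapes: (num_samples, 2 * num_landmarks) flattened landmark coordinates,
    # assumed already aligned (e.g., via Procrustes analysis).
    mean = shapes.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(shapes - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1]        # largest modes of variation first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    t = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(),
                            variance_kept)) + 1
    return mean, eigvecs[:, :t], eigvals[:t]

def legal_shape(mean, modes, eigvals, b):
    # Clamp each parameter to +/- 3*sqrt(lambda_i): the Allowable Shape Domain.
    b = np.clip(b, -3 * np.sqrt(eigvals), 3 * np.sqrt(eigvals))
    return mean + modes @ b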
Active Appearance Models Active Appearance Models (AAMs) [43] extend ASMs by considering not only the shape of the model, but also other image properties, like intensity, texture or color. An appearance model can represent both the shape and texture variability seen
in a training set, and differ from ASMs in that, instead of searching locally about each model
point, they seek to minimize the difference between a new image and one synthesized by
the appearance model [44]. ASMs only use data around the model points and do not take
advantage of all the gray-level information available across an object as AAMs do.

1.1.2. Geometric deformable models

Geometric DMs, proposed independently by Caselles et al. [31] and Malladi et al. [141],
provide an elegant solution to address the primary limitations of parametric DMs. These
models are based on curve evolution theory [116, 117, 193] and the LS method [173, 195].
In particular, curves and surfaces are evolved using only geometric measures, resulting in an
evolution that is independent of the parameterization. As in parametric DMs, the evolution is
coupled with the image data to recover object boundaries. Since the evolution is independent
of the parameterization, the evolving curves and surfaces can be represented implicitly as a
level set of a higher-dimensional function [72]. As a result, topology changes can be handled
automatically.
1.1.2.1. Curve evolution theory

The purpose of curve evolution theory [72] is to study the deformation of curves using
only geometric measures such as the unit normal and curvature as opposed to the quantities
that depend on parameters such as the derivatives of an arbitrary parameterized curve. Let
us consider a moving curve X(s, t) = [X(s, t), Y (s, t)], where s is any parameterization and t
is the time, and denote its inward unit normal as N and its curvature as k, respectively. The
evolution of the curve along its normal direction can be characterized by the following partial
differential equation:
$$\frac{\partial X}{\partial t} = V(k)\, N,$$
where V(k) is called the speed function, since it determines the speed of the curve evolution. The intuition behind this fact is that a tangent deformation affects only the curve's parameterization, not its shape and geometry. The most extensively studied curve deformations in curve
evolution theory are curvature deformation and constant deformation. Curvature deformation is
given by the so-called geometric heat equation:
$$\frac{\partial X}{\partial t} = \alpha k N,$$


where α is a positive constant. This equation will smooth a curve, eventually shrinking it to
a circular point. The use of the curvature deformation has an effect similar to the use of the
elastic internal force in parametric DMs.
Constant deformation is given by
$$\frac{\partial X}{\partial t} = V_0\, N,$$
where V0 is a coefficient determining the speed and direction of deformation. Constant deformation plays the same role as the pressure force in parametric DMs. The properties of
curvature deformation and constant deformation are complementary to each other. Curvature deformation removes singularities by smoothing the curve, while constant deformation
can create singularities from an initially smooth curve. The basic idea of the geometric deformable model is to couple the speed of deformation (using curvature and/or constant deformation) with the image data, so that the evolution of the curve stops at object boundaries.
The evolution is implemented using the LS method. Thus, most of the research in geometric
DMs has been focused on the design of speed functions.
1.1.2.2. Level set method

The LS [72] method is employed for implementing curve evolution. It is used to account
for automatic topology adaptation, and it also provides the basis for a numerical scheme that
is used by geometric DMs. In the LS method, the curve is represented implicitly as a level set
of an N-dimensional scalar function, referred to as the LS function, which is usually defined
on the same domain as the image. The LS is defined as the set of points that have the same
function value. Figure 1.12 shows an example of embedding a curve as a zero level set. It is
worth noting that the LS function is different from the level sets of images. The sole purpose
of the LS function is to provide an implicit representation of the evolving curve.

Figure 1.12: An example of embedding a curve as a LS. (a) A single curve. (b) The LS function
where the curve is embedded as the zero LS (in black). (c) The height map of the LS function
with its zero LS depicted in black.
Instead of tracking a curve through time, the LS method evolves a curve by updating the
LS function at fixed coordinates through time. A useful property of this approach is that
the LS function remains a valid function while the embedded curve can change its topology.


Figure 1.13: From left to right, the zero LS splits into two curves while the LS function still
remains a valid function.
This situation is depicted in Fig. 1.13. We now derive the LS embedding of the curve evolution equation. Given a LS function φ(x, y, t) with the contour X(s, t) as its zero level set, we have

$$\phi[X(s, t), t] = 0.$$

Differentiating the above equation with respect to t and using the chain rule, we obtain

$$\frac{\partial \phi}{\partial t} + \nabla\phi \cdot \frac{\partial X}{\partial t} = 0,$$

where ∇φ denotes the gradient of φ.
We assume that φ is negative inside the zero LS and positive outside. Accordingly, the inward unit normal to the LS curve is given by

$$N = -\frac{\nabla\phi}{|\nabla\phi|}.$$

Using this fact, we can write

$$\frac{\partial\phi}{\partial t} = V(k)\, |\nabla\phi|,$$

where the curvature k at the zero LS is given by

$$k = \nabla \cdot \frac{\nabla\phi}{|\nabla\phi|} = \frac{\phi_{xx}\phi_y^2 - 2\phi_x\phi_y\phi_{xy} + \phi_{yy}\phi_x^2}{(\phi_x^2 + \phi_y^2)^{3/2}}.$$
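This closed form translates directly into code; a short sketch with NumPy finite differences (the eps guard against flat regions, where |∇φ| vanishes, is our own addition):

import numpy as np

def level_set_curvature(phi, eps=1e-10):
    # k = div(grad(phi) / |grad(phi)|), via the closed form above.
    py, px = np.gradient(phi)
    pxx = np.gradient(px, axis=1)
    pyy = np.gradient(py, axis=0)
    pxy = np.gradient(px, axis=0)
    num = pxx * py ** 2 - 2.0 * px * py * pxy + pyy * px ** 2
    den = (px ** 2 + py ** 2) ** 1.5 + eps
    return num / den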

1.1.2.3. Level Set Model Approximations

When working with LSs, the definition of the function φ is essential. One common choice is the signed distance function d(x), which gives the distance of a point to the surface and its sign: generally d > 0 if the point x is outside and d < 0 if it is inside the surface (assuming a closed surface). This definition is especially interesting to avoid numerical instabilities and inaccuracies during computations. But even with this definition, φ will not remain a signed distance function all the time, and a reinitialization procedure to keep the LS intact will be needed [172].


Updating the values of φ at each point in space requires a numerical technique to evolve the zero-level set from an initial specification. Naively, then, we could initialize a surface in space, create a suitable speed function F, and numerically integrate φ until some condition is met, extracting the surface as all points where φ = 0. The apparent problem here is that the whole image space must be evaluated at each iteration, giving (in three dimensions) O(n³) complexity (i.e., O(n^d), where n is the cross-sectional resolution of the spatial extents of the domain and d is the number of spatial dimensions of the domain), rather than just the area of interest, the O(n²) surface.
Algorithms have been developed to overcome these issues by only performing updates on
regions near the surface, rather than integrating over the entire space. The most well-known
is the narrow band approach [1], where numerical integration is performed within a band
of points initialized at either side of the surface (see Fig. 1.14). When the zero LS reaches
the edge of the narrow band, the band is reinitialized about the zero-curve using the Fast
Marching method [196], or some variant. The narrow band method dramatically increases
efficiency over the naive method, with little effect on accuracy.
Despite the improvements in computation time, the narrow-band approach is not optimal
for several reasons [226]. First it requires a band of significant width (m = 12 in the examples
in [1]) where one would like to have a band that is only as wide as necessary to calculate the
derivatives near the LS (e.g., m = 2). A wider band is necessary because the narrow-band
algorithm trades off two competing computational costs. One is the cost of stopping the evolution, computing the position of both the curve and distance transform, and determining the
domain of the band. The other is the cost of computing the evolution process over the entire
band. The narrow-band method also requires additional techniques, such as smoothing, to
maintain the stability at the boundaries of the band, where some grid points are undergoing
the evolution and nearby neighbors are stationary [226].
The next two subsections deal with two efficient narrow-band LS function approximations. Particular importance is given to the second one, the Shi LS approximation, as it is the LS model we will use in Chapter 3.

Figure 1.14: A LS narrow band.


Sparse field approach The sparse-field algorithm [226] uses an approximation to the distance transform that makes it feasible to recompute the neighborhood of the LS model at each
time step. Thus, it takes the narrow-band strategy to the extreme by computing updates on a
band of grid points that is only one point wide. The values of the points in the active set can
be updated using the up-wind scheme and the mean-curvature flow [172]. When computing
updates on so few points, one must however be careful to maintain a neighborhood around
those points so that the derivatives that control the process can be computed with sufficient
accuracy. The strategy is to extend the embedding from the active points outward in layers
to create a neighborhood around those points that is precisely the width needed to calculate
the derivatives for the next time step.
This approach has several advantages. The algorithm does precisely the number of calculations needed to compute the next position of the level curve. It does not require explicitly
recalculating the positions of LSs and their distance transforms. For large 3D data sets, the
very process of incrementing a counter and checking the status of all of the grid points is prohibitive. In the sparse-field algorithm the number of points being computed is so small that
it becomes feasible to use a linked-list to keep track of them. Thus, at each iteration the algorithm visits only those points adjacent to the k-level curve. Besides, the sparse-field approach
identifies a single LS with a specific set of points whose values control the position of that LS.
This allows one to compute external forces to an accuracy that is better than the grid spacing
of the model, resulting in a modeling system that is more accurate for 3D reconstruction.
The k-level surface S of a function u defined on a discrete grid has a set of cells through
which it passes. The set of grid points adjacent to the LS is called the active set and the individual elements of this set are called active points. As the model deforms the active points
will change value. All of the derivatives (up to second order) required to calculate the update
of u are computed using nearest neighbor differences. Therefore, only the active points and
their neighbors are relevant to the evolution of the LS at any particular time in the evolution process. One important aspect of the sparse-field algorithm is the mechanism to control
membership in the active set. In order to maintain stability, one must update the active set
and neighboring points in a way that allows grid points to enter and leave the active set
without those changes in status affecting their values. This mechanism can be understood as
follows. Active points must be adjacent to the LS model. Therefore, their positions lie within
a fixed distance to the model. Because the embedding is a distance transform, the values of
u for locations in the active set must lie within a certain range of values. When active-point
values move out of this active range they are no longer adjacent to the model. They must
be removed from the set and other grid points, those whose values are moving into the active range, must be added to take their place. The precise ordering and execution of these
operations is critical to the proper operation of the algorithm.
There are two operations that are significant to the evolution of the active set. First, the
values of u at active points change from one iteration to the next. Second, as the values of
active points move out of the active range they are removed from the active set and other,
neighboring grid points are added to the active set to take their place.
The number of layers in the narrow-band should coincide with the size of the footprint or
neighborhood used to calculate derivatives. In this way, the inside and outside grid points
undergo no changes in their values that affect or distort the evolution of the zero LS. The work
in [226] uses first- and second-order derivatives computed on a 3 × 3 × 3 kernel (city-block distance 2 to the corners). Therefore, only five layers are necessary: 2 inside layers, 2 outside
layers, and the active set.


The reduction in accuracy caused by the reduced area in which the calculations are performed is tolerable for most applications and its implementation has allowed the LS family
of methods to achieve real-time performance in complex 3D applications [172].
The Shi level set In [199] and later in [200], the authors presented a complete and practical algorithm for the approximation of LS-based curve evolution suitable for real-time implementation. This is a two-cycle algorithm that approximates LS-based curve evolution without the need of solving Partial Differential Equations (PDEs). The evolution is divided into two cycles: one cycle for the data-dependent term and a second cycle for the smoothness regularization, derived from a Gaussian filtering process. In both cycles, the evolution is developed through a simple element switching mechanism between two linked lists that implicitly represent the curve using an integer-valued level-set function. This approach leads to very significant computation speedups compared to exact PDE-based approaches. Since the Shi model is different from the standard LS (as it does not make use of PDEs) and it is employed in our proposal described in Chapter 3, the remainder of this section deals with relevant implementation details of this LS model. In particular, Algorithm 1 shows the complete Shi LS Fast Two Cycle (FTC) algorithm.
Step 1: Initialize the array φ̂, the speed arrays F̂_d and F̂_int, and the lists L_out and L_in;
Step 2 (cycle one: data dependent evolution):
for i = 1 : N_a do
    Compute the data dependent speed F_d for each point in L_out and L_in and store its sign in F̂_d;
    Outward evolution: for each point x ∈ L_out, switch_in(x) if F̂_d(x) > 0;
    Eliminate redundant points in L_in: for each point x ∈ L_in, if φ̂(y) < 0 for all y ∈ N(x), delete x from L_in and set φ̂(x) = −3;
    Inward evolution: for each point x ∈ L_in, switch_out(x) if F̂_d(x) < 0;
    Eliminate redundant points in L_out: for each point x ∈ L_out, if φ̂(y) > 0 for all y ∈ N(x), delete x from L_out and set φ̂(x) = 3;
    Check the stopping condition: if it is satisfied, go to Step 3; otherwise continue this cycle;
end
Step 3 (cycle two: smoothing via Gaussian filtering):
for i = 1 : N_s do
    Compute the smoothing speed F_int for each point in L_out and L_in;
    Outward evolution: for each point x ∈ L_out, switch_in(x) if F_int(x) > 0;
    Eliminate redundant points in L_in, as in cycle one;
    Inward evolution: for each point x ∈ L_in, switch_out(x) if F_int(x) < 0;
    Eliminate redundant points in L_out, as in cycle one;
end
Step 4: If the stopping condition was not satisfied in cycle one, go to Step 2;

Algorithm 1: Shi LS FTC algorithm.


In the LS method, a curve C is represented implicitly as the zero LS of a function φ̂ defined over a regular grid D of size $M_1 \times M_2 \times \cdots \times M_k$, with k being the number of dimensions of the images at hand. The set of grid points enclosed by C is called Ω. The lists of inside neighboring grid points L_in and outside neighboring grid points L_out for the object region Ω are:

$$L_{in} = \{x \mid x \in \Omega \text{ and } \exists y \in N(x) \text{ s.t. } y \in D \setminus \Omega\}, \tag{1.6}$$
$$L_{out} = \{x \mid x \in D \setminus \Omega \text{ and } \exists y \in N(x) \text{ s.t. } y \in \Omega\}, \tag{1.7}$$

where N(x) is a discrete neighborhood of x.


The data structures used by the algorithm consist of:
an integer array φ̂ for approximating the LS function;
an integer array F̂ for the speed function;
two lists of grid points adjacent to the evolving curve C: L_in and L_out.
The grid points inside C but not in L_in are defined as interior points, while those outside C but not in L_out are defined as exterior points. The function φ̂ locally approximates the LS signed distance function and is defined as:

$$\hat{\phi}(x) = \begin{cases} +3, & \text{if } x \text{ is an exterior point} \\ +1, & \text{if } x \in L_{out} \\ -1, & \text{if } x \in L_{in} \\ -3, & \text{if } x \text{ is an interior point.} \end{cases}$$
Only the sign of the evolution speed is used in the algorithm. Two basic procedures are necessary to perform the evolution. On the one hand, switch_in moves the boundary outward by one pixel. For a point x ∈ L_out, it is defined as follows: the first thing to be done is to switch x from L_out to L_in; with x ∈ L_in now, all its neighbors that were exterior points become outside neighboring grid points and are added to L_out in the second step. By applying a switch_in procedure to any point in L_out, the boundary is moved outward by one grid point in that location.
Similarly, the procedure switch_out effectively moves the boundary inward by one pixel. By applying a switch_out procedure to an inside neighboring grid point x ∈ L_in, we move the boundary inward by one grid point at that location.
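In code, the two procedures can be sketched as follows, using Python sets in place of the linked lists of the original algorithm and a hypothetical neighbors(x) function; phi may be, e.g., a NumPy array indexed by grid-point tuples:

def switch_in(x, phi, L_in, L_out, neighbors):
    # Move the boundary outward by one grid point at x (x in L_out).
    L_out.discard(x)
    L_in.add(x)
    phi[x] = -1
    for y in neighbors(x):
        if phi[y] == 3:            # exterior point becomes an outside neighbor
            phi[y] = 1
            L_out.add(y)

def switch_out(x, phi, L_in, L_out, neighbors):
    # Move the boundary inward by one grid point at x (x in L_in).
    L_in.discard(x)
    L_out.add(x)
    phi[x] = 1
    for y in neighbors(x):
        if phi[y] == -3:           # interior point becomes an inside neighbor
            phi[y] = -1
            L_in.add(y)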
At every iteration, first the speed at each point in L_out and L_in is calculated, and the sign of the speed is stored in the array F̂. After that, the two lists are scanned through sequentially to evolve the curve. The list L_out is scanned first and a switch_in is applied at a point if F̂ > 0. This scan takes care of those parts of the curve with positive speed and moves them outward by one grid point. After this scan, some of the points in L_in become interior points due to the newly added inside neighboring grid points, so they are deleted from L_in. Then, the list L_in is scanned through and switch_out is applied for a point with F̂ < 0. This scan moves those parts of the curve with negative speed inward by one grid point. Similarly, points that have become exterior points are deleted from L_out after this scan. After a scan through both lists, the following stopping conditions are checked:
(a) The speed at all the neighboring grid points satisfies

$$F(x) \le 0\ \forall x \in L_{out}, \qquad F(x) \ge 0\ \forall x \in L_{in}.$$


(b) A prespecified maximum number of iterations is reached.


In addition to the steps described so far, the algorithm also performs a curvature dependent smoothness regularization cycle, by means of a Gaussian filtering process. This second
cycle approximates curvature-based evolution, but avoids the need for expensive computations. The Gaussian filter is of the form
$$G(x) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{1}{2\sigma^2}|x|^2\right).$$

Instead of calculating the force F using image cues, in this case the force depends on the filtering and is defined as follows:

$$F(x) = \begin{cases} 1, & \text{if } G \otimes H(\hat{\phi})(x) < \frac{1}{2} \\ 0, & \text{otherwise,} \end{cases}$$

where H is the Heaviside function and ⊗ is the convolution operator [84].


The two cycles described so far are then repeated to evolve the curve towards the object
boundary.
1.1.2.4. Speed functions

In this section, we provide a brief overview of five of the most representative speed functions used by geometric deformable contours.
Caselles et al. and Malladi et al. The geometric deformable contour formulation, proposed by Caselles et al. [31] and Malladi et al. [141], takes the following form:

$$\frac{\partial\phi}{\partial t} = c\,(k + V_0)\,|\nabla\phi|,$$

where

$$c = \frac{1}{1 + |\nabla(G_\sigma * I)|}.$$

Positive V0 shrinks the curve, and negative V0 expands the curve. The curve evolution is
coupled with the image data through a multiplicative stopping term c. This scheme can work
well for objects that have good contrast. However, when the object boundary is indistinct or
has gaps, the geometric deformable contour may leak out because the multiplicative term
only slows down the curve near the boundary rather than completely stopping the curve.
Once the curve passes the boundary, it will not be pulled back to recover the correct boundary.
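The multiplicative stopping term is simple to compute; a sketch, assuming SciPy's Gaussian filter (function name and σ default are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def stopping_term(image, sigma=1.5):
    # c = 1 / (1 + |grad(G_sigma * I)|): close to 0 at strong edges,
    # close to 1 in homogeneous regions.
    gy, gx = np.gradient(gaussian_filter(image.astype(float), sigma))
    return 1.0 / (1.0 + np.hypot(gx, gy))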
Caselles et al. and Kichenassamy et al. To remedy the latter problem, Caselles et al. [32] and Kichenassamy et al. [115, 234] used an energy minimization formulation to design the speed function. This leads to the following geometric deformable contour formulation:

$$\frac{\partial\phi}{\partial t} = c\,(k + V_0)\,|\nabla\phi| + \nabla c \cdot \nabla\phi.$$

Note that the resulting speed function has an extra stopping term $\nabla c \cdot \nabla\phi$ that can pull back the contour if it passes the boundary. This term behaves in a similar fashion to the Gaussian potential force in the parametric formulation. However, this formulation can still generate curves that pass through boundary gaps.


Siddiqi et al. Siddiqi et al. [202] partially address the problem described in the last section by altering the constant speed term through energy minimization, leading to the following geometric deformable contour:

$$\frac{\partial\phi}{\partial t} = \alpha\left(c\,k\,|\nabla\phi| + \nabla c \cdot \nabla\phi\right) + \beta\left(c + \frac{1}{2}\,X \cdot \nabla c\right)|\nabla\phi|.$$

In this case, the constant speed term V₀ is replaced by the second term, and the term $\frac{1}{2}X \cdot \nabla c$ provides additional stopping power that can prevent the geometric contour from leaking through small boundary gaps. The second term can be used alone as the speed function for shape recovery as well. Although this model is robust to small gaps, large boundary gaps can still cause problems.
Chan-Vese (CV) model Active Contours Without Edges [33] is a classic geometric and implicit method by Chan and Vese that tries to solve the Mumford-Shah functional [155]. This algorithm was designed to detect objects whose boundaries are not necessarily defined by gray level gradients. Indeed, it ignores edges completely, making CV a region-based method. The idea is to separate the image into two regions having homogeneous intensity values. More formally, the process minimizes the energy functional shown in Equation 1.8. The functional is used to evolve a LS representing the contour C, by using the conventional variational calculus approach.

$$E(C) = \mu\,\mathrm{Length}(C) + \nu\,\mathrm{Area}(C) + \lambda_1 \int_C |I(x, y) - \bar{I}_C|^2\, dx\, dy + \lambda_2 \int_{\setminus C} |I(x, y) - \bar{I}_{\setminus C}|^2\, dx\, dy \tag{1.8}$$

In the equation, I is the intensity value of the image to be segmented and Ī is its average value over the corresponding region. Along with the length of C and its area, there are third and fourth terms representing the variance of the intensity level (i.e., the homogeneity) inside and outside C, respectively. Each term has a weight that determines its influence on the total energy, so that, for instance, the smaller μ, the more the length of the curve can increase without penalizing the minimization.
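For illustration, the two region terms of Eq. 1.8 can be evaluated for a given partition as follows; this is only a sketch of the data terms, since a full CV implementation would embed them in a LS evolution loop:

import numpy as np

def cv_region_terms(image, inside):
    # inside: boolean mask of the region enclosed by C.
    c1 = image[inside].mean() if inside.any() else 0.0
    c2 = image[~inside].mean() if (~inside).any() else 0.0
    e1 = float(((image - c1) ** 2)[inside].sum())    # homogeneity inside C
    e2 = float(((image - c2) ** 2)[~inside].sum())   # homogeneity outside C
    return c1, c2, e1, e2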
Geodesic Active Contours Some hybridizations between geometric and parametric DMs have also been presented, like Geodesic Active Contours (GACs) [32]. Such an approach is based on the relation between active contours and the computation of geodesics, or minimal distance curves, connecting classical snakes based on energy minimization and geometric active contours based on the theory of curve evolution. The technique is based on active contours evolving in time according to intrinsic geometric measures of the image. The evolving contours naturally split and merge, allowing the simultaneous detection of several objects and of both interior and exterior boundaries.

1.2. Soft Computing

The term SC [22, 236] has been coined for a family of robust, intelligent systems that, in contrast to precise, traditional modes of computation, are able to deal with vague and partial knowledge. It mainly comprises four different partners: Fuzzy Logic (FL) [179, 235],
Evolutionary Computation (EC) [11, 67], Neural Networks (NNs) [21, 90] and Probabilistic
Reasoning (PR) [157, 178]. The term soft computing distinguishes these techniques from hard
computing that is considered less flexible and computationally demanding. The key point
of the transition from hard to soft computing is the observation that the computational effort required by conventional computing techniques sometimes not only makes a problem
intractable, but is also unnecessary as in many applications precision can be sacrificed in order to accomplish more economical, less complex and more feasible solutions. Imprecision
results from our limited capability to resolve detail and encompasses the notions of partial,
vague, noisy and incomplete information about the real world. In other words, it becomes
not only difficult or even impossible, but also inappropriate to apply hard computing techniques when dealing with situations in which the required information is not available, the
behavior of the considered system is not completely known or the measures of the underlying
variables are noisy.
SC techniques are meant to operate in an environment that is subject to uncertainty and
imprecision. According to [237], the guiding principle of SC is:
exploit the tolerance for imprecision, uncertainty, partial truth, and approximation to achieve tractability, robustness, low solution cost and better rapport with
reality.
All four methodologies that constitute the realm of SC have been conceptualized and
developed over the past forty years. Each method offers its own advantages and brings certain weaknesses. Although they share some common characteristics, they are considered
complementary as desirable features lacking in one approach are present in another. Consequently, after a first stage in which they were applied in isolation, the last two decades
witnessed an increasing interest on hybrid systems obtained by symbiotically combining the
four components of SC [23,50,108]. Figure 1.15 shows some hybrid systems positioned in the
corresponding intersection of SC techniques.
Along its development, SC has gone beyond the use of its four initial components and has incorporated some others, such as rough set theory, chaos theory, belief theory, and Bayesian networks, as well as different branches from machine learning, such as clustering or decision trees. The next section is devoted to introducing EC, the SC methodology mainly used in this PhD dissertation. Machine learning is then briefly reviewed in Sec. 1.4, as different methods within this topic will also be considered in the proposals made in Chapter 3.

1.3. Evolutionary Computation

1.3.1. Conceptual Foundations of Evolutionary Computation

Evolutionary algorithms (EAs) constitute a class of search and optimization methods which imitate the principles of natural evolution [67]. The common term evolutionary computation comprises techniques such as genetic algorithms, evolution strategies, evolutionary programming and genetic programming. Their principal mode of operation is based on the same generic concepts: a population of competing candidate solutions, the random combination and alteration of potentially useful structures to generate new solutions, and a selection mechanism to increase the proportion of better solutions. The different approaches are distinguished by the

genetic structures that undergo adaptation and the genetic operators that generate new candidate solutions.

Figure 1.15: Hybridization in SC. Hybrid systems such as neuro-fuzzy systems, genetic fuzzy systems, fuzzy evolutionary algorithms, fuzzy neural networks, genetic neural networks and genetic Bayesian networks lie at the intersections of the four SC components.
Since EAs emulate natural evolution, they adopt a biological terminology to describe their
structural elements and algorithmic operations. The utilization of biological terms for their
EA counterparts is an extreme simplification inasmuch as the biological objects are much
more complex. Keeping the limitations of this analogy in mind, it is helpful to become acquainted with the biological terminology used throughout the following chapters.
Each cell of a living organism embodies a strand of DNA distributed over a set of chromosomes. The cellular machinery transcribes this blueprint into proteins that are used to
build the organism. A gene is a functional entity that encodes a specific feature of the individual such as hair color. The term allele describes the value of a gene, which determines
the manifestation for an attribute, e.g., blond, black, brown hair color. Genotype refers to the
specific combination of genes carried by a particular individual, whereas the term phenotype
is related to the physical makeup of an organism. Each gene is located at a certain position
within the chromosome, which is called locus.
The literature in EAs alternatively avails the terms chromosome and genotype to describe
a set of genetic parameters that encode a candidate solution to the optimization problem. In
the remainder of this PhD dissertation, we will employ the term chromosome to describe the
genetic structures undergoing adaptation. Consequently, the term gene refers to a particular
functional entity of the solution, e.g., a specific parameter in a multi-dimensional optimization problem. In classical genetic algorithms [83, 96], solutions were often encoded as binary
strings. In this context, a gene corresponds to a single bit, an allele corresponds to either a 0
or 1 and loci are string positions.
Progress in natural evolution is based on three fundamental processes: mutation, recombination and selection of genetic variants. The role of mutation is the random variation of
the existing genetic material in order to generate new phenotypical traits. Recombination hybridizes two different chromosomes in order to integrate the advantageous features of both


parents into their offspring. Selection increases the proportion of better adapted individuals in the population. Darwin adopted the term survival of the fittest to illustrate the selection principle that explains the adaptation of species to their environment. The term fitness describes the quality of an organism, which is synonymous with the ability of the phenotype to reproduce offspring in order to promote its genes to future generations.
EAs provide a universal optimization technique applicable to a wide range of problem domains, such as parameter optimization, search, combinatorial problems, machine learning,
and automatic generation of computer programs [67]. Unlike specialized methods designed
for particular types of optimization tasks, they require no particular knowledge about the
problem structure other than the objective function itself. EAs are distinguished by their
robustness, the ability to exploit accumulated information about an initial unknown search
space in order to bias subsequent search into useful subspaces. They provide an efficient and
effective approach to manage large, complex and poorly understood search spaces, where
enumerative or heuristic search methods are inappropriate.
An EA processes a population of genetic variants from one generation to the next. A particular chromosome encodes a candidate solution of the optimization problem. The fitness
of an individual with respect to the optimization task is described by a scalar objective function. According to Darwin's principle, highly fit individuals are more likely to be selected
to reproduce offspring to the next generation. Genetic operators such as recombination and
mutation are applied to the parents in order to generate new candidate solutions. As a result
of this evolutionary cycle of selection, recombination and mutation, more and more suitable
solutions to the optimization problem emerge within the population.
The major components and the principal structure of a generic EA are shown in Fig. 1.16.
During generation t, the EA maintains a population P (t) of chromosomes. The population at
time t = 0 is initialized at random. Each individual is evaluated by means of the said scalar
objective function giving some measure of fitness. A set of parents is selected from the current population in a way that more fit individuals obtain a higher chance for reproduction.
Recombination merges two chromosomes to form an offspring that incorporates features of
its parents. In this way, recombination fosters the exchange of genetic information among
individual members of the population. The offspring are subject to mutations, which randomly modify a gene in order to create new variants. The current population is replaced by
considering the newly generated offspring, thus forming the next generation by only using
them or a combination of the current and the offspring population. The evolutionary cycle
of evaluation, selection, recombination, mutation and replacement continues until a termination criterion is fulfilled. The stopping condition can be either defined by the maximum
number of generations or in terms of a desired fitness to be achieved by the best individual
in the population.
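This evolutionary cycle can be condensed into a few lines; a generic sketch in which init, fitness, select, recombine and mutate are hypothetical problem-specific callbacks, and the population and generation sizes are illustrative defaults:

def evolutionary_algorithm(init, fitness, select, recombine, mutate,
                           pop_size=100, generations=200):
    # Generic cycle of Fig. 1.16: evaluate, select, recombine, mutate, replace.
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        offspring = []
        while len(offspring) < pop_size:
            p1 = select(population, scores)
            p2 = select(population, scores)
            offspring.append(mutate(recombine(p1, p2)))
        population = offspring            # generational replacement
    return max(population, key=fitness)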

1.3.2. The scatter search template

This section will be focused on a specific kind of EA, Scatter Search, that will be considered
in our proposals introduced in Chapter 2 due to its beneficial characteristics.
Scatter search (SS) fundamentals were originally proposed by Fred Glover [80] and have
been later developed in some texts like [123]. The main idea of this technique is based on
a systematic combination between solutions (instead of the randomized one usually applied in genetic algorithms) taken from a considerably reduced evolved pool of solutions named Reference Set (between five and ten times smaller than usual genetic algorithm population sizes), as well as on the typical use of a local optimizer.

Figure 1.16: Generic structure of an Evolutionary Algorithm: initialise P(t=0); evaluate P(t); select parents from P(t); recombine parents into offspring; mutate offspring; replace P(t+1) with the offspring; t := t + 1; repeat until the termination criteria are fulfilled.
In this section we give the basic flow of the SS based on the well-known five methods
template [123], while the specific designs are covered in Section 1.3.2.1. The advanced features of SS are related to the way these five methods are implemented. That is, the sophistication comes from the implementation of the SS methods instead of the decision to include
or exclude some elements. The fact that the mechanisms within SS are not restricted to a single uniform design allows the exploration of strategic possibilities that may prove effective
in a particular implementation. These observations and principles led the authors in [123] to
propose the following template for implementing SS that consists of five methods.
1. A diversification generation method to generate a collection of diverse trial solutions, using
an arbitrary trial solution (or seed solution) as an input.
2. An improvement method to transform a trial solution into one or more enhanced trial solutions. (Neither the input nor the output solutions are required to be feasible, though
the output solutions will more usually be expected to be so. If no improvement of the
input trial solution results, the enhanced solution is considered to be the same as the
input solution.)
3. A reference set update method to build and maintain a reference set consisting of the b
best solutions found (where the value of b is typically small, e.g., no more than 20),
organized to provide efficient accessing by other parts of the method. Solutions gain
membership to the reference set according to their quality or their diversity.


4. A subset generation method to operate on the reference set, to produce a subset of its
solutions as a basis for creating combined solutions.
5. A solution combination method to transform a given subset of solutions produced by the
subset generation method into one or more combined solution vectors.
This basic design starts with the creation of an initial set of solutions P, and then extracts from it the Reference Set (RefSet) of solutions. The diversification generation method is used to build a large set P of diverse solutions. The size of P (PSize) is typically at least 10 times the size of RefSet. The reference set, RefSet, is a collection of both high quality solutions and diverse solutions that are used to generate new solutions by applying the combination method. In this basic design we can use a simple mechanism to construct an initial reference set and then update it during the search. The size of the reference set is denoted by b = b1 + b2 = |RefSet|. The construction of the initial reference set starts with the selection of the best b1 solutions from P. These solutions are added to RefSet and deleted from P. For each solution in P \ RefSet, the minimum of the distances to the solutions in RefSet is computed. Then, the solution with the maximum of these minimum distances is selected. This solution is added to RefSet and deleted from P, and the minimum distances are updated³. This process is repeated b2 times, where b2 = b − b1. The resulting reference set has b1 high quality solutions and b2 diverse solutions. Regardless of the rules used to select the reference solutions, the solutions in RefSet are ordered according to quality, where the best solution is the first one in the list. The simplest form of the subset generation method consists of generating all pairs of reference solutions. That is, the method focuses on subsets of size 2, resulting in (b² − b)/2 NewSubsets. The pairs in NewSubsets are selected one at a time in lexicographical order and the solution combination method is applied to generate one or more trial solutions. These trial solutions are subjected to the improvement method, if one is available. Then, the reference set update method is applied once again. The basic procedure terminates after all subsets in NewSubsets have been subjected to the combination method and none of the improved trial solutions is admitted to RefSet under the rules of the reference set update method.
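As an illustration of this construction, the following Python sketch builds the initial RefSet from a pool P of real-valued solutions. The Euclidean distance and the minimization convention are assumptions made for this example, not prescribed by the template.

```python
import numpy as np

def build_initial_refset(P, fitness, b1, b2):
    """Initial RefSet: the b1 best solutions plus b2 diverse ones chosen by
    the maximum-minimum distance criterion (sketch; P is an (n, dim) array,
    fitness a length-n array where lower values are better)."""
    pool = list(np.argsort(fitness))     # indices sorted by quality
    refset = pool[:b1]                   # b1 high quality solutions
    pool = pool[b1:]
    for _ in range(b2):
        # minimum distance from each remaining candidate to the RefSet
        dmin = [min(np.linalg.norm(P[i] - P[j]) for j in refset)
                for i in pool]
        k = int(np.argmax(dmin))         # the most distant candidate
        refset.append(pool.pop(k))       # add it; distances are recomputed
    return refset                        # indices into P, best quality first
```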
Figure 1.17 graphically shows the interaction among these five methods and highlights the central role of the reference set.
Of the five methods in the SS methodology, only four are strictly required. The improvement method is usually needed if high quality outcomes are desired, turning the SS design into a memetic algorithm [171], but a SS procedure can be implemented without it.
1.3.2.1. Component design

The SS methodology is very flexible, since each of its elements can be implemented in
a variety of ways and degrees of sophistication. Specific designs generally, but not always,
translate into higher complexity and additional search parameters. There is no simple recipe
that can be used to follow a predetermined order in which advanced strategies should be
added to progressively improve the performance of a SS implementation. Therefore, the
order in which these strategies are described in this section does not reflect their importance
or ranking.
³ In applying this maximum-minimum criterion, or any criterion based on distances, it can be important to scale the problem variables, to avoid a situation where a particular variable or subset of variables dominates the distance measure and distorts the appropriate contribution of the vector components.

Figure 1.17: The control diagram of SS. (Initialization: the diversification generation method, followed by the improvement method, fills the population P until |P| = PSize, and the reference set update method builds the RefSet. Main loop: the subset generation method, the solution combination method, the improvement method, and the reference set update method are iterated until the stop conditions are met; when no more new solutions enter the RefSet, a restart is triggered.)


Subset generation method Solution combination methods in SS are typically not limited to combining just two solutions, and therefore the subset generation method in its more general form consists of creating subsets of different sizes. The SS methodology assures that the set of combined solutions may be produced in its entirety at the point where the subsets of reference solutions are created. Therefore, once a given subset has been created, there is no merit in creating it again. This creates a situation that differs noticeably from those considered in the context of genetic algorithms, where the combinations are typically determined by the spin of a roulette wheel. The procedure for generating subsets of reference solutions uses a strategy to expand pairs into subsets of larger size while controlling the total number of subsets to be generated. In other words, the mechanism avoids the extreme type of process that creates all the subsets of size 2, then all the subsets of size 3, and so on until reaching the subsets of size b − 1 and finally the entire RefSet. This approach would clearly not be practical, considering that there are 1013 subsets in a reference set of a typical size b = 10. Even for a smaller reference set, combining all possible subsets is not effective because many subsets will be almost identical. The following approach selects representative subsets of different sizes by creating subset types (a sketch in code follows the list):
Subset type 1: all 2-element subsets.
Subset type 2: 3-element subsets derived from the 2-element subsets by augmenting
each 2-element subset to include the best solution not in this subset.
Subset type 3: 4-element subsets derived from the 3-element subsets by augmenting
each 3-element subset to include the best solution not in this subset.

Subset type 4: the subsets consisting of the best i elements, for i = 5 to b.
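The following Python sketch enumerates these four subset types over a quality-sorted RefSet. Returning index subsets and leaving duplicate filtering to the caller are assumptions of this illustration.

```python
def generate_subsets(b):
    """Four SS subset types over a RefSet of size b, sorted by quality
    (index 0 = best). Returns lists of RefSet indices; duplicates among
    types may be filtered afterwards."""
    pairs = [[i, j] for i in range(b) for j in range(i + 1, b)]  # type 1
    def augment(subsets):
        # add the best solution (lowest index) not already in the subset
        return [sorted(s + [next(i for i in range(b) if i not in s)])
                for s in subsets]
    triples = augment(pairs)                                     # type 2
    quads = augment(triples)                                     # type 3
    best_i = [list(range(i)) for i in range(5, b + 1)]           # type 4
    return pairs + triples + quads + best_i
```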

RefSet updating The reference set is the heart of a SS procedure. If at any given time during
the search all the reference solutions are alike, as measured by an appropriate metric, the
SS procedure will most likely be incapable of improving upon the best solution found even
when employing a sophisticated procedure to perform combinations or improve new trial
solutions. The combination method is limited by the reference solutions that it uses as input.
Hence, having the most advanced combination method is of little advantage if the reference set is not carefully built and maintained during the search.
In the SS basic design, the static update of the reference set is used after the application of the combination method. Following a static update, trial solutions that are constructed as combinations of reference solutions are placed in a solution pool, denoted by Pool. After the application of both the combination method and the improvement method, the Pool is full and the reference set is updated. The new reference set consists of the best b solutions from the solutions in the current reference set and the solutions in the pool, i.e., the updated reference set contains the best b solutions in RefSet ∪ Pool.
The alternative to the static update is the dynamic update strategy, which applies the
combination method to new solutions in a manner that combines new solutions faster than
in the basic design. That is, if a new solution is admitted to the reference set, the goal is to
allow this new solution to be subjected to the combination method as quickly as possible.
In other words, instead of waiting until all the combinations have been performed to update
the reference set, if a new trial solution warrants admission in the reference set, the set is
immediately updated before the next combination is performed. Therefore, there is no need
for an intermediate pool in this design, since solutions are either discarded or become part
of the RefSet as soon as they are generated. The advantage of the dynamic update is that if
the reference set contains solutions of inferior quality, these solutions are quickly replaced
and future combinations are made with improved solutions. The disadvantage is that some
potentially promising combinations are eliminated before they can be considered. The implementation of dynamic updating is more complex than its static counterpart. Also, in the
static update the order in which the combinations are performed is not important because
the RefSet is not updated until all combinations have been performed. In the dynamic updating, the order is quite important because it determines the elimination of some potential
combinations. Hence, when implementing a dynamic update of the reference set, it may be
necessary to experiment with different combination orders as part of the fine tuning of the
procedure.
RefSet rebuilding (Restart) This is an updating procedure that is triggered when no new
trial solutions are admitted to the reference set. This update adds a mechanism to partially
rebuild the reference set when the combination and improvement methods do not provide
solutions of sufficient quality to displace current reference solutions.
The RefSet is partially rebuilt with a diversification update that works as follows, assuming that the size of the reference set is b = b1 + b2. Solutions $x_{b_1+1}, \ldots, x_b$ are deleted from RefSet. The diversification generation method is reinitialized considering that the goal is to generate solutions that are diverse with respect to the reference solutions $x_1, \ldots, x_{b_1}$. Then, the diversification generation method is used to construct a set P of new solutions. The b2 solutions $x_{b_1+1}, \ldots, x_b$ in RefSet are sequentially selected from P with the criterion of maximizing


the diversity. Diversity is usually measured with a distance defined in the context of the problem being solved; maximizing the diversity is then achieved by maximizing the minimum distance. The maximum-minimum criterion, which is part of the reference set update method, is applied with respect to solutions $x_1, \ldots, x_{b_1}$ when selecting solution $x_{b_1+1}$, then with respect to solutions $x_1, \ldots, x_{b_1+1}$ when selecting solution $x_{b_1+2}$, and so on.
Diversity control Scatter search does not allow duplicate solutions in the reference set, and its combination methods are designed to take advantage of this absence of duplicates. Hashing is often used to reduce the computational effort of checking for duplicated solutions. The following hash function, for instance, is an efficient way of comparing solutions and avoiding duplicates when dealing with problems whose solutions can be represented as a permutation p of size m:

$$Hash(p) = \sum_{i=1}^{m} i \cdot p(i)^2 \qquad (1.9)$$
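A minimal Python rendering of Eq. (1.9), assuming the permutation is given as a sequence of values p(1), ..., p(m):

```python
def perm_hash(p):
    """Hash of a permutation per Eq. (1.9): sum over i of i * p(i)^2."""
    return sum(i * v * v for i, v in enumerate(p, start=1))

# Two permutations with different hashes cannot be duplicates, so only
# equal-hash pairs need a full element-wise comparison.
```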

While the simpler SS implementations are designed to check that the reference set does not contain duplicates, they generally do not monitor the diversity of the b1 high quality solutions when creating the initial RefSet. On the other hand, recall that the b2 diverse solutions are subjected to a strict diversity check with the maximum-minimum criterion. A minimum diversity test can be applied to the b1 high quality solutions chosen as members of the initial RefSet as follows. After the set P has been created, the best solution according to the objective function value is selected to become x1 in the reference set. Then, x1 is deleted from P and the next best solution x in P is chosen and added to RefSet only if dmin(x) ≥ th_dist. In other words, at each step we add the next best solution in P only if the minimum distance between the chosen solution x and the solutions currently in RefSet is at least as large as the threshold value th_dist.

1.4. Machine Learning

Machine learning (ML) is a branch of artificial intelligence dealing with the construction
and study of systems that can learn from data. That is, ML aims at programming computers
to optimize a performance criterion using examples or past experience [5]. We need learning
in cases where we cannot directly write a computer program to solve a given problem, but
need numerical data or experience. Mitchell provided a more formal definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" [153].
The core of ML deals with knowledge representation and generalization. Representation
of data instances and functions evaluated on these instances are part of all ML systems.
Generalization, in this context, is the ability of an algorithm to perform accurately on
new, unseen examples after having been trained on a learning data set. The core objective of a
learner is to generalize from its experience. The training examples come from some generally
unknown probability distribution and the learner has to extract something more general from
them, something about that distribution, that allows it to produce useful predictions in new
cases.


ML and data mining [86] are two related research fields as they often employ the same
methods and overlap significantly. However, they differ in the basic assumptions they work with: in ML, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in data mining the key task is the discovery of previously unknown knowledge. Nonetheless, other authors [5, 86, 230] define data mining simply as the application of
ML methods to large databases.
Based on the type of the considered inputs and on the desired outcome, there are mainly
three different types of ML methods:
Supervised learning generates a function that maps inputs to desired outputs (also called
labels, because they are often provided by human experts labeling the training examples). The supervised learning algorithm attempts to generalize a function or mapping
from inputs to outputs which can then be used to speculatively generate an output for
previously unseen inputs.
Unsupervised learning algorithms operate on unlabeled examples, i.e., input where the
desired output is unknown. They aim at finding hidden structures in unlabeled data
by analyzing the regularities in the input. In statistics, this is called density estimation.
A widespread method for density estimation is clustering, where the aim is to find clusters or groupings of the inputs.
Reinforcement learning is used in those applications where the output of the system is a
sequence of actions. In such a case, a single action is not important; what is important
is the policy, that is, the sequence of correct actions to reach the goal. Therefore, reinforcement learning is concerned with how intelligent agents ought to take actions in an
environment so as to maximize some notion of cumulative reward. The agent should
be able to assess the goodness of policies and learn from past good action sequences to
be able to generate a policy.

1.4.1. Applications

ML has been applied to a large set of applications from different fields and research areas,
including computer vision, engineering, mathematics, physics, neuroscience, and cognitive
science. Among notable applications are natural language processing, IS, search engines,
credit card fraud detection, stock market analysis, recommender systems, information retrieval, robot locomotion, and DNA sequence mining. In most of the cases, however, these
problems are reduced to two categories: classification and regression. They are defined as
follows.
Classification is probably the most frequently studied problem in ML and it has led to a
large number of important algorithmic and theoretic developments over the past century. In
its binary form it reduces to the question: given a pattern x drawn from a domain X, estimate which value an associated binary random variable y ∈ {±1} will assume. More formally,
classification is the problem of identifying to which of a set of categories a new observation
belongs, on the basis of a training set of data containing observations (or instances) whose
category membership is known. For instance, given pictures of apples and oranges, we might
want to state whether the object in question is an apple or an orange. An observation (also called an instance, pattern, or example) is described by its features. These are the characteristics


of the examples for a given problem. Thus, the input to a classification task can be viewed as
a two-dimensional matrix, whose axes are the examples and the features. The classification
problem can be split into two categories: binary classification and multiclass classification. In
binary classification only two classes are involved, whereas multiclass classification involves
assigning an object to one of several classes. Since many classification methods have been
developed specifically for binary classification, multiclass classification often requires the
combined use of multiple binary classifiers.
Regression is a problem where the output is a number rather than a class. In this case the goal is to estimate a real-valued variable y ∈ ℝ given a pattern x. For instance, we might want to estimate the value of a stock the next day, the yield of a semiconductor fabrication given the current process, the iron content of ore given mass spectroscopy measurements, or the heart rate of an athlete given accelerometer data. One of the key issues in which regression problems differ from each other is the choice of a loss. For instance, when estimating stock values our loss for a put option will be decidedly one-sided. On the other hand, a hobby athlete might only care that our estimate of the heart rate matches the actual value on average.

1.4.2. Methods

Although ML is a relatively young field of research, there exists a vast array of learning algorithms. A selection of significant approaches is presented below.
Bayesian networks [109] are probabilistic graphical models that represent a set of random variables and their conditional independences via a directed acyclic graph. They are composed of nodes and arcs between the nodes. Each node corresponds to a random variable,
X, and has a value corresponding to the probability of the random variable, P (X). If there is
a directed arc from node X to node Y , this indicates that X has a direct influence on Y. This
influence is specified by the conditional probability P (Y |X). Efficient algorithms exist that
perform inference and learning.
Multilayer perceptrons [21, 90] are artificial neural network structures forming nonparametric estimators that can be used for classification and regression. This learning algorithm
is inspired by the structure and functional aspects of biological neural networks. In a neural
network, a set of input neurons is activated by an input pattern. The activations of these neurons are then passed on, weighted, and transformed by a specific function to other neurons.
Finally one or more output neurons are activated to determine the output.
Support Vector Machines (SVMs) [25, 34, 51, 56] are a set of related supervised learning
methods used for classification and regression. SVMs are maximum margin methods that
allow the model to be written as a sum of the influences of a subset of the training instances.
The training instances are considered as points in space and mapped so that the examples of
the separate categories are divided by a clear gap that is as wide as possible. The separating
instances are called the support vectors. They are chosen to maximize the separability, or
margin, of the instances of the classes. If the problem is nonlinear, it is mapped to a new space
by doing a nonlinear transformation using suitably chosen basis functions (the kernels) and
then a linear model is used in this new space. The linear model in the new space corresponds
to a nonlinear model in the original space.


Cluster analysis [69] is an unsupervised statistical classification technique in which observations are sub-divided into groups (clusters) so that observations within the same cluster
are similar according to some predesignated criterion or criteria, while observations drawn
from different clusters are dissimilar. It is a discovery tool that reveals associations, patterns,
relationships, and structures in unlabeled data. Popular methods are the Mixture Densities,
k-Means, and Expectation Maximization algorithms [5].
Decision trees are learning algorithms whereby the models are obtained by recursively
partitioning the data space and fitting a simple prediction model within each partition [136].
As a result, the partitioning can be represented graphically as a decision tree. This tree is
composed of internal decision nodes and terminal leaves, each one representing one of the
possible outputs. Given an input, at each node, a test is applied and one of the branches is
taken depending on the outcome. This process starts at the root and is repeated recursively
until a leaf node is hit. Classification trees are designed for dependent variables that take
a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered
discrete values, with prediction error typically measured by the squared difference between
the observed and predicted values.
Ensemble learning [122] is the focus of a research area dealing with the construction of models composed of multiple learners that complement each other so that, by combining them, higher accuracy is attained. A necessary condition for the approach to be useful is that member classifiers have a substantial level of disagreement, i.e., they must make uncorrelated errors with respect to one another, in addition to each classifier performing better than a random guess [3, 87]. The two key points of multi-classifiers are the generation of base learners that complement each other and the combination of the outputs of the base learners to maximize accuracy. Regarding the former point, different strategies have been considered in the literature [5], such as employing different algorithms, different parameters for the same algorithm, different input representations, and different training sets. Regarding the combination of outputs, it is possible to follow a parallel approach, where the outputs of the base learners are considered all together at the same time, or a serial approach, where the next base learner is trained with or tested on only the instances where the previous base learners are not accurate enough. The simplest way to combine multiple classifiers is by voting, which corresponds to taking a linear combination of the learners. Bagging, in turn, is a voting method whereby base learners are made different by training them over slightly different training sets. Finally, in boosting, complementary base learners are generated by training the next learner on the mistakes of the previous learners.

1.4.3. Random Forest

Among ensemble methods based on decision trees, Random Forest (RF) [27] is endowed with some
attractive capabilities. It is introduced in this section since it is employed as an important
part of the proposal explained in Chapter 3 of this dissertation. It combines bagging and
the random selection of features to construct a collection of decision trees with controlled
variation. RF is employed to construct a prediction rule in a supervised learning problem
(classification or regression) and to assess and rank variables with respect to their ability to


predict the response. The latter is done by considering the so-called variable importance
measures that are automatically computed for each predictor within the RF algorithm [26].
Each tree of the ensemble is grown by RF as follows:
Let N be the number of examples in the training set. N instances are sampled at random with replacement (that is, a bootstrap sample) from the original data. This sample will be the training set for growing the tree.
If there are M input variables (features), a number m ≪ M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the forest growing.
Each tree is grown to the largest extent possible. There is no pruning.
To classify a new object from an input vector, the input vector is put down each of the
trees in the forest. Each tree gives a classification and we say the tree votes for that class. The
forest chooses the classification having the most votes (over all the trees in the forest).
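As a usage illustration, the sketch below trains a forest with scikit-learn (an assumed dependency). The hyperparameter names follow the sklearn API, with n_estimators the number of trees and max_features playing the role of m.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# bootstrap=True draws the per-tree sample with replacement;
# max_features="sqrt" selects m ~ sqrt(M) variables at each split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            bootstrap=True, random_state=0)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))       # accuracy by majority vote of the trees
print(rf.feature_importances_)    # the variable importance measures
```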
Lowering the value of m reduces both the correlation among trees and the strength of
each tree, while raising it increases both. The optimal range of m is application dependent,
but it is usually quite wide. This is the only adjustable parameter to which random forest is
somewhat sensitive.
RF is particularly attractive due to a large set of notable features. First of all, its accuracy is comparable with that of state-of-the-art learning algorithms while running efficiently on large databases. It can handle thousands of input variables without variable deletion, even when M is much larger than N [26]. Moreover, RF has an effective method for estimating missing data and
maintains accuracy when a large proportion of the data are missing. It gives estimates of what
variables are important in the classification and generates an internal unbiased estimate of the
generalization error as the forest building progresses. Finally, it has methods for balancing
error in class population unbalanced data sets.

1.5. Visual descriptors

As a result of the massive generation of content in our modern society, the amount of
audio-visual information available in digital format is increasing considerably. Therefore, it
has been necessary to design some systems that allow us to describe the content of several
types of multimedia information in order to search and classify them.
In computer vision, Visual Descriptors (VDs) or image descriptors are descriptions of the
visual features of the contents in images or videos. They describe elementary characteristics
such as the shape, the color, the texture or the motion, among others [156, 223]. Since VDs
aim at encoding the contents description, they need to express the information variability by
being able to discriminate among objects or scenes.
Among the image characteristics taken into account by general purpose low-level VDs [61, 104] for IS, the most commonly employed ones are:
Color is one of the most basic qualities of visual content. Employed VDs describe the color
distribution or the color relation between sequences or group of images. Among the
former, color histogram is one of the most widely used color features. It is invariant to
image rotation, translation, and viewing axis. The effectiveness of the color histogram
feature depends on the color space used and the quantization method [223].


Texture descriptors characterize image regions. Texture is an important feature of a


visible surface where repetition or quasi-repetition of a fundamental pattern occurs.
Three popular texture descriptors are the co-occurrence matrix [89], Gabor filters [62, 63], and local binary patterns [169].
Edge detection [84] is employed to identify points in a digital image at which the
image brightness has discontinuities. Edges are detected to capture important events
and changes in properties of the world. Popular techniques are the Sobel [84] and the
Canny [29] edge detectors. More recently, the histogram of oriented gradients [57], a
VD used in object detection, has been designed based on gradient directions, that is, edge orientations.
Shape contains important semantic information due to the possibility of discriminating objects through their shape. Shape features can be classified into global and local features [223]. Global features are the properties derived from the entire shape. Examples
of global features are roundness or circularity, central moments, eccentricity, and major
axis orientation. Local features are those derived by partial processing of a shape and
do not depend on the entire shape. Examples of local features are size and orientation
of consecutive boundary segments, points of curvature, corners, and turning angle.
Since the aim of this PhD dissertation is to introduce general purpose IS methods based on
DMs, we will only focus on those descriptors that we will use in the rest of this dissertation.

1.5.1. Haralick features

To characterize texture, Haralick [88] introduced a VD considering texture tonal primitive properties as well as the spatial interrelationships between them. Haralick considered
texture-tone as a two-layered structure, the first layer having to do with specifying the local
properties which manifest themselves in tonal primitives and the second layer having to do
with specifying the organization among the tonal primitives.
If we define I(x, y), where 0 ≤ x ≤ W − 1 and 0 ≤ y ≤ H − 1, with W and H the width and height of the image in pixels, as the gray level of pixel (x, y), and we consider it as a random variable, it is possible to calculate first and second order probability density functions.
1.5.1.1. First order statistics

First order statistics measure the probability of observing a gray level at a random location in the image. They can be calculated from the gray level intensity histogram of the image. First order statistics only depend on the individual pixel values, not on interactions or co-occurrences of pixel values in a neighborhood [213]. For instance, the mean gray value is a first order statistic of the image.
Assuming an image has $N_G$ gray levels, that is, every pixel has a value $i \in [0, N_G - 1]$, it is possible to define the normalized image histogram as

$$P(i) = \frac{N(i)}{N}, \qquad \sum_{i=0}^{N_G-1} P(i) = 1$$


where N(i) is the number of pixels in the image whose value is i and N is the total number of pixels in the image. Fig. 1.18 shows an example image, Fig. 1.19 depicts its gray level histogram, and the corresponding numeric representation is shown in Table 1.1.

Figure 1.18: Example image with 4 × 4 pixels and 4 gray levels.

Figure 1.19: Gray level histogram of the image in Fig. 1.18 (frequency of each of the four gray levels).

0  0  1  1
0  0  1  1
0  2  2  2
2  2  3  3

Table 1.1: Numeric form of the image in Fig. 1.18.


The first order statistical parameters are detailed as follows.

Mean gray level. Low values of this parameter imply a dark image, while high values imply a bright one.

$$\mu = \sum_{i=0}^{N_G-1} i \, P(i)$$


Gray level variance. This parameter shows how much the brightness of the image deviates from the mean value.

$$\sigma^2 = \sum_{i=0}^{N_G-1} (i - \mu)^2 P(i)$$

Coefficient of variation. This parameter is a normalized measure of dispersion of a probability distribution or frequency distribution.

$$c_v = \frac{\sigma}{\mu}$$

Skewness. This parameter is a measure of the extent to which a probability distribution of a real-valued random variable leans to one side of the mean. The skewness value can be positive or negative, or even undefined.

$$Sk = \frac{1}{\sigma^3} \sum_{i=0}^{N_G-1} (i - \mu)^3 P(i)$$

Kurtosis. This parameter is a measure of the peakedness of the probability distribution of a real-valued random variable. If Ku > 0 the obtained histogram is more peaked than a Gaussian, while for Ku < 0 it is flatter.

$$Ku = \frac{1}{\sigma^4} \sum_{i=0}^{N_G-1} (i - \mu)^4 P(i) - 3$$

Energy. This parameter gives a measure of the non-uniformity of the histogram.

$$e = \sum_{i=0}^{N_G-1} P(i)^2$$

Entropy. Differently from energy, this parameter gives a measure of the uniformity of the histogram.

$$s = -\sum_{i=0}^{N_G-1} P(i) \ln P(i), \qquad 0 \le s \le \ln N_G$$
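The following NumPy sketch computes all of the above parameters from a gray-level image. The number of gray levels and the handling of empty histogram bins are choices made for this example.

```python
import numpy as np

def first_order_stats(img, n_levels=256):
    """First order statistics from the normalized histogram P(i) (sketch)."""
    counts, _ = np.histogram(img, bins=np.arange(n_levels + 1))
    P = counts / counts.sum()                        # normalized histogram
    i = np.arange(n_levels)
    mu = np.sum(i * P)                               # mean gray level
    var = np.sum((i - mu) ** 2 * P)                  # gray level variance
    sigma = np.sqrt(var)
    cv = sigma / mu                                  # coefficient of variation
    sk = np.sum((i - mu) ** 3 * P) / sigma ** 3      # skewness
    ku = np.sum((i - mu) ** 4 * P) / sigma ** 4 - 3  # kurtosis
    energy = np.sum(P ** 2)
    nz = P[P > 0]                                    # skip empty bins (0*ln 0)
    entropy = -np.sum(nz * np.log(nz))               # in [0, ln n_levels]
    return mu, var, cv, sk, ku, energy, entropy
```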

1.5.1.2. Second order statistics

Second order statistics are defined as the probability of observing a pair of gray level values at the ends of a dipole of given length, placed in a given position and orientation in the image plane. These are properties of pairs of pixel values [88, 213].

Gray Level Co-occurrence Matrix A Gray Level Co-occurrence Matrix (GLCM) is a matrix defined over an image to show the distribution of co-occurring values at a given offset [85]. More rigorously, a GLCM C is defined over an image I of size n × m and parametrized by an offset (Δx, Δy) as:

$$C_{\Delta x, \Delta y}(i,j) = \sum_{p=1}^{n} \sum_{q=1}^{m} \begin{cases} 1, & \text{if } I(p,q) = i \text{ and } I(p+\Delta x, q+\Delta y) = j \\ 0, & \text{otherwise} \end{cases}$$


This matrix can be calculated for any image color depth. However, it is worth noting that a 32-bit GLCM would be of size 2³² × 2³², making it unfeasible to handle.
The parametrization (Δx, Δy) makes the GLCM non-invariant to rotation: for a fixed offset vector, an image rotation different from 180 degrees will give rise to a different co-occurrence distribution for the same rotated image. This is an undesirable property in texture analysis. Therefore, the GLCM is calculated using a set of offset vectors covering a 180 degree range (i.e., at 0, 45, 90, and 135 degrees) at the same distance, to obtain a certain degree of rotational invariance.
To show how to compute a GLCM we will employ Fig. 1.18 as a reference. This image has only four different gray intensities (0, 1, 2, 3), so the associated GLCM will be a 4 × 4 matrix.
By fixing the offset vector size to 1 and the direction to the right, we consider the pairs of neighboring pixels (offset = 1) in which the reference pixel is on the left. This configuration is called (1, 0). Subsequently, every image pixel having a pixel at its right (the last column is excluded) is considered as a reference pixel, computing, this way, a square 4 × 4 matrix. The values on the rows indicate the reference pixels, while the values on the columns indicate the neighboring pixel. The resulting matrix counts the occurrences of each pair of gray intensities; the pairs considered are shown in Table 1.2.

       0      1      2      3
0  (0,0)  (0,1)  (0,2)  (0,3)
1  (1,0)  (1,1)  (1,2)  (1,3)
2  (2,0)  (2,1)  (2,2)  (2,3)
3  (3,0)  (3,1)  (3,2)  (3,3)

Table 1.2: Pairs of neighboring intensities considered in the GLCM of Table 1.1 (rows: reference pixel intensity; columns: neighbor intensity).
By applying this algorithm to the example image, we will obtain the matrix in Table 1.3.
The values in this matrix are the counts of adjacencies of the gray intensity of the reference
pixel (rows) with respect to the neighbors (columns), as shown in Table 1.2.
2  2  1  0
0  2  0  0
0  0  3  1
0  0  0  1

Table 1.3: GLCM given by the offset vector (1, 0). This matrix is not symmetric.
GLCMs usually employed in textural analysis are symmetric and rotation-invariant. The GLCM obtained in the example (Tables 1.2 and 1.3) does not satisfy this property, as only the offset vector (1, 0) has been used to generate it. In order to obtain symmetric and rotation-invariant matrices, it is mandatory to also consider different rotations of the offset vector. Applying 45 degree rotations to the same example implies the following offsets: (1,0), (-1,0), (0,1), (0,-1), (1,1), (-1,-1), (1,-1), (-1,1). The resulting symmetric GLCM is shown in Table 1.4.
After calculating the GLCM in the way shown earlier, the matrix has to be rescaled so that it can be considered as a probability density function, that is, it has to satisfy the property $\sum_{i,j=1}^{N_G} P_{ij} = 1$. The rescaled version of the matrix in Table 1.4 is shown in Table 1.5.

16   4   6   0
 4  12   5   0
 6   5  12   6
 0   0   6   2

Table 1.4: GLCM calculated by using all the offset vectors. The resulting matrix is symmetric.
0.1905  0.0476  0.0714  0.0000
0.0476  0.1429  0.0595  0.0000
0.0714  0.0595  0.1429  0.0714
0.0000  0.0000  0.0714  0.0238

Table 1.5: GLCM with rescaled values.
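The tables above can be reproduced with scikit-image (an assumed dependency; the function is named greycomatrix in older releases):

```python
import numpy as np
from skimage.feature import graycomatrix

# The 4x4 example image of Fig. 1.18 / Table 1.1.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]], dtype=np.uint8)

# Offset (1, 0): should give the non-symmetric GLCM of Table 1.3.
t13 = graycomatrix(img, distances=[1], angles=[0], levels=4)[:, :, 0, 0]

# Four orientations at distance 1 with symmetric pairs: summing them
# should give Table 1.4, and rescaling to unit sum gives Table 1.5.
angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
glcm = graycomatrix(img, [1], angles, levels=4, symmetric=True)
t14 = glcm[:, :, 0, :].sum(axis=2)
t15 = t14 / t14.sum()
```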


After rescaling the GLCM, it is possible to obtain some second order statistical parameters. They are detailed as follows.

Energy. This parameter is a function of the homogeneity of the texture. Given a high textural homogeneity, the GLCM will have values distributed in a quite uniform way among the cells, with a resulting low energy.

$$E = \sum_{i=0}^{N_G-1} \sum_{j=0}^{N_G-1} P_{ij}^2, \qquad \frac{1}{N_G^2} \le E \le 1$$

Entropy. Differently from energy, this parameter provides high figures when the values of $P_{ij}$ are uniformly distributed over the whole matrix.

$$S = -\sum_{i=0}^{N_G-1} \sum_{j=0}^{N_G-1} P_{ij} \ln P_{ij}, \qquad 0 \le S \le \ln N_G^2$$

Contrast. The contrast is a measure of the dispersion level of the GLCM: the higher the dispersion, the higher the parameter value. It measures the second order moment of the matrix about its diagonal, and can therefore be considered as a contrast measure.

$$I = \sum_{i=0}^{N_G-1} \sum_{j=0}^{N_G-1} (i-j)^2 P_{ij}$$

Homogeneity. This parameter gives a measure of the texture homogeneity; it can be seen as an alternative formulation of the energy.

$$H = \sum_{i=0}^{N_G-1} \sum_{j=0}^{N_G-1} \frac{P_{ij}}{1 + |i - j|}$$

1.5.2. Gabor filters

A Gabor filter [62, 63] is a filter used for edge detection. Image analysis by Gabor functions is similar to perception in the human visual system, as it was discovered that the visual cortex response can be modeled by Gabor filters [62, 63, 144]. They have been found particularly appropriate for image representation [128]. A Gabor filter is the product of an elliptical Gaussian envelope and a complex plane wave [131], defined as

$$\psi_{s,d}(x,y) = \psi_k(z) = \frac{\|k\|^2}{\sigma^2} \exp\left(-\frac{\|k\|^2 \|z\|^2}{2\sigma^2}\right) \left[\exp(i k \cdot z) - \exp\left(-\frac{\sigma^2}{2}\right)\right]$$

where z = [x, y] is the variable in the spatial domain and k is the frequency vector which determines the scale and orientation of the Gabor filter, $k = k_s e^{i\phi_d}$, where $k_s = k_{max}/f^s$, $k_{max} = \pi/2$, $f = \sqrt{2}$, $s = 0, 1, 2, 3, 4$, and $\phi_d = \pi d/8$, for $d = 0, 1, \ldots, 7$. In these equations the parameter s represents the scale while d represents the orientation. Examples of the real part of Gabor filters are presented in Fig. 1.20, where the Gabor functions (full complex functions) have five different scales and eight different orientations, making a total of 40 Gabor functions. The number of oscillations under the Gaussian envelope is determined by σ = 2π. The term exp(−σ²/2) is subtracted in order to make the kernel free of a continuous component, and thus insensitive to the average illumination level [131].
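A direct NumPy rendering of the formula above (a sketch; the grid size is an arbitrary sampling choice):

```python
import numpy as np

def gabor_kernel(s, d, size=32, sigma=2 * np.pi):
    """Complex Gabor kernel at scale s (0..4) and orientation d (0..7)."""
    k_max, f = np.pi / 2, np.sqrt(2)
    phi = np.pi * d / 8
    kv = (k_max / f ** s) * np.array([np.cos(phi), np.sin(phi)])
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half].astype(float)
    z2 = x ** 2 + y ** 2                                 # ||z||^2
    k2 = kv @ kv                                         # ||k||^2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2))
    wave = np.exp(1j * (kv[0] * x + kv[1] * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * wave   # take .real to visualize, as in Fig. 1.20
```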

Figure 1.20: The real part of Gabor filters in five different scales and eight different directions
(image taken from [131]).

1.5.3. Local binary patterns

Local Binary Patterns (LBPs) [169] are a feature used for texture classification in computer vision. LBPs are a particular case of the Texture Spectrum model proposed in [220]. The LBP feature vector is created in the following manner:
Divide the examined window into cells (e.g., 16 × 16 pixels for each cell).
For each pixel in a cell, compare the pixel to each of its 8 neighbors (on its left-top, left-middle, left-bottom, right-top, etc.). Follow the pixels along a circle, i.e., clockwise or counter-clockwise.
Where the center pixel value is greater than the neighbor value, write 1; otherwise, write 0. This gives an 8-digit binary number, which is usually converted to decimal for convenience.
Compute the histogram, over the cell, of the frequency of each number occurring, i.e., each combination of which pixels are smaller and which are greater than the center. Optionally normalize the histogram.
Concatenate the normalized histograms of all cells, which gives the feature vector for the window.
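A sketch of this procedure with scikit-image (an assumed dependency); the cell size and neighborhood parameters mirror the description above:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature_vector(img, cell=16, n_points=8, radius=1):
    """Concatenated, normalized per-cell LBP histograms (sketch)."""
    codes = local_binary_pattern(img, n_points, radius)  # 8-bit LBP codes
    n_bins = 2 ** n_points
    hists = []
    h, w = codes.shape
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            hist, _ = np.histogram(codes[r:r + cell, c:c + cell],
                                   bins=n_bins, range=(0, n_bins))
            hists.append(hist / hist.sum())              # normalize per cell
    return np.concatenate(hists)
```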

1.5.4. Histogram of oriented gradients

The Histogram of Oriented Gradients (HOG) [57] is a feature descriptor used in computer vision and image processing for the purpose of object detection. The method counts occurrences of gradient orientations in localized portions of an image. It is similar to edge orientation histograms [75, 76], Scale-Invariant Feature Transform (SIFT) descriptors [137], and shape contexts [18], but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.
The essential thought behind the HOG descriptors is that local object appearance and
shape within an image can be described by the distribution of intensity gradients or edge
directions. The HOG descriptor maintains a few key advantages over other descriptor methods. Since the HOG descriptor operates on localized cells, the method upholds invariance
to geometric and photometric transformations, except for object orientation. Such changes
would only appear in larger spatial regions.
The implementation of these descriptors can be achieved by dividing the image into small
connected regions, called cells, and compiling a histogram of gradient directions or edge orientations for the pixels within each cell. The combination of these histograms then represents
the descriptor. For improved accuracy, the local histograms can be contrast normalized by
calculating a measure of the intensity across a larger region of the image, called a block, and
then using this value to normalize all cells within the block. This normalization results in
better invariance to changes in illumination or shadowing.
Finally, it has further been determined that combining HOG with LBP improves the detection performance considerably on some datasets [222]. The computation of the descriptor is detailed as follows.
1. Gradient computation: as in many feature detectors in image pre-processing, the first step is to ensure normalized color and gamma values. However, the authors in [57] point out that this step can be omitted in HOG descriptor computation, as the ensuing descriptor normalization essentially achieves the same result; image pre-processing thus provides little impact on performance. Instead, the first step of calculation is the computation of the gradient values. The most common method is to simply apply the 1-D centered, point discrete derivative mask in one or both of the horizontal and vertical directions. Specifically, this method requires filtering the color or intensity data of the image with the following filter kernels:

$$[-1, 0, 1] \quad \text{and} \quad [-1, 0, 1]^T \qquad (1.10)$$

2. Orientation binning: The second step of calculation involves creating the cell histograms. Each pixel within the cell casts a weighted vote for an orientation-based
histogram channel based on the values found in the gradient computation. The cells

1.5. VISUAL DESCRIPTORS

63

themselves can either be rectangular or radial in shape, and the histogram channels are
evenly spread over 0 to 180 degrees or 0 to 360 degrees, depending on whether the gradient is unsigned or signed. Dalal and Triggs [57] found that unsigned gradients used in
conjunction with 9 histogram channels performed best in their human detection experiments. As for the vote weight, pixel contribution can either be the gradient magnitude
itself or some function of the magnitude. In actual tests the gradient magnitude itself
generally produces the best results.
3. Descriptor blocks: In order to account for changes in illumination and contrast, the
gradient strengths must be locally normalized, which requires grouping the cells together into larger, spatially connected blocks. The HOG descriptor is then the vector of
the components of the normalized cell histograms from all of the block regions. These
blocks typically overlap, meaning that each cell contributes more than once to the final descriptor. Two main block geometries exist: a) rectangular R-HOG blocks and
b) circular C-HOG blocks. R-HOG blocks are generally square grids, represented by
three parameters: (i) the number of cells per block, (ii) the number of pixels per cell,
and (iii) the number of channels per cell histogram. In the human detection experiment
performed in [57], the optimal parameters were found to be 3 × 3 cell blocks of 6 × 6 pixel
cells with 9 histogram channels. Moreover, it was found that some minor improvement
in performance could be gained by applying a Gaussian spatial window within each
block before tabulating histogram votes in order to weight pixels around the edge of the
blocks less. The R-HOG blocks appear quite similar to the SIFT descriptors. However,
despite their similar formation, R-HOG blocks are computed in dense grids at some
single scale without orientation alignment, whereas SIFT descriptors are computed at
sparse, scale-invariant key image points and are rotated to align orientation. In addition, the R-HOG blocks are used in conjunction to encode spatial form information,
while SIFT descriptors are used singly.
C-HOG blocks can be found in two variants: (i) those with a single, central cell, and
(ii) those with an angularly divided central cell. In addition, these C-HOG blocks can
be described with four parameters: (i) the number of angular bins, (ii) the number of
radial bins, (iii) the radius of the center bin, and (iv) the expansion factor for the radius
of additional radial bins.
4. Block normalization: The authors explored four different methods for block normalization in [57]. Let v be the non-normalized vector containing all histograms in a given block, $\|v\|_k$ be its k-norm for k = 1, 2, and e be some small constant (the exact value, hopefully, is unimportant). Then the normalization factor can be one of the following:

L2-norm: $f = \frac{v}{\sqrt{\|v\|_2^2 + e^2}}$,

L2-hys: L2-norm followed by clipping (limiting the maximum values of v to 0.2) and renormalizing,

L1-norm: $f = \frac{v}{\|v\|_1 + e}$, and

L1-sqrt: $f = \sqrt{\frac{v}{\|v\|_1 + e}}$.

In addition, the scheme L2-hys can be computed by first taking the L2-norm, clipping the result, and then renormalizing. In the experiments developed, it was found that the L2-hys, L2-norm, and L1-sqrt schemes provided similar performance, while the L1-norm
provided slightly less reliable performance. Anyway, all four methods showed very
significant improvement over the non-normalized data.
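With scikit-image (an assumed dependency), the whole pipeline above reduces to a single call; the parameters mirror the optimum reported in [57] (9 unsigned bins, 3 × 3 cell R-HOG blocks of 6 × 6 pixel cells, L2-hys normalization):

```python
from skimage import data
from skimage.feature import hog

img = data.camera()   # any gray-level test image

features = hog(img, orientations=9, pixels_per_cell=(6, 6),
               cells_per_block=(3, 3), block_norm='L2-Hys')
```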

CHAPTER 2

New advances in Topological Active Nets: model extension and global optimization

2.1. Introduction

As pointed out in Sec. 1.1.1.3, TANs are promising parametric DMs that integrate features of region-based and boundary-based segmentation techniques. However, TANs have two strong limitations. On the one hand, the model is complex and has limitations regarding topological changes, local deformations, and the definition of the energy functional. On the other hand, the TAN optimization strategy, a Best Improvement Local Search (BILS) [9], can lead to result inaccuracies, that is, to local optima in the optimization sense.
The limitations of the model itself have only been superficially addressed in the TAN literature. In fact, only a few works dealing with TANs have been developed during the last ten years [14, 102, 163, 167, 232], and those works did not focus on the said TAN model drawbacks and limitations. Conversely, the main approach developed to overcome the limitations of the optimization method has been endowing TANs with a global search framework, considering multiple alternatives in the segmentation process. Actually, the use of two Memetic Algorithms (MAs) [171] was introduced in [102] and [167], respectively based on Genetic Algorithms (GAs) [67] and Differential Evolution (DE) [183]. Although the segmentation results obtained by the MAs for TAN optimization improve on the BILS approach, their applicability is still limited in real-world and complex synthetic images. In particular, those proposals failed to design proper evolutionary operators able to effectively combine nets and consequently required very large populations of solutions to operate, thus neglecting the main advantage of a global search approach. In addition, they lack a proper energy definition for a global optimization scenario.
This chapter deals with the work carried out to address the TAN limitations discussed so far. On the one hand, we deal with the limitations of the TAN model by presenting an enhanced version of it, the Extended Topological Active Net (ETAN). The ETAN model aims at overcoming the limitations of TANs while keeping their promising features. To do so, we combine the best capabilities of two different kinds of DMs, TANs and Extended Vector Field
Convolution snakes [129, 185], and we design some specific components. In particular, we develop novel mechanisms tackling topological changes, including external and internal link cuts; we propose a new external energy term to properly guide the model in case of complex concavities and highly non-convex shapes; we introduce node movement constraints to avoid crossing links; and we design a new local search procedure (named Extended Best Improvement Local Search (EBILS)) incorporating heuristics to correct the position of possibly misplaced nodes. Moreover, a new automatic pre-processing phase has been designed.
On the other hand, we aim at providing an accurate, quick and robust segmentation technique by endowing ETANs with an effective global search method. To do so, we embed
ETANs in a flexible and powerful MA framework, Scatter Search (SS) [123], carefully designing specific components to avoid getting stuck in local optima while considering small
populations.
The structure of this chapter is as follows. Sec. 2.2 summarizes the main global optimization approaches dealing with TANs selected from the state of the art. Sec. 2.3 introduces the
new ETAN model while Sec. 2.4 deals with the evaluation of its performance in comparison
with other local search methods. Sec. 2.5 describes the SS-based global search framework
and Sec. 2.6 is devoted to the evaluation of its performance in comparison with other global
search methods along with the ETAN algorithm. Finally, Sec. 2.7 summarizes some conclusions on the work carried out.

2.2.

Background

Apart from the original publication [28] and a few minor proposals [9, 232] based on the
use of a local search, almost all the existing approaches focused on the optimization process
by means of EAs. In particular, the initial evolutionary-based proposals [101,159,161,162,192]
optimize TAN (and TAV) node locations towards the best possible segmentation by means of
a GA. These pure evolutionary approaches show the problem that the topology of the meshes
cannot be changed to obtain a better adjustment. To avoid this limitation, MAs hybridizing
EAs with the BILS algorithm proposed in [9] were used in subsequent proposals [102, 160,
163, 167]. Following a Lamarckian strategy, all the changes made by the BILS procedure are copied back to the original genotypes. This way, topological changes (link cuts and automatic net division) in the net structure are possible through the use of the BILS algorithm. Among
these proposals, GAs were employed in [102, 160], the multiobjective GA SPEA2 in [163],
and DE in [167]. Finally, all TAN-based proposals need an experimental tuning of the
energy parameters to obtain correct segmentations. This tuning must be done for each kind
of image. To approach this problem, a few works optimize TAN [163] and TAV [165, 168]
energy term weights automatically. They can jointly optimize the energy term weights and
search for the best node locations by means of an evolutionary multi-objective approach [39].
By considering the optimization of several objectives in parallel, the SPEA2 algorithm used is able to search for the Pareto optimal individuals.
Given the ubiquity of global optimization-based proposals in the TAN/TAV literature, we
focus on reviewing these works. In the following subsections we deal with the design details
of the EA-based proposals by carefully studying their different codifications, the employed
operators, and the designed fitness functions. Finally, a critical discussion about the different
TAN/TAV design strategies will be developed in the last subsection.

2.2.1. Codification

For all the existing approaches, each chromosome represents a single TAN (or TAV) definition. It is composed of the Cartesian coordinates (x, y) (and also z in the TAV case) of every
node in the TAN (TAV). A TAN chromosome has two genes for each TAN node, one for its x
coordinate and another for its y coordinate, both encoded as integer values. Similarly, three
real-coded genes encode the x, y and z coordinates of each node in TAVs. This information
is denoted as A in Fig. 2.1.

Figure 2.1: The coding scheme of EA-based proposals.


In addition, since all the memetic approaches consider topological changes in each individual, a second chromosome codifies the model topology. Therefore, for each node, this additional chromosome encodes the status of the links connecting to the neighboring nodes, the kind of node (internal or external), and the priority for a link to be cut at the node. This information is denoted as B in Fig. 2.1. The DE approaches in [164, 166, 167] represent the only exception, as all the individuals share the same topology, defined by the best individual of the population.
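As an illustration only, a hypothetical data layout mirroring this two-chromosome coding scheme could look as follows (the field names are ours, not taken from the cited works):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TANIndividual:
    """Hypothetical two-chromosome encoding of a TAN (sketch)."""
    coords: np.ndarray        # part A: (rows, cols, 2) integer coordinates
    link_status: np.ndarray   # part B: per-node on/off flags of the links
    is_external: np.ndarray   # part B: node kind (internal or external)
    cut_priority: np.ndarray  # part B: priority for cutting a link
```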

2.2.2. Operators

From a global optimization perspective, there are several constraints inherent to the topology of the active net models. Nodes cannot be located over the same pixel (voxel in TAVs), link crossings are not allowed, and threads have to be avoided for a proper mesh division.
In particular, most of the existing approaches [101, 102, 160, 162, 163] proposed the use of an arithmetical crossover [94] to avoid crossings by taking the crossing restriction into account. It defines each new gene as the average of the corresponding values in the two parent chromosomes, i.e., a linear combination of the parents. However, in those approaches encoding the topology, this crossover can lead to offspring having crossings in their connections when combining nets with different topologies. The probability of this undesirable event increases with the number of differences between the topologies of the parent nets. A solution to this problem was to allow the crossover to be performed only between nets with the same topology (mating restriction). A more complex crossover was designed
in [192]. A crosspoint is first selected at random (C1). Then, points C2 and C3 are selected at random from the upper left and lower right rectangles of C1. Using those points as the border, the crosspoints of the individual pair are exchanged in 14 different ways, resulting in 14 kinds of new network shapes (see Figure 2.2). Among those, the one with the smallest energy Ei is defined as the new individual (child) generated by the crossover. However, this operator was designed to solve a completely different problem, stereo matching. By its definition it is clear that in the IS problem it will generate a large amount of unfeasible nets.
The mutation operator employed in the majority of the GA-based approaches [101, 102, 162, 163] allows a node to mutate to any possible position without producing any crossing in the TAN, thanks to an ad hoc design. The basic idea is to compute the area of the 4 polygons formed by the 8 neighboring nodes and the central node (the mutating one), as Figure 2.3

Figure 2.2: Sakaue's crossover [192].

shows. If the sum of the 4 subareas is the same before and after the mutation, the mutation
is correct as it will not produce any crossing. The same idea was also extended to 3D for TAV
models in [160]. The only GA-based approach that does not make use of this operator [192]
does not employ any mutation operator at all. Meanwhile, in DE-based approaches [167], if a
component of a mutant vector (candidate solution) goes off its limits, then the component is
set to the bound limit. In this application it means that, in order to avoid crossings in the net
structure, each node coordinate cannot overcome the limits established by its neighbors. The
neighboring nodes set the boundaries of the area where a node can move as a result of the DE
formulae for each candidate solution. So, in a given direction, the new coordinate of a node
cannot exceed the nearest coordinate of its 3 neighboring nodes in that direction.

Figure 2.3: Ibáñez et al. mutation operator [102]. The left figure shows the area of the 4 polygons before the mutation. The central figure shows a correct mutation; in this case, the sum of the areas remains unchanged. The right figure is an example of an incorrect mutation.
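A sketch of the underlying geometric test, assuming the 8 neighbors are given in clockwise order and each of the 4 polygons is formed by three consecutive neighbors plus the central node (our reading of Figure 2.3):

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula for a polygon given as an (n, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def mutation_keeps_areas(neighbors, old_pos, new_pos):
    """True if the summed area of the 4 polygons is unchanged by the move.
    neighbors: (8, 2) array of the neighboring nodes in clockwise order."""
    def total(center):
        quads = [np.vstack([neighbors[2 * i], neighbors[2 * i + 1],
                            neighbors[(2 * i + 2) % 8], center])
                 for i in range(4)]
        return sum(polygon_area(q) for q in quads)
    return np.isclose(total(np.asarray(old_pos)), total(np.asarray(new_pos)))
```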


Some of the existing works also made use of one or more of the ad hoc operators proposed
in [102]: the spread and shift operators, and the group mutation. They have been employed
to avoid the difficulties found in the application of a GA to the problem and to improve the
evolutionary optimization process. The spread operator stretches the TAN in a given direction with the aim of maintaining the diversity of sizes in the population. The group mutation
operator simultaneously modifies a group of neighboring nodes, in the same direction and
with the same value. This group of nodes is delimited by two randomly chosen rows and columns. The shift operator is used in the exploration stage and translates the net to
another position in the image.
Concerning the selection mechanism, almost all of the approaches use a tournament selection with a window size of 3% of the population size. We only found one exception, the classical roulette wheel selection employed in [192].
Finally, the way the initial population is generated is, again, the same for the majority of the evolutionary-based approaches. A random set of TANs (or TAVs) with rectangular (or cubic, in the case of TAVs) shapes is created in the first generation. Both uniform and random distances between consecutive rows and columns are considered [102]. Since in [162] the target was a circular-shaped object (an eye iris), the authors modified the latter approach by randomly creating nets with all the external nodes uniformly distributed in a circular way, instead of using rectangular initial nets.

2.2.3. Fitness Function

In all the approaches, the fitness function is set equal to the energy function of the model. However, in [163] the term Distance In-Out (IOD) was introduced and used in all subsequent publications of the same group. It is added separately to both cases of Eq. 1.3 and acts as a gradient: for the internal nodes its value (IODi) is minimized towards brighter values of the image, whereas for the external nodes its value (IODe) is minimized towards low values (the background). More formally:

$$IOD(v(r,s)) = \begin{cases} \min\limits_{v(p,q),\; I(v(p,q)) < U_e} \|v(r,s) - v(p,q)\| & \text{for external nodes} \\[4pt] \min\limits_{v(p,q),\; I(v(p,q)) > U_i} \|v(r,s) - v(p,q)\| & \text{for internal nodes} \end{cases}$$

where $\|v(r,s) - v(p,q)\|$ represents the Euclidean distance between these points, $I(v(p,q))$ is the intensity in the original image, and $U_e$ and $U_i$ are thresholds. This term was also extended to the TAV model in [168].
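As an illustration, the IOD term can be computed by brute force as in the following C++ sketch; the image layout, the default threshold values, and the function signature are assumptions of this sketch, not the original implementation of [163]:

#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Brute-force IOD of a node at (nx, ny): distance to the nearest pixel darker
// than Ue (external nodes) or brighter than Ui (internal nodes). The image is
// assumed to be stored row-major with gray levels in [0, 255].
double iod(const std::vector<unsigned char>& img, int w, int h,
           double nx, double ny, bool external, int Ue = 64, int Ui = 192) {
    double best = std::numeric_limits<double>::max();
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int g = img[y * w + x];
            if (external ? (g < Ue) : (g > Ui))
                best = std::min(best, std::hypot(nx - x, ny - y));
        }
    return best;
}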
Most of the approaches need to define two steps in the evolutionary process, with two different sets of energy term weights [102,160,162]. In the first step, the energy parameters allow
the nodes to be outside the image without a high penalization, so the model can cover the
image in the first generations. In the second step, the parameter values are changed to search
for a more homogeneous distribution of the internal nodes, to reduce the size of the net, and
to adapt it to the image. Similarly, in [163], an evolutionary multi-objective framework was
proposed that, instead of changing the value of the different parameters, uses different objectives in each phase. While the GD and the IO energy terms (see Sec. 1.1.1.3) are used in
the first phase, the GD, the IOD and the internal energy terms are employed in the second
one.
When tackling IS as an energy minimization task, it is mandatory to impose a strong correlation between the fitness function values and the quality of the obtained segmentation results. Failing to do so will certainly lead to suboptimal segmentation results. Therefore, it is not surprising that the most suitable fitness functions for the mesh adjustment to the object contour and for the evaluation of a segmentation through its mesh position do not match.
A problem common to the GA-BILS memetic approaches is the difficulty of maintaining fitness coherence among the population individuals as a result of topological changes in some of them. To deal with that, a node that turns from internal to external after a link cut is only considered as external for fitness computation when it is over the object edges. The DE-based approach [167] does not employ this mechanism. In this case, all the nets inherit the topology of the best individual.
Finally, a few approaches designed specific energy terms to address specific IS problems. For example, in [162] a TAN is employed to detect the optic disc in digital retinal images. To simplify that aim, a new component was added to give priority, in fitness terms, to the active nets of the genetic population with a circular structure (the shape of the optic disc). The authors make use of the average radius $\bar{r}$, calculated as the average distance between the centroid of the whole active net and the external nodes. Then, they accumulate the sum of the differences between $\bar{r}$ and the distance from the net center to all the external nodes. The term is defined as $CS[v(r,s)] = cs \left|\, \|v(r,s) - \bar{v}\| - \bar{r} \,\right|$, where $\bar{v}$ is the center of the mesh and $\bar{r}$ is the average radius to all external nodes. Therefore, nets with a circular structure will have less energy than others. The parameter $cs$ weights the energy term. The authors also added another term to favor the location of external nodes in dark pixels and internal nodes in bright ones. This is done because in retinal images the optic disc has brighter intensities than the area enclosing it.
Meanwhile, in [192] the authors applied TANs to the stereo matching problem. Two networks are provided to search for the corresponding points in the two images. The fitness is calculated using the energy $E_i$ of each node $i$, the mean energy $E_{mean}$ of the set, and the standard deviation $\sigma$, as $0.5^{\,scale \times \frac{E_i - E_{mean}}{\sigma}}$, where $scale$ is a parameter that controls the range of the fitness. In that case, the energy is calculated as $E_{ext} = |I_R(u(p,q)) - I_L(v(p,q))|^2$, where $I_R$ and $I_L$ are the gray levels of the right and the left images, respectively.

2.2.4.

Critical Discussion

As said, only a few works dealing with active nets have been developed during the last ten years, by a reduced number of researchers. Those works have mainly focused on the proposal of global optimization techniques; however, just a few of them [102, 160, 163] have provided solutions to the main active net drawbacks and limitations: topological changes, external energy definition, and local deformations. Potential applications of the special properties of evolutionary active nets have not been explored yet. However, they have already been applied to different problems such as iris location [162], stereo matching [192], and the segmentation of different structures (knee and lung) in medical CT images [102, 160, 163, 167] with successful results.
In the TAN/TAV literature, each new solution has been compared against the previous ones. However, those comparisons were in most cases developed using a small set of images and a visual assessment of the results. In [102] the authors evaluate the segmentation results of the BILS, the GAs, and the MAs over 7 synthetic images and 3 CTs of human bones. The evaluation of the results is divided into the following categories: sensitivity to parameter values, segmentation of fuzzy edges, sensitivity to noise, net division capability, adjustment to non-uniform objects, and computation time. The memetic approach outperforms the other two


in all the categories except the computation time. In [163], the results of the multi-objective approach over 7 synthetic, 2 knee CT, and 5 retinal images indicated that this approach outperforms the previous BILS and evolutionary approaches. It obtains nets with more homogeneous distributions of the nodes while having a very low sensitivity to noise. Nevertheless, the main advantage is the automatic tuning of the weights. However, this algorithm involves a higher computation time. In [167], the new approach was tested over a set of 20 images. This set is composed of 2D and 3D, synthetic and medical images, with one or more target objects. The results obtained applying 10 to 20 runs of the proposed method, in comparison to the one developed in [102, 160], showed faster convergence and better outcomes.
Although the segmentation results obtained by all the evolutionary approaches improve over the BILS approach, their applicability is still limited in real-world and complex synthetic images. In particular, the existing proposals failed to design proper evolutionary operators able to effectively combine nets and consequently required very large populations of solutions to operate. Indeed, this entails a huge effort, since the employed EAs used populations of 1000 individuals, each one storing a complete mesh and its topology. In addition, they lack a proper energy definition for a global optimization scenario.

2.3. Extended Topological Active Net

This section is devoted to the description of the new ETAN model features and properties. First, we will describe the extensions we developed for the model itself. Then, we will detail the optimization process and, finally, we will deal with some relevant complementary tasks.

2.3.1. Model

As seen in Sec. 1.1.1.2, the performance obtained by the GVF and VFC snakes is encouraging. However, they are one-dimensional DMs. For this reason, while they show good capabilities in segmenting the contours of the objects, they lack information about the inner part of the objects. Conversely, the bi-dimensional structure of TANs allows them to segment the internal features of the objects, for instance holes. Therefore, in this work we propose an ETAN model with the aim of endowing this DM with an advanced distance to gradient external energy term derived from the VFC field. Furthermore, this extended model will incorporate a new method to handle topological changes (including holes), an enhanced local search, and an improved image filtering mechanism. We will describe all of them in the following sections.
2.3.1.1. Energy

The VFC field has some advantages over the GVF field: a better convergence to boundary concavities, a reduced computational cost, robustness to noise, and the flexibility to be easily tailored for a particular application [185]. However, when dealing with highly concave shapes, the VFC field forms an area where the forces point in opposite directions and the snake stops before getting into the concavity [185].
In the following subsections we will deal with the tailoring of the VFC force, some techniques to mix the VFC with a standard gradient of the edge map, and a new energy definition for TANs based on VFC.

The anisotropic tailoring driven by the image As introduced in [129], the VFC field can be tailored by adding an anisotropic term that incorporates a displacement of the vector field towards a certain direction.
The modified VFC magnitude function is:

$$m_0(x,y) = c_0(x,y)\, m(x,y)$$

where $c_0(x,y)$ is the anisotropic term

$$c_0(x,y) = \frac{1}{2 - d \cdot n(x,y)},$$

with $d$ being a unit vector representing the displacement direction and $\cdot$ denoting the vector dot product. If $n(x,y)$ and $d$ have the same direction, $c_0(x,y)$ is close to 1; if they have opposite directions, then $c_0(x,y)$ is close to 1/3.
In [185] the authors extended the initial tailoring by deriving the anisotropic term from the image itself, with the aim of drawing a vector field suitable for the segmentation of highly non-convex shapes. The idea is to convolve the blurred gray-level image, instead of the edge map, with the kernel to generate the anisotropic term. Fig. 2.4 shows an example of this. The conceptual transition from $d$ to $d(x,y)$ extended the initial tailoring idea to derive the following displacement vector map:

$$d(x,y) = (G_{\sigma_1} * I) * k(x,y)$$

where $k(x,y)$ is given by Eq. (1.2). The image $I$ is scaled in the range $[0,1]$ and the displacement vector map is normalized in the following way:
$$d_1(x,y) = z\, \frac{d(x,y)}{|d(x,y)|},$$

where $z$ is the scaling factor. A typical value for $z$ is 0.5. The displacement vector map for the image shown in Fig. 2.4(a) is depicted in Fig. 2.4(d).
This displacement vector map is used to generate the anisotropic term and the VFC field:

$$c_1(x,y) = \frac{1}{2 - d_1(x,y) \cdot n(x,y)}.$$

The anisotropic term for the displacement vector map shown in Fig. 2.4(d) is depicted in Fig. 2.4(e).
Given the nature of the convolution, it is very important that the features of interest in the image are brighter than the background. This could require performing an inversion of the gray levels of the image. The following step calculates the VFC using this term:

$$m_1(x,y) = c_1(x,y)\, m(x,y)$$
$$k_1(x,y) = m_1(x,y)\, n(x,y)$$
$$f_{vfc2}(x,y) = f_1(x,y) * k_1(x,y)$$

with the edge map given by $f_1(x,y) = |\nabla(G_{\sigma_2} * I)|$. The result of the VFC using the anisotropic tailoring on the image shown in Fig. 2.4(a) is depicted in Fig. 2.4(f).
It should be noted that in the normal VFC, $r$ is a parameter, while the size of the anisotropic kernel is always the same as that of the image.

Figure 2.4: (a) original image; (b) Sobel gradient of (a); (c) standard VFC of (a); (d) displacement vector map of the inverse of (a); (e) anisotropic term derived from (d); (f) anisotropic VFC (EVFC) of (a) using (d); (g) gradient of (b); (h) mix of (g) and (c) using the fixed threshold; (i) mix of (g) and (f) using the convex combination.
Mixing techniques According to [129], in some situations a leakage of the DM over weaker edges could appear. To handle this, two techniques were introduced in [185] to mix the VFC and the standard gradient of the edge map:

the fixed threshold, where a threshold determines the transition between the two fields;

the convex combination, which consists of taking a convex combination between the normalized VFC and the standard gradient of the edge map.

A result of mixing the vector fields shown in Fig. 2.4(g) and Fig. 2.4(c) using the former method is given in Fig. 2.4(h). Conversely, Fig. 2.4(i) is the result of mixing the vector fields shown in Fig. 2.4(g) and Fig. 2.4(f) using the latter method.
The fixed threshold technique, introduced in [129], is defined as

$$f_{mix1}(x,y) = \begin{cases} \nabla f(x,y) & \text{if } |\nabla f(x,y)| \geq \tau \\ f_{vfc}(x,y) & \text{if } |\nabla f(x,y)| < \tau \end{cases}$$

where $\tau$ is a threshold that determines the edges to preserve. Threshold selection methods for gray-level images, such as the Otsu method [174], can be employed to determine the threshold by treating $|\nabla f(x,y)|$ as a gray-level image.
The convex combination technique, introduced in [185], performs the convex combination between the normalized VFC and the standard gradient of the edge map as follows:

$$f_{mix2}(x,y) = h(|\nabla f(x,y)|)\, \nabla f(x,y) + g(|\nabla f(x,y)|)\, f_{vfc}(x,y)$$

where $g(|\nabla f(x,y)|) = e^{-(|\nabla f(x,y)|/K)^2}$ and $h(|\nabla f(x,y)|) = 1 - g(|\nabla f(x,y)|)$. The large gradient values correspond to the edges. The $h$ function will preserve the standard external force near the edges, whereas the $g$ function will preserve the VFC field farther away from the edges. The $K$ parameter determines the strength of the edges taken into account: for large values of $K$ only the strongest edges keep their influence, whereas for smaller values the weaker edges do too. Typical values for $K$ are in the range $[0.01, 0.5]$.
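A minimal C++ sketch of the convex combination at a single pixel is given below; the vector type and function names are illustrative:

#include <cmath>

struct Vec2 { double x, y; };

// Convex combination at one pixel: near strong edges (large gradient
// magnitude) the gradient force dominates (h close to 1); far from them
// the VFC field takes over (g close to 1).
Vec2 mix_convex(Vec2 grad, Vec2 vfc, double grad_mag, double K) {
    double g = std::exp(-(grad_mag / K) * (grad_mag / K));
    double h = 1.0 - g;
    return { h * grad.x + g * vfc.x, h * grad.y + g * vfc.y };
}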
New energy definition based on extended vector field convolution We observed that it is possible to derive a better expression of the gradient distance term used in the TAN model from the VFC and, in particular, from the Extended Vector Field Convolution (EVFC). With this aim, we adapted the vector field in the following way.
First of all, in [129] and [185] the authors generated normalized fields, that is, fields whose vectors have unit length. In other words, they only use the information related to the direction and do not take into account the length, or strength, of the vectors in the field. This information, however, can be profitably exploited in DM optimization algorithms, so we developed a different approach. Instead of imposing the length of every vector in the field to be 1, we impose the mean of the lengths of the vectors in the field to be 1, that is
$$\frac{1}{wh} \sum_{v \in F} |v| = 1, \qquad (2.1)$$

where $v$ is a vector in the vector field $F$, and $w$ and $h$ are, respectively, the width and the height of the image $I$ (there are $wh$ vectors in a vector field generated from $I$).
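A minimal C++ sketch of this mean-length normalization, with illustrative types and names, is:

#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

// Rescale a field so that the MEAN vector length is 1 (Eq. 2.1), instead of
// giving every vector unit length as in the original VFC/EVFC formulation.
void normalize_mean_length(std::vector<Vec2>& field) {
    if (field.empty()) return;
    double sum = 0.0;
    for (const Vec2& v : field) sum += std::hypot(v.x, v.y);
    double mean = sum / field.size();   // (1 / wh) * sum of |v|
    if (mean == 0.0) return;            // degenerate field: leave untouched
    for (Vec2& v : field) { v.x /= mean; v.y /= mean; }
}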
However, when mixing the gradient of the edge image and an EVFC, a problem arises. While the EVFC ($F_{evfc}$) is a dense field, that is, a non-zero length vector is associated to every point in the field, the vector field $F_{G}$ of the gradient of the edge image, $\nabla f(x,y)$, is sparse, that is, there are many points in the field to which an almost zero length vector is associated. Therefore, applying Eq. (2.1) to the gradient of the edge image vector field would lead to abrupt changes in the resulting field when mixing it with an EVFC field.


Figure 2.5: (a) original image; (b) mix of the EVFC of (a) and the Sobel field of (a); (c) energy term derived from (b); (d) equalized version of (c).
We propose to solve this by multiplying every vector in $F_{G}$ by a scaling factor

$$s = EVFC_{Otsu} / G_{Otsu},$$

where

$$EVFC_{Otsu} = \frac{1}{n} \sum_{\substack{v \in F_{evfc}, \\ |\nabla f(v)| > th_{Otsu}}} |v|, \qquad G_{Otsu} = \frac{1}{n} \sum_{\substack{v \in F_{G}, \\ |\nabla f(v)| > th_{Otsu}}} |v|,$$

$v$ is a vector belonging to a vector field, $th_{Otsu}$ is the Otsu threshold [174] calculated considering $|\nabla f|$ as a gray-level image, and $n$ is the number of points of $\nabla f(x,y)$ for which the condition $|\nabla f(v)| > th_{Otsu}$ applies. In this way, the mean length of the vectors of the gradient vector field is imposed to be the same as that of the EVFC, for the vectors whose corresponding points have a value higher than the Otsu threshold in the gradient image. Mixing the two vector fields in this way, a smooth transition between the two of them is achieved.
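The following C++ sketch illustrates the computation of the scaling factor s; the field representation and the helper names are assumptions of this example:

#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

// Mean length of the vectors of a field, restricted to the positions whose
// gradient magnitude exceeds the Otsu threshold.
double edge_mean_length(const std::vector<Vec2>& field,
                        const std::vector<double>& grad_mag, double th_otsu) {
    double sum = 0.0; int n = 0;
    for (std::size_t i = 0; i < field.size(); ++i)
        if (grad_mag[i] > th_otsu) { sum += std::hypot(field[i].x, field[i].y); ++n; }
    return n > 0 ? sum / n : 0.0;
}

// Multiply every vector of the gradient field by s = EVFC_Otsu / G_Otsu so
// that both fields have the same mean strength near the edges.
void rescale_gradient_field(std::vector<Vec2>& grad_field,
                            const std::vector<Vec2>& evfc_field,
                            const std::vector<double>& grad_mag, double th_otsu) {
    double g_mean = edge_mean_length(grad_field, grad_mag, th_otsu);
    if (g_mean == 0.0) return;
    double s = edge_mean_length(evfc_field, grad_mag, th_otsu) / g_mean;
    for (Vec2& v : grad_field) { v.x *= s; v.y *= s; }
}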
As shown earlier, several authors have employed vector fields as external forces to adjust the snake contour to the borders of the object. However, the adjustment of a TAN is guided by energies rather than forces. Therefore, we need to derive an energy map from the vector field. Since such external energy will drive the adjustment of the external nodes, it should have low values close to the object contour and high values inside the object or far from it. Therefore, we propose to calculate a distance to gradient image $DG_{evfc}$ as

$$DG_{evfc}(p) = \frac{1}{\sum_{v \in w(p)} |v|} \qquad (2.2)$$

where $v$ is a vector of the EVFC field belonging to the neighborhood of size $w$ of the pixel $p$. To fully take advantage of the smoothness of this term, we perform an equalization [84] of the gray tones of the energy image. Fig. 2.5 shows an example of the construction of this image. Therefore, the final energy functional will be the same as shown in Section 1.1.1.3, changing the term $GD(v(r,s))$, introduced in [102] and used by every subsequent proposal, with the one shown in Eq. 2.2.
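A minimal C++ sketch of Eq. 2.2 for a single pixel is given below; the neighborhood handling and the small epsilon guard are choices of this illustration:

#include <algorithm>
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

// Eq. 2.2 for one pixel: the reciprocal of the summed EVFC vector lengths in
// the neighborhood, low near the contour (strong vectors) and high elsewhere.
double dg_evfc(const std::vector<Vec2>& field, int width, int height,
               int px, int py, int half /* neighborhood half-size */) {
    double sum = 0.0;
    for (int y = std::max(0, py - half); y <= std::min(height - 1, py + half); ++y)
        for (int x = std::max(0, px - half); x <= std::min(width - 1, px + half); ++x)
            sum += std::hypot(field[y * width + x].x, field[y * width + x].y);
    return 1.0 / (sum + 1e-12); // epsilon guards against empty neighborhoods
}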
2.3.1.2. Topological changes

If the shape of the object(s) raises the need for cutting links and, eventually, changing the topology of the TAN, it is necessary to properly adapt the structure of the net.

Figure 2.6: (a) part of a net over the object: the link highlighted in red is going to be cut; (b) the areas over which the link energy is calculated in the EVFC and original images; (c) the w and h parameters that define the size of the area A; (d) the resulting net after the link cutting.
As explained in Sec. 1.1.1.3, the previously existing solution [102] was to perform cuts of links between the adjacent external nodes with the highest gradient distance.
Although this solution has given acceptable results, it does not take into account the underlying image, only the shape of the net. Moreover, it cannot open holes in the mesh to perform the segmentation of object holes. In order to solve these issues, we have developed a novel approach to tackle topological changes in ETANs. The following paragraphs introduce our new approach.
Cutting links The link to be cut is chosen depending on the energy the link is withstanding: the higher the energy, the higher the possibility to cut the link. The underlying idea is to cut the links which bear an energy (which can be thought of as the tension that link is bearing) higher than a threshold. This threshold is not fixed; in fact, it depends on the mean energy the links of a mesh are experiencing.
The energy of a link is calculated as

$$E_{link} = \sum_{p \in A} \frac{DG_{evfc}(p)}{|A|} \cdot \frac{I(p)}{I_{max}} \qquad (2.3)$$

where $p$ is a pixel belonging to the area $A$ over which the energy is computed, $DG_{evfc}$ is the EVFC distance to gradient image, $I$ is the original image, $I_{max}$ is the maximum intensity value in the original image, and $|A|$ is the size of the area $A$. This area is defined by the $w$ and $h$ parameters; while $h$ can be set to zero, $w$ has to be non-zero. Fig. 2.6 illustrates the cut of a mesh link.
The link cutting procedure calculates the energy of all the links that can be cut in the mesh, derives the mean energy $E_{mean}$ a link in the mesh is withstanding, and cuts the most stressed one (with energy $E_{max}$) if $E_{max}/E_{mean} > th_{cut}$. Robust values (experimentally found) for $th_{cut}$ are around 3.
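The cut decision itself reduces to a simple comparison against the mean link energy, as the following illustrative C++ sketch shows:

#include <algorithm>
#include <vector>

// Among the cuttable links, the most stressed one is cut only if its energy
// exceeds th_cut times the mean link energy. Returns the index of the link
// to cut, or -1 if no cut is performed.
int select_link_to_cut(const std::vector<double>& link_energy, double th_cut = 3.0) {
    if (link_energy.empty()) return -1;
    double mean = 0.0;
    for (double e : link_energy) mean += e;
    mean /= link_energy.size();
    if (mean <= 0.0) return -1;
    auto it = std::max_element(link_energy.begin(), link_energy.end());
    return (*it / mean > th_cut) ? static_cast<int>(it - link_energy.begin()) : -1;
}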
Not all links in a mesh can be cut, even if they bear a high energy. The reason is that the topology of the net could be damaged, as explained in Sec. 1.1.1.3 and [102]. Therefore, the only links whose energy is measured and contributes to $E_{mean}$ are those which can actually be cut. In order to be a candidate for the cut, a link has to pass some cuttability tests:
The side links test. First of all, the two links at the sides (in the same line) of the link to be cut must exist, otherwise a wire would appear. One of the conditions to consider a link as a wire is that it connects a node which has only one neighbor. Fig. 2.7(a) depicts this
test.
The external nodes test. A link to be cut has to connect two external nodes. The cut of external-internal and internal-internal links is not allowed because it would open holes in the mesh. Fig. 2.7(a) shows internal and external nodes of a small mesh (in blue and white, respectively). The opening of holes in the mesh is performed to segment holes in the objects, but it is managed in a different way, explained in Subsection 2.3.1.2. This test is fast and simple; however, it is not sufficient to prevent the opening of undesired holes in a mesh.
The existing faces test. It is possible that, although a link connects two external nodes, its removal would lead to a hole in the mesh, as for the yellow link in Fig. 2.7(a). To prevent this and other undesirable configurations, we introduced the idea of face. A face is a zone of the mesh delimited by four nodes (and the corresponding four links between pairs of them). It can be full, if all the four links connecting the nodes exist, or empty, if at least one is missing. Therefore, removing a link whose lateral faces are both full is not allowed.
The allowed faces test. When removing a link, if the previous test is passed, it is necessary to test for not allowed configurations in a search window of 3 × 4 faces (see Fig. 2.7(b)). Inside the search window, we test that all the six configurations of 2 × 2 faces subwindows are different from the incorrect ones, in case the link had been removed. The 2 × 2 faces incorrect configurations are shown in Fig. 2.7(c). All the remaining possible configurations are allowed. It is possible to cut the link only if the configurations of all of the six subwindows are allowed (see Fig. 2.7(d)).
Once a link has been cut, it is necessary to perform a further test, the topology test. In some cases, only two links and the face between them connect two subnets, like the ones in Fig. 2.7(e). If one of the two links is removed, the face would be set empty and the other link would become a wire. To avoid this, four faces in the same column as the removed link (two in each direction) are considered, and every link lying between two empty faces is removed, successfully cutting the possible wires. This test is recursively applied to subsequent cuts and can lead to the separation of two subnets, changing the topology of the net, as in the example shown in Fig. 2.7(e).
Hole segmentation TANs have the ability of segmenting holes in the object(s) due to the presence of internal nodes. In fact, the holes are recognized starting from misplaced internal nodes. For every internal node $n$, $r(n)$ is calculated as

$$r(n) = \frac{E_{ext}(n)}{E_{ext}(n) + E_{int}(n)}, \qquad (2.4)$$

where $E_{ext}(n)$ is the external energy of the node $n$ and $E_{int}(n)$ is its internal energy. The idea of Eq. 2.4 is that a misplaced internal node (in a hole) has a high external energy in comparison with its internal energy. The node $n_h$ with the highest ratio is selected and, if $r(n_h) > th_{holes}$, a hole is opened in the net starting from this node. If this is the case, the values of the energy of the links connecting $n_h$ and its neighbors are calculated and the highest one is chosen. This internal link is removed and the topology test is applied, possibly removing wires.
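The selection of the hole seed can be sketched as follows in C++; the energy vectors and names are illustrative:

#include <vector>

// The internal node whose external energy dominates its total energy the
// most is the candidate hole seed (Eq. 2.4). Returns the node index, or -1
// if no ratio exceeds th_holes.
int select_hole_seed(const std::vector<double>& e_ext,
                     const std::vector<double>& e_int, double th_holes) {
    int best = -1;
    double best_r = th_holes;
    for (std::size_t n = 0; n < e_ext.size(); ++n) {
        double total = e_ext[n] + e_int[n];
        if (total <= 0.0) continue;
        double r = e_ext[n] / total;    // r(n) of Eq. 2.4
        if (r > best_r) { best_r = r; best = static_cast<int>(n); }
    }
    return best;
}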

Figure 2.7: In all the figures above, dashed lines represent already cut links and white areas represent empty faces. (a) The side links test: the link to be cut is drawn in red while the side links are in green; to cut the red link, the green links have to exist. The external nodes and existing faces tests: the yellow link cannot be cut, even if it is between two external nodes, because it lies between two full faces (f1 and f2). (b) The link whose removability is being tested is drawn in red while the 3 × 4 faces search window has a pink background. The empty faces are depicted in lighter pink. (c) The three incorrect configurations. No link is shown as cut because the origin of the empty faces does not matter. (d) Two of the six 2 × 2 faces subwindows of (b), in case the red link shown in (b) had been cut. The green subwindow is allowed while the blue one is not. This means that cutting the red link of (b) is not allowed. (e) The topology test. In this case, after the red link has been cut and face f2 has been set empty, only the green link connects the two subnets (yellow and blue nodes, respectively) and it would be considered a wire. To avoid wires, after removing a link, faces f1, f2, f3 and f4 are tested and every link between two empty faces is removed. If f3 or f4 do not exist (because f1 or f2 are the last faces before the end of the net), they are considered empty faces.

2.3.2. Optimization process

The adjustment of the mesh to the object is a procedure that comprises several steps. After the mesh is initialized to the starting position, the adjustment process begins. The first step comprises the application of an extended version of the BILS algorithm, EBILS, which optimizes the position of every node in the mesh. To do so, a square window is centered on each node, and a location (i.e., a pixel of the image) with a lower energy is searched for within it, according to the energy functional of Section 1.1.1.3. If a better position is found, the node is immediately moved to this location. The search is performed sequentially from the first to the last node of the net, and the whole search is repeated until no node can be further moved. When it is not possible to stretch or compress the mesh anymore, the link cutting procedure is activated. This procedure, affected by several constraints, cuts the links located in high energy areas and allows the mesh to adapt itself to the shape or topology of the object. A specific heuristic procedure is called after this phase to correct the position of misplaced nodes. At this point the mesh should be adjusted to the contour of the object and it is now possible to segment the holes which may exist inside the object. The last step of the process is the activation of a less constrained version of the cutting procedure to finalize the segmentation. Algorithm 2 shows the full mesh adjustment procedure while Fig. 2.8 depicts the overall scheme. The next subsections explain the different phases of the segmentation process in more detail.


Figure 2.8: The complete ETAN segmentation process.

Figure 2.9: (a) a misplaced internal node; (b) the node in (a) is repositioned; (c) a misplaced external node: the red node lies on the left of the first three vectors (black arrows) but it is on the right of the last one; (d) the node in (c) is repositioned.

2.3.2.1. Extended Best Improvement Local Search

As said, the BILS algorithm looks for the best position of each node inside a specific window. However, one of its drawbacks described in [9] is the absence of constraints to limit the position of the moving nodes with respect to the other nodes in the mesh. This can lead to crossings of the links connecting nodes. In order to avoid that, the first step of our extended BILS search (EBILS) is testing if the node is located outside the safe area, that is, the polygonal area delimited by the node neighbors. While for an internal node this reduces to testing if a point is inside or outside the polygonal area, for an external node the safe area is not limited on one or more sides. Thus, we test if the target node lies on the left side of the vectors connecting the neighbors of the target node, in anticlockwise direction. If a node (internal or external) lies outside of its safe area, it is moved to a position that is calculated as the mean position of its neighbors. Fig. 2.9 shows two misplaced nodes and their repositioning.
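The left-of test reduces to the sign of a 2D cross product. A minimal C++ sketch for the external-node case, with illustrative names, is:

struct Pt { double x, y; };

// 2D cross product: positive when c lies on the left of the vector a -> b.
double cross(Pt a, Pt b, Pt c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Safe-area test for an external node: the node must lie on the left side of
// every vector connecting its consecutive neighbors, taken in anticlockwise
// order (for an internal node, the chain closes into a full polygon).
bool inside_safe_area(Pt node, const Pt* neighbors, int n_neighbors) {
    for (int i = 0; i + 1 < n_neighbors; ++i)
        if (cross(neighbors[i], neighbors[i + 1], node) < 0.0)
            return false; // on the right of some edge: outside the safe area
    return true;
}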
Once the position of a node has been tested and corrected if necessary, the EBILS search calculates the local energy the target node would have if located on the pixels belonging to a square neighborhood of size $S_{neighs}$ pixels centered on the current location. This EBILS search only considers positions in the safe area and, if it finds a location with a lower energy than the current one, it moves the node to its new position.
Finally, it iterates on all the nodes of the mesh (a pass) until no node is moved in the last pass. Algorithm 3 shows this procedure.

2.3.2.2. Cutting links procedure

Once it is not possible to improve the energy of the net by just relocating its nodes, the cutting links procedure begins. This procedure chooses a link to be cut (if one exists) to improve the net adjustment to the object. If a link is cut, the EBILS node movement procedure described in Sec. 2.3.2.1 is run after the cut. The cutting links procedure ends when it is not possible to cut more links. Algorithm 4 shows this procedure.

Data: The pre-processed input image, the energy image
Result: The segmented image
EBILS_with_cuts();
correct_positions();
segment_holes();
EBILS_no_constraints();
Algorithm 2: Mesh full adjustment procedure.

Enew = calculate_energy(net);
do
    Ecurrent = Enew;
    forall the nodes do
        find_better_position(node, window_size);
    end
    Enew = calculate_energy(net);
while Enew < Ecurrent;
Algorithm 3: EBILS without cuts.

EBILS_without_cuts();
do
    cut_link = select_and_remove_link(net);
    if cut_link then
        EBILS_without_cuts();
    end
while cut_link;
Algorithm 4: EBILS with cuts.

Figure 2.10: An example of the Fixnet heuristic. (a) a close-up of an area in which the adjustment of the mesh to the object is good apart from two gradient-misplaced nodes; (b)-(d) different steps of the Fixnet heuristic on the original image; (e)-(h) steps on the DGevfc image; (i)-(l) steps using a net constructed by a weighted average of the segmentation net and the topology net, for the sake of clarity; (m) a close-up of the topology net of the segmentation net of (a). The topology net has the same structure as the segmentation net but its nodes are uniformly spaced.

2.3.2.3. Fixnet heuristic

After the EBILS application, it can happen that some nodes are gradient-misplaced. Those nodes are topologically part of a subnet but are located far from it, in a position closer to another subnet. This can happen if, after some cuts, the misplaced nodes followed the energy gradient and were attracted far away from their neighbors. Moreover, these nodes cannot be cut because they did not pass the cuttability tests. The gradient-misplaced nodes are recognized by measuring the energy and the length of every link in the mesh connecting pairs of external nodes. The links are ordered by length, and the mean of the energies and the median of the lengths are calculated. If the length of a link is more than $th_{linklength}$ times the median length and its energy is more than $th_{linkenergy}$ times the mean energy, then one of the two nodes connected by the link is considered as gradient-misplaced.
Our solution is to move the misplaced nodes towards their well placed neighbors. Since it is not easy to detect which of the two nodes of the link is gradient-misplaced, we move the node that has the lowest number of neighbors. If both nodes have the same number of neighbors, the neighbors of neighbors are counted and the node with the lower value is moved. If that number is still the same, the node with the higher energy is moved. This heuristic is applied to every link that complies with the conditions illustrated above. The whole procedure is repeated while at least one node moved in the previous iteration, up to a maximum of $th_{linkiters}$ iterations. After this procedure, a pass of the EBILS is applied to finely adjust the position of the moved nodes. Robust values (experimentally found) for $th_{linklength}$, $th_{linkenergy}$ and $th_{linkiters}$ are 5, 3 and 5, respectively. Fig. 2.10 illustrates an example of this procedure.
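The detection step of the heuristic can be sketched as follows in C++; the link representation and names are illustrative, while the default thresholds follow the robust values given above:

#include <algorithm>
#include <vector>

struct Link { double length, energy; };

// A link is flagged when it is both th_len times longer than the median
// length and th_en times more energetic than the mean energy.
std::vector<int> misplaced_links(std::vector<Link> links,
                                 double th_len = 5.0, double th_en = 3.0) {
    std::vector<int> flagged;
    if (links.empty()) return flagged;
    std::vector<double> lengths;
    double mean_e = 0.0;
    for (const Link& l : links) { lengths.push_back(l.length); mean_e += l.energy; }
    mean_e /= links.size();
    std::nth_element(lengths.begin(), lengths.begin() + lengths.size() / 2,
                     lengths.end());
    double median_len = lengths[lengths.size() / 2];
    for (std::size_t i = 0; i < links.size(); ++i)
        if (links[i].length > th_len * median_len && links[i].energy > th_en * mean_e)
            flagged.push_back(static_cast<int>(i));
    return flagged;
}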


Figure 2.11: (a)-(c) the two steps of the procedure on the original image; (d)-(f) the steps using a net constructed by a weighted average of the net and the topology net (not shown).
2.3.2.4. Hole segmentation

After the procedure shown in Section 2.3.1.2, a complete EBILS with cuts enabled is applied to the mesh. Moreover, up to $th_{linkiters}$ iterations of the Fixnet heuristic are applied to the net. This whole procedure is repeated as long as some internal node has an $r(\cdot)$ ratio (Eq. 2.4) above the $th_{holes}$ threshold. Since an EBILS and the heuristic have been applied, it is necessary to recalculate the node energies at every pass of this procedure.
2.3.2.5. Local search with no topology restrictions

After the holes have been segmented, the net is ready for the last step of the segmentation algorithm. Occasionally, a good segmentation is spoiled by some links that cannot be cut because of the link cut topology restrictions. An example of this is shown in Fig. 2.11(a). Here the net is well adjusted to the object apart from three links that cannot be cut, as shown by the topology net in Fig. 2.11(d).
In order to solve this, the last step of our segmentation process is to run the EBILS with cuts enabled and without the topological restrictions described in Subsection 2.3.1.2. Since in this way it is possible that the net ends up with wires, we developed a correction procedure. This procedure removes every link between two empty faces as well as links connecting nodes with only one neighbor. Figs. 2.11(b,e) show, respectively, the segmentation and the topology nets after applying the EBILS without restrictions, while Figs. 2.11(c,f) show the two nets after applying the correction procedure. Note the presence of a free node (i.e., with no neighbors) in the net, as shown in Fig. 2.11(f). Since a free node has no neighbors, it is removed from the net and it does not contribute to the energy of the net in any way.

2.3.3. Complementary tasks

The following subsections are not strictly related to the proposed ETAN model. Instead, they deal with the image filtering process and the net initialization, tasks that depend on the kind of images to segment.
2.3.3.1. Image filtering

The gradient image, which is also the starting point of the distance to gradient image, is usually constructed by applying an edge detector over the original image. As an alternative, we propose to use a pre-segmentation generated by K-means clustering [139]. The two methods have pros and cons: while the K-means based distance to gradient image can provide cleaner images, in particular in the case of synthetic images with fuzzy borders or real images affected by noise, it heavily depends on the quality of the K-means pre-segmentation result.

Figure 2.12: (a) original image; (b) resulting image after clustering image (a); (c) resulting segmentation from (b); (d) gradient image after applying the Canny edge detector to (a); (e) gradient image obtained from (c); (f) distance to gradient image obtained from (d); (g) distance to gradient image obtained from (e).
In our case, every pixel of the image is assigned by K-means to a cluster on the basis of its gray tone [127]. In order to reduce the noise of the obtained segmentation, the original image undergoes a mean filtering by means of a 10 × 10 kernel.
K-means assigns every pixel in the image to a specific cluster. A result of this clustering procedure for the example medical image in Fig. 2.12(a) is shown in Fig. 2.12(b), where each one of the three gray tones identifies a cluster. The smallest, brightest one corresponds to the target object. However, it is not possible to know a priori which cluster corresponds to the target segmentation object. Therefore, we take as object the cluster with the most compact spatial location, that is, the one given by

$$\arg\min_{c \in C} \frac{1}{|c|} \sum_{i=1}^{|c|} d_E(p_{c,i}, \mu_c)^2, \quad \text{with } \mu_c = \frac{1}{|c|} \sum_{i=1}^{|c|} p_{c,i},$$

where $C$ is the set of clusters, $p_{c,i}$ is the $i$-th point of cluster $c$, $\mu_c$ is the centroid in image space (and not in the gray level space) of the cluster $c$, and the function $d_E(\cdot)$ is the Euclidean distance.
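A minimal C++ sketch of this selection criterion, with illustrative types and names, is:

#include <vector>

struct Pt { double x, y; };

// The cluster whose pixels have the smallest mean squared distance to their
// own spatial centroid is taken as the target object.
int most_compact_cluster(const std::vector<std::vector<Pt>>& clusters) {
    int best = -1;
    double best_score = 1e300;
    for (std::size_t c = 0; c < clusters.size(); ++c) {
        const std::vector<Pt>& pts = clusters[c];
        if (pts.empty()) continue;
        Pt mu{0.0, 0.0};
        for (const Pt& p : pts) { mu.x += p.x; mu.y += p.y; }
        mu.x /= pts.size(); mu.y /= pts.size();   // centroid in image space
        double score = 0.0;
        for (const Pt& p : pts)
            score += (p.x - mu.x) * (p.x - mu.x) + (p.y - mu.y) * (p.y - mu.y);
        score /= pts.size();                      // mean squared distance
        if (score < best_score) { best_score = score; best = static_cast<int>(c); }
    }
    return best;
}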
Once the cluster $t$ corresponding to the target object is known, the K-means segmentation image is created. A pixel $p_s$ of this image has value 0 if $p_k \in t$, and 255 if $p_k \notin t$, where $p_k$ is the corresponding pixel of the K-means cluster image. The resulting K-means segmented image obtained applying the algorithm to the image in Fig. 2.12(a) is shown in Fig. 2.12(c).
Then, applying an edge detector and the EVFC procedure to this image, it is possible
to obtain the gradient and the distance gradient images shown in Figs. 2.12(e) and 2.12(g).
These images are affected by a lower amount of noise than the corresponding ones shown in
Figs. 2.12(d) and 2.12(f) but the object contours are less precise.
2.3.3.2. Net initialization

The net is initialized by uniformly sampling all the nodes of the mesh between two endpoints, the upper left node and the lower right node. The two endpoints can be chosen manually, automatically, or so as to cover the whole image. With the manual initialization, the user has to choose two points on the image in such a way that the mesh will surely cover the object to segment. With the automatic initialization, the endpoints are chosen using the K-means segmentation image. For this purpose, a morphological opening and a closing are applied to the image in order to remove small artifacts. Then, a bounding box is calculated around the object. Finally, the two endpoints are inferred from the bounding box.

2.4. Extended Topological Active Net model performance study

2.4.1. Experimental design

To evaluate the actual performance of our new ETAN model proposal, we benchmark it against five different segmentation algorithms: two state-of-the-art snake models, GVF and VFC; two state-of-the-art LS models, CaV and GAC; and the original TAN. The final aim of selecting such a diverse set of DM-based image segmentation models is to show that ETAN can actually improve the previous TAN formulation and, at the same time, be competitive against state-of-the-art parametric and geometric DMs.
The two snake models can be considered the state of the art in parametric DMs: while the former is the most widespread parametric model, the latter has recently been proven to outperform it in several scenarios [129]. The CaV LS model can detect objects whose boundaries are not necessarily defined by the gradient, as the LS minimizes an energy which can be seen as a particular case of the minimal partition problem. Finally, the GAC is based on the relation between active contours and the computation of geodesics or minimal distance curves.
In this experimentation we consider the use of a K-means pre-segmentation to generate the EVFC employed by the external energy term of our algorithm (see Sec. 2.3.3.1). Although this is just a pre-processing step, it succeeds in filtering the image and simplifies the segmentation process. In order to perform a fair comparison, we also evaluated the other algorithms providing them with the same pre-processing step.
We tested our method on two different image datasets, one of them composed of synthetic images and the other of real-world medical images. The synthetic images show various difficulties and have a ground truth, allowing us to properly evaluate the segmentation performance.
The results of the K-means clustering are highly dependent on the initial number of clusters, that is, the value of the K parameter. Experimental tests have shown that the best results are obtained with K = 2 for synthetic images (which are fundamentally composed of only two areas, the object and the background), and with K = 3 for the medical images in our dataset.
In order to quantitatively assess our results on the dataset, we compute the spatial accuracy index S, which is a similarity index based on the overlapping rate between the segmentation result and the ground truth [243]:

$$S = 2\, \frac{Card(R \cap T)}{Card(R) + Card(T)}, \qquad (2.5)$$

where $R$ is the segmentation result, $T$ is the ground truth, and $Card(X)$ is the cardinality of the set $X$, that is, the number of pixels it contains. Therefore, this index is dimensionless and varies in the range $[0, 1]$. The higher its value, the better.
We also compute the mean distance $Md_{RT}$ between the contours of the segmentation results and the ground truth, as well as the mean distance $Md_{TR}$ between the ground truth and the segmentation results:

$$Md_{RT} = \frac{\sum_{r \in R} \min_{t \in T} d(r,t)}{Card(R)}, \qquad Md_{TR} = \frac{\sum_{t \in T} \min_{r \in R} d(r,t)}{Card(T)},$$

where $d(x,y)$ is the Euclidean distance.
Finally, we compute the Hausdorff distance [100], that is, the maximum distance between the two contours:

$$d_H(R,T) = \max\left\{ \sup_{r \in R} \inf_{t \in T} d(r,t),\ \sup_{t \in T} \inf_{r \in R} d(r,t) \right\}. \qquad (2.6)$$

The latter three distances are measured in pixels. While zero is the lower bound, the upper bound depends on the size of the image. In any case, the lower the value, the better the obtained segmentation. Among the three distance metrics, $d_H(R,T)$ is the most relevant one since it measures the worst case of the segmentation: a low value in this metric implies per se an effective segmentation.
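For illustration, the index S and the Hausdorff distance can be computed by brute force as in the following C++ sketch (illustrative names; masks as boolean vectors and contours as point lists):

#include <algorithm>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Spatial accuracy index S (Eq. 2.5) on two binary masks of equal size.
double index_s(const std::vector<bool>& seg, const std::vector<bool>& gt) {
    std::size_t inter = 0, r = 0, t = 0;
    for (std::size_t i = 0; i < seg.size(); ++i) {
        if (seg[i] && gt[i]) ++inter;
        if (seg[i]) ++r;
        if (gt[i]) ++t;
    }
    return (r + t) ? 2.0 * inter / (r + t) : 0.0;
}

// Brute-force Hausdorff distance (Eq. 2.6) between two contours given as
// point lists.
double hausdorff(const std::vector<Pt>& R, const std::vector<Pt>& T) {
    auto directed = [](const std::vector<Pt>& A, const std::vector<Pt>& B) {
        double sup = 0.0;
        for (const Pt& a : A) {
            double inf = 1e300;
            for (const Pt& b : B)
                inf = std::min(inf, std::hypot(a.x - b.x, a.y - b.y));
            sup = std::max(sup, inf);
        }
        return sup;
    };
    return std::max(directed(R, T), directed(T, R));
}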
The two snakes codes are the original MATLAB® implementations released by the authors. The two LSs (obtained from the free software Ofeli [19]) are C++ implementations of the Shi and Karl Fast-Two-Cycle (FTC) algorithm [200]. The original TAN is implemented in C, while the code relative to our proposal has been implemented in C++. All the tests were run on an Intel® Core™ 2 Quad CPU Q8400 at 2.66 GHz with 4 GB DDR3 RAM.
The TAN and ETAN parameters used for all the 18 images in the synthetic images dataset are shown in Table 2.1. As shown, the ETAN has all the parameters of the TAN plus some new ones. Although this could suggest that tuning an ETAN is harder than tuning a TAN, we experimentally arrived at the opposite conclusion. Indeed, in order to achieve an appropriate performance for the TAN, there is a need to perform a specific experimental study testing different values for each parameter in a wide range, for each image. For instance, in the TAN case, using a net size of 30 × 30 nodes provided poor results and it was necessary to manually set this parameter to lower values, different for each image. Conversely, in the ETAN case we simply fixed the values for almost every parameter. Only the last two ETAN parameters were changed, and only in a few cases. No changes were made to the parameters of the two snakes ([129, 231]), which are shown in Table 2.2. Finally, Table 2.3 lists the parameter values relative to the LSs. While the GAC did not require any adjustment (since no performance difference was detected on the used image datasets when changing the morphological gradient structuring element size parameter; see Section 2.4.2), the CaV algorithm benefited from the extensive adjustment we performed on the two parameters shown.
The synthetic images of our dataset are divided into six categories:
1. images with concavities or complex shapes;
2. images with fuzzy borders;
3. images with noise;
4. images with holes;
5. images with multiple objects; and
6. images with a combination of several difficulties.

The three images in every category are listed in increasing difficulty order. All the images of this dataset have a size of 375 × 375 pixels and are shown in Fig. 2.13 (with the resulting ETAN superposed).
Parameter               TAN                  ETAN
Energy term weight      [0.1, 60]            3.0
Energy term weight      [0.0001, 5.5]        3.0
Energy term weight      [1.0, 4.0]           3.0
Energy term weight      [1.0, 6.0]           1.0
Energy term weight      [1.0, 2.0]           3.0
Energy term weight      [2.0, 10.0]          15.0
Net size                [10 × 10, 20 × 20]   30 × 30
evfc (as in [129])      -                    2.5
thcut                   -                    3.4
thholes                 -                    [0.65, 1.0]
Sneighs                 -                    [1, 5]

Table 2.1: TAN and ETAN parameters.

Parameter    GVF-snake   VFC-snake
α            0.05        0.5
β            0.0         0.0
γ            1.0         1.8
k            0.5         -
VFCradius    -           300

Table 2.2: The two snakes parameters.

Parameter    CaV-level set   GAC-level set
λin          [1, 71]         -
λout         1               -
SE size      -               -

Table 2.3: The two LSs parameters.

2.4.2. Image datasets

Finally, we also evaluated the six algorithms on three real-world medical images, which
are shown in Fig. 2.14. These images are extracted from a dataset without ground truth.
Therefore we only assess the segmentation accuracy in a qualitative way.

Figure 2.13: ETAN results on: (a)-(c) images with concavities and/or complex shapes; (d)-(f) images with objects with fuzzy borders; (g)-(i) images affected by noise; (j)-(l) images with holes; (m)-(o) images with multiple objects; (p)-(r) images that combine several difficulties.

2.4.3. Analysis of the obtained results

Fig. 2.13 shows the resulting ETANs adjusted over the original images, while the results achieved by the six methods with and without the K-means pre-segmentation are summarized using boxplots in Fig. 2.15. Table 2.4 shows the numeric results of the five metrics for the six methods on every synthetic image. The table also shows the mean (μ) and the standard deviation (σ) for all the algorithms and for every metric on the whole synthetic images dataset. For every image, the best result for every metric is highlighted in boldface.
As can be seen in Table 2.4, our proposal achieves the highest mean index S, the lowest mean Hausdorff distance, and the lowest mean MdTR distance, that is, the ground truth image is closer to the ETAN segmentation result than to any other one. Conversely, the mean MdRT distances provided by the two snakes using the K-means pre-segmentation and by the CaV LS without using it are smaller than the ETAN result. This means that those solutions, when able to find the right contour, can adjust to it in a slightly better way (0.3, 0.49 and 0.86 vs 1.27) than ETAN. Regarding the computation time, the fastest algorithm is the TAN, followed by the LSs and our ETAN proposal. It is worth noting that the Shi LS implementation used in Ofeli is significantly faster than typical LSs, with speedups of up to 100 times compared to other implementations [200]. We must also note that the times shown for the TAN and the LSs do not include the K-means pre-segmentation phase, when applicable. Moreover, the TAN (regardless of how it was initialized) never achieved the best segmentation. The two snake algorithms are from 10 to 70 times slower than ETAN. Below, we provide a detailed analysis of the obtained results divided by categories.


Figure 2.14: The results on the medical images obtained by: the TAN with (a-c) or without (d-f) the K-means pre-segmentation; the GVF snake with (g-i) or without (j-l) the K-means pre-segmentation; the VFC snake with (m-o) or without (p-r) the K-means pre-segmentation; the CaV level set with (s-u) or without (v-x) the K-means pre-segmentation; the GAC level set with (y-aa) or without (ab-ad) the K-means pre-segmentation; our ETAN proposal with (ae-ag) or without (ah-aj) the K-means pre-segmentation.

Table 2.4: Numeric results of the five metrics (computation time, index S, MdRT, MdTR, and dHausdorff) for all the algorithms on all the synthetic images. The index S is dimensionless and varies in the range [0, 1]. Times are expressed in seconds while the three distances are measured in pixels.

2.4.3.1. Images with concavities or complex shapes

Images cc1, cc2, and cc3 belong to this category. The TAN provides unsatisfactory results for all the images. The GVF snake fails to properly segment image cc1, while the VFC snake does not provide a completely satisfactory segmentation of image cc2. Neither algorithm provides a completely satisfactory segmentation of image cc3, as shown in Table 2.4. On the contrary, the two LSs provide perfect segmentations on this subset, and our proposal provides good results for the three images, as shown in Figs. 2.13(a-c). The behavior of the algorithms does not change when using the K-means pre-segmentation.
2.4.3.2. Images with fuzzy borders

Images fb1, fb2, and fb3 belong to this category. The TAN obtains good results on the first two images when using the K-means pre-segmentation, whereas it stops too far from the real

Figure 2.15: Results obtained on the eighteen synthetic images by TAN, TANK, GVF, GVFK, VFC, VFCK, CaV, CaVK, GAC, GACk, and ETAN: (a) index S; (b) dHausdorff.


borders without it. Conversely, the segmentation of fb3 is unsatisfactory in both cases. The GVF snake is able to segment the first two images using the K-means, while without it the result for fb1 is suboptimal. It fails to properly segment fb3. The VFC snake (regardless of the chosen pre-segmentation), the LSs, and our proposal perform well on the three images (see Figs. 2.13(d-f)).
2.4.3.3. Images with noise

These images, n1, n2, and n3, are affected by an increasing amount of noise. The snake and the TAN algorithms are strongly influenced by the presence or absence of the K-means pre-segmentation. In fact, it effectively filters the image, providing a simpler input for the segmentation process. In particular, the TAN provides unsatisfactory results in any case. The GVF snake without K-means is unable to segment the images, while with it, it only provides suboptimal segmentations because of its inability to segment this particular shape (see Table 2.4). The VFC snake provides better results, being capable of segmenting n1 correctly even without K-means. Moreover, with the pre-segmentation, it is able to segment all the images correctly. Unlike the other algorithms, the LSs and our proposal achieve good results in the three cases (see Fig. 2.13(g-i) for ETAN).
2.4.3.4. Images with holes

Images h1, h2, and h3 belong to this category. All the models are able to perform a near-perfect segmentation of the external borders of the objects. However, only our proposal is able to segment the holes of the objects (see in Table 2.4 how no algorithm, apart from ETAN and CaV, achieved a Hausdorff distance lower than 45 pixels). In these cases, the LSs are not
able to segment the holes in the objects when initialized outside the object, like the other algorithms. This is due to the one-dimensional nature of the contour. Therefore, our approach obtains much better overall results (see Figs. 2.13(j-l)) than all the others. This behavior does not change when the competitors are endowed with the K-means pre-segmentation.
2.4.3.5. Images with multiple objects

For images belonging to this category (mo1, mo2, and mo3), the TAN provides the worst results. The meshes did not divide, so they cover both the objects and the area between them. Conversely, the snakes, the LSs, and our proposal provide better results. While the GVF snake is able to segment mo1, it fails to segment the other two. The VFC snake, in turn, only fails to segment mo3 (see Table 2.4), while the LSs and our proposal are both able to properly segment all three images (see Figs. 2.13(m-o)). It is important to note that the snakes are not able to change their topology; hence, small wires connect the segmented objects, even when the segmentation is of good quality. The results do not change when the K-means pre-segmentation is employed.
2.4.3.6. Images with combinations of difficulties

Images c1, c2, and c3 belong to this category. The TAN performs badly on the three images. The GVF snake performs better, but it is unable to separate the objects, to segment some complex shapes, and to segment the holes. The VFC snake succeeds in separating the objects, but it is unable to segment some complex concavities and the holes (hence the high values of the Hausdorff distance in Table 2.4). The LSs show the same behavior as the VFC snake, as they fail in segmenting the holes in the objects. Our proposal is the only method able to provide good segmentations of the three images (see Fig. 2.13(p-r)). Our competitors are barely influenced by the pre-segmentation.
2.4.3.7. Real-world images

The images in this second dataset, m1, m2, and m3, are real-world CT images². They are depicted in Fig. 2.14. The segmentation has been performed over the K-means pre-segmentation and over the original image. In the latter case, a Canny filtering [29] has been applied to the image (this does not apply to the two LSs, as the CaV does not employ edges and the GAC relies on its own detector). Moreover, the initialization is different for the six algorithms because of the different implementations. In the case of the TAN, the mesh is initialized on the whole image. This implies that the net gets stuck in the first local minimum, that is, the tissue part around the bone, as shown in Figs. 2.14(d-f). Conversely, with the K-means pre-segmentation, the mesh can reach a zone closer to the object because the pre-segmentation removes the tissue region, providing better results (see Figs. 2.14(a-c)).
The two snakes and the two LSs are initialized manually as circles around the objects: this means that there is no overlap between the DMs and the external tissue contour. Even in this case, the GVF snake falls in local minima for the m2 and m3 images (see Figs. 2.14(k,l)), while it provides a very good result for image m1 (see Fig. 2.14(j)). Contrarily, the VFC snake is able to obtain good results for the three images (see Figs. 2.14(p-r)). On these
² The gray value of all pixels has been inverted so the bone becomes the darker object in the image.

images, the LSs show some limitations. On the one hand, the CaV LS provides acceptable results on images m2 and m3, while it leaks inside the object in the case of image m1. Changing the values of the two parameters of the algorithm over a wide range gave only two outcomes: either the one shown in Fig. 2.14(v) or another, similar to the one obtained by the GAC in Fig. 2.14(ab), segmenting the tissue around the bone. On the other hand, the GAC LS failed in segmenting the bones, providing a result similar to the TAN one. This is due to the attraction of the edges dividing the air and the tissue, which have a much stronger response compared to the edges dividing the tissue and the bone.
Differently, our proposal is initialized automatically, using the K-means pre-segmentation, with a rectangular mesh. This means that, in the case of images m2 and m3, there is an overlap between the tissue external border and the mesh. Although this considerably increases the difficulty of the segmentation, our model is able to provide acceptable segmentations. In the area occupied by the upper part of the bone in image m2, the mesh falls in a local minimum (see Fig. 2.14(ai)). Moreover, our proposal obtains an excellent result for images m1 (see Fig. 2.14(ah)) and m3 (see Fig. 2.14(aj)).
When using the K-means pre-segmentation, the two snakes, the two LSs, and our proposal obtain very similar results. Although the segmentation is acceptable, it is highly dependent on the K-means pre-segmentation result; the borders, indeed, are not exactly those of the objects (see Figs. 2.14(g-i, m-o, s-u, y-aa, ae-ag)).
2.4.3.8. Time

The execution times of the six algorithms span four orders of magnitude, as shown in Table 2.4. While the TAN never needs more than one second to segment an image, the two snake implementations require several minutes. In particular, the GVF snake required more than ten minutes to segment one of the noisy images. This is partly because of the MATLAB implementation of both snakes and partly because of the slowness of the optimization algorithm used. The LS proved to be a fast algorithm when the Shi model is used, as it required from 1.5 to 7.5 seconds to segment the images. These times, however, do not include the creation of the K-means pre-segmentation, when applicable. Our proposal never required more than nine seconds to segment an image, with most of the images segmented in less than three seconds (see Table 2.4). Moreover, the K-means pre-segmentation and the creation of the EVFC require approximately 2-3 seconds to be calculated. In conclusion, ETAN and the two LSs have similar processing times. The original TAN proposal is much faster, but the quality of its results is much lower than that of the competitors. Finally, the two tested snakes turned out to be much slower.

2.5. A Scatter Search Framework for Extended Topological Active Nets optimization

As seen in the previous section, the results obtained by ETANs have been encouraging. They outperformed TANs, state-of-the-art snake models [185], and two state-of-the-art LS implementations. Moreover, the robustness achieved was significantly better than that of the previous TAN method, and the ETAN, together with the EBILS optimization procedure, proved less sensitive to changes in the parameter values.

However, since the ETANs were optimized using a local search procedure, the model can reach wrong segmentations (local minima from the optimization viewpoint) due to the presence of noise and/or artifacts, or simply to the complexity found in the images. Fig. 2.16 illustrates this fact by showing two such cases. A feasible solution is to complement the EBILS optimizer with a global search. Indeed, such a global optimizer could consider multiple meshes at the same time, combining them to generate more accurate ones until approaching the global minimum of the energy function.
In this section we describe our proposal of such an ETAN global search method, which is based on the SS EA. While the basics of SS were introduced in Sec. 1.3.2, here we provide a motivation for the use of this specific MA. Finally, we deal with the customization of the general SS framework to fit our ETAN optimization problem, describing every designed component in detail.

2.5.1. Motivation for the use of Scatter Search in ETAN optimization

Despite the huge dimensionality of the search space in our problem, we can rely on an effective local search, EBILS (see Sec. 2.3.2). In fact, ETANs outperformed TANs in every experiment reported in the previous section. The counterpart is that the ETAN EBILS-based optimization process is two orders of magnitude slower than the TAN-BILS one. These time constraints have to be taken into account when designing how our global search will consider multiple alternatives in the segmentation process.
The simplest option we implemented was a Multi Start Local Search (MSLS) [81]. In this case, we initialized a large number of meshes, differing in size and location in the image plane, ran the EBILS on them, and chose as final result the one with the lowest energy. This method is simple and fast, but it lacks the capability of mixing solutions. In our case, this problem is particularly evident for images with multiple, distant objects. In fact, it is unlikely that a single mesh is able to divide and move toward distant objects without getting stuck in local minima, e.g. noise, along the way. Other more advanced approaches, such as iterated local search and variable neighborhood search, would show the same problem [81].
Another possibility could be to use an EA or, even better, its hybridization with a local search, a MA [171]. Contrary to the MSLS, they have the ability to combine candidate solutions and to improve them to provide high quality candidates, thus achieving a better intensification-diversification tradeoff. According to [167], the segmentation results obtained by the DE-based memetic approach (using TANs) improve previous global and local proposals. However, it implies a high computational cost. Indeed, it is a huge effort since the

Figure 2.16: Two cases of ETAN inaccuracies (in red).

employed EA used a population of 1000 individuals, each one storing a complete mesh and its topology. Actually, the proposed DE applies the BILS just every 10 generations, and for only a few steps, because the authors needed to reduce the heavy computational burden implied by such huge populations.
There are two main reasons for the need of such large populations. Firstly, the absence of an adequate combination method: as seen in Sec. 2.2.2, most of the existing ones employ simple flavors of arithmetical combination operators, which prove to be inefficient in combining meshes with moderate differences in shape, or even small differences in topology. In fact, the decision to force a single topology in [167], or to employ niches of topologies in [102], arises from the inability to deal with meshes with different topologies. This implies that those proposals fail in segmenting objects with complex shapes or, even worse, multiple, distant objects, above all when separated by heavy noise. Ad hoc combination procedures to generate feasible and usable offspring are then strongly required.
Secondly, large populations are required due to the evolutionary framework employed, where randomized combinations of individuals are considered. Within the large umbrella of MAs [171], SS is endowed with some specific and very attractive capabilities [123], as introduced in Sec. 1.3.2. In fact, SS has been successfully applied to other computer vision tasks, such as image registration [58]. The SS methodology is very flexible since each of its elements can be implemented in a variety of ways and degrees of sophistication. Besides, the SS approach relies on systematically injecting diversity in the RefSet to achieve better exploration and, therefore, avoids the need for a large population. Considering the time constraints imposed by ETANs (which are effective but slower than TANs), employing a reduced population of high-quality solutions is quite an advantage when dealing with a global search for our problem. Thanks to this fact, the EBILS can be deeply applied at every generation, thus getting a better intensification-diversification tradeoff. The aggregation of this intensive EBILS application and a problem-specific solution combination method, allowing us to properly mix nets with different topologies, becomes a very convenient way to deal with ETAN optimization. These are the reasons why we considered SS the metaheuristic that fits our problem best.

2.5.2. Scatter Search-based framework overview

Each individual in the SS population encodes a different ETAN definition, using a double encoding, A and B, as shown in Fig. 2.1.
The overall scheme of the designed algorithm is shown in Fig. 2.17. The first step performs some preliminary operations to incorporate image-specific information into the SS process, with two objectives: i) generating proper gradient and distance gradient images, which are employed in the external energy term; and ii) achieving a rough pre-segmentation, by the K-means clustering described in Sec. 2.3.3.1, whose result is employed to define the reference net size and to bias the population initialization.
The diversification generation method is implemented using a frequency memory, with the purpose of creating an initial population of diverse meshes, P. The generated solutions, coherently with the SS framework, are enhanced through the use of the improvement method, in our case the whole EBILS procedure described in Sec. 2.3.2. Then, the reference set update method selects the best meshes in terms of quality and diversity and inserts them in the RefSet. We thus consider a two-tier RefSet approach (see Sec. 1.3.2).

In the next step, the subset generation method generates all possible solution pairs in order to perform structured combinations of them by means of the solution combination method. The obtained results are also enhanced by applying the improvement method, i.e. the EBILS. The best solutions obtained are selected to replace the worst ones in the reference set.
The main SS loop is repeated until one of the following events happens:
the RefSet did not change in the last iteration;
the diversity among RefSet1 solutions is below a threshold;
a new population has not been generated in the last th_pop iterations.
Then, a restart is performed. All individuals but the best solution are removed from the RefSet and a new base population is reinitialized in order to inject diversity. The algorithm stops if the fitness of the best individual did not improve after a given number of restarts (N_R), or if the maximum number of SS iterations (N_SS) has been reached.
The remaining specific SS components are described in the next subsections.
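To make this control flow concrete, the following C++ sketch outlines the loop just described. Every routine is a trivial stub standing in for the actual component detailed in the following subsections; it is an illustrative skeleton under those assumptions, not the actual thesis implementation.

#include <algorithm>
#include <vector>

struct Mesh { double fitness = 1e12; /* nodes, links, topology ... */ };

// Stubs standing in for the real SS components (Secs. 2.5.3-2.5.6).
std::vector<Mesh> generateDiversePopulation(int pSize) {           // Sec. 2.5.5
    return std::vector<Mesh>(pSize);
}
void runEBILS(Mesh&) { /* improvement method: the EBILS of Sec. 2.3.2 */ }
std::vector<Mesh> buildRefSet(std::vector<Mesh>& p, int b1, int b2) {
    std::sort(p.begin(), p.end(),
              [](const Mesh& a, const Mesh& b) { return a.fitness < b.fitness; });
    std::size_t n = std::min(p.size(), std::size_t(b1 + b2));     // quality + diversity tiers
    return std::vector<Mesh>(p.begin(), p.begin() + n);
}
Mesh combine(const Mesh& a, const Mesh& b) {                      // SCOs, Sec. 2.5.6
    return a.fitness < b.fitness ? a : b;
}
bool restartNeeded(const std::vector<Mesh>&) { return false; }    // stagnation/diversity tests

Mesh scatterSearch(int pSize, int b1, int b2, int nR, int nSS) {
    auto pop = generateDiversePopulation(pSize);
    for (auto& m : pop) runEBILS(m);                              // improve initial meshes
    auto refSet = buildRefSet(pop, b1, b2);
    Mesh best = refSet.front();
    for (int it = 0, restarts = 0; it < nSS && restarts < nR; ++it) {
        for (std::size_t i = 0; i < refSet.size(); ++i)           // subset generation:
            for (std::size_t j = i + 1; j < refSet.size(); ++j) { // all solution pairs
                Mesh child = combine(refSet[i], refSet[j]);
                runEBILS(child);                                  // improve the offspring
                auto worst = std::max_element(refSet.begin(), refSet.end(),
                    [](const Mesh& a, const Mesh& b) { return a.fitness < b.fitness; });
                if (child.fitness < worst->fitness) *worst = child; // RefSet update
            }
        auto fittest = std::min_element(refSet.begin(), refSet.end(),
            [](const Mesh& a, const Mesh& b) { return a.fitness < b.fitness; });
        if (fittest->fitness < best.fitness) best = *fittest;
        if (restartNeeded(refSet)) {                              // restart: keep only the best
            pop = generateDiversePopulation(pSize);
            for (auto& m : pop) runEBILS(m);
            pop.push_back(best);
            refSet = buildRefSet(pop, b1, b2);
            ++restarts;
        }
    }
    return best;
}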
Figure 2.17: The complete segmentation process using ETANs and SS. The scheme comprises: preprocessing of the input image (edges, EVFC); the diversification generation method, whose frequency-memory solution generator creates new square meshes over the less searched areas of the image (repeated until |P| = P_size); the EBILS improvement method; the RefSet update method, selecting by quality and diversity; the subset generation method; the phenotypic/genotypic solution combination method (followed again by EBILS); and the stop conditions triggering a restart, decided on the basis of the number of iterations, the diversity, and the age of the population. The run ends when there is no significant improvement in the last N population generations.


2.5.3. Objective function definition: internal energy terms

As already mentioned in this PhD dissertation, when tackling image segmentation as an energy minimization task, the correlation between the fitness function and the segmentation quality plays a critical role. Indeed, if this correlation is loose, even a perfect optimizer would lead to suboptimal segmentation results. Although the mesh adjustment to the object contour (local search) and the evaluation of a segmentation through its mesh position (global search) are both tackled as minimization problems, they are actually quite different tasks. Hence, it is not surprising that the most suitable fitness functions for the two tasks do not match. As a consequence, we employ two different energy function definitions.
The fitness function is the sum of the internal and external energy of every node (see Sec.
1.1.1.3). While we opted for the same external energy formulation for both local and global
searches, the internal energy function has been redesigned for the global search in order to
solve some specific problems which are described below.
In fact, in (E)TANs (including previous evolutionary TAN optimization proposals) the energy function is derived from the original formulation, designed with a local minimization approach in mind. In that formulation, the contraction term gives energy values directly proportional to the distance among the nodes, thus forcing the mesh to contract. The contraction is stopped by the external nodes in the presence of edges. However, the contraction term is not suitable in a global search framework because it penalizes big nets regardless of the size of the target segmentation object. To deal with this issue, we propose to substitute the contraction term of the internal energy with an area-related one. Differently from the other terms, which are calculated on a per-node basis, this term only takes into account the total area of the meshes. Its magnitude is proportional to the ratio between the area of the candidate net A(n_c) (the net whose fitness is being calculated) and the reference area A_r, taken from the K-means pre-segmentation (see Sec. 2.3.3.1). The obtained value of the ratio is the input of a proper function f, computed as follows:
E_{int,nt} = f(A(n_c) / A_r), with

f(x) = x^(-α)   if x < 1
f(x) = x        if x ≥ 1,

where α is a constant (with a typical value of 5). The f function is shown in Fig. 2.18.
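As an illustration, this term translates directly into code. The following C++ sketch assumes the reconstruction of f given above (a power-law branch for x < 1); all names are illustrative.

#include <cmath>

// Area-related internal energy term. Candidate nets much smaller than the
// reference area are penalized steeply (e.g., f(0.6) ~ 12.9 for alpha = 5),
// while larger nets are penalized linearly; both branches give f(1) = 1.
double areaEnergyTerm(double candidateArea, double referenceArea,
                      double alpha = 5.0) {
    double x = candidateArea / referenceArea;    // ratio of the areas
    return (x < 1.0) ? std::pow(x, -alpha) : x;  // f(x) = x^(-alpha) or x
}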
Another important change we performed is the removal of the bending term from the global search fitness function. The aim of this term is to ensure a smooth mesh shape, which plays a central role in the local adjustment to the object, helping to keep the net together. However, this term is not suitable for the evaluation of the meshes in a global search framework: it strongly penalizes meshes that divide into parts to adapt their topology to the objects, in particular when the segmentation target is composed of many objects.
In some cases, the desired segmentation is made up of several objects of different sizes. If the size of the smaller objects is negligible compared to the bigger ones, the contribution provided by the global area energy term is not enough to distinguish between nets segmenting only the big objects and nets segmenting all the objects. This implies that, in these cases, the fitness values only depend on the external energy value. Moreover, if the smaller objects are brighter³ than the bigger ones, the fitness values of the meshes segmenting them will be worse than those of meshes segmenting only the bigger, darker objects.
³ In this work the target objects are dark and the background is bright.


Figure 2.18: The f function (energy as a function of the ratio of the areas).


The solution we propose involves the use of submeshes, i.e. isolated parts of a mesh resulting from topological changes to adapt to more than one object. Every node in a mesh is part of only one submesh. When evaluating the fitness of a mesh, we calculate its submesh relevance, that is, g = Σ_i g_i, where g_i refers to every submesh the mesh is divided into. For every submesh i, g_i = 1 if the following three conditions hold:
1. the ratio between the covered area s_i and the total area of the image is above a threshold, G_min ∈ (0, 1);
2. the ratio between the area of the bounding box containing the submesh and its area is lower than a threshold, G_a;
3. the form factor of the bounding box, defined as the ratio between the longest and shortest edges of the rectangle, is below a threshold, G_ff.
The G_min threshold defines the minimum area a structure in the image must have to be considered an object. Together with the net reference size, these are the only pieces of prior knowledge we insert into the process, thanks to the flexibility of SS to do so. If one of the three previous conditions does not hold, then g_i = -s_i/G_min. Experimentally found proper values for the three thresholds are 0.005, 8, and 5, respectively.


Figure 2.19: The submesh reward procedure: (a) a mesh with 2 submeshes; (b) a mesh with 3 submeshes. For the image in (a): α-term = 20, E_ext = 310, reward = -66, total = 264. For the image in (b): α-term = 10, E_ext = 350, reward = -108, total = 252. With the reward, (b) has a lower energy than (a).
Once the submesh relevance g has been determined, we reward the net on the basis of g, calculating f_r = f · (1 - r_g · g), where f is the fitness value calculated so far, f_r is the fitness value considering the submesh reward, and r_g is a weighting coefficient in (0, 1). The rationale is rewarding (that is, lowering the fitness value of) the meshes with many relevant submeshes and penalizing the meshes with many non-relevant ones. In particular, the non-relevant submeshes penalize the mesh depending on their size: the larger, the worse. An experimentally found proper value for r_g is 0.1. Fig. 2.19 shows an example of the submesh reward term.
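A C++ sketch of the relevance test and of the reward computation is given below. The Submesh summary structure and its fields are hypothetical, while the threshold values follow the ones reported above.

#include <vector>

// Hypothetical per-submesh summary extracted from a candidate net.
struct Submesh {
    double area;           // covered area s_i, as a fraction of the image area
    double bboxArea;       // area of the bounding box containing the submesh
    double bboxLongEdge;   // longest edge of the bounding box
    double bboxShortEdge;  // shortest edge of the bounding box
};

// Relevance g_i of one submesh: 1 if it looks like a proper object,
// otherwise a negative penalty proportional to its size.
double submeshRelevance(const Submesh& s, double gMin = 0.005,
                        double gA = 8.0, double gFF = 5.0) {
    bool bigEnough  = s.area > gMin;                             // condition 1
    bool compact    = (s.bboxArea / s.area) < gA;                // condition 2
    bool wellShaped = (s.bboxLongEdge / s.bboxShortEdge) < gFF;  // condition 3
    if (bigEnough && compact && wellShaped) return 1.0;          // relevant: g_i = 1
    return -s.area / gMin;                                       // the larger, the worse
}

// Reward: f_r = f * (1 - r_g * g), with g the sum of submesh relevances.
double rewardedFitness(double fitness, const std::vector<Submesh>& submeshes,
                       double rG = 0.1) {
    double g = 0.0;
    for (const auto& s : submeshes) g += submeshRelevance(s);
    return fitness * (1.0 - rG * g);
}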

2.5.4. Diversity function

Our diversity function is meant to measure how different a net solution is from another one, with the aim of selecting candidate nets for the diversity subset of the RefSet. We intend this subset as a reservoir of meshes that are not good enough to be considered as a possible outcome of the segmentation process but that, nevertheless, segment some objects (or parts of them) in a proper way. Moreover, these objects should be located far away, on the image plane, from the ones segmented by the nets in the quality subset of the RefSet. With this in mind, we designed the following diversity function:
d(m_1, m_2) = [ Σ_{i=1}^{n} √( (m_{1,i,x} - m_{2,i,x})² + (m_{1,i,y} - m_{2,i,y})² ) ] / [ E_ext(m_1) + E_ext(m_2) ],

where m_1 and m_2 are two meshes, n is the number of nodes in a mesh, and m_{a,i,k} is the component k ∈ {x, y} of the i-th node of mesh a. In this way, the numerator of d() implies that the farther apart the meshes are located on the image plane, the higher the d() measure will be. Besides, the denominator implies that the poorer the mesh adjustments, the lower the d() measure. We only consider the external energy because we are especially interested in small, distant objects. In fact, the internal energy of nets only segmenting small objects is usually high because their areas probably differ from the reference area.
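In code, the measure is straightforward. A minimal C++ sketch follows, assuming homologous nodes (i.e., both meshes have the same number of nodes) and hypothetical names:

#include <cmath>
#include <vector>

struct Node { double x, y; };

double diversity(const std::vector<Node>& m1, const std::vector<Node>& m2,
                 double eExt1, double eExt2) {
    // Numerator: total displacement between homologous nodes.
    double sum = 0.0;
    for (std::size_t i = 0; i < m1.size(); ++i) {
        double dx = m1[i].x - m2[i].x, dy = m1[i].y - m2[i].y;
        sum += std::sqrt(dx * dx + dy * dy);
    }
    // Denominator: the poorer the adjustments (the higher the external
    // energies), the lower the diversity measure.
    return sum / (eExt1 + eExt2);
}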


Figure 2.20: The frequency-memory population initialization: (a) K-means bias; (b) after 20 meshes; (c) after 40 meshes.

2.5.5. Diversification generation method

Our diversification generation method employs controlled randomization and frequency-based memory, typically used in SS, to generate an initial set of diverse and good quality solutions. We generate P_size candidate rectangular nets with uniformly spaced nodes. The EBILS is applied to each of them, and we select the b_1 fittest nets and the b_2 most diverse nets to form the RefSet.
The P_size initial rectangular nets are generated in the following way. For each of them (let a new mesh be defined as M_n), we first randomly generate a candidate mesh size (in pixels), a candidate form factor, and a position in the image plane. The target mesh size is generated randomly according to the following linear Probability Distribution Function (PDF), monotonically decreasing between S_min and S_max:

PDF_S(x) = 0                    if x < S_min
PDF_S(x) = m_s (S_max - x)      if S_min ≤ x ≤ S_max
PDF_S(x) = 0                    if x > S_max,

where S_min and S_max are, respectively, the minimum and maximum admitted sizes, and m_s = 2 / (S_max - S_min)². Experimentally found proper values for the two limits are, respectively, 0.01·A and (2/3)·A, where A is the image size. This means that we bias the candidate mesh size toward small meshes, with the aim of finding small objects in the image.
Then, we generate a form factor for the rectangular mesh by randomly sampling a Normal distribution with unitary mean and standard deviation. The next step is to test whether a rectangular mesh with these characteristics fits inside the image area. If it does not, a new candidate mesh size and form factor are generated with the same procedure, until a combination of the two fits.
Finally, we place M_n in the image plane on the basis of the frequency-memory image, I_fm, which has the same size as the image to be segmented. The position of the candidate rectangular mesh in the image is restricted by some constraints. Let us define the dimensions of the image as W and H, and the dimensions of the mesh as w and h. Then, the zone of the image where it is possible to locate the center of the mesh, ensuring that the whole mesh fits in the image, is a rectangular area of dimensions W - w and H - h, with the same center as the image. We generate at random up to N_m suitable locations for the center of the new mesh and we

calculate the mean grey level intensity of I_fm over the pixels covered by the mesh, P_m, for every location.
To keep record of the already searched areas, we lower the intensity values of the I_fm pixels covered by every mesh we generate. Therefore, we always place M_n over the brightest area of I_fm in which it can fit.
To improve the convergence time of the segmentation process, we bias the search toward the most promising areas by initializing I_fm with a blurred version of the K-means pre-segmentation (see Fig. 2.20(a)).
As the brightest area of I_fm is always chosen and its intensities lowered, after the generation of some meshes I_fm becomes flatter. Therefore, new meshes are placed over the image more uniformly (see Fig. 2.20(b,c)), permitting the exploration of the whole search space.
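The placement step can be sketched as follows, with the frequency-memory image reduced to a plain float buffer; all names, including the darkening amount, are illustrative assumptions (the sketch further assumes mesh dimensions of at least 2 pixels and smaller than the image).

#include <random>
#include <utility>
#include <vector>

// Frequency-memory image: one float per pixel (row-major), initialized with
// a blurred K-means pre-segmentation and darkened under every placed mesh.
struct FreqImage {
    int w, h;
    std::vector<float> px;
    float meanUnder(int cx, int cy, int mw, int mh) const {
        double s = 0.0; int n = 0;
        for (int y = cy - mh / 2; y < cy + mh / 2; ++y)
            for (int x = cx - mw / 2; x < cx + mw / 2; ++x) { s += px[y * w + x]; ++n; }
        return float(s / n);
    }
    void darkenUnder(int cx, int cy, int mw, int mh, float delta) {
        for (int y = cy - mh / 2; y < cy + mh / 2; ++y)
            for (int x = cx - mw / 2; x < cx + mw / 2; ++x) px[y * w + x] -= delta;
    }
};

// Tries up to nM random admissible centers and keeps the one whose covered
// pixels are brightest, then darkens that area to record it as searched.
std::pair<int, int> placeMesh(FreqImage& ifm, int mw, int mh, int nM,
                              std::mt19937& rng) {
    std::uniform_int_distribution<int> dx(mw / 2, ifm.w - 1 - mw / 2);
    std::uniform_int_distribution<int> dy(mh / 2, ifm.h - 1 - mh / 2);
    int bx = dx(rng), by = dy(rng);
    float best = ifm.meanUnder(bx, by, mw, mh);
    for (int t = 1; t < nM; ++t) {
        int cx = dx(rng), cy = dy(rng);
        float m = ifm.meanUnder(cx, cy, mw, mh);
        if (m > best) { best = m; bx = cx; by = cy; }
    }
    ifm.darkenUnder(bx, by, mw, mh, 0.25f);   // illustrative darkening amount
    return {bx, by};
}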

2.5.6. Solution combination method

In previous works such as [102, 167], either the DE operator [183] or the arithmetic crossover [94] was employed for TAN combination. The formulation of the latter is as follows:

m_{o,i} = λ m_{1,i} + (1 - λ) m_{2,i},

where m_{a,i} is the i-th node of mesh a and λ is a real number randomly generated in [0, 1], the same for all the nodes of the two combined meshes.
Unfortunately, this operator is only useful at the very beginning of the search process, producing nets that worsen their parents' fitnesses whenever the search process starts to converge. In addition, it does not incorporate the same information the parents hold, and infeasible offspring are obtained when combining nets with different topologies. To overcome this problem, we propose an advanced solution combination method based on two different Solution Combination Operators (SCOs), genotypic and phenotypic, which are introduced in the next two subsections.
2.5.6.1. The genotypic SCO

The rationale of this operator is to combine two nets which perform a good segmentation of different parts of the object(s). Regardless of the topology of the parents, the combination will have a basic topology, with every link in place. The EBILS, always called after the combination, will eventually take on the function of cutting links and/or opening holes in the mesh.
The genotypic SCO calculates a different λ (a combination weight) for every pair of homologous nodes of the parent nets. This value is inversely proportional to the local energy of the nodes, in such a way that the location of the corresponding offspring node will be more similar to the parent node with the lower local energy (hence, it works as a heuristic real-coded crossover [94]). The weights are only derived for the external nodes located on the four edges of the net. The genotypic SCO performs a boost of the combination weights by means of the f_cw function (Eq. (2.7), shown in Fig. 2.21) to further increase high values and further decrease low values. The idea is to keep, in the offspring net, the position of a parent node placed over an edge, since a final node position depends on the value of both parents. Moreover, the weights are smoothed, substituting them with the mean of their external


Figure 2.21: The f_cw function (boosted cross weight as a function of the original cross weight).

neighbors (including the node itself) to prevent link crossings. The relations are:

λ_i = e_{2,i} / (e_{1,i} + e_{2,i})

λ_{b,i} = f_cw(λ_i) = (1/2) sin(π λ_i - π/2) + 1/2          (2.7)

λ_{E,i} = mean of λ_{b,j} over j ∈ E(i);   m_{o,i} = λ_{E,i} m_{1,i} + (1 - λ_{E,i}) m_{2,i},

where E(i) denotes the external neighborhood of node i (the node itself included).
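A direct C++ transcription of these relations for the border weights could look as follows. It is a sketch under the assumption that the external nodes are ordered along the closed border of the net, so each one has exactly two external neighbors.

#include <cmath>
#include <vector>

// Boost function f_cw of Eq. (2.7): maps [0, 1] onto [0, 1], pushing high
// weights higher and low weights lower (f_cw(0) = 0, f_cw(0.5) = 0.5, f_cw(1) = 1).
double fcw(double lambda) {
    const double pi = 3.14159265358979323846;
    return 0.5 * std::sin(pi * lambda - pi / 2.0) + 0.5;
}

// Combination weights for the external nodes, given the local energies
// e1[i], e2[i] of homologous nodes ordered along the (closed) net border.
// Each weight is boosted and then smoothed with its two border neighbors
// (plus itself) to prevent link crossings; the result multiplies parent 1.
std::vector<double> externalWeights(const std::vector<double>& e1,
                                    const std::vector<double>& e2) {
    std::size_t n = e1.size();
    std::vector<double> boosted(n), smoothed(n);
    for (std::size_t i = 0; i < n; ++i)
        boosted[i] = fcw(e2[i] / (e1[i] + e2[i]));       // lambda_i, then f_cw
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t prev = (i + n - 1) % n, next = (i + 1) % n;
        smoothed[i] = (boosted[prev] + boosted[i] + boosted[next]) / 3.0;
    }
    return smoothed;   // offspring node: m_o = w * m1 + (1 - w) * m2
}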


The weights obtained with this procedure for the parent nets shown in Fig. 2.22(a,c) are shown in Fig. 2.23(a). These λ_i weights multiply the parent 1 net, while those multiplying parent 2 are computed as 1 - λ_i. Note how the weights of the first row of the table correspond to the upper nodes of the net shown in Fig. 2.22(a). These nodes are well positioned over the object contour (and have a low energy), while the corresponding nodes of the parent 2 net are far away from the contour (and have a high energy). This condition leads to the high values of the first row of Fig. 2.23(a). Conversely, the values shown in the last row of the table are very low (mostly zero) since the corresponding nodes of the parent 1 net are far away from the object edge, while the homologous nodes of parent 2 are well positioned. Moreover, the weights shown in the first and last columns of Fig. 2.23(a) correspond to the lateral nodes of the parent 1 net. As the picture and the table show, the quality of the placement of these nodes decreases going from the upper to the lower nodes, while the opposite can be noticed for the other parent net. In order to take advantage of the well positioned external nodes in both parents while providing a smooth placement for all the nodes of the offspring net, we propose a double linear interpolation procedure, developed separately over the x and y coordinates. Let m_{a,c}(j, k) be the c ∈ {x, y} coordinate of the node located at the j-th column and the k-th row of net a. Besides, let λ_E(j, k) be the combination weight calculated earlier for the external node located at the k-th row and the j-th column of the net, and λ_x(j, k) and λ_y(j, k) be the weights for the final x and y coordinates of a generic (j, k) node. If N and M are

Figure 2.22: The genotypic SCO. The first and last columns show the parents, while the middle one depicts the offspring. The numbers in parentheses are the corresponding fitness values (lower is better); for instance, in the first row, P1 (42474) and P2 (69772) produce an offspring with fitness 31781.
respectively the number of columns and rows of the nets, then:

m_{o,x}(j, k) = λ_x(j, k) m_{1,x}(j, k) + (1 - λ_x(j, k)) m_{2,x}(j, k)
m_{o,y}(j, k) = λ_y(j, k) m_{1,y}(j, k) + (1 - λ_y(j, k)) m_{2,y}(j, k)
λ_x(j, k) = λ_E(1, k) + (j - 1) [λ_E(N, k) - λ_E(1, k)] / (N - 1)
λ_y(j, k) = λ_E(j, 1) + (k - 1) [λ_E(j, M) - λ_E(j, 1)] / (M - 1).

Therefore, the weights for the x coordinates are obtained by interpolating the extremes of the row, while the weights for the y coordinates are calculated by interpolating the extremes of the column. The results of the interpolations for the weights of Fig. 2.23(a) are shown in Fig. 2.23(b,c).
Finally, Fig. 2.22(b) shows the result of the genotypic SCO applied to the nets shown in Fig. 2.22(a,c). The following sub-figures, from (d) to (o), show the results of other combinations. Note how the energy of the offspring net (shown in the caption) and the segmentation obtained are better than those of the corresponding parents.
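The double linear interpolation itself amounts to two one-dimensional interpolations per node. A sketch with 0-based indices follows; the Grid helper is an illustrative assumption.

#include <vector>

// Row-major N x M grid of weights (0-based indices).
struct Grid {
    int N, M;
    std::vector<double> v;
    Grid(int n, int m) : N(n), M(m), v(std::size_t(n) * m, 0.0) {}
    double  operator()(int j, int k) const { return v[std::size_t(k) * N + j]; }
    double& operator()(int j, int k)       { return v[std::size_t(k) * N + j]; }
};

// Double linear interpolation of the external combination weights: for the
// x coordinate, interpolate the extremes of each row; for the y coordinate,
// the extremes of each column. lamE is only meaningful on the border.
void interpolateWeights(const Grid& lamE, Grid& lamX, Grid& lamY) {
    int N = lamE.N, M = lamE.M;
    for (int k = 0; k < M; ++k)
        for (int j = 0; j < N; ++j) {
            lamX(j, k) = lamE(0, k) + j * (lamE(N - 1, k) - lamE(0, k)) / (N - 1);
            lamY(j, k) = lamE(j, 0) + k * (lamE(j, M - 1) - lamE(j, 0)) / (M - 1);
        }
}

// Offspring coordinates then follow as
//   m_o.x = lamX * m1.x + (1 - lamX) * m2.x,  and analogously for y with lamY.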

[Weight tables of Fig. 2.23 omitted: the three 6 x 15 grids could not be recovered from the extracted layout. Their sub-captions were: (a) weights of the external nodes for the net in Fig. 2.22 (6 x 15 nodes); (b) interpolation of the external weights along the columns (x coordinate); (c) interpolation of the external weights along the rows (y coordinate).]

Figure 2.23: The weights of the genotypic SCO for Fig. 2.22(a,c).

2.5.6.2. The phenotypic SCO

Despite its good performance in combining good segmentations of different parts of the object(s), the genotypic SCO does not perform properly when trying to combine nets which provide good segmentations of different objects. Fig. 2.24(c) shows an example.
As a solution to this problem, we propose a phenotypic SCO. This SCO tackles the problem of combining two meshes with a top-down approach, the opposite of the bottom-up approach of the genotypic SCO. The first step is to derive the segmentation images of the two parents, as if they were the final results of the process. A two-step filtering is applied to the binary

Figure 2.24: The results of the two SCOs on nets which perform good segmentations of different objects: (a) parent 1; (b) parent 2; (c) genotypic offspring; (d) phenotypic offspring.
images in order to remove the segmentation noise. First, a submesh filtering is applied to remove any submesh which is not considered relevant. Second, a morphological closing followed by an opening is applied to further smooth the resulting shape.
The union of the resulting two binary images is calculated, merging them through a logic OR. The following step is adjusting a mesh to the shape of the object(s) in the union image, a task the EBILS has been demonstrated capable of. With this in mind, we initialize the offspring net using the bounding box of the synthetic object of the union image. Then, we run EBILS to fit the mesh to the synthetic object(s). The resulting net will have the same shape as the union of the two parent nets, including a new proper topology, calculated by EBILS. While this SCO has only been applied to subsets of size two, it can easily be extended to combinations of more than two solutions. Fig. 2.25 depicts the phenotypic SCO process, while Fig. 2.24(d) shows the result of the combination of the nets in Figs. 2.24(a,b).
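Structurally, the operator is a short pipeline. The following C++ sketch shows it with trivial stubs in place of the real binary-image routines; all names are hypothetical stand-ins, not the actual implementation.

#include <vector>

using BinaryImage = std::vector<std::vector<bool>>;
struct Mesh { /* nodes, links, topology ... */ };

// Trivial stubs standing in for the actual routines.
BinaryImage rasterize(const Mesh&)                { return BinaryImage(); }
BinaryImage filterSubmeshes(const BinaryImage& i) { return i; }  // drop non-relevant submeshes
BinaryImage morphClose(const BinaryImage& i)      { return i; }
BinaryImage morphOpen(const BinaryImage& i)       { return i; }
Mesh initFromBoundingBox(const BinaryImage&)      { return Mesh(); }
void runEBILS(Mesh&, const BinaryImage&)          { /* fit the net to the union */ }

Mesh phenotypicSCO(const Mesh& p1, const Mesh& p2) {
    // 1) Parent segmentation images, cleaned by submesh filtering and a
    //    morphological closing followed by an opening.
    BinaryImage a = morphOpen(morphClose(filterSubmeshes(rasterize(p1))));
    BinaryImage b = morphOpen(morphClose(filterSubmeshes(rasterize(p2))));
    // 2) Union of the two binary images through a logic OR.
    BinaryImage u = a;
    for (std::size_t y = 0; y < u.size() && y < b.size(); ++y)
        for (std::size_t x = 0; x < u[y].size() && x < b[y].size(); ++x)
            u[y][x] = u[y][x] || b[y][x];
    // 3) Offspring initialized on the bounding box of the union and adjusted
    //    by the EBILS, which also recomputes a proper topology.
    Mesh child = initFromBoundingBox(u);
    runEBILS(child, u);
    return child;
}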
2.5.6.3. Combined application of the SCOs

The phenotypic and genotypic SCOs are fully complementary. The former is appropriate for combining high quality meshes segmenting different objects, while the latter is useful to

Figure 2.25: The phenotypic SCO process: two nets → segmentations → union of the segmentations → EVFC of the union → EBILS optimization → result.

derive better nets when combining solutions segmenting the same object(s). Therefore, we use both of them in the solution combination method. We propose to alternate them on the basis of the RefSet1. In order to do so, at every iteration of the algorithm, we test whether

f_margin = (f_mean - f_best) / f_best < th_margin,

where f_best is the fitness value of the best mesh in the RefSet, f_mean is the mean fitness value of the meshes in the quality subset of the RefSet, and th_margin is a proper threshold, experimentally set to 0.1. If the inequality is true, we apply the genotypic SCO; otherwise, we apply the phenotypic version. We also apply the phenotypic SCO after every population generation, to exploit its superior capabilities in merging different nets. Moreover, if the best solution improved in the last SS iteration, we apply the same SCO again to the whole population in the current iteration. Fig. 2.26 depicts an extract of a convergence graph that shows how both SCOs synergically contribute to the fitness improvement.
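The alternation rule follows directly from the inequality above. A sketch (the fitness values are those of the RefSet1 quality subset, lower being better):

#include <algorithm>
#include <numeric>
#include <vector>

enum class SCO { Genotypic, Phenotypic };

// Operator selection: while the quality subset is still spread out
// (f_margin >= th_margin), merge different nets with the phenotypic SCO;
// once it has converged around the best mesh (f_margin < th_margin),
// refine with the genotypic one.
SCO chooseSCO(const std::vector<double>& refSet1Fitness, double thMargin = 0.1) {
    double fBest = *std::min_element(refSet1Fitness.begin(), refSet1Fitness.end());
    double fMean = std::accumulate(refSet1Fitness.begin(), refSet1Fitness.end(), 0.0)
                   / refSet1Fitness.size();
    double fMargin = (fMean - fBest) / fBest;
    return (fMargin < thMargin) ? SCO::Genotypic : SCO::Phenotypic;
}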

Figure 2.26: A convergence graph of RefSet1 (energy / 1000 vs. iteration). The green, blue and red lines show the fitness of the best, mean and worst solutions, respectively. "P" stands for phenotypic SCO, "G" for genotypic SCO, and "R" for restart.


2.6. Performance study for the optimization of Extended Topological Active Nets with Scatter Search

2.6.1. Experimental design

Our ETAN-SS proposal has been tested against three different segmentation algorithms: the TAN-DE [167]; the ETAN-EBILS we also proposed and already introduced in Sec. 2.3, initialized by the bounding box of the K-means pre-segmentation; and an ETAN-MSLS considering EBILS as the local search method. In this section we will refer to the four algorithms as DE, LS, MS, and SS, respectively. The three ETAN-based algorithms employ the same fitness function and set of parameters. DE, MS, and SS are run ten times each, while LS, being deterministic, is executed just once. Notice that other kinds of DMs, such as snakes and LSs, are not considered in this case since the good performance of the ETAN model with respect to theirs was already studied in Sec. 2.4.
MS considers multiple alternative solutions by applying the EBILS starting from different positions. The meshes are initialized with the same frequency-memory procedure described in Sec. 2.5.5. Since the time needed to calculate the value of the objective function is much lower than the duration of the EBILS itself, the stopping condition of the MS is the mean run time of the ten executions of the SS.
The considered image dataset is made up of a mix of 20 synthetic and real-world medical images (see Sec. 2.6.2). The images show various difficulties and have a ground truth, allowing us to properly evaluate the segmentation performance. In order to quantitatively assess our results on the dataset, we compute the same metrics introduced in Sec. 2.4.
The parameters used to run the algorithms over the 20 images in the dataset are shown in Table 2.5. Some of them, related to ETANs, have already been covered in Sec. 2.3. As shown, the ETAN-based algorithms have all the parameters of the TAN plus some new ones.
The three ETAN-based algorithms have been implemented in C++, while the DE in C. The tests were run on an Intel Core 2 Quad CPU Q8400 at 2.66 GHz with 4 GB RAM.
[Table 2.5 data omitted: the parameter values for DE and the ETAN-based algorithms (net size, th_cut, th_holes, S_Canny, P_size, b_1, b_2, and the energy coefficients) could not be recovered from the extracted layout.]
Table 2.5: The parameters used in the experimentation.


Figure 2.27: SS results on: (a)-(j) real-world CT images (k1-k4 and l1-l6); (k)-(t) synthetic images (s1-s10). For every image, the mesh shown is the most similar to the mean one, resulting from the ten runs, according to the S index and Hausdorff distance statistics.

2.6.2. Image dataset

The 20 selected images try to cover the most typical difficulties in image segmentation: concavities, complex shapes, fuzzy borders, holes, noise, and multiple objects. They are shown in Fig. 2.27 with the SS results superposed. We divided them into two categories:
Real-world medical images. Two groups of images belong to this category. Images from k1 to k4 (Fig. 2.27(a-d)) are real-world CT images of a human knee. The gray value of all pixels has been inverted so the bone becomes the darker object in the image. Images from l1 to l6 (Fig. 2.27(e-j)) are CT images of human lungs. The target objects are, respectively, the bones and the lungs. The ground truth has been derived manually. The images have sizes up to 432 × 470 pixels.

Synthetic images. We drew these images trying to cover the mentioned segmentation difficulties. In addition, they have been artificially perturbed with three different kinds of noise: Gaussian (with σ = 20), Lorentz (salt-and-pepper, with parameter 7), and tiny-objects, that is, small dots or lines which are not part of the target objects. The only exception is image s1, which is only affected by the latter kind of noise. Since they are synthetic images, they have been generated starting from the ground truth. All the images have size 375 × 375 pixels.
[Per-image data of Table 2.6 omitted: the mean (μ) and standard deviation (σ) of the four metrics for LS, MS, DE, and SS on images k1-k4, l1-l6, and s1-s10 could not be recovered from the extracted layout. The recoverable overall values on the whole dataset (μ ± σ) were:

Metric   LS              MS              DE               SS
S        0.92 ± 0.10     0.93 ± 0.04     0.61 ± 0.16      0.97 ± 0.02
MdRT     10.68 ± 10.27   5.71 ± 2.79     14.86 ± 8.65     1.39 ± 0.73
MdTR     5.09 ± 11.54    5.79 ± 7.19     36.76 ± 15.91    1.59 ± 0.96
dH       63.86 ± 30.35   66.43 ± 24.42   105.69 ± 42.50   17.70 ± 9.82]

Table 2.6: Numeric results of the four segmentation metrics for all the algorithms on all the
images.

2.6.3. Analysis of the obtained results

Table 2.6 shows the numeric results obtained by the four methods in the four considered metrics for every image. Since DE, MS, and SS are run ten times for each image, the table collects the corresponding mean (μ) and standard deviation (σ). It also shows their overall values on the whole dataset. Since it is not possible to normalize the values provided by the three distance metrics, the corresponding overall figures are only an indication. For every image, and for the global μ and σ, the best result for every metric is highlighted in boldface.
It can be clearly seen that our proposal achieves the best mean results in the four quality metrics in almost every case. It also showed the best behavior regarding robustness. Notice that our SS method achieved the lowest standard deviation values in the four segmentation metrics.
With regard to the execution time, while LS is quite fast, with a mean time of 1.18 s, SS and MS are almost 300 times slower on average. DE is slightly faster, at about 150 times slower. This is an expected result, given their global search nature and, in the case of DE, the use of a simpler local search with a very limited number of applications (see Sec. 2.2). It is worth noting that any other MA employing the introduced operators and the EBILS would be slower than our SS proposal, which uses a very reduced population. In addition, our

approach is a fully automatic segmentation method, while LS needs a good initialization, with the consequent time consumption.
In all the images, the target objects are structures, generally darker than the background, surrounded by a wide range of other structures. Segmentations including both the target objects and part of the background correspond to local optima of the energy function. They often have high values of the dRT metric. Incomplete segmentations, lacking part of the target objects, are another kind of local optima. Typically, these segmentations showed high values of the dTR metric. Conversely, segmentations including all and only the target objects are close to the global optimum, with low values of the dRT, dTR, and, consequently, dH metrics.
When focusing on each method individually, we can notice how LS got trapped in local optima on the knee images, triggered by the tissue part around the bone. On the lung CTs, the results were similar. In these cases, the local optima are caused by the presence of the ribs, the vertebral column, and the interface between the external air and the tissue. On the synthetic images, LS obtained slightly better results. It was able to segment the holes in the images and to successfully filter the two kinds of punctual noise. However, the algorithm tends to segment the small dots and other structural noise, even getting stuck in them, as in the case of image s8. Figs. 2.28(a)-(c) show some examples of the problems described.
The segmentations obtained by MS correspond to different local minima (see Figs. 2.28(d)-(f) for illustrative examples). In these cases, the algorithm found meshes with a lower energy than LS. However, in doing so, it lost some high energy objects, not being able to cut the connections between these objects and the background. Indeed, it was often unable to segment the smallest object in images k2, k3, and k4. Moreover, the segmentations of the lungs are often incomplete, lacking some important parts. Connections with external borders are also present, as in the case of LS, but to a lesser extent. The values of the dRT and dTR distances for the LS and MS algorithms confirm this analysis of the segmentation defects. Unsurprisingly, on the synthetic images, MS was able to filter the structural noise better than LS, but it showed the same tendency to undersegment the objects. It successfully filtered the two kinds of punctual noise.
The results obtained by DE are poor. The meshes got stuck in both kinds of local optima exposed so far. In addition, they failed in locating the objects, eventually taking degenerate shapes. The only successful result was s1, which is only affected by tiny-objects noise, showing how the global search is able to filter these kinds of structures. Conversely, the algorithm proved to be heavily affected by the punctual noise on the synthetic images. Although DE achieved good performance in [167], the images considered there were significantly simpler than the ones in our dataset. The lack of an energy term rewarding the segmentation of multiple objects, the loss of the topology information of every net but the best individual at every new generation, the inability of the BILS to adjust the mesh to objects with complex shapes, and the absence of crossover operators considering the characteristics of the problem strongly limit the performance of that proposal.
Finally, the results obtained by SS are clearly the best ones. For all the images, it performed better than the other three methods in almost every statistic. Indeed, it ranked first 20 and 19 times out of 20, respectively, according to the mean of the S and dH index values, as shown in Table 2.6. The low values of the dRT and dTR distances, smaller than those of the three competitors, are in line with the quality of the segmentations. SS gets the best result in 19 cases for dRT and in 15 cases for dTR. Focusing on the medical images, the resulting nets on images k2, k3, and k4 properly segment the small objects, and there are few connections to the background in all the images of this category. The segmentations are almost complete but,


Figure 2.28: Examples of LS and MS inaccuracies (in red): (a) LS on k2; (b) LS on l1; (c) LS on s3; (d) MS on k2; (e) MS on l1; (f) MS on s3.


in some cases, SS was not able to segment some small structures, as in Figs. 2.27(g, h, i). As for the synthetic images, the segmentations are complete and there is almost no presence of structural noise.

2.6.4. Statistical analysis of the results

In the previous section, we provided a detailed analysis of the numerical results obtained,
giving an insight of the performance of the four compared methods. To prove the significant
superiority of the segmentation capabilities of our proposal, in this section we provide a
statistical analysis of the obtained results. With this aim, we performed a two-tailed Wilcoxon
signed-rank test [17] for each of the four segmentation metrics as follows.
Let N = 20 be the sample size, that is, the number of images in the dataset and hence the
number of pairs in the test. We compare the mean performance over the ten runs for each
image. Each pair is composed of the result of our SS proposal and the best result among
those achieved by the other three algorithms. This is the hardest case since, for every image,
we always compare SS against an aggregate algorithm whose performance is the best one
obtained by the set C = {LS, MS, DE}. Since we are now comparing only two algorithms (C
and SS), we are allowed to use the Wilcoxon test, as outlined in [66]. The null and alternative
hypotheses are defined, respectively, as:
H0: the median difference between the pairs is zero,
H1: the median difference is not zero.
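As an illustration, such a test can be computed with SciPy as in the following minimal Python sketch, where ss_means and best_competitor_means are hypothetical arrays holding, for each of the N = 20 images, the mean metric value over the runs of SS and of the best algorithm in C, respectively:

import numpy as np
from scipy.stats import wilcoxon

ss_means = np.random.rand(20)               # placeholder data, one value per image
best_competitor_means = np.random.rand(20)  # placeholder data, one value per image

# Paired, two-tailed Wilcoxon signed-rank test matching the H0/H1 above.
stat, p_value = wilcoxon(ss_means, best_competitor_means, alternative='two-sided')
print(f"W = {stat:.3f}, p-value = {p_value:.3e}")
# H0 is rejected at the 0.05 level whenever p_value < 0.05.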
The p-values obtained for the four metrics are shown in Table 2.7. For SS, the values of
the medians are higher for the S index and lower for the three distances. Given the obtained
p-values, we found enough evidence to reject the null hypothesis at the 0.05 significance level for every metric. It is worth noting that, for the S, MdRT and dH metrics, the null hypothesis would also have been rejected at a much stricter significance level, e.g. 0.01.
Although the Wilcoxon test is significant enough to prove the superiority of the performance of our proposal with respect to the other three methods, for the sake of clarity we also
show a boxplot of the two most relevant metrics, S and dH , in Fig. 2.29. These boxplots are
a quick way to graphically examine the distributions of the 200 results obtained by the three
stochastic algorithms over the 20 images, considering the 10 runs (being deterministic, LS
has been run just once per image).

Figure 2.29: Boxplots of the distributions of the results obtained by the four methods (LS, MS, DE, SS) in the S and dH metrics over the 20 images (1 run for the deterministic LS, 10 runs for each stochastic method): (a) index S, (b) dHausdorff.
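Such boxplots can be generated with a few lines of matplotlib, as in the sketch below; the results dict is a hypothetical placeholder holding 20 values for the deterministic LS and 200 values for each stochastic method:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
results = {'LS': rng.random(20), 'MS': rng.random(200),
           'DE': rng.random(200), 'SS': rng.random(200)}  # placeholder data

# One box per method, as in Fig. 2.29(a).
plt.boxplot(list(results.values()), labels=list(results.keys()))
plt.ylabel('Index S')
plt.show()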

Metric   p-value          median(C)   median(SS)
S        1.907 x 10^-6    0.958       0.977
MdRT     3.815 x 10^-6    4.266       1.164
MdTR     2.958 x 10^-2    1.700       1.465
dH       5.722 x 10^-6    47.036      17.328

Table 2.7: The results obtained by the Wilcoxon test.


2.7. Conclusion

In this chapter we have proposed an accurate, robust and automatic segmentation method
that is able to perform in a reasonable time. Our method is based on an advanced TAN model
and on an optimization approach based on developing a global search using SS.
On the one hand, we extended the TAN model by introducing a new external energy term, derived from the EVFC, that substitutes the previous distance gradient image. We designed new ways to tackle topological changes, such as link cutting, net division, and hole segmentation, while ensuring the net correctness during the local search. We also designed a heuristic to fix the position of misplaced nodes and endowed the ETANs with a pre-segmentation phase based on K-means clustering with the aim of performing automatic initialization. On the other hand, the new SS-based global search method, along with the several specific components we researched, allows us to really take advantage of the population-based optimization framework as none of the previous approaches were able to do. In particular, we introduced an internal energy term suitable for global search, two proper SCOs, a diversity function, and a frequency-memory population generator.
The two key contributions were tested separately. First, we tested the ETAN DM along
with other local search methods. The obtained results were encouraging. Our proposal improved the accuracy of the segmentation of synthetic images in comparison with state-of-the-art snake models, while needing lower computational resources. Besides, it was also competitive with LSs on synthetic images while it even outperformed them on the tested real-world medical images. Moreover, the robustness achieved was significantly better than that of the previous TAN method and it was found to be less sensitive to parameter value changes.
Second, we tested the ETAN-based SS framework along with other global search models and
the plain ETAN, achieving very promising results. Our SS proposal significantly improved
the accuracy of the segmentation on real-world medical images, as well as on synthetic ones,
in comparison with the ETAN-EBILS, an ETAN-MSLS, and the state-of-the-art DE-based MA
for TANs.

CHAPTER 3

Deformable models supervised guidance: a novel paradigm for automatic segmentation

3.1. Introduction

From the classic snake model [114] to recent hybrid approaches using gradient vector
flows [176] or watersheds [238], DMs have always involved an optimization process. Different families of optimization methods have been developed over the years. However, the
requirements of many real image segmentation tasks are difficult to model with a robust
mathematical formulation.
In particular, the energy minimization methods are often too generic to lead to a fine
segmentation of thin structures, and when they give satisfactory results, they generally use
weighting parameters that need to be manually tuned. Often these numerical parameters are obscure and their refinement time-consuming, as pointed out by Séguier and Cladel [194]: "the weighting coefficients are difficult to find especially when contours to be identified vary from one image to another."
There are many image segmentation problems where the numerical optimization process
runs into difficulties due to the non-homogeneous intensities within the same class of object and
the high complexity of their shape. This is, for example, the case of many medical images
which involve non-homogeneous intensities within the same class of tissue, high complexity
of anatomical structures, as well as a large variability.
Some of these problems can be solved by incorporating prior knowledge into the model.
In fact, recent approaches [175, 217] generally use energy minimization techniques to define
the additional force terms from prior information. However, in fields such as medical
imaging, the structures of interest are often very small compared to the image resolution and
may have complex shapes. This makes it difficult to define energy constraints that remain
both general and adapted to specific structures and pathologies.
Another important drawback of DMs is that the formulated minimization problem is difficult to solve due to the presence of numerous local minima and a large number of variables.
On the one hand, this difficulty may lead to sensitivity to the initialization, complicating the
unsupervised use of DMs. As images are assumed to be noisy, the external energy term is
most probably multi-modal. Hence, algorithms aimed at local optimization have problems optimizing deformable surface meshes. On the other hand, the global minimum of the energy function does not correspond to the best segmentation result in most cases. This makes the optimization task harder when following a global approach since, in contrast to local approaches, it is really difficult to stop the model evolution in satisfactory local minima. In fact,
the assumption of these techniques is that the global minimum of the energy function corresponds to the optimum segmentation result. However, modeling the segmentation problem
with a mathematical formulation able to express such a function is a very difficult task, if not
impossible, when a high precision of the segmentation results is mandatory.
An alternative approach involving a different type of decision process consists of translating the available information into a ML model that is directly used to drive the DM evolution. This approach gives the opportunity to automatically generate complex, non-linear,
and data-driven relationships among different sources of information (e.g. both global and
local image cues, shape-related prior knowledge, etc.) whereas energy optimization techniques usually derive term weights for a linear, manually defined energy function. Thus, in
our novel approach, the learning process is guided by the ground truth information represented in a training image set. This way, the problem is tackled from a different perspective:
instead of designing a priori a general purpose energy function (and the values of the associated parameters) aiming at performing well with the problem at hand when being optimized,
our approach is to derive the model directly from the desired results themselves using a ML
method.
The key contribution of this chapter is the introduction of a general ML-based IS framework. Given a dataset of training images, the framework will allow us to automatically design
a model able to segment targets of the same type as the ones found in the training dataset,
with minimal human intervention. For that aim, the framework is made up of four main components: the deformable model, the driver, the term set, and the localizer. The driver is a general
purpose machine learning tool whose output directly guides the selected DM evolution, on
the basis of the available information contained in the term set. A serious limitation of existing DMs is that the final result is sensitive to the location of the initialization. To deal with
this, we introduced the localizer. It aims at finding a rough location of the target object in
the image area, providing a proper initialization to the DM. Finally, an additional transversal
component, the integration mechanism, deals with the specific implementation details, dependent on the choice of the main components.
The structure of this chapter is as follows. Sec. 3.2 reviews the relevant proposals dealing
with the application of ML techniques to the adjustment of DMs for IS. Sec. 3.3 introduces our
general purpose, ML-based IS framework while Sec. 3.4 describes an implementation of our
framework tailored to the medical imaging field. Finally, Sec. 3.5 is devoted to the evaluation
of our proposal performance in comparison with other extended IS methods while Sec. 3.6
summarizes some conclusions on the carried out work.

3.2. Background

In specialized literature, the typical IS and recognition process employing DMs is organized as a pipeline comprising the following steps: initialization, evolution, and recognition
(optional).

Figure 3.1: The number of relevant articles per year (x-axis: years 1993-2013; y-axis: frequency).


DM evolution is seen, in most of the cases, as an optimization problem, in particular as a
numerical optimization task involving the minimization of an energy function. In fact, given
a configuration of the DM, its energy can be calculated. The energy is typically divided into
two parts: internal and external, as seen in Sec. 1.1. While the internal energy enforces a
smooth DM shape, the minimization of the external term is responsible for fitting the target
object. Hence, the external term is based on image visual cues and is calculated starting from
a set of image descriptors. Given an energy function and a proper initialization, the DM
is optimized toward the position and shape with lowest energy, corresponding, at least in
theory, to the ideal target object segmentation. If the process not only aims at segmenting
the object but, rather, at classifying it as belonging to some specified classes (e.g. lesion/no
lesion), a recognition step is performed, based on image and DM characteristics.
The application of ML to IS and, in particular, to DMs has seen widespread adoption in
the last decade. In order to analyze the related state of the art, we have developed a specific
search on the Scopus website, using a query that combined DM-related keywords (such as deformable models, snakes, active contours, level sets, and statistical shape models) with ML-related ones (such as machine learning, neural networks, support vector machines, classifiers, clustering, and random forests).
Performing this search on July 3rd, 2013, we obtained a list of 862 results, later manually
refined up to 95 articles to select the most representative ones. Considering this smaller list,
the number of published articles per year is shown in Fig. 3.1.
After an extensive review of the latter list of papers, we defined a taxonomy recognizing
four different categories and classifying the ML-based strategies accordingly. The first category comprises those works which use ML to initialize the DM or to impose some constraints on

its evolution (e.g. avoid further evolution if the DM is already far from the provided initialization), as in [132, 206, 214]. Apart from this characteristic, in these works the DM evolves
using the standard energy minimization approaches introduced in Sec. 1.1. We named this
category initialization.
A different approach is to employ ML in the generation of an external energy term which
is used in the standard DM optimization procedure [135, 184, 229]. A popular choice consists
of performing texture analysis. In this case, texture statistics are calculated in a small window
centered at each pixel. A classifier is trained to distinguish between object and background
on the basis of these statistics. Finally, a DM external energy term is generated to take into
account the output of the classifier. We named this category energy term generation.
A very popular approach is to employ ML in the recognition step only. In these works the
segmentation is performed by the DM as usual. However, the segmentation result is classified
as belonging to two or more categories. This process is performed by a classifier trained with
features derived from the image itself and/or the shape and position of the final state of the
DM evolution [140, 221, 239]. We named this category recognition as the role of ML in these
approaches fulfills this higher level image processing task. It is rather a post-processing task
instead of a segmentation process itself.
Finally, the last category comprises those works which employ ML to directly guide the
DM evolution. In this case, the output of the classifier (or regressor) is the direction (and/or
speed) toward which the DM should evolve (locally or globally). Usually, classifier inputs
are composed of image features along with geometric features of the DM. We named this
category guidance. This is the most related category to our proposal but just a handful of
works [170, 180, 208] following this approach were found in literature, with [170] being the
most representative.
The next four subsections are devoted to introducing each category, including a short description of the most representative proposals of each group.

3.2.1. Machine learning-based deformable model initialization approaches

DM evolution is strongly influenced by the model initialization. In fact, the evolution depends on both image features and DM status. Moreover, many of the image-dependent forces have a relatively limited range of attraction. Therefore, a model will hardly converge
toward its target structure if initialized far away from it. Employing ML techniques to derive
a proper initialization is a widespread strategy in literature, where different ML tools have
been applied. In [133], an adaptive probabilistic neural network is responsible for extracting the initial contour in MR IS while in [132] this task is performed by an SVM classifier, in a dental
X-rays analysis system. The authors of [125] applied NNs, in particular contextual constraint
neural networks [142] and Self-Organizing Maps (SOMs) [121], to perform initial contour
extraction for a LS model, dealing with MR knee segmentation. We can find some more of
these kinds of approaches in the MR IS field, such as [99], where fuzzy C-means [20] is employed
to properly initialize a LS model. In [79], a supervised learning scheme based on random
forest for automatic initialization and propagation of a statistical shape and appearance model
is presented. The aim is the prostate segmentation in Trans Rectal Ultrasound images. In
this case, unlike traditional statistical models of shape and intensity priors, the appearance
model is derived from the posterior probabilities obtained from random forest classification.
This probabilistic information is then used for the initialization (but also the propagation)


of the statistical model. In [130], the DM is provided with an initial contour by the segmentation result of an EM-based [65] fuzzy threshold method for a problem of lung cancer detection.
Finally, ML-based initialization is also performed in multi-frame segmentation problems.
In [113], another lung cancer detection system is presented. In this case, the tumor in the
lungs is not static and changes its shape and position during each breathing cycle. The tumor
is segmented from each slice of the first frame using wavelet features and an SVM classifier. This result is used to initialize the contour for segmentation of the tumor in subsequent
frames by the DM.
To provide a complete example of the application of ML techniques to the initialization
of DMs, the following section summarizes a well-known work dealing with brain MRI
segmentation [214].
3.2.1.1. Coupling of Radial-Basis Network and Active Contour Model for Multispectral Brain MRI Segmentation

In [214], the authors introduced a framework to perform MRI segmentation. The presented model couples a segmentation algorithm, based on a Radial Basis Function Neural Network enhanced with Cylindrical Coordinates (RBFNNcc) [233], and an ACM, based on a Cubic Spline Active Contour (CSAC) [55] interpolation.
Since the input images are multispectral, the first step of the method consists of a multispectral intensity normalization and an image registration [244] procedure. A presegmentation stage using an RBFNNcc follows. The final step corresponds to the computation of the
CSAC model. The authors presented two coupling variations between the RBFNNcc and
CSAC: static and dynamic coupling. The resulting contour is obtained when either 1) the
CSAC model converges for the static coupling case, or 2) when iterations of the CSAC-RBFNcc
loop converge for the dynamic coupling scheme.
In the nonadaptive (static) RBFNNcc-Snake coupling, the net has two goals: on the one
hand, it works as a presegmenter, providing an initial contour for the CSAC, in the form of
a binary image. On the other hand, the net is responsible for the generation of a restriction
term, employed in the ACM energy. The latter is defined as
E_{snake} = E_{int} + E_{ext} = E_{int} + E_{image} + E_{restriction} ,
where Eint is a smoothness term related to the snake geometry, Eimage is defined as a linear
combination of the gradient magnitudes of each independent band, and Erestriction is defined
from a function of the forces of the discriminant images obtained with the RBFNNcc. The
optimal contour is obtained with a normalized gradient descent procedure in an iterative
optimization process.
Adaptive (Dynamic) RBFNNcc-Snake coupling works in a different way. In this case, in
each iteration, the estimation of the desired border is improved. This information can be used
by the RBFNNcc to reduce uncertainty in the frontier between classes. If the ACM is near the
desired border at each iteration, it is possible to make a fine adjustment of the discriminant
estimation at that particular region. A feedback strategy is introduced, which links the ACM
output with the RBFNNcc to adjust the network parameters at each iteration. This improves
the restriction force incorporated in the energy term of the snake and enhances the network's performance. With all of these new parameters, the snake's restriction term is recalculated for
the next iteration. The distances between every pair of control points of successive contours

are measured. The process stops when the difference between successive contours (for every
point) is less than a predefined threshold.
The systems were tested on a set of brain images provided by the Brainweb Simulator
[38]. The obtained results outperformed the competitor methods on the same dataset.

3.2.2. Machine learning-based deformable model energy term generation approaches

Generating an energy term is a stronger way to couple ML and DMs. In this case, the
term will influence the DM evolution through the whole segmentation process. In a recent
article [37], the authors introduced a neuro-LS segmentation system for dynamic susceptibility contrast enhanced and diffusion weighted magnetic resonance images. In this case, three
different artificial NN systems, Multi-Layer Perceptron (MLP), SOM, and Radial Basis Function (RBF) were employed to derive three independent speed images to be used as terms in
three LS evolution equations. Also employing an MLP, the authors in [152] combine NNs
and ACMs to perform MR IS. The perceptron is trained to produce a binary classification of
each pixel as either a boundary or a non-boundary point. Subsequently, the resulting binary
(edge-point) image forms the external energy function for a snake, used to link the candidate
boundary points into a continuous, closed contour. The inputs to the MLP are normalized
intensity values of the pixels from a window scanned across the image and the spatial coordinates of the model. In [197], the authors introduced an ML-based extra speed propagation
term to the motion equation of the ACM for liver segmentation. This term is based on a Gaussian mixture model fitted to the intensity distribution of the medical image. It is calculated as
the difference between the maximum membership of the intensities belonging to the classes
of the object and those of the background.
The following subsections provide a more comprehensive insight about three relevant
proposals applying ML for the generation of DM energy terms.
3.2.2.1. Application of Artificial Neural Networks in Automatic Cartilage Segmentation

In [184] the authors aim at segmenting cartilage in MRI images. To do so, they employ
a feedforward back-propagation NN [106] to calculate the value of the external term of an
ACM (snake). In particular, the NN is trained pixel-wise in the following way. First of all, 100 points from an MRI image are chosen, 50 belonging to cartilage and 50 belonging to background. Each point P and its neighborhood form an input column vector with 81 elements. The target vector consists of the corresponding 100 labels: value 0 indicates that the input vector belongs to background and value 1 that it belongs to cartilage. The output of the network generates real values in [0, 1], interpreted in the same way. Finally, they binarize this value (E_NN) through a threshold.
Directly segmenting the image with the trained NN resulted in several inaccuracies and
pattern mismatching. Therefore, after training, the obtained NN is instead used for calculating the value of the external energy term for each pixel in the image. Indeed, the snake
external energy is simply set equal to ENN . This strategy combines the overall pointwise accuracy of the NN with the smoothing capabilities of the DM, able to filter out small inaccuracies
of the NN results.
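The pixel-wise scheme can be sketched in Python as follows; this is our own reconstruction of the idea in [184], not the authors' code, and the hidden-layer size is an arbitrary assumption (only the 81-element input window is taken from the paper):

import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_patch(image, y, x, half=4):
    # Flattened 9x9 neighborhood of pixel (y, x): 81 input elements.
    return image[y - half:y + half + 1, x - half:x + half + 1].ravel()

def train_driver(image, cartilage_pts, background_pts):
    X = [extract_patch(image, y, x) for (y, x) in cartilage_pts + background_pts]
    labels = [1] * len(cartilage_pts) + [0] * len(background_pts)
    net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000)  # assumed size
    return net.fit(np.array(X), np.array(labels))

def external_energy(image, net, half=4):
    # E_NN = estimated P(cartilage) at every interior pixel.
    h, w = image.shape
    energy = np.zeros((h, w))
    for yy in range(half, h - half):
        for xx in range(half, w - half):
            patch = extract_patch(image, yy, xx).reshape(1, -1)
            energy[yy, xx] = net.predict_proba(patch)[0, 1]
    return energy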

3.2.2.2. Interactive Image Segmentation Based on Level Sets of Probabilities

In [135], the authors integrated a LS method, which can effectively segment objects with
complex topologies, with a probabilistic pixel classifier that overcomes a typical limitation of
the LS method, that is, being driven by forces computed from local image data only.
The authors presented a method that integrates discriminative classification models and
distance transforms with the LS method. The LS function approximates a transformed version of pixelwise posterior probabilities of being part of a target object. The evolution of
its zero level set is driven by three force terms, region force, edge field force, and curvature
force. These forces are based on a probabilistic classifier and an unsigned distance transform
of salient edges.
Given the attributes of a pixel, the classifier outputs an estimated likelihood of the pixel
being part of the target object. To integrate the classifier with the LS method, the LS function
is defined to approximate the posterior probabilities of the pixels. The estimated likelihoods
from the classifier are used as an initialization, which is further improved by the LS method.
Since an accurate classifier does not exist at the beginning of the segmentation process, classifier training and the LS method are alternated to improve the performance of both.
3.2.2.3. Implicit active shape model employing boundary classifier and probability for organ segmentation

In these papers [228, 229], the authors employ the implicit ASM [189] to segment liver
CT images. Apart from the shape model typical of ASMs, they endow the model with a
ML-based contour detection term (in [228]) and a ML-based region term (in [229]).
While in common ASMs the appearance model is assumed to have a normal distribution,
this is not always the case in the problem considered in these papers. Hence, the authors
extended it to arbitrary distributions by employing a nearest neighbor classifier.
During the training stage, additional false profiles are sampled off the true organ boundary. In order to evaluate the boundary probability, the k nearest neighbors found in the training set are determined using the L2 norm. Amongst these, the number of true boundary
profiles divided by k yields the desired probability. For each point which is part of the LS
narrow band around the active surface, an intensity profile is sampled. The boundary probability is then determined by finding the nearest neighbors in the training set and calculating
the ratio. The result is a probability image, from which a stopping function is calculated.
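For illustration, the boundary-probability estimate can be sketched as follows (our own rendering with hypothetical names; the profile sampling itself is omitted):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def boundary_probability(profile, train_profiles, train_is_boundary, k=15):
    # train_profiles: (n, m) intensity profiles sampled during training;
    # train_is_boundary: (n,) booleans, True for profiles on the true boundary.
    nn = NearestNeighbors(n_neighbors=k).fit(train_profiles)  # L2 norm by default
    _, idx = nn.kneighbors(np.asarray(profile).reshape(1, -1))
    # Number of true-boundary profiles among the k neighbors, divided by k.
    return train_is_boundary[idx[0]].sum() / k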
Since the capture range of the boundary model is limited, the algorithm can converge to
local minima. To avoid this, the authors researched a region model. The learning is done
on the basis of Haar-like features [219]. The classifier employed is a cascade of M boosted
classifiers. At each stage i = 1, 2, ..., M , a strong classifier either rejects a sample or passes
it on to the next stage. Only when the sample is accepted at the final stage, it is classified as
part of the organ. Each strong classifier is a weighted combination of weak classifiers and is
determined by the AdaBoost algorithm [77].
The resulting LS equation is the following:

\frac{\partial \phi}{\partial t} = g \, |\nabla \phi| \, \mathrm{div}\!\left( \frac{\nabla \phi}{|\nabla \phi|} \right) + \nabla g \cdot \nabla \phi + g \, |\nabla \phi| ,

where the first term accounts for the smoothness of the boundary, while the second part
constitutes an advection term. The stopping function g is the reciprocal of an edge detector,

which in turn depends on the image I. For the sake of readability, here we do not show the
full LS evolution equation comprising the shape term.
The proposed system was evaluated on the SLIVER07 dataset (http://sliver07.org). While the method proposed in [229] is fully automatic, the results were competitive even with interactive
methods.

3.2.3. Machine learning and deformable models for image recognition and understanding

Most of the works in this section belong to the Computer-Aided Diagnosis (CAD) field.
CADs are procedures in medicine that assist doctors in the interpretation of medical images.
A typical case is disease detection. In these works, after the segmentation of the target structures, the system performs a detection or decision step, generally with the aim of classifying
some characteristics of the segmented structures. In these cases, ML is employed to perform the classification. In [140], the authors perform automatic diagnosis of breast cancer.
After a DM segmentation step, textural features are extracted by wavelet transforms in the
segmented areas, assuming different textures in tumoral and other tissue areas. A fuzzy C-means clustering algorithm [20] is the chosen ML tool to classify the images into malignant and benign ones. A different type of classifier is used in [221], to perform a similar task. In this
latter case, the aim is the segmentation and classification of leukocytes in color images. After
a segmentation step performed by a GVF snake, cell nuclei are classified by an SVM using
features of morphometry, color and texture. In [207], the same classifier is used, but in a different way. In this case the authors aim at segmenting different structures human brain MR
images. As in the previous cases, the strategy involves a previous DM-based segmentation
step (in this case performed by means of VFC-based ACM). However, in this case, different
features are selected according to the target structure, so that a specific classifier is used for
each structure. In [186], the authors aim at automatically recognizing five types of white
blood cells to assist hematologists in the diagnosis of many diseases. After the segmentation
of the nucleus and cytoplasm (by means of the Gram-Schmidt orthogonalization method and a
snake algorithm, respectively), the cells are classified by means of two different ML tools:
SVM and ANN. The features chosen are extracted by LBP and GLCM. They are also selected
by a Sequential Forward Selection algorithm [227].
The following section provides deeper insight about a relevant proposal that uses DM
and ML for image recognition and understanding.
3.2.3.1. Congenital aortic disease: 4D magnetic resonance segmentation and quantitative analysis

In this paper [239], the authors introduce a segmentation and detection system for the
early detection of congenital aortic disease in 4D (3D+time) cardiovascular MRI.
After an initial step of multi-view image registration, the segmentation method combines
LS and optimal surface segmentation algorithms in a single optimization process to determine the aortic surface. This resulting surface is registered with an aortic model followed by
calculation of modal indices of aortic shape and motion. The modal indices reflect the differences of any individual aortic shape and motion from an average aortic behavior. An SVM

classifier is employed for the discrimination between normal and connective tissue disorder
subjects.
The actual 4D segmentation is performed in three steps:
- Aortic surface presegmentation: A 4D fast marching LS method simultaneously yields approximate 4D aortic surfaces.
- Centerline extraction: Aortic centerlines are determined from each approximate surface by skeletonization.
- Accurate aortic surface segmentation: The accurate 4D aortic surface is obtained simultaneously with the application of a 4D optimal surface detection algorithm.
The disease assessment method is where ML comes into play and it is directly based on
the analysis of the 4D segmentation result. First, a point distribution model (PDM) is built
representing the aortic shape and its motion during the cardiac cycle. Then, the modal indices
of the PDM are used as input to an SVM classifier. The PDM is built by means of an automatic
generation of aortic landmarks representing the segmented aorta in 4D and subsequently
capturing the shape variation by performing principal component analysis (PCA) on the 4D
shape vectors of the aorta.
Each 4D aortic instance is represented by the modal indices describing the observed shape and motion variations; the SVM classifier is used to classify 4D aortic instances into classes of normal and connective tissue disorder subjects. Given M input training samples d \in \mathbb{R}^n with class labels g \in \{-1, 1\}, the SVM maps a sample d into a high-dimensional space using an appropriate kernel function and constructs an optimal hyperplane separating the two classes
in this space. Moreover, the grid search method recommended in [34] is used to determine
the SVM parameters.
Both segmentation and disease detection results are remarkable, with a classification accuracy of 90.4%.

3.2.4. Machine learning-based deformable model guidance approaches

Unlike the previous three strategies, only three works were found in the literature employing an ML tool that directly guides the DM evolution according to specific features.
The following sections are devoted to the analysis of those proposals.
3.2.4.1. An automated method for lumen and media-adventitia border detection in a sequence of IVUS frames

In [180] the authors aim at segmenting the Lumen and Media-Adventitia in a sequence of
Intra-Vascular Ultra-Sound (IVUS) frames. They employ a parametric snake whose energy
(composed of internal and external terms) is optimized by a Hopfield NN [98, 242]. The
structure of the NN incorporates knowledge of the image pattern (through the bias term of
the NN energy) while a Simulated Annealing (SA) [118] scheme is employed to avoid local
minima of the energy function.
A Hopfield NN consists of a single layer of neurons, where each neuron has one of the
two outputs, o = 0 or o = 1 (firing or not firing). This NN is fully interconnected with no
specific input or output layer. The state of a neuron depends on the input it receives from
other neurons. Each node has a bias term and is connected with every other node. The network

converges when the energy function reaches a local minimum. The nodes of the network
correspond to image pixels and the pixels that minimize the total energy of the network form
the desired boundary. In this approach, N line segments consisting of M points, that are
perpendicular to the initial contour, at 15° intervals, determine the area where the snake
can deform. Each neuron, in fact, represents a candidate point of the final boundary.
In this work, the authors seek for points that have high image gradient. If the gradient
of the image at a pixel is greater than a specific threshold, then the pixel is considered to be
a candidate boundary point. Therefore, neurons of the NN that correspond to pixels with
low gradient have low probability to be members of the desired boundary. The iterative
procedure for the minimization of the energy function is as follows. The state of each neuron
is updated and if the new state reduces the total energy of the network, the new state is
acceptable and the states of all neurons and the total energy of the network are updated.
Then, the total energy of the network is checked and if the energy does not change anymore,
the network has reached the global minimum. The firing neurons form the new boundary of
the region of interest.
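The energy-descent loop can be sketched as follows; this is a simplified illustration of the update rule described above (without the SA scheme), where the connection matrix W and the bias vector theta encoding the snake energy are assumed given:

import numpy as np

def hopfield_energy(o, W, theta):
    return -0.5 * o @ W @ o + theta @ o

def minimize(o, W, theta, max_sweeps=100):
    energy = hopfield_energy(o, W, theta)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(o)):
            trial = o.copy()
            trial[i] = 1 - trial[i]              # flip neuron i (0 <-> 1)
            e = hopfield_energy(trial, W, theta)
            if e < energy:                       # accept only energy reductions
                o, energy, changed = trial, e, True
        if not changed:                          # no flip lowers the energy
            break
    return o  # the firing neurons form the detected boundary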
While this method achieved interesting results, a manual initialization of the DM is needed
to provide an initial contour of the structures. The segmentation task is still seen as an optimization problem.
3.2.4.2. Active contours driven by supervised binary classifiers for texture segmentation

In [170] the authors introduce a different ACM for IS driven by a binary classifier instead
of a standard motion equation. In this method, the expert segmentation is used to build the
learning dataset composed of samples defined by their Haralick texture features. Then, the
learned classifier is employed to guide the LS on the test images. In this work the authors
used the LS introduced by Shi et al. [200]. This LS approximation does not depend on solving
a partial differential equation (PDE) but its evolution only depends on the sign of the speed
function F . Therefore, this property makes it suitable to derive the speed function values
from the discrete output of a classifier. The model proposed in this paper evolves in two steps
as a supervised process. First, the use of supervised classifiers induces the development of a
learning phase using classified samples. During an interactive step, an expert segmentation
is carried out on a learning image. Then, local Haralick texture features from each pixel of this
segmentation are used to create the samples of the learning dataset, allowing the classifier
to carry out its learning task. In a second step, the ACM driven by the pre-learned classifier
is launched on several test images to carry out the desired segmentations. In order to derive
the learning dataset, the user is asked to create an expert segmentation by manually giving
two ideal regions C in (inside the target object) and C out (outside the target object). For each
pixel in C^{in} and C^{out}, m Haralick features are calculated using a neighborhood window with a size defined by the user. Thus, a learning dataset X is determined whose samples s(i) are described as:

s(i) = \left( k_1(i), k_2(i), \ldots, k_m(i), l(i) \right)

l(i) = \begin{cases} -1 & \text{if } s(i) \text{ belongs to } C^{in} \\ +1 & \text{if } s(i) \text{ belongs to } C^{out} , \end{cases}

with k_1(i), k_2(i), \ldots, k_m(i) being the m Haralick coefficients of sample i and l(i) a label representing its region membership.

The authors tested three different classifiers on the proposed approach: a K-Nearest-Neighbors (KNN) classifier, an SVM, and an NN. In every case, the speed function output is the class predicted by the classifier. While the experimentation presented in [170] is limited, this approach outperformed the Chan-Vese region-based ACM [33] in terms of a generic discrepancy measure based on the partition distance [30].
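As an illustration of how such a learning dataset can be assembled, the following sketch uses scikit-image's gray-level co-occurrence features as a stand-in for the m Haralick coefficients (the window size and property list are our own assumptions; the image is assumed to be 8-bit):

import numpy as np
from skimage.feature import graycomatrix, graycoprops

PROPS = ('contrast', 'homogeneity', 'energy', 'correlation')

def texture_features(image, y, x, half=7):
    # Co-occurrence features of the window centered at (y, x).
    win = image[y - half:y + half + 1, x - half:x + half + 1]
    glcm = graycomatrix(win, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return [graycoprops(glcm, p)[0, 0] for p in PROPS]

def build_dataset(image, c_in_pixels, c_out_pixels):
    samples, labels = [], []
    for (y, x) in c_in_pixels:
        samples.append(texture_features(image, y, x))
        labels.append(-1)   # l(i) = -1: sample belongs to C^in
    for (y, x) in c_out_pixels:
        samples.append(texture_features(image, y, x))
        labels.append(+1)   # l(i) = +1: sample belongs to C^out
    return np.array(samples), np.array(labels)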
3.2.4.3. Using classifiers as heuristics to describe local structure in Active Shape Models with small training sets

In this article [208], the authors present a general procedure for the use of heuristics to
guide an ASM-based [48] search algorithm and an implementation using ML classifiers. In
this work the authors paid attention to the development of general local appearance models
for small image training sets. For dealing with training sets containing a small number of
images, they propose building heuristics (based on ML) with the local structure around each
landmark (one heuristic per landmark).
While the process of training the ASM and thus building the global shape model takes
place, classifiers are trained for each one of the landmarks. Following the standard ASM
procedure, the authors take ASM profiles normal to the shape boundary as the local structure
around each point. The gray level values of k pixels are sampled on each side of the landmark (yielding profiles of length 2k + 1 pixels).
Profiles of length 2k + 1 are sampled moving the original landmark along the normal to
the shape boundary n_s - k positions on each side of the original one (n_s is the number of total sampled pixels on each side, and it is larger than or equal to the profile size k). They thus have, for each landmark, a training set containing 2(n_s - k) + 1 feature vectors of length
2k + 1. The label value is related to the distance from the center of the profile to the true
position of the landmark, giving higher values for profiles further away and lower values for
those closer to the true position.
During the estimation phase, each landmark is moved to a new position. In the algorithm,
the position towards which a landmark should be moved is estimated by the classifier that
was trained in the training phase for that landmark. In order to estimate the most promising
profile, during the search stage 2(n_s - k) + 1 profiles of length k are sampled in an analogous
way to the training phase. Each sampled profile is passed to the classifier and the profile that
returns the smallest value is declared the winner. The landmark is then moved to the center
of this profile. This procedure is repeated for each landmark. After this, the global shape
model [208] is applied to ensure that the shape is compatible with the shape training set. The
procedure continues until a convergence criterion is met.
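The per-landmark search step can be sketched as follows (our illustration: model is the heuristic trained for this landmark, and sample_profile is a hypothetical helper returning the gray-level profile centered at a given offset along the normal):

import numpy as np

def best_offset(model, sample_profile, ns, k):
    # 2(ns - k) + 1 candidate profiles along the normal; the landmark moves
    # to the center of the profile with the smallest predicted value, since
    # values grow with the distance to the true landmark position.
    offsets = list(range(-(ns - k), ns - k + 1))
    profiles = np.array([sample_profile(off) for off in offsets])
    scores = model.predict(profiles)
    return offsets[int(np.argmin(scores))]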
The presented system is compared to two other classical approaches to local appearance models in ASMs. They are evaluated over a synthetic dataset of shapes and a dataset of pictures of hands. The results show that the proposed system is competitive with or even outperforms
the other models.

3.3. A general framework for deformable models supervised guidance

In this work we present a novel automatic IS framework based on ML and DMs. The input
of our system is a Dataset of Images (DI). Each image in the DI has an associated Ground

Truth (GT) binary image which partitions the original image I into two regions: foreground (FG), delineating the segmentation target, and background (BG), such that GT = FG ∪ BG and FG ∩ BG = ∅. The output is an ML model or, rather, a set of models, able to segment a dataset
of images similar to the ones in DI, thus having generalization capabilities. Being a general
purpose framework, it could be applied to any kind of image, modality, and target structure.
The framework consists of several components and the connections among them. In this
case, the sophistication comes from the choice of the components and the way they are interconnected. The opportunity of examining different designs allows the exploration of strategic possibilities that may prove effective in a particular application or image modality. The
framework components are:
- Localizer: any method that is capable of detecting a region of interest (ROI) within an image. Possible choices are, for instance, model-based detection systems or registration algorithms.
- Deformable Model: any DM (e.g. a LS) whose final position within the image domain represents the segmentation result.
- Term set: a set of image descriptors (like Haralick features, HOG or LBP) and DM-related features (like local curvature). Basically any kind of relevant feature can be part of the term set.
- Driver: an ML-based classification or regression method whose output directly guides the DM evolution.
The framework components are not connected in a pipeline-like manner, but rather they
are tightly interconnected, in a way also dependent on the implementation. We call integration
mechanism the way the framework components are connected to each other. Fig. 3.2 shows
the framework general scheme.
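To make these roles concrete, the components can be expressed as minimal Python interfaces, as in the sketch below (our own rendering of the scheme in Fig. 3.2; all names are illustrative):

from typing import Protocol, Sequence
import numpy as np

class Localizer(Protocol):
    def locate(self, image: np.ndarray) -> Sequence[tuple]:
        """Rough location of the target, e.g. part centers or a registered shape."""

class DeformableModel(Protocol):
    def contour_points(self) -> Sequence[tuple]: ...
    def move(self, point: tuple, label: float) -> None:
        """Update the DM at a contour point according to the driver's label."""

class TermSet(Protocol):
    def features(self, image: np.ndarray, dm: DeformableModel,
                 point: tuple) -> np.ndarray:
        """Image- and DM-dependent feature vector at a contour point."""

class Driver(Protocol):
    def predict(self, features: np.ndarray) -> float:
        """Label (e.g. direction or speed) guiding the DM at that point."""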
A key aspect of the framework is the role of the driver. While in the vast majority of
the works in literature DM evolution is tackled as an energy minimization problem, in our
proposal the driver directly guides the evolution of the DM using a model derived from an
ML method. The driver is intended as a general purpose ML tool able to drive the DM toward
its final position, the target object. It makes decisions considering the values assumed by the
features in the term set. These features comprise image related cues, the local and global
status of the DM, or other information from different sources, as the output of the localizer.
The prediction calculated by the driver will affect the evolution of the DM in a way dependent
on the specific framework instantiation. In particular, it is strongly dependent on the nature
of the DM employed.
As the driver is a general purpose, supervised ML tool, a training phase is needed. It
has to be carried out by means of a specific procedure to infer a function from a labeled set
of training examples. In supervised learning, each example is a pair consisting of an input
vector and a desired output value, called the label. A supervised learning algorithm analyzes
the training data and produces an inferred function, which can be used for predicting new
examples (see Sec. 1.4). However, in our case, the input of the framework, the DI, is a dataset
of images and associated GTs, not a standard plain set of vector-label pairs, as expected by
many ML algorithms. Therefore, it is necessary to construct a proper dataset of vector-label
pairs, which we name Image Vector-Label Dataset (IVLD). Each example in the IVLD will be


Figure 3.2: The framework overall scheme.


made up of a vector of the values the terms in the term set assume in a given location of the
DM in the image, and a (continuous or discrete) label indicating the behavior the DM should
show in that specific case.
In general, the values assumed by the terms in a given image point depend on both the
image itself and the status of the DM in that point. This implies that the use of an evolving
DM is mandatory to generate the IVLD. It is not possible to generate the IVLD a priori, directly
from the image itself. Moreover, the meaning and the type of label are strictly dependent on
the structure of the employed DM. For instance, geometric and parametric DMs will need
different class of labels to reflect the different ways they evolve. While the former evolve
(that is, move) only along the line perpendicular to the contour (in a given point), the latter
can move in any direction.
To generate the IVLD, we exploit GT information by forcing the DM to evolve in a way
that, in its final state, will make it perfectly cover FG, as defined earlier. In other words, each
label in an IVLD example will be the one that is expected to guide the DM toward a segmentation identical to the GT. For instance, if the employed DM is a LS, the label would be positive
(that is, the DM grows) for each point p of the LS being over the foreground FG while it would
be negative (that is, the DM shrinks) if p is over the background BG. A more complex label
set is expected in the parametric DM case, as each control point can, in principle, be moved
toward any direction. Depending on the DM initialization, different evolution trajectories
can be produced. By performing the IVLD creation starting from different initializations, we
can generate examples from different areas of the images, increasing diversity in the IVLD.
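A minimal sketch of this IVLD construction for the LS case (our illustration; term_set follows the interface sketched earlier and gt_foreground is the binary FG mask):

import numpy as np

def collect_examples(image, gt_foreground, dm, term_set):
    # Each contour point of the evolving DM yields one vector-label pair:
    # label +1 (grow) over the foreground FG, -1 (shrink) over the background BG.
    vectors, labels = [], []
    for (y, x) in dm.contour_points():
        vectors.append(term_set.features(image, dm, (y, x)))
        labels.append(+1 if gt_foreground[y, x] else -1)
    return np.array(vectors), np.array(labels)

# Stacking the pairs collected from several different DM initializations
# yields the final, more diverse IVLD.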
Once the IVLD has been generated, the driver can learn a classifier or model from it using
any supervised ML technique. The output of the learning is, in an optimal scenario, a ML
model with the ability to correctly determine labels for unseen instances. We call this the
Guide Model (GM). During the prediction phase, that is, when unseen images are segmented,
the driver constructs a feature vector for each point of the DM by calculating the values of the
features of the term set (in the same way it was done for the creation of IVLD), then applies the
GM to this vector and calculates the label responsible for the DM movement in that specific
point. The process stops as soon as the termination criterion is satisfied. This condition

depends on the specific DM employed.
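In code, the prediction phase amounts to the following loop (a sketch under the interfaces assumed earlier; dm.initialize and dm.converged stand for the DM-specific initialization and termination criterion):

def segment(image, localizer, dm, term_set, gm, max_iters=500):
    dm.initialize(localizer.locate(image))        # ROI-based initialization
    for _ in range(max_iters):
        for point in dm.contour_points():
            label = gm.predict(term_set.features(image, dm, point))
            dm.move(point, label)                 # GM output drives the DM
        if dm.converged():                        # DM-specific stopping rule
            break
    return dm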
Different choices are possible for each component of our framework. This fact provides
flexibility, as it can be implemented in a variety of ways and degrees of sophistication. As the
components are interconnected, the decision might affect the integration of the components
to some extent.
In the framework, the role of the localizer is to detect a ROI within the image. While it is
not strictly necessary to perform segmentation, it effectively reduces the search space, allowing the driver to focus on the relevant areas. Moreover, it provides the capability of discerning
similar objects on the basis of their location in the image space (e.g. recognizing one of the
two lungs). In any case, the information dispensed by the localizer can be exploited in different forms, depending on its nature. For instance, if the localizer was a part-based method,
the provided information would be in the form of the coordinates (and, possibly, the size) of
the model parts in the image space. Differently, a registration-based localizer would provide
a somewhat more comprehensive kind of information in terms of shape definition and accuracy of the localization. However, compared to a part-based model, a registration localizer
could be more sensitive to noise and high variance in shapes, apart from needing much higher
computation times. Moreover, the choice of the localizer would influence the initialization
of the DM and the features of the term set. In fact, in the case of a registration-based localizer,
the DM could be straightforwardly initialized with the result of the registration step. Moreover, a feature of the term set could be the minimum distance between the registration result
and the evolving DM. Conversely, in the case of an N-part localizer, the DM could be
initialized by dividing it in N chunks located over the localizer parts. In this case, a feature
of the term set could be the distance between a point in the DM and the closest localizer part
center.
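For instance, the latter term could be computed as in this tiny sketch (hypothetical inputs):

import numpy as np

def distance_to_nearest_part(point, part_centers):
    # Distance from a DM contour point to the closest localizer part center.
    point = np.asarray(point, dtype=float)
    centers = np.asarray(part_centers, dtype=float)
    return float(np.min(np.linalg.norm(centers - point, axis=1)))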
The choice of the DM is also very relevant in the framework. On the one hand, geometric
DMs provide the advantage of being able to easily change topology and, therefore, are suited to segmentation problems with high variability in shape. On the other hand, parametric
DMs are faster than their geometric counterparts and, while tackling topological changes less
easily, they can effectively deal with strong shape constraints.
Since the driver is a general-purpose ML tool, a plethora of different learning algorithms is available (see Sec. 1.4). However, the nature of the employed DM can constrain the available drivers. A common LS, for instance, is guided by a certain speed function, with a real-valued result. To calculate this velocity, a regressor would be more appropriate than a classifier. Differently, for parametric DMs the desired output could be the destination pixel of a DM control point. This problem can be dealt with using a classification algorithm. Apart from what
has been explained so far, there is no clear advantage in using a specific ML method, with a
wide range of possibilities. However, given the possible large size of the IVLD, fast algorithms
are preferred, in both training and prediction phases.
Finally, the features in the term set are strongly dependent on the characteristics of the DI.
In fact, in some datasets, a set of features is more discriminative than others. For example, if
the goal is segmenting structures in satellite images, the color would be very relevant, as some
structures (e.g. forests) are best described by it. Conversely, in medical imaging, where color
is mostly absent, other features such as edges and textures are better suited to discriminating
different tissues and organs.
In general, the choice of the most appropriate components is dependent on the IS problem
at hand. Being able to easily replace the framework components to tailor the system to specific
datasets is one of the key strengths of the proposed framework.


3.4. A specific implementation of the framework for medical image segmentation

As said earlier, there is a vast plethora of design possibilities when implementing a segmentation system within the proposed framework. To illustrate this capability, we tailored
the system to segment diverse structures within different medical image modalities. Therefore we integrated the components focusing on edge and texture image descriptors along
with shape. We chose a part-based localizer able to capture the variability of the forms and
quickly recognize different parts of the target structures. We opted for the Shi LS [200], a
flexible, fast DM, able to handle complex topologies and simple to control. In fact, its evolution algorithm only considers the sign of the speed function value, thus allowing to treat
that calculation as a classification problem. Hence, we chose a widely used, quick, accurate, and easy-to-train classifier, RF [27] (see Sec. 1.4.3). The decision to choose fast components
was motivated by the large size of one of the datasets to be considered.
In the following sections, we provide insight for the design of every component in the
framework to customize it to our medical IS problem.

3.4.1. Localizer

In recent years in the field of computer vision, object detection methods based on discriminatively trained part-based models gained a lot of attention because of their high accuracy. The most influential paper in this area is Felzenszwalb et al.'s [70], which introduced an object detection system based on mixtures of multiscale deformable part models. Their system
relies on a novel approach for discriminative training with partially labeled data. They have
introduced a new method, called latent SVM, which is a reformulation of MI-SVM in terms of
latent variables. A latent SVM is semi-convex and the training problem becomes convex once
latent information is specified for the positive examples. This leads to an iterative training
algorithm that alternates between fixing latent values for positive examples and optimizing
the latent SVM objective function.
Most of the object detection methods are tailored to a specific application. For example,
Kainmüller et al. [112] detect the liver by searching for the right lung lobe, which can be relatively easily detected by thresholding and voxel counting. The problem of such approaches
is that they do not generalize well to other structures or image modalities. Learning-based
approaches such as Discriminative Generalized Hough transform [190] and Marginal Space
Learning [134, 240] are more general and can be adapted to a wide variety of detection tasks
by simply exchanging the training data.
Jung et al. [111] propose a learning-based organ detection approach for 3D medical images based on the Viola-Jones object detection algorithm [218]. They use a bootstrapping
approach to automatically select important training data and propose several extensions to
the original approach tailored to medical images, namely a new pure-intensity feature type,
and a multi-organ detection that exploits spatial coherence in medical data.
Criminisi et al. [53] introduced a new parametrization of the anatomy localization task
as a continuous multivariate parameter estimation problem. This is addressed effectively via
non-linear regression, in the form of regression forests [54, 177]. The approach is fully probabilistic and, unlike other techniques [71, 241], maximizes the confidence of output predictions. The method yields salient anatomical landmarks, i.e. automatically selected anchor
regions that help localize organs of interest with high confidence. The algorithm can localize

both macroscopic anatomical regions, like abdomen, thorax, etc. and smaller scale structures
like heart, left adrenal gland, etc. using a single model.
As said, localization methods for medical images are mostly tailored to a specific application. For our proposed framework, it is preferable to have a more general, universal
localizer that could be used for locating any pattern type in an image, after customizing it to
the specific problem.
The method introduced in [78] is a good candidate for such a localizer. In this system an
object or pattern in an image is modeled with K small deformable parts. Over
these deformable parts a relational graph is defined that specifies which pairs of parts are
connected as shown in Figure 3.3.

Figure 3.3: Right-lung localization with three parts.


Formally, let I be an image, p_i = (x, y) the pixel location of part i, and t_i the mixture component of part i, where i \in \{1, \ldots, K\}, p_i \in \{1, \ldots, L\} and t_i \in \{1, \ldots, T\}. t_i is the type of part i.
In order to determine the score of a configuration of parts, i.e. how the parts are distributed within the image and where exactly they are located, the authors first define a compatibility function for part types that factors into a sum of local and pairwise scores:
S(t) = \sum_{i \in V} b_i^{t_i} + \sum_{ij \in E} b_{ij}^{t_i, t_j} .    (3.1)

The parameter b_i^{t_i} favors particular type assignments for part i, while the pairwise parameter b_{ij}^{t_i, t_j} favors particular co-occurrences of part types. Let G = (V, E) be a K-node relational graph whose edges specify which pairs of parts are constrained to have consistent relations. One can now write the full score associated with a configuration of part types and positions:

S(I, p, t) = S(t) + \sum_{i \in V} w_i^{t_i} \cdot \phi(I, p_i) + \sum_{ij \in E} w_{ij}^{t_i, t_j} \cdot \psi(p_i - p_j) ,    (3.2)

with \phi(I, p_i) being a feature vector extracted from pixel location p_i in image I and \psi(p_i - p_j) = [dx \; dx^2 \; dy \; dy^2]^T, where dx = x_i - x_j and dy = y_i - y_j are the relative location of part i with respect to j.
Inference corresponds to maximizing S(I, p, t) over p and t. When the relational graph
G = (V, E) is a tree, this can be performed efficiently by dynamic programming. Let kids(i)
be the set of children of part i in G. The authors compute the message part i passes to its
parent j by the following:
score_i(t_i, p_i) = b_i^{t_i} + w_i^{t_i} \cdot \phi(I, p_i) + \sum_{k \in kids(i)} m_k(t_i, p_i)

m_i(t_j, p_j) = \max_{t_i} \left[ b_{ij}^{t_i, t_j} + \max_{p_i} \left( score_i(t_i, p_i) + w_{ij}^{t_i, t_j} \cdot \psi(p_i - p_j) \right) \right],   (3.3)

where the first equation computes the local score of part i, at all pixel locations p_i and for all possible types t_i, by collecting messages from the children of i. Equation (3.3) computes, for every location and possible type of part j, the best scoring location and type of its child part i. Once messages are passed to the root part (i = 1), score_1(t_1, p_1) represents the best scoring configuration for each root position and type. One can use these root scores to generate multiple detections in image I by thresholding them and applying non-maximum suppression.
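To make this inference step concrete, the following Python sketch (illustrative names and data layout; not the code of [78] or of our system) implements the recursion of Eq. (3.3) on a tree-structured model, using a brute-force maximization over p_i in place of the distance-transform speed-up used in practice:

import numpy as np

def dp_inference(children, bias, app, b_pair, w_pair, coords):
    """Tree-structured DP for Eq. (3.3); returns the root score map score_1(t_1, p_1).
    children[i] : child part indices of part i (part 0 is the root)
    bias[i]     : (T,) type biases b_i^{t_i}
    app[i]      : (T, L) appearance scores w_i^{t_i} . phi(I, p_i)
    b_pair[i]   : (T, T) co-occurrence biases b_ij^{t_i,t_j} (i with its parent)
    w_pair[i]   : (T, T, 4) weights on psi(p_i - p_j) = [dx, dx^2, dy, dy^2]
    coords      : (L, 2) pixel coordinates of the L candidate locations
    """
    T, L = app[0].shape
    d = coords[:, None, :] - coords[None, :, :]           # d[pi, pj] = p_i - p_j
    psi = np.stack([d[..., 0], d[..., 0] ** 2,
                    d[..., 1], d[..., 1] ** 2], axis=-1)  # (L, L, 4)

    def score(i):                                         # local score of part i
        s = bias[i][:, None] + app[i]                     # (T, L)
        for k in children[i]:
            s = s + message(k)                            # collect child messages
        return s

    def message(i):                                       # message from i to its parent
        si = score(i)                                     # (T, L) over (t_i, p_i)
        m = np.full((T, L), -np.inf)                      # indexed by (t_j, p_j)
        for tj in range(T):
            for ti in range(T):
                spring = psi @ w_pair[i][ti, tj]          # (L, L): w_ij . psi
                cand = b_pair[i][ti, tj] + si[ti][:, None] + spring
                m[tj] = np.maximum(m[tj], cand.max(axis=0))  # max over t_i and p_i
        return m

    return score(0)                                       # (T, L) root score map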
Learning The authors assume a supervised learning paradigm. Given labeled positive examples {I_n, p_n, t_n} and negative examples {I_n}, they define a structured prediction objective function. To do so, let us define z_n = (p_n, t_n) and note that the scoring function (3.2) is linear in the model parameters \beta = (w, b), and so can be written as S(I, z) = \beta \cdot \Phi(I, z). The model is learned by solving:
\arg \min_{\beta, \, \xi_n \ge 0} \; \frac{1}{2} \|\beta\|^2 + C \sum_n \xi_n

s.t. \quad \forall n \in pos: \; \beta \cdot \Phi(I_n, z_n) \ge 1 - \xi_n
\qquad \forall n \in neg, \forall z: \; \beta \cdot \Phi(I_n, z) \le -1 + \xi_n   (3.4)

The above constraints state that positive examples should score better than 1 (the margin), while negative examples, for all configurations of part positions and types, should score less than -1. The objective function penalizes violations of these constraints using slack variables \xi_n.

3.4.2. Deformable Model

As already introduced in the previous section, the output of the driver, that is, the meaning and number of prediction labels, depends on the employed DM.
We decided to use a simple and fast LS implementation: the Shi approximation [200]. This LS, while retaining the same attractive qualities of standard LSs in terms of accuracy and flexibility, is considerably faster and easy to control. In fact, with this LS, only two values of the speed function F are significant with respect to the curve evolution: positive and negative. In the former case, the LS will grow (in the direction normal to the contour, as usual), while in the latter it will shrink. Since the absolute value of the speed is irrelevant, we can effectively treat the calculation of the speed function as a binary classification problem, with the obvious advantage of reducing complexity. If a different LS were to be used, the absolute value would indeed be relevant. In this case, the driver would rather provide a real value and the calculation of the speed would be more appropriately tackled as a regression problem. Apart from the calculation of the speed function, which in our framework is performed by the driver, every other aspect of the LS evolution is kept as in its original proposal [200], including the smoothing cycle.
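As an illustration of this binary design, the speed computation at a contour point reduces to a single query to the driver; the sketch below assumes a trained classifier exposing a scikit-learn-style predict() method (illustrative names, not the thesis implementation):

def shi_speed(driver, term_vector):
    """Binary speed for the Shi level set: only the sign of the driver's
    prediction matters. `driver` is any trained binary classifier with a
    scikit-learn-style predict(); `term_vector` holds the term-set values
    at one contour point."""
    label = driver.predict([term_vector])[0]   # expected labels: {+1, -1}
    return +1 if label > 0 else -1             # +1: grow outward, -1: shrink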

3.4.3. Terms

As shown in Sec. 1.5, many different features can be used to design terms for the LS evolution. Apart from edges, which are local in this case, texture information can be considered at different levels. We distinguish between statistics calculated over the entire image and over local portions of it, centered on the LS contour. Indeed, the border of the DM allows us to calculate the texture features distinguishing between the inside and the outside of the contour at a local level. Finally, other terms relate to the shape of the contour, at both global and local levels. These are due to the localizer and the LS local appearance, respectively. In the following sections we define which image descriptors and shape-related features are included in our reference implementation.
3.4.3.1. Haralick

In Sec. 1.5.1, we introduced the Haralick image descriptor. There, a GLCM offset distance of 1 was chosen. It was shown in the literature [10] that the higher this value, the higher the dispersion of the values around the main diagonal. This leads to a random-like matrix, caused by the reduction of the spatial correlation among pixels. We decided, therefore, to keep the offset distance value at 1.
The first order statistical parameters consider point-wise values. Their computational complexity depends on the computation of the intensity histogram, which is linearly dependent on the size of the area on which the calculation is performed.

A common image depth in textural analysis is 256 gray levels (1 byte per pixel). For the calculation of second order statistics, this would result in a 256 × 256 GLCM matrix. Hence, apart from the area size, complexity depends on the square of the image depth. This is different from first order statistics, as they have a linear dependence on that value. Moreover, such a large matrix would probably be composed of a large number of 0-valued cells, making it a bad approximation of a probability density. Therefore, we decided to quantize the intensity information, reducing the number of levels to 32, giving rise to 32 × 32 GLCMs.
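For concreteness, the quantization and accumulation steps can be sketched as follows (numpy; illustrative names, not the thesis implementation):

import numpy as np

def glcm32(patch, levels=32):
    """32 x 32 GLCM of a uint8 patch for the horizontal offset (0, 1).
    The 256 gray levels are first quantized to 32, as discussed above."""
    q = (patch.astype(np.int32) * levels) // 256           # 0..31 per pixel
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1.0)
    total = glcm.sum()
    return glcm / total if total > 0 else glcm             # probability estimate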
The Haralick feature values differ depending on the areas over which the calculation is performed. Thus, we implemented three different procedures to calculate the Haralick statistics, which are defined as follows.
Plain Haralick This is the most straightforward option. In this case, we define a closed ball of radius r, centered at a given point p of the LS contour. We calculate the Haralick statistics introduced in Sec. 1.5.1 on the area inside the border of the ball, that is, the light blue area of Fig. 3.4(a). To take into account image features at different scales, we define two balls of different sizes, r_small and r_big. The values of these two constants are parameters of our algorithm.

Figure 3.4: The plain and split Haralick terms: (a) plain Haralick, (b) split Haralick. Each panel shows the ball centered at the contour point p, split into the inside (< 0) and outside (> 0) regions of the LS.


Split Haralick This term is similar to the previous one. However, in this case, we calculate
the Haralick statistics in two separate areas inside the ball: inside and outside of the LS, that
is, the orange and light blue areas of Fig. 3.4(b), respectively.
Global Haralick This term is similar to the split Haralick one. However, in this case, the
features are calculated on the whole internal and external areas of the LS.
3.4.3.2. Local Chan and Vese

This is a local version [126] of the widely employed Chan and Vese term [33]. The formulation is cv_{local} = (I - \mu_{ext})^2 - (I - \mu_{int})^2, where I is the intensity value of the pixel p, and \mu_{ext} and \mu_{int} are the mean intensity values of the image areas enclosed by the same balls used for the Haralick terms, outside and inside of the LS, respectively.
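A possible realization of this term is sketched below (illustrative names; phi denotes the LS function, negative inside the contour; r is r_small or r_big; boundary handling is simplified):

import numpy as np

def cv_local(image, phi, p, r):
    """Local Chan-Vese value (I - mu_ext)^2 - (I - mu_int)^2 at contour point
    p = (y, x), with means over the ball of radius r split by the LS sign."""
    y, x = p
    ys, xs = np.ogrid[:image.shape[0], :image.shape[1]]
    ball = (ys - y) ** 2 + (xs - x) ** 2 <= r * r
    inside, outside = ball & (phi < 0), ball & (phi >= 0)
    mu_int = image[inside].mean() if inside.any() else image[y, x]
    mu_ext = image[outside].mean() if outside.any() else image[y, x]
    I = float(image[y, x])
    return (I - mu_ext) ** 2 - (I - mu_int) ** 2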
3.4.3.3. Gabor

In our implementation we consider the image response to the application of 8 different filters, one for each direction. The size of the complex wave employed for the calculation is fixed at 21 pixels, for a total of 8 different features.
3.4.3.4. LBP

An LBP is a string of bits obtained by binarizing a local neighborhood of pixels with respect to the brightness of the central pixel. With a neighborhood size of 3 × 3 pixels, the LBP at location (x, y) is a string of 8 binary values. Therefore, in this case an LBP is a string of 8 bits and so there are 256 possible LBPs. These are usually too many for reliable statistics (a histogram) to be computed. Therefore, the 256 patterns are further quantized into a smaller number of patterns. Considering a uniform quantization, there is one quantized pattern for each LBP that has exactly one transition from 0 to 1 and one from 1 to 0 when scanned in anti-clockwise order, plus one quantized pattern comprising the two constant LBPs (all zeros and all ones), and one quantized pattern comprising all the other LBPs. This yields a total of 58 quantized patterns. In all cases, the employed cell size has been 8 pixels. In our implementation the values of the features in the term set are the same for the pixels in the same cell.
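The 58-bin quantization just described can be precomputed as a 256-entry lookup table; the following sketch (illustrative, not the thesis code) reproduces it:

def build_lbp_lut():
    """Map each 8-bit LBP to one of 58 bins: 56 patterns with exactly one
    0->1 and one 1->0 circular transition, one bin shared by the two constant
    patterns, and one catch-all bin for all the remaining patterns."""
    def transitions(p):
        bits = [(p >> i) & 1 for i in range(8)]
        return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    lut, two_trans = {}, 0
    for p in range(256):
        if transitions(p) == 2:
            lut[p] = two_trans
            two_trans += 1                 # ends at 56 for 8-bit patterns
        elif transitions(p) == 0:
            lut[p] = 56                    # all-0 and all-1 patterns
        else:
            lut[p] = 57                    # every other pattern
    return lut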

3.4.3.5. HOG

The HOG features are widely used for object detection. HOG decomposes an image into small square cells, computes a histogram of oriented gradients in each cell, normalizes the result using a block-wise pattern, and returns a descriptor for each cell. In our implementation we used the variant introduced in [70]. The main difference is that this variant computes both directed and undirected gradients as well as a four-dimensional texture-energy feature, but projects the result down to 31 dimensions. Also in this case, the employed cell size has always been 8 pixels. In our implementation the values of the features in the term set are the same for the pixels in the same cell.
3.4.3.6. Deformable model-related and local shape

These features are calculated at the same scales (i.e., the same balls) used for the Haralick ones. They are as follows.

Pseudo curvature. This is a curvature-like feature and is defined as curv = (|int| - |ext|) / (|int| + |ext|), with curv ∈ [-1, 1], where |ext| and |int| are the numbers of pixels in the image areas enclosed by the ball and outside or inside of the LS, respectively.

Local perimeter. This is the length of the LS contour in the image area inside the given ball. Since the Shi LS that we use in our implementation is made up of two lists, we define this feature as the average of the lengths of the two lists (inside the ball).

A last feature encodes the Shi LS list in which p is contained: it can be either +1, if p ∈ L_in, or -1, in case p ∈ L_out.
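As a minimal sketch of the pseudo curvature feature (illustrative names; the pixel counts are assumed to be precomputed over the given ball):

def pseudo_curvature(n_inside, n_outside):
    """Curvature-like feature in [-1, 1] from the pixel counts of the ball
    areas inside and outside the LS (sketch of the formula above)."""
    return (n_inside - n_outside) / float(n_inside + n_outside)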
3.4.3.7. Node priors

This term is intended as a way to inject some prior knowledge relative to the structures to be segmented. Given a point p of the LS contour, the node priors term is defined as the relative shift, in the x and y directions, between p and each of the central points of the localizer nodes. Mathematically, it is formulated as:

np = \{x_p - x_{n_1}, y_p - y_{n_1}, x_p - x_{n_2}, y_p - y_{n_2}, \ldots, x_p - x_{n_m}, y_p - y_{n_m}\},

where x_p and y_p are the x and y coordinates in the image plane of the point p, respectively, while x_{n_k}, y_{n_k} are the coordinates of the center point of localizer node k, and m is the total number of parts of the localizer model.
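A direct sketch of this term (illustrative names):

def node_priors(p, part_centers):
    """Node priors of contour point p = (x_p, y_p): relative shifts to each
    localizer part center. `part_centers` is [(x_n1, y_n1), ..., (x_nm, y_nm)].
    Sketch of the formula above."""
    xp, yp = p
    shifts = []
    for xk, yk in part_centers:
        shifts.extend([xp - xk, yp - yk])
    return shifts                          # 2m values for m localizer parts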

3.4.4. Driver

As explained in Sec. 3.3, the driver is a model derived from a general purpose ML method able to perform supervised learning. Given the nature of the DM we decided to use, the chosen driver is a binary classifier. Among many possible choices, RF (see Sec. 1.4.3) is a very attractive one. Apart from being the most widespread classifier ensemble in the literature [122], it is quite fast and easily parallelizable, a strong advantage when dealing with large datasets. Decision trees are not affected by the magnitude of the feature values, a property that other classifiers (like SVM) do not share. Then, since RF is an ensemble of decision trees, it is not necessary to perform any sort of scaling of the values of the used features. This fact considerably simplifies the implementation of our system.

3.4.5. Integration mechanism

As already introduced, generating the IVLD is a key point in the proposed framework.
To do so, we had to make some decisions about how to derive it from DI. First of all, the LS
must be initialized in the image plane. We decided to perform three different initializations
as follows:
from small ellipses;
from big ellipses;
from the localizer described in Sec. 3.4.1.
In the first two cases, the LS is initialized as a uniform grid of ellipses in the image plane. The position of each ellipse is shifted by a small random amount in the x and y directions. In the small ellipses case, the grid has a 10 × 10 size and all the ellipses have the same size: x_e = 0.05 · X and y_e = 0.05 · Y, where x_e and y_e are the ellipse x and y axes, respectively, and X and Y are the image dimensions. In the big ellipses case, the grid has a 5 × 5 size and the size of the ellipses is uniformly sampled in [5, min(X, Y)/8]. Of course, the LS can gracefully merge some of the ellipses in case of overlap. Finally, in the third case, the LS is initialized as circles centered in the output points provided by the localizer. The size of the circles is a fraction of the localizer window square. Fig. 3.5 shows an example of the three LS initializations.

Figure 3.5: LS initialization by means of: (a) small ellipses, (b) large ellipses, (c) localizer. The localizer output points are highlighted in green.
Once the LS has been initialized, it has to evolve toward its final state, corresponding to
the target object position in the image plane. Since the GT images are available, the LS evolves
taking into account this information. In particular, the LS speed function for a point p is
F(p) = \begin{cases} +1, & \text{if } p \in F \\ -1, & \text{if } p \in B. \end{cases}

This speed function implies that the LS contour will grow (in the direction perpendicular to
the contour) in the areas that are included in the target object, while it will shrink in the areas
that are included in the background. Fig. 3.6 shows an example of this procedure.


Figure 3.6: The LS speed function for the generation of the IVLD. The LS will grow in the
green areas while it will shrink in the yellow one.

The IVLD generation procedure operates as follows. For each point in the LS contour, at each iteration, the term set values and the corresponding speed, grow (+1) or shrink (-1), are calculated. So, each example e(p) in the IVLD is described as:
e(p) = \{t_1(p), t_2(p), \ldots, t_m(p), l(p)\},   (3.5)

l(p) = \begin{cases} +1, & \text{if } p \in F \\ -1, & \text{if } p \in B, \end{cases}   (3.6)

where p is a point of the LS contour, t_k(p) is the k-th term in the term set, and m is the term set size.
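In code, building one such example can be sketched as follows (illustrative names; gt_mask is the boolean GT image and terms is a list of callables t_k):

def make_example(p, terms, gt_mask):
    """One IVLD example e(p) per Eqs. (3.5)-(3.6): the m term values at
    contour point p followed by the GT-derived label l(p), +1 in the
    foreground F, -1 in the background B. Sketch, not the thesis code."""
    label = +1 if gt_mask[p] else -1
    return [t(p) for t in terms] + [label]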
Since the IVLD can grow indiscriminately, even in the case of small DIs, we insert a given example e(p) in the IVLD only if the following two conditions hold:

p has not been inserted previously;

r \bmod [k \cdot (b(p) + 1)] < f_{saveTerm},

where r is a random variable uniformly distributed in the interval [0, RAND_MAX], k is a constant, f_saveTerm is a proper threshold, and b(p) is a bias function dependent on p. The b(p) function aims at raising the probability of the point p being included in the IVLD when p is close to the object contour. It is defined as b(p) = d(p, C), where C is the set of points belonging to the target object contour and d(·) is the Euclidean distance function. Moreover, the values of b(p) are normalized to the interval [0, 255] to ensure that they do not depend on the image size.
Finally, to take into account which points were already inserted in the IVLD, we employ the array S(p). This array can have three values:

S(p) = \begin{cases} \text{gray}, & \text{if } p \notin \text{IVLD} \\ \text{white}, & \text{if } p \in \text{IVLD and } l(p) = +1 \\ \text{black}, & \text{if } p \in \text{IVLD and } l(p) = -1. \end{cases}
Fig. 3.7 shows the data structures needed to perform the construction of the IVLD.

Figure 3.7: The data structures needed to perform the construction of the IVLD: (a) the target object contour, (b) b(p), i.e. the normalized distance to the contour, (c) S(p), i.e. the points already introduced in the IVLD.
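The insertion rule can be sketched as follows (illustrative names; the constant k is not given a concrete value here, so it is left as a parameter, and RAND_MAX mirrors the C library constant mentioned above):

import random

RAND_MAX = 2 ** 31 - 1   # stand-in for the C library constant

def keep_example(p, ivld_points, b, k, f_save_term):
    """Decide whether the example at contour point p enters the IVLD,
    following the two conditions above: p not inserted before, and
    r mod [k(b(p)+1)] below the threshold, so points near the GT contour
    (small b(p)) are kept with higher probability. Sketch only."""
    if p in ivld_points:
        return False
    r = random.randint(0, RAND_MAX)
    return r % (k * (b[p] + 1)) < f_save_term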
In order to maximize the number of possible examples in the IVLD, we can perform the evolution procedure several times, starting from different initializations. We perform up to e_small, e_large, and e_localizer evolution procedures starting from, respectively, small ellipses, large ellipses, and the localizer output.
As the foreground and background segments in images often differ in size, in most cases the instances of one class significantly outnumber those of the other class. So, due to the kind of ML task tackled, this becomes an imbalanced classification problem [36, 224]. Hence, the last step in the IVLD creation is the balancing of the number of instances of each class. This is a common practice in ML and it has been proven that class balancing improves the accuracy of the classifiers [16, 35, 205]. To do so, we apply a simple undersampling procedure [91, 205] by randomly removing N_maj - N_min instances of the majority class from the IVLD, where N_maj and N_min are the numbers of instances of the majority and minority classes, respectively.
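A sketch of this balancing step (illustrative names; examples are (term-vector, label) pairs with labels in {+1, -1}):

import random

def balance_ivld(examples):
    """Random undersampling: drop N_maj - N_min majority-class examples so
    that both classes end up with N_min instances each."""
    pos = [e for e in examples if e[1] == +1]
    neg = [e for e in examples if e[1] == -1]
    n_min = min(len(pos), len(neg))
    return random.sample(pos, n_min) + random.sample(neg, n_min)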
A key aspect of the implementation of our system is the use of a classifier ensemble. In fact, instead of employing just one classifier to learn from the IVLD, we use N_parts different classifiers, where N_parts is the number of parts in the localizer model (see Sec. 3.4.1). Since each part is located in roughly the same area in each image, the rationale of using multiple drivers is reducing the learning space of each driver, therefore increasing the overall accuracy of the system. To associate each example e(p) with the appropriate driver, we calculate the distance of p to each localizer part center. e(p) is then inserted into the IVLD associated with the closest driver. Algorithms 5 and 6 show the overall IVLD creation procedure. As shown in Algorithm 5, the whole GT-guided LS evolution procedure is repeated multiple times for the small
and big ellipses initializations. Since these initializations employ a random component, they are different each time. Performing these procedures multiple times allows us to increase the diversity in the IVLD, leading to better generalization in the optimal case.

Data: The DI dataset
Result: The IVLD dataset
foreach I ∈ DI do
    for i = 0; i < n_se; ++i do
        initSmallEllipses(I);
        evolveIVLD();
    end
    for i = 0; i < n_be; ++i do
        initBigEllipses(I);
        evolveIVLD();
    end
    initLocalizer(I);
    evolveIVLD();
    for i = 0; i < N_parts; ++i do
        balanceIVLD(i);
    end
end
Algorithm 5: Overall IVLD creation procedure.

Data: The DI dataset
Result: The IVLD dataset
repeat
    foreach point p in the level set contour do
        r = rand();
        if r mod [k(b(p) + 1)] < f_saveTerm AND p ∉ IVLD then
            e = calculateTermValues(p);
            d = getAssociatedDriver(p);
            add e to IVLD(d);
        end
    end
    moveLevelSetContour();
until level set converges;
Algorithm 6: evolveIVLD(): IVLD for a single image, given a single initialization.
Once the N_parts IVLDs have been defined, each RF classifier is trained with the associated IVLD. The output is N_parts different classification models. These models are used in the prediction phase to guide the LS when segmenting unseen images.

In the prediction phase, the LS is initialized using the output points of the localizer, as described earlier. For each point of the LS contour, the values of the terms are calculated. These values form the vector to be predicted by the associated driver, which gives as output the action to be performed. The usual Shi LS stopping conditions hold.

3.5. Experiments

To test the performance of our proposal, we applied the developed algorithm to two popular medical image datasets which include the GT. In both cases the evaluation metric employed is the standard Jaccard index [105]. It is defined as:
J = \frac{TP}{TP + FP + FN},

where TP is true positive area (correctly classified as object), FP is false positive area (classified
as object, but in fact background), FN is false negative area (classified as background, but in
fact object), and TN is true negative area (correctly classified as background).
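For reference, a sketch of the metric over boolean masks (illustrative names):

import numpy as np

def jaccard(pred, gt):
    """Jaccard index J = TP / (TP + FP + FN) of two boolean masks."""
    tp = np.logical_and(pred, gt).sum()     # true positives
    fp = np.logical_and(pred, ~gt).sum()    # false positives
    fn = np.logical_and(~pred, gt).sum()    # false negatives
    return tp / float(tp + fp + fn)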
Since we compare against different sets of competitors, we deal with the experimental
design and analysis of the results separately, in the following two sections.

3.5.1. SCR database: lungs segmentation in chest radiographs

The Segmentation in Chest Radiographs (SCR) database [216] has been established to facilitate comparative studies on segmentation of the lung fields, the heart and the clavicles
in standard posterior-anterior chest radiographs. All chest radiographs are taken from the
Japanese Society of Radiological Technology (JSRT) database [201], which is a publicly available database of 247 PA (that is, front view) chest radiographs. In each image the lung fields,
heart and clavicles have been manually segmented to provide a standard reference. Here, we
only test and compare on the lungs part.
3.5.1.1. Experimental design

As in [216], the 247 cases in the SCR database were split into two folds. One fold contained
all 124 odd numbered images in the SCR database. The other fold contained the 123 even
numbered images. Images in one fold were segmented with the images in the other fold as
training set, and vice versa.
The parameters employed by our algorithm on this dataset are shown in Table 3.1.
Parameter                    Value
f_saveTerm                   5
e_small                      5
e_large                      5
e_localizer                  1
terms                        NPdistance, haralick, splitHaralick, globalHaralick, HOG, LBP, Gabor
r_big                        15
r_small                      5
examples per class           200000
Classifier ensemble size     400

Table 3.1: The parameters used by our system on the SCR dataset.
We compare our proposal against the segmentation results of the 12 algorithms detailed
in [216]. In that work, a comprehensive explanation of the competitors is provided, as well
as the numerous parameters involved. An overall description of each of the 12 algorithms is
given as follows.

Human observer Two observers segmented the objects in each image: a medical student
and a computer science student specialized in medical image analysis. The segmentations
of the first observer were taken as gold standard, while the ones produced by the second
observer were used to compare computerized methods with human performance.
Mean shape It serves as a reference method, since it just calculates the mean shape of each
object in a way independent of the actual image contents.
Active Shape Model The ASM scheme consists of three elements: a global shape model;
a local, multi-resolution appearance model; and a multi-resolution search algorithm. The
implementation the authors employed is introduced in [47].
In [47], suitable values are suggested for all the parameters in the ASM scheme. They are listed in [216] and have been used in the current experiments, referred to as ASM default. After performing pilot experiments varying the parameter ranges, the authors in [216] kept the overall best setting, referred to as ASM tuned.
Active Appearance Models The AAM segmentation and image interpretation method [45] is an evolution of the ASM. AAM uses the same input as ASM, a set of training images in which a list of corresponding points has been indicated. The major difference with respect to ASM is that an AAM considers all object pixels, compared to the border representation of ASM, in a combined model of shape and appearance. The search algorithm is also different. The tested implementation was based on the freely available C++ AAM implementation described in [204]. This algorithm is referred to as AAM default.
AAM with whiskers To include information about the contour edges into the texture model, this AAM is characterized by the presence of contour normals pointing outwards on each object, denoted whiskers. It is referred to as AAM whiskers.
Refinement of AAM Search Results This algorithm, referred to as AAM whiskers BFGS, endows AAMs with a global search to avoid local minima of the cost function. It was optimized by a quasi-Newton method using the BFGS (Broyden, Fletcher, Goldfarb and Shanno) update of the Hessian [73].
Pixel classification Pixel classification (PC) is an established technique for IS. In PC, a training and a test stage can be distinguished. The training stage consists of choosing a working resolution, selecting a number of samples (positions) in each training image, and computing a set of features (the input) for each sample. An output is associated with each sample, indicating to which class the position belongs. The last step is training a classifier with the input feature vectors and the output.

The test stage consists of computing the features for each pixel in the images (in the test set), the optional application of the transformation to each feature vector, and, finally, running the trained classifier on the feature vector to obtain the posterior probabilities that the pixel belongs to each class.

Furthermore, a post-processing step is applied to reduce the grainy appearance at object boundaries. Subsequently, the largest connected object is selected, and holes in this object are filled. This algorithm is referred to as PC post-processed.


Hybrid approaches Since different segmentation methods providing complementary information were tested in [216], the authors also developed three hybrid segmentation schemes.

The first one, referred to as the hybrid voting scheme, considers the output of different methods. Since all methods can output hard classification labels for each pixel, the hybrid voting scheme takes the classification labels of the best performing ASM, AAM and PC schemes and assigns pixels to objects according to majority voting [120, 124].

A different approach is to take the output of one method as input for another scheme. An obvious option is to use the posterior probabilities for each pixel as obtained from the PC method and convert these into an image where different objects have different gray values. This probability image is used as input for the ASM segmentation method (referred to as the hybrid ASM/PC method) and the AAM segmentation method (the hybrid AAM/PC method).
3.5.1.2. Results and discussion

A summary of the numeric results obtained by our proposal, along with the other 12
algorithms, is shown in Table 3.2. A selection of the resulting segmentations is shown in Fig.
3.8 while Fig. 3.9 shows the distributions of the results in boxplot form.
In general, the left lung is more difficult to segment than the right lung because of the
presence of the stomach below the diaphragm, which may contain air, and the heart border,
which can be difficult to discern [216].
First of all, ASM and AAM produced satisfactory results in most cases, with the tuned
and whiskers variants outperforming their default counterparts. However, occasionally the
segmentation proved to be problematic, especially for the left lung. In fact, in those cases
where the lung and the heart are close and their borders are fuzzy, ASM and AAM followed
different edges, providing unsatisfactory results. In particular, the AAM default algorithm provided a very low minimum value in its results distribution, as shown in the second-to-last row of Table 3.2.
On the other hand, the PC algorithm approximated the lung shapes well and produced very satisfying results, close to the human observer or even better, in the case of the post-processed PC. This is probably due to the key importance of texture analysis in the considered
SCR dataset. The hybrid systems that use the PC output for ASM and AAM outperformed
the direct usage of ASM and AAM. This can be explained by the fact that PC works very well
for lung segmentation and thus provides reliable input to ASM and AAM [216]. Moreover,
the hybrid voting algorithm obtained the best results on this dataset by combining the best
performing ASM, AAM and PC schemes.
As clearly expected, the mean shape algorithm obtained the worst results while the human observer occasionally deviated from the gold standard, probably due to interpretation
errors. Indeed, the human observer ranked fourth, according to the shown results (see Table
3.2).
Finally, the results obtained by our system are encouraging. It outperformed every non-hybrid proposal, ranking second according to the accuracy measure and being one of the three methods outperforming the human observer. Only the hybrid voting method provided slightly better results. Most of the images were successfully segmented by our proposal, with just a few lower quality cases. In these cases, the results were affected by oversegmentation, that is, the DM leaked through the object borders and ended up overlapping part of the background. This is probably mostly due to the "liquid" nature of the LS DM. Given the general scope
of our proposal, we used almost the same components for both SCR and Allen Brain Atlas
problems (see next subsection), avoiding the implementation of specific components. In particular, we did not employ a hard shape prior, strongly enforcing resemblance between the DM's current shape and the prior, which could further improve the accuracy for the specific case of the SCR dataset.
Lungs                Mean ± SD        Min     Q1      Median   Q3      Max
Hybrid voting        0.949 ± 0.020    0.818   0.945   0.953    0.961   0.978
Our proposal         0.947 ± 0.023    0.803   0.943   0.952    0.961   0.974
PC post-processed    0.945 ± 0.022    0.823   0.939   0.951    0.958   0.972
Human observer       0.946 ± 0.018    0.822   0.939   0.949    0.958   0.972
PC                   0.938 ± 0.027    0.823   0.931   0.946    0.955   0.968
Hybrid ASM/PC        0.934 ± 0.037    0.706   0.931   0.945    0.952   0.968
Hybrid AAM/PC        0.933 ± 0.026    0.762   0.926   0.939    0.95    0.966
ASM tuned            0.927 ± 0.032    0.745   0.917   0.936    0.946   0.964
AAM whiskers BFGS    0.922 ± 0.029    0.718   0.914   0.931    0.94    0.961
ASM default          0.903 ± 0.057    0.601   0.887   0.924    0.937   0.96
AAM whiskers         0.913 ± 0.032    0.754   0.902   0.921    0.935   0.958
AAM default          0.847 ± 0.095    0.017   0.812   0.874    0.906   0.956
Mean shape           0.713 ± 0.075    0.46    0.664   0.713    0.768   0.891

Table 3.2: The results achieved on the SCR dataset by the 13 segmentation algorithms (Jaccard
index).

3.5.2. Allen Brain Atlas

The Allen Brain Atlas (ABA) [4] contains a genome-scale collection of histological images (cellular resolution gene-expression profiles) obtained by In Situ Hybridization of serial sections of mouse brains. There is great interest in automated methods to accurately, robustly, and reproducibly localize the hippocampus in brain images, after discoveries which established its role as an early biomarker for Alzheimer's disease and epilepsy [15].

The anatomical structure to segment was the hippocampus and the GT was created manually by an expert in molecular biology. Every image was manually segmented 5 times and, for each group of 5 manual segmentations, the consensus image was calculated and used as GT. The typical resolution of ABA images is about 15,000 × 7,000 pixels and that of the ROIs is about 2,500 × 2,000 pixels. However, in our case, we employed a rescaled version of the images of about 600 × 400 pixels.

3.5.2.1. Experimental Design

We compare the results of our proposal with the ones obtained in [151]. In that research,
the authors included both deterministic and non-deterministic methods in the comparison,
as well as classic and very recent proposals. In that paper, the authors considered a 22-image subset of the ABA dataset, composed of 10 training images and 12 test images. They evaluated the algorithms on the whole 22-image dataset. To compare with them, we provide results
over the whole dataset. However, we also compare the algorithms over the test partition


Figure 3.8: Some results of our proposal on the SCR lungs dataset, for images (a) JPCLN023, (b) JPCLN025, (c) JPCLN044, (d) JPCLN048, (e) JPCLN049, (f) JPCLN052, (g) JPCLN064, (h) JPCLN067, (i) JPCLN074, (j) JPCLN081, (k) JPCLN115, (l) JPCLN128, (m) JPCLN153, (n) JPCNN030, (o) JPCNN033, (p) JPCNN043, (q) JPCNN047, (r) JPCNN064, (s) JPCNN080, (t) JPCNN091. Green is TP, red is FP, yellow is FN, and transparent is TN.

Figure 3.9: Results obtained on the SCR dataset (boxplots of the Jaccard index for each of the 13 algorithms).


only². The parameters employed by our algorithm on this dataset are shown in Table 3.3. Notice that they are exactly the same ones considered for the previous SCR database (see Table 3.1) except for the f_saveTerm value.

² We asked the authors of [151] for the results on each image and we recalculated the statistics separately for the two subsets.
Parameter                    Value
f_saveTerm                   4
e_small                      5
e_large                      5
e_localizer                  1
terms                        NPdistance, haralick, splitHaralick, globalHaralick, HOG, LBP, Gabor
r_big                        15
r_small                      5
examples per class           200000
Classifier ensemble size     400

Table 3.3: The parameters used by our system on the ABA dataset.
An overall description of each of the 8 competitor algorithms is given as follows.
HybridLS This technique [151] is a hybrid LS approach for medical IS. This geometric DM combines region- and edge-based information with the prior shape knowledge introduced using deformable image registration. The proposal consists of two phases: training and test.
The former implies the learning of the LS parameters by means of a GA (see Sec. 1.3). The
latter is the proper segmentation, where another evolutionary algorithm, in this case SS (see
Sec. 1.3.2), derives the shape prior. Finally, HybridLS has the ability to learn optimal parameter settings for every specific dataset. Provided a training set of already segmented images
of the same class, the parameters are learned using a classic ML approach: configurations of
parameters are tested on the training data and the results are compared with the GT to assess
their quality.
Active Shape Models ASMs (and Iterative Otsu Thresholding Method) refined using RF
(ASM + RF) [149, 150]. This method uses a medial-based shape representation in polar coordinates with the objective of creating simple models that can be managed in an easy and fast
manner. Such a parametric model is moved and deformed by an EA, DE [60], according to
an intensity-based similarity function between the model and the object itself. Finally, RF is
applied to expand the segmented area to the regions that were not properly localized.
Soft Thresholding Soft Thresholding (ST) [2] is a recent deterministic method based on relating each pixel in the image to the different regions via a membership function, rather than
through hard decisions. Such a membership function is derived from the image histogram.
Atlas-based deformable segmentation This method [215], called DS, refers to the atlas-based segmentation procedure used in HybridLS to compute the prior. This is actually a stand-alone segmentation method; therefore, it is included in the experimental study as a representative of registration-based segmentation algorithms.
Geodesic Active Contours This method [32], referred to as GAC, employs the Shi LS (Sec. 1.1.2.3) and the speed function of GAC (Sec. 1.1.2.4). In this work, two implementations of GAC have been tested. The first one uses the whole image as initial contour, while the second one, called DSGAC, employs the segmentation obtained using DS to create the initial contour of the geometric DM.
Chan&Vese Level Set Model This technique [33], referred to as CV, employs the Shi LS (Sec. 1.1.2.3) and the CV speed function (Sec. 1.1.2.4). As in GAC, two implementations have been tested. The first one uses the whole image as initial contour while the second employs the segmentation result obtained by DS as the LS initial contour.
3.5.2.2. Results and discussion

A summary of the numeric results obtained by our proposal, along with the other 8 algorithms, is shown in Tables 3.4 and 3.5, for the whole dataset case and for the test set only,
respectively. The resulting segmentations are shown in Fig. 3.10 (test set) and Fig. 3.11 (training set). On the other hand, Fig. 3.12 shows the distributions of the results in boxplot form
for the whole dataset while Fig. 3.13 depicts the same information but for the test set only.
The DS method has been one of the best-performing algorithms, ranking fourth or third
over the two partitions. This is probably due to the high relevance of the shape information

ABA             Mean ± SD        Min     Q1      Median   Q3      Max
Our Proposal    0.845 ± 0.140    0.398   0.819   0.884    0.940   0.960
Hybrid          0.806 ± 0.109    0.321   0.796   0.849    0.869   0.884
ASM+RF          0.797 ± 0.061    0.461   0.770   0.812    0.836   0.876
DS              0.787 ± 0.108    0.342   0.760   0.829    0.852   0.877
DSGAC           0.674 ± 0.172    0.147   0.589   0.709    0.808   0.873
ST              0.597 ± 0.192    0.185   0.501   0.632    0.729   0.825
DSCV            0.538 ± 0.203    0.058   0.332   0.618    0.695   0.804
CV              0.460 ± 0.242    0.088   0.213   0.567    0.648   0.827
GAC             0.528 ± 0.192    0.146   0.395   0.564    0.660   0.833

Table 3.4: The results achieved by the 9 segmentation algorithms on the whole ABA dataset.
ABA             Mean ± SD        Min     Q1      Median   Q3      Max
Hybrid          0.800 ± 0.106    0.478   0.774   0.850    0.866   0.881
ASM+RF          0.790 ± 0.056    0.599   0.754   0.793    0.836   0.876
DS              0.774 ± 0.118    0.427   0.744   0.828    0.847   0.873
Our Proposal    0.764 ± 0.146    0.398   0.728   0.823    0.839   0.899
DSGAC           0.659 ± 0.130    0.351   0.590   0.676    0.748   0.856
ST              0.583 ± 0.225    0.185   0.440   0.647    0.751   0.825
DSCV            0.548 ± 0.193    0.058   0.531   0.628    0.671   0.712
CV              0.516 ± 0.187    0.154   0.404   0.606    0.644   0.677
GAC             0.521 ± 0.176    0.195   0.426   0.538    0.641   0.807

Table 3.5: The results achieved by the 9 segmentation algorithms on ABA test set only.

in this dataset. In fact, both DSCV and DSGAC, two algorithms relying on registration-based
initialization, ranked better than their standard counterparts.
Overall, DSGAC delivered an acceptable performance, ranking fifth in both partitions.
This is remarkable as the regular GAC ranked last in both partitions, a fact that can be explained by the high sensitivity of this algorithm to its initialization.
DSCV ranked seventh in both partitions, performing significantly worse than DSGAC.
This is plausibly due to the nature of the energy function, which is probably too global for
the dataset at hand, even when initialized close to the target object contour. However, it
outperformed the plain CV method, which achieved a bad performance, ranking second to
last in both partitions.
ST ranked sixth in both partitions, consistently between the DSGAC and DSCV performances. This is a remarkable result, as it outperformed a method employing registration-based initialization (DSCV) and two widely employed LSs, CV and GAC. However, since ST is based on the image histogram only, it showed a limited ability to cope with complex scenarios.
ASM+RF obtained some of the best results on this dataset, ranking third or second on
the two partitions. However, it is fair to underline its ad-hoc nature, since it was developed
specifically for the ABA dataset. Moreover, it is not able to manage topological changes in a
natural way as geometric DMs can do.
HybridLS obtained the best results on the test-only partition of this dataset and ranked second when measuring its performance on the whole dataset. This is presumably due to the combination of a well-performing prior (i.e., the DS method) and the simultaneous use of visual image cues along with a flexible LS implementation. However, its drawbacks are its long execution times [151] and its complexity, as the method relies on an elaborate registration-based initialization and a separate optimization phase to perform parameter learning.
Finally, the results obtained by our proposal are encouraging, as its accuracy is close to that of the best methods in the comparison. In particular, it ranked fourth on the test partition (with very close performance to the best three algorithms) and first on the whole dataset. In fact, most of the images were successfully segmented by our proposal. Some of them, however, present under- or oversegmentation. While the former is probably due to the faint borders in some areas of the target structures or very similar textures across different structures, the latter is presumably caused by the somewhat arbitrary (at least to a non-expert eye) borders of the lower part of the upper structure of the hippocampus. In particular, Fig. 3.10(e) shows the worst result. In this case, the DM leaks and expands into the gray area around the target structure. The reason behind this behavior is plausibly attributable to the absence of similar cases in the small training set considered.

3.6. Conclusion

In this chapter we introduced an accurate, flexible and automatic IS framework using ML and DMs. Unlike typical DM-based segmentation algorithms, which rely on an energy optimization paradigm, our proposal uses a different type of decision process, translating the available information into a ML model that is directly used to drive the DM
evolution. The framework is made up of four components: the localizer, the DM, the term set,
and the driver. The localizer is a ML-based image recognition tool able to find a ROI within
the image space. The DM is a model whose final position delineates the segmentation result.
The term set is a set of image-related and DM-related features while the driver is a model
derived from a general purpose ML method that directly guides the DM evolution on the
basis of the values of the features in the term set. An additional component, the integration
mechanism, deals with how the components are connected to each other.
We also provided a reference implementation for the framework aiming at segmenting
different medical image modalities. We chose the part-based object detector introduced in
[78] as localizer, the Shi LS as DM, the RF classifier ensemble as driver, and, finally, a large
set of image descriptors based on edges, texture and shape were inserted in the term set.
We tested our proposal on two different medical image datasets of different modalities.
The obtained results were encouraging. We showed how the proposed framework, using a ML-based paradigm, is competitive with other state-of-the-art algorithms, while offering the advantage of requiring minimal human intervention.


Figure 3.10: Results of our proposal on the ABA dataset (test set), for images (a) 1300018I05Rik_86_ROI, (b) A030009H04Rik_117_ROI, (c) Atp1b2_2725_ROI, (d) Azin1_96_ROI, (e) Camk2a_102_ROI, (f) Camk2b_102_ROI, (g) Gad1_104_ROI, (h) Mbp_130_ROI, (i) Nmt1_140_34_ROI, (j) Slc17a7_120_ROI, (k) Tubb3_114_ROI, (l) Wars_109_ROI. Green is TP, red is FP, yellow is FN, and transparent is TN.


Figure 3.11: Results of our proposal on the ABA dataset (training set), for images (a) Atrx_133_32_ROI, (b) Cutl2_30_ROI, (c) Gfap_95_2307_ROI, (d) reference10_ROI, (e) reference12_ROI, (f) reference14_ROI, (g) reference15_ROI, (h) reference16_ROI, (i) reference17_ROI, (j) Zim1_117_ROI. Green is TP, red is FP, yellow is FN, and transparent is TN.

Figure 3.12: Results obtained on the whole ABA dataset (boxplots of the Jaccard index for each of the 9 algorithms).

Figure 3.13: Results obtained on the ABA test set only (boxplots of the Jaccard index for each of the 9 algorithms).

CHAPTER 4

Conclusions and future work


4.1. Concluding remarks

In this PhD dissertation we have proposed different automatic methods based on DMs,
SC and ML techniques to solve the IS problem. On the one hand, we introduced an extended
version of the TAN model, the ETAN, providing a solution to the long-standing issues of the
original model. We tested ETANs over a diverse group of image sets against a set of the most widespread parametric and geometric DMs. The quality of the achieved results confirms the
effectiveness of the proposed extensions. ETANs outperformed other state-of-the-art parametric DMs while being competitive with the best performing geometric DMs. Then, we
tackled the tendency of the ETAN to get stuck in local minima in the optimization process
by embedding the DM in a novel SS-based global search framework. We tailored the SS
framework to the segmentation task by designing a set of specific components. We tested the
proposed system over a mix of synthetic and real-world images against a set of TAN/ETAN-based global search proposals along with the original ETAN model. The obtained results
were encouraging. Our proposal clearly proved the effectiveness of a global search framework for ETAN optimization by outperforming the state-of-the-art DE-based proposal for
TAN optimization.
On the other hand, we proposed an accurate, flexible, automatic, and general-purpose system for IS employing ML techniques to perform direct guidance of the DM contour. We provided a reference implementation of the proposed system using LSs as DM, an ensemble classifier as ML method, the part-based object detector introduced in [78] as localizer, and a proper set of image visual descriptors. Finally, we tested it over two medical imaging datasets, achieving encouraging results. Our method was competitive with other state-of-the-art medical IS algorithms while requiring minimal human intervention.
In the following items, the results obtained in this PhD dissertation, as well as the degree of fulfillment of each of the objectives set at the beginning of the current work, are analyzed:
Study the state of the art in TAN optimization. After a deep study of the TAN-related literature, its fundamentals, and the main contributions to the topic, we concluded that
the technique has a solid formulation and very promising features. However, we noticed several pitfalls of both the model and the optimization procedure. On the one
hand, we noticed the lack of a proper external energy definition, leading to suboptimal adjustment of the mesh contour in case of highly convex concavities and complex
shapes. We also noted how the integrity and correctness of the mesh are not enforced
by previous proposals. On the other hand, the design decisions that characterize previous evolutionary-based TAN design proposals neglect the main advantage of a global
search approach. In fact, those proposals failed to define proper evolutionary operators able to effectively coalesce meshes. This led to the need for huge populations, with
consequent long processing times.
Propose an extension of the TAN model. To solve the long-standing issues of the original
TAN model, we proposed an extended version, the ETAN model. First of all, we introduced an external energy definition based on VFC. It is a wide-range external force
composed of a vector field, derived from discontinuities of the image intensity values,
i.e. edges. From the VFC force field we derived an energy term image suitable for ETAN
adjustment. It is able to attract the DM contour toward highly convex concavities, to let
the model adapt to complex shapes of the target objects. Then, we proposed extensions
to tackle topological changes while enforcing the net integrity. We introduced a new
link cut procedure that takes into account the image features. It also performs several
cuttability tests to enforce the net correctness while changing the mesh topology. We
also introduced a hole opening method based on the ratio of external and internal energies at each node. Moreover, we extended the original BILS node adjustment procedure
by checking the position of misplaced nodes.
The work carried out over the ETAN model resulted in a paper accepted in a JCR journal:
N. Bova, O. Ibáñez, O. Cordón. Extended topological active nets. Image and Vision Computing, 2013, in press. Impact factor 2012: 1.959. Category: Computer Science, Artificial Intelligence. Order: 25/114. Technical report available at http://docs.softcomputing.es/public/afe/Tech_rep_2012-06.pdf.
Inclusion of the extended TAN in a convenient global search framework. To solve the TAN optimization problems noticed in the literature, we embedded the proposed ETAN model in the memetic SS global search framework. This MA relies on solution combinations and controlled randomizations to avoid the need for huge populations. In fact, in our system only six individuals are part of the solution population (i.e., the RefSet) at a given iteration, compared to population sizes of hundreds or even a few thousand individuals, which are common in previous evolutionary-based TAN design proposals.
In particular, we developed a new diversification generation method able to focus on
untested and promising regions of the image. Moreover, we defined a problem-specific
diversity function to measure the diversity in the population. Besides, we modified the
ETAN energy function to better fit the introduced global optimization procedure. Finally, to fully take advantage of the introduced global search framework, we developed
two complementary solution combination operators, genotypic and phenotypic, able
to effectively coalesce meshes.
The described ideas were presented in the following papers, published in a JCR journal and in international and national conference proceedings:


N. Bova, O. Ibáñez and O. Cordón. Topological active nets optimization: a scatter search-based approach. In the VIII Congreso Español de Metaheurísticas, Algoritmos Evolutivos y Bioinspirados - MAEB 2012 conference, Albacete, Spain, February 2012. pp. 503-510.

N. Bova, O. Ibáñez and O. Cordón. Image segmentation using Extended Topological Active Nets optimized by Scatter Search. IEEE Computational Intelligence Magazine, 8(1): 16-32 (2013). Impact factor 2012: 4.629. Category: Computer Science, Artificial Intelligence. Order: 5/114.

N. Bova, O. Ibáñez and O. Cordón. Local and Global optimization of Extended Topological Active Nets for Image Segmentation. Proceedings of the International Conference on Medical Imaging using Bio-Inspired and Soft Computing - MIBISOC 2013, Bruxelles, Belgium, May 2013. pp. 123-130.
Study the state of the art in the application of ML to DM design. We deeply studied the relevant literature on the application of ML to DM design, with particular interest in those proposals directly guiding the DM adaptation. We organized the existing proposals into a taxonomy of four categories according to the purpose for which ML is applied to DMs: model initialization, external energy term generation, image recognition and understanding, and direct DM guidance. We noticed that just a handful of proposals belong to the last family we defined, leaving enough room for improvement to explore this research line.
Propose a ML-based generic framework for IS using DMs. We proposed a generic, automatic
IS framework able to learn the segmentation model from example data. Our framework
is composed of four main components: the localizer, to detect the image ROI; the DM;
the set of terms, that is the image features we take into account; and the driver, a general
purpose ML-based method to perform classification or regression in order to automatically adapt the DM. The main feature of the system is that the driver directly guides the
DM contour adjustment to the object, instead of relying on an optimization procedure.
A fifth component, the integration mechanism, takes into account how the other four
components connect to each other. The presented framework shows a high degree of
flexibility and can be tailored to different IS problems.
Develop an ML-based IS framework implementation for medical imaging. To prove the feasibility of the proposed ML-based IS framework, we provided a specific implementation
for medical IS. We chose an efficient part-based localizer. Then, we decided to employ
a fast LS implementation, the Shi approximation, as DM. With the aim of capturing the
variability of the image characteristics, we grouped a large set of image- and model-related features to define the term set. Besides, we chose RF, a fast, accurate and flexible decision tree classifier ensemble, as driver. Finally, the integration mechanism was
designed to fully take advantage of the characteristics of the four components while
combining them.
The work carried out for the last three objectives of this PhD dissertation will result in a paper to be submitted to a medical imaging journal in the near future.
Analyze the performance of the proposed methods. Along this dissertation, we carried out
three different experimentations to assess the performance of the proposed methods.
First, we tested the ETAN model over synthetic and real-world images against a set of state-of-the-art parametric and geometric DMs. The obtained results were encouraging, as ETANs outperformed the parametric competitors while being competitive with or even outperforming the geometric ones.
Then, we compared the performance of our SS-based ETAN optimization proposal over
a set of synthetic and medical images with the state-of-the-art algorithm for TAN optimization based on DE, a multi-start local search ETAN method, and the local search-based ETAN. Our SS-ETAN proposal provided promising results as it outperformed
the competitors on the employed image datasets.
Finally, the proposed ML-based DM guidance system was tested on two medical image
datasets against a large set of state-of-the-art segmentation algorithms. The obtained
results were very encouraging as our system showed to be competitive with respect to
the best algorithms on both datasets.

4.2. Future work

The different algorithms we introduced in this dissertation showed promising segmentation capabilities. Nevertheless, regardless of their good performance, our methods still have room for improvement, as in some cases the obtained segmentations were not fully correct. A number of different research lines can be opened to improve the proposed IS systems. Next, we describe some tentative extensions of our proposals that we aim to develop as future work.
Proposal of a 3D extension of the ETAN model. Many segmentation algorithms, in particular in the medical imaging field, deal with 3D or 4D structures. In the TAN field,
Barreira et al. already introduced TAVs in [13] to perform segmentation of 3D structures. A further development of the current ETAN proposal could be its extension to
3D as done in that work with the original TAN.
Inclusion of a texture-based external energy term for ETANs. As said throughout this PhD dissertation (see, for example, Sec. 1.5), texture is one of the important characteristics used to identify objects or ROIs in an image. To better take advantage of the region-based segmentation capabilities of the ETAN, texture analysis could be incorporated into
the model. Texture features could be considered by the pre-segmentation phase and/or
by the ETAN itself (analyzing the face areas, that is, the areas delimited by four nodes)
to further improve the segmentation process.
Definition of SCOs with multiple parents for SS-ETAN. The SCOs we introduced in this
dissertation (see Sec. 2.5.6) make use of two parent nets. However, the proposed operators could be easily extended to deal with three or more parents (multi-parent crossover
operators), as done in some GAs [64, 119, 212]. On the one hand, extending the phenotypic SCO (Sec. 2.5.6.2) is pretty straightforward. The only modification needed would
be deriving the binary image for the set of nets considered by the operator and performing a logical OR operation on all of them. The remaining parts of the operator would be unchanged. On the other hand, extending the genotypic SCO (Sec. 2.5.6.1) would require the definition of the corresponding mathematical formulation, which remains to be derived.

4.2. FUTURE WORK

153

Proposal of a 3D extension of the ML-based IS framework implementation. As many IS problems involve the use of 3D images, we envisage a 3D extension of the current ML-based IS framework implementation. Apart from a significant increase in computational needs, each of the employed image-related tools or features is easily extensible to operate on 3D images. Therefore, the main effort must be devoted to the considered ML methods, which must be able to handle the big datasets derived from the 3D images, a current challenge in the area.
Use of different DMs for the implementation of the ML-based IS framework. As pointed out in Sec. 3.3, our framework is customizable and independent of the choice of the specific DM to be considered for the segmentation task. Among geometric DMs, the sparse field approximation (Sec. 1.1.2.3) is attractive due to the accuracy-computational load tradeoff it provides. The choice of this LS, however, would suggest the use of a regressor to define the driver, instead of a classifier. Conversely, among parametric DMs, the introduced ETAN model would be the preferred choice for its ability to handle topological changes while, at the same time, being endowed with region-based segmentation capabilities. The simplest control strategy to perform the contour adjustment would be to consider the pixel of each node and its 8-neighborhood: each of these 9 pixels would be mapped to a class label, transforming the problem into a 9-class classification problem, as sketched below. In this case, the ETAN internal energy constraints would be considered as additional features of the term set.
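The following Python fragment sketches that 9-class control strategy; term_set_features() and the trained classifier clf are hypothetical placeholders for the framework's actual term set and driver:

import numpy as np

# The 9 candidate moves (including "stay", class 4), one class per position.
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def adjust_node(node, term_set_features, clf):
    # node: (row, col); term_set_features(r, c) returns a 1D feature vector,
    # which could also include the ETAN internal energy constraints.
    r, c = node
    x = np.concatenate([term_set_features(r + dr, c + dc) for dr, dc in OFFSETS])
    label = int(clf.predict(x.reshape(1, -1))[0])  # one of the 9 classes
    dr, dc = OFFSETS[label]
    return (r + dr, c + dc)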
Consideration of a different driver for the implementation of the ML-based IS framework. RF, the ML method of choice in the provided reference implementation of the ML-based IS framework, showed a promising performance. However, given the current state of development of the ML field, a vast number of alternatives can be considered. Among them, Rotation Forest [187] has some attractive characteristics. In fact, it employs an embedded Principal Component Analysis [110] to perform automatic feature extraction. This is a desirable property when dealing with problems involving a large number of features, such as the one at hand. A different alternative could be one of the numerous flavors of the widely employed SVM classifier [25,34,51,56]. In this case, we would opt for a fast implementation to keep the processing time low. Finally, depending on the nature of the DM employed in the framework, the use of a regressor (instead of a classifier) could be a more appropriate choice, as mentioned in the previous item. In that case, ML methods of a different nature would be required. A minimal sketch of such a driver swap follows.
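Assuming scikit-learn, and with illustrative parameter values only (Rotation Forest is not part of scikit-learn and would require a third-party implementation), the driver can be replaced through the common fit/predict interface:

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.svm import LinearSVC

# Interchangeable drivers sharing the same fit/predict interface.
drivers = {
    "random_forest": RandomForestClassifier(n_estimators=100),  # reference choice
    "fast_svm": LinearSVC(C=1.0),           # a fast, linear SVM flavor [51]
    "regressor": RandomForestRegressor(),   # for regression-driven DMs
}
driver = drivers["fast_svm"]
# driver.fit(X_train, y_train); predictions = driver.predict(X_test)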
Incorporation of new features for the implementation of the ML-based IS framework. To further improve the accuracy of the current implementation, the responses provided by advanced edge detectors could be introduced into the term set. With this purpose in mind, the edge detection algorithm described in [143] could be a good choice, as sketched below.
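Since the detector of [143] has no standard off-the-shelf Python implementation, a Canny edge map (scikit-image) stands in here purely to show where such a response would enter the term set; extend_term_set() is a hypothetical helper:

import numpy as np
from skimage.feature import canny

def extend_term_set(term_stack, gray_image):
    # term_stack: (H, W, F) array of existing per-pixel features.
    # Returns an (H, W, F + 1) array with the edge response appended.
    edges = canny(gray_image, sigma=2.0).astype(np.float32)
    return np.dstack([term_stack, edges])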
Benchmarking of a different localizer for the implementation of the ML-based IS framework. Apart from testing different part-based localizers, registration-based localizers could also be successfully introduced into the current implementation. In particular, the SS-based image registration algorithm proposed in [215] represents an attractive option, as it has been employed to perform IS in [151] with remarkable accuracy.
Testing the ML-based IS framework on different medical imaging problems. The presented ML-based IS framework implementation can also be applied to the large number of different medical IS problems found in the literature. Among them, active and upcoming MICCAI grand challenges represent one of the best choices for testing on open problems in medical IS.


Conclusions and future work

Conclusions

In this PhD dissertation we have proposed different automatic methods based on Deformable Models (DMs), soft computing, and Machine Learning (ML) techniques to solve the Image Segmentation (IS) problem. On the one hand, an extended version of the Topological Active Net (TAN) model was introduced, the Extended Topological Active Net (ETAN), offering a solution to the problems of the original model. We tested ETANs on a diverse group of image datasets, comparing them against several of the most popular parametric and geometric DMs. The quality of the obtained results confirms the effectiveness of the proposed extensions. ETANs outperformed other state-of-the-art parametric DMs and proved to be competitive with the best geometric ones.

Next, we tackled the tendency of the ETAN to get trapped in local minima during the optimization process by embedding the DM in a novel global search framework, Scatter Search (SS). The SS framework was adapted to the ETAN segmentation task by designing a set of specific components. The proposed system was tested on a set of synthetic and real images, comparing against a set of global search algorithms for TANs and ETANs. The obtained results were encouraging. Our proposal clearly demonstrated the effectiveness of a global search framework for ETAN optimization, obtaining better results than the differential evolution-based proposal that represents the state of the art in TAN optimization.

On the other hand, we proposed an accurate, flexible, automatic, and general-purpose IS system employing ML techniques that exert direct control over the adjustment of the DM contour. We provided a reference implementation of the proposed system using level sets as the DM, an ensemble classifier as the ML method, the part-based object detector introduced in [78] as the localizer, and a suitable set of visual image descriptors. Finally, we tested our proposal on two medical image datasets, with encouraging results. Our method proved to be competitive with other state-of-the-art medical IS algorithms while requiring minimal human intervention.

In the following items, we analyze the results obtained in this dissertation, as well as the degree of fulfillment of each of the objectives set at its beginning:

Study the state of the art in TAN optimization. After a thorough study of the literature related to TANs, their foundations, and the main contributions to the topic, we concluded that the technique has a solid formulation and presents very promising characteristics. However, it is not free of problems, which mainly affect the model and the optimization procedure. On the one hand, we noticed the lack of a proper external energy definition, which prevents an optimal adjustment of the net contour in the case of deep concavities and complex shapes of the objects to be segmented. We also observed that the integrity and correctness of the net are not guaranteed in previous proposals. Moreover, the design decisions characterizing previous proposals based on evolutionary algorithms fail to exploit the main advantage of a global search approach. Specifically, those proposals did not define suitable evolutionary operators capable of properly combining nets. This entails the need for large populations, with the resulting high processing times.
Propose an extension of the TAN model. To solve the problems that the different existing proposals inherit from the original TAN model, an extended version of it, the ETAN model, was proposed. First, we presented an external energy definition based on vector field convolution. It is a long-range external force composed of a vector field derived from the discontinuities of the image intensity values, that is, the edges. From the resulting vector field, an image is derived as an energy term suitable for ETAN adjustment. This term is able to attract the DM contour toward deep concavities, allowing the model to adjust to the complex shapes of the objects to be segmented. In addition, we proposed extensions to deal with topological changes that, at the same time, take the integrity of the net into account. We introduced a new link-cutting procedure that considers the image characteristics while checking the feasibility of such cuts throughout the net adjustment process. We also introduced a hole-opening method that operates according to the ratio between the external and internal energy of each node. Furthermore, we extended the local adjustment procedure by monitoring, and correcting when necessary, the position of badly placed nodes.

The work carried out on the ETAN model resulted in a paper accepted in a JCR journal, currently in its second review round:

N. Bova, O. Ibáñez, O. Cordón. Extended Topological Active Nets. Image and Vision Computing, 2013, in press. 2012 impact factor: 1.959. Category: Computer Science, Artificial Intelligence. Rank: 25/114. Technical report available at http://docs.softcomputing.es/public/afe/Tech_rep_2012-06.pdf.
Inclusion of the ETAN in a suitable global search framework. To solve the TAN optimization problems common to all previously published proposals, we embedded the proposed ETAN model in the global search framework known as SS. This memetic algorithm relies on solution combinations and controlled randomizations to avoid the need for large populations. In fact, in our system the population is composed of only six individuals, in contrast with the populations of hundreds or even a few thousand individuals that are common in previous evolutionary proposals. In particular, we developed a new diversification generation method able to focus both on the unexplored regions of the image and on the most promising ones. Moreover, a specific diversity function was defined to measure the diversity of the population.

In addition, we modified the ETAN energy function to better suit the global optimization procedure. Finally, to take full advantage of the presented global search framework, we developed two complementary solution combination operators, one genotypic and the other phenotypic, capable of properly merging nets.

The described ideas were presented in the following papers, published in a JCR journal, an international conference, and a national one:

N. Bova, O. Ibáñez, and O. Cordón. Topological active nets optimization: a scatter search-based approach. In VIII Congreso Español de Metaheurísticas, Algoritmos Evolutivos y Bioinspirados - MAEB 2012, Albacete, Spain, February 2012, pp. 503-510.

N. Bova, O. Ibáñez, and O. Cordón. Image segmentation using Extended Topological Active Nets optimized by Scatter Search. IEEE Computational Intelligence Magazine, 8(1):16-32, 2013. 2012 impact factor: 4.629. Category: Computer Science, Artificial Intelligence. Rank: 5/114.

N. Bova, O. Ibáñez, and O. Cordón. Local and Global optimization of Extended Topological Active Nets for Image Segmentation. In Proceedings of the International Conference on Medical Imaging using Bio-Inspired and Soft Computing - MIBISOC 2013, Brussels, Belgium, May 2013, pp. 123-130.
Study of the state of the art in the application of ML to DM design. The relevant literature on the application of ML techniques to the design of DMs was studied in depth, paying particular attention to the proposals that directly guide the DM adaptation. We classified the existing proposals, resulting in a taxonomy based on four categories: model initialization, energy term generation, image recognition and understanding, and direct DM adaptation. The latter family includes a very small set of proposals, leaving enough room for improvement to explore this research line.

Propose a generic ML-based framework for IS by means of DMs. We proposed a generic, automatic IS framework able to learn the segmentation model from examples. Our framework comprises four main components: the localizer, in charge of detecting the regions of interest in the image; the DM; the term set, that is, the image attributes or features considered; and the driver, a generic ML classification or regression method that operates in order to automatically adapt the DM. The main characteristic of the system is that the driver directly guides the adjustment of the DM contour to the object, instead of relying on an optimization procedure. A fifth component, the integration mechanism, determines how the other four are connected to each other. The presented framework has a high degree of flexibility and can be adapted to different IS problems.

Develop an implementation of the ML-based segmentation framework for its application to medical images. To demonstrate the feasibility of the ML-based IS framework we have proposed, we presented a specific implementation for medical IS. First, we selected an efficient localizer based on a part-based model. Likewise, we employed a fast level set, Shi's approximation, as the DM. In order to capture the variability of the image characteristics, we grouped a set of image-related and model-related attributes to define the term set. In addition, we chose random forest as the driver. This method is a fast, accurate, and flexible classifier based on an ensemble of decision trees. Finally, the integration mechanism was designed to take full advantage of the characteristics of the four components through an appropriate combination of them.

The work carried out during the last three objectives of this dissertation will result in a paper to be submitted to a medical imaging journal shortly.
Analyze the performance of the proposed methods. Throughout this PhD dissertation, three different experiments were carried out to evaluate the performance of the proposed methods. First, we validated the ETAN model on synthetic and real images, comparing against a set of state-of-the-art parametric and geometric DMs. The obtained results were encouraging, since ETANs outperformed the parametric competitors while being competitive with, or even outperforming, the geometric ones.

Subsequently, we compared our SS-based proposal for ETAN optimization, on a set of synthetic and medical images, against the algorithm considered the state of the art in TAN optimization (based on differential evolution), a multi-start local search based on ETANs, and a local search also based on ETANs. Our proposal provided promising results, as it outperformed the competitors on the employed image datasets.

Finally, the proposed system based on the direct adjustment of the DM by means of ML algorithms was tested on two medical image datasets in comparison with a large set of state-of-the-art segmentation algorithms. The obtained results were very encouraging, since our system proved competitive with the best algorithms on both datasets.

Future work

The different algorithms we presented in this dissertation have shown promising segmentation capabilities. However, despite their good performance, our methods still have room for improvement, since in some cases the obtained segmentations were not fully correct. Several research lines can be opened to improve our proposed IS systems. Next, we describe some possible extensions of those proposals that we plan to develop as future work.

Proposal of a 3D extension of the ETAN model. Many segmentation algorithms, particularly in the medical imaging field, deal with 3D or 4D structures. Regarding the TAN model, Barreira et al. already introduced topological active volumes in [13] to perform the segmentation of three-dimensional structures. Analogously, a possible future research line would be the extension of the ETAN model to 3D.

Inclusion of a texture-based external energy term for ETANs. As already discussed in this PhD dissertation (see, for example, Sec. 1.5), texture is one of the most important characteristics in the identification of objects or regions of interest in an image. To make better use of the region-based segmentation capabilities of ETANs, texture analysis could be incorporated into the model. Texture features could be considered either in the pre-segmentation phase or in the ETAN itself (through the analysis of the faces, that is, the areas delimited by four nodes), to further improve the segmentation process.

Definition of solution combination operators with multiple parents for SS-ETAN. The solution combination operators we presented in this dissertation (see Sec. 2.5.6) make use of two parents. However, the proposed operators could easily be extended to deal with the combination of three or more parents, as done in some genetic algorithms [64, 119, 212]. On the one hand, the extension of the phenotypic solution combination operator (Sec. 2.5.6.2) is quite straightforward. The only modification needed would be to derive the binary image for the set of nets considered and carry out a logical OR operation on that set. The rest of the operator would remain unchanged. On the other hand, the extension of the genotypic solution combination operator (Sec. 2.5.6.1) would require the definition of the corresponding mathematical formulation.

Proposal of a 3D extension of the ML-based IS framework. Since many IS problems involve the use of 3D images, a 3D extension of our ML-based segmentation framework would be of interest. Apart from the resulting significant increase in computational needs, each of the employed methods and attributes is easily extensible to operate on 3D images. Therefore, the main effort must be devoted to the considered ML methods, which must be able to handle the large data sets derived from 3D images, a current challenge in the area.

Use of different DMs for the implementation of the ML-based IS framework. As pointed out in Sec. 3.3, our framework is adaptable and independent of the choice of the specific DM. Among geometric DMs, the sparse field approximation of level sets (Sec. 1.1.2.3) is attractive due to its good balance between accuracy and computational load. The choice of this level set, however, suggests the use of a regressor to define the driver, instead of a classifier. In contrast, among parametric DMs, the introduced ETAN model would be the preferred option for its ability to handle topological changes while, at the same time, being endowed with region-based segmentation capabilities. The simplest control strategy to perform the contour adjustment would be to consider the pixel of each node and its 8 neighbors. Each of these 9 pixels could be assigned a class label. Operating in this way, the problem would be transformed into a 9-class classification problem. In this case, the constraints of the internal energy term of ETANs would be considered as additional attributes to be included in the term set.

Consideration of a different driver for the implementation of the ML-based IS framework. Random forest, the ML method we chose for the implementation of the ML-based IS framework, showed a promising performance. However, given the current state of development of the ML field, a vast number of alternatives can be considered. Among them, Rotation Forest [187] has some interesting characteristics. In fact, it employs a principal component analysis [110] to perform automatic feature extraction. This is a desirable property when dealing with problems with a large number of attributes, such as the one at hand. A different alternative could be one of the numerous flavors of the SVM classifier [25,34,51,56]. In this case, we could opt for a fast implementation to keep the processing time low. Finally, depending on the nature of the DM employed in the framework, the use of a regressor (instead of a classifier) could be a more appropriate choice, as mentioned in the previous item. In that case, ML methods of a different nature would be necessary.

Incorporation of new attributes for the implementation of the ML-based IS framework. To further improve the accuracy of the current implementation, the output provided by advanced edge detectors could be introduced into the term set. The use of the edge detection algorithm described in [143] could be a good choice for this purpose.

Use of a different localizer for the implementation of the ML-based IS framework. Besides testing different part-based localizers, localizers based on image registration could also be successfully introduced into the current implementation. In particular, the SS-based image registration algorithm proposed in [215] represents an attractive option, since it has been employed to perform IS in [151] with remarkable accuracy.

Application of the ML-based IS framework to different medical imaging problems. The presented ML-based IS method could also be applied to the large number of medical image segmentation problems found in the literature. Among them, the active and upcoming MICCAI grand challenges represent one of the best options for testing our method on open problems in medical IS.


Bibliography
[1] D. Adalsteinsson and J. A. Sethian. A fast level set method for propagating interfaces. Journal of Computational Physics, 118:269–277, 1994.
[2] S. Aja-Fernandez, G. Vegas-Sanchez-Ferrero, and M. Martin Fernandez. Soft thresholding for medical image segmentation. In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 4752–4755, 2010.
[3] K. M. Ali, M. J. Pazzani, and L. Saitta. Error reduction through learning multiple descriptions. In Machine Learning, pages 173–202, 1996.
[4] Allen Institute for Brain Science. Allen Reference Atlases, 2004–2006.
[5] E. Alpaydin. Introduction to Machine Learning. The MIT Press, 2nd edition, 2010.
[6] L. Alvarez, P. Lions, and J. Morel. Axioms and fundamental equations of image processing. Archive for Rational Mechanics and Analysis, 123:199–257, 1993.
[7] A. Amini, T. Weymouth, and R. Jain. Using dynamic programming for solving variational problems in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12:855–867, 1990.
[8] F. M. Ansia. Automatic 3D shape reconstruction of bones using active nets based segmentation. In Proceedings of the International Conference on Pattern Recognition - Volume 1, Washington, DC, USA, 2000. IEEE Computer Society.
[9] F. M. Ansia, M. G. Penedo, C. Mariño, and A. Mosquera. A new approach to active nets. Pattern Recognition and Image Analysis 2, pages 76–77, 1999.
[10] L. Ascari, C. Morandi, P. Carrai, and G. Adorni. Studio di un sistema di percezione della qualità per immagini riscalate contenenti trame. Master's thesis, Università degli Studi di Parma - Facoltà di Ingegneria, 1999.
[11] T. Back, D. B. Fogel, and Z. Michalewicz, editors. Handbook of Evolutionary Computation. IOP Publishing Ltd., Bristol, UK, 1st edition, 1997.
[12] D. H. Ballard and C. M. Brown. Computer Vision. 1982.
[13] N. Barreira and M. G. Penedo. Topological active volumes. EURASIP Journal on Advances in Signal Processing, 13(1):1937–1947, 2005.
[14] N. Barreira, M. G. Penedo, L. D. Cohen, and M. Ortega. Topological active volumes: A topology-adaptive deformable model for volume segmentation. Pattern Recognition, 43(1):255–266, 2010.
[15] J. W. Bartlett, L. A. van de Pol, C. T. Loy, R. I. Scahill, C. Frost, P. Thompson, and N. C. Fox. A meta-analysis of hippocampal atrophy rates in Alzheimer's disease. Neurobiology of Aging, pages 1711–1723. Elsevier, 2009.
[16] G. E. Batista, R. C. Prati, and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1):20–29, 2004.
[17] D. F. Bauer. Constructing confidence sets using rank statistics. Journal of the American Statistical Association, 67(339):687–690, 1972.
[18] S. Belongie, J. Malik, and J. Puzicha. Matching shapes. In Proceedings of the International Conference on Computer Vision, volume 1, pages 454–461, 2001.
[19] F. Bessy. Ofeli, an open, fast and efficient level set implementation, 2010.
[20] J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers, Norwell, MA, USA, 1981.
[21] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, USA, 1995.
[22] P. P. Bonissone. Soft computing: the convergence of emerging reasoning technologies. Soft Computing, 1(1):6–18, 1997.
[23] P. P. Bonissone, Y. Chen, K. Goebel, and P. Khedkar. Hybrid soft computing systems: Industrial and commercial applications. In Proceedings of the IEEE, Special Issue on Computational Intelligence, volume 87, pages 1641–1667, 1999.
[24] G. Borgefors. Distance transformations in arbitrary dimensions. Computer Vision, Graphics and Image Processing, 27:321–345, 1984.
[25] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92, pages 144–152, New York, NY, USA, 1992. ACM.
[26] A. Boulesteix, S. Janitza, J. Kruppa, and I. R. König. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. WIREs Data Mining and Knowledge Discovery, 2(6):493–507, 2012.
[27] L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
[28] M. Bro-Nielsen. Active nets and cubes. IMM Technical Report, 1994.
[29] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.
[30] J. S. Cardoso and L. Corte-Real. Toward a generic evaluation of image segmentation. IEEE Transactions on Image Processing, 14(11):1773–1782, 2005.
[31] V. Caselles. Geometric models for active contours. In Proceedings of the 1995 International Conference on Image Processing - Volume 3, ICIP '95, Washington, DC, USA, 1995. IEEE Computer Society.
[32] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. International Journal of Computer Vision, 22(1):61–79, 1997.
[33] T. Chan and L. Vese. Active contours without edges. IEEE Transactions on Image Processing, 10:266–277, 2001.
[34] C. C. Chang and C. J. Lin. LIBSVM: a library for support vector machines, 2006.
[35] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
[36] N. V. Chawla, N. Japkowicz, and A. Kotcz. Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, 6(1):1–6, 2004.
[37] V. Chinnadurai and G. Damayanti Chandrashekhar. Neuro-levelset system based segmentation in dynamic susceptibility contrast enhanced and diffusion weighted magnetic resonance images. Pattern Recognition, 45(9):3501–3511, 2012.
[38] C. A. Cocosco, V. Kollokian, R. K.-S. Kwan, G. B. Pike, and A. C. Evans. BrainWeb: Online interface to a 3D MRI simulated brain database. NeuroImage, 5:425, 1997.
[39] C. A. C. Coello, G. B. Lamont, and D. A. V. Veldhuizen. Evolutionary Algorithms for Solving Multi-Objective Problems (Genetic and Evolutionary Computation). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[40] I. Cohen, L. D. Cohen, and N. Ayache. Using deformable surfaces to segment 3-D images and infer differential structures. CVGIP: Image Understanding, 56(2):242–263, 1992.
[41] L. D. Cohen. On active contour models and balloons. CVGIP: Image Understanding, 53:211–218, 1991.
[42] L. D. Cohen and I. Cohen. Finite element methods for active contour models and balloons for 2D and 3D images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:1131–1147, 1991.
[43] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. In Proceedings of the European Conference on Computer Vision, volume 2, pages 484–498, 1998.
[44] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Comparing active shape models with active appearance models. In Proceedings of the British Machine Vision Conference, pages 173–182, 1999.
[45] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685, 2001.
[46] T. F. Cootes, A. Hill, C. J. Taylor, J. Haslam, and M. Manchester. The use of active shape models for locating structures in medical images, 1994.
[47] T. F. Cootes and C. J. Taylor. Statistical models of appearance for medical image analysis and computer vision. pages 236–248, 2001.
[48] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models - their training and application. Computer Vision and Image Understanding, 61(1):38–59, 1995.
[49] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models - their training and application. Computer Vision and Image Understanding, 61:38–59, 1995.
[50] O. Cordón, F. Herrera, F. Hoffmann, and L. Magdalena. Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases, volume 19 of Advances in Fuzzy Systems - Applications and Theory. World Scientific Publishing Co. Pte. Ltd., 1st edition, 2001.
[51] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[52] R. Courant and D. Hilbert. Methods of Mathematical Physics: Volume I. 1989.
[53] A. Criminisi, D. Robertson, O. Pauly, B. Glocker, E. Konukoglu, J. Shotton, D. Mateus, A. Martinez-Möller, S. Nekolla, and N. Navab. Anatomy detection and localization in 3D medical images. In Decision Forests for Computer Vision and Medical Image Analysis, pages 193–209. Springer, 2013.
[54] A. Criminisi, J. Shotton, D. Robertson, and E. Konukoglu. Regression forests for efficient anatomy detection and localization in CT studies. In MCV'10: Proceedings of the 2010 International MICCAI Conference on Medical Computer Vision: Recognition Techniques and Applications in Medical Imaging. Springer-Verlag, 2010.
[55] V. R. Cristerna and Y. O. Suarez. Active contours and surfaces with cubic splines for semiautomatic tracheal segmentation. SPIE Journal of Electronic Imaging, 12(1):81–96, 2003.
[56] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, New York, NY, USA, 2000.
[57] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1, pages 886–893, Washington, DC, USA, 2005. IEEE Computer Society.
[58] S. Damas, O. Cordón, and J. Santamaría. Medical image registration using evolutionary computation: An experimental survey. IEEE Computational Intelligence Magazine, 6(4):26–42, 2011.
[59] P.-E. Danielsson. Euclidean distance mapping. Computer Graphics and Image Processing, 14:227–248, 1980.
[60] S. Das and P. N. Suganthan. Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15:4–31, 2011.

[61] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2):5:1–5:60, 2008.
[62] J. G. Daugman. Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20(10):847–856, 1980.
[63] J. G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 2(7):1160–1169, 1985.
[64] K. Deb, A. Anand, and D. Joshi. A computationally efficient evolutionary algorithm for real-parameter evolution. Evolutionary Computation, 10, 2002.
[65] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
[66] J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.
[67] A. Eiben and J. Smith. Introduction to Evolutionary Computation. Springer, Berlin, 2003.
[68] A. El-Baz, R. U, M. Mirmehdi, and J. Suri. Multi Modality State-of-the-Art Medical Image Segmentation and Registration Methodologies, volume 1. Springer, 2011.
[69] B. S. Everitt, S. Landau, and M. Leese. Cluster Analysis. Wiley Publishing, 4th edition, 2009.
[70] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.
[71] M. Fenchel, S. Thesen, and A. Schilling. Automatic labeling of anatomical structures in MR FastView images using a statistical atlas. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2008, pages 576–584. Springer, 2008.
[72] J. M. Fitzpatrick and M. Sonka. Handbook of Medical Imaging, Volume 2: Medical Image Processing and Analysis (SPIE Press Monograph Vol. PM80). SPIE - The International Society for Optical Engineering, 1st edition, 2000.
[73] R. Fletcher. Practical Methods of Optimization. Wiley-Interscience, New York, NY, USA, 2nd edition, 1987.
[74] D. A. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, 2002.
[75] W. T. Freeman and M. Roth. Orientation histograms for hand gesture recognition. In International Workshop on Automatic Face and Gesture Recognition, pages 296–301, 1994.
[76] W. T. Freeman, K. Tanaka, J. Ohta, and K. Kyuma. Computer vision for computer games. In Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition (FG '96), pages 100–105, Washington, DC, USA, 1996. IEEE Computer Society.

[77] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory, EuroCOLT '95, pages 23–37, London, UK, 1995. Springer-Verlag.
[78] V. Gal, E. Kerre, and M. Nachtegael. Multi-modal organ detection with discriminatively trained part based models. In Proceedings of the IEEE International Symposium on Biomedical Imaging, 2013.
[79] S. Ghose, J. Mitra, A. Oliver, R. Martí, X. Lladó, J. Freixenet, J. C. Vilanova, J. Comet, D. Sidibé, and F. Meriaudeau. A supervised learning framework for automatic prostate segmentation in trans rectal ultrasound images. In Proceedings of the 14th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS'12, pages 190–200, Berlin, Heidelberg, 2012. Springer-Verlag.
[80] F. Glover. Heuristics for integer programming using surrogate constraints. Decision Sciences, 1977.
[81] F. Glover and G. A. Kochenberger, editors. Handbook of Metaheuristics. Kluwer Academic Publishers, 2003.
[82] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[83] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 1989.
[84] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2nd edition, 2001.
[85] M. Hall-Beyer. The GLCM tutorial home page.
[86] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2nd edition, 2006.
[87] L. K. Hansen and P. Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993–1001, 1990.
[88] R. M. Haralick. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786–804, 1979.
[89] R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, 3(6):610–621, 1973.
[90] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall, 1999.
[91] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263–1284, 2009.
[92] L. He, Z. Peng, B. Everding, X. Wang, C. Y. Han, K. L. Weiss, and W. G. Wee. A comparative study of deformable contour methods on medical image segmentation. Image and Vision Computing, 26(2):141–163, 2008.

[93] T. Heimann and H.-P. Meinzer. Statistical shape models for 3D medical image segmentation: a review. Medical Image Analysis, 13:543–563, 2009.
[94] F. Herrera, M. Lozano, and A. M. Sánchez. A taxonomy for the crossover operator for real-coded genetic algorithms: An experimental study. International Journal of Intelligent Systems, 18:309–338, 2003.
[95] J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, 1975.
[96] J. H. Holland. Adaptation in natural and artificial systems. Artificial Intelligence, 1989.
[97] H. Hoos and T. Stützle. Stochastic Local Search. Morgan Kaufmann, 2004.
[98] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81:3088–3092, 1984.
[99] C. Huang, B. Yan, H. Jiang, and D. Wang. MR image segmentation based on fuzzy c-means clustering and the level set method. In Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 01, FSKD '08, pages 67–71, Washington, DC, USA, 2008. IEEE Computer Society.
[100] D. P. Huttenlocher, G. A. Klanderman, and W. A. Rucklidge. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850–863, 1993.
[101] O. Ibáñez, N. Barreira, J. Santos, and M. G. Penedo. Topological active nets optimization using genetic algorithms. In Proceedings of the Third International Conference on Image Analysis and Recognition - Volume Part I, ICIAR'06, pages 272–282, Berlin, Heidelberg, 2006. Springer-Verlag.
[102] O. Ibáñez, N. Barreira, J. Santos, and M. G. Penedo. Genetic approaches for topological active nets optimization. Pattern Recognition, 42(5):907–917, 2009.
[103] O. Ibáñez. Topological active nets optimization by means of genetic algorithms. Master's thesis, VARPA research group, University of La Coruña, 2006.
[104] F. Idris and S. Panchanathan. Review of image and video indexing techniques. Journal of Visual Communication and Image Representation, 8(2):146–166, 1997.
[105] P. Jaccard. The distribution of the flora in the alpine zone. New Phytologist, 11(2):37–50, 1912.
[106] B. Jähne and H. Haußecker. Computer Vision and Applications. Academic Press, 2000.
[107] A. K. Jain, Y. Zhong, and M.-P. Dubuisson-Jolly. Deformable template models: a review. Signal Processing, 71(2):109–129, 1998.
[108] J.-S. R. Jang, C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing. Prentice-Hall, 1997.
[109] F. V. Jensen. Bayesian Networks and Decision Graphs. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2001.
[110] I. T. Jolliffe. Principal Component Analysis. Springer, 2nd edition, 2002.
[111] F. Jung, M. Kirschner, and S. Wesarg. A generic approach to organ detection using 3D Haar-like features. In Bildverarbeitung für die Medizin 2013, pages 320–325. Springer, 2013.
[112] D. Kainmüller, T. Lange, and H. Lamecker. Shape constrained automatic segmentation of the liver based on a heuristic intensity model. In Proceedings of the MICCAI Workshop 3D Segmentation in the Clinic: A Grand Challenge, pages 109–116, 2007.
[113] A. Kanakatte, J. Gubbi, B. Srinivasan, N. Mani, T. Kron, D. Binns, and M. Palaniswami. Pulmonary tumor volume delineation in PET images using deformable models. In Proceedings of the IEEE Conference of Engineering in Medicine and Biology Society, pages 3118–3121, 2008.
[114] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331, 1988.
[115] S. Kichenassamy, A. Kumar, P. Olver, A. Tannenbaum, and A. Yezzi. Conformal curvature flows: From phase transitions to active vision. Archive for Rational Mechanics and Analysis, 134:275–301, 1996.
[116] B. Kimia, A. Tannenbaum, and S. Zucker. Shapes, shocks, and deformations I: The components of two-dimensional shape and the reaction-diffusion space. International Journal of Computer Vision, 15:189–224, 1994.
[117] R. Kimmel, A. Amir, and A. M. Bruckstein. Finding shortest paths on surfaces using level sets propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:635–640, 1995.
[118] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.
[119] H. Kita, I. Ono, and S. Kobayashi. Multi-parental extension of the unimodal normal distribution crossover for real-coded genetic algorithms. In Proceedings of the International Conference on Evolutionary Computation, 1999.
[120] J. Kittler, M. Hatef, R. Duin, and J. Matas. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226–238, 1998.
[121] T. Kohonen. Self-organized formation of topologically correct feature maps. In Neurocomputing: Foundations of Research, pages 509–521. MIT Press, Cambridge, MA, USA, 1988.
[122] L. I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.
[123] M. Laguna and R. Martí. Scatter Search: Methodology and Implementations in C. Kluwer Academic Publishers, Norwell, MA, USA, 2002.
[124] L. Lam and C. Suen. Application of majority voting to pattern recognition: An analysis of its behavior and performance. IEEE Transactions on Systems, Man, and Cybernetics, 27:553–568, 1997.
[125] H.-C. Lan, T.-R. Chang, W.-C. Liao, Y.-N. Chung, and P.-C. Chung. Knee MR image segmentation combining contextual constrained neural network and level set evolution. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2009.
[126] S. Lankton and A. Tannenbaum. Localizing region-based active contours. IEEE Transactions on Image Processing, pages 2029–2039, 2008.
[127] T. H. Lee, M. F. A. Fauzi, and R. Komiya. Segmentation of CT brain images using K-means and EM clustering. In Proceedings of the 2008 Fifth International Conference on Computer Graphics, Imaging and Visualisation, pages 339–344, Washington, DC, USA, 2008. IEEE Computer Society.
[128] T. S. Lee. Image representation using 2D Gabor wavelets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18:959–971, 1996.
[129] B. Li and S. T. Acton. Active contour external force using vector field convolution for image segmentation. IEEE Transactions on Image Processing, 16(8):2096–2106, 2007.
[130] B. Li, J. Zhang, L. Tian, L. Tan, S. Xiang, and S. Ou. Intelligent recognition of lung nodule combining rule-based and C-SVM classifiers. International Journal of Computational Intelligence Systems, pages 76–92, 2012.
[131] J. Li and N. M. Allinson. A comprehensive review of current local features for computer vision. Neurocomputing, 71(10-12):1771–1787, 2008.
[132] S. Li, T. Fevens, A. Krzyzak, and S. Li. An automatic variational level set segmentation framework for computer aided dental X-rays analysis in clinical environments. Computerized Medical Imaging and Graphics, 30(2):65–74, 2006.
[133] Y. Lian and F. Wu. Integrating adaptive probabilistic neural network with level set methods for MR image segmentation. In Proceedings of the 2011 6th IEEE Conference on Industrial Electronics and Applications, pages 1746–1749, 2011.
[134] H. Ling, S. K. Zhou, Y. Zheng, B. Georgescu, M. Suehling, and D. Comaniciu. Hierarchical, learning-based automatic liver segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pages 1–8, 2008.
[135] Y. Liu and Y. Yu. Interactive image segmentation based on level sets of probabilities. IEEE Transactions on Visualization and Computer Graphics, 18(2):202–213, 2012.
[136] W.-Y. Loh. Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):14–23, 2011.
[137] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision - Volume 2, ICCV '99, pages 1150–1157, Washington, DC, USA, 1999. IEEE Computer Society.
[138] G. Luger. Artificial Intelligence: Structures and Strategies for Complex Problem Solving (5th Edition). Pearson Addison Wesley, 2004.
[139] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In L. M. Le Cam and J. Neyman, editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297. University of California Press, 1967.
[140] J. Malek, A. Sebri, S. Mabrouk, K. Torki, and R. Tourki. Automated breast cancer diagnosis based on GVF-snake segmentation, wavelet features extraction and fuzzy classification. Journal of Signal Processing Systems, 55(1-3):49–66, 2009.
[141] R. Malladi, J. Sethian, and B. Vemuri. Shape modeling with front propagation: A level set approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:158–175, 1995.
[142] J. A. Marshall. Unsupervised learning of contextual constraints in neural networks for simultaneous visual processing of multiple objects, 1992.
[143] D. R. Martin, C. C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5):530–549, 2004.
[144] S. Marčelja. Mathematical description of the responses of simple cortical cells. Journal of the Optical Society of America, 70(11):1297–1300, 1980.
[145] J. McCarthy. What is artificial intelligence?, 2007.
[146] T. McInerney and D. Terzopoulos. A dynamic finite element surface model for segmentation and tracking in multidimensional medical images with application to cardiac 4D image analysis. Computerized Medical Imaging and Graphics, pages 69–83, 1995.
[147] T. McInerney and D. Terzopoulos. Deformable models in medical image analysis: a survey. Medical Image Analysis, 1:91–108, 1996.
[148] T. McInerney and D. Terzopoulos. Topology adaptive deformable surfaces for medical image volume segmentation. IEEE Transactions on Medical Imaging, 18:840–850, 1999.
[149] P. Mesejo, R. Ugolotti, S. Cagnoni, F. Di Cunto, and M. Giacobini. Automatic segmentation of hippocampus in histological images of mouse brains using deformable models and random forest. In Proceedings of the Symposium on Computer-Based Medical Systems, CBMS '12, 2012.
[150] P. Mesejo, R. Ugolotti, F. Di Cunto, M. Giacobini, and S. Cagnoni. Automatic hippocampus localization in histological images using differential evolution-based deformable models. Pattern Recognition Letters, 34(3):299–307, 2013.
[151] P. Mesejo, A. Valsecchi, L. Marrakchi-Kacem, S. Cagnoni, and S. Damas. Biomedical image segmentation using geometric deformable models and metaheuristics. Computerized Medical Imaging and Graphics, 2013.

[152] I. Middleton and R. I. Damper. Segmentation of magnetic resonance images using a combination of neural networks and active contour models. Medical Engineering and Physics, 26(1):71–86, 2004.
[153] T. M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1st edition, 1997.
[154] J. Montagnat, H. Delingette, and N. Ayache. A review of deformable surfaces: topology, geometry and deformation. Image and Vision Computing, 19(14):1023–1040, 2001.
[155] D. Mumford and J. Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, 42(5):577–685, 1989.
[156] H. Müller, N. Michoux, D. Bandon, and A. Geissbuhler. A review of content-based image retrieval systems in medical applications - clinical benefits and future directions. International Journal of Medical Informatics, 73(1):1–23, 2003.
[157] N. J. Nilsson. Probabilistic logic. Artificial Intelligence, 28(1):71–87, 1986.
[158] N. J. Nilsson. Artificial Intelligence: A New Synthesis. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998.
[159] J. Novo, N. Barreira, M. G. Penedo, and J. Santos. Genetic approaches for the automatic division of topological active volumes. In Proceedings of the 3rd International Work-Conference on the Interplay Between Natural and Artificial Computation, Part II: Bioinspired Applications in Artificial and Natural Computation, IWINAC '09, pages 20–29, Berlin, Heidelberg, 2009. Springer-Verlag.
[160] J. Novo, N. Barreira, M. G. Penedo, and J. Santos. Topological active volume 3D segmentation model optimized with genetic approaches. Natural Computing, 2011.
[161] J. Novo, M. G. Penedo, and J. Santos. Optic disc segmentation by means of GA-optimized topological active nets. In Proceedings of the 5th International Conference on Image Analysis and Recognition, ICIAR '08, pages 807–816, Berlin, Heidelberg, 2008. Springer-Verlag.
[162] J. Novo, M. G. Penedo, and J. Santos. Localisation of the optic disc by means of GA-optimised topological active nets. Image and Vision Computing, 27(10):1572–1584, 2009.
[163] J. Novo, M. G. Penedo, and J. Santos. Evolutionary multiobjective optimization of topological active nets. Pattern Recognition Letters, 31(13):1781–1794, 2010.
[164] J. Novo, J. Santos, and M. G. Penedo. Differential evolution optimization of 3D topological active volumes. In Proceedings of the 11th International Conference on Artificial Neural Networks: Advances in Computational Intelligence - Volume Part I, IWANN'11, pages 282–290, Berlin, Heidelberg, 2011. Springer-Verlag.
[165] J. Novo, J. Santos, and M. G. Penedo. Multiobjective optimization of the 3D topological active volume segmentation model. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, ICAART'11, 2011.
[166] J. Novo, J. Santos, and M. G. Penedo. Optimization of topological active nets with differential evolution. In A. Dobnikar, U. Lotric, and B. Ster, editors, Adaptive and Natural Computing Algorithms, Part I, volume 6593 of Lecture Notes in Computer Science, pages 350–360, 2011. 10th International Conference on Adaptive and Natural Computing Algorithms, Ljubljana, Slovenia.
[167] J. Novo, J. Santos, and M. G. Penedo. Topological active models optimization with differential evolution. Expert Systems with Applications, 39(15):12165–12176, 2012.
[168] J. Novo, J. Santos, M. G. Penedo, and A. Fernandez. Optimization of topological active models with multiobjective evolutionary algorithms. In Proceedings of the 2010 20th International Conference on Pattern Recognition, ICPR '10, pages 2226–2229, Washington, DC, USA, 2010. IEEE Computer Society.
[169] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59, 1996.
[170] J. Olivier, R. Boné, J.-J. Rousselle, and H. Cardot. Active contours driven by supervised binary classifiers for texture segmentation. In Proceedings of the 4th International Symposium on Advances in Visual Computing, ISVC '08, pages 288–297, Berlin, Heidelberg, 2008. Springer-Verlag.
[171] Y.-S. Ong, M. H. Lim, and X. Chen. Research frontier: memetic computation - past, present & future. IEEE Computational Intelligence Magazine, 5(2):24–31, 2010.
[172] S. J. Osher and R. P. Fedkiw. Level Set Methods and Dynamic Implicit Surfaces. Springer, 2002.
[173] S. J. Osher and J. Sethian. Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics, 79(1):12–49, 1988.
[174] N. Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9(1):62–66, 1979.
[175] N. Paragios. A variational approach for the segmentation of the left ventricle in cardiac image analysis. International Journal of Computer Vision, 50(3):345–362, 2002.
[176] N. Paragios, O. Mellina-Gottardo, and V. Ramesh. Gradient vector flow fast geometric active contours. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(3):402–407, 2004.
[177] O. Pauly, B. Glocker, A. Criminisi, D. Mateus, A. Martinez-Möller, S. Nekolla, and N. Navab. Fast multiple organ detection and localization in whole-body MR Dixon sequences. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2011, pages 239–247. Springer, 2011.
[178] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[179] W. Pedrycz. Fuzzy control and fuzzy systems (2nd, extended ed.). Research Studies Press
Ltd., Taunton, UK, UK, 1993.
[180] M. E. Plissiti, D. I. Fotiadis, L. K. Michalis, and G. E. Bozios. An automated method
for lumen and media-adventitia border detection in a sequence of ivus frames. IEEE
Transactions on Information Technology and Biomedicine, 8(2):131141, 2004.
[181] D. Poole, A. Mackworth, and R. Goebel. Computational Intelligence: A Logical Approach.
Oxford University Press, 1998.
[182] C. S. Poon and M. Braun. Image segmentation by a deformable contour model incorporating region analysis. Physics in Medicine and Biology, 1997.
[183] K. V. Price, R. M. Storn, and J. A. Lampinen. Differential Evolution A Practical Approach
to Global Optimization. Natural Computing Series. Springer-Verlag, Berlin, Germany,
2005.
[184] N. Quang Long, D. Jiang, and C. Ding. Application of artificial neural networks in
automatic cartilage segmentation. In Proceedings of the Third International Workshop on
Advanced Computational Intelligence, 2010.
[185] T. Radulescu and V. Buzuloiu. Extended vector field convolution snake for highly nonconvex shapes segmentation. 2009 9th International Conference on Information Technology
and Applications in Biomedicine, (November):14, 2009.
[186] S. H. Rezatofighi, K. Khaksari, and H. Soltanian-Zadeh. Automatic recognition of five
types of white blood cells in peripheral blood. In Proceedings of the 7th international
conference on Image Analysis and Recognition - Volume Part II, ICIAR10, pages 161172,
Berlin, Heidelberg, 2010. Springer-Verlag.
[187] J. J. Rodriguez, L. I. Kuncheva, and C. J. Alonso. Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619
1630, 2006.
[188] R. Ronfard. Region-based strategies for active contour models. International Journal of
Computer Vision, 13(2):229251, 1994.
[189] M. Rousson, N. Paragios, and R. Deriche. Implicit active shape models for 3d segmentation in mr imaging. In C. Barillot, D. R. Haynor, and P. Hellier, editors, MICCAI (1),
volume 3216 of Lecture Notes in Computer Science, pages 209216. Springer, 2004.
[190] H. Ruppertshofen, C. Lorenz, S. Strunk, P. Beyerlein, Z. Salah, G. Rose, and
H. Schramm. Discriminative Generalized Hough Transform for Localization of Lower
Limbs. Journal of Computer ScienceResearch & Development, 2010.
[191] S. J. Russell, P. Norvig, J. F. Candy, J. M. Malik, and D. D. Edwards. Artificial intelligence:
a modern approach. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.
[192] K. Sakaue. Stereo matching by the combination of genetic algorithm and active net.
Systems and Computers in Japan, 27(1):22392246, 1996.

174

BIBLIOGRAPHY

[193] G. Sapiro and A. Tannenbaum. Affine invariant scale-space. International Journal of


Computer Vision, 11(1):2544, 1993.
[194] R. Seguier and N. Cladel. Genetic snakes: Application on lipreading. In International
Conference on Artificial Neural Networks and Genetic Algorithms, 2003.
[195] J. Sethian. Level set methods and fast marching methods: evolving interfaces in computational
geometry, fluid mechanics, computer vision, and materials science. Cambridge monographs
on applied and computational mathematics. Cambridge University Press, 1999.
[196] J. A. Sethian. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2nd edition, 1999.
[197] Y. Shang, A. Markova, R. Deklerck, E. Nyssen, X. Yang, and J. de Mey. Liver segmentation by an active contour model with embedded Gaussian mixture model based classifiers, 2010.
[198] L. Shapiro and G. Stockman. Computer vision. Prentice Hall, 2001.
[199] Y. Shi and W. C. Karl. A fast level set method without solving PDEs. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 97–100, 2005.
[200] Y. Shi and W. C. Karl. A real-time algorithm for the approximation of level-set-based curve evolution. IEEE Transactions on Image Processing, 17(5):645–656, 2008.
[201] J. Shiraishi, S. Katsuragawa, J. Ikezoe, T. Matsumoto, T. Kobayashi, K. Komatsu, M. Matsui, H. Fujita, Y. Kodera, and K. Doi. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. AJR: American Journal of Roentgenology, 174(1):71–74, 2000.
[202] K. Siddiqi, Y. B. Lauzière, A. Tannenbaum, and S. W. Zucker. Area and length minimizing flows for shape segmentation. In IEEE Computer Society Conference on Pattern Recognition and Image Processing, pages 621–627, 1997.
[203] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision.
Thomson-Engineering, 2007.
[204] M. B. Stegmann, B. K. Ersbøll, and R. Larsen. FAME – a flexible appearance modelling environment. IEEE Transactions on Medical Imaging, 22(10):1319–1331, 2003.
[205] Y. Sun, A. Wong, and M. Kamel. Classification of imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence, 23(4):687–719, 2009.
[206] M. M. Swathanthira Kumar and J. M. Sullivan, Jr. Automatic brain cropping enhancement using active contours initialized by a PCNN, 2009.
[207] B. Tanoori, Z. Azimifar, A. Shakibafar, and S. Katebi. Brain volumetry: An active contour model-based segmentation followed by SVM-based classification. Computers in Biology and Medicine, 41(8):619–632, 2011.
[208] R. Tedín, J. A. Becerra, and R. J. Duro. Using classifiers as heuristics to describe local structure in active shape models with small training sets. Pattern Recognition Letters, 2013.
[209] H. Tek and B. Kimia. Volumetric segmentation of medical images by three-dimensional
bubbles. In Proceedings of the Workshop on Physics-Based Modeling in Computer Vision,
1995.
[210] D. Terzopoulos and K. Fleischer. Deformable models. The Visual Computer, 4:306–331, 1988.
[211] K. Tsumiyama and K. Yamamoto. Active net: Active net model for region extraction.
Information Processing Society of Japan SIG notes 89(96), 1989.
[212] S. Tsutsui, M. Yamamura, and T. Higuchi. Multi-parent recombination with simplex
crossover in real-coded genetic algorithms. In Genetic and Evolutionary Computation
Conference, 1999.
[213] M. Tuceryan and A. K. Jain. Texture analysis. The Handbook of Pattern Recognition and Computer Vision (2nd Edition), 1998.
[214] R. Valdés-Cristerna, V. Medina-Bañuelos, and O. Yáñez-Suárez. Coupling of radial-basis network and active contour model for multispectral brain MRI segmentation. IEEE Transactions on Biomedical Engineering, 51(3):459–470, 2004.
[215] A. Valsecchi, S. Damas, J. Santamaría, and L. Marrakchi-Kacem. Intensity-based image registration using scatter search. Technical Report AFE 2012-14, 2012. Submitted to Artificial Intelligence in Medicine.
[216] B. van Ginneken, M. Stegmann, and M. Loog. Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database. Medical Image Analysis, 10(1):19–40, 2006.
[217] L. Vese and T. Chan. A multiphase level set framework for image segmentation using the Mumford and Shah model. International Journal of Computer Vision, 50(3):271–293, 2002.
[218] P. Viola and M. J. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), volume 1, pages I-511, 2001.
[219] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.
[220] L. Wang and D.-C. He. Texture classification using texture spectrum. Pattern Recognition, 23(8):905–910, 1990.
[221] W. X. Wang and P. Y. Su. Blood cell image segmentation on color and GVF snake for leukocyte classification on SVM. Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 20(12):2781–2790, 2012.
[222] X. Wang, T. X. Han, and S. Yan. An HOG-LBP human detector with partial occlusion handling. In 2009 IEEE 12th International Conference on Computer Vision, pages 32–39, 2009.
[223] Y. Wang, Z. Liu, and J.-C. Huang. Multimedia content analysis – using both audio and visual clues. IEEE Signal Processing Magazine, 17(6):12–36, 2000.
[224] G. M. Weiss. Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, 6(1):7–19, 2004.
[225] R. T. Whitaker. Volumetric deformable models: active blobs. In Visualization in Biomedical Computing, 1994.
[226] R. T. Whitaker. A level-set approach to 3D reconstruction from range data. International Journal of Computer Vision, 29(3):203–231, 1998.
[227] A. W. Whitney. A direct method of nonparametric measurement selection. IEEE Transactions on Computers, 20(9):1100–1103, 1971.
[228] A. Wimmer, J. Hornegger, and G. Soza. Implicit active shape model employing boundary classifier. In Proceedings of the International Conference on Pattern Recognition, pages 1–4. IEEE, 2008.
[229] A. Wimmer, G. Soza, and J. Hornegger. A generic probabilistic active shape model for organ segmentation. In Proceedings of the 12th International Conference on Medical Image Computing and Computer-Assisted Intervention: Part II, MICCAI '09, pages 26–33, Berlin, Heidelberg, 2009. Springer-Verlag.
[230] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques.
Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, 2nd edition, 2005.
[231] C. Xu and J. L. Prince. Snakes, shapes, and gradient vector flow. IEEE Transactions on Image Processing, 7(3):359–369, 1998.
[232] N. Yabuki, Y. Matsuda, Y. Fukui, and S. Miki. Region detection using color similarity. In International Symposium on Circuits and Systems (ISCAS 1999), Orlando, Florida, USA, pages 98–101. IEEE, 1999.
[233] O. Yanez-Suarez, R. Valdes-Cristerna, V. Medina, and F. A. Barrios. RBF network with cylindrical coordinate features for multispectral MRI segmentation, 2001.
[234] A. Yezzi, S. Kichenassamy, A. Kumar, P. Olver, and A. Tannenbaum. A geometric snake model for segmentation of medical imagery. IEEE Transactions on Medical Imaging, pages 199–209, 1997.
[235] L. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.
[236] L. Zadeh. Soft computing and fuzzy logic. IEEE Software, 11(6):48–56, 1994.
[237] L. Zadeh. What is soft computing? Soft Computing, 1(1):1, 1997.
[238] C. Zhao and T. Zhuang. A hybrid boundary detection algorithm based on watershed and snake. Pattern Recognition Letters, 26:1256–1265, 2005.
[239] F. Zhao. Congenital Aortic Disease: 4D Magnetic Resonance Segmentation and Quantitative
Analysis. University of Iowa, 2007.
[240] Y. Zheng, A. Barbu, B. Georgescu, M. Scheuering, and D. Comaniciu. Four-chamber heart modeling and automatic segmentation for 3-D cardiac CT volumes using marginal space learning and steerable features. IEEE Transactions on Medical Imaging, 27(11):1668–1681, 2008.
[241] S. K. Zhou, J. Zhou, and D. Comaniciu. A boosting regression approach to medical anatomy detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pages 1–8, 2007.
[242] Y. Zhu and H. Yan. Computerized tumor boundary detection using a Hopfield neural network. IEEE Transactions on Medical Imaging, 16(1):55–67, 1997.
[243] A. P. Zijdenbos, B. M. Dawant, R. A. Margolin, and A. C. Palmer. Morphometric analysis of white matter lesions in MR images: method and validation. IEEE Transactions on Medical Imaging, 13(4):716–724, 1994.
[244] B. Zitová and J. Flusser. Image registration methods: a survey. Image and Vision Computing, 21:977–1000, 2003.

Abbreviations
AAM Active Appearance Model. 35
ABA Allen Brain Atlas. 140
ACM Active Contour Model. 34
AI Artificial Intelligence. 13
ASD Allowable Shape Domain. 35
BG background. 124
BILS Best Improvement Local Search. 30
CAD Computer-Aided Diagnosis. 120
CAT Computed Axial Tomography. 15
CSAC Cubic Spline Active Contour. 117
CV Chan-Vese. 43
DE Differential Evolution. 65
DI Dataset of Images. 123
DM Deformable Model. 15
EA Evolutionary Algorithm. 13
EBILS Extended Best Improvement Local Search. 66
EC Evolutionary Computation. 44
ETAN Extended Topological Active Net. 65
EVFC Extended Vector Field Convolution. 74
FG foreground. 124
FL Fuzzy Logic. 44
FN False Negative. 137
FP False Positive. 137
FTC Fast Two Cycle. 40
GAC Geodesic Active Contour. 43
GD Gradient Distance. 30
GLCM Gray Level Co-occurrence Matrix. 58
GM Guide Model. 125
GT Ground Truth. 123
GVF Gradient Vector Flow. 25
HOG Histogram of Oriented Gradients. 62
IO In-Out. 29
IOD In-Out Distance. 69
IS Image Segmentation. 14
IVLD Image Vector-Label Dataset. 124
IVUS Intra-Vascular Ultra-Sound. 121
JSRT Japanese Society of Radiological Technology. 137
KNN K-Nearest-Neighbors. 123
LBP Local Binary Pattern. 61
LS Level Set. 16
MA Memetic Algorithm. 65
ML Machine Learning. 13
MLP Multi-Layer Perceptron. 118
MRI Magnetic Resonance Imaging. 15
MSLS Multi Start Local Search. 93
NN Neural Network. 44
PC Pixel classification. 138
PCA Principal Component Analysis. 35
PDE Partial Differential Equation. 40
PDF Probability Distribution Function. 99
PDM Point Distribution Model. 34
PR Probabilistic Reasoning. 44
RBF Radial Basis Function. 118
RBNNcc Radial Basis Neural Network enhanced with Cylindrical Coordinates. 117
RefSet Reference Set. 48
RF Random Forest. 54
ROI Region Of Interest. 15
SA Simulated Annealing. 121
SC Soft Computing. 13
SCO Solution Combination Operator. 100
SCR Segmentation in Chest Radiographs. 137
SIFT Scale-Invariant Feature Transform. 62
SOM Self-Organizing Map. 116
SS Scatter Search. 19
ST Soft Thresholding. 143
SVM Support Vector Machine. 53
TAN Topological Active Net. 16
TAV Topological Active Volume. 33
TN True Negative. 137
TP True Positive. 137
VD Visual Descriptor. 55
VFC Vector Field Convolution. 26
