
Unit III Robotics

Introduction to image geometry and coordinate systems:


Mathematical models are often used to describe images and other
signals. Such a model is a function of one, two or three variables with
physical meaning. A scalar function may be sufficient to describe a
monochromatic image, while vector functions are used to represent
colour images consisting of three component colours.
Functions used may be categorized as continuous, discrete or digital.
A continuous function has a continuous domain and range; if the
domain set is discrete, we get a discrete function, and if the range set
is also discrete, we have a digital function.
Image functions:
By image we understand the usual intuitive meaning: the image on
the human retina, or the image captured by a TV camera. The image
can be modeled by a continuous function of two or three variables; in
the simplest case the arguments are coordinates (x, y) in a plane, while
if the image changes in time a third variable t may be added.
The image function values correspond to the brightness at image
points. The function value can express other quantities as well
(for example distance from the observer). Brightness integrates different
optical quantities; using brightness as a basic quantity allows us to avoid
describing the very complicated process of image formation.
The image on the retina or on a TV camera sensor is intrinsically two
dimensional (2D). Such 2D images bearing brightness information
are called intensity images.

The only information available in an intensity image is the
brightness of the appropriate pixel, which depends on a number of
independent factors such as object surface reflectance properties
(given by the surface material, microstructure and marking),
illumination properties, and object surface orientation with respect to
the viewer and light source. It is very difficult to separate these
components when trying to recover the 3D geometry of an object
from the intensity image. 2D images are used irrespective of the
original object being 2D or 3D.
The disciplines describing the image formation process are photometry
(brightness measurement) and colorimetry (light reflectance or emission
as a function of wavelength).
Image processing quite often deals with static images, in which time t
is constant. A monochromatic static image is represented by a
continuous image function f(x, y) whose arguments are two
coordinates in the plane.
The real world which surrounds us is intrinsically three dimensional.
The 2D intensity image is the result of a perspective projection of the
3D scene, which is modeled by a pinhole camera, as illustrated in the
previous slide. In this figure the point P has coordinates (x, y, z), f is
the focal length, and the coordinates of the projected point (x', y') are

    x' = xf / z,    y' = yf / z

Orthographic projection is a limiting case of perspective projection.
If f is made infinite, the result is an orthographic projection. To recover
the depth of points from a 2D image we need a representation that is
independent of viewpoint and expressed in the coordinate system
of the object rather than of the viewer. If such a representation can
be recovered, then any intensity image view of the object may be
synthesized by standard computer graphics techniques.
Recovering the information lost by the perspective projection is
only one, mainly geometric, problem of computer vision; a
second problem is understanding image brightness.
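As an illustration of the perspective projection equations above, the following Python sketch projects a 3D point onto the image plane of an idealized pinhole camera; the point coordinates and focal length are hypothetical values chosen for the example.

```python
import numpy as np

def perspective_project(point_3d, f):
    """Project a 3D point (x, y, z) onto the image plane of a
    pinhole camera with focal length f: x' = x*f/z, y' = y*f/z."""
    x, y, z = point_3d
    if z == 0:
        raise ValueError("point lies in the plane of the pinhole (z = 0)")
    return np.array([x * f / z, y * f / z])

# Example: a point 2 m in front of a camera with f = 0.05 m (hypothetical values)
P = (0.4, 0.1, 2.0)
print(perspective_project(P, f=0.05))   # -> [0.01  0.0025]
```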
Computerized image processing uses digital image functions which
are usually represented by matrices, so the coordinates are integers.
The domain of the image is a region R in the plane

    R = {(x, y), 1 ≤ x ≤ x_m, 1 ≤ y ≤ y_n}

where x_m and y_n represent the maximal image coordinates. The customary
orientation of axes in an image follows the normal Cartesian fashion
(horizontal x axis, vertical y axis), although the (row, column) orientation
used in matrices is also often used in digital image processing.
The range of the image function is also limited; by convention, in
monochromatic images the lowest value corresponds to black and the
highest to white. Brightness values bounded by these limits are gray-
levels. The quality of a digital image grows in proportion to the spatial,
spectral, radiometric, and time resolutions. The spatial resolution is given
by the proximity of image samples in the image plane; spectral resolution
is given by the bandwidth of the light frequencies captured by the sensor;
radiometric resolution corresponds to the number of distinguishable gray
levels; and the time resolution is given by the interval between time samples
at which images are captured.

The discretization of images from a continuous domain can be achieved
by convolutions with Dirac functions.
The Fourier transform can be a useful way of decomposing image data.
Images are statistical in nature, and it can be natural to represent them as
a stochastic process.

Image digitization:
An image to be processed by computer must be represented using an
appropriate discrete data structure, for example a matrix. An image
captured by a sensor is expressed as a continuous function f(x, y) of
two coordinates in the plane. Image digitization means that the
function f(x, y) is sampled into a matrix with M rows and N columns.
Image quantization assigns to each continuous sample an integer value:
the continuous range of the image f(x, y) is split into K intervals. The
finer the sampling (i.e. the larger M and N) and quantization (the larger
K), the better the approximation of the continuous image function
f(x, y) achieved. Two important points in sampling are determining the
sampling period, the distance between two neighbouring sampling
points in the image, and the geometric arrangement of sampling points
(the sampling grid).

Sampling:
A continuous image f(x, y) can be sampled using a discrete grid of
sampling points in the plane. (A second possibility is to expand the
image function using some orthonormal functions as a basis; the
Fourier transform is an example. The coefficients of this expansion
then represent the digitized image.)
The image is sampled at points x = jΔx, y = kΔy, for j = 1, ..., M,
k = 1, ..., N. Two neighbouring sampling points are separated by a
distance Δx along the x axis and Δy along the y axis. The distances Δx
and Δy are called the sampling intervals (on the x and y axes), and the
matrix of samples f(jΔx, kΔy) constitutes the discrete image. The
ideal sampling s(x, y) in the regular grid can be represented using
a collection of Dirac distributions δ:


    s(x, y) = Σ_{j=1}^{M} Σ_{k=1}^{N} δ(x - jΔx, y - kΔy)                 (1)

The sampled image f_s(x, y) is the product of the continuous image
f(x, y) and the sampling function s(x, y):

    f_s(x, y) = f(x, y) s(x, y) = f(x, y) Σ_{j=1}^{M} Σ_{k=1}^{N} δ(x - jΔx, y - kΔy)                 (2)
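In practice, multiplying by the comb of Dirac distributions in equation (2) amounts to evaluating the continuous function at the grid points jΔx, kΔy. A minimal Python sketch of this sampling step is given below; the analytic image function used here is an arbitrary example, not taken from the text.

```python
import numpy as np

def sample_image(f, M, N, dx, dy):
    """Sample a continuous image function f(x, y) on a regular
    M x N grid with sampling intervals dx and dy (equation 2)."""
    j = np.arange(1, M + 1)
    k = np.arange(1, N + 1)
    X, Y = np.meshgrid(j * dx, k * dy, indexing="ij")
    return f(X, Y)            # matrix of samples f(j*dx, k*dy)

# Example continuous image: a smooth 2D cosine pattern (hypothetical)
f = lambda x, y: 0.5 * (1 + np.cos(2 * np.pi * (x + y)))
fs = sample_image(f, M=256, N=256, dx=0.01, dy=0.01)
print(fs.shape)               # (256, 256) discrete image
```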

A continuous image is digitized at sampling points. These sampling
points are ordered in the plane, and their geometric relation is called
the grid. The digital image is then a data structure, usually a matrix.
Grids used in practice are usually square or hexagonal (see next slide).
It is important to distinguish the grid from the raster; the raster is
the grid on which a neighbourhood relation between points is
defined (more on this later). One infinitely small sampling point in the grid
corresponds to one picture element (pixel) in the digital image. The
set of pixels together covers the entire image. A pixel is not further
divisible and is often referred to as a point.

Quantization:
A value of the sampled image f_s(jΔx, kΔy) is expressed as a digital value
in image processing. The transition between continuous values
of the image function (brightness) and its digital equivalent is called
quantization. The number of quantization levels should be high enough
to permit human perception of fine shading details in the image.
Most digital image processing devices use quantization into k equal
intervals. If b bits are used to express the value of pixel brightness, then
the number of brightness levels is k = 2^b. Eight bits per pixel are
commonly used, although some systems use six or four bits. A binary
image, in which each pixel is either black or white, can be represented
by one bit. Specialized measuring devices use 12 or more bits per pixel,
although these are becoming more common.
The occurrence of false contours is the main problem in images which
have been quantized with insufficient brightness levels, fewer than
humans can easily distinguish. This number depends on many
factors, such as the average local brightness; to avoid this effect,
normally about 100 intensity levels are provided.
This problem can be reduced when quantization into intervals of unequal
length is used; the size of the intervals corresponding to less probable
brightnesses in the image is enlarged. These are called gray-scale
transformation techniques.
An efficient representation of brightness values in digital images
requires that eight bits, four bits, or one bit be used per pixel, meaning
that one, two or eight pixel brightnesses can be stored in one byte of
computer memory. The figure in the following slide demonstrates the effect of
reducing the number of brightness levels in an image.
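A minimal sketch of uniform quantization into k = 2^b equal intervals follows; the input array and the bit depth chosen here are illustrative only.

```python
import numpy as np

def quantize(fs, b):
    """Quantize a sampled image with values in [0, 1] into k = 2**b
    equal intervals, returning integer gray levels 0 .. k-1."""
    k = 2 ** b
    levels = np.floor(fs * k).astype(int)
    return np.clip(levels, 0, k - 1)      # handle the boundary value fs == 1.0

# Example: reduce a hypothetical image in [0, 1] to 4 bits (16 gray levels)
fs = np.random.rand(256, 256)             # stand-in for a sampled image
g = quantize(fs, b=4)
print(g.min(), g.max())                    # 0 .. 15
```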
Digital image properties:
A digital image has several properties, both metric and topological,
which are somewhat different from those of the continuous two dimensional
functions with which we are familiar from basic calculus. Another
difference is human perception of images, since judgment of
image quality is also important.
Metric and topological properties:
A digital image consists of picture elements of finite size; these pixels
carry information about the brightness of a particular location in the
image. Usually pixels are arranged in a rectangular sampling grid. Such a
digital image is represented by a two dimensional matrix whose elements
are integer numbers corresponding to the quantization levels in the
brightness scale.
The Euclidean distance is

    D_E((i, j), (h, k)) = sqrt((i - h)^2 + (j - k)^2)

The advantage of the Euclidean distance is that it is intuitively obvious; the
disadvantages are the costly calculation due to the square root, and its
non-integer value.
The distance between two points can also be expressed as the
minimum number of elementary steps in the digital grid which are
needed to move from the starting point to the end point. If only
horizontal and vertical steps are allowed, the distance is the
'city block' distance D_4:

    D_4((i, j), (h, k)) = |i - h| + |j - k|

If moves in diagonal directions are also allowed in the digitization grid,
the distance is called the 'chessboard' distance D_8:

    D_8((i, j), (h, k)) = max(|i - h|, |j - k|)

Any of these metrics may be used as the basis of chamfering, in which
the distance of pixels from some large subset (perhaps describing
some feature) is generated. The resulting image has pixel values of 0
for elements of the relevant subset, low values for close pixels, and
then high values for pixels remote from it; the appearance of this array
gives the technique its name.
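The three metrics can be compared directly in code; the following Python sketch computes D_E, D_4 and D_8 between two pixel coordinates (the coordinates themselves are arbitrary examples).

```python
import math

def d_euclid(p, q):
    """Euclidean distance D_E between pixels p = (i, j) and q = (h, k)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d4(p, q):
    """City block distance D_4: horizontal and vertical steps only."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chessboard distance D_8: diagonal steps also allowed."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (1, 2), (4, 6)
print(d_euclid(p, q), d4(p, q), d8(p, q))   # 5.0, 7, 4
```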
Pixel adjacency is another important concept in digital images. Any
two pixels are called 4-neighbours if they have D_4 = 1 from each
other. Analogously, 8-neighbours are two pixels with D_8 = 1.
4-neighbours and 8-neighbours are illustrated below.
If there is a path between two pixels in the image, these pixels are
called contiguous. A region in the image is a contiguous set. The part
of the complement set R^C which is contiguous with the image limits is
called the background, and the rest of the complement R^C is called holes.
If there are no holes in a region we call it a simply contiguous region.
A region with holes is called multiply contiguous. Some regions in
the image are called objects.
The brightness of a pixel is a very simple property which can be
used to find objects in some images. If a point is darker than some
fixed value (threshold), then it belongs to the object. All such points
which are also contiguous constitute one object. A hole consists of
points which do not belong to the object but are surrounded by it,
and all other points constitute the background. An example is the
blue typed text on this slide, in which the letters are objects, the white
areas surrounded by the letters are holes (e.g. inside the letter 'o'),
and the other white parts of the slide are background.
The border of a region is another important concept in image
analysis. The border of a region R is the set of pixels within the
region that have one or more neighbours outside R. The definition
corresponds to an intuitive understanding of the border as the set of
points at the limit of the region. This definition of border is
sometimes referred to as the inner border, to distinguish it from the
outer border, that is, the border of the background (the complement)
of the region.

An edge is a further concept. This is a local property of a pixel and
its immediate neighbourhood; it is a vector given by a magnitude
and direction. Images with many brightness levels are used for
edge computation, and the gradient of the image function is used to
compute edges. The edge direction is perpendicular to the gradient
direction, which points in the direction of image function growth.
Note that there is a difference between border and edge: the border
is a global concept related to a region, while an edge expresses local
properties of an image function.
The border and edges are related as well; one possibility for
finding boundaries is chaining the significant edges (points with a
high gradient of the image function).
The edge property is attached to one pixel and its neighbourhood;
sometimes it is advantageous to assess properties between pairs of
neighbouring pixels, and the concept of the crack edge comes from
this idea. Four crack edges are attached to each pixel, defined by its
relation to its 4-neighbours. The direction of a crack edge is that of
increasing brightness and is a multiple of 90°, while its magnitude is
the absolute difference between the brightness of the relevant pair of
pixels. Crack edges are illustrated below.
Images have topological properties that are invariant to rubber sheet
transformations.
The convex hull of a region may be described as the minimal
convex subset containing it.
The brightness histogram is a global descriptor of image intensity.
Understanding human visual perception is essential for the design of
image displays.
Human visual perception is sensitive to contrast, acuity, border
perception and colour; each of these may provoke visual paradoxes.
Live images are prone to noise. It may be possible to measure its
extent quantitatively.
White, Gaussian, impulsive, and salt and pepper are common
models.
Signal to noise ratio is a measure of image quality.
Colour images:
Colour is a property of enormous importance to human visual
perception, but historically it has not been particularly used in
digital image processing. The reason for this has been the cost
of suitable hardware, but since the 1980s this has declined
sharply, and colour images are now routinely accessible via TV
cameras or scanners. The large memory requirement has also
been addressed by the reduction in the cost of memory. Colour
display is of course the default in most computer systems. Colour is
useful because monochromatic images may not contain enough
information for many applications, while colour or multispectral
images can often help.
Colour is connected with the ability of objects to reflect electromagnetic
waves of different wavelengths; the chromatic spectrum spans the
electromagnetic spectrum from approximately 400 nm to 700 nm.
The RGB model:
A particular pixel may have associated with it a three dimensional
vector (r, g, b) which provides the respective colour intensities,
where (0, 0, 0) is black and (k, k, k) is white, (k, 0, 0) is pure red,
and so on; k here is the quantization granularity for each primary
(256 is common). This implies a colour space of k^3 distinct
colours (2^24 if k = 256), which not all displays, particularly older
ones, can accommodate. For this reason, for display purposes it is
common to specify a subset of this space that is actually used; this
subset is often called a palette.
The R, G, B values may be thought of as a 3D coordinatization of colour
space (fig. in previous slide); note the secondary colours, which are
combinations of two pure primaries.
Most image sensors provide data according to this model; the image
can be captured by several sensors, each of which is sensitive to a
rather narrow band of wavelengths, and the image function at the
sensor output is just as in the simple case. Each spectral band is
digitized independently and represented by an individual digital
image function as if it were a monochromatic image.
Other colour models turn out to be equally important. CMY (cyan,
magenta, and yellow) is based on the secondaries and is
used to construct a subtractive colour scheme.
The YIQ model is useful in colour TV broadcasting,
and is a simple linear transform of an RGB representation.
The alternative model of most relevance to image processing is HSI
(or IHS): Hue, Saturation, and Intensity. Hue refers to the perceived
colour (technically the dominant wavelength), for example purple or
orange, and saturation measures its dilution by white light, giving
rise to light purple or dark purple.
HSI decouples the intensity information from the colour, while hue and
saturation correspond to human perception, thus making this
representation very useful for developing image processing algorithms.
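The text does not give explicit conversion formulas, but one common formulation of the RGB to HSI conversion is sketched below in Python (values assumed normalized to [0, 1]); treat it as an illustrative version rather than the definitive one.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB values in [0, 1] to (hue, saturation, intensity).
    Hue is returned in degrees; one common arccos-based formulation."""
    intensity = (r + g + b) / 3.0
    s = 0.0 if intensity == 0 else 1.0 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0                      # achromatic pixel: hue undefined, set to 0
    else:
        h = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        if b > g:
            h = 360.0 - h
    return h, s, intensity

print(rgb_to_hsi(0.6, 0.2, 0.2))     # a desaturated red: hue near 0 degrees
```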

Morphology:
Morphological operators:
Contents
Dilation - grow image regions

Erosion - shrink image regions

Opening - structured removal of image region boundary pixels

Closing - structured filling in of image region boundary pixels

Hit and Miss Transform- image pattern matching and marking

Thinning - structured erosion using image pattern matching

Thickening - structured dilation using image pattern matching

Skeletonization / Medial Axis Transform- finding skeletons of
binary regions
Dilation
Common Names: Dilate, Grow, Expand

Brief Description:
Dilation is one of the two basic operators in the area of mathematical
morphology, the other being erosion.

It is typically applied to binary images, but there are versions that work
on grayscale images. The basic effect of the operator on a binary image
is to gradually enlarge the boundaries of regions of foreground pixels
(i.e. white pixels, typically). Thus areas of foreground pixels grow in size
while holes within those regions become smaller.
How It Works
Useful background to this description is given in the mathematical
morphology section of the Glossary.
The dilation operator takes two pieces of data as inputs.
The first is the image which is to be dilated. The second is a
(usually small) set of coordinate points known as a structuring element
(also known as a kernel). It is this structuring element that determines the
precise effect of the dilation on the input image.
The mathematical definition of dilation for binary images is as follows:
Suppose that X is the set of Euclidean coordinates corresponding to the
input binary image, and that K is the set of coordinates for the structuring
element.
Let Kx denote the translation of K so that its origin is at x.
Then the dilation of X by K is simply the set of all points
x such that the intersection of Kx with X is non-empty.
The mathematical definition of grayscale dilation is identical except
for the way in which the set of coordinates associated with the input
image is derived. In addition, these coordinates are 3-D rather than 2-D.
As an example of binary dilation, suppose that the structuring element
is a 3×3 square, with the origin at its center, as shown in Figure 1. Note
that in this and subsequent diagrams, foreground pixels are represented
by 1's and background pixels by 0's.





Figure 1 A 3×3 square structuring element
To compute the dilation of a binary input image by this structuring
element, we consider each of the background pixels in the input image
in turn. For each background pixel (which we will call the input pixel)
we superimpose the structuring element on top of the input image so
that the origin of the structuring element coincides with the input pixel
position. If at least one pixel in the structuring element coincides with
a foreground pixel in the image underneath, then the input pixel is set
to the foreground value. If all the corresponding pixels in the image
are background, however, the input pixel is left at the background value.
For our example 3×3 structuring element, the effect of this operation is
to set to the foreground color any background pixels that have a
neighboring foreground pixel (assuming 8-connectedness). Such pixels
must lie at the edges of white regions, and so the practical upshot is that
foreground regions grow (and holes inside a region shrink). Dilation is
the dual of erosion i.e. dilating foreground pixels is equivalent to eroding
the background pixels.
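The procedure just described translates directly into code. The short NumPy sketch below is a straightforward (not optimized) implementation of binary dilation with an arbitrary structuring element whose origin is assumed to be at its centre; the test image is a made-up example.

```python
import numpy as np

def binary_dilate(image, selem):
    """Binary dilation: a background pixel becomes foreground if the
    structuring element, centred on it, overlaps any foreground pixel."""
    img = image.astype(bool)
    se = selem.astype(bool)
    out = img.copy()
    oy, ox = se.shape[0] // 2, se.shape[1] // 2            # origin at centre
    padded = np.pad(img, ((oy, oy), (ox, ox)), constant_values=False)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + se.shape[0], c:c + se.shape[1]]
            if np.any(window & se):
                out[r, c] = True
    return out

selem = np.ones((3, 3), dtype=bool)            # the 3x3 square of Figure 1
img = np.zeros((7, 7), dtype=bool)
img[3, 3] = True                               # a single foreground pixel
print(binary_dilate(img, selem).astype(int))   # grows into a 3x3 block
```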

Guidelines for Use
Most implementations of this operator expect the input image to be binary,
usually with foreground pixels at pixel value 255, and background pixels
at pixel value 0. Such an image can often be produced from a grayscale
image using thresholding. It is important to check that the polarity of the
input image is set up correctly for the dilation implementation being used.
The structuring element may have to be supplied as a small binary image,
or in a special matrix format, or it may simply be hardwired into the
implementation, and not require specifying at all. In this latter case, a 3×3
square structuring element is normally assumed, which gives the expansion
effect described above. The effect of a dilation using this structuring element
on a binary image is shown in Figure 2.



Figure 2 Effect of dilation using a 3×3 square structuring element

The 3×3 square is probably the most common structuring element
used in dilation operations, but others can be used. A larger
structuring element produces a more extreme dilation effect, although
usually very similar effects can be achieved by repeated dilations using
a smaller but similarly shaped structuring element. With larger
structuring elements, it is quite common to use an approximately disk
shaped structuring element, as opposed to a square one.
The basic effect of dilation on a thresholded binary image is illustrated
in the example below.
This image was produced by two dilation passes using a disk shaped
structuring element of 11 pixels radius. Note that the corners have been
rounded off. In general, when dilating by a disk shaped structuring
element, convex boundaries will become rounded, and concave boundaries
will be preserved as they are.
Dilations can be made directional by using less symmetrical structuring
elements, e.g. a structuring element that is 10 pixels wide and 1 pixel
high will dilate in a horizontal direction only. Similarly, a 3×3 square
structuring element with the origin in the middle of the top row, rather
than the center, will dilate the bottom of a region more strongly than
the top.
Grayscale dilation with a flat disk shaped structuring element will
generally brighten the image. Bright regions surrounded by dark regions
grow in size, and dark regions surrounded by bright regions shrink in size.
Small dark spots in images will disappear as they are `filled in' to the
surrounding intensity value. Small bright spots will become larger spots.
The effect is most marked at places in the image where the intensity
changes rapidly and regions of fairly uniform intensity will be largely
unchanged except at their edges.
Figure 3 shows a vertical cross-section through a graylevel image and
the effect of dilation using a disk shaped structuring element.





Figure 3 Graylevel dilation using a disk shaped structuring element.
The graphs show a vertical cross-section through a graylevel image.
Erosion

Common Names: Erode, Shrink, Reduce

Brief Description
Erosion is one of the two basic operators in the area of mathematical
morphology, the other being dilation. It is typically applied to binary
images, but there are versions that work on grayscale images.
The basic effect of the operator on a binary image is to erode away
the boundaries of regions of foreground pixels (i.e. white pixels, typically).
Thus areas of foreground pixels shrink in size, and holes within those areas
become larger.
How It Works
Useful background to this description is given in the mathematical
morphology section of the Glossary.
The erosion operator takes two pieces of data as inputs. The first is
the image which is to be eroded. The second is a (usually small) set
of coordinate points known as a structuring element
(also known as a kernel). It is this structuring element that determines
the precise effect of the erosion on the input image.
The mathematical definition of erosion for binary images is as follows:
Suppose that X is the set of Euclidean coordinates corresponding to the
input binary image, and that K is the set of coordinates for the structuring
element.
Let Kx denote the translation of K so that its origin is at x.
Then the erosion of X by K is simply the set of all points x such that Kx
is a subset of X.
The mathematical definition for grayscale erosion is identical except
in the way in which the set of coordinates associated with the input
image is derived. In addition, these coordinates are 3-D rather than
2-D. As an example of binary erosion, suppose that the structuring
element is a 3×3 square, with the origin at its center, as shown in Figure 1.
Note that in this and subsequent diagrams, foreground pixels are
represented by 1's and background pixels by 0's.





Figure 1 A 3×3 square structuring element
To compute the erosion of a binary input image by this structuring
element, we consider each of the foreground pixels in the input
image in turn. For each foreground pixel (which we will call the
input pixel) we superimpose the structuring element on top of the
input image so that the origin of the structuring element coincides
with the input pixel coordinates. If for every pixel in the structuring
element, the corresponding pixel in the image underneath is a
foreground pixel, then the input pixel is left as it is. If any of the
corresponding pixels in the image are background, however, the input
pixel is also set to the background value.
For our example 3×3 structuring element, the effect of this operation
is to remove any foreground pixel that is not completely surrounded
by other white pixels (assuming 8-connectedness). Such pixels must
lie at the edges of white regions, and so the practical upshot is that
foreground regions shrink (and holes inside a region grow).
Erosion is the dual of dilation, i.e. eroding foreground pixels is
equivalent to dilating the background pixels.
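Mirroring the earlier dilation sketch, a minimal NumPy implementation of binary erosion under the same assumptions (centred origin, illustrative test image) might look like this.

```python
import numpy as np

def binary_erode(image, selem):
    """Binary erosion: a foreground pixel survives only if every
    foreground position of the structuring element, centred on it,
    also lies over a foreground pixel of the image."""
    img = image.astype(bool)
    se = selem.astype(bool)
    out = np.zeros_like(img)
    oy, ox = se.shape[0] // 2, se.shape[1] // 2            # origin at centre
    padded = np.pad(img, ((oy, oy), (ox, ox)), constant_values=False)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + se.shape[0], c:c + se.shape[1]]
            out[r, c] = np.all(window[se])     # all SE positions must be foreground
    return out

selem = np.ones((3, 3), dtype=bool)
img = np.ones((7, 7), dtype=bool)              # a solid 7x7 foreground block
print(binary_erode(img, selem).astype(int))    # shrinks to a 5x5 block
```

By the duality noted above, the same result can be obtained by dilating the complement of the image with the reflected structuring element and then complementing the output.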

Hit-and-Miss Transform

Common Names: Hit-and-miss Transform, Hit-or-miss Transform

Brief Description
The hit-and-miss transform is a general binary morphological operation
that can be used to look for particular patterns of foreground and
background pixels in an image. It is actually the basic operation of
binary morphology since almost all the other binary morphological
operators can be derived from it. As with other binary morphological
operators it takes as input a binary image and a structuring element,
and produces another binary image as output.

How It Works
The structuring element used in the hit-and-miss is a slight extension to the type that has
been introduced for erosion and dilation, in that it can contain both foreground and
background pixels, rather than just foreground pixels, i.e. both ones and zeros. Note
that the simpler type of structuring element used with erosion and dilation is often
depicted containing both ones and zeros as well, but in that case the zeros really stand
for `don't care's', and are just used to fill out the structuring element to a convenient
shaped kernel, usually a square. In all our illustrations, these `don't care's' are shown as
blanks in the kernel in order to avoid confusion. An example of the extended kind of
structuring element is shown in Figure 1. As usual we denote foreground pixels using
ones, and background pixels using zeros.








Figure 1 Example of the extended type of structuring element used in
hit-and-miss operations. This particular element can be used to find
corner points, as explained below.
The hit-and-miss operation is performed in much the same way as
other morphological operators, by translating the origin of the structuring
element to all points in the image, and then comparing the structuring
element with the underlying image pixels. If the foreground and
background pixels in the structuring element exactly match foreground
and background pixels in the image, then the pixel underneath the origin
of the structuring element is set to the foreground color. If it doesn't
match, then that pixel is set to the background color.
For instance, the structuring element shown in Figure 1 can be used to find
right angle convex corner points in images. Notice that the pixels in the
element form the shape of a bottom-left convex corner. We assume that the
origin of the element is at the center of the 3×3 element. In order to find all
the corners in a binary image we need to run the hit-and-miss transform
four times with four different elements representing the four kinds of right
angle corners found in binary images. Figure 2 shows the four different
elements used in this operation.


Figure 2 Four structuring elements used for corner finding in binary
images using the hit-and-miss transform. Note that they are really
all the same element, but rotated by different amounts.
After obtaining the locations of corners in each orientation, we can then
simply OR all these images together to get the final result showing the
locations of all right angle convex corners in any orientation.
Figure 3 shows the effect of this corner detection on a simple binary
image.





Figure 3 Effect of the hit-and-miss based right angle convex corner
detector on a simple binary image. Note that the `detector' is rather sensitive.
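SciPy's ndimage module provides a hit-or-miss operation, so the four-orientation corner detection described above can be sketched as follows. The structuring elements below encode a bottom-left convex corner pattern and its rotations; the exact pattern used in Figure 1 is not reproduced here, so this encoding is an assumption for illustration.

```python
import numpy as np
from scipy import ndimage

# Hypothetical encoding of a "bottom-left convex corner" pattern:
# hits = pixels that must be foreground, misses = pixels that must be
# background; positions set in neither array are "don't care".
hits   = np.array([[0, 1, 0],
                   [0, 1, 1],
                   [0, 0, 0]], dtype=bool)
misses = np.array([[0, 0, 0],
                   [1, 0, 0],
                   [1, 1, 1]], dtype=bool)

def find_corners(binary_img):
    """OR together hit-and-miss responses for the four rotations of the
    corner structuring element, as described in the text."""
    result = np.zeros_like(binary_img, dtype=bool)
    for k in range(4):                                    # four 90-degree rotations
        h, m = np.rot90(hits, k), np.rot90(misses, k)
        result |= ndimage.binary_hit_or_miss(binary_img, structure1=h, structure2=m)
    return result

img = np.zeros((8, 8), dtype=bool)
img[2:6, 2:6] = True                                      # a filled square
print(find_corners(img).astype(int))                      # marks its four corners
```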

Implementations vary as to how they handle the hit-and-miss transform at
the edges of images where the structuring element overlaps the edge of
the image. A simple solution is to simply assume that any structuring
element that overlaps the image does not match underlying pixels, and
hence the corresponding pixel in the output should be set to zero.
The hit-and-miss transform has many applications in more complex
morphological operations. It is used to construct the thinning and
thickening operators, and hence for all applications explained in these
worksheets.
Guidelines for Use
The hit-and-miss transform is used to look for occurrences of particular
binary patterns in fixed orientations. It can be used to look for several
patterns (or alternatively, for the same pattern in several orientations as
above) simply by running successive transforms using different structuring
elements, and then ORing the results together.
The operations of erosion, dilation, opening, closing, thinning and
thickening can all be derived from the hit-and-miss transform in
conjunction with simple set operations.
Figure 4 illustrates some structuring elements that can be used for locating
various binary features.





Figure 4 Some applications of the hit-and-miss transform. 1 is used to
locate isolated points in a binary image. 2 is used to locate the end points
on a binary skeleton. Note that this structuring element must be used in
all its rotations, so four hit-and-miss passes are required. 3a and 3b are
used to locate the triple points (junctions) on a skeleton. Both structuring
elements must be run in all orientations, so eight hit-and-miss passes are
required.



Skeletonization/Medial Axis Transform

Common Names: Skeletonization, Medial axis transform

Brief Description
Skeletonization is a process for reducing foreground regions in a binary
image to a skeletal remnant that largely preserves the extent and
connectivity of the original region while throwing away most of the
original foreground pixels. To see how this works, imagine that the
foreground regions in the input binary image are made of some uniform
slow-burning material. Light fires simultaneously at all points along the
boundary of this region and watch the fire move into the interior. At points
where the fire traveling from two different boundaries meets itself, the fire
will extinguish itself and the points at which this happens form the so
called `quench line'. This line is the skeleton. Under this definition it is
clear that thinning produces a sort of skeleton.
Another way to think about the skeleton is as the loci of centers of
bi-tangent circles that fit entirely within the foreground region being
considered. Figure 1 illustrates this for a rectangular shape.





Figure 1 Skeleton of a rectangle defined in terms of bi-tangent circles.
The terms medial axis transform (MAT) and skeletonization are often
used interchangeably but we will distinguish between them slightly.
The skeleton is simply a binary image showing the simple skeleton.
The MAT on the other hand is a graylevel image where each point on
the skeleton has an intensity which represents its distance to a boundary
in the original object.
How It Works
The skeleton/MAT can be produced in two main ways. The first is to use
some kind of morphological thinning that successively erodes away pixels
from the boundary (while preserving the end points of line segments) until
no more thinning is possible, at which point what is left approximates the
skeleton. The alternative method is to first calculate the distance transform
of the image. The skeleton then lies along the singularities (i.e. creases or
curvature discontinuities) in the distance transform. This latter approach is
more suited to calculating the MAT since the MAT is the same as the
distance transform but with all points off the skeleton suppressed to zero.
Note: The MAT is often described as being the `locus of local maxima' on
the distance transform. This is not really true in any normal sense of the
phrase `local maximum'. If the distance transform is displayed as a 3-D
surface plot with the third dimension representing the grayvalue, the MAT
can be imagined as the ridges on the 3-D surface.
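As a rough illustration of the second (distance transform) approach, the sketch below computes a skeleton with scikit-image and then builds an approximate MAT by keeping the distance transform values only on the skeleton pixels. This is an approximation under the assumptions stated in the comments, not the exact quench-line construction.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def skeleton_and_mat(binary_img):
    """Return (skeleton, MAT) for a binary image.
    The skeleton is obtained by thinning; the approximate MAT keeps the
    Euclidean distance-to-background only at skeleton pixels and is zero
    elsewhere, as described in the text."""
    skel = skeletonize(binary_img.astype(bool))
    dist = ndimage.distance_transform_edt(binary_img)   # distance to nearest background pixel
    mat = np.where(skel, dist, 0.0)
    return skel, mat

img = np.zeros((40, 60), dtype=bool)
img[10:30, 5:55] = True                                  # a filled rectangle
skel, mat = skeleton_and_mat(img)
print(skel.sum(), mat.max())                             # skeleton pixel count, max inscribed radius
```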
Lighting techniques:
Structured illumination:
One of the characteristics of robot vision that sets it apart from
artificial vision in general is the ability of the user, in most
instances, to carefully structure the lighting in the viewing area.
Optimal lighting is a very inexpensive way to increase the
reliability and accuracy of a robot vision system. In addition, with
the use of structured lighting, three dimensional information about
the object, including a complete height profile, can be obtained.
Light sources:
Perhaps the most effective form of lighting, when it is feasible, is
back lighting. For a scene that uses back lighting, the illumination
comes from behind the scene, so that objects appear as silhouettes, as
shown in the next slide. In this case, the object should be opaque in
comparison with the background material. The principal advantage
of back lighting is that it produces good contrast between the objects
and the background.

Consequently, the gray level threshold needed to separate the dark
foreground objects from the light background is easily found. Usually a
wide range of gray level thresholds will be effective in segmenting the
foreground area from the background.
A light table for backlighting can be constructed by shining one or more
lamps onto a diffusing translucent surface such as ground glass,
diffused plastic, or frosted Mylar. Since only the silhouette of the object
is visible to the camera, no direct information about the height of the
object is available through backlighting. In fact, backlighting is most
effective for inspection of thin, flat objects. More generally, backlighting
can be used to inspect objects whose essential characteristics are revealed
by the profiles generated by one or more physically stable poses; for
example, it can distinguish between coins and keys, but not between the
heads and tails of a coin.
When the outline or silhouette of an object does not provide sufficient
information about it, some form of front lighting must be used. Front
lighting is also necessary when backlighting is simply not feasible, for
example when parts are being transported on a conveyor belt.
With front lighting, the light source or sources are on the same side as
the camera, as indicated below.
There are several forms of front lighting which differ from one
another in the relative positions and orientations of the camera, the
light sources and the object. Scenes with front lighting are typically
low contrast scenes in comparison with scenes with backlighting. As
a result, considerable care must be taken in arranging the lighting
and background to produce a uniformly illuminated scene of maximum
contrast. Even then, the gray level threshold needed to separate the
foreground area from the background can be sensitive. If possible, the
background should be selected to contrast sharply with the foreground
objects.
The last basic form of lighting is side lighting, which can be used to
inspect surface defects such as bumps or dimples on an otherwise flat
surface. If the light is arranged at an acute angle, the defects will be
emphasized by either casting shadows or creating reflections, depending
upon the material. An arrangement is shown below.

Light Patterns:
The lighting schemes examined thus far are all based on uniform
illumination of the entire scene. With the advent of lasers and the
use of specialized lenses and optical masks, a variety of patterns
of light can be projected onto the scene as well. The presence of a
three dimensional object tends to modulate these patterns of light
when they are viewed from the proper perspective. By examining the
modulated pattern, we can often infer such things as the presence
of an object, the object's dimensions, and the orientations of object
surfaces.
One simple light pattern that can be used to detect the presence of a
three dimensional object and measure its height is a line or stripe of
light projected onto the scene from the side with a laser and a
cylindrical lens, as shown in the next slide.
Triangulation:
A common method of measuring the depth of a particular point on
an object is to use range triangulation. Consider the arrangement shown
below.
Here the light source might be a laser or some other source capable
of projecting a narrow beam of light. From the diagram it can be
seen that the distance of the lens from the point P on the object can be
calculated. If two points are measured in this way, the height
difference between them can be found.
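The figure referred to above is not reproduced here, so the sketch below uses one common triangulation configuration: the light spot is observed by a camera with focal length f offset by a baseline b from the source, and the image of the spot falls a distance x' from the image centre, giving the familiar relation z = f·b / x'. All numbers are hypothetical.

```python
def depth_from_triangulation(f, b, x_img):
    """Depth of an illuminated point for a camera with focal length f,
    baseline b between the camera and the light source, and image offset
    x_img of the spot from the image centre: z = f * b / x_img."""
    if x_img == 0:
        raise ValueError("zero image offset implies a point at infinity")
    return f * b / x_img

# Example (hypothetical values): f = 25 mm, baseline = 100 mm
z1 = depth_from_triangulation(25.0, 100.0, 0.5)      # spot offset 0.5 mm -> 5000 mm
z2 = depth_from_triangulation(25.0, 100.0, 0.4)      # spot offset 0.4 mm -> 6250 mm
print(z1, z2, z2 - z1)                                # height difference between two points
```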
camera calibration:
In robotic applications, the objective is to determine the position and
orientation of each part relative to the base of the robot. Once this
information is known, a tool configuration can be selected and a joint
space trajectory q(t) can be computed so as to manipulate the part.
With the aid of a robot vision system, one can determine the position
and orientation of a part relative to the camera. Thus, to determine the
part coordinates relative to the robot base, we must have an accurate
transformation T from camera coordinates to base coordinates.
Experimentally determining this transformation is called camera
calibration.
In general, camera calibration requires determining both the position
and the orientation of the camera.
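A minimal sketch of how such a calibration transform would be used in practice: given a homogeneous 4×4 transform T_base_camera (the camera frame expressed in the robot base frame) and a part position measured in camera coordinates, the part position in base coordinates is obtained by a single matrix multiplication. The numerical values below are invented for illustration.

```python
import numpy as np

# Hypothetical calibration result: camera frame relative to the robot base.
# Here the camera is rotated 180 degrees about x and offset from the base.
T_base_camera = np.array([[1,  0,  0, 0.2],
                          [0, -1,  0, 0.0],
                          [0,  0, -1, 0.5],
                          [0,  0,  0, 1.0]])

def part_in_base(T_bc, p_camera):
    """Map a point measured in camera coordinates into base coordinates
    using the homogeneous transform obtained from camera calibration."""
    p_h = np.append(np.asarray(p_camera, dtype=float), 1.0)   # homogeneous coordinates
    return (T_bc @ p_h)[:3]

p_cam = [0.05, -0.02, 0.40]            # part position seen by the camera (metres)
print(part_in_base(T_base_camera, p_cam))
```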
Thresholding:
Gray-level thresholding is the simplest segmentation process. Many
objects or image regions are characterized by constant reflectivity or
light absorption of their surfaces; a brightness constant or threshold
can be determined to segment objects and background. Thresholding
is computationally inexpensive and fast; it is the oldest segmentation
method and is still used in simple applications, and it can easily
be done in real time using specialized hardware.
A complete segmentation of an image R is a finite set of regions
R_1, ..., R_S such that

    R = R_1 ∪ R_2 ∪ ... ∪ R_S,    R_i ∩ R_j = ∅ for i ≠ j                 (1)
Complete segmentation can result from thresholding in simple
scenes. Thresholding is the transformation of an input image f to an
output (segmented) binary image g as follows:

    g(i, j) = 1  for f(i, j) ≥ T
    g(i, j) = 0  for f(i, j) < T                 (2)

where T is the threshold; g(i, j) = 1 for image elements of objects,
and g(i, j) = 0 for image elements of the background (or vice versa).
Basic thresholding:
Search all the elements f(i, j) of the image f. A pixel g(i, j) of the
segmented image is an object pixel if f(i, j) ≥ T, and is a background
pixel otherwise.

fig 5.1 page 125 Sonka
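A one-line NumPy version of this basic global thresholding (equation 2) is shown below; the input array and threshold value are placeholders.

```python
import numpy as np

def threshold(f, T):
    """Global thresholding (equation 2): g = 1 where f >= T, else 0."""
    return (f >= T).astype(np.uint8)

f = np.random.randint(0, 256, size=(64, 64))   # stand-in for a gray-level image
g = threshold(f, T=128)                         # binary segmented image
print(g.dtype, g.min(), g.max())
```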
Correct threshold selection is crucial for successful threshold
segmentation; this selection can be determined interactively or it can
be the result of some threshold detection method. Only under certain
special circumstances can thresholding be successful using a single
threshold for the whole image (global thresholding), since even in
very simple images there are likely to be gray level variations in
objects and background; this variation may be due to non-uniform
lighting, non-uniform input device parameters or a number of other
factors. Segmentation using variable thresholds, where the
threshold value varies over the image as a function of local image
characteristics, can produce the solution in these cases.
Global thresholding: T = T(f)
Local thresholding: T = T(f, f_c), where f_c is that part of the image in
which the threshold is determined.
Basic thresholding as defined by equation (2) above has many
modifications. One possibility is to segment an image into regions of
pixels with gray levels from a set D, and into background otherwise:

    g(i, j) = 1  for f(i, j) ∈ D
    g(i, j) = 0  otherwise

This thresholding can be useful in microscopic blood cell
segmentation, where a particular gray level interval represents
cytoplasm, the background is lighter and the cell kernel darker.
This thresholding definition can serve as a border detector as well.
fig 5.2, page 126, Sonka
There are many modifications that use multiple thresholds, after
which the resulting image is no longer binary, but rather an image
consisting of a very limited set of gray levels:

    g(i, j) = 1  for f(i, j) ∈ D_1
    g(i, j) = 2  for f(i, j) ∈ D_2
    g(i, j) = 3  for f(i, j) ∈ D_3
    ...
    g(i, j) = n  for f(i, j) ∈ D_n
    g(i, j) = 0  otherwise

where each D_i is a specified subset of gray levels.
Another special choice of gray-level subsets D_i defines semi-thresholding,
which is sometimes used to make human-assisted analysis easier:

    g(i, j) = f(i, j)  for f(i, j) ≥ T
    g(i, j) = 0        for f(i, j) < T
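The band, multi-level and semi-thresholding variants above translate directly into NumPy; the gray-level sets and threshold below are illustrative choices.

```python
import numpy as np

def band_threshold(f, lo, hi):
    """Band thresholding: 1 where lo <= f <= hi (the set D), else 0."""
    return ((f >= lo) & (f <= hi)).astype(np.uint8)

def multi_threshold(f, bands):
    """Multi-level thresholding: label i for pixels whose gray level falls
    in bands[i-1] = (lo, hi); pixels in no band are labelled 0."""
    g = np.zeros(f.shape, dtype=np.uint8)
    for i, (lo, hi) in enumerate(bands, start=1):
        g[(f >= lo) & (f <= hi)] = i
    return g

def semi_threshold(f, T):
    """Semi-thresholding: keep the original gray level where f >= T, else 0."""
    return np.where(f >= T, f, 0)

f = np.random.randint(0, 256, size=(64, 64))
print(band_threshold(f, 80, 120).max(),
      multi_threshold(f, [(0, 63), (64, 127), (128, 255)]).max(),
      semi_threshold(f, 200).max())
```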
Threshold detection methods:
p-tile thresholding: If some property of an image after segmentation is
known, the task of threshold selection is simplified, since the threshold is
chosen to satisfy this property. A printed text sheet may be an example, if
we know that the characters of the text cover 1/p of the sheet area. Using this
prior information about the ratio between the sheet area and the character
area, it is easy to choose a threshold T
such that 1/p of the image area has gray values less than T and the rest
has gray values larger than T.
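A possible p-tile implementation simply reads the threshold off the cumulative histogram (here via a percentile); the object fraction used is an assumed example.

```python
import numpy as np

def p_tile_threshold(f, object_fraction):
    """Choose T so that approximately `object_fraction` of the pixels
    (the known dark-object area, e.g. printed characters) falls below T."""
    return np.percentile(f, object_fraction * 100.0)

f = np.random.randint(0, 256, size=(128, 128))   # stand-in for a scanned text image
T = p_tile_threshold(f, object_fraction=0.1)      # characters assumed to cover 10% of the sheet
g = (f < T).astype(np.uint8)                      # dark characters become objects
print(T, g.mean())                                 # object fraction comes out close to 0.1
```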
More complex methods of threshold detection are based on histogram
shape analysis.
Optimal thresholding: Methods based on approximation of the histogram
of an image using a weighted sum of two or more probability densities
with normal distribution represent a different approach called optimal
thresholding. The threshold is set at the gray level corresponding to the
minimum probability between the maxima of two or more normal
distributions, which results in minimum-error segmentation.
fig 5.4, page 129, Sonka
Multi-spectral thresholding:
Many practical segmentation problems need more information than
is contained in one spectral band. Colour images are a natural
example, in which information is coded in three spectral bands, for
example red, green and blue; multispectral remote sensing images
or meteorological satellite images may have even more spectral
bands. One segmentation approach determines thresholds
independently in each spectral band and combines them into a
single segmented image.
Thresholding in hierarchical data structures:
The general idea of thresholding in hierarchical data structures is
based on local thresholding methods, the aim being to detect the
presence of a region in a low resolution image, and to give the
region more precision in images of higher (up to full) resolution.
