Course 1
Bibliography
R.C. Gonzalez, R.E. Woods, Digital Image Processing, 3rd ed., Prentice Hall, 2008
R.C. Gonzalez, R.E. Woods, S.L. Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2003
http://www.imageprocessingplace.com/
Evaluation
MATLAB image processing test (50%)
Articles/Books presentations (50%)
Meet Lena!
The First Lady of the Internet
Lenna Soderberg (Sjöblom) and Jeff Seideman, photographed in May 1997 at the
Imaging Science & Technology Conference
What is Digital Image Processing?
An image is a two-dimensional function f : D ⊂ R² → R
f(x,y) = intensity, gray level of the image at spatial point (x,y)
x, y, f(x,y) finite, discrete quantities → digital image
Digital Image Processing = processing digital images by means of a digital
computer
A digital image is composed of a finite number of elements (location, value of
intensity): (x_i, y_j, f_ij)
These elements are called picture elements, image elements, pels, pixels
Image processing is not limited to the visual band of the electromagnetic (EM)
spectrum
Image processing : gamma to radio waves, ultrasound, electron microscopy,
computer-generated images
image processing vs. image analysis vs. computer vision?
Image processing = discipline in which both the input and the output of a
process are images
Computer vision = use of computers to emulate human vision (AI):
learning, making inferences and taking actions based on visual inputs
Image analysis (image understanding) = segmentation, partitioning images
into regions or objects
(link between image processing and image analysis)
Digital Image Processing (Gonzalez + Woods) =
processes whose inputs and outputs are images +
processes that extract attributes from images, up to and including the
recognition of individual objects
(low- and mid-level processes)
Example: automated analysis of text =
acquiring an image containing text,
preprocessing the image (enhancement, sharpening),
extracting (segmenting) the individual characters,
describing the characters in a form suitable for computer processing,
recognition of individual characters
A digital picture
produced in 1921
from a coded tape
by a telegraph
printer with
special type faces
(McFarlane)
A digital picture
made in 1922
from a tape
punched after the
signals had
crossed the Atlantic twice
(McFarlane)
geographers use DIP to study pollution patterns from aerial and satellite imagery
archeology: DIP allowed restoring blurred pictures that recorded rare artifacts
lost or damaged after being photographed
physics: enhancing images of experiments (high-energy plasmas,
electron microscopy)
astronomy, biology, nuclear medicine, law enforcement, industry
DIP is used in solving problems dealing with machine perception:
extracting from an image information suitable for computer processing
(statistical moments, Fourier transform coefficients, ...)
automatic character recognition, industrial machine vision for product
assembly and inspection, military reconnaissance, automatic processing of
fingerprints, machine processing of aerial and satellite imagery for
weather prediction, Internet
Bone scan
PET image
X-ray imaging
Medical diagnostics, industry, astronomy
An X-ray tube is a vacuum tube with a cathode and an anode.
The cathode is heated, causing free electrons to be released.
The electrons flow at high speed to the positively charged anode.
When the electrons strike a nucleus, energy is released in the
form of X-ray radiation. The energy (penetrating power) of the
X-rays is controlled by a voltage applied across the anode, and
by a current applied to the filament in the cathode.
The intensity of the X-rays is modified by absorption as they pass
through the patient, and the resulting energy falling onto a film develops it
much in the same way that light develops photographic film.
Angiography = contrast-enhancement radiography
Angiograms = images of blood vessels
A catheter is inserted into an artery or vein in the groin. The catheter is threaded
into the blood vessel and guided to the area to be studied. When it reaches
the area to be studied, an X-ray contrast medium is injected through the catheter.
This enhances the contrast of the blood vessels and enables the radiologist to see any
irregularities or blockages.
X-rays are used in CAT (computerized axial tomography)
X-rays are used in industrial processes (examining circuit boards for flaws in manufacturing)
Industrial CAT scans are useful when the parts can be penetrated by X-rays
Chest X-ray
Aortic angiogram
Head CT
Cygnus Loop
Circuit boards
Taxol
(anticancer agent)
magnified 250X
Nickel oxide
thin film
(600X)
Cholesterol
(40X)
Surface of
audio CD
(1750X)
Microprocessor
(60X)
Organic
superconductor
(450X)
Automated visual inspection of manufactured goods
Gamma
X-ray
Optical
Infrared
Radio
Outputs are images:
image acquisition
image restoration
compression
morphological processing
Image acquisition
- may involve preprocessing such as scaling
Image enhancement
manipulating an image so that the result is more suitable than
the original for a specific operation
enhancement is problem oriented
there is no general theory of image enhancement
enhancement uses subjective methods for image improvement
enhancement is based on human subjective preferences regarding
what is a "good" enhancement result
Image restoration
improving the appearance of an image
restoration is objective
- the techniques for restoration are based on
mathematical or probabilistic models of image degradation
Compression
reducing the storage required to save an image or the bandwidth
required to transmit it
Morphological processing
tools for extracting image components that are useful in the representation
and description of shape
a transition from processes that output images to processes that output
image attributes
Segmentation
partitioning an image into its constituent parts or objects
autonomous segmentation is one of the most difficult tasks of DIP
the more accurate the segmentation, the more likely recognition is to succeed
Representation and description (almost always follows segmentation)
segmentation produces either the boundary of a region or all the points in the
region itself
converting the data produced by segmentation to a form suitable for
computer processing
Simplified diagram
of a cross section
of the human eye
The lens is made up of concentric layers of fibrous cells and is suspended by
fibers that attach to the ciliary body (60-70% water, 6% fat, protein). The lens is
colored slightly yellow. The lens absorbs approximately 8% of the visible
light (infrared and ultraviolet light are absorbed by proteins in the lens).
The innermost membrane is the retina. When the eye is properly focused,
light from an object outside the eye is imaged on the retina.
Vision is possible because of the distribution of discrete light receptors on the
surface of the retina: cones and rods (6-7 million cones, 75-150 million rods).
Cones: located in the central part of the retina (fovea), they are sensitive to
colors, vision of detail, each cone is linked to its own nerve
cone vision = photopic or bright-light vision
Fovea = the place where the image of the object of interest falls
Rods: distributed over all the retina surface, several rods are connected to
a single nerve, not specialized in detail vision,
serve to give a general, overall picture of the field of view
not involved in color vision
sensitive to low levels of illumination
Blind spot: region without receptors
Distribution of
rods and cones
in the retina
Image formation in the eye
Ordinary photographic camera: the lens has fixed focal length; focusing at
various distances is done by modifying the distance between the lens and the
image plane (where the film or imaging chip is located)
Human eye: the distance between the lens and the retina (the imaging region)
is fixed; the focal length needed to achieve proper focus is obtained by varying
the shape of the lens (the fibers in the ciliary body accomplish this,
flattening or thickening the lens for distant or near objects, respectively)
distance between lens and retina along the visual axis = 17 mm
range of focal lengths = 14 mm to 17 mm
All the inner squares have the same intensity, but they appear progressively darker as the
background becomes lighter
Optical illusions
0 < r(x,y) < 1
Continuous image projected onto a sensor array Result of image sampling and quantization
A digital image can be written as an M×N matrix:

    [ f(0,0)     f(0,1)     ...  f(0,N-1)   ]
f = [ f(1,0)     f(1,1)     ...  f(1,N-1)   ]
    [ ...                                   ]
    [ f(M-1,0)   f(M-1,1)   ...  f(M-1,N-1) ]

or equivalently A = [a_ij], i = 0, ..., M-1, j = 0, ..., N-1, where

a_ij = f(x_i, y_j) = f(i,j)

M, N > 0 ; the number of intensity levels is L = 2^k ; a_ij ∈ [0, L-1]
Dynamic range of an image = the ratio of the maximum measurable intensity to the
minimum detectable intensity level in the system
Upper limit determined by saturation, lower limit by noise
The number of bits required to store an image: b = M × N × k ;
for M = N, b = N² k
When an image can have 2k intensity levels, the image is referred as a k-bit image
256 discrete intensity values → 8-bit image
Measures: line pairs per unit distance, dots (pixels) per unit distance
Image resolution = the largest number of discernible line pairs per unit distance
(e.g. 100 line pairs per mm)
Dots per unit distance are commonly used in printing and publishing
In U.S. the measure is expressed in dots per inch (dpi)
(newspapers are printed with 75 dpi, glossy brochures at 175 dpi)
Intensity resolution = the smallest discernible change in intensity level
Fig.1 Reducing spatial resolution: 1250 dpi(upper left), 300 dpi (upper right)
150 dpi (lower left), 72 dpi (lower right)
Image Interpolation
- used in zooming, shrinking, rotating, and geometric corrections
Shrinking and zooming (image resizing) are image resampling methods
Interpolation is the process of using known data to estimate values at unknown locations
Suppose we have an image of size 500×500 pixels that has to be enlarged 1.5 times to
750×750 pixels. One way to do this is to create an imaginary 750×750 grid with the
same spacing as the original, and then shrink it so that it fits exactly over the original
image. The pixel spacing in the 750×750 grid will be less than in the original image.
Problem: assignment of intensity levels in the new 750×750 grid
Nearest neighbor interpolation: assign to every point in the new grid (750×750) the
intensity of the closest pixel (nearest neighbor) from the old/original grid (500×500).
This technique has the tendency to produce undesirable effects, like severe distortion of
straight edges.
Bilinear interpolation assigns to the new (x,y) location the following intensity:
v(x,y) = a·x + b·y + c·x·y + d
where the four coefficients are determined from the 4 equations in 4 unknowns that can
be written using the 4 nearest neighbors of point (x,y).
Bilinear interpolation gives much better results than nearest neighbor interpolation, with a
modest increase in computational effort.
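The two schemes above can be sketched in a few lines of NumPy (Python rather than the course's MATLAB). The function names are illustrative; the bilinear version realizes v(x,y) = a·x + b·y + c·x·y + d implicitly, by interpolating along each axis between the 4 nearest neighbors:

```python
import numpy as np

def zoom_nearest(img, new_h, new_w):
    """Nearest-neighbor interpolation: each new pixel copies the closest old pixel."""
    h, w = img.shape
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols]

def zoom_bilinear(img, new_h, new_w):
    """Bilinear interpolation over the 4 nearest neighbors of each target point."""
    h, w = img.shape
    y = np.arange(new_h) * (h - 1) / max(new_h - 1, 1)
    x = np.arange(new_w) * (w - 1) / max(new_w - 1, 1)
    y0 = np.floor(y).astype(int); x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    fy = (y - y0)[:, None]; fx = (x - x0)[None, :]
    f = img.astype(float)
    top = f[y0][:, x0] * (1 - fx) + f[y0][:, x1] * fx   # interpolate along x, upper row
    bot = f[y1][:, x0] * (1 - fx) + f[y1][:, x1] * fx   # interpolate along x, lower row
    return top * (1 - fy) + bot * fy                    # then interpolate along y
```

On a small ramp image the bilinear result is smooth, while the nearest-neighbor result is blocky (the "severe distortion of straight edges" mentioned above).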
Bicubic interpolation assigns to the new (x,y) location an intensity that involves the 16
nearest neighbors of the point:

v(x,y) = Σ_{i=0..3} Σ_{j=0..3} c_ij · x^i · y^j

where the 16 coefficients c_ij are determined from the intensity levels of the 16 nearest
neighbors of (x,y).
Generally, bicubic interpolation does a better job of preserving fine detail than the
bilinear technique. Bicubic interpolation is the standard used in commercial image editing
programs, such as Adobe Photoshop and Corel Photopaint.
Figure 2 (a) is the same as Fig. 1 (d), which was obtained by reducing the resolution of
the 1250 dpi image in Fig. 1(a) to 72 dpi (the size shrank from 3692×2812 to 213×162) and
then zooming the reduced image back to its original size. To generate Fig. 1(d), nearest
neighbor interpolation was used (both for shrinking and zooming).
Figures 2(b) and (c) were generated using the same steps but using bilinear and bicubic
interpolation, respectively. Figures 2(d)+(e)+(f) were obtained by reducing the resolution
from 1250 dpi to 150 dpi (instead of 72 dpi)
Fig. 2 Interpolation examples for zooming and shrinking (nearest neighbor, bilinear, bicubic)
Neighbors of a Pixel
A pixel p at coordinates (x,y) has 4 horizontal and vertical neighbors:
horizontal: (x+1, y), (x-1, y) ; vertical: (x, y+1), (x, y-1)
This set of pixels, called the 4-neighbors of p, is denoted by N4(p).
The 4 diagonal neighbors of p have coordinates:
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)
and are denoted ND(p).
The horizontal, vertical and diagonal neighbors together are called the 8-neighbors of p,
denoted N8(p).
If (x,y) is on the border of the image some of the neighbor locations in ND(p) and N8(p)
fall outside the image.
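The neighbor sets, including the border case, can be sketched directly (a minimal sketch; the optional `shape` argument, an assumption here, drops locations that fall outside an M×N image):

```python
def n4(p, shape=None):
    """4-neighbors N4(p) of pixel p = (x, y): horizontal and vertical neighbors."""
    x, y = p
    nbrs = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    if shape:  # keep only locations inside an image of size shape = (M, N)
        nbrs = [(a, b) for a, b in nbrs if 0 <= a < shape[0] and 0 <= b < shape[1]]
    return set(nbrs)

def nd(p, shape=None):
    """Diagonal neighbors ND(p)."""
    x, y = p
    nbrs = [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]
    if shape:
        nbrs = [(a, b) for a, b in nbrs if 0 <= a < shape[0] and 0 <= b < shape[1]]
    return set(nbrs)

def n8(p, shape=None):
    """8-neighbors N8(p) = N4(p) union ND(p)."""
    return n4(p, shape) | nd(p, shape)
```

For a corner pixel like (0,0) in a 3×3 image, only 3 of the 8 neighbors survive, matching the remark above.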
Example (V = {1}): the same 3×3 binary arrangement shown three times, with
4-adjacency, 8-adjacency, and m-adjacency connections drawn between the 1-valued pixels:

0 1 1
0 1 0
0 0 1
The three pixels at the top (first line) in the above example show multiple (ambiguous)
8-adjacency, as indicated by the dashed lines. This ambiguity is removed by using
m-adjacency.
A (digital) path (or curve) from pixel p with coordinates (x,y) to pixel q with coordinates (s,t)
is a sequence of distinct pixels with coordinates:
(x_0, y_0) = (x,y) , (x_1, y_1) , ... , (x_n, y_n) = (s,t)
where (x_{i-1}, y_{i-1}) and (x_i, y_i) are adjacent for i = 1, 2, ..., n.
The length of the path is n. If (x_0, y_0) = (x_n, y_n) the path is closed.
Depending on the type of adjacency considered, the paths are 4-, 8-, or m-paths.
Let S denote a subset of pixels in an image. Two pixels p and q are said to be connected
in S if there exists a path between them consisting only of pixels from S.
S is a connected set if there is a path in S between any 2 pixels in S.
Let R be a subset of pixels in an image. R is a region of the image if R is a connected set.
Two regions R1 and R2 are said to be adjacent if R1 ∪ R2 forms a connected set. Regions
that are not adjacent are said to be disjoint. When referring to regions, only 4- and
8-adjacency are considered.
Suppose an image contains K disjoint regions R_k, k = 1, ..., K. Let R_u denote the union
of all K regions and (R_u)^c the complement of R_u.
We call all the points in R_u the foreground of the image and the points in (R_u)^c the
background of the image.
The boundary (border or contour) of a region R is the set of points that are adjacent to
points in the complement of R, (R)^c, i.e., the set of pixels in the
region that have at least one background neighbor. This definition is referred to as the
inner border, to distinguish it from the notion of outer border, which is the corresponding
border in the background.
Distance measures
For pixels p, q, and z, with coordinates (x,y), (s,t) and (v,w) respectively, D is a distance
function or metric if:
(a) D(p,q) ≥ 0 (D(p,q) = 0 iff p = q),
(b) D(p,q) = D(q,p), and
(c) D(p,z) ≤ D(p,q) + D(q,z).
The Euclidean distance between p and q is:
D_e(p,q) = [ (x-s)² + (y-t)² ]^(1/2)
The pixels q for which D_e(p,q) ≤ r are the points contained in a disk of radius r
centered at (x,y).
The D4 distance (also called city-block distance) between p and q is defined as:
D4(p,q) = |x-s| + |y-t|
The pixels q for which D4(p,q) ≤ r form a diamond centered at (x,y).
The pixels with D4 ≤ 2 from the center form a diamond:

    2
  2 1 2
2 1 0 1 2
  2 1 2
    2

The D8 (chessboard) distance is D8(p,q) = max(|x-s|, |y-t|); the pixels with D8 ≤ 2
form a square:

2 2 2 2 2
2 1 1 1 2
2 1 0 1 2
2 1 1 1 2
2 2 2 2 2
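The three distance measures are one-liners (a minimal sketch; function names are illustrative):

```python
import math

def d_e(p, q):
    """Euclidean distance: disk-shaped equidistance contours."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d4(p, q):
    """City-block distance |x-s| + |y-t|: diamond-shaped contours."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chessboard distance max(|x-s|, |y-t|): square-shaped contours."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
```

Note that a diagonal step costs 2 under D4 but only 1 under D8, which is exactly why the contours above are a diamond and a square, respectively.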
Dm distance example: consider the arrangement

p3  p4
p1  p2
p

where p = 1, p2 = p4 = 1, and p1, p3 ∈ {0,1}. Consider V = {1}.
If p1 = p3 = 0, then Dm(p, p4) = 2.
If p1 = 1, then p2 and p are no longer m-adjacent, and Dm(p, p4) = 3 (path p, p1, p2, p4).
If p1 = 0, p3 = 1, then Dm(p, p4) = 3.
If p1 = p3 = 1, then Dm(p, p4) = 4 (path p, p1, p2, p3, p4).
An array operation involving one or more images is carried out on a pixel-by-pixel basis.

Array (elementwise) product:

[ a11 a12 ]   [ b11 b12 ]   [ a11·b11  a12·b12 ]
[ a21 a22 ] * [ b21 b22 ] = [ a21·b21  a22·b22 ]

Matrix product:

[ a11 a12 ] [ b11 b12 ]   [ a11·b11 + a12·b21   a11·b12 + a12·b22 ]
[ a21 a22 ] [ b21 b22 ] = [ a21·b11 + a22·b21   a21·b12 + a22·b22 ]
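In NumPy (a quick sketch; the values are arbitrary) the two products use different operators, mirroring MATLAB's `.*` versus `*`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

array_prod = a * b    # elementwise: [[1*5, 2*6], [3*7, 4*8]]
matrix_prod = a @ b   # rows-by-columns: [[1*5+2*7, 1*6+2*8], [3*5+4*7, 3*6+4*8]]
```

Image operations in this course are array operations unless stated otherwise.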
Example (evaluated on a pixel-by-pixel basis), with a = 1 and b = -1:

           [ 0 2 ]          [ 6 5 ]           [ -6 -3 ]
max{ a · [ 6 3 ] + b · [ 4 7 ] } = max [  2 -4 ] = 2
4 7
For a random variable z with mean m, E[(z-m)2] is the variance (E( ) is the expected
value). The covariance of two random variables z1 and z2 is defined as E[(z1-m1) (z2-m2)].
The two random variables are uncorrelated when their covariance is 0.
Objective: reduce noise by averaging a set of noisy images g_i(x,y) (a technique frequently
used in image enhancement)

g_i(x,y) = f(x,y) + η_i(x,y) , with η_i zero-mean, uncorrelated noise

ḡ(x,y) = (1/K) Σ_{i=1..K} g_i(x,y)

σ²_ḡ(x,y) = (1/K) σ²_η(x,y)  ,  σ_ḡ(x,y) = (1/√K) σ_η(x,y)

As K increases, the variability (as measured by the variance or the standard deviation) of
the pixel values at each location (x,y) decreases. Because E{ḡ(x,y)} = f(x,y), this
means that ḡ(x,y) approaches f(x,y) as the number of noisy images used in the
averaging process increases.
An important application of image averaging is in the field of astronomy, where imaging
under very low light levels frequently causes sensor noise to render single images
virtually useless for analysis. Figure 3 shows an 8-bit image in which corruption
was simulated by adding to it Gaussian noise with zero mean and a standard deviation of
64 intensity levels, together with the results of averaging 5, 10, 20, 50 and 100 noisy
images, respectively.
Fig. 3 Image of Galaxy Pair NGC 3314 corrupted by additive Gaussian noise (top left); results of averaging 5, 10, 20, 50,
100 noisy images
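The 1/K variance reduction can be simulated with the same noise parameters as above (Gaussian, zero mean, σ = 64); a constant image stands in for the galaxy, which is an assumption made only to keep the sketch self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)   # stand-in for the noise-free image f(x,y)

def averaged(K):
    """Average of K noisy observations g_i = f + eta_i, eta_i ~ N(0, 64)."""
    noisy = f + rng.normal(0.0, 64.0, size=(K,) + f.shape)
    return noisy.mean(axis=0)

# the standard deviation of the average drops roughly as 1/sqrt(K)
std_5 = averaged(5).std()
std_100 = averaged(100).std()
```

With K = 100 the residual fluctuation is about 64/√100 ≈ 6.4 intensity levels, an order of magnitude below the raw noise.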
Fig. 4 (a) Infrared image of Washington DC area; (b) Image obtained from (a) by setting to zero the least
significant bit of each pixel; (c) the difference between the two images
Figure 4(b) was obtained by setting to zero the least-significant bit of every pixel in
Figure 4(a). The two images seem almost the same. Figure 4(c) is the difference between
images (a) and (b). Black (0) values in Figure 4(c) indicate locations where there is no
difference between images (a) and (b).
g(x,y) = f(x,y) - h(x,y)
h(x,y), the mask, is an X-ray image of a region of a patient's body, captured by an
intensified TV camera (instead of traditional X-ray film) located opposite an X-ray
source. The procedure consists of injecting an X-ray contrast medium into the patient's
bloodstream, taking a series of images called live images (denoted f(x,y)) of the same
anatomical region as h(x,y), and subtracting the mask from the series of incoming live
images after injection of the contrast medium.
In g(x,y) we can find the differences between h and f, as enhanced detail.
Because images are captured at TV rates, we obtain a movie showing how the contrast medium
propagates through the various arteries in the area being observed.
a b
c d
Fig. 5 Angiography subtraction
example
(a) mask image; (b) live image ;
(c) difference between (a) and (b);
(d) - image (c) enhanced
g(x,y) = f(x,y) · h(x,y)
f(x,y) - the perfect image , h(x,y) - the shading function
When the shading function is known:
f(x,y) = g(x,y) / h(x,y)
When h(x,y) is unknown but we have access to the imaging system, we can obtain an
approximation to the shading function by imaging a target of constant intensity. When the
sensor is not available, often the shading pattern can be estimated from the image.
Fig. 6 Shading correction (a) Shaded image of a tungsten filament, magnified 130× ; (b) - shading pattern ; (c) - corrected image
Another use of image multiplication is in masking, also called region of interest (ROI),
operations. The process consists of multiplying a given image by a mask image that has
1s (white) in the ROI and 0s elsewhere. There can be more than one ROI in the mask
image, and the shape of the ROI can be arbitrary, but it is usually rectangular.
Fig. 7 (a) digital dental X-ray image; (b) - ROI mask for teeth with fillings; (c) product of (a) and (b)
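ROI masking is a single array multiplication (a minimal sketch; the image and rectangle are arbitrary placeholders):

```python
import numpy as np

img = np.arange(25, dtype=np.uint8).reshape(5, 5)

# ROI mask: 1s (white) inside a rectangular region of interest, 0s elsewhere
mask = np.zeros_like(img)
mask[1:4, 1:4] = 1

roi = img * mask   # pixels outside the ROI become 0 (black)
```

Several disjoint ROIs are handled the same way: set more 1-regions in the same mask.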
In practice, most images are displayed using 8 bits, so the image values are in the range
[0,255].
For TIFF and JPEG images, conversion to this range is automatic; the conversion depends on
the system used.
The difference of two images can produce an image with values in the range [-255,255];
the addition of two images, values in the range [0,510].
Many software packages simply set the negative values to 0 and set to 255 all values
greater than 255.
A more appropriate procedure: compute
f_m = f - min(f)
which creates an image whose minimum value is 0, then perform the operation:
f_s = K · f_m / max(f_m)
which yields a scaled image f_s with values in the full range [0, K] (K = 255 for 8-bit images).
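The two-step scaling procedure translates directly (a short sketch; `scale_intensity` is an illustrative name, and the difference image below is a made-up example):

```python
import numpy as np

def scale_intensity(f, K=255):
    """Shift to a zero minimum, then scale so the maximum equals K."""
    fm = f - f.min()          # f_m = f - min(f): minimum becomes 0
    return K * fm / fm.max()  # f_s = K * f_m / max(f_m): values in [0, K]

diff = np.array([[-255.0, 0.0], [128.0, 255.0]])  # e.g. a difference of two 8-bit images
scaled = scale_intensity(diff)
```

Unlike clipping, this preserves the relative ordering and spacing of all values.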
Spatial Operations
- are performed directly on the pixels of a given image.
There are three categories of spatial operations:
single-pixel operations
neighborhood operations
geometric spatial transformations
Single-pixel operations
- change the values of intensity for the individual pixels
s T (z)
where z is the intensity of a pixel in the original image and s is the intensity of the
corresponding pixel in the processed image. Fig. 2.34 shows the transformation used to
obtain the negative of an 8-bit image
Intensity transformation
function for the
complement of an 8-bit
image
Neighborhood operations
Let S_xy denote the set of coordinates of a neighborhood centered on an arbitrary point (x,y)
in an image f. Neighborhood processing generates a new intensity level at point (x,y)
based on the values of the intensities of the points in S_xy. For example, if S_xy is a
rectangular neighborhood of size m × n centered at (x,y), we can assign the new value of
intensity by computing the average value of the pixels in S_xy:

g(x,y) = (1/(m·n)) Σ_{(r,c) ∈ S_xy} f(r,c)

The net effect is to perform local blurring in the original image. This type of process is
used, for example, to eliminate small details and thus render blobs corresponding to the
largest regions of an image.
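The neighborhood average can be sketched as below (edge-replication padding is an assumption; the notes do not specify how borders are handled):

```python
import numpy as np

def local_average(f, m, n):
    """g(x,y) = (1/(m*n)) * sum of f over an m-by-n neighborhood S_xy."""
    padded = np.pad(f.astype(float), ((m // 2,), (n // 2,)), mode='edge')
    g = np.zeros_like(f, dtype=float)
    for dx in range(m):           # accumulate each shifted copy of the image
        for dy in range(n):
            g += padded[dx:dx + f.shape[0], dy:dy + f.shape[1]]
    return g / (m * n)            # divide by the neighborhood size
```

A constant image is unchanged, while an isolated bright pixel is spread over its m×n neighborhood, which is the local blurring described above.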
Geometric spatial transformations
(x,y) = T[(v,w)]
(v,w) - pixel coordinates in the original image
(x,y) - pixel coordinates in the transformed image
Example: T[(v,w)] = (v/2, w/2) shrinks the original image to half its size in both spatial
directions
Affine transform

                                      [ t11 t12 0 ]
[x, y, 1] = [v, w, 1]·T = [v, w, 1] · [ t21 t22 0 ]          (AT)
                                      [ t31 t32 1 ]

x = t11·v + t21·w + t31
y = t12·v + t22·w + t32

This transform can scale, rotate, translate, or shear a set of coordinate points, depending
on the elements of the matrix T. If we want to resize an image, rotate it, and move the
result to some location, we simply form a 3x3 matrix equal to the matrix product of the
scaling, rotation, and translation matrices from Table 1.
Affine transformations
forward mapping: scanning the pixels of the input image and, at each location (v,w),
computing the spatial location (x,y) of the corresponding pixel in the output image using (AT)
directly;
Problems:
- intensity assignment when 2 or more pixels in the original image are transformed to the
same location in the output image,
- some output locations have no correspondent in the original image (no intensity
assignment)
inverse mapping: scans the output pixel locations and, at each location (x,y),
computes the corresponding location in the input image:
(v,w) = T⁻¹[(x,y)]
It then interpolates among the nearest input pixels to determine the intensity of the output
pixel value.
Inverse mappings are more efficient to implement than forward mappings and are used in
numerous commercial implementations of spatial transformations (MATLAB, for example).
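For a pure scaling transform, inverse mapping can be sketched as below (nearest-neighbor interpolation and the function name are choices made for this sketch, not prescribed by the notes):

```python
import numpy as np

def scale_inverse_mapping(f, sx, sy):
    """Scale an image by (sx, sy) via inverse mapping: for each output pixel (x, y),
    sample the input at (v, w) = (x/sx, y/sy), rounded to the nearest pixel."""
    h, w = f.shape
    out_h, out_w = int(h * sx), int(w * sy)
    x = np.arange(out_h)
    y = np.arange(out_w)
    v = np.clip(np.round(x / sx).astype(int), 0, h - 1)      # inverse-map rows
    w_idx = np.clip(np.round(y / sy).astype(int), 0, w - 1)  # inverse-map columns
    return f[v[:, None], w_idx]
```

Every output pixel gets exactly one intensity, so neither of the two forward-mapping problems (collisions, holes) can occur.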
Image registration with tie points - a bilinear model:
x = c1·v + c2·w + c3·v·w + c4
y = c5·v + c6·w + c7·v·w + c8
(v,w) and (x,y) are the coordinates of the tie points (we get an 8x8 linear system for {ci})
When 4 tie points are insufficient to obtain satisfactory registration, an approach used
frequently is to select a larger number of tie points and, using this new set of tie points,
subdivide the image into rectangular regions marked by groups of 4 tie points. On the
subregions marked by 4 tie points we apply the transformation model described above.
The number of tie points and the sophistication of the model required to solve the
registration problem depend on the severity of the geometric distortion.
(a) - reference image
(b) - geometrically distorted image
(c) - registered image
(d) - difference between (a) and (c)
Probabilistic Methods

p(z_k) = n_k / MN , k = 0, 1, ..., L-1

n_k = the number of times that intensity z_k occurs in the image (MN is the total number of
pixels in the image)

Σ_{k=0..L-1} p(z_k) = 1

mean:      m = Σ_{k=0..L-1} z_k · p(z_k)
variance:  σ² = Σ_{k=0..L-1} (z_k - m)² · p(z_k)

The variance is a measure of the spread of the values of z about the mean, so it is a
measure of image contrast. Usually, for measuring image contrast the standard deviation
(σ) is used.
The n-th moment of a random variable z about the mean is defined as:

μ_n(z) = Σ_{k=0..L-1} (z_k - m)^n · p(z_k)

( μ_0(z) = 1 , μ_1(z) = 0 , μ_2(z) = σ² )
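The mean and variance follow directly from the normalized histogram (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def histogram_stats(img, L=256):
    """Mean and variance of intensities from the normalized histogram p(z_k) = n_k / MN."""
    nk = np.bincount(img.ravel(), minlength=L)   # n_k: occurrences of each intensity
    p = nk / img.size                            # p(z_k), sums to 1
    z = np.arange(L)
    m = np.sum(z * p)                            # m = sum z_k p(z_k)
    var = np.sum((z - m) ** 2 * p)               # sigma^2 = sum (z_k - m)^2 p(z_k)
    return m, var
```

For a two-level image (half 0s, half 255s) the mean is 127.5 and the standard deviation is maximal, matching the interpretation of σ as contrast.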
Fig. 1 (a) Low contrast
- spatial filtering: the operator T (the neighborhood and the operation applied on it) is
called a spatial filter (spatial mask, kernel, template or window)
Fig. 2
Intensity transformation functions.
left - contrast stretching
right - thresholding function
Figure 2 left - T produces an output image of higher contrast than the original, by
darkening the intensity levels below k and brightening the levels above k; this technique
is called contrast stretching.
Figure 2 right - T produces a binary output image. A mapping of this form is called a
thresholding function.
Image negatives
s = T(r) = L - 1 - r
- equivalent of a photographic negative
- technique suited for enhancing white or gray detail embedded in dark regions of an
image
Fig. 3
Left - original digital mammogram
Right - negative transformed image
Log Transformations
s = T(r) = c·log(1 + r) , c - constant , r ≥ 0
This transformation maps a narrow range of low intensity values in the input into a wider
range. An operator of this type is used to expand the values of dark pixels in an image
while compressing the higher-level values. The opposite is true for the inverse log
transformation. The log functions compress the dynamic range of images with large
variations in pixel values.
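The transformation is a one-liner (a small sketch; choosing c so that an 8-bit input range maps back onto [0, 255] is an assumption of this sketch, not part of the definition):

```python
import numpy as np

def log_transform(r, c=1.0):
    """s = c * log(1 + r): expands dark values, compresses bright ones."""
    return c * np.log1p(r.astype(float))

# scale c so the output of an 8-bit image still spans [0, 255]
r = np.arange(256, dtype=np.uint8)
s = log_transform(r, c=255.0 / np.log(256.0))
```

The gain is visibly asymmetric: the first few input levels are spread over a wide output range, while the brightest levels are packed together.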
ab
(a) Fourier spectrum
(b) log transformation
applied to (a), c=1
Fig. 4 - power-law (gamma) transformation curves: s = c·r^γ
Power-law curves with γ < 1 map a narrow range of dark input values into a wider range
of output values, with the opposite being true for higher input values. The
curves with γ > 1 have the opposite effect of those generated with γ < 1.
c = γ = 1 gives the identity transformation.
A variety of devices used for image capture, printing, and display respond according to a
power law. The process used to correct these power-law response phenomena is called
gamma correction.
(a) - aerial image
(b)-(d) - results of applying the gamma transformation with c = 1 and
γ = 3.0, 4.0 and 5.0, respectively
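A sketch of the power-law transformation (normalizing intensities to [0, 1] before applying the exponent is a common convention assumed here, not stated in the notes):

```python
import numpy as np

def gamma_transform(r, gamma, c=1.0, L=256):
    """Power-law transformation s = c * r^gamma on intensities normalized to [0, 1]."""
    rn = r.astype(float) / (L - 1)       # normalize to [0, 1]
    return c * (L - 1) * rn ** gamma     # apply power law, rescale to [0, L-1]

r = np.arange(256, dtype=np.uint8)
dark_boost = gamma_transform(r, 0.4)   # gamma < 1: brightens dark values
bright_cut = gamma_transform(r, 3.0)   # gamma > 1: darkens the image
```

With γ = 3.0, mid-gray inputs are pushed far toward black, which is the washed-out-image correction shown in the aerial example.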
Contrast stretching - a piecewise-linear transformation through the control points
(r1, s1) and (r2, s2):

T(r) = (s1/r1)·r                                     , r ∈ [0, r1]
T(r) = [ s1·(r2 - r) + s2·(r - r1) ] / (r2 - r1)     , r ∈ [r1, r2]
T(r) = [ s2·(L-1-r) + (L-1)·(r - r2) ] / (L-1-r2)    , r ∈ [r2, L-1]

For linear stretching, (r1, s1) = (r_min, 0) and (r2, s2) = (r_max, L-1),
where rmin and rmax denote the minimum and maximum gray levels
in the image, respectively. Thus, the transformation function stretched the levels linearly
from their original range to the full range [0, L-1].
Figure 5(d) - the thresholding function was used r1 , s1 m ,0 , r2 , s2 m , L 1
where m is the mean gray level in the image.
The original image on which these results are based is a scanning electron microscope
image of pollen, magnified approximately 700 times.
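The piecewise-linear stretch is exactly linear interpolation through the control points, so it can be sketched with `np.interp` (the particular (r1, s1), (r2, s2) values below are illustrative, not from the notes):

```python
import numpy as np

def contrast_stretch(r, r1, s1, r2, s2, L=256):
    """Piecewise-linear transformation through control points (r1, s1) and (r2, s2);
    (r1, s1) = (r2, s2) would degenerate toward a thresholding function."""
    return np.interp(r.astype(float), [0, r1, r2, L - 1], [0, s1, s2, L - 1])

r = np.arange(256, dtype=np.uint8)
stretched = contrast_stretch(r, r1=80, s1=20, r2=170, s2=235)  # steepens mid-range slope
```

Between r1 and r2 the slope is (s2 - s1)/(r2 - r1) > 1, which is where the contrast gain comes from.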
Intensity-level slicing
- highlighting a specific range of intensities in an image
There are two approaches to intensity-level slicing:
1. display in one value (white, for example) all the values in the range of interest and in
another (say, black) all other intensities (Figure 3.11 (a))
2. brighten (or darken) the desired range of intensities but leave unchanged all other
intensities in the image (Figure 3.11 (b)).
Figure 6 (left) - aortic angiogram near the kidney. The purpose of intensity slicing is to
highlight the major blood vessels, which appear brighter as a result of injecting a contrast
medium. Figure 6 (middle) shows the result of applying technique 1 for a band near the
top of the scale of intensities. This type of enhancement produces a binary image, which is
useful for studying the shape of the flow of the contrast substance (to detect blockages).
In Figure 6 (right) the second technique was used: a band of intensities in the mid-gray
range around the mean intensity was set to black; the other intensities remain unchanged.
Bit-plane slicing
For an 8-bit image, f(x,y) is a number in [0,255], with an 8-bit representation in base 2.
This technique highlights the contribution made to the whole image appearance by each
of the bits. An 8-bit image may be considered as being composed of eight 1-bit planes
(plane 1 - the lowest-order bit, plane 8 - the highest-order bit).
The binary image for the 8th bit plane of an 8-bit image can be obtained by processing
the input image with a threshold intensity transformation function that maps all the
intensities between 0 and 127 to 0 and maps all levels between 128 and 255 to 1.
The bit-slicing technique is useful for analyzing the relative importance of each bit in the
image; it helps in determining the proper number of bits to use when quantizing the image.
The technique is also useful for image compression.
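Bit planes are a shift-and-mask away (a minimal sketch; the 2×2 test image is a made-up example):

```python
import numpy as np

def bit_plane(img, k):
    """Binary image of bit plane k (k = 0: lowest-order bit, k = 7: highest for 8-bit)."""
    return (img >> k) & 1

img = np.array([[0, 127], [128, 255]], dtype=np.uint8)
top = bit_plane(img, 7)   # same as thresholding: 0..127 -> 0, 128..255 -> 1
```

Extracting plane 7 reproduces exactly the 0..127 / 128..255 thresholding described above.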
Histogram processing
The histogram of a digital image with intensity levels in [0, L-1] is:
h(r_k) = n_k , k = 0, 1, ..., L-1
r_k - the k-th intensity level
n_k - the number of pixels in the image with intensity r_k
The normalized histogram:
p(r_k) = n_k / MN , k = 0, 1, ..., L-1
Σ_{k=0..L-1} p(r_k) = 1
Histogram Equalization
- determine a transformation function that seeks to produce an output image that has a
uniform histogram
s = T(r) , 0 ≤ r ≤ L-1
(a) T(r) monotonically increasing
(b) 0 ≤ T(r) ≤ L-1 for 0 ≤ r ≤ L-1
Condition (a) guarantees that the output preserves the order of the input intensity values
(no intensity reversals)
Relation (b) requires that both input and output images have the same range of intensities

s_k = T(r_k) = (L-1) Σ_{j=0..k} p_r(r_j) = ((L-1)/MN) Σ_{j=0..k} n_j
The output image is obtained by mapping each pixel in the input image with intensity rk
into a corresponding pixel with intensity sk in the output image.
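The whole procedure is a cumulative sum plus a lookup table (a short sketch; it reproduces the 3-bit worked example that follows):

```python
import numpy as np

def equalize(img, L=256):
    """Histogram equalization: s_k = round((L-1) * sum_{j<=k} p_r(r_j))."""
    nk = np.bincount(img.ravel(), minlength=L)    # n_k
    cdf = np.cumsum(nk) / img.size                # cumulative p_r
    s = np.round((L - 1) * cdf).astype(np.uint8)  # lookup table r_k -> s_k
    return s[img]                                 # map every pixel
```

On the 64×64, 3-bit example below this yields exactly the mapped levels {1, 3, 5, 6, 7}.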
Consider the following example: 3-bit image (L=8), 64×64 image (M=N=64, MN=4096)
Intensity distribution and histogram values for a 3-bit 64×64 digital image

s0 = T(r0) = 7·p_r(r0) = 1.33
s1 = T(r1) = 7·[p_r(r0) + p_r(r1)] = 3.08
s2 = 4.55 , s3 = 5.67 , s4 = 6.23 , s5 = 6.65 , s6 = 6.86 , s7 = 7.00

Rounding to the nearest integer:
s0 = 1.33 → 1    s4 = 6.23 → 6
s1 = 3.08 → 3    s5 = 6.65 → 7
s2 = 4.55 → 5    s6 = 6.86 → 7
s3 = 5.67 → 6    s7 = 7.00 → 7
s_k = T(r_k) = (L-1) Σ_{j=0..k} p_r(r_j) = ((L-1)/MN) Σ_{j=0..k} n_j , k = 0, 1, ..., L-1   (1)

G(z_q) = (L-1) Σ_{i=0..q} p_z(z_i) , q = 0, 1, ..., L-1   (2)
Histogram-specification procedure:
1) Compute the histogram pr(r) of the input image, and compute the histogram
equalization transformation (1). Round the resulting values sk to integers in [0, L-1]
2) Compute all values of the transformation function G using relation (2), where pz(zi)
are the values of the specified histogram. Round the values G(zq) to integers in the
range [0, L-1] and store these values in a table
3) For every value of sk ,k=0,1,,L-1 use the table for the values of G to find the
corresponding value of zq so that G(zq) is closest to sk and store these mappings
from s to z. When more than one value of zq satisfies the property (i.e., the mapping
is not unique), choose the smallest value by convention.
4) Form the histogram-specified image by first histogram-equalizing the input image
and then mapping every equalized pixel value, sk , of this image to the corresponding
value zq in the histogram-specified image using the mappings found at step 3).
The intermediate step of equalizing the input image can be skipped by combining the
two transformation functions T and G-1.
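The four steps can be sketched directly; the input counts and the specified histogram p_z below are the ones used in the worked example in the notes.

```python
# Sketch of the histogram-specification procedure (steps 1-4 above)
# for the 3-bit worked example from the notes.

def hist_specify(counts, p_z, L):
    MN = sum(counts)
    s, cum = [], 0
    for nk in counts:                       # step 1: equalize the input
        cum += nk
        s.append(round((L - 1) * cum / MN))
    G, acc = [], 0.0
    for pz in p_z:                          # step 2: transform of p_z
        acc += pz
        G.append(round((L - 1) * acc))
    z = []
    for sk in s:                            # step 3: invert G (closest value,
        best = min(range(L), key=lambda q: (abs(G[q] - sk), q))  # smallest q on ties)
        z.append(best)
    return z                                # step 4: r_k maps to z[k]

counts = [790, 1023, 850, 656, 329, 245, 122, 81]
p_z = [0.00, 0.00, 0.00, 0.15, 0.20, 0.30, 0.20, 0.15]
print(hist_specify(counts, p_z, L=8))       # -> [3, 4, 5, 6, 6, 7, 7, 7]
```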
Reconsider the above example:
Fig. 9
Figure 9(a) shows the histogram of the original image. Figure 9 (b) is the new histogram
to achieve.
The first step is to obtain the scaled histogram-equalized values:
    s_0 = 1 ,  s_1 = 3 ,  s_2 = 5 ,  s_3 = 6 ,  s_4 = 6 ,  s_5 = 7 ,  s_6 = 7 ,  s_7 = 7
The results of performing step 3) of the procedure are summarized in the next table:
In the last step of the algorithm, we use the mappings in the above table to map every
pixel in the histogram equalized image into a corresponding pixel in the newly-created
histogram-specified image. The values of the resulting histogram are listed in the third
column of Table 3.2, and the histogram is sketched in Figure 9(d).
    r_k        n_k      equalized s_k    specified z_q
    r_0 = 0    790      s_0 = 1          z = 3
    r_1 = 1    1023     s_1 = 3          z = 4
    r_2 = 2    850      s_2 = 5          z = 5
    r_3 = 3    656      s_3 = 6          z = 6
    r_4 = 4    329      s_4 = 6          z = 6
    r_5 = 5    245      s_5 = 7          z = 7
    r_6 = 6    122      s_6 = 7          z = 7
    r_7 = 7    81       s_7 = 7          z = 7
Statistics from the histogram. The n-th central moment of the intensities r_i with
histogram p(r_i) is
    mu_n(r) = sum_{i=0}^{L-1} (r_i - m)^n p(r_i)
where m is the mean intensity. In particular,
    sigma^2(r) = sum_{i=0}^{L-1} (r_i - m)^2 p(r_i) ,  a measure of contrast.
Equivalently, computed directly from the pixels of an M x N image f:
    m = (1/MN) sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x,y)
    sigma^2 = (1/MN) sum_{x=0}^{M-1} sum_{y=0}^{N-1} [ f(x,y) - m ]^2
Spatial Filtering
The name filter is borrowed from frequency domain processing, where filtering means
accepting (passing) or rejecting certain frequency components. Filters that pass low
frequency are called lowpass filters. A lowpass filter has the effect of blurring
(smoothing) an image. The filters are also called masks, kernels, templates or windows.
If the operation performed on the image pixels is linear, the filter is called a linear
spatial filter; otherwise, the filter is nonlinear.
For a 3x3 mask, the response at (x,y) is
    g(x,y) = w(-1,-1) f(x-1,y-1) + w(-1,0) f(x-1,y) + ... + w(0,0) f(x,y) + ... + w(1,1) f(x+1,y+1)
For a mask of size m x n, we assume m = 2a+1 and n = 2b+1, where a and b are positive
integers. The general expression of linear spatial filtering of an image of size M x N with
a filter of size m x n is:
    g(x,y) = sum_{s=-a}^{a} sum_{t=-b}^{b} w(s,t) f(x+s, y+t)
Linear spatial filtering computes a sum of products at each location. Convolution is
similar to correlation, except that the filter is first rotated by 180 degrees.
Correlation:
    w(x,y) corr f(x,y) = sum_{s=-a}^{a} sum_{t=-b}^{b} w(s,t) f(x+s, y+t)
Convolution:
    w(x,y) * f(x,y) = sum_{s=-a}^{a} sum_{t=-b}^{b} w(s,t) f(x-s, y-t)
A function that contains a single 1 and the rest being 0s is called a discrete unit
impulse. Correlating a function with a discrete unit impulse produces a rotated
version of the filter at the location of the impulse.
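The rotation property is easy to verify numerically. The sketch below implements correlation and convolution directly from their defining sums (the 3x3 mask values are arbitrary, chosen only for illustration).

```python
# Correlating a filter with a discrete unit impulse yields a
# 180-degree-rotated copy of the filter; convolution reproduces it.
import numpy as np

def correlate(f, w):
    a, b = w.shape[0] // 2, w.shape[1] // 2
    fp = np.pad(f, ((a, a), (b, b)))          # zero padding
    g = np.zeros_like(f, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            g[x, y] = np.sum(w * fp[x:x + 2*a + 1, y:y + 2*b + 1])
    return g

def convolve(f, w):
    return correlate(f, w[::-1, ::-1])        # rotate the mask by 180 degrees

w = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)
f = np.zeros((5, 5)); f[2, 2] = 1.0           # discrete unit impulse
print(correlate(f, w)[1:4, 1:4])              # rotated copy of w
print(convolve(f, w)[1:4, 1:4])               # w itself
```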
Linear filters also appear in the DIP literature under the names convolution filter,
convolution mask, or convolution kernel.
The response of a linear filter can also be written as
    R = w_1 z_1 + w_2 z_2 + ... + w_{mn} z_{mn} = sum_{k=1}^{mn} w_k z_k = w^T z
where the w's are the coefficients of an m x n filter and the z's are the corresponding
image intensities encompassed by the filter. For a 3x3 mask:
    R = w_1 z_1 + w_2 z_2 + ... + w_9 z_9 = sum_{k=1}^{9} w_k z_k = w^T z ,  w, z in R^9
The process of replacing the value of every pixel in an image by the average of the
intensity levels in the neighborhood defined by the filter mask produces an image with
reduced sharp transitions in intensities. Random noise is usually characterized by such
sharp transitions in intensity levels, so smoothing linear filters are applied for noise
reduction.
The problem is that edges are also characterized by sharp intensity transitions, so
averaging filters have the undesirable effect that they blur edges.
A major use of averaging filters is the reduction of irrelevant detail in an image (pixel
regions that are small with respect to the size of the filter mask).
There is the possibility of using a weighted average: the pixels are multiplied by
different coefficients, giving more importance (weight) to some pixels at the expense of
others.
A general weighted averaging filter of size m x n (m and n odd) for an M x N image is
given by the expression:
    g(x,y) = [ sum_{s=-a}^{a} sum_{t=-b}^{b} w(s,t) f(x+s, y+t) ] / [ sum_{s=-a}^{a} sum_{t=-b}^{b} w(s,t) ]
    x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1
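A minimal sketch of this normalized weighted average; the [[1,2,1],[2,4,2],[1,2,1]] mask is an assumed example (a common center-weighted choice), not a mask prescribed by the notes.

```python
# Weighted-average smoothing with the normalized formula above.
import numpy as np

def weighted_average(f, w):
    a, b = w.shape[0] // 2, w.shape[1] // 2
    fp = np.pad(f, ((a, a), (b, b)), mode="edge")   # replicate borders
    g = np.empty_like(f, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            win = fp[x:x + 2*a + 1, y:y + 2*b + 1]
            g[x, y] = np.sum(w * win) / np.sum(w)   # normalized response
    return g

w = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)  # assumed mask
f = np.full((5, 5), 10.0); f[2, 2] = 100.0          # one noisy pixel
print(weighted_average(f, w)[2, 2])                 # pulled toward 10
```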
(a) original image, 500x500
(b)-(f) results of smoothing with square averaging filters of size m = 3, 5, 9, 15, and
35, respectively. The black squares at the top are of size 3, 5, 9, 15, 25, 35, 45, and 55
pixels. The letters at the bottom range in size from 10 to 24 points. The vertical bars
are 5 pixels wide and 100 pixels high, separated by 20 pixels. The diameter of the
circles is 25 pixels, and their borders are 15 pixels apart. The noisy rectangles are
50x120 pixels.
Left: image from the Hubble Space Telescope, 528x485. Middle: image filtered with a
15x15 averaging mask. Right: result of thresholding the middle image.
The median filter is particularly effective against impulse noise (also called
salt-and-pepper noise). The median, xi, of a set of values is such that half the values in
the set are less than or equal to xi, and half are greater than or equal to xi. For a 3x3
neighborhood with intensity values (10, 15, 20, 20, 30, 20, 20, 25, 100) the median is
xi = 20.
(a) X-ray image of a circuit board corrupted by salt-and-pepper noise; (b) noise
reduction with a 3x3 averaging filter; (c) noise reduction with a 3x3 median filter.
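The median computation for the neighborhood quoted above can be sketched in pure Python:

```python
# Median of a neighborhood; the outlier 100 does not influence the result.
def median(values):
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

neighborhood = [10, 15, 20, 20, 30, 20, 20, 25, 100]
print(median(neighborhood))  # -> 20
```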
Sharpening Spatial Filters
First- and second-order derivatives of a 1-D function, in terms of differences:
    df/dx = f(x+1) - f(x)
    d^2 f/dx^2 = f(x+1) - 2 f(x) + f(x-1)
The Laplacian of a 2-D image:
    nabla^2 f = d^2 f/dx^2 + d^2 f/dy^2
    d^2 f/dx^2 = f(x+1,y) - 2 f(x,y) + f(x-1,y)
    d^2 f/dy^2 = f(x,y+1) - 2 f(x,y) + f(x,y-1)
so that
    nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)
Blurred image of the North Pole of the Moon; Laplacian-filtered image.
Unsharp Masking and Highboost Filtering
Let f_blur(x,y) denote a blurred version of f(x,y) and define the mask
    g_mask(x,y) = f(x,y) - f_blur(x,y)
The sharpened image is
    g(x,y) = f(x,y) + k g_mask(x,y)
    k = 1: unsharp masking ,  k > 1: highboost filtering
The Gradient
    M(x,y) = sqrt( g_x^2 + g_y^2 )  (the gradient magnitude; as an image, it is not isotropic)
    M(x,y) approx |g_x| + |g_y|  (simpler to compute; also not isotropic)
Sobel operators:
    g_x = [ f(x-1,y+1) + 2 f(x,y+1) + f(x+1,y+1) ] - [ f(x-1,y-1) + 2 f(x,y-1) + f(x+1,y-1) ]
    g_y = [ f(x+1,y-1) + 2 f(x+1,y) + f(x+1,y+1) ] - [ f(x-1,y-1) + 2 f(x-1,y) + f(x-1,y+1) ]
Complex Numbers
    C = R + iI ,  C = |C| e^{i theta}
Fourier Series
Let f(t) be a periodic function with period T ( f(t + T) = f(t) for all t ). Then
    f(t) = sum_{n=-inf}^{inf} c_n e^{i 2 pi n t / T}
with coefficients
    c_n = (1/T) integral_{-T/2}^{T/2} f(t) e^{-i 2 pi n t / T} dt ,  n = 0, +/-1, +/-2, ...
Impulses and Their Sifting Property
The continuous unit impulse:
    delta(t) = infinity if t = 0 ,  0 if t != 0 ,  satisfying  integral delta(t) dt = 1
Sifting property:
    integral f(t) delta(t - t_0) dt = f(t_0) ,  f continuous at t_0
The discrete unit impulse:
    delta(x) = 1 if x = 0 ,  0 if x != 0 ,  satisfying  sum_x delta(x) = 1
    sum_x f(x) delta(x) = f(0) ,  sum_x f(x) delta(x - x_0) = f(x_0)
The impulse train:
    s_dT(t) = sum_{n=-inf}^{inf} delta(t - n dT)
The Fourier Transform
    F(mu) = integral_{-inf}^{inf} f(t) e^{-i 2 pi mu t} dt
    f(t) = integral_{-inf}^{inf} F(mu) e^{i 2 pi mu t} d mu
Expanding the exponential,
    F(mu) = integral f(t) [ cos(2 pi mu t) - i sin(2 pi mu t) ] dt
The sinc function:
    sinc(x) = sin(pi x) / (pi x) ,  sinc(0) = 1
Examples:
    F{ delta(t) } = integral delta(t) e^{-i 2 pi mu t} dt = 1
    F{ delta(t - t_0) } = e^{-i 2 pi mu t_0} = cos(2 pi mu t_0) - i sin(2 pi mu t_0)
The impulse train has the Fourier series expansion
    s_dT(t) = (1/dT) sum_{n=-inf}^{inf} e^{i 2 pi n t / dT}
Convolution
    (f * h)(t) = integral_{-inf}^{inf} f(s) h(t - s) ds ,  f, h continuous functions
The convolution theorem:
    F{ f(t) * h(t) } = H(mu) F(mu)
    F{ f(t) h(t) } = H(mu) * F(mu)
Convolution in the frequency domain is analogous to multiplication in the spatial
domain, and vice versa.
Sampling
Sampling a continuous function with an impulse train:
    f~(t) = f(t) s_dT(t) = sum_{n=-inf}^{inf} f(t) delta(t - n dT) ,  f~(t) the sampled function
The value of each sample is
    f_k = integral f(t) delta(t - k dT) dt = f(k dT)
The Fourier transform of the sampled function is
    F~(mu) = F{ f~(t) } = F{ f(t) s_dT(t) } = ( F * S )(mu)
where S(mu) is the transform of the impulse train, and
    F~(mu) = ( F * S )(mu) = (1/dT) sum_{n=-inf}^{inf} F( mu - n/dT )
i.e., F~(mu) is an infinite, periodic sequence of copies of F(mu), with period 1/dT.
The Sampling Theorem
Consider the problem of establishing the conditions under which a
continuous function can be recovered uniquely from a set of its
samples.
A function f(t) is called band-limited if its Fourier transform is 0 outside the interval
[-mu_max, mu_max].
Recall that F~(mu) is continuous and periodic with period 1/dT. All we need is one
complete period to characterize the entire transform. This implies that we can recover
f(t) from that single period by using the inverse Fourier transform.
Extracting from F~(mu) a single period that is equal to F(mu) is possible if the
separation between copies is sufficient, i.e.,
    1/(2 dT) > mu_max ,  equivalently  1/dT > 2 mu_max
Sampling Theorem
A continuous, band-limited function can be recovered completely
from a set of its samples if the samples are acquired at a rate
exceeding twice the highest frequency content of the function.
The quantity 2 mu_max is called the Nyquist rate.
The function can be recovered from its samples with an ideal lowpass filter:
    H(mu) = dT for -mu_max <= mu <= mu_max ,  0 otherwise
    F(mu) = H(mu) F~(mu)
    f(t) = integral F(mu) e^{i 2 pi mu t} d mu
The Fourier Transform of Sampled Data
    F~(mu) = integral f~(t) e^{-i 2 pi mu t} dt
           = integral [ sum_n f(t) delta(t - n dT) ] e^{-i 2 pi mu t} dt
           = sum_{n=-inf}^{inf} f_n e^{-i 2 pi mu n dT}     (1)
What is the discrete version of F~(mu)? All we need to characterize one period of F~
are M equally spaced samples,
    mu = m / (M dT) ,  m = 0, 1, ..., M-1
Substituting into (1) gives the discrete Fourier transform (DFT):
    F_m = sum_{n=0}^{M-1} f_n e^{-i 2 pi m n / M} ,  m = 0, 1, ..., M-1     (2)
with the inverse
    f_n = (1/M) sum_{m=0}^{M-1} F_m e^{i 2 pi m n / M} ,  n = 0, 1, ..., M-1
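Relation (2) can be implemented directly and checked against a library FFT; the sketch below uses numpy.

```python
# Direct implementation of the DFT (2), verified against numpy's FFT.
import numpy as np

def dft(f):
    M = len(f)
    n = np.arange(M)
    W = np.exp(-2j * np.pi * np.outer(n, n) / M)   # W[m, n] = e^{-i 2 pi m n / M}
    return W @ f

rng = np.random.default_rng(0)
f = rng.standard_normal(8)
print(np.allclose(dft(f), np.fft.fft(f)))  # -> True
```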
Extension to Two Variables
The 2-D continuous impulse:
    delta(t,z) = infinity if t = z = 0 ,  0 otherwise ,  with  integral integral delta(t,z) dt dz = 1
Sifting property:
    integral integral f(t,z) delta(t,z) dt dz = f(0,0)
    integral integral f(t,z) delta(t - t_0, z - z_0) dt dz = f(t_0, z_0)
Discrete case:
    delta(x,y) = 1 if x = y = 0 ,  0 otherwise
    sum_x sum_y f(x,y) delta(x,y) = f(0,0)
    sum_x sum_y f(x,y) delta(x - x_0, y - y_0) = f(x_0, y_0)
The 2-D continuous Fourier transform pair:
    F(mu, nu) = integral integral f(t,z) e^{-i 2 pi (mu t + nu z)} dt dz
    f(t,z) = integral integral F(mu, nu) e^{i 2 pi (mu t + nu z)} d mu d nu
2-D sampling uses a 2-D impulse train:
    s_dTdZ(t,z) = sum_m sum_n delta(t - m dT, z - n dZ)
and the 2-D sampling theorem requires
    1/dT > 2 mu_max  and  1/dZ > 2 nu_max
The 2-D discrete Fourier transform pair:
    F(u,v) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x,y) e^{-i 2 pi (ux/M + vy/N)}
    f(x,y) = (1/MN) sum_{u=0}^{M-1} sum_{v=0}^{N-1} F(u,v) e^{i 2 pi (ux/M + vy/N)} ,
    x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1
Translation properties:
    f(x,y) e^{i 2 pi (u_0 x / M + v_0 y / N)}  <=>  F(u - u_0, v - v_0)
    f(x - x_0, y - y_0)  <=>  F(u,v) e^{-i 2 pi (u x_0 / M + v y_0 / N)}
Rotation. Using the polar coordinates
    x = r cos(theta) ,  y = r sin(theta) ,  u = omega cos(phi) ,  v = omega sin(phi)
rotating f(x,y) by an angle theta_0 rotates its Fourier transform F by the same angle:
    f(r, theta + theta_0)  <=>  F(omega, phi + theta_0)
Periodicity
    F(u,v) = F(u + k_1 M, v) = F(u, v + k_2 N) = F(u + k_1 M, v + k_2 N)
    f(x,y) = f(x + k_1 M, y) = f(x, y + k_2 N) = f(x + k_1 M, y + k_2 N) ,  k_1, k_2 integers
Multiplying by (-1)^{x+y} shifts the transform:
    f(x,y) (-1)^{x+y}  <=>  F(u - M/2, v - N/2)
This last relation shifts the data so that F(0,0) is at the center of the frequency
rectangle defined by the intervals [0, M-1] and [0, N-1].
Symmetry Properties
Odd and even parts of a function:
    w(x,y) = w_e(x,y) + w_o(x,y)
    w_e(x,y) = [ w(x,y) + w(-x,-y) ] / 2 ,  w_e(x,y) = w_e(-x,-y)  (symmetric)
    w_o(x,y) = [ w(x,y) - w(-x,-y) ] / 2 ,  w_o(x,y) = -w_o(-x,-y)  (antisymmetric)
The product of an even and an odd function sums to zero:
    sum_{x=0}^{M-1} sum_{y=0}^{N-1} w_e(x,y) w_o(x,y) = 0
Fourier Spectrum and Phase Angle
    F(u,v) = R(u,v) + i I(u,v) = |F(u,v)| e^{i phi(u,v)}
    phi(u,v) = arctan[ I(u,v) / R(u,v) ]  is the phase angle
    P(u,v) = |F(u,v)|^2 = R^2(u,v) + I^2(u,v)  -  the power spectrum
For a real image f:
    |F(u,v)| = |F(-u,-v)| ,  phi(u,v) = -phi(-u,-v)
The DC term is proportional to the average intensity:
    F(0,0) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x,y) = MN f_bar ,
    f_bar = (1/MN) sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x,y) ,  so  |F(0,0)| = MN |f_bar|
The 2-D convolution theorem:
    f(x,y) * h(x,y)  <=>  F(u,v) H(u,v)
    f(x,y) h(x,y)  <=>  F(u,v) * H(u,v)
Summary of the 2-D DFT:
    F(u,v) = sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x,y) e^{-i 2 pi (ux/M + vy/N)}
    f(x,y) = (1/MN) sum_{u=0}^{M-1} sum_{v=0}^{N-1} F(u,v) e^{i 2 pi (ux/M + vy/N)} ,
    x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1
    F(0,0) = MN f_bar ,  f_bar = (1/MN) sum_{x=0}^{M-1} sum_{y=0}^{N-1} f(x,y)
    f(x - x_0, y - y_0)  <=>  F(u,v) e^{-i 2 pi (x_0 u / M + y_0 v / N)}
    f(r, theta + theta_0)  <=>  F(omega, phi + theta_0)
Figures: an image and its Fourier spectrum; the translated image and its spectrum; the
image rotated by 45 degrees and its spectrum; the "Woman" and "Rectangle" images
and their phase angles.
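The DC-term identity F(0,0) = MN f_bar can be confirmed numerically:

```python
# Check that F(0,0) equals MN times the mean intensity, using numpy's
# 2-D FFT on a small random image.
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((4, 6))                 # M = 4, N = 6
F = np.fft.fft2(f)
print(np.isclose(F[0, 0].real, f.size * f.mean()))  # -> True
```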
2-D circular convolution of two M x N arrays:
    f(x,y) * h(x,y) = sum_{m=0}^{M-1} sum_{n=0}^{N-1} f(m,n) h(x - m, y - n) ,
    x = 0, 1, ..., M-1 ,  y = 0, 1, ..., N-1
with the transform pairs
    f(x,y) * h(x,y)  <=>  F(u,v) H(u,v)
    f(x,y) h(x,y)  <=>  F(u,v) * H(u,v)
Zero Padding
Let f(x,y) and h(x,y) be two image arrays of size A x B and C x D pixels, respectively.
Wraparound error in their circular convolution is avoided by padding both with zeros:
    f_p(x,y) = f(x,y) for 0 <= x <= A-1 and 0 <= y <= B-1 ,  0 for A <= x <= P and B <= y <= Q
    h_p(x,y) = h(x,y) for 0 <= x <= C-1 and 0 <= y <= D-1 ,  0 for C <= x <= P and D <= y <= Q
where
    P >= A + C - 1  (P = 2M - 1 for two M x N arrays)
    Q >= B + D - 1  (Q = 2N - 1)
This process is called zero padding.
Filtering in the Frequency Domain
    g(x,y) = F^-1[ H(u,v) F(u,v) ]     (1)
Example: the filter
    H(u,v) = 0 at (u,v) = (M/2, N/2) (the centered origin) ,  1 elsewhere
sets F(0,0) = 0 in the Fourier spectrum while leaving the other terms untouched.
With F(u,v) = R(u,v) + i I(u,v),
    g(x,y) = F^-1[ H(u,v) R(u,v) + i H(u,v) I(u,v) ]
The phase angle is not altered by filtering in this way. Filters that affect the real and
the imaginary parts equally, and thus have no effect on the phase, are called
zero-phase-shift filters. Even small changes in the phase angle can have undesirable
effects on the filtered output.
5. Construct a real, symmetric filter H(u,v) of size P x Q with center at coordinates
(P/2, Q/2). Compute the array product G(u,v) = H(u,v) F(u,v).
6. Obtain the processed image:
    g_p(x,y) = real{ F^-1[ G(u,v) ] } (-1)^{x+y}
The real part is selected in order to ignore parasitic complex components resulting
from computational inaccuracies.
Spatial and frequency filters form a Fourier pair. Since
    f(x,y) = delta(x,y)  =>  F(u,v) = 1
we get
    g(x,y) = F^-1[ H(u,v) F(u,v) ] = F^-1[ H(u,v) ] = h(x,y) ,  h(x,y) <=> H(u,v)
h(x,y) is sometimes called the (finite) impulse response (FIR) of H(u,v).
Example of a 1-D Gaussian pair:
    H(u) = A e^{-u^2 / (2 sigma^2)}  <=>  h(x) = sqrt(2 pi) sigma A e^{-2 pi^2 sigma^2 x^2}
Image Smoothing Using Frequency Domain Filters
Three types of lowpass filters are considered: ideal, Butterworth, and Gaussian. All are
expressed in terms of the distance from the center of the (padded) P x Q frequency
rectangle:
    D(u,v) = [ (u - P/2)^2 + (v - Q/2)^2 ]^{1/2}
Ideal lowpass filter (ILPF):
    H(u,v) = 1 if D(u,v) <= D_0 ,  0 if D(u,v) > D_0
Butterworth lowpass filter (BLPF) of order n:
    H(u,v) = 1 / [ 1 + ( D(u,v)/D_0 )^{2n} ]
Gaussian lowpass filter (GLPF):
    H(u,v) = e^{ -D^2(u,v) / (2 D_0^2) }
Image Sharpening Using Frequency Domain Filters
A highpass filter is obtained from a given lowpass filter as H_HP(u,v) = 1 - H_LP(u,v).
Ideal highpass filter:
    H(u,v) = 0 if D(u,v) <= D_0 ,  1 if D(u,v) > D_0
Butterworth highpass filter of order n:
    H(u,v) = 1 / [ 1 + ( D_0/D(u,v) )^{2n} ]
Gaussian highpass filter:
    H(u,v) = 1 - e^{ -D^2(u,v) / (2 D_0^2) }
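A sketch of Gaussian lowpass filtering in the frequency domain (padding is omitted for brevity, so the filtering wraps around; the image and D_0 are assumed test values):

```python
# Gaussian lowpass filtering: H(u,v) = exp(-D^2(u,v) / (2 D0^2)).
import numpy as np

def gaussian_lowpass(f, D0):
    M, N = f.shape
    u = np.arange(M) - M / 2
    v = np.arange(N) - N / 2
    D2 = u[:, None]**2 + v[None, :]**2        # D^2(u,v) from the center
    H = np.exp(-D2 / (2 * D0**2))
    F = np.fft.fftshift(np.fft.fft2(f))       # center the spectrum
    g = np.fft.ifft2(np.fft.ifftshift(H * F))
    return g.real

f = np.zeros((32, 32)); f[12:20, 12:20] = 1.0
g = gaussian_lowpass(f, D0=4)
print(np.isclose(g.mean(), f.mean()))  # DC gain is 1: mean is preserved
```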
The Laplacian in the Frequency Domain
The Laplacian can be implemented with the filter
    H(u,v) = -4 pi^2 (u^2 + v^2)
or, with respect to the center of the P x Q frequency rectangle,
    H(u,v) = -4 pi^2 [ (u - P/2)^2 + (v - Q/2)^2 ] = -4 pi^2 D^2(u,v)
The Laplacian image is
    nabla^2 f(x,y) = F^-1[ H(u,v) F(u,v) ]
and enhancement is obtained with
    g(x,y) = f(x,y) - nabla^2 f(x,y)     (1)
or, as a single frequency-domain operation,
    g(x,y) = F^-1{ [ 1 + 4 pi^2 D^2(u,v) ] F(u,v) }     (2)
Unsharp Masking and Highboost Filtering in the frequency domain:
    g_mask(x,y) = f(x,y) - f_LP(x,y) ,  f_LP(x,y) = F^-1[ H_LP(u,v) F(u,v) ]
where H_LP(u,v) is a lowpass filter. Here f_LP(x,y) is a smoothed image analogous to
the blurred image used in the spatial domain.
    g(x,y) = f(x,y) + k g_mask(x,y)
    k = 1: unsharp masking ,  k > 1: highboost filtering
Equivalently,
    g(x,y) = F^-1{ [ 1 + k H_HP(u,v) ] F(u,v) }
and, more generally (high-frequency emphasis),
    g(x,y) = F^-1{ [ k_1 + k_2 H_HP(u,v) ] F(u,v) }
Homomorphic Filtering
An image can be expressed as the product of its illumination i(x,y) and reflectance
r(x,y):
    f(x,y) = i(x,y) r(x,y)
Because F{ f(x,y) } != F{ i(x,y) } F{ r(x,y) }, consider instead:
    z(x,y) = ln f(x,y) = ln i(x,y) + ln r(x,y)
Taking the Fourier transform of this relation we have:
    Z(u,v) = F_i(u,v) + F_r(u,v)
where F_i(u,v) and F_r(u,v) are the Fourier transforms of ln i(x,y) and ln r(x,y),
respectively.
We can filter Z(u,v) using a filter H(u,v) so that
S ( u, v ) H ( u, v ) Z ( u, v ) H ( u, v )Fi ( u, v ) H ( u, v )Fr ( u, v )
The filtered image in the spatial domain is:
    s(x,y) = F^-1[ S(u,v) ] = F^-1[ H(u,v) F_i(u,v) ] + F^-1[ H(u,v) F_r(u,v) ]
Define:
    i'(x,y) = F^-1[ H(u,v) F_i(u,v) ] ,  r'(x,y) = F^-1[ H(u,v) F_r(u,v) ]
Since z was obtained with a logarithm, the filtered image is recovered by exponentiation:
    g(x,y) = e^{s(x,y)} = e^{i'(x,y)} e^{r'(x,y)} = i_0(x,y) r_0(x,y)
with i_0(x,y) = e^{i'(x,y)} and r_0(x,y) = e^{r'(x,y)}.
Selective Filtering
There are applications in which it is of interest to process specific bands of frequencies
(bandreject or bandpass filters) or small regions of the frequency rectangle (notch
filters).
Ideal bandreject filter:
    H(u,v) = 0 if D_0 - W/2 <= D(u,v) <= D_0 + W/2 ,  1 otherwise
Butterworth bandreject filter of order n:
    H(u,v) = 1 / [ 1 + ( W D(u,v) / ( D^2(u,v) - D_0^2 ) )^{2n} ]
Gaussian bandreject filter:
    H(u,v) = 1 - e^{ -[ ( D^2(u,v) - D_0^2 ) / ( W D(u,v) ) ]^2 }
The corresponding bandpass filter is
    H_BP(u,v) = 1 - H_BR(u,v)
Notch Filters
A notch filter rejects (or passes) frequencies in a predefined
neighborhood about the center of the frequency rectangle.
Zero-phase-shift filters must be symmetric about the origin,
so a notch filter with center at (u0,v0) must have a
corresponding notch at location (-u0,-v0).
Notch reject filters are constructed as products of highpass
filters whose centers have been translated to the center of the
notches. The general form is:
    H_NR(u,v) = prod_{k=1}^{Q} H_k(u,v) H_{-k}(u,v)
where H_k(u,v) and H_{-k}(u,v) are highpass filters with centers at (u_k, v_k) and
(-u_k, -v_k), respectively.
These centers are specified with respect to the center of the frequency rectangle,
(M/2, N/2). The distance computations for each filter are made with
    D_k(u,v) = [ (u - M/2 - u_k)^2 + (v - N/2 - v_k)^2 ]^{1/2}
    D_{-k}(u,v) = [ (u - M/2 + u_k)^2 + (v - N/2 + v_k)^2 ]^{1/2}
For example, a Butterworth notch reject filter of order n with three notch pairs is
    H_NR(u,v) = prod_{k=1}^{3} [ 1 / ( 1 + ( D_0k / D_k(u,v) )^{2n} ) ] [ 1 / ( 1 + ( D_0k / D_{-k}(u,v) )^{2n} ) ]
The corresponding notch pass filter:
    H_NP(u,v) = 1 - H_NR(u,v)
One of the applications of notch filtering is for selectively
modifying local regions of the DFT. This type of processing
is done interactively, working directly on DFTs obtained
without padding.
Image Restoration
A degradation/restoration model:
    g(x,y) = H[ f(x,y) ] + eta(x,y)
Given g(x,y) and some knowledge about the degradation function H and the noise
eta(x,y), the objective is to obtain an estimate of the original image f(x,y).
Noise Models
When the degradation is noise alone (H = I):
    g(x,y) = f(x,y) + eta(x,y)
The main sources of noise in digital images arise during image acquisition and/or
transmission (environmental conditions, sensor quality, interference). Noise may be
considered a random variable, characterized by a probability density function (PDF).
Gaussian noise:
    p(z) = [ 1 / ( sqrt(2 pi) sigma ) ] e^{ -(z - z_bar)^2 / (2 sigma^2) }
Rayleigh noise:
    p(z) = (2/b) (z - a) e^{ -(z - a)^2 / b } for z >= a ,  0 for z < a
    mean z_bar = a + sqrt(pi b / 4) ,  variance sigma^2 = b (4 - pi) / 4
Erlang (gamma) noise (a > 0, b a positive integer):
    p(z) = [ a^b z^{b-1} / (b-1)! ] e^{-a z} for z >= 0 ,  0 for z < 0
    mean z_bar = b / a ,  variance sigma^2 = b / a^2
Exponential noise (Erlang with b = 1, a > 0):
    p(z) = a e^{-a z} for z >= 0 ,  0 for z < 0
    mean z_bar = 1 / a ,  variance sigma^2 = 1 / a^2
Uniform noise:
    p(z) = 1 / (b - a) for a <= z <= b ,  0 otherwise
    mean z_bar = (a + b) / 2 ,  variance sigma^2 = (b - a)^2 / 12
Impulse (salt-and-pepper) noise:
    p(z) = P_a for z = a ,  P_b for z = b ,  0 otherwise
Periodic noise arises typically from electrical or electromechanical interference during
image acquisition.
The noise parameters can be estimated from a strip S of (nearly constant) intensity in
the image, with histogram p_S(z_i):
    z_bar = sum_{i=0}^{L-1} z_i p_S(z_i)
    sigma^2 = sum_{i=0}^{L-1} (z_i - z_bar)^2 p_S(z_i)
Restoration in the Presence of Noise Only
    g(x,y) = f(x,y) + eta(x,y)     (1)
    G(u,v) = F(u,v) + N(u,v)     (2)
If the noise transform were known, the estimate would simply be
    f^(x,y) = F^-1[ G(u,v) - N_e(u,v) ]
but in general the noise is not known exactly, so spatial filtering is used instead.
Mean Filters
Arithmetic mean filter (S_xy is the m x n neighborhood centered at (x,y)):
    f^(x,y) = (1/mn) sum_{(s,t) in S_xy} g(s,t)
Geometric mean filter:
    f^(x,y) = [ prod_{(s,t) in S_xy} g(s,t) ]^{1/mn}
Harmonic mean filter:
    f^(x,y) = mn / sum_{(s,t) in S_xy} [ 1 / g(s,t) ]
The harmonic mean filter works well for salt noise, but fails for pepper noise. It also
works well on Gaussian noise.
Contraharmonic mean filter of order Q:
    f^(x,y) = sum_{(s,t) in S_xy} g(s,t)^{Q+1} / sum_{(s,t) in S_xy} g(s,t)^{Q}
A positive Q eliminates pepper noise; a negative Q eliminates salt noise.
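A 1-D sketch of the contraharmonic filter; with Q > 0 a dark (pepper) outlier is removed, as stated above. The strip values and Q are assumed test data.

```python
# Contraharmonic mean filter of order Q on a 1-D strip.
import numpy as np

def contraharmonic(g, m, Q):
    half = m // 2
    gp = np.pad(g.astype(float), half, mode="edge")
    out = np.empty_like(g, dtype=float)
    for x in range(len(g)):
        win = gp[x:x + m]
        out[x] = np.sum(win**(Q + 1)) / np.sum(win**Q)
    return out

strip = np.array([100, 100, 0, 100, 100])   # one pepper pixel
print(contraharmonic(strip, m=3, Q=1.5))    # pepper pixel pulled up to 100
```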
Order-Statistic Filters
Median filter:
    f^(x,y) = median{ g(s,t) ; (s,t) in S_xy }
Midpoint filter:
    f^(x,y) = (1/2) [ max{ g(s,t) ; (s,t) in S_xy } + min{ g(s,t) ; (s,t) in S_xy } ]
Estimating the Degradation Function
For a linear, position-invariant degradation, H[ f(x,y) ] = (h * f)(x,y), so the model
becomes g(x,y) = (h * f)(x,y) + eta(x,y). There are three principal ways to estimate
the degradation function:
1. observation
2. experimentation
3. mathematical modelling
Estimation by experimentation: image an impulse (a small, bright dot) with the system;
then
    H(u,v) = G(u,v) / A
where G(u,v) is the Fourier transform of the observed image and A is the strength
(constant transform) of the impulse.
Estimation by Modelling
Atmospheric turbulence model:
    H(u,v) = e^{ -k (u^2 + v^2)^{5/6} }
Motion blur: if the image undergoes planar motion, with components x_0(t) and y_0(t)
during the exposure interval [0, T], then
    g(x,y) = integral_0^T f( x - x_0(t), y - y_0(t) ) dt
Taking Fourier transforms,
    G(u,v) = F(u,v) integral_0^T e^{ -i 2 pi [ u x_0(t) + v y_0(t) ] } dt
so that G(u,v) = H(u,v) F(u,v) with
    H(u,v) = integral_0^T e^{ -i 2 pi [ u x_0(t) + v y_0(t) ] } dt
If the motion variables are linear, x_0(t) = a t / T and y_0(t) = b t / T, then
    H(u,v) = [ T / ( pi (ua + vb) ) ] sin[ pi (ua + vb) ] e^{ -i pi (ua + vb) }
Inverse Filtering
The simplest approach to restoration is direct inverse filtering:
    F^(u,v) = G(u,v) / H(u,v)     (array operation)
Since G(u,v) = H(u,v) F(u,v) + N(u,v),
    F^(u,v) = F(u,v) + N(u,v) / H(u,v)
Even if H(u,v) is known exactly, the noise term is not; moreover, where H(u,v) takes
small values the ratio N(u,v)/H(u,v) dominates the estimate F^(u,v).
Minimum Mean Square Error (Wiener) Filtering
The objective is to find the estimate f^ minimizing
    e^2 = E{ ( f - f^ )^2 }     (1)
It is assumed that:
- the noise and the image are uncorrelated;
- the noise or the image has zero mean;
- the intensity levels in the estimate are a linear function of the levels in the degraded
image.
From relation (1) we get:
    F^(u,v) = [ (1/H(u,v)) |H(u,v)|^2 / ( |H(u,v)|^2 + S_eta(u,v)/S_f(u,v) ) ] G(u,v)     (2)
where S_eta(u,v) = |N(u,v)|^2 and S_f(u,v) = |F(u,v)|^2 are the power spectra of the
noise and of the undegraded image.
The signal-to-noise ratio in the frequency domain:
    SNR = [ sum_{u=0}^{M-1} sum_{v=0}^{N-1} |F(u,v)|^2 ] / [ sum_{u=0}^{M-1} sum_{v=0}^{N-1} |N(u,v)|^2 ]
The mean square error in the spatial domain:
    MSE = (1/MN) sum_{x=0}^{M-1} sum_{y=0}^{N-1} [ f(x,y) - f^(x,y) ]^2
The SNR of the restored image:
    SNR = [ sum_{x=0}^{M-1} sum_{y=0}^{N-1} f^(x,y)^2 ] / [ sum_{x=0}^{M-1} sum_{y=0}^{N-1} ( f(x,y) - f^(x,y) )^2 ]
The closer f and f^ are, the larger this ratio will be.
When the power spectra are unknown, the ratio S_eta/S_f is replaced by a constant K:
    F^(u,v) = [ (1/H(u,v)) |H(u,v)|^2 / ( |H(u,v)|^2 + K ) ] G(u,v)
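A 1-D sketch of the parametric Wiener filter (constant K); the blur kernel, noise level, and value of K are assumed for illustration.

```python
# Parametric Wiener filtering of a blurred, noisy 1-D signal; note
# that (1/H)|H|^2 / (|H|^2 + K) = conj(H) / (|H|^2 + K).
import numpy as np

def wiener(g, h, K):
    H = np.fft.fft(h, n=len(g))
    G = np.fft.fft(g)
    W = np.conj(H) / (np.abs(H)**2 + K)
    return np.real(np.fft.ifft(W * G))

rng = np.random.default_rng(2)
f = np.zeros(64); f[20:28] = 1.0             # original signal
h = np.ones(5) / 5                           # moving-average blur (assumed)
g = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h, 64)))
g_noisy = g + 0.01 * rng.standard_normal(64)
fhat = wiener(g_noisy, h, K=0.01)
print(np.mean((fhat - f)**2) < np.mean((g_noisy - f)**2))  # restoration helps
```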
Color Image Processing
Cones are the sensors in the eye responsible for color vision. It has been established
that the 6 to 7 million cones in the human eye can be divided into three principal
sensing categories, corresponding roughly to red, green, and blue. Approximately 65%
of all cones are sensitive to red light, 33% are sensitive to green light, and only about
2% are sensitive to blue (but the blue cones are the most sensitive).
Saturation refers to the relative purity of a color, i.e., the amount of white light mixed
with its hue.
The CIE chromaticity coordinates:
    x = X / (X + Y + Z) ,  y = Y / (X + Y + Z) ,  z = Z / (X + Y + Z)
    x + y + z = 1
Color Models
A color model (color space or color system) is a
specification of a coordinate system and a subspace within
that system where each color is represented by a single point.
http://www.colorcube.com/articles/models/model.htm
Most color models in use today are oriented either toward
hardware (color monitors or printers) or toward applications
where color manipulation is a goal.
CMY (cyan-magenta-yellow) and CMYK (cyan-magenta-yellow-black) are the color
models used in printing.
With 8 bits per RGB component, the total number of representable colors is
(2^8)^3 = 16,777,216.
Each safe color is formed from three of the two-digit hex numbers from the above
table. For example, purest red is FF0000. The values 000000 and FFFFFF represent
black and white, respectively.
Figure 6.10(a) shows the 216 safe colors, organized in
descending RGB values. Figure 6.10(b) shows the hex codes
for all the possible gray colors in the 216 safe color system.
Figure 6.11 shows the RGB safe-color cube.
http://www.techbomb.com/websafe/
    C = 1 - R ,  M = 1 - G ,  Y = 1 - B
From this equation we can easily deduce that pure cyan does not reflect red, pure
magenta does not reflect green, and pure yellow does not reflect blue.
Converting colors from RGB to HSI:
    H = theta if B <= G ,  360 - theta if B > G
with
    theta = arccos{ [ (R-G) + (R-B) ] / 2 / [ (R-G)^2 + (R-B)(G-B) ]^{1/2} }
    S = 1 - 3 min{R, G, B} / (R + G + B)
    I = (R + G + B) / 3
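A sketch of the RGB-to-HSI formulas for normalized components in [0, 1], returning H in degrees:

```python
# RGB -> HSI conversion implementing the formulas above.
import math

def rgb_to_hsi(r, g, b):
    i = (r + g + b) / 3
    s = 0.0 if i == 0 else 1 - 3 * min(r, g, b) / (r + g + b)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g)**2 + (r - b) * (g - b))
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den)))) if den else 0.0
    h = theta if b <= g else 360 - theta
    return h, s, i

print(rgb_to_hsi(1.0, 0.0, 0.0))  # pure red: H = 0, S = 1, I = 1/3
```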
Converting colors from HSI to RGB (H in degrees):
RG sector (0 <= H < 120):
    B = I (1 - S)
    R = I [ 1 + S cos(H) / cos(60 - H) ]
    G = 3I - (R + B)
GB sector (120 <= H < 240): let H = H - 120; then
    R = I (1 - S)
    G = I [ 1 + S cos(H) / cos(60 - H) ]
    B = 3I - (R + G)
BR sector (240 <= H <= 360): let H = H - 240; then
    G = I (1 - S)
    B = I [ 1 + S cos(H) / cos(60 - H) ]
    R = 3I - (G + B)
Pseudocolor: intensity slicing assigns the color c_k to pixel (x,y) if f(x,y) is in the
intensity interval V_k:
    f(x,y) -> c_k if f(x,y) in V_k .
Full-color image processing: each color pixel is a vector. For RGB,
    c(x,y) = [ c_R(x,y), c_G(x,y), c_B(x,y) ]^T = [ R(x,y), G(x,y), B(x,y) ]^T
i.e., f : D subset R^2 -> R^3; for CMYK,
    c(x,y) = [ C(x,y), M(x,y), Y(x,y), K(x,y) ]^T .
Color Transformations
- processing the components of a color image within the context of a single color
model.
Color Complements
The complement of a hue is the hue directly opposite it on the color circle (the color
counterpart of the grayscale negative).
Color Slicing
Highlighting a specific range of colors in an image is useful to:
- separate objects from their surroundings or background;
- use the region defined by the colors as a mask for further processing.
One of the simplest ways to slice a color image is to map the colors outside some
range of interest to a neutral color. If the colors of interest are enclosed by a cube (or
hypercube, if n > 3) of width W, centered at a prototypical (e.g., average) color with
components (a_1, a_2, ..., a_n), the transformation is:
    s_i = 0.5 if |r_j - a_j| > W/2 for any 1 <= j <= n ,  r_i otherwise ,  i = 1, 2, ..., n
For the RGB color space, for example, a suitable neutral point is middle gray, i.e., the
color (0.5, 0.5, 0.5).
If a sphere of radius R_0 is used to specify the colors of interest, the transformations
are:
    s_i = 0.5 if sum_{j=1}^{n} (r_j - a_j)^2 > R_0^2 ,  r_i otherwise ,  i = 1, 2, ..., n
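A sketch of the sphere version: colors farther than R_0 from the prototype a are mapped to middle gray (components assumed normalized to [0, 1]; the prototype and radius are illustrative values).

```python
# Sphere-based color slicing with a neutral point of 0.5 per component.
def slice_color(r, a, R0):
    d2 = sum((rj - aj)**2 for rj, aj in zip(r, a))
    return tuple(r) if d2 <= R0**2 else tuple(0.5 for _ in r)

a = (0.8, 0.1, 0.1)                               # prototype: reddish
print(slice_color((0.75, 0.15, 0.1), a, R0=0.2))  # inside the sphere: kept
print(slice_color((0.1, 0.8, 0.1), a, R0=0.2))    # -> (0.5, 0.5, 0.5)
```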
The CIE L*a*b* color space:
    L* = 116 h(Y/Y_W) - 16
    a* = 500 [ h(X/X_W) - h(Y/Y_W) ]
    b* = 200 [ h(Y/Y_W) - h(Z/Z_W) ]
where
    h(q) = q^{1/3} if q > 0.008856 ,  7.787 q + 16/116 if q <= 0.008856
and (X_W, Y_W, Z_W) is the reference white.
Color Image Smoothing
Neighborhood averaging is carried out per component; with K pixels in the
neighborhood S_xy:
    c_bar(x,y) = (1/K) sum_{(s,t) in S_xy} c(s,t)
              = (1/K) [ sum_{(s,t)} R(s,t), sum_{(s,t)} G(s,t), sum_{(s,t)} B(s,t) ]^T
Color Image Sharpening
The Laplacian of the color vector is the vector of the component Laplacians:
    nabla^2 c(x,y) = [ nabla^2 R(x,y), nabla^2 G(x,y), nabla^2 B(x,y) ]^T
Morphological Image Processing
Preliminaries
The reflection of a set B, denoted B^, is defined as
    B^ = { w ; w = -b, for b in B }
The translation of a set B by point z = (z_1, z_2), denoted (B)_z, is defined as
    (B)_z = { c ; c = b + z, for b in B }
Erosion
The erosion of A by B is
    A (-) B = { z ; (B)_z subset of A }
This definition indicates that the erosion of A by B is the set of all points z such that B,
translated by z, is contained in A. Equivalent forms:
    A (-) B = { z ; (B)_z intersect A^c = empty }
    A (-) B = { w in Z^2 ; w + b in A for every b in B }
    A (-) B = intersection_{b in B} (A)_{-b}
Dilation
Let A and B be two sets in Z^2. The dilation of A by B, denoted A (+) B, is defined as:
    A (+) B = { z ; (B^)_z intersect A != empty }
The dilation of A by B is the set of all displacements, z, such that B^ and A overlap by
at least one element. The above definition can be written equivalently as:
    A (+) B = { z ; [ (B^)_z intersect A ] subset of A }
We assume that B is a structuring element.
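A set-based sketch of the two definitions on Z^2, with pixels as (row, col) pairs; since the structuring element here contains the origin, erosion candidates can be restricted to points of A.

```python
# Binary erosion and dilation as set operations on Z^2.

def erode(A, B):
    # z is kept if B translated by z fits entirely inside A
    # (restricting z to A is valid because (0, 0) is in B)
    return {z for z in A
            if all((z[0] + b[0], z[1] + b[1]) in A for b in B)}

def dilate(A, B):
    # Minkowski sum form: A (+) B = { a + b ; a in A, b in B }
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}

A = {(r, c) for r in range(1, 4) for c in range(1, 4)}   # 3x3 square
B = {(0, 0), (0, 1), (1, 0)}                             # small SE with origin
print(sorted(erode(A, B)))   # erosion shrinks the square
print(len(dilate(A, B)))     # dilation grows it
```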
Duality
Erosion and dilation are duals of each other with respect to set complementation and
reflection:
    ( A (-) B )^c = A^c (+) B^
    ( A (+) B )^c = A^c (-) B^
Opening and Closing
The opening of A by B is an erosion followed by a dilation, A o B = ( A (-) B ) (+) B;
the closing of A by B is a dilation followed by an erosion, A . B = ( A (+) B ) (-) B.
Properties of opening:
1) A o B is a subset of A
2) if C is a subset of D, then C o B is a subset of D o B
3) ( A o B ) o B = A o B  (idempotence)
Properties of closing:
1) A is a subset of A . B
2) if C is a subset of D, then C . B is a subset of D . B
3) ( A . B ) . B = A . B  (idempotence)
The Hit-or-Miss Transform
Let B = (B_1, B_2), where B_1 is the set of elements of B associated with an object and
B_2 the set of elements associated with the background (e.g., B_1 = D and
B_2 = W - D for a window W enclosing D). Then
    A (*) B = ( A (-) B_1 ) intersect ( A^c (-) B_2 )
    A (*) B = ( A (-) D ) intersect [ A^c (-) (W - D) ]
The set A (*) B contains all the (origin) points at which, simultaneously, B_1 found a
match (hit) in A and B_2 found a match in A^c. Equivalently,
    A (*) B = ( A (-) B_1 ) - ( A (+) B^_2 )
The above three equations for A (*) B are referred to as the morphological hit-or-miss
transform.
Morphological Algorithms
A principal application of morphology is extracting image components that are useful
in the representation and description of shape.
Boundary Extraction
The boundary of a set A, denoted beta(A), can be obtained by first eroding A by B and
then performing the set difference between A and its erosion:
    beta(A) = A - ( A (-) B )
where B is a suitable structuring element.
Hole Filling
A hole may be defined as a background region surrounded by a connected border of
foreground pixels. Starting from a point p inside the hole (X_0 = {p}), iterate
    X_k = ( X_{k-1} (+) B ) intersect A^c ,  k = 1, 2, 3, ...
until X_k = X_{k-1}. The set X_k then contains all the filled holes; the union of X_k and
A contains the filled set and its boundary.
Convex Hull
    C(A) = union_{i=1}^{4} D^i
where each D^i is the convergence limit of iterating the hit-or-miss transform of A with
a structuring element B^i and taking the union with A.
Morphological Reconstruction
Morphological reconstruction is a powerful morphological
transformation that involves two images and a structuring
element. One image, the marker, contains the starting points
for the transformation. The second image, the mask,
constrains the transformation. The structuring element is used
to define connectivity.
The geodesic dilation of size 1 of the marker F with respect to the mask G is
    D_G^(1)(F) = ( F (+) B ) intersect G
and the geodesic dilation of size n of F with respect to G is defined iteratively as
    D_G^(n)(F) = D_G^(1)[ D_G^(n-1)(F) ] ,  D_G^(0)(F) = F
Similarly, the geodesic erosion of size 1 of F with respect to G is
    E_G^(1)(F) = ( F (-) B ) union G
with E_G^(n)(F) = E_G^(1)[ E_G^(n-1)(F) ].
Opening by reconstruction
In a morphological opening, erosion removes small objects
and the dilation attempts to restore the shape of the objects
that remains. The accuracy of this restoration depends on the
similarity of the shape of the objects and the structuring
element used. Opening by reconstruction restores exactly the
shapes of the objects that remain after erosion. The opening
by reconstruction of size n of an image F is defined as the
reconstruction by dilation of F from the erosion of size n of F
    O_R^(n)(F) = R_F^D [ F (-) nB ]
where ( F (-) nB ) denotes n successive erosions of F by B and R_F^D denotes
reconstruction by dilation with mask F.
Figure 9.29 shows an example of opening by reconstruction.
We are interested in extracting from this image the characters
that contain long, vertical strokes. Opening by reconstruction
requires at least one erosion; it was performed here and produced Figure 9.29(b). The
structuring element's length was proportional to the average height of the tall
characters (51 pixels), with a width of one pixel.
Filling holes
The following is a fully automated procedure for filling holes based on morphological
reconstruction. Let I(x,y) denote the input image and form the marker image
    F(x,y) = 1 - I(x,y) if (x,y) is on the border of I ,  0 otherwise
Then the image with filled holes is
    H = [ R_{I^c}^D (F) ]^c
Gray-Scale Morphology
In this section we denote by f(x,y) the gray-scale image and by b(x,y) the structuring
element. Structuring elements in gray-scale morphology are of two categories: nonflat
and flat. The reflected structuring element is b^(x,y) = b(-x,-y).
Erosion of f by a flat structuring element b:
    ( f (-) b )(x,y) = min{ f(x+s, y+t) ; (s,t) in b }
To find the erosion of f by b, we place the origin of the structuring element at every
pixel location in the image. The erosion at any location is determined by selecting the
minimum value of f over the region coincident with b. Similarly, the dilation at (x,y):
    ( f (+) b )(x,y) = max{ f(x-s, y-t) ; (s,t) in b^ }
Because gray-scale erosion with a flat SE computes the minimum value of f in every
neighborhood of (x,y) coincident with b, erosion and dilation are duals:
    ( f (-) b )^c (x,y) = ( f^c (+) b^ )(x,y) ,  with f^c(x,y) = -f(x,y) and b^(x,y) = b(-x,-y)
    ( f (+) b )^c = f^c (-) b^
Gray-scale opening and closing:
    f o b = ( f (-) b ) (+) b
    f . b = ( f (+) b ) (-) b
The opening and the closing for gray-scale images are dual
with respect to complementation and SE reflection:
    ( f o b )^c = f^c . b^
    ( f . b )^c = f^c o b^
The opening operation satisfies the following properties:
1. f o b <= f
2. If f_1 <= f_2 then f_1 o b <= f_2 o b
3. ( f o b ) o b = f o b
The notation e <= r indicates that the domain of e is a subset of the domain of r, and
also that e(x,y) <= r(x,y) for any (x,y) in the domain of e.
Similarly, the closing operation satisfies the following
properties:
a. f <= f . b
b. If f_1 <= f_2 then f_1 . b <= f_2 . b
c. ( f . b ) . b = f . b
Morphological gradient
Dilation and erosion can be used in combination with image subtraction to obtain the
morphological gradient of an image:
    g = ( f (+) b ) - ( f (-) b )
The dilation thickens regions in an image and the erosion shrinks them. Their
difference emphasizes the boundaries between regions. Homogeneous areas are not
affected (as long as the SE is relatively small), so the subtraction operation tends to
eliminate them. The net result is an image in which region boundaries are enhanced.
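A sketch of the flat gray-scale operations and the morphological gradient with a 3x3 square SE:

```python
# Flat gray-scale erosion, dilation, and morphological gradient.
import numpy as np

def erode(f):
    fp = np.pad(f, 1, mode="edge")
    return np.array([[fp[x:x+3, y:y+3].min() for y in range(f.shape[1])]
                     for x in range(f.shape[0])])

def dilate(f):
    fp = np.pad(f, 1, mode="edge")
    return np.array([[fp[x:x+3, y:y+3].max() for y in range(f.shape[1])]
                     for x in range(f.shape[0])])

f = np.zeros((7, 7), dtype=int); f[2:5, 2:5] = 9   # bright square
grad = dilate(f) - erode(f)
print(grad[0, 0], grad[3, 3], grad[2, 2])  # flat: 0, interior: 0, edge: 9
```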
Image Segmentation
Segmentation subdivides an image into its constituent regions
and objects. The level of detail to which the subdivision is
carried depends on the problem being solved. Segmentation
should stop when the objects or regions of interest in an
application have been detected. For example, in the
automated inspection of electronic assemblies, interest lies in
analyzing images of products with the objective of
determining the presence or absence of specific anomalies,
Fundamentals
Let R represent the entire spatial region occupied by an image. Image segmentation
can be viewed as a process that partitions R into n subregions R_1, R_2, ..., R_n such
that:
(a) union_{i=1}^{n} R_i = R
(b) R_i is a connected set, i = 1, 2, ..., n
(c) R_i intersect R_j = empty for all i != j
(d) Q(R_i) = TRUE for i = 1, 2, ..., n
(e) Q(R_i union R_j) = FALSE for any adjacent regions R_i and R_j
where Q(R_k) is a logical predicate defined over the points in R_k.
Background
Abrupt, local changes in intensity can be detected using
derivatives, usually first- and second-order derivatives which
are defined in terms of differences.
Any approximation for a first derivative must be:
(1) zero in areas of constant intensity
(2) nonzero at the onset of an intensity step or ramp
(3) nonzero at points along an intensity ramp.
A first-order difference:
    df/dx = f(x+1) - f(x)
A second-order difference:
    d^2 f/dx^2 = f(x+2) - 2 f(x+1) + f(x)
or, centered at x:
    d^2 f/dx^2 = f(x+1) + f(x-1) - 2 f(x)
The response of a 3x3 mask at the center point of the region is:
    R = w_1 z_1 + w_2 z_2 + ... + w_9 z_9 = sum_{k=1}^{9} w_k z_k     (1)
Isolated point detection is based on the Laplacian:
    nabla^2 f = d^2 f/dx^2 + d^2 f/dy^2
    d^2 f/dx^2 = f(x+1,y) + f(x-1,y) - 2 f(x,y)
    d^2 f/dy^2 = f(x,y+1) + f(x,y-1) - 2 f(x,y)
    nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)
A point is detected at (x,y) if the absolute response of the mask there exceeds a
threshold T:
    g(x,y) = 1 if |R(x,y)| >= T ,  0 otherwise
Edge Models
Edge detection is the approach used most frequently for
segmenting images based on abrupt (local) changes in
intensity.
Edge models are classified according to their intensity
profiles. A step edge involves a transition between two
intensity levels occurring ideally over the distance of 1 pixel.
Figure 10.8(a) shows a section of a vertical step edge and a
horizontal intensity profile through the edge.
The gradient of an image f at (x,y):
    grad(f) = ( g_x , g_y ) = ( df/dx , df/dy )
and its magnitude:
    M(x,y) = mag( grad f ) = sqrt( g_x^2 + g_y^2 )
Gradient operators
    g_x = df(x,y)/dx = f(x+1, y) - f(x, y)
    g_y = df(x,y)/dy = f(x, y+1) - f(x, y)
Roberts cross-gradient operators (z-notation for a 3x3 region):
    g_x = z_9 - z_5 = f(x+1, y+1) - f(x, y)
    g_y = z_8 - z_6 = f(x+1, y) - f(x, y+1)
Masks of size 2x2 are simple conceptually, but they are not as useful for computing
edge direction as masks that are symmetric about the center point, the smallest of
which are of size 3x3.
Prewitt operators
    g_x = df/dx = ( z_7 + z_8 + z_9 ) - ( z_1 + z_2 + z_3 )
    g_y = df/dy = ( z_3 + z_6 + z_9 ) - ( z_1 + z_4 + z_7 )
Sobel operators
    g_x = df/dx = ( z_7 + 2 z_8 + z_9 ) - ( z_1 + 2 z_2 + z_3 )
    g_y = df/dy = ( z_3 + 2 z_6 + z_9 ) - ( z_1 + 2 z_4 + z_7 )
The Marr-Hildreth Edge Detector
The Gaussian and its Laplacian (the LoG):
    G(x,y) = e^{ -(x^2 + y^2) / (2 sigma^2) }     (2)
    nabla^2 G(x,y) = [ ( x^2 + y^2 - 2 sigma^2 ) / sigma^4 ] e^{ -(x^2 + y^2) / (2 sigma^2) }     (3)
The edge image is
    g(x,y) = nabla^2 [ G(x,y) * f(x,y) ] = [ nabla^2 G(x,y) ] * f(x,y)
The Marr-Hildreth edge-detection algorithm may be summarized as follows:
1. Filter the input image with an n x n Gaussian lowpass filter obtained by sampling
equation (2).
2. Compute the Laplacian of the image resulting from Step 1.
3. Find the zero crossings of the image from Step 2.
In one dimension, the derivative of the Gaussian is
    (d/dx) e^{ -x^2 / (2 sigma^2) } = -( x / sigma^2 ) e^{ -x^2 / (2 sigma^2) }
The Canny Edge Detector
Let f(x,y) denote the input image and G(x,y) denote the Gaussian function:
    G(x,y) = e^{ -(x^2 + y^2) / (2 sigma^2) }
The smoothed image is
    f_s(x,y) = G(x,y) * f(x,y) .
We compute the gradient magnitude and the angle for f_s:
    M(x,y) = sqrt( g_x^2 + g_y^2 ) ,  g_x = df_s/dx ,  g_y = df_s/dy
    alpha(x,y) = arctan( g_y / g_x )
M(x,y) typically contains wide ridges around local maxima. The next step is to thin
those ridges using nonmaxima suppression. This can be done in several ways; one is
the following. Let d_1, d_2, d_3 and d_4 denote the four basic edge directions for a
3x3 region: horizontal, -45 degrees, vertical, and +45 degrees, respectively. For every
pixel (x,y):
1. Find the direction d_k that is closest to alpha(x,y).
2. If the value of M(x,y) is less than at least one of its two neighbors along d_k, let
g_N(x,y) = 0 (suppression); otherwise, let g_N(x,y) = M(x,y).
Double (hysteresis) thresholding with thresholds T_H > T_L:
    g_NH(x,y) = g_N(x,y) >= T_H
    g_NL(x,y) = g_N(x,y) >= T_L
with g_NH = g_NL = 0 initially.
After thresholding, g_NH(x,y) will have fewer nonzero pixels than g_NL(x,y) in
general, but all the nonzero pixels in g_NH(x,y) will be contained in g_NL(x,y),
because the latter is formed with a lower threshold. Summary of the Canny algorithm:
1. Smooth the input image with a Gaussian filter.
2. Compute the gradient magnitude and angle images.
3. Apply nonmaxima suppression to the gradient magnitude image.
4. Use double thresholding and connectivity analysis to detect and link edges.
Edge Linking
A simplification of local-processing edge linking, particularly well suited for real-time
applications:
1. Compute the gradient magnitude and angle arrays, M(x,y) and alpha(x,y), of the
input image.
2. Form a binary image:
    g(x,y) = 1 if M(x,y) > T_M AND alpha(x,y) = A +/- T_A ,  0 otherwise
where T_M is a threshold, A is a specified angle direction, and T_A defines a band of
acceptable directions about A.
3. Scan the rows of g and fill (set to 1) all gaps (sets of 0s)
in each row that do not exceed a specified length K.
Note that a gap is bounded at both ends by one or more
1s. The rows are processed individually, with no
memory between them.
4. To detect gaps in any other direction theta, rotate g by this angle and apply the
horizontal scanning procedure in Step 3. Rotate the result back by -theta.
Polygonal approximations can capture the essential shape of a region with few
segments.
The Hough Transform
Infinitely many lines pass through (x_i, y_i), but they all satisfy the equation
y_i = a x_i + b for varying values of a and b. Writing this equation as
    b = -x_i a + y_i
and considering the ab-plane (parameter space) yields the equation of a single line for
the fixed pair (x_i, y_i). A second point (x_j, y_j) also has a line in parameter space
associated with it; unless they are parallel, this line intersects the line associated with
(x_i, y_i) at some point (a', b'). In fact, all the points on the line y = a' x + b' have
lines in parameter space that intersect at (a', b').
Since the slope a approaches infinity for vertical lines, in practice the normal
representation of a line is used instead:
    x cos(theta) + y sin(theta) = rho ,  rho in [-rho_max, rho_max]
Any point (x,y) in the image for which f(x,y) > T is called an object point; otherwise,
the point is called a background point. The segmented (thresholded) image is
    g(x,y) = 1 if f(x,y) > T ,  0 if f(x,y) <= T
With two thresholds T_1 < T_2:
    g(x,y) = a if f(x,y) > T_2 ,  b if T_1 < f(x,y) <= T_2 ,  c if f(x,y) <= T_1
Otsu's Method
Let the L intensity levels have the normalized histogram p_i, with
    sum_{i=0}^{L-1} p_i = 1 ,  p_i >= 0.
A threshold k splits the pixels into class C_1 (levels [0, k]) and class C_2 (levels
[k+1, L-1]), with
    P_1(k) = sum_{i=0}^{k} p_i ,  P_2(k) = sum_{i=k+1}^{L-1} p_i = 1 - P_1(k).
The mean intensity of the pixels assigned to class C_1 is
    m_1(k) = sum_{i=0}^{k} i P(i/C_1) = [ 1/P_1(k) ] sum_{i=0}^{k} i p_i
P(i/C_1) is the probability of value i, given that i comes from class C_1. We have used
the Bayes formula:
    P(A/B) = P(B/A) P(A) / P(B)
and P(C_1/i) = 1, the probability of C_1 given i (i belongs to C_1).
Similarly, the mean intensity value of the pixels assigned to class C_2 is:
    m_2(k) = sum_{i=k+1}^{L-1} i P(i/C_2) = [ 1/P_2(k) ] sum_{i=k+1}^{L-1} i p_i .
The cumulative mean (average intensity) up to level k is given by:
    m(k) = sum_{i=0}^{k} i p_i
and the global mean is
    m_G = sum_{i=0}^{L-1} i p_i .
We have:
    P_1 m_1 + P_2 m_2 = m_G ,  P_1 + P_2 = 1 .
The normalized separability measure is
    eta = sigma_B^2 / sigma_G^2
where sigma_G^2 is the global variance:
    sigma_G^2 = sum_{i=0}^{L-1} ( i - m_G )^2 p_i
and sigma_B^2 is the between-class variance:
    sigma_B^2 = P_1 ( m_1 - m_G )^2 + P_2 ( m_2 - m_G )^2 = P_1 P_2 ( m_1 - m_2 )^2
             = ( m_G P_1 - m )^2 / [ P_1 ( 1 - P_1 ) ]
From the above formula, we see that the farther the two means m_1 and m_2 are from
each other the larger sigma_B^2 will be, indicating that the between-class variance is
a measure of separability between classes. Because sigma_G^2 is a constant,
maximizing eta is equivalent to maximizing sigma_B^2. We have:
    eta(k) = sigma_B^2(k) / sigma_G^2
    sigma_B^2(k) = [ m_G P_1(k) - m(k) ]^2 / { P_1(k) [ 1 - P_1(k) ] }
The optimum threshold k* maximizes sigma_B^2(k):
    sigma_B^2(k*) = max{ sigma_B^2(k) ; 0 <= k <= L-1 , k integer }.
If the maximum exists for more than one value of k, it is customary to average the
various values of k for which sigma_B^2(k) is maximum.
Once k* has been obtained, the input image is segmented as:

g(x, y) = 1, if f(x, y) > k*
          0, if f(x, y) ≤ k*
Multiple Thresholds
The thresholding idea used in Otsu's method can be extended
to an arbitrary number of thresholds, because the separability
measure on which it is based also extends to an arbitrary
number of classes. In the case of K classes, C1, C2, ..., CK,
the between-class variance generalizes to

σB² = Σ_{k=1}^{K} Pk·(mk − mG)²,

where

Pk = Σ_{i∈Ck} pi,   mk = (1/Pk)·Σ_{i∈Ck} i·pi.

The K classes are separated by K−1 thresholds whose optimum
values k1*, k2*, ..., k_{K−1}* maximize

σB²(k1*, k2*, ..., k_{K−1}*) =
    max{ σB²(k1, k2, ..., k_{K−1}) ;
         0 < k1 < k2 < ... < k_{K−1} < L−1,
         k1, k2, ..., k_{K−1} integers }.
For three classes (two thresholds k1 < k2):

σB² = P1·(m1 − mG)² + P2·(m2 − mG)² + P3·(m3 − mG)²,

with

P1 = Σ_{i=0}^{k1} pi,      m1 = (1/P1)·Σ_{i=0}^{k1} i·pi,
P2 = Σ_{i=k1+1}^{k2} pi,   m2 = (1/P2)·Σ_{i=k1+1}^{k2} i·pi,
P3 = Σ_{i=k2+1}^{L−1} pi,  m3 = (1/P3)·Σ_{i=k2+1}^{L−1} i·pi.

As before:

P1·m1 + P2·m2 + P3·m3 = mG,   P1 + P2 + P3 = 1.
The optimum thresholds k1* and k2* maximize the between-class
variance:

σB²(k1*, k2*) = max{ σB²(k1, k2) ; 0 < k1 < k2 < L−1 }.

The thresholded image is then

g(x, y) = a, if f(x, y) ≤ k1*
          b, if k1* < f(x, y) ≤ k2*
          c, if f(x, y) > k2*

and the separability measure is

η*(k1*, k2*) = σB²(k1*, k2*) / σG².
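For two thresholds, an exhaustive search over all pairs (k1, k2) is simple enough to sketch directly; names here are illustrative, and the histogram is assumed already normalized:

```python
def otsu_two_thresholds(p):
    """Exhaustive search for (k1*, k2*) maximizing sigma_B^2(k1, k2).

    p -- normalized histogram (p[i] sums to 1 over L levels).
    """
    L = len(p)
    mG = sum(i * p[i] for i in range(L))   # global mean
    best, best_var = (1, 2), -1.0
    for k1 in range(1, L - 2):
        for k2 in range(k1 + 1, L - 1):
            P1 = sum(p[0:k1 + 1])
            P2 = sum(p[k1 + 1:k2 + 1])
            P3 = sum(p[k2 + 1:])
            if min(P1, P2, P3) <= 0.0:     # empty class: skip
                continue
            m1 = sum(i * p[i] for i in range(0, k1 + 1)) / P1
            m2 = sum(i * p[i] for i in range(k1 + 1, k2 + 1)) / P2
            m3 = sum(i * p[i] for i in range(k2 + 1, L)) / P3
            var_b = (P1 * (m1 - mG) ** 2 + P2 * (m2 - mG) ** 2
                     + P3 * (m3 - mG) ** 2)
            if var_b > best_var:
                best_var, best = var_b, (k1, k2)
    return best

# Trimodal histogram with peaks at levels 1, 5 and 10 (L = 12).
p_hist = [0.05, 0.25, 0.05, 0.0, 0.05, 0.25, 0.05, 0.0, 0.0, 0.05, 0.2, 0.05]
k1s, k2s = otsu_two_thresholds(p_hist)
```

The two thresholds fall in the valleys between the three modes. The O(L²) search is acceptable for L = 256 but grows quickly with more thresholds.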
Variable Thresholding
Image partitioning
One of the simplest approaches to variable thresholding is to
subdivide an image into nonoverlapping rectangles. This
approach is used to compensate for non-uniformities in
illumination and/or reflectance. The rectangles are chosen
small enough so that the illumination of each is
approximately uniform.
Variable thresholding can also be based on local image
properties, such as the standard deviation σxy and the mean
mxy of the pixels in a neighborhood of every point (x, y):

Txy = a·σxy + b·mxy,   a, b > 0,

or, using the global mean instead,

Txy = a·σxy + b·mG,   mG = global image mean.
The segmented image is computed as:

g(x, y) = 1, if f(x, y) > Txy
          0, if f(x, y) ≤ Txy
Thresholding based on moving averages computed along the scan
lines of the image is a special case of the local method above.
Let z_{k+1} denote the intensity of the point encountered at
step k+1 in the scanning sequence. The moving average (mean
intensity) over the last n points is

m(k+1) = (1/n)·Σ_{i=k+2−n}^{k+1} z_i = m(k) + (1/n)·(z_{k+1} − z_{k+1−n}),

with m(1) = z1/n. Segmentation is then performed with
Txy = c·mxy, where c is a positive constant and mxy is the
moving average at point (x, y):

g(x, y) = 1, if f(x, y) > Txy
          0, if f(x, y) ≤ Txy
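A one-line-at-a-time sketch of the moving-average update (my own names; missing history is taken as 0, which matches m(1) = z1/n and causes a brief warm-up artifact at the start of each line):

```python
def moving_average_threshold(row, n, c):
    """Threshold a 1-D scan line against c times its running mean.

    Implements m(k+1) = m(k) + (z_{k+1} - z_{k+1-n})/n with zero history.
    """
    out = []
    m = 0.0
    for k, z in enumerate(row):
        z_old = row[k - n] if k - n >= 0 else 0.0   # sample leaving the window
        m = m + (z - z_old) / n                      # incremental update
        out.append(1 if z > c * m else 0)
    return out

# Bright spikes on a dark background; n = 4, c = 2 are illustrative values.
line = [2, 2, 2, 10, 2, 2, 2, 10, 2, 2]
mask = moving_average_threshold(line, n=4, c=2)
```

The spikes at positions 3 and 7 are detected; the first sample is also flagged because the average has not warmed up yet.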
Multivariable Thresholding
In some cases, a sensor can make available more than
one variable to characterize each pixel in an image, and
thus allow multivariable thresholding. A notable example
is color imaging where red (R), green (G), and blue (B)
components are used to form a composite color image. In
this case, each pixel is characterized by three values,
and can be represented as a 3-D vector z = (z1 , z2 , z3)T
whose components are the RGB colors at a point.
Let a denote the average (prototype) color to be detected.
Thresholding is performed with a distance function:

g = 1, if D(z, a) ≤ T
    0, otherwise

where T is a threshold. The distance D(z, a) may be the
Euclidean distance

D(z, a) = ||z − a|| = [(z − a)ᵀ(z − a)]^(1/2),

or the Mahalanobis distance

D(z, a) = [(z − a)ᵀ·C⁻¹·(z − a)]^(1/2),

where C is the covariance matrix of the z's.
A basic region-growing algorithm starts from a set of seed
points and appends to each seed those neighboring pixels that
have predefined properties, expressed by a logical predicate
Q(x, y) that evaluates to TRUE or FALSE. In the split-and-merge
approach, the image is subdivided into quadrants, the predicate
Q is evaluated on each region, and adjacent regions Rj and Rk
are merged when Q(Rj ∪ Rk) = TRUE.
The procedure described above can be summarized as follows:
1. Split into four quadrants any region Ri for which Q(Ri) = TRUE.
2. When no further splitting is possible, merge any adjacent
   regions Rj and Rk for which Q(Rj ∪ Rk) = TRUE.
3. Stop when no further merging is possible.
A predicate of the following form may be used:

Q(R) = TRUE, if σR > a AND 0 < mR < b
       FALSE, otherwise

where mR and σR are the mean and standard deviation of the
pixel intensities in region R.
Representation
- boundary following
- chain codes
- polygonal approximations
- signatures
- skeletons
Boundary Following
We assume that the points in the boundary of a region are
ordered in a clockwise (or counterclockwise) direction. We
also assume that:
1. we are working with binary images in which objects are
labeled 1 and background 0;
2. the images are padded with a border of 0s to eliminate
the possibility of an object merging with the image
border.
Chain Codes
Chain codes are used to represent a boundary by a connected
sequence of straight line segments of specified length and
direction.
The direction of each segment is coded by using a numbering
scheme such as the ones shown below. A boundary code formed
by a sequence of such directional numbers is referred to as a
Freeman chain code.
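Computing a Freeman chain code from an ordered boundary is a direct lookup of the move between successive points. The sketch below assumes the usual 8-directional numbering (0 = east, increasing counterclockwise) with rows growing downward; names are illustrative:

```python
# 8-directional Freeman code: 0=E, 1=NE, 2=N, ..., 7=SE ("north" means a
# decreasing row index, since rows grow downward).
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def chain_code(boundary):
    """Freeman chain code of an ordered list of 8-connected (row, col) points."""
    code = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        code.append(DIRECTIONS[(r1 - r0, c1 - c0)])
    return code

# Closed unit square traversed clockwise starting at the top-left corner.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
code = chain_code(square)
```

Each consecutive pair of points must be 8-adjacent, as produced by a boundary-following algorithm.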
Polygonal Approximations
The MPP algorithm
The set of cells enclosing a digital boundary is called a
cellular complex. We assume that the boundaries under
consideration are not self-intersecting, which leads to simply
connected cellular complexes. Based on these assumptions,
and letting white (W) and black (B) denote convex and
mirrored concave vertices, respectively, we state the
following observations:
The orientation of an ordered triplet of points p1 = (x1, y1),
p2 = (x2, y2), p3 = (x3, y3) is determined by the sign of the
determinant

        | x1  y1  1 |
det A = | x2  y2  1 |
        | x3  y3  1 |

det A > 0: the triplet is in counterclockwise order;
det A = 0: the points are collinear;
det A < 0: the triplet is in clockwise order.
Merging technique
The idea is to merge points along a boundary until the
least-squares error of the line fit to the points merged so far
exceeds a preset threshold. When this occurs, the parameters of
the line are stored, the error is reset to 0, and the procedure
is repeated starting at the next boundary point.
Splitting technique
One approach to boundary segment splitting is to subdivide a
segment successively into two parts until a specified criterion
is satisfied. For instance, a requirement might be that the
maximum perpendicular distance from a boundary segment to
the line joining its two end points not exceed a preset
threshold. If it does, the point having the greatest distance
from the line becomes a vertex, thus subdividing the initial
segment into two subsegments.
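The splitting technique described above is essentially the classic recursive-subdivision (Douglas-Peucker-style) procedure. A sketch under that reading, with illustrative names:

```python
def split_boundary(points, threshold):
    """Recursively split a segment at the point farthest from the chord
    joining its end points, while that distance exceeds the threshold.
    Returns the indices of the selected vertices."""
    def perp_dist(p, a, b):
        (ax, ay), (bx, by), (px, py) = a, b, p
        num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
        den = ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
        return num / den if den else ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    def rec(lo, hi):
        if hi <= lo + 1:
            return []
        d, idx = max((perp_dist(points[i], points[lo], points[hi]), i)
                     for i in range(lo + 1, hi))
        if d <= threshold:
            return []                       # segment is straight enough
        return rec(lo, idx) + [idx] + rec(idx, hi)
    return [0] + rec(0, len(points) - 1) + [len(points) - 1]

# An open, roof-shaped segment: the apex should survive as a vertex.
seg = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)]
vertices = split_boundary(seg, threshold=0.5)
```

The apex (index 2) is 2 units above the end-point chord, so it becomes a vertex; the flanks are within the tolerance and are replaced by straight segments.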
Signatures
A signature is a 1-D functional representation of a boundary
and may be generated in various ways. One of the simplest is
to plot the distance from the centroid of the region to the
boundary as a function of angle.
The basic idea is to reduce the boundary representation to a
1-D function that presumably is easier to describe than the
original 2-D boundary. Signatures generated by the approach
just described are invariant to translation, but they do depend
on rotation and on scaling.
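A sketch of the centroid-distance signature (names are my own; each requested angle is served by the boundary sample whose polar angle is nearest):

```python
import math

def signature(boundary, n_angles=8):
    """Sample the centroid-to-boundary distance r(theta) as a 1-D signature."""
    cx = sum(x for x, y in boundary) / len(boundary)
    cy = sum(y for x, y in boundary) / len(boundary)
    # (angle, radius) of every boundary point, relative to the centroid
    polar = [(math.atan2(y - cy, x - cx) % (2 * math.pi),
              math.hypot(x - cx, y - cy)) for x, y in boundary]
    sig = []
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        # boundary sample whose angle is circularly nearest to theta
        _, r = min(polar, key=lambda ar: min(abs(ar[0] - theta),
                                             2 * math.pi - abs(ar[0] - theta)))
        sig.append(r)
    return sig

# For a circle the signature is constant (radius 5, 100 boundary samples).
circle = [(5 * math.cos(2 * math.pi * t / 100),
           5 * math.sin(2 * math.pi * t / 100)) for t in range(100)]
sig = signature(circle)
```

A circle yields the constant signature r(θ) = radius, while a square produces a periodic pattern of bumps, which is what makes the signature useful as a descriptor.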
Skeletons
One approach to representing the structural shape of a plane
region is to reduce it to a graph. We first obtain the skeleton
of the region via a thinning (skeletonizing) algorithm.
The skeleton of a region may be defined via the medial axis
transformation (MAT) proposed by Blum. Let R be a region
with border B. The MAT of a region is computed as follows:
for each point p in R, we find its closest neighbor in B. If p
has more than one such neighbor, then p belongs to the medial
axis (skeleton) of R.
Implementation
Direct computation of the MAT is expensive, because it
potentially involves calculating the distance from every
interior point to every point on the boundary of the region.
Iterative thinning algorithms are used instead.
Step 1
A contour point p1 is flagged for deletion if the following
conditions are satisfied:
a) 2 ≤ N(p1) ≤ 6
b) T(p1) = 1
c) p2·p4·p6 = 0
d) p4·p6·p8 = 0
where N(p1) = p2 + p3 + ... + p8 + p9 (pi ∈ {0, 1}) is the
number of nonzero neighbors of p1, and T(p1) is the number of
0→1 transitions in the ordered sequence p2, p3, ..., p9, p2.
Boundary Descriptors
The length of a boundary is one of its simplest descriptors.
The number of pixels along a boundary gives a rough
approximation of its length.
The diameter of a boundary B is defined as:

Diam(B) = max{ D(pi, pj) ; pi, pj ∈ B },

where D is a distance measure.
Shape numbers
Assume that the boundary is described by the first difference
of its associated chain code. The shape number of such a
boundary, based on the 4-directional code, is defined as the
first difference of smallest magnitude. The order n of a shape
number is defined as the number of digits in its
representation. Moreover, n is even for a closed boundary,
and its value limits the number of possible different shapes.
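A sketch of the shape-number computation for a 4-directional chain code (my own names; the circular first difference is rotated to the sequence of minimum magnitude):

```python
def first_difference(chain):
    """First difference of a 4-directional chain code: counts of CCW 90-degree
    turns between successive directions, treating the code as circular."""
    return [(chain[(i + 1) % len(chain)] - chain[i]) % 4
            for i in range(len(chain))]

def shape_number(chain):
    """Shape number: the circular rotation of the first difference forming
    the integer of minimum magnitude (lexicographic minimum over rotations)."""
    d = first_difference(chain)
    rotations = [d[i:] + d[:i] for i in range(len(d))]
    return min(rotations)

# 4-directional code of a unit square traversed counterclockwise
# (0 = E, 1 = N, 2 = W, 3 = S).
square = [0, 1, 2, 3]
sn = shape_number(square)
```

The first difference makes the code rotation-invariant and the minimum-magnitude normalization makes it independent of the starting point; the order of this shape number is 4, its number of digits.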
Fourier descriptors
Assume we have a K-point digital boundary in the xy-plane:

(x0, y0), (x1, y1), ..., (x_{K−1}, y_{K−1}),

where (x(k), y(k)), k = 0, 1, ..., K−1, are the points of the
boundary encountered in traversing it, say, counterclockwise.
Each coordinate pair can be treated as a complex number:

s(k) = x(k) + i·y(k),   k = 0, 1, ..., K−1.
We compute the discrete Fourier transform (DFT) of s(k):

a(u) = Σ_{k=0}^{K−1} s(k)·e^(−i2πuk/K),   u = 0, 1, ..., K−1.

The complex coefficients a(u) are called the Fourier
descriptors of the boundary.
The inverse DFT restores s(k):

s(k) = (1/K)·Σ_{u=0}^{K−1} a(u)·e^(i2πuk/K),   k = 0, 1, ..., K−1.
Suppose that, instead of all K descriptors, only the first P
are used (the remaining coefficients are set to zero). The
result is the approximation:

ŝ(k) = (1/P)·Σ_{u=0}^{P−1} a(u)·e^(i2πuk/P),   k = 0, 1, ..., K−1.

Although P terms are used to obtain each component of ŝ(k),
k still ranges from 0 to K−1: the approximate boundary contains
the same number of points, but fewer terms are used in the
reconstruction of each point, so high-frequency detail is
smoothed away while the basic shape is preserved.
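The DFT and the P-term reconstruction above translate directly into code. A self-contained sketch using Python complex numbers (names illustrative; with P = K the original boundary is recovered exactly, up to floating-point error):

```python
import cmath

def fourier_descriptors(boundary):
    """a(u) = sum_k s(k)*exp(-i*2*pi*u*k/K), with s(k) = x(k) + i*y(k)."""
    s = [complex(x, y) for x, y in boundary]
    K = len(s)
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * u * k / K) for k in range(K))
            for u in range(K)]

def reconstruct(a, P):
    """Rebuild K boundary points from the first P descriptors (s_hat above)."""
    K = len(a)
    pts = []
    for k in range(K):
        z = sum(a[u] * cmath.exp(2j * cmath.pi * u * k / P) for u in range(P)) / P
        pts.append((z.real, z.imag))
    return pts

# 8-point boundary of a small square, traversed counterclockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
a = fourier_descriptors(square)
full = reconstruct(a, P=len(a))   # P = K: exact round trip
```

Choosing P < K keeps only the low-frequency descriptors and yields a smoothed version of the same 8 points.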
Statistical moments
The shape of boundary segments (and of signature
waveforms) can be described quantitatively by using
statistical moments, such as the mean, variance, and higher
order moments.
Let v be a discrete random variable denoting the amplitude of
the signature, and let p(vi), i = 0, 1, ..., A−1, denote the
corresponding histogram, where A is the number of discrete
amplitude levels. The n-th moment of v about its mean is

μn(v) = Σ_{i=0}^{A−1} (vi − m)^n·p(vi),   m = Σ_{i=0}^{A−1} vi·p(vi).

The quantity m is recognized as the mean or average value of v,
and μ2(v) as its variance.
Regional descriptors
The area of a region is defined as the number of pixels in the
region. The perimeter of a region is the length of its
boundary. These two descriptors apply primarily to situations
in which the size of the regions of interest is invariant. A
more frequent use of these two descriptors is in measuring
compactness of a region:
compactness = (perimeter)² / area = P² / A.

A related descriptor is the circularity ratio:

Rc = 4πA / P².

The value of this measure is 1 for a circular region and π/4
for a square. Compactness is a dimensionless measure and thus is
insensitive to uniform scale changes; it is insensitive also to
orientation.
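The two stated reference values can be checked with a few lines (names illustrative):

```python
import math

def compactness(area, perimeter):
    """(perimeter)^2 / area -- dimensionless, minimal for a circle."""
    return perimeter ** 2 / area

def circularity_ratio(area, perimeter):
    """Rc = 4*pi*A / P^2 -- equals 1 for a circle, pi/4 for a square."""
    return 4 * math.pi * area / perimeter ** 2

# Circle of radius r: A = pi r^2, P = 2 pi r  ->  Rc = 1.
r = 3.0
Rc_circle = circularity_ratio(math.pi * r ** 2, 2 * math.pi * r)
# Square of side s: A = s^2, P = 4 s  ->  Rc = pi / 4.
s = 2.0
Rc_square = circularity_ratio(s * s, 4 * s)
```

Note that r and s cancel out of both results, which is the scale-insensitivity mentioned above.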
Topological Descriptors
Topology is the study of properties of a figure that are
unaffected by any deformation, as long as there is no tearing
or joining of the figure (sometimes these are called
rubber-sheet distortions).
For example, the above figure shows a region with two holes.
Thus if a topological descriptor is defined by the number of
holes (H) in the region, this property obviously will not be
affected by a stretching or rotation transformation. In general,
however, the number of holes will change if the region is torn
or folded. Note that, as stretching affects distance, topological
properties do not depend on the notion of distance or any
properties implicitly based on the concept of a distance
measure.
The number of connected components C and the number of holes H
define the Euler number:

E = C − H.
Regions represented by straight-line segments (referred to as
polygonal networks) have a particularly simple interpretation
in terms of the Euler number. Denoting the number of vertices
by V, the number of edges by Q, and the number of faces by F
gives the Euler formula:

V − Q + F = C − H = E.
Texture
An important approach to region description is to quantify its
texture content. Although no formal definition of texture
exists, this descriptor provides measures of properties such as
smoothness, coarseness and regularity. The three principal
approaches for describing the texture of a region are
statistical, structural, and spectral. Statistical approaches
yield characterizations of textures as smooth, coarse, grainy,
and so on. Structural techniques deal with the arrangement of
image primitives, and spectral techniques are based on
properties of the Fourier spectrum.
Statistical approaches
One of the simplest approaches for describing texture is to use
statistical moments of the intensity histogram of an image or
region. Let z be a random variable denoting intensity and
p(zi), i = 0, 1, ..., L−1, the corresponding histogram. The
n-th moment of z about the mean is

μn(z) = Σ_{i=0}^{L−1} (zi − m)^n·p(zi),   m = Σ_{i=0}^{L−1} zi·p(zi).

A measure of relative smoothness is

R(z) = 1 − 1 / (1 + σ²(z)),

where σ²(z) = μ2(z) is the variance. The uniformity (energy) is

U(z) = Σ_{i=0}^{L−1} p²(zi),

and the average entropy is

e(z) = −Σ_{i=0}^{L−1} p(zi)·log2 p(zi).
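The histogram-based measures above fit in one small function (names follow the text: R = smoothness, U = uniformity, e = entropy; the helper name is my own):

```python
import math

def texture_descriptors(hist):
    """Histogram-based texture measures: smoothness R, uniformity U, entropy e."""
    total = float(sum(hist))
    p = [h / total for h in hist]                          # normalized histogram
    L = len(p)
    m = sum(i * p[i] for i in range(L))                    # mean intensity
    var = sum((i - m) ** 2 * p[i] for i in range(L))       # mu_2 = variance
    R = 1 - 1 / (1 + var)                                  # relative smoothness
    U = sum(pi ** 2 for pi in p)                           # uniformity (energy)
    e = -sum(pi * math.log2(pi) for pi in p if pi > 0)     # entropy in bits
    return R, U, e

# A perfectly flat region: zero variance, maximum uniformity, zero entropy.
flat = [0, 0, 100, 0]
R, U, e = texture_descriptors(flat)
```

For a constant region R = 0 (perfectly smooth), U = 1 and e = 0; rougher textures push R and e up and U down.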
Structural approach
Structural techniques deal with the arrangement of image
primitives. They use a set of predefined texture primitives and
a set of construction rules to define how a texture region is
constructed with the primitives and the rules.
Spectral approaches
Spectral techniques use the Fourier transform of the image
and its properties in order to detect global periodicity in an
image, by identifying high-energy, narrow peaks in the
spectrum.
The Fourier spectrum is ideally suited for describing the
directionality of periodic or almost periodic 2-D patterns in an
image.
Expressing the spectrum in polar coordinates as S(r, θ) and
summing over each variable yields two 1-D descriptors:

S(r) = Σ_{θ=0}^{π} Sθ(r),   S(θ) = Σ_{r=1}^{R0} Sr(θ),

where Sθ(r) denotes S(r, θ) as a function of r for fixed θ
(and similarly for Sr(θ)), and R0 is the radius of a circle
centered at the origin.
These functions are plotted in order to characterize the global
texture behavior of regions.
[Block diagram: a test pattern passes through feature
extraction to the classifier, which produces the classified
output; sample patterns pass through feature extraction to a
learning stage that trains the classifier.]
In the Bayes classifier, a pattern with feature vector x is
assigned to the class Ck with the largest posterior probability:

P(Ck|x) = p(x|Ck)·P(Ck) / Σ_{i=1}^{N} p(x|Ci)·P(Ci).

For Gaussian pattern classes, the class-conditional densities are

p(x|Ci) = [1 / ((2π)^(d/2)·(det Σi)^(1/2))]·exp[ −(1/2)·(x − μi)ᵀ·Σi⁻¹·(x − μi) ],

where μi and Σi are the mean vector and covariance matrix of
class Ci, and d is the dimension of x.
A minimum-distance classifier assigns x to the class whose
prototype (e.g., mean vector) μi is closest:

Dk = min{ d(x, μi) ; i = 1, 2, ..., N },

where d is a distance function and N is the number of classes.
Minkowski Distance

dp(y, z) = ( Σ_{i=1}^{n} |yi − zi|^p )^(1/p).

For p = 2 this is the Euclidean distance; for p = 1, the
city-block distance.
The Mahalanobis distance between the test pattern with the
feature vector x and the pattern class C is given by

d(x, C) = [(x − μ)ᵀ·Σ⁻¹·(x − μ)]^(1/2),

where μ and Σ are the mean vector and covariance matrix of
class C.
Bounded Distance
In many pattern classification problems, it may be useful to
work with a bounded distance function, which lies in the
range [0,1]. Any given distance function D(x,y) may be
transformed into a bounded distance function d(x,y) , where:
d(x, y) = D(x, y) / (1 + D(x, y)).
Nonparametric Classification
The nonparametric classification strategies are not dependent
on the estimation of parameters.
k-Nearest-Neighbor Classification
In many situations we may not have complete statistical
knowledge about the underlying joint distribution of the
observations and their classes. The k-nearest-neighbor rule
sidesteps this by assigning a test pattern to the class most
common among its k closest training samples.
Consider a feature set F = { f1, . . . , fd } arranged as a
matrix of dimension n × d, where n is the total number of
pixels in the image. It may be noted here that this matrix
contains a lot of local information about the entire image,
much of which is redundant.
The decision boundary

g(x) = wᵀx + w0 = 0

is a hyperplane, which partitions the feature space into two
subspaces. In Fisher's linear discriminant approach, the weight
vector w is chosen so that the projections of the two classes
onto the line

y = wᵀx
are well separated. The measure of this separation can be
chosen as the Fisher criterion

J(w) = (m1 − m2)² / (S1² + S2²),

where mi is the mean and

Si² = Σ_{y∈Ci} (y − mi)²

is the scatter of the projected samples of class Ci. The weight
vector maximizing J(w) is

w = W⁻¹·(m1 − m2),   W = Σ1 + Σ2,

where, on the right-hand side, m1 and m2 are the class mean
vectors and Σ1, Σ2 the class scatter matrices in the original
feature space.
The main clustering strategies are:
1. Hierarchical methods
2. K-means methods
3. Graph-theoretic methods
In hierarchical algorithms, the data set is partitioned into a
number of clusters in a hierarchical fashion. The resulting
hierarchical clustering may be represented by a tree diagram
called a dendrogram.
The average distance between two clusters P1 and P2, containing
n1 and n2 points respectively, is

Davg(P1, P2) = (1/(n1·n2))·Σ_{pi∈P1, pj∈P2} d(pi, pj).
The structural approach involves two main problems:
1. the extraction of pattern primitives (a segmentation problem);
2. the description of how the primitives combine.
A typical formalism supporting this approach is the string
grammar.
Syntactic Inference
A key problem in syntactic Pattern Recognition is inferring
an appropriate grammar using a set of samples belonging to
different pattern classes.
In syntactic pattern recognition, the problem of grammatical
inference is one of central importance. This approach is based
on the underlying assumption of the existence of at least one
grammar characterizing each pattern class. The identification
and extraction of the grammar characterizing each pattern class
is the task of grammatical inference.
Neural Networks
The approaches discussed until now are based on the use of
sample patterns to estimate statistical parameters of each
pattern class (mean vector of each class,covariance matrix).
The patterns (of known class membership) used to estimate
these parameters usually are called training patterns, and a
set of such patterns from each class is called a training set.
Neural networks, in contrast, learn the decision functions
directly from the training patterns, without making assumptions
regarding the underlying probability density functions or other
probabilistic characteristics of the pattern classes.
The perceptron computes the linear decision function

d(x) = Σ_{i=1}^{n} wi·xi + w_{n+1},

and its output is

O = +1, if Σ_{i=1}^{n} wi·xi > −w_{n+1}
    −1, if Σ_{i=1}^{n} wi·xi < −w_{n+1}

Using augmented pattern vectors y = (x1, x2, ..., xn, 1)ᵀ, this
becomes

d(y) = Σ_{i=1}^{n+1} wi·yi = wᵀy.
Training algorithms
Linearly separable classes: A simple, iterative algorithm for
obtaining a solution weight vector for two linearly separable
training sets follows. For two training sets of augmented
pattern vectors belonging to pattern classes C1 and C2,
respectively, let w(1) represent the initial weight vector, which
may be chosen arbitrarily. Then, at the kth iterative step:
w(k+1) = w(k) + c·y(k), if y(k) ∈ C1 and wᵀ(k)·y(k) ≤ 0
w(k+1) = w(k) − c·y(k), if y(k) ∈ C2 and wᵀ(k)·y(k) ≥ 0
w(k+1) = w(k),          otherwise

where c > 0 is a correction increment.
For classes that are not linearly separable, the Widrow-Hoff
(delta) rule minimizes the criterion function

J(w) = (1/2)·(r − wᵀy)²,

where r is the desired response (r = +1 if y belongs to C1 and
r = −1 if y belongs to C2). The gradient-descent update is

w(k+1) = w(k) + α·[r(k) − wᵀ(k)·y(k)]·y(k),   w(1) arbitrary, α > 0.
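The fixed-increment training rule for separable classes can be sketched as follows (function name, starting weights and sample points are illustrative; w is corrected only on misclassified patterns, exactly as in the update rule above):

```python
def train_perceptron(C1, C2, c=1.0, epochs=100):
    """Fixed-increment perceptron training for two linearly separable classes.
    Patterns are augmented with a trailing 1; w is corrected only on mistakes."""
    data = [(x + (1.0,), +1) for x in C1] + [(x + (1.0,), -1) for x in C2]
    w = [0.0] * len(data[0][0])                       # w(1): arbitrary start
    for _ in range(epochs):
        errors = 0
        for y, label in data:
            s = sum(wi * yi for wi, yi in zip(w, y))  # w^T y
            if label == +1 and s <= 0:                # y in C1 misclassified
                w = [wi + c * yi for wi, yi in zip(w, y)]
                errors += 1
            elif label == -1 and s >= 0:              # y in C2 misclassified
                w = [wi - c * yi for wi, yi in zip(w, y)]
                errors += 1
        if errors == 0:                               # converged
            break
    return w

# Two separable 2-D classes (illustrative points).
C1 = [(2.0, 2.0), (3.0, 1.0)]
C2 = [(-1.0, -2.0), (-2.0, -1.0)]
w = train_perceptron(C1, C2)
```

For separable data the perceptron convergence theorem guarantees that this loop terminates with a weight vector that classifies every training pattern correctly.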
Multilayer Perceptron
The most popular neural network model is the multilayer
perceptron (MLP), which is an extension of the single layer
perceptron proposed by Rosenblatt. Multilayer perceptrons, in
general, are feedforward networks, having distinct input,
output, and hidden layers. The architecture of multilayered
perceptron with error backpropagation network is shown in
the figure below.
The total input to hidden neuron j is

Ij^h = Σ_{i=1}^{n} wij^h·xi + θj^h,

where the wij^h are the input-to-hidden weights and θj^h is a
bias (threshold) term. The output of the neuron is obtained
with the sigmoid activation function:

Oj^h = f(Ij^h) = 1 / (1 + exp(−Ij^h)).
The weights are adjusted in proportion to their contribution to
the output error, which is propagated backward from the output
layer toward the input layer, and hence the name error
backpropagation.
The hidden-to-output weights are updated as

wjk^(new) = wjk^(old) + η·δj·Oj,

where η is the learning rate. For an output-layer neuron j,

δj = Oj·(1 − Oj)·(Tj − Oj),

where Tj is the ideal (target) response; the factor Oj·(1 − Oj)
is the derivative of the sigmoid.
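A single training step with the output-layer update above can be sketched as follows (hidden-layer weight updates and biases are omitted for brevity, since the text only gives the output-layer delta; all names and weight values are illustrative):

```python
import math

def sigmoid(I):
    return 1.0 / (1.0 + math.exp(-I))

def train_step(x, T, W_h, W_o, eta):
    """One backpropagation step for a 1-hidden-layer network (biases omitted).
    W_h[j][i]: weight input i -> hidden j; W_o[k][j]: hidden j -> output k."""
    O_h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
    O = [sigmoid(sum(w * oj for w, oj in zip(row, O_h))) for row in W_o]
    # output-layer deltas: delta_k = O_k (1 - O_k)(T_k - O_k)
    delta = [Ok * (1 - Ok) * (Tk - Ok) for Ok, Tk in zip(O, T)]
    for k, row in enumerate(W_o):
        for j in range(len(row)):          # w_jk(new) = w_jk(old) + eta*delta*O_j
            row[j] += eta * delta[k] * O_h[j]
    return O

# Toy step: push the single output toward target 1 (illustrative weights).
W_h = [[0.5, -0.3], [0.2, 0.8]]
W_o = [[0.4, -0.6]]
out_before = train_step((1.0, 0.5), (1.0,), W_h, W_o, eta=0.5)
out_after = train_step((1.0, 0.5), (1.0,), W_h, W_o, eta=0.5)
```

Repeating the step with the same input shows the output moving toward the target, which is the whole point of the weight update.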