
Characteristics of Digital Images

In general terms, a digital image is a digital representation of an object. Remotely sensed
image data are digital representations of the Earth. Image data are stored in data files, also
called image files, on magnetic tapes, computer disks, or other media. The data consist only
of numbers. These representations form images when they are displayed on a screen or are
output to hardcopy. Each number in an image file is a data file value. Data file values are
sometimes referred to as pixels (cells). The term pixel is abbreviated from picture element. A
pixel is the smallest part of a picture (the area being scanned) with a single value. The data
file value is the measured brightness value of the pixel at a specific wavelength.

Raster image data are laid out in a grid similar to the squares on a checkerboard. Each cell of
the grid is represented by a pixel, also known as a grid cell. In remotely sensed image data,
each pixel represents an area of the Earth at a specific location. The data file value assigned
to that pixel is the record of reflected radiation or emitted heat from the Earth's surface at that
location. Data file values may also represent elevation, as in digital elevation models
(DEMs).

Electromagnetic energy may be detected either photographically or electronically. The
photographic process uses chemical reactions on the surface of light-sensitive film to detect
and record energy variations. It is important to distinguish between the terms images and
photographs in remote sensing. An image refers to any pictorial representation, regardless of
what wavelengths or remote sensing device has been used to detect and record the
electromagnetic energy.

A photograph refers specifically to images that have been detected as well as recorded on
photographic film. The black and white photo to the left, of part of the city of Ottawa, Canada
was taken in the visible part of the spectrum. Photos are normally recorded over the
wavelength range from 0.3 μm to 0.9 μm - the visible and reflected infrared. Based on
these definitions, we can say that all photographs are images, but not all images are
photographs. Therefore, unless we are talking specifically about an image recorded
photographically, we use the term image.

A photograph could also be represented and displayed in a digital format by subdividing the
image into small equal-sized and shaped areas, called picture elements or pixels, and
representing the brightness of each area with a numeric value or digital number.

Indeed, that is exactly what has been done to the photo to the left. In fact, using the
definitions we have just discussed, this is actually a digital image of the original photograph.
The photograph was scanned and subdivided into pixels with each pixel assigned a digital
number representing its relative brightness. The computer displays each digital value as a
different brightness level. Sensors that record electromagnetic energy electronically
record the energy as an array of numbers in digital format right from the start. These
two different ways of representing and displaying remote sensing data, either pictorially or
digitally, are interchangeable as they convey the same information (although some detail
may be lost when converting back and forth).

Bands
Digital image data may include several bands of information. Each band is a set of data file
values for a specific portion of the electromagnetic spectrum of reflected light or emitted heat
(red, green, blue, near-infrared, infrared, thermal, etc.) or some other user-defined
information created by combining or enhancing the original bands, or creating new bands
from other sources.

Pixels and Bands in a Raster Image

Bands vs. Layers:
The bands of data are occasionally referred to as layers. Once a band is imported into a GIS,
it becomes a layer of information which can be processed in various ways. Additional layers
can be created and added to the image file.

We see colour because our eyes detect the entire visible range of wavelengths and our brains
process the information into separate colours. Can you imagine what the world would look
like if we could only see very narrow ranges of wavelengths or colours? That is how many
sensors work. The information from a narrow wavelength range is gathered and stored
in a channel, also sometimes referred to as a band. We can combine and display channels of
information digitally using the three primary colours (blue, green, and red). The data from
each channel is represented as one of the primary colours and, depending on the relative
brightness (i.e. the digital value) of each pixel in each channel, the primary colours combine
in different proportions to represent different colours.

When we use this method to display a single channel or range of wavelengths, we are
actually displaying that channel through all three primary colours. Because the brightness
level of each pixel is the same for each primary colour, they combine to form a black and
white image, showing various shades of gray from black to white. When we display more
than one channel each as a different primary colour, then the brightness levels may be
different for each channel/primary colour combination and they will combine to form a
colour image.
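
The following minimal sketch, assuming the numpy library and with random arrays standing in for real band data, illustrates the idea: stacking three different channels into the red, green and blue primaries produces a colour composite, while feeding a single channel into all three primaries produces a grey-scale rendition.

import numpy as np

# Three single-band images (2-D arrays of digital numbers, 0-255).
# Random values stand in for real channel data here.
rows, cols = 100, 100
band_red = np.random.randint(0, 256, (rows, cols), dtype=np.uint8)
band_green = np.random.randint(0, 256, (rows, cols), dtype=np.uint8)
band_nir = np.random.randint(0, 256, (rows, cols), dtype=np.uint8)

# Colour composite: each channel drives one primary colour (R, G, B).
composite = np.dstack([band_nir, band_red, band_green])   # a false-colour assignment

# A single channel displayed through all three primaries gives shades of
# grey, because R, G and B receive the same brightness at every pixel.
grey = np.dstack([band_red, band_red, band_red])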

NOTE: DEMs are not remotely sensed image data, but are currently being produced from
stereo points in radar imagery.

Coordinate Systems
The location of a pixel in a file or on a displayed or printed image is expressed using a
coordinate system. In two-dimensional coordinate systems, locations are organized in a grid
of columns and rows. Each location on the grid is expressed as a pair of coordinates known as
X and Y. The X coordinate specifies the column of the grid, and the Y coordinate
specifies the row. Image data organized into such a grid are known as raster data.

File Coordinates: File coordinates refer to the location of the pixels within the image (data)
file. File coordinates for the pixel in the upper left corner of the image always begin at 0, 0.

Map Coordinates: Map coordinates indicate the location of a pixel on a map.

Figure: A grid of map coordinates, with columns labelled 1000 to 1005 and rows labelled 1000 to
1004; the highlighted pixel lies at map coordinates (1003, 1001).
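
As a minimal sketch of the relationship between file coordinates and map coordinates, the small function below converts a (column, row) file coordinate, which begins at (0, 0) in the upper-left pixel, into a map coordinate. The upper-left map coordinate (1000, 1000), the one-unit pixel size and the downward-increasing y axis mirror the illustration above and are purely illustrative assumptions.

# Hypothetical upper-left map coordinate and pixel size, matching the figure above.
def file_to_map(col, row, ul_x=1000, ul_y=1000, pixel_size=1):
    map_x = ul_x + col * pixel_size    # x grows with the column number
    map_y = ul_y + row * pixel_size    # y grows with the row number in this illustration
    return map_x, map_y

print(file_to_map(3, 1))   # -> (1003, 1001)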

Resolution is a broad term commonly used to describe the level of detail that can be derived
from remotely sensed data. Four distinct types of resolution must be considered:
• spectral—the specific wavelength intervals that a sensor can record
• spatial—the area on the ground represented by each pixel
• radiometric—the number of possible data file values in each band (indicated by the number
of bits into which the recorded energy is divided)
• temporal—how often a sensor obtains imagery of a particular area
These four domains contain separate information that can be extracted from the raw data.

Spatial Resolution, Pixel Size, and Scale


Spatial Resolution
Spatial resolution is a measure of the smallest object that can be resolved
by the sensor, or the area on the ground represented by each pixel. The finer the resolution,
the lower the number. For instance, a spatial resolution of 79 meters is coarser than a spatial
resolution of 10 meters.

Scale
The terms large-scale imagery and small-scale imagery often refer to spatial resolution. Scale
is the ratio of distance on a map as related to the true distance on the ground. Large-scale in
remote sensing refers to imagery in which each pixel represents a small area on the ground,
such as SPOT data, with a spatial resolution of 10 m or 20 m. Small scale
refers to imagery in which each pixel represents a large area on the ground, such as Advanced
Very High Resolution Radiometer (AVHRR) data, with a spatial resolution of 1.1 km.

This terminology is derived from the fraction used to represent the scale of the map, such as
1:50,000. Small-scale imagery is represented by a small fraction (one over a very large
number). Large-scale imagery is represented by a larger fraction (one over a smaller number).
Generally, anything smaller than 1:250,000 is considered small-scale imagery.

The ratio of distance on an image or map, to actual ground distance is referred to as scale. If
you had a map with a scale of 1:100,000, an object of 1cm length on the map would actually
be an object 100,000cm (1km) long on the ground. Maps or images with small "map-to-
ground ratios" are referred to as small scale (e.g. 1:100,000), and those with larger ratios (e.g.
1:5,000) are called large scale.

NOTE: Scale and spatial resolution are not always the same thing. An image always has the
same spatial resolution, but it can be presented at different scales (Simonett et al, 1983).

Instantaneous Field of View


Spatial resolution is also described as the instantaneous field of view (IFOV) of the sensor,
although the IFOV is not always the same as the area represented by each pixel. The IFOV is
a measure of the area viewed by a single detector in a given instant in time (Star and Estes,
1990).
For example, Landsat MSS data have an IFOV of 79 × 79 meters, but there is an overlap of
11.5 meters in each pass of the scanner, so the actual area represented by each pixel is 56.5 ×
79 meters (usually rounded to 57 × 79 meters). Even though the IFOV is not the same as the
spatial resolution, it is important to know the number of pixels into which the total field of
view for the image is broken. Objects smaller than the stated pixel size may still be detectable
in the image if they contrast with the background, such as roads, drainage patterns, etc.
On the other hand, objects the same size as the stated pixel size (or larger) may not be
detectable if there are brighter or more dominant objects nearby. In Figure 1-8, a house sits in
the middle of four pixels. If the house has a reflectance similar to its surroundings, the data
file values for each of these pixels reflect the area around the house, not the house itself, since
the house does not dominate any one of the four pixels. However, if the house has a
significantly different reflectance than its surroundings, it may still be detectable.

The detail discernible in an image is dependent on the spatial resolution of the sensor and
refers to the size of the smallest possible feature that can be detected. Spatial resolution of
passive sensors (we will look at the special case of active microwave sensors later) depends
primarily on their Instantaneous Field of View (IFOV).
The IFOV is the angular cone of visibility of the sensor (A) and determines the area on the
Earth's surface which is "seen" from a given altitude at one particular moment in time (B).
The size of the area viewed is determined by multiplying the IFOV by the distance from the
ground to the sensor (C). This area on the ground is called the resolution cell and determines
a sensor's maximum spatial resolution. For a homogeneous feature to be detected, its size
generally has to be equal to or larger than the resolution cell. If the feature is smaller than
this, it may not be detectable as the average brightness of all features in that resolution cell
will be recorded. However, smaller features may sometimes be detectable if their reflectance
dominates within a particular resolution cell, allowing sub-pixel or resolution cell detection.
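
As a rough numerical sketch of the relationship described above, the ground resolution cell is approximately the IFOV (expressed in radians) multiplied by the distance between the sensor and the ground. The IFOV and altitude used below are illustrative values only, not the specification of any particular sensor.

ifov_mrad = 0.043        # instantaneous field of view in milliradians (assumed value)
altitude_m = 705_000     # sensor altitude above the ground in metres (assumed value)

resolution_cell_m = (ifov_mrad / 1000.0) * altitude_m
print(f"Resolution cell is roughly {resolution_cell_m:.0f} m across")   # about 30 m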

Most remote sensing images are composed of a matrix of picture elements, or pixels,
which are the smallest units of an image. Image pixels are normally square and represent a
certain area on an image. It is important to distinguish between pixel size and spatial
resolution - they are not interchangeable. If a sensor has a spatial resolution of 20 metres and
an image from that sensor is displayed at full resolution, each pixel represents an area of 20m
x 20m on the ground. In this case the pixel size and resolution are the same. However, it is
possible to display an image with a pixel size different than the resolution. Many posters of
satellite images of the Earth have their pixels averaged to represent larger areas, although the
original spatial resolution of the sensor that collected the imagery remains the same.
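
A minimal sketch of this distinction, assuming the numpy library and synthetic data: block-averaging an image with 20 m pixels into 100 m display pixels changes the pixel size but not the spatial resolution of the original data.

import numpy as np

band = np.random.randint(0, 256, (200, 200)).astype(float)   # synthetic 20 m pixels

factor = 5                                    # average 5 x 5 blocks -> 100 m display pixels
rows, cols = band.shape
averaged = band.reshape(rows // factor, factor,
                        cols // factor, factor).mean(axis=(1, 3))

print(band.shape, "->", averaged.shape)       # (200, 200) -> (40, 40)
# The sensor's spatial resolution is still 20 m; only the displayed pixel size is 100 m.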

Images where only large features are visible are said to have coarse or low resolution. In fine
or high resolution images, small objects can be detected. Military sensors for example, are
designed to view as much detail as possible, and therefore have very fine resolution.
Commercial satellites provide imagery with resolutions varying from a few metres to several
kilometres. Generally speaking, the finer the resolution, the less total ground area can be
seen.

LOW RESOLUTION IMAGE HIGH RESOLUTION IMAGE

Spectral Resolution
Spectral resolution refers to the specific wavelength intervals in the electromagnetic spectrum
that a sensor can record (Simonett et al, 1983). For example, band 1 of the Landsat TM
sensor records energy between 0.45 and 0.52 µm in the visible part of the spectrum. Wide
intervals in the electromagnetic spectrum are referred to as coarse spectral resolution,
and narrow intervals are referred to as fine spectral resolution. For example, the SPOT
panchromatic sensor is considered to have coarse spectral resolution because it records EMR
between 0.51 and 0.73 µm. On the other hand, band 3 of the Landsat TM sensor has fine
spectral resolution because it records EMR between 0.63 and 0.69 µm (Jensen, 1996).

NOTE: The spectral resolution does not indicate how many levels the signal is broken into.

Spectral response and spectral emissivity curves characterize the reflectance and/or
emittance of a feature or target over a variety of wavelengths. Different classes of features
and details in an image can often be distinguished by comparing their responses over distinct
wavelength ranges. Broad classes, such as water and vegetation, can usually be separated
using very broad wavelength ranges - the visible and near infrared. Other more specific
classes, such as different rock types, may not be easily distinguishable using either of these
broad wavelength ranges and would require comparison at much finer wavelength ranges to
separate them. Thus, we would require a sensor with higher spectral resolution. Spectral
resolution describes the ability of a sensor to define fine wavelength intervals. The finer the
spectral resolution, the narrower the wavelength ranges for a particular channel or band.

Black and white film records wavelengths extending over much, or all of the visible portion
of the electromagnetic spectrum. Its spectral resolution is fairly coarse, as the various
wavelengths of the visible spectrum are not individually distinguished and the overall
reflectance in the entire visible portion is recorded. Colour film is also sensitive to the
reflected energy over the visible portion of the spectrum, but has higher spectral resolution, as
it is individually sensitive to the reflected energy at the blue, green, and red wavelengths of

the spectrum. Thus, it can represent features of various colours based on their reflectance in
each of these distinct wavelength ranges.

Many remote sensing systems record energy over several separate wavelength ranges at
various spectral resolutions. These are referred to as multi-spectral sensors and will be
described in some detail in following sections. Advanced multi-spectral sensors called
hyperspectral sensors, detect hundreds of very narrow spectral bands throughout the
visible, near-infrared, and mid-infrared portions of the electromagnetic spectrum. Their
very high spectral resolution facilitates fine discrimination between different targets
based on their spectral response in each of the narrow bands.

Radiometric Resolution
Radiometric resolution refers to the dynamic range, or number of possible data file values in
each band. This is referred to by the number of bits into which the recorded energy is divided.

For instance, in 8-bit data, the data file values range from 0 to 255 for each pixel, but in 7-bit
data, the data file values for each pixel range from 0 to 127. In the following figure, 8-bit and
7-bit data are illustrated. The sensor measures the EMR in its range. The total intensity of the
energy from 0 to the maximum amount the sensor measures is broken down into 256
brightness values for 8-bit data, and 128 brightness values for 7-bit data.

Figure: Brightness Values

While the arrangement of pixels describes the spatial structure of an image, the radiometric
characteristics describe the actual information content in an image. Every time an image is
acquired on film or by a sensor, its sensitivity to the magnitude of the electromagnetic energy
determines the radiometric resolution. The radiometric resolution of an imaging system
describes its ability to discriminate very slight differences in energy. The finer the
radiometric resolution of a sensor, the more sensitive it is to detecting small differences in
reflected or emitted energy.

2 bit IMAGE 8 bit IMAGE


Imagery data are represented by positive digital numbers which vary from 0 to (one less than)
a selected power of 2. This range corresponds to the number of bits used for coding numbers
in binary format. Each bit records an exponent of power 2 (e.g. 1 bit = 2¹ = 2). The maximum
number of brightness levels available depends on the number of bits used in representing the
energy recorded. Thus, if a sensor used 8 bits to record the data, there would be 2⁸ = 256
digital values available, ranging from 0 to 255. However, if only 4 bits were used, then only
2⁴ = 16 values ranging from 0 to 15 would be available. Thus, the radiometric resolution
would be much less. Image data are generally displayed in a range of grey tones, with black
representing a digital number of 0 and white representing the maximum value (for example,
255 in 8-bit data). By comparing a 2-bit image with an 8-bit image, we can see that there is a
large difference in the level of detail discernible depending on their radiometric resolutions.
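
The relationship between the number of bits and the number of available brightness values can be sketched in a few lines (levels = 2 to the power of the number of bits, ranging from 0 to levels - 1):

for bits in (2, 4, 7, 8, 16):
    levels = 2 ** bits
    print(f"{bits:2d}-bit data: {levels:6d} values (0 to {levels - 1})")
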
Temporal Resolution: Temporal resolution refers to how often a sensor obtains imagery of a
particular area. For example, the Landsat satellite can view the same area of the globe once
every 16 days. SPOT, on the other hand, can revisit the same area every three days.
NOTE: Temporal resolution is an important factor to consider in change detection studies.

Figure: Landsat TM—Band 2 (Four Types of Resolution)

Temporal resolution: In addition to spatial, spectral, and radiometric resolution, the concept
of temporal resolution is also important to consider in a remote sensing system, which refers
to the length of time it takes for a satellite to complete one entire orbit cycle. The revisit
period of a satellite sensor is usually several days. Therefore the absolute temporal resolution
of a remote sensing system to image the exact same area at the same viewing angle a second
time is equal to this period. However, because of some degree of overlap in the imaging
swaths of adjacent orbits for most satellites and the increase in this overlap with increasing
latitude, some areas of the Earth tend to be re-imaged more frequently. Also, some satellite
systems are able to point their sensors to image the same area between different satellite
passes separated by periods from one to five days. Thus, the actual temporal resolution of a
sensor depends on a variety of factors, including the satellite/sensor capabilities, the swath
overlap, and latitude.

The ability to collect imagery of the same area of the Earth's surface at different periods of
time is one of the most important elements for applying remote sensing data. Spectral
characteristics of features may change over time and these changes can be detected by
collecting and comparing multi-temporal imagery. For example, during the growing season,
most species of vegetation are in a continual state of change and our ability to monitor those
subtle changes using remote sensing is dependent on when and how frequently we collect
imagery. By imaging on a continuing basis at different times we are able to monitor the
changes that take place on the Earth's surface, whether they are naturally occurring (such as
changes in natural vegetation cover or flooding) or induced by humans (such as urban
development or deforestation). The time factor in imaging is important when:
 persistent clouds offer limited clear views of the Earth's surface (often in the tropics)
 short-lived phenomena (floods, oil slicks, etc.) need to be imaged
 multi-temporal comparisons are required (e.g. the spread of a forest disease from one
year to the next)

Data Storage Image data can be stored on a variety of media—tapes, CD-ROMs, or floppy
diskettes, for example—but how the data are stored (e.g., structure) is more important than on
what they are stored. All computer data are in binary format. The basic unit of binary data is
a bit. A bit can have two possible values: 0 and 1, or "off" and "on" respectively. A set of
bits, however, can have many more values, depending on the number of bits used. The
number of values that can be expressed by a set of bits is 2 to the power of the number of bits
used.
A byte is 8 bits of data. Generally, file size and disk space are referred to by number of bytes.
For example, a PC may have 640 kilobytes (1,024 bytes = 1 kilobyte) of RAM (random
access memory), or a file may need 55,698 bytes of disk space. A megabyte (Mb) is about
one million bytes. A gigabyte (Gb) is about one billion bytes.

Storage Formats: Image data can be arranged in several ways on a tape or other media. The
most common storage formats are:
• BIL (band interleaved by line)
• BSQ (band sequential)
• BIP (band interleaved by pixel)
For a single band of data, all formats (BIL, BIP, and BSQ) are identical, as long as the data
are not blocked.
BSQ: In BSQ (band sequential) format, each band is stored as a separate, complete file
(Slater, 1980). That is, B1, B2, B3, …, Bn are written to n separate files on the
storage medium. This format is advantageous in that:
• One band can be read and viewed easily, and
• Multiple bands can be easily loaded in any order.

Band-1, Band-2, Band-3, … Band-n are each in a separate file in BSQ (Band Sequential) format

Band-1 file
1st line : 1, 2, 3, ……………………, N pixels
2nd line : 1, 2, 3, ……………………, N pixels
3rd line : 1, 2, 3, ……………………, N pixels
……………………
Mth line : 1, 2, 3, ……………………, N pixels
End of File

Band-2 file
1st line : 1, 2, 3, ……………………, N pixels
2nd line : 1, 2, 3, ……………………, N pixels
3rd line : 1, 2, 3, ……………………, N pixels
……………………
Mth line : 1, 2, 3, ……………………, N pixels
End of File

……………………

Band-n file
1st line : 1, 2, 3, ……………………, N pixels
2nd line : 1, 2, 3, ……………………, N pixels
3rd line : 1, 2, 3, ……………………, N pixels
……………………
Mth line : 1, 2, 3, ……………………, N pixels
End of File

BIL: In BIL (band interleaved by line) format, each record in the file contains a scan line
(row) of data for one band (Slater, 1980). In BIL, one line of band-1 is followed by the same
line of band-2, and so on until all n bands are covered; then the 2nd line of band-1 follows.
This is repeated for the whole scene, and the Mth lines of B1, B2, B3, …, Bn form the last
records. All bands of data for a given line are stored consecutively within the file, as shown below.

Band-1, Band-2, Band-3, … Band-n all in a single file in BIL (Band Interleaved by Line) format

1st line: 123…N Pixels (all N pixels) of Band-1
1st line: 123…N Pixels (all N pixels) of Band-2
1st line: 123…N Pixels (all N pixels) of Band-3
………………………………………………….
1st line: 123…N Pixels (all N pixels) of Band-n
2nd line: 123…N Pixels (all N pixels) of Band-1
2nd line: 123…N Pixels (all N pixels) of Band-2
2nd line: 123…N Pixels (all N pixels) of Band-3
………………………………………………….
2nd line: 123…N Pixels (all N pixels) of Band-n
………………………………………………………………………………….
…………………………………………………………………………………..
Mth line: 123…N Pixels (all N pixels) of Band-1
Mth line: 123…N Pixels (all N pixels) of Band-2
Mth line: 123…N Pixels (all N pixels) of Band-3
………………………………………………….
Mth line: 123…N Pixels (all N pixels) of Band-n

BIP
In BIP (band interleaved by pixel) format, a single image file in pixel-interleaved order is
generated. In BIP, pixels are interleaved; that is, the 1st pixel of Band-1, the 1st pixel of
Band-2, the 1st pixel of Band-3, …, and the 1st pixel of Band-n constitute a single pixel group,
and the values for each band are ordered within a given pixel. The pixels are arranged
sequentially on the tape (Slater, 1980). The sequence for BIP format is shown in the listing
below (a short code sketch of all three storage layouts follows the listing):

Band-1, Band-2, Band-3, … Band-n all in a single file in BIP (Band Interleaved by Pixel) format

1st pixel of B-1, 1st pixel of B-2, 1st pixel of B-3, …. and 1st pixel of B-n
2nd pixel of B-1, 2nd pixel of B-2, 2nd pixel of B-3, …. and 2nd pixel of B-n
3rd pixel of B-1, 3rd pixel of B-2, 3rd pixel of B-3, …. and 3rd pixel of B-n
………………………………………………………………………………….
…………………………………………………………………………………..
Nth pixel of B-1, Nth pixel of B-2, Nth pixel of B-3, …. and Nth pixel of B-n
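
The sketch below, assuming the numpy library, writes a small synthetic 3-band data cube to raw, headerless, unblocked files in each of the three layouts and reads them back. Note that it uses a single band-sequential file for BSQ for brevity, whereas the description above stores each band in a separate file; the file names are illustrative.

import numpy as np

# Synthetic cube: n bands, M lines, N pixels per line, 8-bit values.
bands, lines, pixels = 3, 4, 5
cube = np.arange(bands * lines * pixels, dtype=np.uint8).reshape(bands, lines, pixels)

# Write the three interleavings.
np.ascontiguousarray(cube).tofile("image.bsq")                      # BSQ: band 1, band 2, ..., band n
np.ascontiguousarray(cube.transpose(1, 0, 2)).tofile("image.bil")   # BIL: line 1 of every band, line 2, ...
np.ascontiguousarray(cube.transpose(1, 2, 0)).tofile("image.bip")   # BIP: every band of pixel 1, pixel 2, ...

# Read back: reshape to the on-disk order, then reorder to (band, line, pixel).
bsq = np.fromfile("image.bsq", dtype=np.uint8).reshape(bands, lines, pixels)
bil = np.fromfile("image.bil", dtype=np.uint8).reshape(lines, bands, pixels).transpose(1, 0, 2)
bip = np.fromfile("image.bip", dtype=np.uint8).reshape(lines, pixels, bands).transpose(2, 0, 1)

assert np.array_equal(bsq, cube) and np.array_equal(bil, cube) and np.array_equal(bip, cube)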

A number of standard formats are used for storing and exchanging remote sensing and scientific data, including:
Common Data Format (CDF/netCDF)
Hierarchical Data Format (HDF)
CEOS Superstructure Format
MPH/SPH/DSR
Spatial Data Transfer Standard (SDTS)
Flexible Image Transport System (FITS)
Graphics Interchange Format (GIF)
ISO/IEC 12087 - Image Processing and Interchange
Standard Formatted Data Units (SFDU)
GeoTIFF

However, digital products are supplied mainly in four formats:


 Super Structure format
 Fast format.
 GeoTIFF (Geographic Tagged Image File Format)
 HDF (Hierarchical Data Format)

 Super structure format


This is a very exhaustive data products format suitable for Level-0 (RAW, i.e., no correction applied),
Level-1 (RAD, i.e., radiometric correction applied) and Level-2 (GEO, i.e., both radiometric and
geometric correction applied) products. Though all categories of products can be supplied in this
format, Level-0 and Level-1 are the most preferred.
Super structure digital data file format consists of five files namely,
1. Volume directory file.
2. Leader file.
3. Image file (either in BSQ or BIL) format.
4. Trailer file.
5. Null volume directory file.
A logical volume is a logical collection of one or more files recorded consecutively. All logical
volumes have a volume directory as the first file and null volume directory as the last file. When a
logical volume is split between physical volumes, the volume directory is repeated at the start of the
next physical tape with some updated information. The layout of the super structure format both in
BSQ and BIL is shown in Fig. 9.16(a) and (b) respectively.
1. Volume directory file
The volume directory file is the first file of the media containing the data product. This gives
information about all subsequent files present in the medium, such as, number of bands,
arrangement of bands, total number of files and how many are present in the current medium,
information about the processing station, software version used to process, etc. It is composed of
a volume descriptor record, a number of file pointer records and a text record. The volume
descriptor record identifies the logical volume and the number of files it contains. There is a file
pointer record for each type of file in the logical volume, which indicates each file's class, format
and attributes.
2. Leader file
The leader file is composed of a file descriptor record and three types of data records. The record
types are header, ancillary and annotation. Header contains information related to mission: sensor
13
and processing parameters, image comer coordinates. Ancillary records contain information
related to ephemeris, attitude and ground control points (GCPs), for image geometric correction,
radiometric calibration data, etc.
3. Image file
The image file consists of file descriptor records giving information regarding band number, bits
per pixel, etc., and image data records. An image data record contains the video data in band
interleaved by line (BIL) or band sequential (BSQ) format and, in addition, it also contains
prefix and suffix information.
4. Trailer file
The trailer file follows the image data file. This is composed of a file descriptor record describing
what is in the trailer file, and one trailer record for each band in the volume directory file.
5. Null volume directory file
The null volume directory file is a file which ends the logical volume. It is referred to as 'null'
because it defines a non-existent (empty) logical volume. This file contains a volume descriptor
record.

Figure: Physical layout of three band image data in Super structure BSQ (b)

 Fast format
Fast format is a very comprehensive digital data format that is suitable for Level-2 data
products. It consists of two files namely
1. Header file.
2. Image file(s).
The physical layout of fast format is shown in Fig.
 Header file
The first file on each volume, a Read-Me-First file, contains header data. It is in American
Standard Code for Information Interchange (ASCII) format.
The header file contains three 1536-byte ASCII records. The first record is the Administrative
Record, which contains information that identifies the product, the scene and the data
specifically needed to read the imagery from the digital media (CD-ROM, DAT or disk). In
order to retrieve the image data, it is necessary to read entries in the Administrative Record.
The second record is the Radiometric Record, which contains the coefficients needed to
convert the scene digital values into at-satellite spectral radiance.
The third record is the Geometric Record, which contains the scene geographic location
(e.g., latitude, longitude) information. In order to align the imagery to other data sources, it will
be necessary to read entries in the Geometric Record.
 Image file
Image files are written to CD-ROM, DAT or disk in band sequential order; that is, each
image file contains one band of image data. There are no header records within the image
files, nor are there prefix and/or suffix data in the individual image records or scan lines.

GeoTIFF (Geographic tagged image file format)


Although currently various data formats (e.g., PGM, GIF, BMP, TIFF) are in use for storage of raster
image data, they have a common limitation in cartographic applications. The main problem is that it
is almost impossible to store any geographic information together with image data in a unified and
well-defined way in the above-mentioned formats.

GeoTIFF is based on the original TIFF (Tagged Image File Format) format, with additional
geographic information. The GeoTIFF specification defines a set of TIFF tags provided to describe
all 'cartographic' information associated with TIFF imagery that originates from satellite imaging
systems, scanned aerial photography, scanned maps, digital elevation models, or as a result of
geographic analysis. Its aim is to allow for tying a raster image to a known model space or map
projection, and for describing those projections. This is a platform-independent format which is used
by a wide range of GIS (Geographical Information System) and image processing packages currently
available in the market.


GeoTIFF does not intend to become a replacement for existing geographic data interchange standards,
such as the USGS SDTS standard or the FGDC metadata standard. Rather, it aims to augment an
existing popular raster-data format to support georeferencing and geocoding information.
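
As a brief sketch of how such a file might be read in practice, assuming the open-source rasterio library and a hypothetical file name "scene.tif", both the image data and the georeferencing information are obtained from the same file:

import rasterio

with rasterio.open("scene.tif") as src:       # "scene.tif" is a hypothetical file name
    band1 = src.read(1)                       # first band as a 2-D array of pixel values
    print(src.crs)                            # coordinate reference system (map projection)
    print(src.transform)                      # affine transform from file to map coordinates
    print(src.transform * (0, 0))             # map coordinate of the upper-left corner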

The Hierarchical Data Format (HDF) has been developed by the National Center for
Supercomputing Applications at the University of Illinois at Urbana- Champaign in the USA. It was
originally designed for the interchange of raster image data and multi-dimensional scientific data sets
across heterogeneous environments. It is a multi-object file format, with a number of predefined
object types, such as arrays, but with the ability to extend the object types in a relatively simple
manner. Recently, HDF has been extended to handle tabular scientific data, rather than just uniform
array oriented data, and also annotation attributes data.

HDF can store several types of data objects within one file, such as raster images, palettes, text and
table-style data. Each 'object' in an HDF file has a predefined tag that indicates the data type and a
reference number that identifies the instance. A number of tags are available for defining
user-defined data types; however, only those people who have access to the software of the
user that defined the new types can access them properly. Each set of HDF data types has an
associated software interface. This is where HDF is very powerful. The software tools supplied to
support HDF are quite sophisticated and, due to the format of the files, which extensively use pointers
in their arrangement, the user is provided with means to analyse and visualise the data in an efficient
and convenient manner.
A table of contents is maintained within the file and, as the user adds data to the file, the pointers in the
table of contents are updated. An example organisational structure of an HDF file is shown in the
figure below.

Figure: An example organisation of data objects in an HDF file, in which the HDF file holds data
objects directly as well as groups, and each group can in turn hold data objects and further groups.

Hierarchical Data Format (HDF, HDF4, or HDF5) is the name of a set of scientific data file
formats and libraries designed to store and organize large amounts of numerical data. Originally
developed at the National Center for Supercomputing Applications, it is currently supported by the

non-profit HDF Group, whose mission is to ensure continued development of HDF5 technologies, and
the continued accessibility of data currently stored in HDF.

In keeping with this goal, the HDF format, libraries and associated tools are available under a liberal,
BSD-like license for general use. HDF is supported by many commercial and non-commercial
software platforms, including Java, MATLAB/Scilab, IDL, Python, and R. The freely available HDF
distribution consists of the library, command-line utilities, test suite source, Java interface, and the
Java-based HDF Viewer (HDFView).
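
A short sketch of walking an HDF5 file's hierarchy and reading one data object, assuming the h5py library; the file name "product.h5" and the dataset name "Band_1" are hypothetical.

import h5py

with h5py.File("product.h5", "r") as f:       # hypothetical file name
    f.visit(print)                            # list groups and data objects in the hierarchy
    data = f["Band_1"][...]                   # read a (hypothetical) dataset into an array
    print(data.shape, data.dtype)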

The Common Data Format (CDF) is developed and maintained by NASA. A variation of the format
that was designed for transfer across networks was developed by Unidata and called Network Common
Data Format (netCDF). The two formats are very similar except in the method that they used to physically
encode data. There is a move to merge the two developments, but at this stage they are still maintained
separately. They are discussed here under one heading as they are functionally and conceptually identical.

CDF is defined as a "self-describing" data format that permits not only the storage of the actual data of
interest, but also stores user-supplied descriptions of the data. CDF is a software library[3] accessible from
either FORTRAN or C, that allows the user to access and manage the data without regard to the physical
format on the media. In fact, the physical format is totally transparent to the user.

CDF is primarily suited for handling data that is inherently multidimensional; recent additions to the
format also permit the handling of scalar data, but not in such an efficient manner. Due to the nature of
Earth observation data, i.e. array oriented data, CDF is very efficient for the storage and processing of this
type of data. Data can be accessed either at the atomic level, for example, at the pixel level, or also at a
'higher' level, for example, as a single image plane. The different access methods are provided by
separate software routines. One reason that CDF is efficient in data handling is that it is limited in the
basic data types that it can store. Essentially data can only be stored in a multiple of 8-bit bytes, such as
16-bit integer, 32-bit real, character string, etc. This is efficient for access, but is limiting for many Earth
observation products, where sensor data may be in a 10-bit word size, with another 6-bits used for flags,
such as cloud cover indicators.

The MPH/SPH/DSR product format is specifically used by ESA/ESRIN for ERS-1 and ERS-2 products and
hence extensively throughout Europe. It is used for the Fast Delivery Products from the ground stations to the
Processing and Archiving Facilities (PAFs) and to ESRIN, where it is archived in this format. This format also
forms the current baseline for the Envisat-1 Ground Segment. The MPH/SPH/DSR format is generally not used
for product distribution to end users; for this, the CEOS Superstructure Format is used. Note that the format only
specifies the structure of the data packaging; it is not concerned with the syntax or semantics of the individual
data records.

Each product consists of three segments; the Main Product Header (MPH), the Specific Product
Header (SPH) and the Data Set Records (DSRs), as shown in the figure below.

Figure: Schematic of an MPH/SPH/DSR formatted file: a Main Product Header (MPH), followed by an
optional Specific Product Header (SPH) and a sequence of Data Set Records (DSRs).

The MPH has a single fixed size record of 176 bytes that is mandatory for all products generated by
any satellite. The MPH for any one satellite is always the same. This header indicates, in fixed fields,
information which is applicable to all processing chain products, such as product identifier, type of
product, spacecraft identifier, UTC time of beginning of product, ground station identifier, many
quality control fields that are completed at various stages of the processing chain, etc. Following the
MPH is the SPH, which is present only if indicated by the MPH. The SPH can have a variable number
of records, each of variable size as dictated by the product type. These records contain information
specific to a particular product. For example, product confidence data that is specific to a product
type, parameters for instruments that are used to generate the product, etc. Finally there are a number
of DSRs (as specified in the MPH also), that contain the actual scientific data measurements. The
number and size of the DSR records is also dependent upon the product type.

There is only a limited number of data types supported in the headers: 1-, 2- and 4-byte
integers, ASCII string parameters, single-byte flags and 'special' fields formatted for a particular
product. The MPH/SPH/DSR format does not contain any data description information. The MPH and each
of the SPH formats and fields are defined in conventional paper documents; there is no electronic
formal language description of records. The MPH indicates the type of product, and from this the user
would have to look up the relevant product specification and then know the type of SPH records and
the type of DSR records. Using this method new SPH and DSR records can be defined and then a new
identifier used in the MPH, but this is only a very basic method of data description.

Spatial Data Transfer Standard (SDTS)


Spatial Data Transfer Standard (SDTS) is a method for transferring spatial data, such as geographic
and cartographic features, between heterogeneous computer systems. Specifications are provided for
representing 13 different types of 0-, 1- and 2-dimensional real-world objects represented as vector or
raster data. In addition to the 'standard' 13 simple spatial objects available, the user can define
composite objects which are made up of simple objects.
An SDTS transfer consists of a grouping of modules; these modules can be broken down into four
categories:
 Global Information Modules that define global parameters for the entire transfer, such as the
data set identifier, the co-ordinate system used, the geographic coverage, quality information,
definition of attributes, etc.;
 Attribute Modules that contain attributes of the spatial objects contained in the transfer, such
as altitude, direction, etc., this is analogous to data description information;
 Spatial Object Modules that define simple and composite structures of the basic spatial
objects;
 Graphic Representation that contain display symbols, area fill, colour, etc. for the various
objects.

In SDTS, objects are defined by attributes. For example, a ROAD may have attributes LENGTH and
DIRECTION. SDTS includes approximately 200 defined object names and 240 attributes.
For Earth observation, the vector representation is not of much interest, but the raster profile is
applicable. The raster profile is a standard method of formatting raster data, such as images or gridded
data that must be geolocated. Raster modules can accommodate image data, digital terrain models,
gridded GIS layers, and other regular point sample and grid cell data (all of which are termed raster
data). Two module types are required for the encoding of raster data: the Raster Definition module
and the Cell module. Additionally, a Registration module might be required to register the grid or
image geometry to latitude/longitude or a map-projection-based co-ordinate system.

SDTS supports many different organisation schemes for encoding raster data. Other data recorded in
the Raster Definition module complete the definition of the structure, orientation, and other
parameters required for interpreting the raster data. Actual pixel or grid cell data values are encoded in
Cell module records.

Remote Sensing Tape Contents


Tapes are available in a variety of sizes and storage capacities. To obtain information about the data
on a particular tape, read the tape label or box, or read the header file. Often, there is limited
information on the outside of the tape. Therefore, it may be necessary to read the header files on each
tape for specific information, such as:
• number of tapes that hold the data set
• number of columns (in pixels)
• number of rows (in pixels)
• data storage format—BIL, BSQ, BIP
• pixel depth—4-bit, 8-bit, 10-bit, 12-bit, 16-bit
• number of bands
• blocking factor
• number of header files and header records
Blocked Data
For reasons of efficiency, data can be blocked to fit more on a tape. Blocked data are sequenced so
that there are more logical records in each physical record. The number of logical records in each
physical record is the blocking factor. For instance, a record may contain 28,000 bytes, but only 4000
columns due to a blocking factor of 7.
Calculating Disk Space:
To calculate the amount of disk space a raster file requires on an ERDAS IMAGINE system,
use the following formula:

output file size = [(y * x * b) * n] * 1.4

Where:
y = rows
x = columns
b = number of bytes per pixel
n = number of bands
1.4 adds 30% to the file size for pyramid layers and 10% for miscellaneous adjustments, such as
histograms, lookup tables, etc.
NOTE: This output file size is approximate.
For example, to load a 3-band, 16-bit file with 500 rows and 500 columns, about 2,100,000 bytes of
disk space is needed.
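
The same estimate can be written as a small function (a sketch of the formula above, not of any particular software's internals):

def output_file_size(rows, cols, bytes_per_pixel, bands):
    # 1.4 adds roughly 30% for pyramid layers and 10% for histograms, lookup tables, etc.
    return rows * cols * bytes_per_pixel * bands * 1.4

# 3-band, 16-bit (2 bytes per pixel) file with 500 rows and 500 columns:
print(round(output_file_size(500, 500, 2, 3)))   # -> 2100000 bytes (approximate)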

Bytes Per Pixel


The number of bytes per pixel is listed below:
4-bit data: 0.5
8-bit data: 1.0
16-bit data: 2.0
In general, image files can contain two types of raster layers: thematic and continuous.
An image file can store a combination of thematic and continuous layers, or just one type.

Thematic Raster Layer


Thematic data are raster layers that contain qualitative, categorical information about an area.
A thematic layer is contained within an image file. Thematic layers lend themselves to applications in
which categories or themes are used. Thematic raster layers are used to represent data measured on a
nominal or ordinal scale, such as: soils, land use, land cover, roads, hydrology.
NOTE: Thematic raster layers are displayed as pseudo color layers.

Figure: Example of a Thematic Raster Layer


Continuous Raster Layer
Continuous data are raster layers that contain quantitative (measuring a characteristic on an
interval or ratio scale) and related, continuous values. Continuous raster layers can be multiband (e.g.,
Landsat TM data) or single band (e.g., SPOT panchromatic data). The following types of data are
examples of continuous raster layers: Landsat , SPOT, digitized (scanned) aerial photograph, DEM,
slope, temperature.
NOTE: Continuous raster layers can be displayed as either a gray scale raster layer or a true colour
raster layer

Figure: Examples of Continuous Raster Layers


Image File Contents
The image files contain the following additional information about the data:
• the data file values
• statistics
• lookup tables
• map coordinates
• map projection

Data Reception and Transmission


Data obtained during airborne remote sensing missions can be retrieved once the aircraft lands. It can
then be processed and delivered to the end user. However, data acquired from satellite platforms need
to be electronically transmitted to Earth, since the satellite continues to stay in orbit during its

operational lifetime. The technologies designed to accomplish this can also be used by an aerial
platform if the data are urgently needed on the surface.
There are three main options for transmitting data acquired by satellites to the surface. The data can
be directly transmitted to Earth if a Ground Receiving Station (GRS) is in the line of sight of the
satellite (A). If this is not the case, the data can be recorded on board the satellite (B) for transmission
to a GRS at a later time. Data can also be relayed to the GRS through the Tracking and Data Relay
Satellite System (TDRSS) (C), which consists of a series of communications satellites in
geosynchronous orbit. The data are transmitted from one satellite to another until they reach the
appropriate GRS.

DATA TRANSMISSION
In Canada, CCRS operates two ground receiving stations - one at Cantley, Québec (GSS), just outside
of Ottawa, and another one at Prince Albert, Saskatchewan (PASS). The combined coverage circles
for these Canadian ground stations enable the potential for reception of real-time or recorded data
from satellites passing over almost any part of Canada's land mass, and much of the continental
United States as well. Other ground stations have been set up around the world to capture data from a
variety of satellites.

GROUND RECEIVING STATIONS


The data are received at the GRS in a raw digital format. They may then, if required, be processed to
correct systematic, geometric and atmospheric distortions to the imagery, and be translated into a
standardized format. The data are written to some form of storage medium such as tape, disk or CD.
The data are typically archived at most receiving and processing stations, and full libraries of data are
managed by government agencies as well as commercial companies responsible for each sensor's
archives.
For many sensors it is possible to provide customers with quick-turnaround imagery when they need
data as quickly as possible after it is collected. Near real-time processing systems are used to produce
low resolution imagery in hard copy or soft copy (digital) format within hours of data acquisition.
Such imagery can then be faxed or transmitted digitally to end users. One application of this type of
fast data processing is to provide imagery to ships sailing in the Arctic, as it allows them to assess
current ice conditions quickly in order to make navigation decisions about the easiest/safest routes
through the ice. Real-time processing of imagery in airborne systems has been used, for example, to
pass thermal infrared imagery to forest fire fighters right at the scene.
Low resolution quick-look imagery is used to preview archived imagery prior to purchase. The
spatial and radiometric quality of these types of data products is degraded, but they are useful for
ensuring that the overall quality, coverage and cloud cover of the data is appropriate.
Raw Data: No processing is done and the relevant scene is extracted and put in the media.
Partially processed data: radiometrically corrected but geometrically uncorrected or vice versa
Processed or standard data: both radiometrically and geometrically corrected
Geocoded data: these products are north-oriented and compatible with the Survey of India toposheets
Orthorectified data: these are geometrically corrected products with correction for displacement
caused by terrain
Metadata: provides background information about data sets, such as the
name of the data, date and place of compilation, data source, and data structure and format, etc.

Elements of Visual Interpretation
The analysis of remote sensing imagery involves the identification of various targets in an image, and
those targets may be environmental or artificial features which consist of points, lines, or areas.
Targets may be defined in terms of the way they reflect or emit radiation. This radiation is measured
and recorded by a sensor, and ultimately is depicted as an image product such as an air photo or a
satellite image. What makes interpretation of imagery more difficult than the everyday visual
interpretation of our surroundings? For one, we lose our sense of depth when viewing a two-
dimensional image, unless we can view it stereoscopically so as to simulate the third dimension of
height. Indeed, interpretation benefits greatly in many applications when images are viewed in stereo,
as visualization (and therefore, recognition) of targets is enhanced dramatically. Viewing objects from
directly above also provides a very different perspective than what we are familiar with. Combining
an unfamiliar perspective with a very different scale and lack of recognizable detail can make even
the most familiar object unrecognizable in an image. Finally, we are used to seeing only the visible
wavelengths, and the imaging of wavelengths outside of this window is more difficult for us to
comprehend.

Common adjectives (quantitative and qualitative)


x, y location: x, y coordinates: longitude and latitude, or metres easting and northing in a map grid
Size: length, width, perimeter, area; small, medium (intermediate), large
Shape: an object's geometric characteristics: linear, curvilinear, circular, elliptical, radial,
square, rectangular, triangular, hexagonal, pentagonal, star, amorphous, etc.
Shadow: a silhouette caused by solar illumination from the side
Tone/colour:
 grey tone: light (bright), intermediate (grey), dark (black)
 colour: IHS = intensity, hue (colour), saturation; RGB = red, green and blue
Texture: characteristic placement and arrangement of repetitions of tone or colour; smooth,
intermediate (medium), rough (coarse), mottled, stippled
Pattern: the spatial arrangement of objects on the ground; systematic, unsystematic or random, linear,
curvilinear, rectangular, circular, elliptical, parallel, centripetal, serrated, striated, braided
Height/depth/volume/aspect/slope: z-elevation (height), depth (bathymetry), volume, slope, aspect
Site/situation/association:
 site: elevation, slope, aspect, exposure, adjacency to water, transportation, utilities
 situation: objects are placed in a particular order or orientation relative to one another
 association: related phenomena are usually present

Recognizing targets is the key to interpretation and information extraction. Observing the differences
between targets and their backgrounds involves comparing different targets based on any, or all, of the
visual elements of tone, shape, size, pattern, texture, shadow, and association. Visual
interpretation using these elements is often a part of our daily lives, whether we are conscious of it or
not. Examining satellite images on the weather report, or following high speed chases by views from a
helicopter are all familiar examples of visual image interpretation. Identifying targets in remotely
sensed images based on these visual elements allows us to further interpret and analyze. The nature of
each of these interpretation elements is described below, along with an image example of each.

TONE

Tone refers to the relative brightness or colour of objects in an image. Generally, tone is the
fundamental element for distinguishing between different targets or features. Variations in tone also
allow the elements of shape, texture, and pattern of objects to be distinguished.

SHAPE
Shape refers to the general form, structure, or outline of individual objects. Shape can be a very
distinctive clue for interpretation. Straight edge shapes typically represent urban or agricultural (field)
targets, while natural features, such as forest edges, are generally more irregular in shape, except
where man has created a road or clear cuts. Farm or crop land irrigated by rotating sprinkler systems
would appear as circular shapes.

SIZE
Size of objects in an image is a function of scale. It is important to assess the size of a target relative
to other objects in a scene, as well as the absolute size, to aid in the interpretation of that target. A
quick approximation of target size can direct interpretation to an appropriate result more quickly. For
example, if an interpreter had to distinguish zones of land use, and had identified an area with a
number of buildings in it, large buildings such as factories or warehouses would suggest commercial
property, whereas small buildings would indicate residential use.

Pattern refers to the spatial arrangement of visibly discernible objects. Typically an orderly repetition
of similar tones and textures will produce a distinctive and ultimately recognizable pattern. Orchards
with evenly spaced trees, and urban streets with regularly spaced houses are good examples of pattern.

Texture refers to the arrangement and frequency of tonal variation in particular areas of an image.
Rough textures would consist of a mottled tone where the grey levels change abruptly in a small area,
whereas smooth textures would have very little tonal variation. Smooth textures are most often the
result of uniform, even surfaces, such as fields, asphalt, or grasslands. A target with a rough surface
and irregular structure, such as a forest canopy, results in a rough textured appearance. Texture is one
of the most important elements for distinguishing features in radar imagery.

PATTERN TEXTURE
Shadow is also helpful in interpretation as it may provide an idea of the profile and relative height of
a target or targets which may make identification easier. However, shadows can also reduce or
eliminate interpretation in their area of influence, since targets within shadows are much less (or not at
all) discernible from their surroundings. Shadow is also useful for enhancing or identifying
topography and landforms, particularly in radar imagery.

Association takes into account the relationship between other recognizable objects or features in
proximity to the target of interest. The identification of features that one would expect to associate
with other features may provide information to facilitate identification. In the example given above,
commercial properties may be associated with proximity to major transportation routes, whereas
residential areas would be associated with schools, playgrounds, and sports fields. In our example, a
lake is associated with boats, a marina, and adjacent recreational land.

SHADOW ASSOCIATION

Image Processing techniques:

Radiometric calibration:
Satellite digital data are generally delivered as digital count values, for example 8-bit digital
numbers (0-255). These digital values should be rescaled, or calibrated, to physically meaningful
units (energy per unit area * steradian * micrometer). This process is called "radiometric
calibration" or, more properly, radiometric rescaling.

Conversion to Radiance:
The digital counts (digital numbers, DN) corresponding to the pixels of each band (band centre
wavelength λ) are converted to a spectral radiance (Lλ) image using the following formula:

Lλ = ((Lmaxλ - Lminλ) / (Qcalmax - Qcalmin)) * (Qcal - Qcalmin) + Lminλ

which, using the rescaling gain and bias defined below, can equivalently be written as

Lλ = Grescale * Qcal + Brescale

Where, Lλ = spectral radiance at the sensor's aperture in watts/(meter squared * ster * μm)
Qcal = the quantized calibrated pixel value in DN
Lminλ = the spectral radiance that is scaled to Qcalmin, in watts/(meter squared * ster * μm), given in
the image metadata (see, for example, Tables 1 to 3)
Lmaxλ = the spectral radiance that is scaled to Qcalmax, in watts/(meter squared * ster * μm), given in
the image metadata (see, for example, Tables 1 to 3)
Qcalmin = the minimum quantized calibrated pixel value (corresponding to Lminλ) in DN
Qcalmax = the maximum quantized calibrated pixel value (corresponding to Lmaxλ) in DN = 255
The values of Lmax and Lmin for bands 1 to 5, 7 and 8 are given in the table.
Grescale = rescaled gain (the data product "gain" contained in the Level 1 product header or
ancillary data record) in watts/(meter squared * ster * μm)/DN
Brescale = rescaled bias (the data product "offset" contained in the Level 1 product header or
ancillary data record) in watts/(meter squared * ster * μm)

The estimated radiance image is then converted to a reflectance image using the following relation:

ρλ = (π × Lλ × d²) / (E0λ × cos θ)

where d is the Earth-Sun distance correction in astronomical units (depending on the date of data acquisition; for example, 1.00969 AU for Julian day 242), θ is the solar zenith angle (e.g., 21.32°), Lλ is the radiance (the calculated radiance image) as a function of bandwidth, and E0λ is the solar spectral irradiance. The E0λ values are taken from the Landsat 7 Science Data Users Handbook, as given in the tables. The values of d and θ are taken from the header file of the corresponding image over the study area.
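The two rescaling steps above are straightforward to code. The following is a minimal sketch in Python with NumPy (the language and the example numbers are assumptions, not part of this text); lmin, lmax, esun, d and the zenith angle would normally be read from the image header or from Tables 1 to 3.

import numpy as np

def dn_to_radiance(dn, lmin, lmax, qcalmin=0, qcalmax=255):
    # Rescale quantized DN to at-sensor spectral radiance in W/(m^2 sr um)
    gain = (lmax - lmin) / (qcalmax - qcalmin)       # Grescale
    bias = lmin - gain * qcalmin                     # Brescale
    return gain * dn.astype(np.float64) + bias

def radiance_to_reflectance(radiance, esun, d_au, zenith_deg):
    # rho = pi * L * d^2 / (E0 * cos(theta))
    theta = np.radians(zenith_deg)
    return np.pi * radiance * d_au ** 2 / (esun * np.cos(theta))

# Example with values from Table 1 (IRS-1B LISS II band B1) and an assumed header
dn = np.array([[0, 64, 128], [192, 230, 255]], dtype=np.uint8)
L = dn_to_radiance(dn, lmin=0.0, lmax=140.7)
rho = radiance_to_reflectance(L, esun=1969.0, d_au=1.00969, zenith_deg=21.32)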

Atmospheric correction:
Dark subtraction using the band minimum has been applied to the calculated reflectance image data to correct for atmospheric scattering. The dark-object subtraction (DOS) method of atmospheric correction is a scene-based method for approximating the path radiance added by scattering. It assumes that somewhere within a full scene there is a location in deep topographic shadow, and that any radiance recorded by the satellite for that area arises from the path-radiance component (assumed to be constant across the scene). The radiance of a dark object (assumed to have a reflectance of 1% by Chavez (1996) and Moran et al. (1992)) is calculated by the following relationship:
Lλ,1% = 0.01 × E0λ × cos²θz / (π × d²)
The haze correction can then be computed as:
Lλ,haze = Lλ − Lλ,1%
ρλ = [π × d² × (Lλ − Lλ,haze)] / (E0λ × cos θz)
Further, an additional refinement to the DOS method was proposed by Chavez (1996), known as the COST model, as shown below:
ρλ = [π × d² × (Lλ,sat − Lλ,haze)] / (ESUNλ × cos²θz)

where, ρλ = reflectance
d = Earth-Sun distance in astronomical units (AU)
Lλ,sat = at-satellite radiance
ESUNλ = exo-atmospheric solar irradiance
θz = solar zenith angle = 90° − solar elevation angle
λ = subscript indicating that these values are spectral band-specific
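A minimal sketch of the dark-object subtraction described above, using the COST form with cos²θz (Python/NumPy assumed; l_dark, the band-minimum dark-object radiance, is supplied by the user):

import numpy as np

def cost_reflectance(radiance, l_dark, esun, d_au, zenith_deg):
    # COST-style dark-object subtraction (after Chavez, 1996)
    theta = np.radians(zenith_deg)
    l_1pct = 0.01 * esun * np.cos(theta) ** 2 / (np.pi * d_au ** 2)  # 1% dark-object radiance
    l_haze = l_dark - l_1pct                                         # estimated path radiance
    return np.pi * d_au ** 2 * (radiance - l_haze) / (esun * np.cos(theta) ** 2)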

(The unit of radiance is watts per square metre per steradian (W m⁻² sr⁻¹). A steradian is the solid angle subtended at the centre of a unit sphere by a unit area on its surface; for a general sphere of radius r, any portion of its surface with area A = r² subtends one steradian, since the solid angle is Ω = A/r². Because the surface area of a sphere is 4πr², a whole sphere subtends 4π ≈ 12.566 steradians, and by the same argument the maximum solid angle that can be subtended at any point is 4π sr. A steradian can also be called a squared radian.)

Table 1. Minimum and maximum radiances for the IRS-1B LISS II sensor (March 21, 1995)

Band   Wavelength range (µm)   Lmin (W m⁻² sr⁻¹ µm⁻¹)   Lmax (W m⁻² sr⁻¹ µm⁻¹)   Solar spectral irradiance (W m⁻² µm⁻¹)
B1     0.450 – 0.520           0.0                       140.7                     1969
B2     0.520 – 0.590           0.0                       226.5                     1840
B3     0.620 – 0.680           0.0                       180.2                     1551
B4     0.770 – 0.860           0.0                       164.5                     1044

Table 2. Minimum and maximum radiances for the IRS-1D LISS III sensor (March 18, 2000)

Band   Wavelength range (µm)   Lmin (W m⁻² sr⁻¹ µm⁻¹)   Lmax (W m⁻² sr⁻¹ µm⁻¹)   Solar spectral irradiance (W m⁻² µm⁻¹)
B2     0.520 – 0.590           0.0                       148                       1840
B3     0.620 – 0.680           0.0                       156.6                     1551
B4     0.770 – 0.860           0.0                       164.5                     1044
B5     1.55 – 1.70             0.0                       24.38                     240.62

Table 3. Minimum and maximum radiances for the Landsat 7 – ETM Plus sensor (November 21, 2000)

Data Correction: There are several types of errors that can be manifested in remotely sensed data.
Among these are line dropout and striping. These errors can be corrected to an extent in GIS by
radiometric and geometric correction functions.

Line Dropout: Line dropout occurs when a detector either completely fails to function or becomes temporarily saturated during a scan (like the effect of a camera flash on a human retina). The result is a line or partial line of data with anomalously high or low data file values, creating a horizontal streak until the detector(s) recovers, if it recovers. Line dropout is usually corrected by replacing the bad line with a line of estimated data file values, based on the lines above and below it.
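A minimal sketch of that replacement (Python/NumPy assumed; it presumes the bad scan lines have already been identified and are isolated, i.e., not adjacent to one another):

import numpy as np

def repair_line_dropout(band, bad_rows):
    # Replace each dropped line with the average of the lines above and below it
    fixed = band.astype(np.float64)
    last = band.shape[0] - 1
    for r in bad_rows:
        above = fixed[r - 1] if r > 0 else fixed[r + 1]
        below = fixed[r + 1] if r < last else fixed[r - 1]
        fixed[r] = (above + below) / 2.0
    return fixed.astype(band.dtype)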

Striping: Striping or banding occurs if a detector goes out of adjustment—that is, it provides readings
consistently greater than or less than the other detectors for the same band over the same ground
cover.

Rectification
A raw satellite image represents the irregular surface of the Earth; even images of flat terrain are distorted by the Earth's curvature. Rectification is the process of projecting the data onto a plane and making them conform to a map projection system (for example, Universal Transverse Mercator (UTM), Geographic (Lat/Long), etc.) designed to represent the surface of the sphere on a plane. The calculation of a map projection requires the definition of a spheroid in terms of axis lengths and the radius of the reference sphere. Several spheroids are used for map projection depending on the region of the Earth's surface: for example, Clarke 1866 for North America, Krasovsky 1940 for Russia, Bessel 1841 for Central Europe, Everest 1830/1856 for the Indian subcontinent (Snyder, 1993), or, in general, WGS84.

Map projection: A map projection is the manner in which the spherical surface of the Earth is
represented on a flat (two-dimensional) surface. This can be accomplished by direct geometric
projection or by a mathematically derived transformation. There are many kinds of projections, but all
involve transfer of the distinctive global patterns of parallels of latitude and meridians of longitude
onto an easily flattened surface, or developable surface.

The three most common developable surfaces are the cylinder, cone, and plane. A plane is already
flat, while a cylinder or cone may be cut and laid out flat, without stretching. Thus, map projections
may be classified into three general families: cylindrical, conical, and azimuthal or planar.

Geographical (Lat/Lon): Geographical, or spherical, coordinates are based on the network of


latitude and longitude (Lat/Lon) lines that make up the graticule of the Earth. Within the graticule,
lines of longitude are called meridians, which run north/south, with the prime meridian at 0°
(Greenwich, England). Meridians are designated as 0° to 180°, east or west of the prime meridian.
The 180° meridian (opposite the prime meridian) is the International Dateline.

Lines of latitude are called parallels, which run east/west. Parallels are designated as 0° at the equator
to 90° at the poles. The equator is the largest parallel. Latitude and longitude are defined with respect
to an origin located at the intersection of the equator and the prime meridian. Lat/Lon coordinates are
reported in degrees, minutes, and seconds. Map projections are various arrangements of the Earth‘s
latitude and longitude lines onto a plane. These map projections require a point of reference on the
Earth‘s surface. Most often this is the center, or origin, of the projection.

Generally, rectification, georeferencing and geocoding are processes of transforming the data from one grid system into another grid system using a geometric transformation. Since the pixels of the new grid may not align with the pixels of the original grid, the pixels must be resampled. Resampling is the process of interpolating data values for the pixels on the new grid from the values of the source pixels.

Rectification, by definition, involves georeferencing, since all map projection systems are associated
with map coordinates. Image-to-image registration, involves georeferencing only if the reference
image is already georeferenced. Georeferencing, by itself, involves changing only the map
coordinate information in the image file. The grid of the image does not change.

Geocoded data are images that have been rectified to a particular map projection and pixel size, and
usually have had radiometric corrections applied. Geocoded data should be rectified only if they must
conform to a different projection system or be registered to other rectified data.
Registration is the process of making an image conform to another image.

Orthorectification: Orthorectification is a form of rectification that corrects for terrain displacement


and can be used if there is a DEM of the study area. It is based on collinearity equations, which can be
derived by using 3D GCPs. In relatively flat areas, orthorectification is not necessary, but in
mountainous areas (or on aerial photographs of buildings), where a high degree of accuracy is
required, orthorectification is recommended.

When to Rectify: Rectification is necessary in cases where the pixel grid of the image must be
changed to fit a map projection system or a reference image. There are several reasons for rectifying
image data:
• comparing pixels scene to scene in applications, such as change detection or thermal inertia mapping
(day and night comparison)
• developing GIS data bases for GIS modeling
• identifying training samples according to map coordinates prior to classification
• creating accurate scaled photomaps
• overlaying an image with vector data, such as ArcInfo
• comparing images that are originally at different scales
• extracting accurate distance and area measurements
• mosaicking images

Spheroids:
Airy, Australian National , Bessel, Clarke 1866, Clarke 1880, Everest, GRS 1980, Helmert,
Hough, International 1909, Krasovsky, Mercury 1960, Modified Airy, Modified Everest, Modified
Mercury 1968, New International 1967 , Southeast Asia , Sphere of Nominal Radius of Earth,
Sphere of Radius 6370977m, Walbeck, WGS 66, WGS 72, WGS 84.

When to Georeference Only


Rectification is not necessary if there is no distortion in the image. For example, if an image file is
produced by scanning or digitizing a paper map that is in the desired projection system, then that
image is already planar and does not require rectification unless there is some skew or rotation of the
image. Scanning and digitizing produce images that are planar, but do not contain any map coordinate
information. These images need only to be georeferenced, which is a much simpler process than
rectification. In many cases, the image header can simply be updated with new map coordinate
information. This involves redefining:
• the map coordinate of the upper left corner of the image
• the cell size (the area represented by each pixel)
This information is usually the same for each layer of an image file, although it could be different. For
example, the cell size of band 6 of Landsat TM data is different than the cell size of the other bands.

Disadvantages of Rectification
During rectification, the data file values of rectified pixels must be resampled to fit into a new grid of
pixel rows and columns. Although some of the algorithms for calculating these values are highly
reliable, some spectral integrity of the data can be lost during rectification. If map coordinates or map
units are not needed in the application, then it may be wiser not to rectify the image. An unrectified
image is more spectrally correct than a rectified image.
Classification
Some analysts recommend classification before rectification, since the classification is then based on
the original data values. Another benefit is that a thematic file has only one band to rectify instead of
the multiple bands of a continuous file. On the other hand, it may be beneficial to rectify the data first,
especially when using GPS data for the GCPs. Since these data are very accurate, the classification
may be more accurate if the new coordinates help to locate better training samples.
Thematic Files
Nearest neighbor is the only appropriate resampling method for thematic files, which may be a
drawback in some applications.

Rectification Steps
NOTE: Registration and rectification involve similar sets of procedures. Throughout this documentation, many references to rectification also apply to image-to-image registration.
Usually, rectification is the conversion of data file coordinates to some other grid and coordinate
system, called a reference system. Rectifying or registering image data on disk involves the following
general steps, regardless of the application:
1. Locating the ground control points that specify pixels in the image for which the output map
coordinates are known;
2. Computation of transformation matrix using a polynomial equation to convert the source
coordinates to rectified coordinates;
3. Creation of an output image file with the pixels resampled to conform to the new grid.
Ground Control Points
GCPs are specific pixels in an image for which the output map coordinates (or other output
coordinates) are known. GCPs consist of two X,Y pairs of coordinates:
• source coordinates—usually data file coordinates in the image being rectified
• reference coordinates—the coordinates of the map or reference image to which the source image is
being registered
Entering GCPs: Accurate GCPs are essential for an accurate rectification. From the GCPs, the rectified coordinates for all other points in the image are extrapolated. Select many GCPs throughout the scene; the more dispersed the GCPs are, the more reliable the rectification is. GCPs for large-scale imagery might include the intersection of two roads, airport runways, utility corridors, towers, or buildings. For small-scale imagery, larger features such as urban areas or geologic features may be used. Landmarks that can vary (e.g., the edges of lakes or other water bodies, vegetation, etc.) should not be used.
The source and reference coordinates of the GCPs can be entered in the following ways:
• They may be known a priori, and entered at the keyboard.
• Use the mouse to select a pixel from an image in the Viewer. With both the source and destination Viewers open, enter source coordinates and reference coordinates for image-to-image registration.
• Use a digitizing tablet to register an image to a hardcopy map.
Polynomial Transformation
Polynomial equations are used to convert source file coordinates to rectified map coordinates.
Depending upon the distortion in the imagery, the number of GCPs used, and their locations relative
to one another, complex polynomial equations may be required to express the needed transformation.
The degree of complexity of the polynomial is expressed as the order of the polynomial. The order is
simply the highest exponent used in the polynomial.
The equations of a first-order polynomial transformation are:
X = A1 + A2·x + A3·y
Y = B1 + B2·x + B3·y
where x and y are source coordinates (input) and X and Y are rectified coordinates (output).
So, any image file can be transformed or rectified to a new coordinate system using three control points of the input image, say point-1 (x1, y1), point-2 (x2, y2), point-3 (x3, y3):
Rectified point-1 (X1, Y1)
X1 = A1 + A2·x1 + A3·y1
Y1 = B1 + B2·x1 + B3·y1
Rectified point-2 (X2, Y2)
X2 = A1 + A2·x2 + A3·y2
Y2 = B1 + B2·x2 + B3·y2
Rectified point-3 (X3, Y3)
X3 = A1 + A2·x3 + A3·y3
Y3 = B1 + B2·x3 + B3·y3

Further, any image file can be transformed or rectified to a new coordinate system using a second-order polynomial transformation with six control points of the input image, say point-1 (x1, y1), point-2 (x2, y2), ..., point-6 (x6, y6):
Rectified point-1 (X1, Y1)
X1 = A1 + A2·x1 + A3·y1 + A4·x1² + A5·x1·y1 + A6·y1²
Y1 = B1 + B2·x1 + B3·y1 + B4·x1² + B5·x1·y1 + B6·y1²
..................................................................................
Rectified point-6 (X6, Y6)
X6 = A1 + A2·x6 + A3·y6 + A4·x6² + A5·x6·y6 + A6·y6²
Y6 = B1 + B2·x6 + B3·y6 + B4·x6² + B5·x6·y6 + B6·y6²

Similarly, an image file can be transformed or rectified to a new coordinate system using a third-order polynomial transformation with ten control points of the input image, say point-1 (x1, y1), point-2 (x2, y2), ..., point-10 (x10, y10):

Rectified point-1 (X1, Y1)
X1 = A1 + A2·x1 + A3·y1 + A4·x1² + A5·x1·y1 + A6·y1² + A7·x1³ + A8·x1²·y1 + A9·x1·y1² + A10·y1³
Y1 = B1 + B2·x1 + B3·y1 + B4·x1² + B5·x1·y1 + B6·y1² + B7·x1³ + B8·x1²·y1 + B9·x1·y1² + B10·y1³
................................................................................................................................................
Rectified point-10 (X10, Y10)
X10 = A1 + A2·x10 + A3·y10 + A4·x10² + A5·x10·y10 + A6·y10² + A7·x10³ + A8·x10²·y10 + A9·x10·y10² + A10·y10³
Y10 = B1 + B2·x10 + B3·y10 + B4·x10² + B5·x10·y10 + B6·y10² + B7·x10³ + B8·x10²·y10 + B9·x10·y10² + B10·y10³

Transformation Matrix
A transformation matrix is computed from the GCPs. The matrix consists of coefficients that are used
in polynomial equations to convert the coordinates. The size of the matrix depends upon the order of
transformation. The goal in calculating the coefficients of the transformation matrix is to derive the
polynomial equations for which there is the least possible amount of error when they are used to
transform the reference coordinates of the GCPs into the source coordinates.
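A minimal sketch of fitting a 1st-order transformation matrix to a set of GCPs by least squares (Python/NumPy assumed; function and variable names are illustrative only):

import numpy as np

def first_order_coefficients(src_xy, ref_xy):
    # Fit X = A1 + A2*x + A3*y and Y = B1 + B2*x + B3*y to n >= 3 GCPs
    x, y = src_xy[:, 0], src_xy[:, 1]
    design = np.column_stack([np.ones_like(x), x, y])
    a, *_ = np.linalg.lstsq(design, ref_xy[:, 0], rcond=None)   # A1, A2, A3
    b, *_ = np.linalg.lstsq(design, ref_xy[:, 1], rcond=None)   # B1, B2, B3
    return a, b

def transform(a, b, x, y):
    # Convert source coordinates to rectified coordinates
    return a[0] + a[1] * x + a[2] * y, b[0] + b[1] * x + b[2] * y

With more GCPs than the minimum, the least-squares fit spreads the residual (RMS) error over all of the points.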

Minimum Number of GCPs


Higher orders of transformation can be used to correct more complicated types of distortion.
However, to use a higher order of transformation, more GCPs are needed. For instance, three points
define a plane. Therefore, to perform a 1st-order transformation, which is expressed by the equation
of a plane, at least three GCPs are needed. Similarly, the equation used in a 2nd-order transformation is
the equation of a paraboloid. Six points are required to define a paraboloid. Therefore, at least six
GCPs are required to perform a 2nd-order transformation.
The minimum number of points required to perform a transformation of order t equals:
(t + 1)(t + 2) / 2
Order of Transformation Minimum GCPs Required
1 3
2 6
3 10
4 15
5 21
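The formula can be checked against the table above with a one-line function (Python assumed; the name is illustrative):

def minimum_gcps(order):
    # Minimum GCPs for a transformation of order t: (t + 1)(t + 2) / 2
    return (order + 1) * (order + 2) // 2

# minimum_gcps(1) -> 3, minimum_gcps(2) -> 6, minimum_gcps(3) -> 10, minimum_gcps(5) -> 21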

Resampling Methods:
The next step in the rectification/registration process is to create the output file. Since the grid of
pixels in the source image rarely matches the grid for the reference image, the pixels are resampled so
that new data file values for the output file can be calculated.
The following resampling methods are generally utilised:

Nearest Neighbor: uses the value of the closest pixel to assign to the output pixel value. To determine an output pixel's nearest neighbor, the algorithm uses the inverse of the transformation matrix to calculate the image file coordinates of the desired geographic coordinate. The value of the pixel occupying the closest image file coordinate to the estimated coordinate is used for the output pixel value in the georeferenced image.

Advantages:
1. Transfers original data values without averaging them as the other methods do; therefore, the extremes and subtleties of the data values are not lost. This is an important consideration when discriminating between vegetation types, locating an edge associated with a lineament, or determining different levels of turbidity or temperatures in a lake (Jensen, 1996).
2. Suitable for use before classification.
3. The easiest of the three methods to compute and the fastest to use.
4. Appropriate for thematic files, which can have data file values based on a qualitative (nominal or ordinal) system or a quantitative (interval or ratio) system. The averaging that is performed with bilinear interpolation and cubic convolution is not suited to a qualitative class value system.

Disadvantages:
1. When this method is used to resample from a larger to a smaller grid size, there is usually a stair-stepped effect around diagonal lines and curves.
2. Data values may be dropped, while other values may be duplicated.
3. Using it on linear thematic data (e.g., roads, streams) may result in breaks or gaps in a network of linear data.

Bilinear Interpolation: uses the data file values of the four closest pixels in a 2 × 2 window of the input (source) image, weighted by their distances from the retransformed coordinate location (xr, yr), to calculate an output value with a bilinear function.
Advantages:
1. Results in output images that are smoother; the stair-stepped effect that is possible with the nearest neighbour approach is reduced.
2. This method is often used when changing the cell size of the data, such as in SPOT/TM merges within the 2 × 2 resampling matrix limit.

Disadvantages:
1. Since pixels are averaged, bilinear interpolation has the effect of a low-frequency convolution. Edges are smoothed, and some extremes of the data file values are lost. It alters the original data and reduces contrast by averaging neighbouring values together.
2. It is computationally more complicated than nearest neighbour.

Cubic Convolution: uses the data file values of sixteen pixels in a 4 × 4 window to calculate an
output value with a cubic function.

Cubic convolution is similar to bilinear interpolation, except that:


• a set of 16 pixels, in a 4 × 4 array, are averaged to determine the output data file value, and
• an approximation of a cubic function, rather than a linear function, is applied to those 16 input
values.

Advantages:
1. Uses 4 × 4 resampling. In most cases, the mean and standard deviation of the output pixels match the mean and standard deviation of the input pixels more closely than any other resampling method.
2. The effect of the cubic curve weighting can both sharpen the image and smooth out noise (Atkinson, 1985). The actual effects depend upon the data being used.
3. This method is recommended when you are dramatically changing the cell size of the data, such as in TM/aerial photo merges (i.e., it matches the 4 × 4 window more closely than the 2 × 2 window).

Disadvantages:
1. Data values may be altered.
2. This method is extremely slow.
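To make the difference between the first two methods concrete, the following is a minimal sketch of looking up a single output value by nearest neighbour and by bilinear interpolation once the output pixel has been retransformed to source file coordinates (xr, yr); Python/NumPy are assumed, and edge handling is omitted.

import numpy as np

def nearest_neighbour(band, xr, yr):
    # Value of the source pixel closest to the retransformed location (column xr, row yr)
    return band[int(round(yr)), int(round(xr))]

def bilinear(band, xr, yr):
    # Distance-weighted average of the 2 x 2 block of source pixels around (xr, yr)
    r0, c0 = int(np.floor(yr)), int(np.floor(xr))
    dr, dc = yr - r0, xr - c0
    window = band[r0:r0 + 2, c0:c0 + 2].astype(np.float64)
    weights = np.array([[(1 - dr) * (1 - dc), (1 - dr) * dc],
                        [dr * (1 - dc),       dr * dc]])
    return float((window * weights).sum())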

Digital Image Subsetting refers to breaking out (clipping or cutting) a portion of a large image file into one or more smaller files covering the area of interest for study and analysis. Often, image files contain areas much larger than a particular study area. In these cases, it is helpful to reduce the size of the image file to include only the area of interest (AOI). This not only eliminates the extraneous data in the file, but also speeds up processing because there is less data to process, which can be important when dealing with multiband data.
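Once the row and column bounds of the AOI are known, subsetting is no more than an array slice; a minimal sketch (Python/NumPy assumed):

import numpy as np

def subset(image, row_min, row_max, col_min, col_max):
    # Clip the area of interest; image is (rows, cols) or (bands, rows, cols)
    return image[..., row_min:row_max, col_min:col_max].copy()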

Digital Image Mosaicking is the process of joining or combining adjacent images that have partially overlapping areas to generate a larger image file covering the larger area of interest for study and analysis. To combine two or more image files, each file must be georeferenced to the same coordinate system, or to each other.

Radiometric Enhancement:
Radiometric enhancement deals with the individual values of the pixels in the image.
Depending on the points and the bands in which they appear, radiometric enhancements that are
applied to one band may not be appropriate for other bands. Therefore, the radiometric enhancement
of a multiband image can usually be considered as a series of independent, single band enhancements
(Faust, 1989). Radiometric enhancement usually does not bring out the contrast of every pixel in an
image. Contrast can be lost between some pixels, while gained on others.

Figure : Histograms of Radiometrically Enhanced Data


The range between j and k in the histogram of the original data is about one third of the total range of
the data. When the same data are radiometrically enhanced, the range between j and k can be widened.
Therefore, the pixels between j and k gain contrast—it is easier to distinguish different brightness
values in these pixels.

However, the pixels outside the range between j and k are more grouped together than in the original
histogram to compensate for the stretch between j and k. Contrast among these pixels is lost.

The lookup table is a graph that increases the contrast of input data file values by widening some
range of the input data (the range within the brackets). When radiometric enhancements are performed
on the display device, the transformation of data file values into brightness values is illustrated by the
graph of a lookup table. Note that the input range within the bracket is narrow, but the output
brightness values for the same pixels are stretched over a wider range. This process is called contrast
stretching. Notice that the graph line with the steepest (highest) slope brings out the most contrast by
stretching output values farther apart.

Figure: Graph of a Lookup Table

The different radiometric enhancement or contrast stretching are 1) Linear stretching, 2) Nonlinear
stretching, 3) Piecewise linear stretch, 4) Histogram equalization, 5) Level slicing.

Figure: Enhancement with Lookup Tables

Linear Contrast Stretch

A linear contrast stretch is a simple way to improve the visible contrast of an image. It is often necessary to contrast-stretch raw image data so that they can be seen on the display. In most raw data, the data file values fall within a narrow range, usually much narrower than the range the display device is capable of displaying. That range can be expanded to utilize the total range of the display device (usually 0 to 255). Generally, linear contrast stretches are of four types: 1) minimum-maximum stretch, 2) saturation stretch, 3) average and standard deviation stretch, and 4) piecewise stretch.

Minimum-Maximum Stretch:

The histogram shows scene brightness values occurring only in a limited range of 60 to 158. If we were to use these image values directly in the display device, we would be using only a small portion of the full range of possible display levels. Display levels 0 to 59 and 159 to 255 would be compressed into a small range of display values, reducing the interpreter's ability to discriminate radiometric detail.

A more expressive display would result if we were to expand the range of image levels present in the scene (60 to 158) to fill the range of display values (0 to 255). In figure C, the range of image values has been uniformly expanded to fill the total range of the output device. Subtle variations in the input image data values would now be displayed in output tones that are more readily distinguished by the interpreter. Lighter tonal areas would appear lighter and dark areas would appear darker.

Output DN = [(Input DN − Min DN of input image) / (Max DN of input image − Min DN of input image)] × 255
Figure: Image histograms (frequency versus DN, 0 to 255, with MIN and MAX marked) before and after the stretch.
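A minimal sketch of the minimum-maximum stretch formula above (Python/NumPy assumed, single 8-bit band):

import numpy as np

def minmax_stretch(band):
    # Output DN = (input DN - min) / (max - min) * 255
    band = band.astype(np.float64)
    lo, hi = band.min(), band.max()
    if hi == lo:                       # flat image: nothing to stretch
        return np.zeros(band.shape, dtype=np.uint8)
    return ((band - lo) / (hi - lo) * 255.0).round().astype(np.uint8)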

Saturation Stretch: The saturation stretch (also referred to as a percentage linear contrast stretch or tail trim) is similar to the minimum-maximum linear contrast stretch, except that this method uses specified minimum and maximum values that cut off a certain percentage of pixels. Generally, very few pixels reside at the two ends of a histogram, yet they occupy a considerable portion of the brightness range. These tails of the histogram are therefore sometimes trimmed, so that the remaining part of the histogram is enhanced more prominently; this is the main advantage of the percent linear contrast stretch. Pixels outside the defined range are mapped to either 0 (for DNs less than the defined minimum value) or 255 (for DNs higher than the defined maximum value). The information content of the pixels that saturate at 0 and 255 is lost, but certain aspects of the image are enhanced for more detailed analysis and better interpretation. It is not necessary that the same percentage be applied to each tail of the histogram distribution.
Figure: Image histograms (frequency versus DN, 0 to 255, with MIN and MAX marked) before and after the saturation stretch.
Average and Standard Deviation Stretch: This stretch is similar to the percent stretch; a chosen number of standard deviations from the mean is used as the stretch limits, which often pushes the tails of the histogram beyond the original minimum and maximum values.
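Minimal sketches of both variants (Python/NumPy assumed; the 2%/98% cut-offs and the 2-standard-deviation limit are arbitrary example values):

import numpy as np

def percent_stretch(band, lower_pct=2.0, upper_pct=98.0):
    # Saturation (tail-trim) stretch: clip the chosen tails, stretch the rest to 0-255
    lo, hi = np.percentile(band, [lower_pct, upper_pct])
    out = (band.astype(np.float64) - lo) / (hi - lo) * 255.0
    return np.clip(out, 0, 255).round().astype(np.uint8)

def stddev_stretch(band, n_std=2.0):
    # Average and standard deviation stretch: limits at mean +/- n_std * sigma
    m, s = band.mean(), band.std()
    lo, hi = m - n_std * s, m + n_std * s
    out = (band.astype(np.float64) - lo) / (hi - lo) * 255.0
    return np.clip(out, 0, 255).round().astype(np.uint8)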

Piecewise Linear Contrast Stretch


Generally, a piecewise linear contrast stretch allows for the enhancement of a specific portion of data
by dividing the lookup table into three sections: low, middle, and high. It enables you to create a
number of straight line segments that can simulate a curve. You can enhance the contrast or brightness
of any section in a single color gun at a time. This technique is very useful for enhancing image areas
in shadow or other areas of low contrast.

A piecewise linear contrast stretch normally follows two rules:


1) The data values are continuous; there can be no break in the values between High, Middle, and
Low. Range specifications adjust in relation to any changes to maintain the data value range.
2) The data values specified can go only in an upward, increasing direction, as shown in Figure.

The contrast value for each range represents the percent of the available output range that particular
range occupies. The brightness value for each range represents the middle of the total range of
brightness values occupied by that range. Since rules 1 and 2 above are enforced, as the contrast and
brightness values are changed, they may affect the contrast and brightness of other ranges. For
example, if the contrast of the low range increases, it forces the contrast of the middle to decrease.

Figure : Piecewise Linear Contrast Stretch

Nonlinear Contrast Stretch
A nonlinear spectral enhancement can be used to gradually increase or decrease contrast over a range,
instead of applying the same amount of contrast (slope) across the entire image. Usually, nonlinear
enhancements bring out the contrast in one range while decreasing the contrast in other ranges. The
graph of the function in the figure shows one example. Major nonlinear contrast enhancement techniques are 1) histogram equalization, 2) histogram normalization, 3) reference or special stretch, 4) density slicing/level slicing, and 5) thresholding.

Figure : Nonlinear Radiometric Enhancement

Histogram Equalization
In this approach image values are assigned to the display levels on the basis of their frequency of occurrence. As shown in figure (d), more display values (and hence more radiometric detail) are assigned to the frequently occurring portion of the histogram. The image value range of 109 to 158 is stretched over a large portion of the display levels (39 to 255), while a smaller portion is reserved for the infrequently occurring image values of 60 to 108.
Histogram equalization is a nonlinear stretch that redistributes pixel values so that there are approximately the same number of pixels with each value within a range. The result approximates a flat histogram. Therefore, contrast is increased at the peaks of the histogram and lessened at the tails.
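A minimal sketch of histogram equalization for an 8-bit band (Python/NumPy assumed; the band is assumed not to be a constant image):

import numpy as np

def equalize(band):
    # Map DNs through the normalized cumulative histogram (the lookup table)
    hist, _ = np.histogram(band, bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize to 0-1
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[band]                                     # band must be an integer array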

Figure: (a) Original histogram; (b) no stretch; (d) histogram stretch.


Histogram Normalization:
Normal distribution of a histogram is actually a bell-shaped distribution (also known as Gaussian
distribution). In a normal distribution, most values are at or near the middle, by the peak of the bell
curve. Values that are more extreme are rarer, by the tails at the ends of the curve. Generally, a normal
distribution of the density in an image produces an image that looks natural to a human observer. In this sense, the histogram of the original image may sometimes be converted to a normalized histogram. This method of contrast enhancement is based upon the histogram of the pixel values and is called a Gaussian stretch because it involves fitting the observed histogram to a normal, or Gaussian, histogram.

Figure: Normal (Gaussian) distribution of a histogram (brightness values 0 to 255).

However, in this conversion, pixels with the same grey level may have to be reallocated to different grey levels in order to form a normalized histogram. Such a grey-scale conversion is therefore not a one-to-one conversion and does not permit reverse conversion. Histogram normalization may be applied, for instance, to an unfocussed image with a low dynamic range.

Reference Stretch / Histogram Matching: Reference stretch (also known as histogram matching or histogram specification) is the process of determining a lookup table that converts the histogram of one image to resemble the histogram of another. Histogram matching is useful for matching data of the same scene or of adjacent scenes that were scanned on separate days, or that differ slightly because of sun angle or atmospheric effects. This is especially useful for mosaicking or change detection. To achieve good results with histogram matching, the two input images should have similar characteristics:
• The general shape of the histogram curves should be similar.
• Relative dark and light features in the image should be the same.
• For some applications, the spatial resolution of the data should be the same.
• The relative distributions of land covers should be about the same, even when matching scenes that are not of the same area. If one image has clouds and the other does not, then the clouds should be removed before matching the histograms. This can be done using the AOI function. The AOI function is available from the Viewer menu bar.

To match the histograms, a lookup table is mathematically derived, which serves as a function for
converting one histogram to the other, as illustrated in Figure

Figure 6-10: Histogram Matching, (a) Source histogram, (b) Mapped through the lookup table,
(c) Approximates model histogram.
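A minimal sketch of deriving and applying such a lookup table for two 8-bit single-band images (Python/NumPy assumed):

import numpy as np

def match_histogram(source, reference):
    # Build a lookup table so that the source CDF follows the reference CDF
    src_hist, _ = np.histogram(source, bins=256, range=(0, 256))
    ref_hist, _ = np.histogram(reference, bins=256, range=(0, 256))
    src_cdf = src_hist.cumsum() / src_hist.sum()
    ref_cdf = ref_hist.cumsum() / ref_hist.sum()
    # For each source DN, pick the reference DN whose CDF first reaches the source CDF
    lut = np.searchsorted(ref_cdf, src_cdf).clip(0, 255).astype(np.uint8)
    return lut[source]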

Level Slice or density slicing:
A level slice is similar to histogram equalization in that it divides the data into equal amounts. It
involves combining the DNs of different values within a specified range or interval into a single
value. A level slice on a true color display creates a stair-stepped lookup table. The effect on the data
is that input file values are grouped together at regular intervals into a discrete number of levels, each
with one output brightness value.
Density slicing represents a group of contiguous digital numbers by a single value. Although some detail of the image is lost, the effect of noise can also be reduced by density slicing. As a result of density slicing, an image may be segmented, or sometimes contoured, into sections of similar grey level. This density slice (also called 'level slice') method works best on single-band images.

It is especially useful when a given surface feature has a unique and generally narrow set of DN
values. The new single value is assigned to some grey level (intensity) for display on the computer
monitor (or in a printout). All other DNs can be assigned another level, usually black. This yields a
simple map of the distribution of combined DNs. If several features each have different (separable)
DN values, then several grey-level slices may be produced, each mapping the spatial distribution of its
corresponding feature. The new sets of slices are commonly assigned different colours in a photo or
display. This has been used in colouring classification maps in most image analysis software systems.

Thresholding:
Thresholding is a process of image enhancement that segments the image DNs into two distinct values (black = 0 and white = 255) separated by a threshold DN, as shown in the figure. Thresholding produces binary output with sharply defined spatial boundaries:

Output DN = 255 if input DN > threshold DN
Output DN = 0 if input DN ≤ threshold DN
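Minimal sketches of thresholding and of a simple level slice (Python/NumPy assumed; the number of slices is an example value):

import numpy as np

def threshold(band, t):
    # Binary output: 255 where the input DN exceeds t, 0 elsewhere
    return np.where(band > t, 255, 0).astype(np.uint8)

def level_slice(band, n_levels=8):
    # Group contiguous DN ranges into n_levels discrete output values
    step = 256 // n_levels
    return ((band // step) * step).astype(np.uint8)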

Special stretch is useful for specific analyses: a particular feature may be analysed in greater radiometric detail by assigning the display range exclusively to a particular range of image values. For example, if water features were represented by a narrow range of values in a scene, the characteristics of the water features could be enhanced by stretching this small range to the full display range. As shown in the figure, the output range is devoted entirely to the small range of image values between 60 and 92. On the stretched display, minute tonal variations in the water range would be greatly exaggerated, while the brighter land features, on the other hand, would be washed out by being displayed at a single bright white level (255).

Decorrelation Stretch:
The purpose of a contrast stretch is to alter the distribution of the image DN values within the 0-255 range of the display device, and to utilize the full range of values in a linear fashion.

The decorrelation stretch stretches the principal components of an image, not the original image bands. A principal components transform converts a multiband image into a set of mutually orthogonal images portraying inter-band variance. Depending on the DN ranges and the variance of the individual input bands, these new images (PCs) occupy only a portion of the possible 0-255 data range.
Each PC is separately stretched to fully utilize the data range. The new stretched PC composite image is then retransformed to the original data space. Either the original PCs or the stretched PCs may be saved as a permanent image file for viewing after the stretch.

Spatial Enhancement
While radiometric enhancements operate on each pixel individually, without regard to its neighbours, spatial enhancement modifies pixel values based on the values of surrounding pixels. Spatial enhancement deals largely with spatial frequency, which is the difference between the highest and lowest values of a contiguous set of pixels. Jensen (1986) defines spatial frequency as "the number of changes in brightness value per unit distance for any particular part of an image."
Consider the examples in the figure:
• zero spatial frequency—a flat image, in which every pixel has the same value
• low spatial frequency—an image consisting of a smoothly varying gray scale
• highest spatial frequency—an image consisting of a checkerboard of black and white pixels

Convolution Filtering
Convolution filtering is the process of spatial filtering by averaging small sets of pixels across an image. Convolution filtering involves moving a window of pixels of a set dimension (3 × 3, 5 × 5, etc.) over each pixel in the image, applying a mathematical calculation to the pixel values under that window, and replacing the central pixel with the new value. This window is known as a convolution kernel: a matrix of numbers (also known as coefficients) that is used to average the value of each pixel with the values of surrounding pixels. The kernel is moved along both the row and column dimensions one pixel at a time, and the calculation is repeated until the entire image has been filtered and a new image is generated. The numbers in the matrix serve to weight this average toward particular pixels. These numbers are often called coefficients, because they are used as such in the mathematical equations.
Filtering is a broad term, which refers to the altering of spatial or spectral features for image
enhancement (Jensen, 1996). Convolution filtering is used to change the spatial frequency
characteristics of an image (Jensen, 1996).
To understand how one pixel is convolved, imagine that the convolution kernel is overlaid on the data
file values of the image (in one band), so that the pixel to be convolved is in the center of the window.

Figure: Applying a Convolution Kernel

The figure shows a 3 × 3 convolution kernel being applied to the pixel in the third column, third row of the sample data (the pixel that corresponds to the center of the kernel). To compute the output value for this pixel, each value in the convolution kernel is multiplied by the image pixel value that corresponds to it. These products are summed, and the total is divided by the sum of the values in the kernel, as shown here:
V = int {[(-1 × 8) + (-1 × 6) + (-1 × 6) + (-1 × 2) + (16 × 8) + (-1 × 6) + (-1 × 2) + (-1 × 2) + (-1 × 8)] ÷ [(-1) + (-1) + (-1) + (-1) + 16 + (-1) + (-1) + (-1) + (-1)]}
  = int [(128 − 40) / (16 − 8)] = int (88 / 8) = int (11) = 11

In order to convolve the pixels at the edges of an image, pseudo data must be generated to provide values on which the kernel can operate. In the example below, the pseudo data are derived by reflection: the top row is duplicated above the first data row and the left column is duplicated to the left of the first data column. If a second row or column is needed (for a 5 × 5 kernel, for example), the second data row or column is copied above or to the left of the first copy, and so on. An alternative to reflection is to create background-value (usually zero) pseudo data; this is called Fill.

When the pixels in this example image are convolved, output values cannot be calculated for the last row and column without pseudo data. In practice, the last row and column of an image are either reflected or filled just like the first row and column.

The kernel used in this example is a high frequency kernel, as explained below. It is important to note
that the relatively lower values become lower, and the higher values become higher, thus increasing
the spatial frequency of the image.

Figure: Output Values for Convolution Kernel


Convolution Formula
The following formula is used to derive an output data file value for the pixel being convolved (in the center):

V = int [ ( Σi Σj (fij × dij) ) / F ],  with i = 1…q and j = 1…q

Where:
fij = the coefficient of a convolution kernel at position i,j (in the kernel)
dij = the data value of the pixel that corresponds to fij
q = the dimension of the kernel, assuming a square kernel (if q = 3, the kernel is 3 × 3)
F = either the sum of the coefficients of the kernel, or 1 if the sum of coefficients is 0
V = the output pixel value
In cases where V is less than 0, V is clipped to 0.

The sum of the coefficients (F) is used as the denominator of the equation above, so that the output
values are in relatively the same range as the input values. Since F cannot equal zero (division by zero
is not defined), F is set to 1 if the sum is zero.
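A minimal sketch of the convolution described above, reproducing the worked example (Python with NumPy and SciPy assumed; scipy.ndimage handles the edge reflection):

import numpy as np
from scipy import ndimage

def convolve_kernel(band, kernel):
    # Divide by the sum of the coefficients F (or 1 if the sum is zero), clip negatives to 0
    f = kernel.sum()
    if f == 0:
        f = 1
    out = ndimage.convolve(band.astype(np.float64), kernel, mode='reflect') / f
    return np.clip(out, 0, None).astype(np.int32)

# The high-frequency kernel from the worked example above
kernel = np.array([[-1, -1, -1],
                   [-1, 16, -1],
                   [-1, -1, -1]])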

Zero-Sum Kernels
Zero-sum kernels are kernels in which the sum of all coefficients in the kernel equals zero. When a
zero-sum kernel is used, then the sum of the coefficients is not used in the convolution equation, as
above. In this case, no division is performed (F = 1), since division by zero is not defined.
This generally causes the output values to be:
• zero in areas where all input values are equal (no edges)
• low in areas of low spatial frequency
• extreme in areas of high spatial frequency (high values become much higher, low values become
much lower)

Therefore, a zero-sum kernel is an edge detector, which usually smooths out or zeros out areas of low
spatial frequency and creates a sharp contrast where spatial frequency is high, which is at the edges
between homogeneous (homogeneity is low spatial frequency) groups of pixels. The resulting image
often consists of only edges and zeros. Zero-sum kernels can be biased to detect edges in a particular
direction. For example, this 3 × 3 kernel is biased to the south (Jensen, 1996).

Low-Frequency Kernel / Low-Pass Filter (Smoothing):

A low-pass filter is designed to emphasize larger, homogeneous areas of similar tone and reduce the smaller detail in an image. Thus, low-pass filters generally serve to smooth the appearance of an image. The average (mean) filter, mode filter and median filter are examples of low-pass filters.

Averaging filter:
A 2D moving-average filter is defined in terms of its dimensions, which must be odd, positive and integral. The output is the sum of the products of the corresponding convolution kernel coefficients and image elements, divided by the number of kernel elements. The averaging filter is also known as a smoothing filter.

Mean Filter
The Mean filter is a simple calculation. The pixel of interest (center of window) is replaced by the
arithmetic average of all values within the window. This filter does not remove the aberrant (speckle)
value; it averages it into the data. Below is an example of a low-frequency kernel, or low-pass kernel,
which decreases spatial frequency.

This kernel simply averages the values of the pixels, causing them to be more homogeneous. The
resulting image looks either smoother or more blurred. In theory, a bright and a dark pixel within the
same window would cancel each other out. This consideration would argue in favor of a large window
size (e.g., 7 × 7). However, averaging results in a loss of detail, which argues for a small window size.

In general, this is the least satisfactory method of speckle reduction. It is useful for applications where
loss of resolution is not a problem.

Median Filter
A better, though still simple, way to reduce speckle is the median filter. This filter operates by arranging all DN values in sequential order within the window that you define. The pixel of interest is replaced by the value in the centre of this distribution. A median filter is useful for removing pulse or spike noise: pulse functions of less than one-half of the moving window width are suppressed or eliminated, while step functions and ramp functions are retained.

For example, suppose the nine pixels in a 3 × 3 window, sorted in ascending order, are 106, 197, 198, 200, 200, 201, 204, 209, 210. There are nine numbers in the list, so the middle one is the (9 + 1) ÷ 2 = 5th value, which is 200. In the output filtered image the centre value of the window (106) is therefore replaced by 200.

Mode Filter:
The mode filter is primarily used to clean up thematic maps for presentation purposes. This filter computes the mode of the grey-level values (the most frequently occurring grey-level value) within the filter window surrounding each pixel. For example, if the window values are 57, 58, 60, 60, 61, 64, 69, 70, 125, the value 60 occurs twice, so in the output filtered image the centre value of the window (125) is replaced by 60.

It is possible that a decision has to be made between two values with the same frequency of occurrence. In this case, if the centre value is one of the tied values it is chosen; otherwise the first tied value encountered is chosen. For example, in the window (1, 5, 3, 2, 3, 5, 4, 5, 3), the values 5 and 3 each occur three times. Neither 3 nor 5 is in the centre position; the 5 in the top row is encountered first as the values are read, and so it is chosen as the mode value.
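Minimal sketches of the three low-pass filters (Python with NumPy and SciPy assumed; note that the bincount-based mode breaks ties toward the smallest value, a simpler rule than the one described above):

import numpy as np
from scipy import ndimage

def window_mode(values):
    # Most frequent DN in the window (ties resolved to the smallest value)
    return np.bincount(values.astype(np.int64)).argmax()

band = np.random.randint(0, 256, (100, 100)).astype(np.uint8)            # stand-in image
mean_filtered = ndimage.uniform_filter(band.astype(np.float64), size=3)  # averaging filter
median_filtered = ndimage.median_filter(band, size=3)                    # median filter
mode_filtered = ndimage.generic_filter(band, window_mode, size=3)        # mode filter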

High-Frequency Kernels / High-Pass Filter:
High-pass filters do the opposite of low-pass filters and serve to sharpen the appearance of fine detail in an image. A high-frequency kernel, or high-pass kernel, has the effect of increasing spatial frequency. High spatial frequencies can also be enhanced simply by subtracting the low-frequency image resulting from a low-pass filter from the original image. High-frequency information allows us to isolate or amplify the local detail; if the high-frequency detail is amplified by adding back to the image some multiple of the high-frequency component extracted by the filter, the result is a sharper, de-blurred image. High-frequency kernels serve as edge enhancers, since they bring out the edges between homogeneous groups of pixels. Unlike edge detectors (such as zero-sum kernels), they highlight edges and do not necessarily eliminate other features.

When this kernel is used on a set of pixels in which a relatively low value is surrounded by higher
values, like this, the low value gets lower:

Inversely, when the kernel is used on a set of pixels in which a relatively high value is surrounded by
lower values, the high value becomes higher. In either case, spatial frequency is increased by this
kernel:

Edge Detection Filter: Edge and line detection are important operations in digital image processing.
Directional, or edge detection filters are designed to highlight linear features, such as roads or field
boundaries. These filters can also be designed to enhance features which are oriented in specific
directions. These filters are useful in various fields such as geology, for the detection of linear
geologic structures. Zero-sum kernels are kernels in which the sum of all coefficients in the kernel
equals zero. A common type of edge detection kernel is a zero-sum kernel, In case of zero-sum
kernel, the sum of the coefficients is not used in the convolution equation (no division is performed),
since division by zero is not defined. This generally causes the output values to be zero in areas
where all input values are equal, low in areas of low spatial frequency, extreme in areas of high
spatial frequency. Therefore, a zero-sum kernel is an edge detector, which usually smoothes out or
zeros out areas of low spatial frequency and creates a sharp contrast where spatial frequency is high.
The resulting image often contains only edges and zeros. Following are examples of zero-sum
kernels.
Sobel filtering: The Sobel operator is used in image processing, particularly within edge detection
algorithms. Technically, it is a discrete differentiation operator, computing an approximation of the
gradient of the image intensity function. At each point in the image, the result of the Sobel operator is
either the corresponding gradient vector or the norm of this vector. The Sobel operator is based on
convolving the image with a small, separable, and integer valued filter in horizontal and vertical
direction and is therefore relatively inexpensive in terms of computations. On the other hand, the
gradient approximation that it produces is relatively crude, in particular for high frequency variations
in the image. The operator uses two 3×3 kernels which are convolved with the original image to
calculate approximations of the derivatives - one for horizontal changes, and one for vertical.
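A minimal sketch of producing a Sobel gradient-magnitude image (Python with NumPy and SciPy assumed):

import numpy as np
from scipy import ndimage

def sobel_magnitude(band):
    img = band.astype(np.float64)
    gx = ndimage.sobel(img, axis=1)   # horizontal changes
    gy = ndimage.sobel(img, axis=0)   # vertical changes
    return np.hypot(gx, gy)           # approximate gradient magnitude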

Prewitt Filtering: The Prewitt operator is used in image processing, particularly within edge
detection algorithms. Technically, it is a discrete differentiation operator, computing an
approximation of the gradient of the image intensity function. At each point in the image, the result
of the Prewitt operator is either the corresponding gradient vector or the norm of this vector. The
Prewitt operator is based on convolving the image with a small, separable, and integer valued filter in
horizontal and vertical direction and is therefore relatively inexpensive in terms of computations. On
the other hand, the gradient approximation which it produces is relatively crude, in particular for high
frequency variations in the image. The Prewitt operator is named for Judith Prewitt. Mathematically,
the operator uses two 3×3 kernels which are convolved with the original image to calculate
approximations of the derivatives - one for horizontal changes, and one for vertical.

The Laplacian is a 2-D isotropic measure of the 2nd spatial derivative of an image. The Laplacian
of an image highlights regions of rapid intensity change and is therefore often used for edge
detection (see zero crossing edge detectors). The Laplacian is often applied to an image that has first
been smoothed with something approximating a Gaussian smoothing filter in order to reduce its
sensitivity to noise, and hence the two variants will be described together here. The operator
normally takes a single graylevel image as input and produces another graylevel image as output.

Two discrete approximations to the Laplacian filter are commonly used. (Note: we have defined the Laplacian using a negative peak because this is more common; however, it is equally valid to use the opposite sign convention.)
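The kernel figures are not reproduced here; the two 3 × 3 approximations below are the ones commonly quoted for the negative-peak convention (an assumption on my part, not taken from this text), applied with a convolution (Python with NumPy and SciPy assumed):

import numpy as np
from scipy import ndimage

lap_4 = np.array([[0,  1, 0],
                  [1, -4, 1],
                  [0,  1, 0]])        # 4-connected Laplacian approximation (assumed)
lap_8 = np.array([[1,  1, 1],
                  [1, -8, 1],
                  [1,  1, 1]])        # 8-connected Laplacian approximation (assumed)

def laplacian(band, kernel=lap_4):
    return ndimage.convolve(band.astype(np.float64), kernel, mode='reflect')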
Adaptive filter: An adaptive filter is a filter that self-adjusts its transfer function according to an
optimization algorithm driven by an error signal. Because of the complexity of the optimization
algorithms, most adaptive filters are digital filters. By way of contrast, a non-adaptive filter has a
static transfer function. Adaptive filters are required for some applications because some parameters
of the desired processing operation (for instance, the locations of reflective surfaces in a reverberant
space) are not known in advance. The adaptive filter uses feedback in the form of an error signal to
refine its transfer function to match the changing parameters.
Generally speaking, the adaptive process involves the use of a cost function, which is a criterion for
optimum performance of the filter, to feed an algorithm, which determines how to modify filter
transfer function to minimize the cost on the next iteration.
As the power of digital signal processors has increased, adaptive filters have become much more
common and are now routinely used in devices such as mobile phones and other communication
devices, camcorders and digital cameras, and medical monitoring equipment.

Adaptive Filter Adaptive filters have kernel coefficients calculated for each window position based
on the mean and variance of the original DN in the underlying image. A powerful technique for
sharpening images in the presence of low noise levels is via an adaptive filtering algorithm. Here we
look at a method of re-defining a high-pass filter as the sum of a collection of edge sharpening
kernels. Following is one example of a high-pass filter, which can be re-written as the sum of eight edge-sharpening kernels, one tuned to each direction:

Figure: A 3 × 3 high-pass kernel and the eight directional edge-sharpening kernels into which it decomposes.

Adaptive filtering using these kernels can be performed by filtering the image with each kernel in turn and then summing those outputs that exceed a threshold. As a final step, this result is added to the original image. The use of a threshold makes the filter adaptive in the sense that it overcomes the directionality of any single kernel by combining the results of filtering with a selection of kernels, each of which is tuned to edges in a particular direction inherent in the image.

Frequency Domain Filter:


The Fourier transform of an image, as expressed by the amplitude spectrum, is a breakdown of the image into its frequency or scale components. Filtering of these components is done using frequency-domain filters that operate on the amplitude spectrum of an image and remove, attenuate or amplify the amplitudes in specified wavebands. The frequency domain can be represented as a 2D plot known as the Fourier spectrum (or Fourier domain), in which lower frequencies fall at the centre and progressively higher frequencies are plotted outwards.
Filtering in the frequency domain consists of the following three steps:
1. Fourier transform the original image and compute the Fourier spectrum.
2. Select an appropriate filter function and multiply by the elements of the Fourier spectrum.
3. Perform an inverse Fourier transform to return to the spatial domain for display purposes.
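A minimal sketch of those three steps with an ideal low-pass filter (Python/NumPy assumed; the cut-off radius is an arbitrary example value):

import numpy as np

def lowpass_fft(band, cutoff=30):
    img = band.astype(np.float64)
    spectrum = np.fft.fftshift(np.fft.fft2(img))               # 1. forward transform
    rows, cols = img.shape
    r, c = np.ogrid[:rows, :cols]
    dist = np.hypot(r - rows / 2, c - cols / 2)
    filtered = spectrum * (dist <= cutoff)                      # 2. multiply by filter function
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))    # 3. inverse transform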

Crisp Filter:
The crisp filter sharpens the overall scene luminance without distorting the inter-band variance content of the image. This is a useful enhancement if the image is blurred due to atmospheric haze, rapid sensor motion, and so on.
The algorithm consists of the following three steps:
1. Calculate principal components of multi-band input image.
2. Convolve PC-1 with summary filter.
3. Retransform to RGB space (Faust 1993).

Spectral Enhancement / Image Transformation:


The enhancement techniques that follow require more than one band of data. They can be used to:
• compress bands of data that are similar
• extract new bands of data that are more interpretable to the eye
• apply mathematical transforms and algorithms
• display a wider variety of information in the three available color guns (R, G, B)
Image Arithmetic Operations:
The operations of addition, subtraction, multiplication, and division are performed on two or more co-registered images of the same geographic area. These images may be separate spectral bands from a single multispectral data set, or they may be individual bands from image data sets that have been collected on different dates.

Image Addition (Averaging):


If multiple co-registered images of a given region are available for the same time and date of
imaging, then addition (averaging) of the multiple images can be used as a means of reducing the
overall noise (see the example below). The new DN value of a pixel in the output image is obtained by
averaging the DN values of the corresponding pixels of the input images.

Another approach to image addition is temporal averaging. For instance, it has the advantage of
reducing the speckle of a radar image without losing spatial resolution. In this case, pixel-by-pixel
averaging is performed on multiple co-registered images of the same geographic area taken at different
times. Another example of temporal averaging is creating a temperature map for a given area and year
by averaging multiple acquisitions from different times.

16 20 65 19      56 64 25 65      36 42 45 42
69 56 37 28   +  45 65 85 75   =  57 61 61 52
65 75 25 46      35 29 35 64      50 52 30 55
64 59 57 38      65 98 25 54      65 79 41 46
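The following short NumPy sketch reproduces the 4 × 4 example above, assuming the averaged DNs are rounded half up to integers.

    import numpy as np

    a = np.array([[16, 20, 65, 19],
                  [69, 56, 37, 28],
                  [65, 75, 25, 46],
                  [64, 59, 57, 38]])

    b = np.array([[56, 64, 25, 65],
                  [45, 65, 85, 75],
                  [35, 29, 35, 64],
                  [65, 98, 25, 54]])

    # Pixel-by-pixel average, rounded half up to integer DN values.
    averaged = np.floor((a + b) / 2.0 + 0.5).astype(int)
    print(averaged)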

Image Subtraction:
The subtraction operation is often carried out on a pair of co-registered images of the same area taken
at different times. Image subtraction is often used to identify changes (change detection) that have
occurred between images collected on different dates.

Typically, two images which have been geometrically registered are used with the pixel (brightness)
values in one image being subtracted from the pixel values in the other. In such an image, areas where
there has been little or no change between the original images contain resultant brightness values
around 0, while those areas where significant change has occurred contain values higher or lower than
0, e.g., brighter or darker depending on the 'direction' of change in reflectance between the two
images. This type of image transform can be useful for mapping changes in urban development
around cities and for identifying areas where deforestation is occurring.

2 5 4 6      7 5 3 1      -5  0  1  5
3 5 8 9   -  1 9 3 0   =   2 -4  5  9
6 7 9 5      6 9 9 3       0 -2  0  2
8 9 6 8      8 6 2 7       0  3  4  1

It is also often possible to use just a single image as input and subtract a constant value from all the
pixels. Simple subtraction of a constant from an image can be used to remove the extra energy
recorded by the sensor due to atmospheric effects.
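A minimal sketch of change detection by image differencing is shown below; the change threshold of 25 DN is an illustrative assumption.

    import numpy as np

    def change_detection(image_t1, image_t2, threshold=25):
        """Difference two co-registered images and flag pixels whose brightness
        changed by more than a threshold (illustrative value)."""
        diff = image_t2.astype(int) - image_t1.astype(int)
        # Values near 0 indicate little change; large positive or negative
        # values indicate change in one 'direction' or the other.
        change_mask = np.abs(diff) > threshold
        return diff, change_mask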

Image Multiplication:
Pixel-by-pixel multiplication of two images is rarely performed in practice. The multiplication operation
is, however, a useful one if an image of interest is composed of two or more distinctive regions and the
analyst is interested in only one of these regions. In some cases, multiple co-registered images of the
same area taken at the same time and date are multiplied with one another, which increases the
variation of DNs between pixels.
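A short sketch of this region-isolation use of multiplication, assuming the region of interest is available as a 0/1 mask:

    import numpy as np

    def isolate_region(image, mask):
        """Multiply an image by a 0/1 mask so that only the region of interest
        keeps its original DN values; everything else becomes 0."""
        return image * mask.astype(image.dtype)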

Indices: Indices are used to create output images by mathematically combining the DN values of
different bands. These may be simplistic: (Band X - Band Y) or more complex:

In many instances, these indices are ratios of band DN values:

These ratio images are derived from the absorption/reflection spectra of the material of interest.
The absorption is based on the molecular bonds in the (surface) material. Thus, the ratio often gives
information on the chemical composition of the target.

Vegetation index:

RATIO vegetation indices (Rouse et al., 1973) separate green vegetation from the soil
background by dividing the reflectance values contained in the near-IR band (NIR) by those
contained in the red band (R).
Ratio = NIR / RED
This clearly shows the contrast between the red and infrared bands for vegetated pixels, with high
index values being produced by combinations of low red (because of absorption by chlorophyll) and
high infrared (as a result of leaf structure) reflectance. A ratio value less than 1.0 is taken as non-vegetation,
while a value greater than 1.0 is considered vegetation. The major drawback of this
method is division by zero: a pixel value of zero in the red band gives an infinite ratio value. To
avoid this situation, the Normalized Difference Vegetation Index (NDVI) is computed.

Normalized Difference Vegetation Index (NDVI)


NDVI overcomes the division-by-zero problem of the ratio method. It was introduced in order to
produce a spectral VI that separates green vegetation from its background soil brightness using
Landsat MSS digital data (Rouse et al., 1974) and is given by:

NDVI = (NIR − RED) / (NIR + RED)

This is the most commonly used VI due to its ability to minimize topographic effects while
producing a linear measurement scale ranging from −1 to +1. Negative values represent non-vegetated
areas, while positive values represent vegetated areas.
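Both indices can be computed per pixel as sketched below; pixels whose denominator is zero are simply set to 0 here, which is one possible convention.

    import numpy as np

    def ratio_vi(nir, red):
        """Simple ratio NIR / RED; pixels with RED == 0 are set to 0 to avoid
        the divide-by-zero problem noted above."""
        nir = nir.astype(float)
        red = red.astype(float)
        out = np.zeros_like(nir)
        np.divide(nir, red, out=out, where=red != 0)
        return out

    def ndvi(nir, red):
        """NDVI = (NIR - RED) / (NIR + RED); result ranges from -1 to +1."""
        nir = nir.astype(float)
        red = red.astype(float)
        out = np.zeros_like(nir)
        np.divide(nir - red, nir + red, out=out, where=(nir + red) != 0)
        return out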

Land cover analysis is done using different slope-based and distance-based vegetation indices (VIs). A VI is
computed from data acquired by spaceborne sensors in the 0.6-0.7 µm (red) and 0.7-0.9 µm
(near-IR) ranges, which helps in delineating vegetated and non-vegetated areas.

Live green plants absorb solar radiation in the photosynthetically active radiation (0.4 µm to 0.7 µm is
PAR) spectral region, which they use as a source of energy in the process of photosynthesis. Leaf
cells have also evolved to scatter (i.e., reflect and transmit) solar radiation in the near-infrared spectral
region (which carries approximately half of the total incoming solar energy), because the energy level
per photon in that domain (wavelengths longer than about 700 nanometers) is not sufficient to be
useful to synthesize organic molecules. A strong absorption at these wavelengths would only result in
overheating the plant and possibly damaging the tissues. Hence, live green plants appear relatively
dark in the PAR and relatively bright in the near-infrared. By contrast, clouds and snow tend to be
rather bright in the red (as well as other visible wavelengths) and quite dark in the near-
infrared. The pigment in plant leaves, chlorophyll, strongly absorbs visible light (from 0.4 to
0.7 µm) for use in photosynthesis. The cell structure of the leaves, on the other hand, strongly
reflects near-infrared light (from 0.7 to 1.1 µm). The more leaves a plant has, the more these
wavelengths of light are affected.
In general, if there is much more reflected radiation in near-infrared wavelengths than in visible
wavelengths, then the vegetation in that pixel is likely to be dense and may contain some type of
forest. Subsequent work has shown that the NDVI is directly related to the photosynthetic capacity
and hence the energy absorption of plant canopies.
1. Negative values of NDVI (values approaching -1) correspond to water.
2. Values close to zero (-0.1 to 0.1) generally correspond to barren areas of rock, sand, or snow.
3. Low, positive values represent shrub and grassland (approximately 0.2 to 0.4).
4. High values indicate temperate and tropical rainforests (values approaching 1).
It can be seen from its mathematical definition that the NDVI of an area containing a dense vegetation
canopy will tend to positive values (say 0.3 to 0.8) while clouds and snow fields will be characterized
by negative values of this index. Other targets on Earth visible from space include
 free standing water (e.g., oceans, seas, lakes and rivers) which have a rather low reflectance in
both spectral bands (at least away from shores) and thus result in very low positive or even
slightly negative NDVI values,
 soils which generally exhibit a near-infrared spectral reflectance somewhat larger than the
red, and thus tend to also generate rather small positive NDVI values (say 0.1 to 0.2).
In addition to the simplicity of the algorithm and its capacity to broadly distinguish vegetated areas
from other surface types, the NDVI has the advantage of compressing the size of the data to be
manipulated by a factor of 2 (or more), since it replaces the two spectral bands with a single new field
(possibly coded on 8 bits instead of the 10 or more bits of the original data).

Ratio Vegetation Index (RVI)


The ratio vegetation index is the reverse of the standard simple ratio (Richardson and Wiegand, 1977):
RVI = RED / NIR
The range of RVI extends from 0 to infinity. A ratio value less than 1.0 is taken as vegetation, while a
value greater than 1.0 is considered non-vegetation.

Normalized Ratio Vegetation Index (NRVI)
The normalized ratio vegetation index is a modification of the RVI (Baret and Guyot, 1991) in which the
result of RVI − 1 is normalized over RVI + 1:
NRVI = (RVI − 1) / (RVI + 1)
This normalization is similar in effect to that of NDVI, i.e., it reduces topographic, illumination and
atmospheric effects and creates a statistically desirable normal distribution. Values less than 0.0
indicate vegetation, while values greater than 0.0 represent non-vegetation.

Transformed Vegetation Index (TVI) / Transformed NDVI


TVI is a modified version of NDVI that avoids operating with negative NDVI values (Deering et al., 1975).
TVI is computed by adding 0.50 to the NDVI value and taking the square root of the result:

TVI = SQRT(NDVI + 0.5)

The square root is intended to correct NDVI values that approximate a Poisson distribution and to
introduce a normal distribution.

However, negative values still occur for NDVI values less than −0.5. There is no technical difference
between NDVI and TVI in terms of image output or active vegetation detection. Values less than
0.71 are taken as non-vegetation and values greater than 0.71 indicate vegetation.

Corrected Transformed Vegetation Index (CTVI):


CTVI suppresses the negative values in NDVI and TVI (Perry and Lautenschlager, 1984). Adding a
constant of 0.5 to all NDVI values does not always eliminate negative values, since NDVI ranges
from −1 to +1; values lower than −0.50 leave small negative values after the addition. CTVI resolves
this situation by dividing (NDVI + 0.50) by its absolute value, ABS(NDVI + 0.50), and multiplying
the result by the square root of the absolute value, SQRT[ABS(NDVI + 0.50)]:

CTVI = [(NDVI + 0.5) / ABS(NDVI + 0.5)] × SQRT[ABS(NDVI + 0.5)]

Because the correction is applied in a uniform manner, the output image using CTVI should show no
difference from the initial NDVI image or the TVI image wherever TVI properly carries out the square
root operation. The correction is intended to eliminate negative values and generate a VI image that is
similar to, if not better than, the NDVI. Values less than 0.71 are taken as non-vegetation and values
greater than 0.71 indicate vegetation.

Thiam’s Transformed Vegetation Index (TTVI)


The CTVI image can be very noisy due to an overestimation of greenness. This can be avoided by
ignoring the first term of the CTVI, which provides better results (Thiam, 1997). This is done by
simply taking the square root of the absolute value used in the original TVI expression, giving a new
VI called TTVI, defined as:

TTVI = SQRT[ABS(NDVI + 0.5)]

Values less than 0.71 are taken as non-vegetation and values greater than 0.71 indicate vegetation.
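The slope- and ratio-based indices defined above can be computed from NDVI (or from the NIR and red bands) as in the following sketch.

    import numpy as np

    def vegetation_indices(ndvi):
        """TVI, CTVI and TTVI derived from an NDVI array, as defined above."""
        tvi = np.sqrt(np.clip(ndvi + 0.5, 0, None))      # undefined below -0.5
        ctvi = np.sign(ndvi + 0.5) * np.sqrt(np.abs(ndvi + 0.5))
        ttvi = np.sqrt(np.abs(ndvi + 0.5))
        return tvi, ctvi, ttvi

    def rvi(nir, red):
        """Ratio vegetation index RVI = RED / NIR (reverse of the simple ratio)."""
        out = np.zeros_like(red, dtype=float)
        np.divide(red.astype(float), nir.astype(float), out=out, where=nir != 0)
        return out

    def nrvi(rvi_values):
        """Normalized ratio vegetation index NRVI = (RVI - 1) / (RVI + 1)."""
        return (rvi_values - 1.0) / (rvi_values + 1.0)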

Various commonly used Band Ratios are:


• Iron Oxide = TM 3/1
• Clay Minerals = TM 5/7
• Ferrous Minerals = TM 5/4
• Mineral Composite = TM 5/7, 5/4, 3/1
• Hydrothermal Composite = TM 5/7, 3/1, 4/3
These are derived from the absorption spectra of the material of interest. The numerator is a
baseline of background absorption and the denominator is an absorption peak.
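A sketch of these band-ratio computations for Landsat TM is given below; the bands are assumed to be supplied as a dictionary keyed by TM band number, and division by zero is guarded against.

    import numpy as np

    def safe_ratio(numerator, denominator):
        """Band ratio with protection against division by zero."""
        out = np.zeros_like(numerator, dtype=float)
        np.divide(numerator.astype(float), denominator.astype(float),
                  out=out, where=denominator != 0)
        return out

    def mineral_ratios(tm):
        """Common Landsat TM band ratios; `tm` is a dict of band arrays keyed 1-7."""
        iron_oxide = safe_ratio(tm[3], tm[1])      # TM 3/1
        clay = safe_ratio(tm[5], tm[7])            # TM 5/7
        ferrous = safe_ratio(tm[5], tm[4])         # TM 5/4
        # Composites are simply three ratios displayed as R, G, B.
        mineral_composite = np.dstack([clay, ferrous, iron_oxide])              # 5/7, 5/4, 3/1
        hydrothermal = np.dstack([clay, iron_oxide, safe_ratio(tm[4], tm[3])])  # 5/7, 3/1, 4/3
        return iron_oxide, clay, ferrous, mineral_composite, hydrothermal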

Iron oxide detection


Vibrational transitions in hydroxyls produce reflectance anomalies in the near-infrared region of the
electromagnetic spectrum. The middle-infrared region around 1.65 µm (TM5) shows high reflectance
and there is strong absorption at approximately 2.2 µm (TM7) for the three swelling indicator
minerals, while iron oxide and vegetation have similar reflectance in TM band 1 (0.485 µm) and TM
band 2 (0.56 µm). Band 3 (0.66 µm) shows high reflection for iron oxides and strong absorption from
vegetation. Thus, the 3/1 (red/blue) and 3/2 (red/green) band ratios are important for
delineating ferric iron-rich rocks (light tones) and ferric iron-poor rocks (dark tones).
Iron oxides are chemical compounds composed of iron and oxygen, for example magnetite (Fe3O4),
hematite (α-Fe2O3) and wüstite (FeO).

Ferrous mineral detection


Another band ratio used for the detection of ferrous minerals is the 5/4 ratio. Band 4 (0.83 µm) includes
the typical feature for vegetation identification and contains an absorption feature for iron oxides at
0.9 µm, so vegetation and iron oxides share much spectral information in this band. TM band 5
(1.65 µm) and TM band 7 (2.2 µm) are helpful in differentiating vegetation from hydroxyls and iron
oxides because they have distinct reflectance curves.

Hydrothermal Alteration zones detection


Hydrothermal alteration refers to rock or mineral phase changes caused by the interaction of hydrothermal
fluids and wall rock. It is defined as any alteration of rocks or minerals by the reaction of
hydrothermal fluid with preexisting solid phases. Hydrothermal alteration can be isochemical, like
metamorphism, and dominated by mineralogical changes, or it can be metasomatic and result in
significant addition or removal of elements.

Many researchers have used Landsat TM data for the detection of hydrothermal alteration zones in
different countries, taking into account the specific characteristics of each region. The most
characteristic combination is the 5/7, 3/1, 4/3 RGB false colour composite.

Clay Minerals: TM 5/7


Clay minerals (CM), consisting of a group of hydrous aluminium silicates (phyllosilicates or layer
silicates less than 2 µm in diameter), are important for agriculture (Moore and Reynolds 1977; Murray
2006). Kaolinite, illite, vermiculite, montmorillonite-smectite and chlorite are the five major types of
CM (Sposito 1989).

Principal Components Analysis:


Different bands of multi-spectral data are often highly correlated and thus contain similar information.
For example, bands 2 and 3 (green and red, respectively) typically have similar visual appearances,
since reflectance for some surface cover types are almost equal. Image transformation techniques
based on complex processing of the statistical characteristics of multi-band data sets can be used to
reduce this data redundancy and correlation between bands. One such transform is called principal
component transformation (or PCT) or principal component analysis (PCA). The objective of this
transformation is to reduce the dimensionality (i.e., the number of bands) in the data, and compress as
much of the information in the original bands into fewer bands. The 'new' bands that result from this
statistical procedure are called components. This process attempts to maximize (statistically) the
amount of information (or variance) from the original data into the least number of new components.

Principal components analysis (PCA) is often used as a method of data compression. It allows
redundant data to be compacted into fewer bands—that is, the dimensionality of the data is reduced.
The bands of PCA data are noncorrelated and independent, and are often more interpretable than the
source data (Jensen, 1996; Faust, 1989).

The process is easily explained graphically with an example of data in two bands. Below is an
example of a two-band scatterplot, which shows the relationships of data file values in two bands.
The values of one band are plotted against those of the other. If both bands have normal
distributions, an ellipse shape results.

In the case of a multi-band image (n-dimensional histogram), an ellipse (2D), ellipsoid (3D), or
hyper-ellipsoid (more than 3D) is formed if the distributions of each input band are normal or near
normal. The concept of a hyper-ellipsoid is hypothetical. To transform the original data onto the new
principal component axes, transformation coefficients are obtained that are then applied in a linear
fashion to the original pixel values. This linear transformation is derived from the covariance matrix
of the original data set. These transformation coefficients describe the lengths (eigenvalues) and
directions (eigenvectors) of the principal axes.

The length and direction of the widest transect of the ellipse are calculated using matrix algebra. The
transect, which corresponds to the major (longest) axis of the ellipse, is called the first principal
component of the data. The direction of the first principal component is the first eigenvector, and its
length is the first eigenvalue. A new axis of the ellipse is defined by this first principal component.
The points in the scatterplot are now given new coordinates, which correspond to this new axis. Since,
in spectral space (feature space), the coordinates of the points are the data file values, new data file
values are derived from this process. These values are stored in the first principal component band of
a new data file. The first principal component shows the direction and length of the widest transect of
the ellipse (Fig.b). Therefore, as an axis in spectral space, it measures the highest variation within the
data. Figure (c) shows that the first eigenvalue is always greater than the ranges of the input bands,
just as the hypotenuse of a right triangle must always be longer than the legs.
The second principal component is the widest transect of the ellipse that is orthogonal (perpendicular)
to the first principal component. As such, the second principal component describes the largest
amount of variance in the data that is not already accounted by the first principal component (Fig.). In
a 2D analysis, the second principal component corresponds to the minor axis of the ellipse.

In n dimensions, there are n principal components. Each successive principal component is the widest
transect of the ellipse that is orthogonal to the previous components in the n-dimensional space of the
scatterplot (Faust 1989), and accounts for a decreasing amount of the variation in the data, which is
not already accounted for by previous principal components (Taylor 1977).
To transform the spatial domain (original data file values) into the principal component values, the
following equation is used:

Pe = Σ (k = 1 to n) dk × Eke
Where:
e = the number of the principal component (first, second)
Pe = the output principal component value for principal component number e
k = a particular input band
n = the total number of bands
dk = an input data file value in band k
Eke = the eigenvector matrix element at row k, column e

Although there are n output bands in a PCA, the first few bands account for a high proportion of the
variance in the data—in some cases, almost 100%. Therefore, PCA is useful for compressing data into
fewer bands. In other applications, useful information can be gathered from
the principal component bands with the least variance. These bands can show subtle
details in the image that were obscured by higher contrast in the original image. These bands may also
show regular noise in the data (for example, the striping in old MSS data) (Faust, 1989).
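A compact NumPy sketch of the transformation described above (covariance matrix, eigenvectors/eigenvalues, and projection of the pixel vectors onto the principal axes) follows; it assumes the input is stacked as a (bands, rows, cols) array.

    import numpy as np

    def principal_components(image_bands):
        """Compute principal component images from an (n_bands, rows, cols) array,
        following Pe = sum_k dk * Eke with E taken from the covariance matrix."""
        n, r, c = image_bands.shape
        X = image_bands.reshape(n, -1).astype(float)
        Xc = X - X.mean(axis=1, keepdims=True)

        cov = np.cov(Xc)                          # n x n covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]

        pcs = eigvecs.T @ Xc                      # project pixels onto the new axes
        return pcs.reshape(n, r, c), eigvals, eigvecs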

Tasseled Cap
The different bands in a multispectral image can be visualized as defining an N-dimensional space
where N is the number of bands. Each pixel, positioned according to its DN value in each band, lies
within the N-dimensional space. This pixel distribution is determined by the absorption/reflection
spectra of the imaged material. This clustering of the pixels is termed the data structure (Crist and
Kauth, 1986).
The data structure can be considered a multidimensional hyper-ellipsoid. The principal axes of this
data structure are not necessarily aligned with the axes of the data space (defined as the bands of the
input image). They are more directly related to the absorption spectra. For viewing purposes, it is
advantageous to rotate the N-dimensional space such that one or two of the data structure axes are

aligned with the Viewer X and Y axes. In particular, you could view the axes that are largest for the
data structure produced by the absorption peaks of special interest for the application.

For example, a geologist and a botanist are interested in different absorption features. They would
want to view different data structures and therefore, different data structure axes. Both would benefit
from viewing the data in a way that would maximize visibility of the data structure of interest.

If four bands are transformed into a feature space, four axes are generated. The Tasseled Cap is described
using three axes. The fourth axis, which could not be characterized satisfactorily, was called "non-such".
However, for Landsat TM, it has been characterized as indicating haze or noise.

Later, Crist, Cicone, and Kauth developed a new transformation technique for Landsat TM data (Crist
and Kauth 1986; Crist and Cicone 1984). Their new brightness (redness), greenness, wetness and haze
components are defined as
Brightness = 0.3037 TM1 + 0.2793 TM2 + 0.4743 TM3 + 0.5586 TM4 + 0.5082 TM5 + 0.1863 TM7
Greenness = -0.2848 TM1 - 0.2435 TM2 - 0.5436 TM3 + 0.7243 TM4 + 0.0840 TM5 - 0.1800 TM7
Wetness = 0.1509 TM1 + 0.1973 TM2 + 0.3279 TM3 + 0.3406 TM4 - 0.7112 TM5 - 0.4572 TM7
Haze = -0.8242 TM1 + 0.0849 TM2 + 0.4392 TM3 - 0.0580 TM4 + 0.2012 TM5 - 0.2768 TM7
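Using the coefficients listed above, the transform reduces to a simple matrix multiplication, as in the sketch below; the input is assumed to be a (6, rows, cols) stack of TM bands 1, 2, 3, 4, 5 and 7, and the first component is labelled "brightness".

    import numpy as np

    # Coefficients taken from the equations above (TM bands 1-5 and 7).
    TASSELED_CAP = {
        "brightness": [0.3037, 0.2793, 0.4743, 0.5586, 0.5082, 0.1863],
        "greenness":  [-0.2848, -0.2435, -0.5436, 0.7243, 0.0840, -0.1800],
        "wetness":    [0.1509, 0.1973, 0.3279, 0.3406, -0.7112, -0.4572],
        "haze":       [-0.8242, 0.0849, 0.4392, -0.0580, 0.2012, -0.2768],
    }

    def tasseled_cap(tm_bands):
        """Apply the Tasseled Cap transform to a (6, rows, cols) stack of
        TM bands 1, 2, 3, 4, 5, 7; returns one image per component."""
        stack = tm_bands.reshape(6, -1).astype(float)
        return {name: (np.array(coeffs) @ stack).reshape(tm_bands.shape[1:])
                for name, coeffs in TASSELED_CAP.items()}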

Fourier Transform:
A Fourier transform is a linear transformation that allows calculation of the coefficients
necessary for the sine and cosine terms to adequately represent the image.
Fourier transformations are typically used for the removal of noise such as striping, spots, or
vibration in imagery by identifying periodicities (areas of high spatial frequency). Fourier
editing can be used to remove regular errors in data such as those caused by sensor anomalies
(e.g., striping). This analysis technique can also be used across bands as another form of
pattern/feature recognition.

Fast Fourier Transform (FFT), a classical image filtering technique, is used to convert a raster image
from the spatial domain into a frequency domain image. The FFT calculation converts the image into
a series of two-dimensional sine waves of various frequencies. The Fourier image can be edited
accordingly for image enhancement such as sharpening, contrast manipulation and smoothing.
Sharpening is achieved by using a high-pass filter whose function is to attenuate low frequencies,
whereas image smoothing is done by a low-pass filter. Sometimes a combination of low-pass and
high-pass filters, known as a band-pass filter, is used. In the frequency domain the high-pass filter is
implemented by attenuating the pixel frequencies with the help of different window functions, viz.,
Ideal, Bartlett (Triangular), Butterworth, Gaussian, Hanning and Hamming, etc. (ERDAS, 2001).

Let us consider a function f (x, y) of two variables x and y, where x = 0, 1, 2,…., N-1, and y =
0, 1, 2, …, M-1. The function f(x, y) represents the digital value of the image in the xth row and yth column;

M, N are the maximum numbers of rows and columns in the image, each a power of two. Then
the Forward Fourier Transform of f(x, y) is defined as (Gonzalez and Woods, 1992; Jahne, 1993):

F(u, v) = Σ (x = 0 to N−1) Σ (y = 0 to M−1) f(x, y) e^(−j2π(ux/N + vy/M))        … [1]
Where:
M = the number of pixels horizontally
N = the number of pixels vertically
u,v = spatial frequency variables
e = 2.71828, the natural logarithm base
j = √(−1), the imaginary unit of a complex number.

The number of pixels horizontally and vertically must each be a power of two. If the dimensions of
the input image are not a power of two, they are padded up to the next highest power of two.

The filtering workflow is: the input image f(x, y) in the spatial domain is transformed by the FFT into
the frequency-domain image F(u, v); the power spectrum is edited by multiplying it with a Fourier
filter function H(u, v), giving G(u, v) = F(u, v) × H(u, v); an inverse Fourier transform (IFT) then
yields the output filtered image g(x, y).

Figure: Schematic diagram of the FFT filtering technique

The raster image generated by the FFT calculation is not an optimum image for viewing or editing.
Each pixel of a Fourier image is a complex number (i.e., it has two components: real and imaginary).
For display as a single image, these components are combined in a root-sum of squares operation.

|F(u, v)| = SQRT[ (Re F(u, v))² + (Im F(u, v))² ]

The corresponding Inverse Fourier Transform, which converts the frequency-domain image back to
the spatial domain, is

f(x, y) = (1 / (M × N)) Σ (u = 0 to N−1) Σ (v = 0 to M−1) F(u, v) e^(j2π(ux/N + vy/M))        … [2]

Where:
M = the number of pixels horizontally
N = the number of pixels vertically
u, v = spatial frequency variables
e = 2.71828, the natural logarithm base
Equations [1] and [2] are known as the Fourier transform pair.
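The padding up to a power of two and the root-sum-of-squares display image can be sketched as follows; the log scaling applied at the end is a common display convention assumed here, not part of the definition.

    import numpy as np

    def fft_magnitude(image):
        """Pad an image up to the next power of two in each dimension, compute its
        FFT, and return the root-sum-of-squares (magnitude) image for display."""
        rows, cols = image.shape
        # Pad each dimension up to the next highest power of two.
        r2 = 1 << int(np.ceil(np.log2(rows)))
        c2 = 1 << int(np.ceil(np.log2(cols)))
        padded = np.zeros((r2, c2), dtype=float)
        padded[:rows, :cols] = image

        F = np.fft.fftshift(np.fft.fft2(padded))
        # Combine real and imaginary parts: sqrt(Re^2 + Im^2); log scale for display.
        magnitude = np.sqrt(F.real ** 2 + F.imag ** 2)
        return np.log1p(magnitude)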

Colour Space Transformation


To describe visually perceived colour of an image, instead of using RGB components, sometimes we
use hue, saturation, and intensity (HSI or IHS) for subjective sensation of colour, colour purity, and
brightness, respectively.

Hue refers to a specific tone of colour. Saturation refers to the purity, or intensity, of a colour.
Intensity refers to the overall brightness, that is, how much white or black is contained within a colour.

Hue is generated by mixing red, green, and blue, which are characterized by coordinates on the RGB
axes of the colour cube. In the hue-saturation-intensity hexacone model (Fig.), hue is the dominant
wavelength of the perceived colour, represented by angular position around the apex of the hexacone;
saturation, or purity, is given by the distance from the central vertical axis of the hexacone; and
intensity, or brightness, is represented by the distance above the apex of the hexacone. Hue is what we
perceive as colour. Saturation is the degree of purity of the colour and may be considered as the
amount of white mixed in with the colour. It is sometimes useful to convert from RGB colour cube
coordinates to HSI hexacone coordinates, and vice versa. The RGB-to-HSI (and HSI-to-RGB) conversion
can be derived through the following transformation equations:
If we consider
R, G, and B are each in the range of 0 to 1.0
I and S are each in the range of 0 to 1.0
H is in the range of 0 to 360
Then, in the case of RGB to HSI:
I = (R + G + B) / 3
S = 1 − (3 / (R + G + B)) × a, where a is the minimum of R, G, and B
H = cos⁻¹ { 0.5 × [(R − G) + (R − B)] / [ (R − G)² + (R − B)(G − B) ]^0.5 } (if B > G, then H = 360° − H)
If S = 0, then H is meaningless.
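A direct implementation of these equations is sketched below; the small constant added to the denominator of H and the clipping of the cosine argument are numerical safeguards assumed for illustration.

    import numpy as np

    def rgb_to_hsi(r, g, b):
        """Convert R, G, B arrays (each scaled 0-1) to H, S, I using the
        equations above; H is in degrees (0-360), undefined where S = 0."""
        r, g, b = (np.asarray(x, dtype=float) for x in (r, g, b))
        i = (r + g + b) / 3.0
        minimum = np.minimum(np.minimum(r, g), b)
        # S = 1 - minimum / I (set to 0 where I is 0)
        s = 1.0 - np.divide(minimum, i, out=np.ones_like(i), where=i > 0)

        num = 0.5 * ((r - g) + (r - b))
        den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12   # avoid /0
        h = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
        h = np.where(b > g, 360.0 - h, h)        # resolve the 180-360 degree half
        return h, s, i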

The hue, saturation, and intensity transform is useful in two ways: first, as a method of image
enhancement and, second, as a means of combining co-registered images from different sources. The
advantage of the HSI system is that it is a more precise representation of human colour vision than
the RGB system. This transformation has been quite useful for geological applications, where the
pixel value in the hue band represents the colour code of the object (such as soil) and the object type
can be identified based on its colour code.

The literature proposes many IHS transformation algorithms for converting RGB values. Some are
also named HSV (hue, saturation, value) or HLS (hue, luminance/lightness, saturation). Figure 1
illustrates the geometric interpretation. While the complexity of the models varies, they produce
similar values for hue and saturation.

(a) RGB colour cube and (b) HSI hexacone model

The hexacone transformation of IHS is referred to as the HSV model, which derives its name from the
parameters hue, saturation, and value; the term "value" is used instead of "intensity" in this system.

Resolution Merge/ Image Fusion:


Data fusion is the process of combining multisource data having different characteristics, such as
temporal, spatial, spectral and radiometric resolution, to create a composite image that contains a
better description of the scene.
Image fusion combines the spectral information of a coarse-resolution image with a finer-spatial-resolution
image. The resulting merged image is a product that synergistically
integrates the information provided by various sensors or by the same sensor (Simone et al., 2002),
which may be found useful for human visual perception, provides faster interpretation and can help in
extracting more features (Wen and Chen, 2004). Fusion of data can reduce the uncertainty associated
with the data acquired from different sensors or from same sensor with temporal variation. Further,
the fusion techniques may improve interpretation capabilities with respect to subsequent tasks by
using complementary information source images (Wen and Chen, 2004).

The fusion of two data sets can be done in order to obtain one single data set with the qualities of both
(Saraf, 1999). For example, the low-resolution multispectral satellite imagery can be combined with
the higher resolution radar imagery by fusion technique to improve the interpretability of
fused/merged image. The resultant data product has the advantages of high spatial resolution and
structural information (from the radar image), and of spectral resolution (from the optical and infrared
bands). Thus, with the help of all this cumulative information, the analyst can explore most of the
linear and anomalous features as well as lithologies. Various image fusion techniques are available in
the published literature.

Intensity-hue-saturation (IHS) methods (Carper et al., 1990; Chavez et al., 1991; Kathleen and Philip,
1994), principal component analysis (PCA) (Chavez et al., 1991; Chavez and Kwarteng, 1989), the
Brovey Transform (BT) (Chavez and Kwarteng, 1989; Li et al., 2002; Tu et al., 2001) and the Wavelet
Transform (WT) (Ranchin and Wald, 1993; Yocky, 1996) are the most commonly used fusion
algorithms in remote sensing. The present study has been carried out using the Principal Component
Analysis (PCA) technique, which has been used successfully for the fusion of remote sensing data
for geological assessment and land cover mapping (Chavez and Kwarteng, 1989; Chavez et al., 1991;
Li et al., 2002; Pal et al., 2007).

IHS Transform Fusion:


The IHS colour transform can effectively separate a standard RGB (Red, Green, Blue) image into
spatial (I) and spectral (H, S) information. The basic concept of IHS fusion is shown in Figure 1. The
most important steps are: (1) transform a colour image composite from the RGB space into the IHS
space, (2) replace the I (intensity) component by a panchromatic image with a higher resolution, (3)
reversely transform the replaced components from IHS space back to the original RGB space to
obtain a fused image.

The PCA is a statistical technique that transforms a multivariate inter-correlated data set into a new
un-correlated data set. The basic concept of PCA fusion is shown in Figure 2

Its most important steps are: (1) perform a principal component transformation to convert a set of
multispectral bands (three or more bands) into a set of principal components, (2) replace one principal
component, usually the first component, by a high resolution panchromatic image, (3) perform a
reverse principal component transformation to convert the replaced components back to the original
image space. A set of fused multispectral bands is produced after the reverse transform (Chavez et al.
1991, Shettigara 1992, Zhang and Albertz 1998, Zhang 1999).
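The three PCA-fusion steps can be sketched as below, assuming the panchromatic image has already been co-registered and resampled to the multispectral grid; matching the pan image to PC-1 by mean and standard deviation is a simplification of the histogram matching normally applied.

    import numpy as np

    def pca_fusion(ms_bands, pan):
        """PCA pan-sharpening sketch: forward PCA on the multispectral bands,
        replace PC-1 with the (matched) panchromatic image, inverse PCA."""
        n, r, c = ms_bands.shape
        X = ms_bands.reshape(n, -1).astype(float)
        mean = X.mean(axis=1, keepdims=True)
        Xc = X - mean

        eigvals, eigvecs = np.linalg.eigh(np.cov(Xc))
        order = np.argsort(eigvals)[::-1]
        eigvecs = eigvecs[:, order]
        pcs = eigvecs.T @ Xc

        # Match the pan image to PC-1's mean and standard deviation, then swap it in.
        p = pan.astype(float).ravel()
        p = (p - p.mean()) / (p.std() + 1e-12) * pcs[0].std() + pcs[0].mean()
        pcs[0] = p

        fused = eigvecs @ pcs + mean              # reverse transform
        return fused.reshape(n, r, c)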

Wavelet Transform Fusion:


Wavelet transform is a mathematical tool developed in the field of signal processing. It can
decompose a digital image into a set of multi-resolution images accompanied with wavelet
coefficients for each resolution level. The wavelet coefficients for each level contain the spatial
(detail) differences between two successive resolution levels. The wavelet based fusion is performed
in the following way (Figure 3): (1) decompose a high resolution panchromatic image into a set of
low resolution panchromatic images with wavelet coefficients for each level, (2) replace a low
resolution panchromatic with a multispectral band at the same resolution level, (3) perform a reverse
wavelet transform to convert the decomposed and replaced panchromatic set back to the original
panchromatic resolution level. The replacement and reverse transform is done three times, each for
one multispectral band (Garguet-Duport et al. 1996, Yocky 1996, Wald et al. 1997, Zhou et al. 1998,
Ranchin and Wald 2000).

Arithmetic combination fusion:
Different arithmetic combinations have been employed for fusing multispectral and panchromatic
images. The arithmetic operations of multiplication, division, addition and subtraction have been
combined in different ways to achieve a better fusion effect. Brovey Transform, SVR (Synthetic
Variable Ratio), and RE (Ratio Enhancement) techniques are some successful examples for SPOT pan
fusion.

The SVR and RE techniques are similar, but involve more sophisticated calculations for the sum
image (Cliche et al. 1985, Welch and Ehlers 1987, Chavez et al. 1991, Munechika et al. 1993, Zhang
and Albertz 1998, Zhang 1999).

Multiplicative:
The algorithm is derived from the four component technique of Crippen (Crippen, 1989a). In this
paper, it is argued that of the four possible arithmetic methods to incorporate an intensity image into a
chromatic image (addition, subtraction, division, and multiplication), only multiplication is unlikely to
distort the color. However, in his study Crippen first removed the intensity component via band ratios,
spectral indices, or a PC transform. The algorithm shown below operates on the original image. The
result is an increased presence of the intensity component. For many applications, this is desirable.

People involved in urban or suburban studies, city planning, and utilities routing often want roads and
cultural features (which tend toward high reflection) to be pronounced in the image.
Fusion image Bi = (Multi Bi) × (Pan Image)
where Multi Bi is input multispectral band i, i = 1, 2, …, n

Brovey Transform fusion:


Brovey Transform uses addition, division and multiplication for the fusion of three multispectral
bands (ERDAS 1999). Its concept can be described with equation 1. Its basic processing steps are: (1)
add three multispectral bands together for a sum image, (2) divide each multispectral band by the sum
image, (3) multiply each quotient by a high resolution pan.

Fusion image Bi= [(Multi Bi) / (MultiSum)] x (Pan Image)


Where, i= Band numbers, 1..n
MultiSum = Multi B1 + Multi B2 + … + Multi Bn
Multi Bi are input multispectral band 1,2…n

The Brovey Transform was developed to visually increase contrast in the low and high ends of an
image's histogram (i.e., to provide contrast in shadows, water and high-reflectance areas such as urban
features). Consequently, the Brovey Transform should not be used if preserving the original scene
radiometry is important. However, it is good for producing RGB images with a higher degree of
contrast in the low and high ends of the image histogram and for producing visually appealing images.

Since the Brovey Transform is intended to produce RGB images, only three bands at a time should be
merged from the input multispectral scene, such as bands 3, 2, 1 from a SPOT or Landsat TM image
or bands 4, 3, 2 from a Landsat TM image. The resulting merged image should then be displayed with
bands 1, 2, 3 assigned to R, G, B.
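A direct sketch of the Brovey equations above for three bands and a co-registered pan image:

    import numpy as np

    def brovey_fusion(band_r, band_g, band_b, pan):
        """Brovey Transform fusion of three multispectral bands with a pan image,
        following: fused_i = (band_i / band_sum) x pan."""
        bands = [b.astype(float) for b in (band_r, band_g, band_b)]
        band_sum = bands[0] + bands[1] + bands[2]
        pan = pan.astype(float)
        fused = [np.divide(b, band_sum, out=np.zeros_like(b), where=band_sum != 0) * pan
                 for b in bands]
        return fused  # display as R, G, B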

Classification:
Multispectral classification is the process of sorting pixels into a finite number of individual classes,
or categories of data, based on their data file values / Digital Number (DN) values. If a pixel satisfies
a certain set of criteria, the pixel is assigned to the class that corresponds to those criteria. This process
is also referred to as image segmentation. Depending on the type of information you want to extract
from the original data, classes may be associated with known features on the ground or may simply
represent areas that look different to the computer. An example of a classified image is a land cover
map, showing vegetation, bare land, pasture, urban, etc.

The Classification Process:


Pattern Recognition:
Pattern recognition is the science—and art—of finding meaningful patterns in data, which can be
extracted through classification. By spatially and spectrally enhancing an image, pattern recognition

can be performed with the human eye; the human brain automatically sorts certain textures and colors
into categories.

In a computer system, spectral pattern recognition can be more scientific. Statistics are derived from
the spectral characteristics of all pixels in an image. Then, the pixels are sorted based on mathematical
criteria. The classification process breaks down into two parts: training and classifying (using a
decision rule).


Training:
The computer system must be trained to recognize patterns in the data. Training is the process of
defining the criteria by which these patterns are recognized (Hord, 1982). Training can be performed
with either a supervised or an unsupervised method, as explained below.

Supervised vs. Unsupervised Training:


Supervised Training
Supervised training requires a priori (already known) information about the data, such as:
• What type of classes need to be extracted? Soil type? Land use? Vegetation?

• What classes are most likely to be present in the data? That is, which types of land cover, soil, or
vegetation (or whatever) are represented by the data?

In supervised training, you rely on your own pattern recognition skills and a priori knowledge of the
data to help the system determine the statistical criteria (signatures) for data classification.

To select reliable samples, you should know some information—either spatial or spectral—about the
pixels that you want to classify.

The location of a specific characteristic, such as a land cover type, may be known through ground
truthing. Ground truthing refers to the acquisition of knowledge about the study area from field work,
analysis of aerial photography, personal experience, etc. Ground truth data are considered to be the
most accurate (true) data available about the area of study. They should be collected at the same time
as the remotely sensed data, so that the data correspond as much as possible (Star and Estes, 1990).
However, some ground data may not be very accurate due to a number of errors and inaccuracies.

Supervised training is closely controlled by the analyst. In this process, the analyst selects pixels that
represent patterns or land cover features that he or she recognizes, or that can be identified with help from
other sources, such as aerial photos, ground truth data, or maps. Knowledge of the data, and of the
classes desired, is required before classification.

By identifying patterns, the analyst can instruct the computer system to identify pixels with similar
characteristics. If the classification is accurate, the resulting classes represent the categories within the
data that were originally identified.

In supervised training, it is important to have a set of desired classes in mind, and then create the
appropriate signatures from the data. You must also have some way of recognizing pixels that
represent the classes that you want to extract. Supervised classification is usually appropriate when
you want to identify relatively few classes, when you have selected training sites that can be verified
with ground truth data, or when you can identify distinct, homogeneous regions that represent each
class.

On the other hand, if you want the classes to be determined by spectral distinctions that are inherent in
the data so that you can define the classes later, then the application is better suited to unsupervised
training. Unsupervised training enables you to define many classes easily, and identify classes that
are not in contiguous, easily recognized regions.

Unsupervised Training
Unsupervised training is more computer-automated. It enables you to specify some parameters that
the computer uses to uncover statistical patterns that are inherent in the data. These patterns do not
necessarily correspond to directly meaningful characteristics of the scene, such as contiguous, easily
recognized areas of a particular soil type or land use. They are simply clusters of pixels with similar
spectral characteristics. In some cases, it may be more important to identify groups of pixels with
similar spectral characteristics than it is to sort pixels into recognizable categories.

Unsupervised training is dependent upon the data itself for the definition of classes. This method is
usually used when less is known about the data before classification. It is then the analyst‘s
responsibility, after classification, to attach meaning to the resulting classes (Jensen, 1996).
Unsupervised classification is useful only if the classes can be appropriately interpreted.

Unsupervised training requires only minimal initial input from the analyst. However, the analyst has the task
of interpreting the classes that are created by the unsupervised training algorithm.

Unsupervised training is also called clustering, because it is based on the natural groupings of pixels
in image data when they are plotted in feature space. According to the specified parameters, these
groups can later be merged, disregarded, otherwise manipulated, or used as the basis of a signature.

Training Samples and Feature Space Objects


Training samples (also called samples) are sets of pixels that represent what is recognized as a
discernible pattern, or potential class. The system calculates statistics from the sample pixels to create
a parametric signature for the class. The following terms are sometimes used interchangeably in
reference to training samples. For clarity, they are used in this documentation as follows:
• Training sample, or sample, is a set of pixels selected to represent a potential class. The data file
values for these pixels are used to generate a parametric signature.
• Training field, or training site, is the geographical AOI in the image represented by the pixels in a
sample. Usually, it is previously identified with the use of ground truth data.

Feature space objects are user-defined AOIs in a feature space image. The feature space signature is based on these objects.
Selecting Training Samples:
It is important that training samples be representative of the class that you are trying to identify. This
does not necessarily mean that they must contain a large number of pixels or be dispersed across a
wide region of the data. The selection of training samples depends largely upon your knowledge of
the data, of the study area, and of the classes that you want to extract.

Generally, training samples are identified using one or more of the following methods:
• using a vector layer
• defining a polygon in the image
• identifying a training sample of contiguous pixels with similar spectral characteristics
• identifying a training sample of contiguous pixels within a certain area, with or without similar
spectral characteristics
• using a class from a thematic raster layer from an image file of the same area (i.e., the result of an
unsupervised classification)
Digitized Polygon:
Training samples can be identified by their geographical location (training sites, using maps, ground truth data).
The locations of the training sites can be digitized from maps with the ERDAS IMAGINE Vector or AOI tools.
Polygons representing these areas are then stored as vector layers. The vector layers can then be used as input to
the AOI tools and used as training samples to create signatures.

User-defined Polygon
Using your pattern recognition skills (with or without supplemental ground truth information), you can identify
samples by examining a displayed image of the data and drawing a polygon around the training site(s) of
interest. For example, if it is known that oak trees reflect certain frequencies of green and infrared light
according to ground truth data, you may be able to base your sample selections on the data (taking atmospheric
conditions, sun angle, time, date, and other variations into account). The area within the polygon(s) would be
used to create a signature.

Identify Seed Pixel


With the Seed Properties dialog and AOI tools, the cursor (crosshair) can be used to identify a single pixel (seed
pixel) that is representative of the training sample. This seed pixel is used as a model pixel, against which the
pixels that are contiguous to it are compared based on parameters specified by you.

When one or more of the contiguous pixels is accepted, the mean of the sample is calculated from the accepted
pixels. Then, the pixels contiguous to the sample are compared in the same way. This process repeats until no
pixels that are contiguous to the sample satisfy the spectral parameters. In effect, the sample grows outward
from the model pixel with each iteration. These homogenous pixels are converted from individual raster pixels
to a polygon and used as an AOI layer.
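A simplified single-band sketch of this seed-pixel region growing is given below; the spectral parameter is reduced to a single maximum difference from the running sample mean, which is an assumption made for brevity.

    import numpy as np
    from collections import deque

    def grow_from_seed(image, seed, max_diff=10.0):
        """Grow a training sample outward from a seed pixel: accept contiguous
        pixels whose value is within max_diff of the current sample mean."""
        rows, cols = image.shape
        accepted = np.zeros((rows, cols), dtype=bool)
        accepted[seed] = True
        total, count = float(image[seed]), 1
        queue = deque([seed])

        while queue:
            y, x = queue.popleft()
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and not accepted[ny, nx]:
                    # Compare the contiguous pixel against the current sample mean.
                    if abs(image[ny, nx] - total / count) <= max_diff:
                        accepted[ny, nx] = True
                        total += float(image[ny, nx])
                        count += 1
                        queue.append((ny, nx))
        return accepted   # boolean AOI mask for the training sample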

Seed Pixel Method with Spatial Limits


The training sample identified with the seed pixel method can be limited to a particular region
by defining the geographic distance and area.

Thematic Raster Layer


A training sample can be defined by using class values from a thematic raster layer (see Table 7-1). The data file
values in the training sample are used to create a signature. The training sample can be defined by as many class
values as desired.

NOTE: The thematic raster layer must have the same coordinate system as the image file being classified.

Training Sample Comparison

Signatures: The result of training is a set of signatures that defines a training sample or cluster. Each
signature corresponds to a class, and is used with a decision rule (explained below) to assign the
pixels in the image file to a class. Signatures can be parametric or nonparametric.

A parametric signature is based on statistical parameters (e.g., mean and covariance matrix) of the
pixels that are in the training sample or cluster. Supervised and unsupervised training can generate
parametric signatures. A set of parametric signatures can be used to train a statistically based classifier
(e.g., maximum likelihood) to define the classes.

A nonparametric signature is not based on statistics, but on discrete objects (polygons or rectangles)
in a feature space image. These feature space objects are used to define the boundaries for the classes.
A nonparametric classifier uses a set of nonparametric signatures to assign pixels to a class based on
their location either inside or outside the area in the feature space image. Supervised training is used
to generate nonparametric signatures (Kloer, 1994).

Decision Rule: After the signatures are defined, the pixels of the image are sorted into classes based on the
signatures by use of a classification decision rule. The decision rule is a mathematical algorithm that, using data
contained in the signature, performs the actual sorting of pixels into distinct class values.
Parametric Decision Rule
A parametric decision rule is trained by the parametric signatures. These signatures are defined by the mean
vector and covariance matrix for the data file values of the pixels in the signatures. When a parametric decision
rule is used, every pixel is assigned to a class since the parametric decision space is continuous (Kloer, 1994).
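As an illustration of a parametric decision rule, the sketch below assigns each pixel vector to the class whose mean vector and covariance matrix (i.e., its parametric signature) give the highest Gaussian log-likelihood; this is a bare-bones maximum likelihood rule, not any particular software implementation.

    import numpy as np

    def maximum_likelihood_classify(pixels, signatures):
        """Assign each pixel vector to the class whose Gaussian (mean vector and
        covariance matrix) gives the highest likelihood.
        pixels: (n_pixels, n_bands); signatures: list of (mean, cov) tuples."""
        scores = []
        for mean, cov in signatures:
            inv = np.linalg.inv(cov)
            _, logdet = np.linalg.slogdet(cov)
            diff = pixels - mean
            # log-likelihood (up to a constant) of the multivariate normal
            mahal = np.einsum('ij,jk,ik->i', diff, inv, diff)
            scores.append(-0.5 * (logdet + mahal))
        return np.argmax(np.stack(scores), axis=0)   # class index per pixel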

Nonparametric Decision Rule


A nonparametric decision rule is not based on statistics; therefore, it is independent of the properties of the data.
If a pixel is located within the boundary of a nonparametric signature, then this decision rule assigns the pixel to
the signature‘s class. Basically, a nonparametric decision rule determines whether or not the pixel is located
inside the nonparametric signature boundary.

Output File When classifying an image file, the output file is an image file with a thematic raster layer. This
file automatically contains the following data:
• class values
• class names
• color table
• statistics
• histogram

Dimensionality of data used in classification:


Dimensionality refers to the number of layers (spectral bands) being classified. For example, a data file with 3
layers is said to be 3-dimensional, since 3-dimensional feature space is plotted to analyze the data.

There is no theoretical limit to the number of layers of data that can be used for one classification; however, it is
usually wise to reduce the dimensionality of the data as much as possible. Often, certain layers of data are
redundant or extraneous to the task at hand. Unnecessary data take up valuable disk space and cause the
computer system to perform more arduous calculations, which slows down processing.

Different ancillary data other than remotely-sensed data could be used for better classification. Using ancillary
data enables you to incorporate variables into the classification from, (for example,) vector layers, previously
classified data, or elevation data. The data file values of the ancillary data become an additional feature of each
pixel, thus influencing the classification.

