Principles of Geographical
Information Systems
Peter A. Burrough and Rachael A. McDonnell
General Editors
P. A. Burrough
M. F. Goodchild
R. A. McDonnell
P. Switzer
M. Worboys
GIS County User Guide: Laboratory Exercises in Urban Geographic Information Systems
W. E. Huxhold, P. S. Tierney, D. R. Turnpaugh, B. J. Maves, and K. T. Cassidy
Rachael A. McDonnell
This book is sold subject to the condition that it shall not, by way
of trade or otherwise, be lent, re-sold, hired out or otherwise circulated
without the publisher’s prior consent in any form of binding or cover
other than that in which it is published and without a similar condition
including this condition being imposed on the subsequent purchaser
ISBN 0-19-823366-3
ISBN 0-19-823365-5 (Pbk)
10 9 8 7 6 5 4 3 2
Preface
As with the first edition (Burrough 1986), this book describes and explains the theoretical and practical principles for handling spatial data in geographical information (GI) systems. In the mid-1980s these computer tools were being developed from computer-aided design and computer-aided manufacturing (CAD-CAM) packages, from automated mapping systems, and from programs for handling data from remotely sensed images collected by sensors in satellites, and there was a need to provide a text that covered the common ground. Since then, sales of GIS as commercial products for the mapping and spatial sciences and for the inventory and management of spatial resources of cities, states, or businesses have increased phenomenally. People have become fascinated by the power of immediate interaction with electronic representations of the world. The development of the cheap, powerful personal computer has facilitated this process, and the current ability to link computers by electronic networks across the globe has provided a ready source of all kinds of data in electronic form, including maps and raster images taken from all kinds of platforms, from satellites to hovercraft to deep-sea vehicles.
Today, GIS are used for far more than making maps. The infrastructure of utilities (cables and pipes) in the streets of your town will be held in a GIS; your taxis and emergency services may be guided to their destinations using satellite-linked spatial systems; the range of goods in your shops may be decided by systems linking customer preferences to neighbourhood and socio-economic status; foresters and farmers will be monitoring their stands and crops with spatial information systems; and a whole range of scientists and technicians will be advising governments, the military, and businesses from local to world scale on how best to deal with the distribution of what interests them most.
The ready provision of powerful computing tools, the increasing abundance of spatial data, and the enormous diversity of applications that are not just limited to map-making or scientific enquiry make it even more important today than in 1986 to understand the basic principles behind GIS. Everyone thinks they understand GIS because they think they understand maps. But to understand a map is to understand how a person once observed the world, how they formulated imperfect but plausible models to represent their observations, and how they decided to code these models on paper using conventional semiology. All these processes depend strongly on culture, on age, on discipline, and on background. With GIS we go even further down a path in the labyrinth that leads from perception to presentation of spatial information, because though computers can mimic human draughtsmen, they can do so only according to the ways they are programmed. The basic tenets of spatial interactions needed to describe a given spatial process or phenomenon may or may not be shared by the ground rules of the GIS you would like to buy. But electronic GIS provide an enormous wealth of choice not possible with conventional mapping. Whereas with conventional mapping (and even the earliest digital mapping) the map was the database, today the map is merely an evanescent projection of a particular view of a spatial database at a given time. On the one hand this gives us enormous power to review an unlimited number of alternatives and to make maps of anything, any way we like, including dynamic modelling of spatial and temporal processes; on the other hand it may leave us swimming in the dark.
This book aims to provide an introduction to the theoretical and technical principles that need to be understood to work effectively and critically with GIS. It is based on the 1986 volume, but is much more than a simple second edition. The text examines the different ways spatial data are perceived, modelled conceptually, and represented. It explains how, using the standard 'entity' and 'continuous field' approaches, spatial data are collected, stored, represented in the computer, retrieved, analysed, and displayed. As with the first edition, this material is accompanied by a critical evaluation of the sources, roles, and treatment of errors and uncertainties in spatial data, and their possible effects on the conclusions to be drawn from analyses carried out with GIS. Also included is a discussion of the principles and methods of dealing with uncertainty in spatial data, either through the probabilistic medium of geostatistics, or the possibilistic route using fuzzy logic. The principles of the latter are juxtaposed with the crisp concepts underlying the entity and field models so commonly used. Frequently encountered specialist terms are explained in the glossary.
Spatial analysis and GIS are nothing without computer software, and the material in this book could not have been produced without recourse to many different programs. Large, commercial systems may be unsuitable or unusable for teaching or scientific enquiry, and in Appendix 2 we provide information about sources of cheap or free software that have been used in preparing this book and which may be used to support training courses and research.
GIS is truly interdisciplinary and multidisciplinary, and the literature is spread over a range of journals. Trade journals such as GIS World and GIS Europe provide monthly coverage of technical and organizational developments; trade shows and conferences and scientific journals such as the International Journal of Geographical Information Systems, Computers and Geosciences, Computers, Environment and Urban Systems, Geoinformation, and Transactions in GIS cover the core of the field, though much is published in discipline-specific journals, and also in 'grey' literature reports from conferences, institutes, and commerce. We have not covered everything in this volume: that is impossible, but we aim to provide a basic technical and scientific introduction to an important scientific and business activity, so that readers will better understand how GIS are being used to transform the ways in which much of our world is perceived, recorded, understood, and organized.
P.A.B. & R.A.McD.
Utrecht
February 1997
Acknowledgements
We thank the following persons for supplying Plates and material for figures and the permission to reproduce them: Ir Fred Hageman and Eurosense Belfotop NV, Wemmel, Belgium for Plates 1.1-1.6; Rotterdam Municipality for Plate 2.1; Dr Jan Ritsema van Eck (University of Utrecht) for Plate 2.2 and Figures 7.12 and 7.13; Dr Stan Geertman (University of Utrecht) for Plates 2.3 and 2.4; Mr Richard Webber of CCN Marketing for Plates 2.5-2.8; Drs Lodewijk Hazelhoff (University of Utrecht) for Plates 3.1-3.6; the Australian Environment Data Centre, Canberra for Plate 4.1; Dr Jan Clevers and the Agricultural University Wageningen for Plate 4.2; Prof. Ulf Hellden, Lund University for Plate 4.3; Dr Robert MacMillan and The Alberta Research Council for Plate 4.4; Dr Gerard Heuvelink for Plate 4.6; Prof. Tom Poiker for Figure 5.14; California Institute of Technology for Figure 1.2 (via Internet).
For base data used in examples and analyses we thank Prof. Andrew Skidmore (International Institute for Aerospace Survey and Earth Sciences (ITC), Enschede), Dr Brenda Howard (Institute of Terrestrial Ecology, Merlewood, Cumbria, UK), Prof. Boris Prister (Ukraine Institute of Agricultural Radiology, Kiev), Dr Robert MacMillan (formerly Alberta Research Council), Dr Alejandro Mateos (University of Maracay, Venezuela), RPL Legis (formerly Wageningen Agricultural University), Dr Arnold Bregt (Winand Staring Centre, Wageningen, the Netherlands), and Dr Ad de Roo, Dr Steven de Jong, Dr Victor Jetten, Drs Ruud van Rijn, and Drs Marten Rikken (all University of Utrecht).
For computer software and assistance with figures and data we thank our colleagues at the Department of Physical Geography, Netherlands Research Institute for Geoecology, Faculty of Geographical Sciences, University of Utrecht: Drs Lodewijk Hazelhoff, Dr Victor Jetten, Dr Willem van Deursen, Ing. Cees Wesseling, Dr Marcel van der Perk, Drs Derk Jan Karssenberg, Dr Paulien van Gaans, and Dr Edzer Pebesma; also Dr Gerard Heuvelink (now University of Amsterdam). We thank Taylor and Francis Ltd for permission to reproduce Figures 2.5, 2.6, 11.2, 11.9, 11.12, and 11.13. Nicholas Burrough helped sort the figures.
For information on developments in computer networking we thank Gerard McDonnell (Novell Inc), and Peter Woodsford (Laserscan) for information on commercial developments in Object Oriented GIS. We thank Kevin Jones (Macaulay Land Use Research Institute, Aberdeen) and Prof. B. K. Horn (Massachusetts Institute of Technology) for information on slope algorithms. For the many discussions on the theory of spatial reasoning we thank Profs. Helen Couclelis, Waldo Tobler, and Michael Goodchild (NCGIA, Santa Barbara) and Prof. Andrew Frank (Technical University, Vienna) and many other colleagues too numerous to name. We also thank the Faculty of Geographical Sciences of Utrecht University for technical, financial, and moral support over a period of more than twelve years.
Contents
appendix 2 A Selection of World Wide Web Geography and GIS Servers 307
References 312
Index 327
List of Plates
3.5 The distribution of sampling points of an airborne infra-red laser altimeter scanner
3.6 Infra-red laser scan altimeter data after interpolation to a fine grid: red indicates high,
and blue low elevations
3.7 Example of a local drain direction map created using the D8 algorithm
3.8 Cumulative upstream elements (log scale) draped over the DEM from which they were
derived to provide a perspective view of the drainage network.
4.1 Greenness index (Normalized Difference Vegetation Index NDVI) for Australia, winter
1993
4.3 Interpreted land cover data derived from Thematic Mapper satellite imagery draped over
a DEM
4.4 False colour aerial photograph of farmland on former periglacial land surface, Alberta,
Canada
4.5 Tiger stripes of erroneously large slope angles obtained when an inappropriate
interpolation method is used to generate a DEM from digitized contour lines
4.6 Simulated, discretized continuous surfaces. Clockwise from top left: no spatial
correlation (noise); short distance spatial correlation; long distance spatial correlation;
long distance spatial correlation with added noise
4.7 Top: Continuous mapping of four fuzzy soil classes. Bottom: Map showing the maximum
membership value of all four classes
4.8 Map of the maximum membership values for all four classes with superposition of zones
where the classes are maximally indeterminate (boundaries).
ONE
Geographical Information:
Society, Science,
and Systems
Geographical Information
In Roman times, the agrimensores, or land surveyors, were an important part of the government, and the results of their work may still be seen in vestigial form in the European landscapes to this day (Dilke 1971). The decline of the Roman Empire led to the decline of surveying and map-making, which revived with the geographical discoveries of the Renaissance. By the seventeenth century skilled cartographers such as Mercator had demonstrated that not only did the use of a mathematical projection system and an accurate set of coordinates improve the reliability of the measurement and location of areas of land, but the registration of spatial phenomena through an agreed standard provided a model of the distribution of natural phenomena and human settlements that was invaluable for navigation, route finding, and military strategy (Hodgkiss 1981). By the eighteenth century the European states had reached a state of organization when many governments realized the value of systematic mapping of their lands. The Geographical Information Society was first created through the establishment of national government bodies whose mandate was to produce cadastral and topographical maps of whole countries. These highly disciplined institutes have continued to this day to render the spatial distribution of the features of the earth's surface, or topography, into map form. During the last 200 years many individual styles of map have been developed, but there has been a long, unbroken tradition of high cartographic standards that has continued until the present.

The mapping sciences—geodetical surveying, photogrammetry, and cartography—developed a powerful range of tools for accurately recording and representing the location and characteristics (usually referred to as attributes) of well-defined natural and anthropogenic phenomena (e.g. Goudie et al. 1988, Johnston et al. 1988). The basic units of geographic information were decided very early on and the first modern cartographers represented real world objects or administrative units by accurately drawn point and line symbols that were chosen to illustrate their most important attributes. The attributes of areas were indicated by uniform colours or shading, though gradually varying shading was sometimes used to indicate the change in certain properties of the surface, such as the steepness of slopes, the depth of a lake, or the variation in land cover from forest to marsh. The printing technology of etching on copper plates, available from the seventeenth century onwards, reinforced the use of a geographical symbolism that relies on well-defined, crisp delineation. Today, much geographical information concerns the location of, and interactions between, well-defined objects in space like trees in a forest, houses on a street, aeroplanes en route to destinations, or administrative units like the Départements of France (Figure 1.1a).

As the scientific study of the earth advanced, different kinds of attributes needed to be mapped. The study of the earth and its natural resources—geophysical geodesy, geology, geomorphology, soil science, ecology, and land—that began in the nineteenth century has continued to this day. Whereas topographical maps may be regarded as general purpose because they do not set out to fulfil any specific aim (i.e. they may be interpreted for many different purposes), maps of the distribution of rock types, soil series, or land use are made for more limited purposes. These specific-purpose maps are often referred to as 'thematic' maps because they contain information about a single subject or theme. Although these phenomena vary continuously from place to place, the surveyors were unable to record and process the huge amounts of data that are needed to provide a proper insight into their variation. Through qualitative descriptions of the variation of rocks and soil and cross-sectional diagrams the surveyors reduced their findings to sets of 'representative' taxonomic classes. Groups of related point-based taxonomic classes are linked to areas of land, called mapping units, which are drawn on a map using different colours or shading. Maps showing a set of delineated areas having different colours are known as chorochromatic maps: they are one form of the choropleth map, or a map showing areas deemed to be of equal value (Figure 1.1c). To make the thematic data easy to understand, choropleth maps are commonly drawn over a simplified topographic base. The use of this type of map to represent continuously varying but complex phenomena has artificially forced people to divide the space over which these phenomena occur into sets of 'objects' that are formally equivalent to the crisp entities that model well-understood objects such as trees, houses, or fenced paddocks.

The term 'thematic map' is very widely and loosely applied (see for example, Fisher 1978, Hodgkiss 1981) and is used not only for maps showing a general purpose theme such as 'soil' or 'geology', but for maps of the distribution of a single attribute. The value of the single attribute can be linked directly to the distribution of spatial entities that are already on the map, such as the display of statistical information on population of administrative units at country, region, or local authority level.
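The reduction of continuous variation to crisp mapping units can be made concrete with a small sketch: a continuous attribute is cut into a handful of classes, and every cell in each class is drawn with the same symbol, as on a chorochromatic map. This is an illustrative toy, not a method from the text; the class boundaries, the 'soil depth' interpretation, and the sample grid are all invented for the example.

```python
# Chorochromatic mapping in miniature: continuous attribute values are forced
# into a few crisp classes (mapping units), each drawn with its own symbol.
# The class boundaries, labels, and sample grid below are invented.

CLASSES = [               # (upper bound, symbol): equal-value areas share a symbol
    (10.0, '.'),          # e.g. shallow soils
    (25.0, '+'),          # intermediate
    (float('inf'), '#'),  # deep soils
]

def classify(value):
    """Assign a continuous value to the first class whose upper bound exceeds it."""
    for upper, symbol in CLASSES:
        if value < upper:
            return symbol
    raise ValueError(value)  # unreachable: the last bound is infinite

depth = [                 # invented 'soil depth' observations on a grid
    [4.0, 12.0, 30.0],
    [8.0, 22.0, 41.0],
    [15.0, 27.0, 55.0],
]
for row in depth:
    print(''.join(classify(v) for v in row))
```

The loss of information the text describes is visible here: once 12.0 and 22.0 both print as '+', their difference is gone, exactly as continuous variation disappears into a mapping unit.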
Certain attributes, such as the value of elevation above sea level, the soil pH over an experimental field, the variation of the incidence of a given disease in a city, or the variation of air pressure shown on a meteorological chart are more usefully represented by a continuous quantitative surface that may be modelled mathematically. The variations are conventionally shown by isolines or contours—that is lines connecting points of equal value—or by sets of monotonically increasing grey or colour scales (Figures 1.1b,d).

The first maps of the spatial distribution of rocks or soil, of plant communities or people, were qualitative. As in many new sciences, the first aim of many surveys was inventory—observing, classifying, and recording what is there. Qualitative methods of classification and mapping were unavoidable given the huge quantities of complex data that most environmental surveys generate. Quantitative description was not only hindered by data volume, but also by the lack of quantitative observation. Further, there was a lack of appropriate mathematical tools for describing spatial variation quantitatively. The first developments in appropriate mathematics for spatial problems came in the 1930s and 1940s in parallel with developments in statistical methods and time series analysis. Effective practical progress was completely blocked, however, by the lack of suitable computing tools. It is only since the 1960s with the availability of the digital computer that both the conceptual methods for spatial analysis and the actual potential for quantitative thematic mapping and spatial analysis have been able to blossom (e.g. Cliff and Ord 1981, Cressie 1991, Davis 1986, Journel and Huijbregts 1978, Isaaks and Srivastava 1989, Norbeck and Rystedt 1972). Today, thanks to the development of Geographical Information Systems (GIS) the application of logical and numerical modelling and statistical methods to spatial data is almost routine, and of great relevance, as this book explains.
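A continuous quantitative surface, an isoline, and a monotonic shading scale can all be sketched in a few lines. The toy below models the surface mathematically, marks the cells lying within a tolerance of one contour level (approximating a line connecting points of equal value), and shades the remaining cells above or below that level. The surface function, grid size, contour level, and tolerance are all invented for illustration, not taken from the text.

```python
# A continuous surface modelled mathematically, with one isoline picked out:
# cells whose value lies within a small tolerance of the contour level are
# marked '*'; cells above the level print '#', cells below print '.'.
# The surface function, grid size, level, and tolerance are invented here.
import math

def surface(x, y):
    """A smooth synthetic 'elevation' surface on the unit square."""
    return math.sin(math.pi * x) * math.sin(math.pi * y)

N, LEVEL, TOL = 12, 0.5, 0.08
for j in range(N):
    row = ''
    for i in range(N):
        z = surface(i / (N - 1), j / (N - 1))
        row += '*' if abs(z - LEVEL) < TOL else ('#' if z > LEVEL else '.')
    print(row)
```

Run as is, this prints a ring of '*' characters around the central peak: a crude contour line extracted from a mathematically modelled surface, in the sense described above.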
In the late twentieth century, demands for data on the topography and specific themes of the earth's surface, such as natural resources, have accelerated greatly. Stereo aerial photography and remotely sensed imagery have allowed photogrammetrists to map large areas with great accuracy. The same technology also gave the earth resource scientists—the geologist, the soil scientist, the ecologist, the land use specialist—enormous advantages for reconnaissance and semi-detailed mapping (e.g. American Society of Photogrammetry 1960). The resulting thematic maps were a source of useful information for resource exploitation and management. The study of land evaluation arose through the need to match the land requirements for producing food and supporting populations to the available resources of climate, soil, water, and technology (e.g. Brinkman and Smyth 1973, Beek 1978, FAO 1976, Rossiter 1996).

The need for spatial data and spatial analyses is not just the preserve of earth scientists. Urban planners and cadastral agencies need detailed information about the distribution of land and resources in towns and cities. Civil engineers need to plan the routes of roads and canals and to estimate construction costs, including those of cutting away hillsides and filling in valleys. Police departments need to know the spatial distribution of various kinds of crime, medical organizations and epidemiologists are interested in the distribution of sickness and disease, and commerce is interested in improving profitability through the optimization of the distribution of sales outlets and the identification of potential markets. The enormous infrastructure of what are collectively known as 'utilities'—that is water, gas, electricity, telephone lines, sewerage systems—all need to be recorded and manipulated as spatial data linked to maps. Today, there are many ways in which the political and economic processes responsible for our well-being need to be assisted such that democratic organization of the planet's resources can be used to the advantage of all. The needs are first to know what resources we have at our disposal—how much fertile land, how much energy, how many people? The second is to understand how these are distributed over the earth, and to know where they are, their extent, who owns them or has other rights to them, and how we can best use and manage them.
tion seen through the particular filter of a given surveyor in a given discipline at a certain moment in time. More recently, the aerial photograph, but more especially the satellite image, have made it possible to see how landscapes change over time, to follow the relentless march of desertification or erosion or the swifter progress of forest fires, floods, locust swarms, or weather systems. But the products of the airborne and space sensors are not maps, in the original meaning of the word, but photographic images or streams of data on magnetic tapes. The digital data are not in the familiar form of points, lines, and areas representing the already recognized and classified features of the earth's surface, but are coded in picture elements—pixels—cells in a two-dimensional matrix that contain merely a number indicating the strength of reflected electromagnetic radiation in a given band. New tools were needed to turn these streams of numbers into pictures and to identify meaningful patterns. Moreover, the images must be located properly with respect to a geodetic grid, otherwise the data they carry cannot be related to a definite place. The need arose for a marriage between remote sensing, earth-bound survey, and cartography; this has been made possible by the class of spatial information handling and mapping tools known as geographical information systems.

Alternatives for handling complex data: generalization or analysis

During the 1960s and 1970s there were new trends in the ways in which data about natural resources of soil and landscape systems were being used for resource assessment, land evaluation, and planning. Realizing that different aspects of the earth's surface do not function independently from each other, and at the time having insufficient means to deal with huge amounts of disparate data, people attempted to evaluate them in an integrated, multidisciplinary way. The 'gestalt' method (Hills 1961, Hopkins 1977, Vink 1981) classifies the land surface into what are presumed to be 'naturally occurring' environmental units or building blocks that can be recognized, described, and mapped in terms of the total interaction of the attributes under study. Within these 'natural units' there is supposed to be a recognizable, unique, and interdependent combination of the environmental characteristics of landform, geology, soil, vegetation, and water. The same basic idea was used in integrated resource surveys—classic examples come from the Australia integrated resource or land systems surveys (Christian and Stewart 1968) or from the former UK Land Resources Division (e.g. Brunt 1967) and work at the ITC in Enschede in the Netherlands. This method addresses the problem of excessive amounts of data by reducing all spatial variation to a limited number of supposedly homogeneous classes that can be drawn on a choropleth map.

The integrated survey approach has several inherent problems and it has largely fallen into disuse. In many cases the information on the supposedly homogeneous units was too general for many applications and it was very difficult or impossible to retrieve specific information about particular aspects of a landscape. The division of the landscape into spatial units also depends greatly on personal insight, and the level of resolution of the survey had more to do with the scale of the topographical base map used than with the level of survey needed to map specific attributes. So a ready market remains for the more conventional monodisciplinary surveys, such as those of geology, landform, soil, vegetation, and land use. Increasing pressure on land has inflated the demand for location-specific information so surveys have tended towards collecting more and more data. The concomitant increase in data volumes and new insights for analysing spatial patterns and processes has in its turn stepped up the search for simple, reproducible methods for combining data from several different sources and levels of resolution.

Early on, planners and landscape architects, particularly in the United States of America, realized that data from several monodisciplinary resource surveys could be combined and integrated simply by overlaying transparent copies of the resource maps on a light table, and looking for the places where the boundaries on the several maps coincided. One of the best-known exponents of this simple technique was the American landscape architect Ian McHarg (McHarg 1969). In 1963, another American architect and city planner, Howard T. Fisher, elaborated Edgar M. Horwood's idea of using the computer to make simple maps by printing statistical values on a grid of plain paper (Sheehan 1979). Fisher's program SYMAP, short for SYnagraphic MAPping system (the name has its origin in the Greek word synagein, meaning to bring together), includes a set of modules for analysing data, and manipulating them to produce choropleth or isoline interpolations, with the results to be displayed in many ways using the overprinting of lineprinter characters to produce suitable grey scales. Fisher became director of the Harvard Graduate School of Design's Laboratory for Computer
Graphics, and SYMAP was first in a line of mapping programs that were produced by an enthusiastic, internationally well-known, and able staff. Among these programs were the grid cell (or raster) mapping programs GRID and IMGRID that allowed the user to do in the computer what McHarg had done with transparent overlays. Naturally, the Harvard group was not alone in this new field and many other workers developed programs with similar capabilities (e.g. Duffield and Coppock 1975, Steiner and Matt 1972, Fabos and Caswell 1977 to name but a few). Initially none of these overlay programs allowed the user to do anything that McHarg could not do; they merely speeded up the process and made it reproducible. However, users soon began to realize that with little extra programming effort, they could do other kinds of spatial and logical analysis on mapped data that were helpful for planning studies (e.g. Steinitz and Brown 1981) or ecological analysis (e.g. Luder 1980); previously these computations had been extremely difficult to do by hand. We note in passing that although the terminology and the sources of the data are quite different, many of the image analysis methods used in these raster map analysis programs are little different from those first adopted for processing remotely sensed data.

Because SYMAP, GRID, IMGRID, GEOMAP, MAP, and many of these other relatively simple programs were designed for quick and cheap analysis of gridded data and their results could only be displayed by using crude lineprinter graphics, many cartographers refused to accept the results they produced as maps. Cartographers had begun to adopt computer techniques in the 1960s, but until recently they were largely limited to aids for the automated drafting and preparation of masters for printed maps. For traditional cartography the new computer technology did not change fundamental attitudes to map-making—the high-quality paper map remained both the principal data store and the end-product. However, by 1977 the experience of using computers in map making had advanced so far that Rhind (1977) was able to present many cogent reasons for using computers in cartography (Box 1.1).

Investments in computer cartography

By the late 1970s there had been considerable investments in the development and application of computer-assisted cartography, particularly in North America by government and private agencies (e.g. Tomlinson et al. 1976, Teicholz and Berry 1983). In Europe and Australasia the developments proceeded on a smaller scale than in North America but during the 1970s and early 1980s major strides in using and developing computer-assisted cartography were made by several nations, notably Sweden, Norway, Denmark, France, the Netherlands, the United Kingdom, West Germany, and Australia. Literally hundreds of computer programs and systems were developed for
various mapping applications, mostly in government institutes and universities.

The introduction of computer-assisted cartography did not immediately lead to a direct saving in costs as had been hoped. The acquisition and development of the new tools was often very expensive; computer hardware was extremely expensive; there was a shortage of trained staff and many organizations were reluctant or unable to introduce new working practices. Initially, the computer-assisted mapping market was seen by many manufacturers of computer-aided design and computer graphics systems as so diverse that the major investments needed to develop the software were unlikely to reap returns from a mass market. Consequently, many purchasers of expensive systems were forced to hire programming staff to adapt a particular system to their needs.

The history of using computers for mapping and spatial analysis shows that there have been parallel developments in automated data capture, data analysis, and presentation in several broadly related fields such as cadastral and topographical mapping, thematic cartography, civil engineering, geology, geography, hydrology, spatial statistics, soil science, surveying and photogrammetry, rural and urban planning, utility networks, and remote sensing and image analysis. Military applications have overlapped and even dominated several of these monodisciplinary fields. Consequently there has been much duplication of effort and a multiplication of discipline-specific jargon for different applications in different lands. This multiplicity of effort in several initially separate, but closely related fields has resulted in the emergence of the general purpose GIS.

During the 1990s there were several important technical and organizational developments that greatly assisted the wide application and appreciation of GIS. The first is awareness—many more people now know why it is important to be able to manipulate large amounts of spatial information efficiently, though many more still need to be convinced. Much knowledge has been built up on how to set up computer mapping and GIS projects efficiently (Huxhold and Levinsohn 1995). Second, by 1995 computer technology had provided huge amounts of processing power to provide the functionality for handling large amounts of data easily. Fifth, the basic functionality required for handling spatial data has been widely accepted to the point where a limited number of commercial systems dominate the marketplace, thereby bringing a major degree of uniformity to a previously heterogeneous domain. This last point is particularly true for the automation of existing techniques. However, in spite of the impressive technical developments, the impacts of GIS on the development of a fundamental body of theory of spatial interactions have been limited. As will be shown in Chapter 2 and elsewhere in this book, the basic spatial models used in modern GIS are little different from those used 15-20 years ago; new insights into spatial and temporal modelling still need to be made if GIS is to develop beyond a mere technology.

'strengthening a particular technique [i.e. introducing automation]—putting muscles on it—contributes nothing to its validity. The poverty of the technique, if it is indeed impotent to deal with its presumed subject-matter, is hidden behind a mountain of effort... the harder the subproblems were to solve, and the more technical success was gained in solving them, the more is the original technique fortified'. (Weizenbaum 1976)

The main result of more than twenty years of technical development is that GIS have become a worldwide phenomenon. In 1995 it was estimated that these technical solutions had been installed at more than 93 000 sites worldwide and while North America and Europe dominate this user base (65 per cent and 22 per cent respectively, Estes 1995) many other countries are now beginning to exploit GIS capabilities. Today (1997), these systems are used in many different fields (Box 1.2); since 1986 there have been some 200 books written on various aspects of the subject, there have been literally hundreds of conferences, and there are several important scientific and trade
and data storage capacity on modestly priced personal journals devoted entirely to the design, technology,
computers enabling GIS to be used by individuals and use, and management of GIS. There have been major
organizations with limited budgets. Third, many conw. pronouncements by international and national gov¬
puters are connected by electronic networks, allowing ernments about the importance of spatial information
expensive data and software to be shared. Fourthy stan¬ in modern society (e.g. Department of Environment
dardization in interfaces between database programs 1987) for planning, marketing, and the development
and other computer programs has made it much easier of the ‘information society’.
Geographical Information
It is not the aim of this book to describe the growth of awareness of the advantages of GIS among politicians, businessmen, academics, and resource managers, nor to explain how these tools can be efficiently incorporated in modern government or business. Others have performed this task (e.g. Huxhold and Levinsohn 1995) and the job is ongoing. As with the first edition (Burrough 1986), the aim of this book is to explain the scientific and technical aspects of working with geographical information, so that users of GIS understand the general principles, opportunities, and pitfalls of recording, collecting, storing, retrieving, analysing, and presenting spatial information. The aim includes not only the basic principles but also an understanding of the limitations of the technology and the role of errors in data collection, conceptualization, analysis, and presentation on the perception of the results. The old computer adage of 'garbage in, garbage out!' holds true in GIS use. The very complexity of these systems makes it possible to input good data but deliberately or unwittingly to produce garbage which, thanks to multi-colour high-resolution displays, looks like a high-quality product. At the same time, it is not difficult to conceal bad data through clever or high-quality presentation. The problem is not the technology, but one of the complexity of spatial information (Monmonnier 1993); with computers one can make bigger and better mistakes faster than ever.
This book provides a comprehensive, but of necessity limited, overview of the main aspects of handling spatial information in a GIS. In this chapter we present the history of development and the basic structure of GIS. In Chapter 2 we explore the problems of collecting and describing spatial data and the dilemma of dealing with discrete entities in space (the reductionist paradigm) or with the continuous variation of an attribute of space. How should we proceed when we are unsure which model to use? Which approaches are preferred by which disciplines, and how can the observational and conceptual models be combined? In Chapter 3 we explore ways in which the different paradigms used in Chapter 2 can be implemented technically in the computer, for storage, retrieval, and display.
The practical aspects of building a geographical database of entities in space are described in Chapter 4, which covers the essential aspects of digitizing, scanning, storing, updating, and plotting geographical data according to the entity paradigm, which is probably the most common, particularly in applications involving topographic and cadastral mapping. The creation of databases of attributes modelled by continuous fields is described in Chapters 5 and 6, which also deal with the input and rectification of remotely sensed images, the derivation of digital elevation models from aerial photographs and other data sources where cover is reasonably complete, and the interpolation of data from point observations of quantitative and qualitative attributes to continuous fields.
Data analysis and numerical modelling are explored in Chapters 7 and 8. Following the two paradigms of spatial information, Chapter 7 concentrates on the analysis of entity-based data—numerical and statistical operations on attributes, proximity relations, and network interactions. Chapter 8 explores the many ways in which new information can be derived from continuous fields. Both chapters present practical examples to illustrate the theory.
Many information systems present a false sense of security. The high-quality graphics presentation and exactly formulated query languages often insulate the user from real questions about the quality of the data, the levels of errors that occur, and the levels of confidence that should be associated with the results. Chapters 9 and 10 explore the ways in which errors can occur in the data, the effects they may have on the results of analyses, and the ways in which statistical treatment of spatial errors can be used to improve the information content rather than reduce it.
Up to this point all discussions will have been in terms of one spatial paradigm or the other, and the ways they may be combined (as for example, by mapping the variations of annual rainfall (modelled as a continuous field) over a continent (modelled as a set of country entities with crisp boundaries)). The choice of paradigm is made difficult because different scientists and practitioners often tend to describe the same area in different ways. Even if they stem from the same discipline they tend to draw boundaries around geographical 'objects' in different ways and these boundaries are usually convoluted and highly irregular (Legros et al. 1996). The problem is worse when data from surveys made at different levels of resolution have to be combined. Consequently geographical phenomena, at least as described by current generations of natural and social scientists, cannot be described uniquely.
New theoretical and practical research on fuzzy logic and continuous classification suggests that it is not always necessary to force spatial data into either one or the other paradigm—the real world is not made up of either crisp entities or continuous fields—but many phenomena have properties that can be dealt with by combining the best aspects of either. Previously it was thought impossible to deal with vague formulations of space in the computer, but Chapter 11 shows that this is far from being so. In fact, by refusing to be hidebound by the classical paradigms we find that we can often make more specific pronouncements about spatial relations than we can with conventional methods. This finding provides many new opportunities for useful research and applications that will take many more years to work out fully.
Definitions of GIS
The toolbox-based definition of a GIS is a powerful set of tools for collecting, storing, retrieving at will, transforming and displaying spatial data from the real world for a particular set of purposes. The geographical (or spatial) data represent phenomena from the real world in terms of (a) their position with respect to a known coordinate system, (b) their attributes that are unrelated to position (such as colour, cost, pH, incidence
Box 1.3 Definitions of GIS
(a) Toolbox-based definitions.
‘a powerful set of tools for collecting, storing, retrieving at will, transforming and
displaying spatial data from the real world’ (Burrough 1986).
‘a system for capturing, storing, checking, manipulating, analysing and displaying
data which are spatially referenced to the Earth' (Department of Environment 1987).
‘an information technology which stores, analyses, and displays both spatial and
non-spatial data’ (Parker 1988).
(b) Database definitions.
‘a database system in which most of the data are spatially indexed, and upon which
a set of procedures operated in order to answer queries about spatial entities in the
database’ (Smith et al. 1987).
‘any manual or computer based set of procedures used to store and manipulate geo¬
graphically referenced data’ (Aronoff 1989).
(c) Organization-based definitions.
‘an automated set of functions that provides professionals with advanced capabil¬
ities for the storage, retrieval, manipulation and display of geographically located
data’ (Ozemoy, Smith, and Sicherman 1981).
‘an institutional entity, reflecting an organisational structure that integrates tech¬
nology with a database, expertise and continuing financial support over time’ (Carter
1989).
‘a decision support system involving the integration of spatially referenced data in
a problem solving environment’ (Cowen 1988).
of disease, etc.) and (c) their spatial interrelations with each other, which describe how they are linked together (this is known as topology and describes space and spatial properties such as connectivity, which are unaffected by continuous distortions).
Others have provided alternative definitions of GIS (Box 1.3), focusing on either the spatial database or on the organizational aspects. The database definition emphasizes the differences in data organization needed to handle spatial data, with their location, attributes, and topology, and most other kinds of information that only have to do with entities and attributes. The organizational definition emphasizes the role of institutes and people in handling spatial information rather than the tools they need. In this sense GIS have been serving society for a very long time.
linked to the computer) via the computer screen and keyboard, aided by a 'mouse' or pointing device.

GIS SOFTWARE
The software for a geographical information system may be split into five functional groups (Figure 1.4):
(a) Data input and verification
(b) Data storage and database management
(c) Data output and presentation
(d) Data transformation
(e) Interaction with the user.
Data input (Figure 1.5) covers all aspects of capturing spatial data from existing maps, field observations, and sensors (including aerial photography, satellites, and recording instruments) and converting them to a standard digital form. Many tools are available, including the interactive computer screen and mouse, the digitizer, word processors and spreadsheet programs, scanners (in satellites or aeroplanes for direct recording of data or for converting maps and photographic images), and devices necessary for reading data already written on magnetic media such as tapes or CD-ROMs. Data input, and the verification of data needed
[Figure 1.5. Data collection and input: maps, aerial photographs, field observations, and sensors are captured via the computer keyboard, digitizers, scanners, and stereo plotters, or read from magnetic and optical media, and passed to data input.]
(l) Use numerical methods to derive new attributes from existing attributes or new entities from existing entities.
(m) Using the digital database as a model of the real world, simulate the effect of process P over time T for a given scenario S.
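Operation (l), deriving new attributes from existing attributes, is often carried out cell by cell on co-registered raster layers (an approach commonly called map algebra). The following sketch is not from the book; the grids and the derived quantity are invented for illustration.

```python
# Two co-registered attribute grids (same rows and columns).
# Values are invented: annual totals in mm per grid cell.
rainfall = [[800.0, 650.0], [700.0, 900.0]]
evap     = [[500.0, 500.0], [600.0, 450.0]]

# Derive a new attribute, cell by cell: the annual water surplus.
surplus = [[r - e for r, e in zip(rrow, erow)]
           for rrow, erow in zip(rainfall, evap)]
print(surplus)  # [[300.0, 150.0], [100.0, 450.0]]
```

The same pattern, one arithmetic or logical rule applied over every cell of one or more input layers, underlies most derivation of new attributes in raster GIS.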
the user with a high-level, compilable programming language in which the models can be efficiently written (cf. PCRaster—Wesseling et al. 1996, or MMS—Leavesley et al. 1996). In other situations the GIS is used to assemble the data for a complex model that is programmed outside the system using a standard computer language (Burrough 1996a). Once the model has been run, the results are transferred back to the GIS for display.
The interaction between user and GIS for data and query input and the writing of models for data analysis is an aspect that has been neglected until recently (Hearnshaw and Unwin 1994). The introduction of the personal computer, the mouse or other pointing device, and multi-windowing has made it much easier for many people to use computers, though typing skills are still essential for most tasks. Voice interaction, virtual reality, and multi-media with sound input and output are becoming available but still need to be developed fully for GIS. Whether the commands are given by typing, clicking on a window symbol with a mouse, or by voice recognition, the user still has to ensure proper formulation according to an agreed set of rules, otherwise the result will be spurious.
Questions
1. Explain why up-to-date spatial information is valuable for the following activities:
marketing, real estate, banking, tourism, electricity generation and supply, forestry,
agriculture.
2. Why should a national government provide spatial data on the World Wide Web
for free? Examine the aims and motives behind this approach.
TWO
Data Models and Axioms
Imagine that you are talking on the telephone to someone and they ask you to describe the view from your window. How would you depict the variations you see? It is likely that you would break down the landscape into units such as a building, road, field, valley, or hill and use geographical referencing in terms of 'beside', 'to the left of', or 'in front of' to describe the features. You have in fact developed a conceptual model of the
[Figure 2.1. All aspects of dealing with geographical information involve interactions with people: perception/observation, measurement, data models, data structures and file structures, data retrieval and analysis, and presentation.]
Box 2.1 Spatial data models and data structures
The creation of analogue and digital spatial data sets involves seven levels of model development and abstraction (cf. Peuquet 1984a, Rhind and Green 1988, Worboys 1995):
(a) A view of reality (conceptual model)
(b) Human conceptualization leading to an analogue abstraction (analogue model)
(c) A formalization of the analogue abstraction without any conventions or restrictions on implementation (spatial data model)
(d) A representation of the data model that reflects how the data are recorded in the computer (database model)
(e) A file structure, which is the particular representation of the data structure in the computer memory (physical computational model)
(f) Accepted axioms and rules for handling the data (data manipulation model)
(g) Accepted rules and procedures for displaying and presenting spatial data to people (graphical model)
landscape. Your interpretation of the features you have observed and the ones you have decided to ignore will be influenced by your experience, your cultural background, and that of the person to whom you are describing the scene.
When information needs to be exchanged over a larger domain it becomes necessary to formalize the models used to describe an area to ensure that data are interpreted without ambiguity and communicated effectively. This chapter will describe the main data models used for describing geographical phenomena (see Couclelis 1992; Frank et al. 1992; Frank and Campari 1993; Egenhofer and Herring 1995; and Burrough and Frank 1996 for more detailed discussion). It gives an essential background to the following chapters of this book, because we do not store real world phenomena in the computer but only representations based on these formalized models. The major steps involved in proceeding from human observation of the world, either directly or with the assistance of tools like aerial photographs, remotely sensed images, or statistically located samples, to an analogue or digital representation are outlined in Box 2.1 and illustrated in Figure 2.1. The most important first step is that people observe the world and perceive phenomena that are fixed or change in space and time. Their perception will influence all subsequent analysis; success or failure with GIS does not depend in the first instance on technology but more on the appropriateness or otherwise of the conceptual models of space and spatial interactions.
Geographical phenomena require two descriptors to represent the real world: what is present, and where it is. For the former, phenomenological concepts such as 'town', 'river', 'floodplain', 'ecotope', 'soil association' are used as fundamental building blocks for analysing and synthesizing complex information. These phenomena are recognized and described in terms of well-established 'objects' or 'entities', which are defined in standard texts (cf. Goudie et al. 1988, Johnston et al. 1988, Lapedes 1976, Lapidus 1987, Scott 1980, Stevens 1988, Whitten and Brooks 1972, Whittow 1984). However, these dictionaries fail to point out that there are many ways to describe these phenomena, and different terms can be used for different levels of resolution. Many of these perceived geographical phenomena described by people as explicit entities (such as 'hill', 'town', or 'lake') do not have an exact form and their extent may change with time (e.g. see Burrough and Frank 1996).
At the same time, the type of building block used to describe a phenomenon at one scale of resolution is likely to be quite different from that at another. For example, a road imaged from a satellite-based sensor might be modelled as a line, but the plan of a building site would have to be modelled using an areal representation to show its various structures. Phenomena are also very often grouped or divided into units at other levels of resolution ('scales') according to hierarchically defined taxonomies, for example the hierarchy of administration units of country-province-town-district, or of most soil, plant, or animal classification systems.
The referencing in space of the phenomena may be defined in terms of a geometrically exact or a relative location. The former uses local or world coordinate systems defined using a standard system of spheroids, projections, and coordinates which give an approximation of the form of the earth (a spheroid) onto a flat surface. The coordinate system may be purely local, measured in tens of metres, or it may be a national grid or an internationally accepted projection that uses geometrical coordinates of latitude and longitude. Alternatively some maps provide geographical referencing in a relative, rather than an absolute spatial geometry, as illustrated by aboriginal rock paintings and the plan of the London Underground. With these maps the locations are defined in reference to other features within the space, and neighbourhoodness and direction between entities are shown rather than actual metric distances.
[Figure 2.2. Examples of the different kinds of geographical data collected for different purposes by persons from different disciplines: planning zones, landform, soil maps, rivers and lakes, geodetic control points, groundwater levels (time series), boreholes (soil properties, groundwater quality), and subsoil surveys.]
or technical discipline of the observer. Disciplines that focus on the understanding of spatial processes in the natural environment may be more likely to use the continuous field approach, while those who work entirely in an administrative context will view an area as a series of distinct units (Figure 2.2).
Geographical data models are the formalized equivalents of the conceptual models used by people to perceive geographical phenomena (in this book we use the term 'data type' for the kind of number used to quantify the attributes—see below). They formalize how space is discretized into parts for analysis and communication and assume that phenomena can be uniquely identified, that attributes can be measured or specified, and that geographical coordinates can be registered. As data may be collected in a variety of ways, information on the method or the level of resolution of observation or measurement may also be an important part of the data model.
Most anthropogenic phenomena (houses, land parcels, administrative units, roads, cables, pipelines, agricultural fields in Western agriculture) can be handled best using the entity approach. The simplest and most frequently used data model of reality is a basic spatial entity which is further specified by attributes and geographical location. This can be further
Figure 2.3. The fundamental geographical primitives of points, lines, and polygons
subdivided according to one of the three basic geographical data primitives, namely a 'point', a 'line', or an 'area' (most usually known as a 'polygon' in GIS), which are shown in Figure 2.3. These are the fundamental units of the vector data model, and its various forms are summarized in Table 2.1 and illustrated in Figure 2.4a,c. Alternative means of representing entities using tessellations of regular-shaped polygons are to use sets of pixels (see below).
With continuous field data, although the variation of attributes such as elevation, air pressure, temperature, or clay content of the soil is assumed to be continuous in 2D or 3D space (and also in time), the variation is generally too complex to be captured by a simple mathematical function such as a polynomial equation. In some situations simple regression equations (trend surfaces) may be used to represent large-scale variations in terms of simple, differentiable numerical functions (see Chapter 5), but generally it is necessary to divide geographical space into discrete spatial units as given in Table 2.1 and shown in Figure 2.4b,d. The resulting tessellation is taken as a reasonable approximation of reality at the level of resolution under consideration, and it is assumed that operations such as differentiation which can be applied to continuous mathematical functions also apply to these discretized approximations.
Both the entity and tessellation models assume that the phenomena can be specified exactly in terms of both their attributes and spatial position. In practice there will be some situations where these data models are acceptable representations of reality, but there will be many others where uncertainties force us to choose pragmatically the one or the other approach (the effects of uncertainty and error in spatial analysis are dealt with in Chapters 9 and 10).

Vector data models of entities
The vector data model represents space as a series of discrete entity-defined point, line, or polygon units which are geographically referenced by Cartesian coordinates, as shown in Figure 2.3.

Simple points, lines, and polygons. Simple point, line, and polygon entities are essentially static representations of phenomena in terms of XY coordinates. They are supposed to be unchanging, and do not contain any information about temporal or spatial variability. A point entity implies that the geographical
[Figure 2.4. The encoding of exact objects (entities) and continuous fields in different data models. (a) top left: vector representation of crisp polygons; (b) top right: raster model of continuous fields; (c) bottom left: vector representation of linked lines (linked topology for modelling networks); (d) bottom right: Delaunay triangulation of a continuous field (irregular tessellation with triangular elements).]
23
Data Models and Axioms
extents of the object are limited to a location that can be specified by one set of XY coordinates at the level of resolution of the abstraction. A town could be represented by a point entity at a continental level of resolution but as a polygon entity at a regional level. Increasing the level of resolution reveals internal structure in the phenomenon (in the case of a town: sub-districts, suburbs, streets, houses, lamp-posts, traffic signs) which may be important for some people and not for others.
A line entity implies that the geographical extents of the object may be adequately represented by sets of XY coordinate pairs that define a connected path through space, but one that has no true width unless specified in terms of an attached attribute. A road at national level is adequately represented by a line; at street level it becomes an area of paving and the line representation is unrealistic. A telephone cable, on the other hand, can be represented as a line at most practical levels of resolution used in GIS.
The simplest definition of the polygon implies that it is a homogeneous representation of a 2D space. Clearly this also depends on the level of resolution. The polygon can be represented in terms of the XY coordinates of its boundary, or in terms of the set of XY coordinates that are enclosed by such a boundary. Polygons may contain holes, they have direct neighbours, and different polygons with the same characteristics can occur at different locations (Figure 2.3). If the boundary can be clearly identified it can be specified in terms of a linked list of a limited number of XY coordinates (see Burrough and Frank 1996); the size of the set of included XY coordinates depends on the level of spatial resolution that is used. Very often it is assumed that the level of resolution is given by the number of decimals to which the coordinates of the boundary are specified, which means that encoding using boundaries is more efficient than listing all coordinates that are inside the boundary envelope.

Complex points, lines, polygons, and objects with functionality. More complex definitions of points, lines, and polygons can be used to capture the internal structure of an entity; these definitions may be functional or descriptive. A town includes streets, houses, and parks, each of which has different functions and which may respond differently to queries or operations at the town level. Topological links (Figure 2.4a,c) can be used to indicate how lines are linked into polygons or linked networks, respectively. In recent years more highly structured ways of encapsulating entity data have been possible through the object-oriented approach. This is the technical name for database and programming tools that provide nested hierarchies and functional relations between related groups of entities that together form a single unit at a higher aggregation level. Object orientation is described more fully in the context of data structures in Chapter 3.

Tessellations of continuous fields
Continuous surfaces can be discretized into sets of single basic units, such as square, triangular, or hexagonal cells, or into irregular triangles or polygons (the Thiessen/Dirichlet/Voronoi procedure—see Chapter 5) which are tessellated to form geographical representations. The use of irregular triangles, long used in land surveying, is based on the principle of triangulation, whereby the continuous surface of the land is approximated by a mesh of triangles whose apices or nodes are given by measured 'spot heights' at carefully located trigonometric points. A major advantage of this approach is that the density of the mesh can be easily adjusted to the degree with which the surface varies—areas with little variation can be represented adequately by few triangles while areas with large variation require more. This supports a variable resolution in the data. The triangular surface can also easily accommodate variations in form as seen in 3D representations of landform, or other surfaces such as aeroplane wings or car bodies. These data models are essentially static representations of the hypsometric surface, which is supposed to be unchanging over time.
Triangular meshes are also used to represent continuous variation in the dynamic modelling of groundwater flows, surface water movement, wind fields, and so on. In this form they provide a structure called finite elements; the differences in attribute values between adjacent triangular cells that result from the flow of water or dissolved materials are modelled using differential calculus (Gee et al. 1990). The triangular irregular network or TIN is a data structure (see Chapter 3) used to model continuous surfaces in terms of a simplified polygonal vector schema (Figure 2.4d).
The most-used alternative to triangulation is the regular tessellation or regular grid. The 2D geometric surface is divided into square cells (known as pixels) whose size is determined by the resolution that is required to represent the variation of an attribute for a given purpose. The grid cell representation of space is
known as the Raster data model (Figure 2.4b). When the grid cells are used to represent the variation of a continuously varying attribute, each cell will have a different value of the attribute; the variations between cells will then be assumed to be mathematically continuous, so that differential calculus may be used to compute local averages, rates of change, and so on. Each grid cell may be thought of as a separate entity that differs from vector polygons only in terms of its regular form and implicit rather than explicit delineation (Tobler 1995).
Although the regular grid is most often used to represent static phenomena, it can easily be adapted to deal with dynamic change. Changes over time may be recorded in separate layers of grid cells, one for each time step, so that the change from the static to the dynamic data model requires only that the basic structure is repeated for each time step. Time, like space, is assumed to be discretized in this model. Regular tessellations may also be nested to provide more spatial detail through the use of linear and regional quadtrees and other nested structures (see Chapter 3).
Lateral changes over space may also be handled easily by the regular grid because of its approximation to a continuous, differentiable mathematical surface. The flow of materials through space may be computed using finite difference modelling because the constant geometry of the cells means that first and second order derivatives can be easily calculated by simple subtraction and addition.
Three-dimensional equivalents of pixels are termed voxels—these are the basic units of spatial variation in a 3D space. In 2D and 3D space some applications use rectangular or parallelepiped tessellations, but these can be seen as aberrant forms of the regular square discretization. All the operations possible in the 2D
This equivalence between the vector and raster models of space frequently causes confusion about the nature of the phenomena being represented. For example, continuous fields can also be represented by isolines or contours, which are sets of XY coordinates linking sites of equal attribute value. Contours are useful ways of representing attribute values on paper maps and computer screens but they are less efficient for handling continuous spatial variation in numerical models of spatial interactions. Contour envelopes may be treated as simple polygons or as closed lines in terms of the entity approach, but they are merely artefacts of representation, not outlines of real world 'objects'.
The overlap between the two approaches is greatest when we have to deal with phenomena such as soil mapping units, or land use or land cover units. The classic approach of conventional mapping is to define classes of soil, land use, land cover, etc. and then to identify areas of land (entities) that correspond to these classes. These areas can be represented by vector boundaries enclosing polygons or by sets of contiguous raster cells having the same value. Such a representation is known as a choropleth map, because it contains zones of equal value. It may also be known as a chorochromatic map because each zone is displayed using a single colour or shading.
An alternative approach to representing artificial entities such as land use or soil classes is to postulate that land use or soil are continuous variables, not entities, but the geographical surface is made up of zones where the attributes have the same value (the polygons) and zones where the attribute values change abruptly (boundaries). This approach stems from the need to extract entities and homogeneous zones from grid-based data such as remotely sensed images. Note that though the variation of these kinds of attributes
regular grids can be applied to data on a 3D grid. may be thought of aa_heing continuous in space (be¬
cause every cell has a value for soil or land use, includ¬
ing classes for ‘no soil’ or ‘waste land’) the surface is
Pixels and voxels as ‘entities’ not continuous in terms of the differential calculus.
The basic units of discretization in the regular tessel¬ This means that mathematical operations like the cal¬
lation of continuous space may also be used to pro¬ culation of slopes should not be applied to data that
vide a geometrical reference for the simple data units cannot be approximated by a continuous mathemat¬
of points, lines, and areas. A vector ‘point’ can be rep¬ ical function
resented by a single cell; a vector ‘line’ by a set of con¬ In both 2D and 3D we can think of pixels (voxels)
tiguous cells one cell wide having the same attribute or polygons as units that can be treated as a series of
value; a vector polygon by a set of contiguous cells open systems with regular or irregular form. The state
having the same attribute value. Vector representa¬ of each cell (the local system) is determined by the
tion is often preferred because regular grid cells may value of its attributes; attribute values can be changed
lose spatial details, though this is becoming less of by operations that refer only to the cell in question
a problem with increased power of computers and or that use information from other cells that in one
memories. way or another are part of the cell’s surroundings. The
25
Data Models and Axioms
a) Conceptual models
of the world
b) Data models
c) Representation
Complex continua can be represented by:
Objects consist of
C3 l & U 1 9 1 fit 1 1 UU J 6C
continuous,
(atomic objects/entities), non-dif ferent iable
discretized surfaces continuous smooth
their attributes, relations mathematical functions
(tesselation) mathematical functions
and rules (fractal fields,
stochastic surfaces)
Figure 2.5. Steps in the process from observation of real word phenomena to the creation
of standardized data models
difference between this approach for cells and the interactions is much more complex than with regular
approach for polygons is not great (Tobler 1995); cell structures.
only the variable geometry of the vector representa¬ Figure 2.5 summarizes the data modelling stages
tion of entities means that computing neighbourhood followed so far.
26
Data Models and Axioms
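The 'simple subtraction and addition' of finite differences on a regular grid, mentioned above, can be sketched as follows. This is an illustration only, with invented elevation values and cell size; it is not code from the book.

```python
# Illustrative sketch: first- and second-order differences on a regular
# grid, as used in finite difference modelling. Grid values are invented.

def first_derivative_x(grid, cell_size):
    """Central difference d/dx for the interior cells of each row."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(1, cols - 1):
            out[r][c] = (grid[r][c + 1] - grid[r][c - 1]) / (2.0 * cell_size)
    return out

def second_derivative_x(grid, cell_size):
    """Second difference d2/dx2: only subtractions, additions, and a scale."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(1, cols - 1):
            out[r][c] = (grid[r][c + 1] - 2.0 * grid[r][c] + grid[r][c - 1]) / cell_size ** 2
    return out

elevation = [[10.0, 12.0, 16.0],
             [11.0, 13.0, 17.0]]
slope = first_derivative_x(elevation, cell_size=2.0)   # slope[0][1] == 1.5
```

Because every cell has the same geometry, the same two functions apply unchanged to any layer, including the one-layer-per-time-step stacks described above.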
Figure 2.6. The different ways of graphically displaying data encapsulated by (a) left—vector entity models, and (b) right—raster models. [The figure shows networks, sampling records, surface data, labels/text, and symbols, each drawn in both the vector and the raster domain, with attributes attached to each representation.]

[...] models can be visualized in the vector or raster domain. The figure also summarizes and makes explicit the influence of cartographic semiology in the vector representation.
Data types
In everyday speech we distinguish qualitative or nominal attributes from quantitative data or numbers, and recognize that different kinds of operations suit different kinds of data. The same is true when describing geographical phenomena using a formalized data model, and the information may be written down (and stored in the computer) using various data types (Table 2.2).

The attributes of entities may be expressed by Boolean, nominal, ordinal, integer, or real data types. Real data types include decimals; integer and real are collectively known as scalar data. Geographical coordinates are sometimes expressed as integers but mostly as real data types, and topological linkages use integers. Differentiable continuous surfaces require real data types, though integers are sometimes used as an approximation (one cannot take the derivative of a nominal attribute). Non-differentiable continuous surfaces and their discretized forms (grid cells or pixels) can take the same range of data types as entities. Logical operations can be carried out with all data types, but arithmetical operations are limited to real and integer data types. Consequently the kind of data analysis is governed by the data types used in the data model. The accuracy of arithmetical operations is limited by the length of the computer word used to record the numbers (see Chapter 8).
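The restriction of arithmetic to scalar types can be made concrete with a small sketch. The rule function below is our own summary of the text, not an API from any GIS.

```python
# Illustrative sketch: restricting operations by attribute data type.
# The type names follow the text; the rule table is our own summary.

SCALAR_TYPES = {"integer", "real"}                 # arithmetic allowed
ALL_TYPES = SCALAR_TYPES | {"boolean", "nominal", "ordinal"}

def can_apply(operation, data_type):
    """Logical operations suit all types; arithmetic only scalar types."""
    if data_type not in ALL_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    if operation == "logical":
        return True
    if operation == "arithmetic":
        return data_type in SCALAR_TYPES
    raise ValueError(f"unknown operation: {operation}")

assert can_apply("logical", "nominal")        # e.g. soil class == 'podzol'
assert not can_apply("arithmetic", "nominal")  # cannot average class names
```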
Having explored the problems of defining data models as representations of phenomena in the real world and the ways they can be constructed from geographical primitives as simple or complex entities or discretized continuous surfaces, we can now specify the logical ground rules and axioms (sensu 'generally accepted propositions or principles sanctioned by experience': Collins English Dictionary, 3rd edn., 1994) that govern the way these data models may be treated. Though some of the following may seem self-evident, these statements (adapted from Robinove 1986) provide a formal basis for spatial data handling and look forward to the material presented in Chapters 7 and 8.

1. It is necessary to identify some kind of discretization, such as entities (individuals), that carry the data. In GIS the primitive entities are points, lines, polygons, and pixels (grid elements). Complex entities having a defined internal structure can be built from sets of points, lines, and polygons.

2. All fundamental entities are defined in terms of their geographical location (spatial coordinates or geometry), their attributes (properties), and relationships (topology). These relationships may be purely geometrical (with respect to spatial relations or neighbours), or hierarchical (with respect to attributes), or both.

3. Individuals (entities) are distinguishable from one another by their attributes, by their location, or by their internal or external relationships. In the simple, static view, individuals or 'objects' are usually assumed to be internally homogeneous unless they are a representation of a mathematical surface or unless they are complex objects built out of the primitives. In most cases GIS only distinguish objects that are internally homogeneous and that are delineated by crisp boundaries. A more complex GIS allows intelligence about inexactness in objects in which either the attributes, the relationships, or the location and delineation are subject to uncertainty or error.

4. Both entities and attributes can be classified into useful categories.

5. The propositional calculus (Boolean algebra) can be used to perform logical operations on an entity, its attributes, its relations, and the groups to which it belongs.

6. In GIS the propositional calculus is extended to take account of:

distance
direction
connectivity (topology)
adjacency
proximity
superposition
group membership
ownership of other entities

Intelligent GIS extend the propositional calculus to take account of non-exact data (see Chapter 11).

7. New entities (or sets of entities) can be created by geometrical union or intersection of existing entities (line intersection, polygon overlay)—see Chapter 6.

8. New complex entities or objects can be created from the basic point, line, area, or pixel entities.

9. New attributes can be derived from existing attributes by means of logical and/or mathematical procedures or models:

New attribute U = f(A, B, C, ...)

The mathematical operations include all kinds of arithmetic (addition, subtraction, multiplication, division, trigonometry, differentiation and integration, etc.) depending on data type—see Table 2.2. New attributes can also be derived from existing topological relations and from geometric properties (e.g. linked to, or size, shape, area, perimeter) or by interpolation.

10. Entities having certain defined sets of attributes may be kept in separate subdata sets called data planes or overlays.

11. Data at the same XYZt coordinate can be linked to all data planes (the principle of the common basis of location).

12. Data linked to any single XYZt coordinate may refer only to an individual at that coordinate, or to the whole of an individual in or on which that point is located.

13. New attribute values at any XYZt location can be derived from a function of the surroundings (e.g. computation of slope, aspect, connectivity).
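Axiom 9's U = f(A, B, C, ...) applied cell by cell is often called map algebra. A minimal sketch, with invented layer names and values (this is an illustration, not code from the book):

```python
# Illustrative sketch of axiom 9: a new attribute layer U = f(A, B, ...)
# computed cell by cell from existing layers. All values are invented.

def derive(f, *layers):
    """Apply f to corresponding cells of equally sized grid layers."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[f(*(layer[r][c] for layer in layers)) for c in range(cols)]
            for r in range(rows)]

rainfall = [[700, 800], [900, 650]]        # mm/year (made-up values)
runoff_frac = [[0.5, 0.25], [0.5, 0.25]]   # fraction of rain that runs off

runoff = derive(lambda rain, frac: rain * frac, rainfall, runoff_frac)
# runoff[0][0] == 350.0
```

The same `derive` helper works for logical functions too (e.g. a Boolean layer marking where rainfall exceeds a threshold), in line with axiom 5.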
Collectors and users of geographical data need to make decisions on the choice of data model every day. The following examples illustrate the importance of an understanding of these ideas.

Cadastre

The main aim of the cadastre or land registry is to provide a record of the division and ownership of land. The important issues are the location, area, and extent of the land in question and its attributes (such as the name and address of the owner), the address of the parcel in question, and information about transactions and legal matters. In this case the exact entity (vector) model works well, using nominal, integer, and real data types to record the attributes and real data types for the coordinates. In many countries land registry is highly organized, and parcel boundaries are surveyed with high accuracy to reduce the chance of disputes, so the assumption of a data model in which the parcel is bound by infinitely thin lines is a good approximation to what the land registry is trying to achieve. The coordinates of these boundaries are located accurately with respect to a national or local reference, and the attributes of the entity in the database are simply the properties associated with the parcel. An essential aspect of the polygon representation is that boundaries may be shared by adjacent parcels. This saves double representation in the database (see Chapter 3) and links the boundaries into a topologically sound polygon net which can handle both adjacency and inclusions.

Utility networks

Utility network is the generic term for the collections of pipes and wires that link the houses of consumers to the supplies of water, gas, electricity, telephone, and cable television of national or regional suppliers, and also to the waste water disposal systems of drains and sewers. In many countries these networks are hidden below ground, though in some, electricity, telephone, and television cables may be suspended from poles along the streets. Three aspects of these networks are important when recording these phenomena, namely (a) the attributes of a given net (what it carries, what kind of wire or pipe, additional information on materials used, age, name of contractor who installed it, and so on); (b) the location of the net (so that persons digging in the street will not damage it and so it can be found quickly when needing repair); and (c) information on how different parts of the net are connected together. Clearly all these requirements can be incorporated in a data model of topologically connected lines (entities) that are described by attributes (Figure 2.2). Data types may include all forms.

Land cover databases

National and international governments are interested in the division of the landscape according to classes of land cover—urban areas, arable crops, grassland, forest, waterbodies, coasts, mountains, etc. Creating a data model for such an application requires several steps. First, it is necessary to define exactly what is meant by the classes. Second, one needs to decide how to recognize them, and third, one must choose a survey methodology (such as a point sample survey or a remote sensing scanner in a satellite) to collect surrogate data which are then interpreted to produce the result desired.

The simplest data model assumes that the classes are crisp and mutually exclusive and that there is a direct relation between the class and its location on the ground. If this is acceptable then one can use the simple polygon primitives as a model for each occurrence of each class. The result is the well-known choropleth, or more correctly, chorochromatic map. The issues involved in building the database are then defining the classes, identifying and mapping the boundaries, and attaching the attributes to each class in a manner that is equivalent to the polygon net model for cadastral mapping. The data types used will range from nominal (for recording names of classes) to scalar (for computing and recording areas).

Consider now what might happen if there are disagreements about how to classify land cover. Different people or organizations might have different reasons for allocating land to different classes, even if the same number of classes is used and the central concepts of the classes are similar. Table 2.3 presents some examples of how different the results can be. Clearly, any analyses based on data from the different sources would give considerably different results, with serious implications for interpretation.

Table 2.3.

Land use          FAO-Agrostat   Pan-European    10-minutes      Land Use      Land Use
classification                   Questionnaire   Pan-European    Statistical   Vector
                                 by Eurostat     Land Use        Database      Database
                                                 Database

Permanent crops
  Germany             4.42            —               1.80           2.30          5.36
  France             12.11            —              12.07          12.18         31.45
  Netherlands         0.29            —               0.22           0.34          1.07
  UK                  0.51            —               0.52           0.59          6.54

Forest
  Germany           103.84         103.84           100.46          98.56           —
  France            147.84         148.10           140.675        145.81         79.63
  Netherlands         3.00           3.30             1.48           3.00          0.78
  UK                 23.64          24.00            18.96          14.29         10.03

Now consider the effects of the method of survey on the results. If we collect land cover data by sampling, i.e. by visiting a set of data points on the ground and recording what is there, we will have to interpolate from these 'ground truth' data to all sites where no observations have been made. The allocation of an unsampled site to a given class is then a function of the quality and density of the data and the power of the technique used for interpolation. Note that if we decide to interpolate to crisply delineated areas of land we still use the choropleth model based on the geographical primitive area/polygon. If we interpolate to a discretized surface such as a regular grid then the land cover map consists of sets of pixels with attributes indicating the land cover class to which they belong.

If we use remotely sensed data to identify land cover we automatically work with a gridded discretization of continuous space because that is how the satellite scanner works. The resolution of our spatial information is limited by the spatial resolution of the scanner, which for digital orthophotos may be very fine indeed (Plate 1). Unlike the case with sampling, we do have complete cover of the area (excluding problems with cloud cover on the image and so on), so the information present in each pixel is of equal quality, which is not the case with interpolation. The major problem with identifying land cover with remotely sensed data is to convert the measurements of reflected radiation for each pixel into a prediction that a given land cover class dominates that cell. Obviously the quality of the result depends on the quality of the classification process.

This example shows that two different, but complementary, data models can be used for land cover mapping. The representation of these models as vectors or rasters depends partly on how the data have been collected and partly on the way they will be used.

Soil maps

Most published soil maps use the entity data model based on the vector polygon as the geographical primitive. Polygons are defined in terms of their soil class, which by implication is homogeneous over the unit. Boundaries are represented as infinitely thin lines implying abrupt changes of soil over very short distances. Note that inclusion of polygons in polygons is an important aspect of soil and geological maps.

This data model is practical, because it means that simply by locating a site on a map and determining the mapping unit one can retrieve information on the soil properties by consulting the survey report. However, the paradigm is scientifically inadequate because it ignores spatial variation in both soil forming processes and in the resulting soils (see Plates 4.4, 4.6). The model is conceptually identical to that used in other polygon-based spatial information, such as parcel-based land use or land ownership data, and it has provided the role model for the development of soil information systems and GIS in the late 1970s and 1980s (Burrough 1991b). Data types used to record attribute data will include nominal, integer, and real.
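A single soil-map polygon's attribute record mixing these data types might look as follows. All field names and values are invented for illustration; this is not a schema from the book.

```python
# Illustrative sketch: one polygon entity of a soil map, with nominal,
# integer, and real attributes as described above. Values are invented.

polygon_attributes = {
    "soil_class": "podzol",   # nominal: only logical comparison makes sense
    "map_sheet": 32,          # integer
    "area_ha": 14.6,          # real
}

def is_same_class(a, b):
    """Nominal attributes support equality tests, not arithmetic."""
    return a["soil_class"] == b["soil_class"]

other = {"soil_class": "podzol", "map_sheet": 33, "area_ha": 9.1}
assert is_same_class(polygon_attributes, other)
```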
A critical aspect of delineating soil classes in geographical space concerns the interpretation of boundaries on the ground. Important soil differences may be indicated by abrupt, clearly observable physiographic features such as changes in lithology, drainage, or breaks of slope, which we can call 'primary boundaries'. However, the drawn soil boundaries may also merely reflect interpreted differences in soil classification in the data space. These we term 'secondary boundaries'.

As far as the user is concerned, printed (and digitized) soil maps do not distinguish between 'real' primary boundaries that have a physiographic basis, and 'interpreted' secondary boundaries. Consequently, as both types of boundary are represented as supposedly infinitely thin lines, the mapping procedures lead inevitably to a 'double crisp' conceptual model of soil variation, both with respect to the classification in attribute space and the geographical delineation of mapping units. According to this Boolean model any site can belong to only a single soil unit: both in attribute space and geographical space the membership of a site in soil class i is either 0 (not a member or outside the area) or 1 (is a member or is inside).

A major problem with soil, vegetation, and other similar natural phenomena is that they vary spatially at all scales from millimetres to whole continents. Although soil scientists have long recognized this, soil cartographers still use the choropleth model for mapping soil at different levels of resolution. This generates a serious logical fallacy because at one level it assumes within-unit homogeneity, while at another spatially coherent differences in soil have been recognized. In addition, whereas with land cover we might expect the boundaries of land cover classes to be reasonably sharp and discrete because their location is often dictated by differences in human use of landscape, real soil boundaries can be sharp, gradual, or diffuse.

An alternative to the discrete polygon data model for soil is to assume that soil properties vary gradually over the landscape. The soil is sampled at a series of locations and attributes are determined for these samples. The simplest data model is then the geographical point (to represent the locations) with the values of the associated attributes. From this simple data model new data models of continuous spatial variation may be created by interpolation.

The data models for describing soil as a continuous variable are, in principle, very similar to those used for the hypsometric surface of land elevation. Sets of discrete contour lines can be used to link zones of equal attribute values, or attribute values may be interpolated to cells or locations on a regular grid, which leads to the raster model of space. As with soil and many other attributes of the physical, chemical, and biological landscape, these cannot be seen directly but must be collected at sets of sample points according to some approved sampling scheme. Both the area (or volume) of the samples (known technically as the support) and their density in space relative to the spatial variation of the attribute concerned are important for the quality of the resulting interpolations. The details of interpolation and the differences in results obtained using different methods are explained in Chapters 5 and 6.

Hydrology

Hydrological applications require the modelling of the transport of water and materials over space and time, which can require changes to be signalled in attributes, and in location and form of critical patterns (e.g. water bodies). Not only may water levels in rivers, reservoirs, and lakes change, but the geometry and location of water bodies can vary as well. A change in water level in a lake causes the location of the boundary between water and land to change. A flood may result in the opening up of new channels and the abandonment of old ones, thereby changing both topology and location.

The simple entity vector data model of points, lines, and areas is not very well suited to dealing with hydrological phenomena because changes in geometry mean changing the coordinate and topological data in polygon networks, which can involve considerable computation. Better is to use a data model based on ideas of 'object orientation' in which primitive entities are linked together in functional groups (McDonnell 1996). The internal structure of the data model permits action on one component of the group to be passed automatically to other parts; consequently the data model contains not only geographical location, geometry, topology, and attributes but also information on how all these react to change.

Transport of material can also be easily captured by data models of continuous variation. The use of the variable resolution triangular or square finite element net is common in hydrological models but not in commercial GIS (e.g. McDonald and Harbaugh 1988). Transport of material over a surface can also be dealt with using the raster data model of continuous variation to which the surface topology has been added or derived by computation (see Chapter 8).
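One common way surface topology can be 'derived by computation' from a raster, as mentioned above, is the steepest-descent idea: each cell drains to its lowest neighbour. A hedged sketch with invented elevation values (real hydrological packages use more elaborate algorithms, e.g. for pits and flat areas):

```python
# Illustrative sketch: deriving drainage topology from a raster elevation
# grid by steepest descent to one of the 8 neighbours. Values are invented.

def steepest_descent(dem, r, c):
    """Return (row, col) of the lowest of the 8 neighbours that is lower
    than the centre cell, or None if the cell is a pit."""
    best, best_z = None, dem[r][c]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(dem) and 0 <= cc < len(dem[0]) and dem[rr][cc] < best_z:
                best, best_z = (rr, cc), dem[rr][cc]
    return best

dem = [[9.0, 8.0, 7.0],
       [8.0, 6.0, 4.0],
       [7.0, 5.0, 2.0]]
assert steepest_descent(dem, 1, 1) == (2, 2)   # flows to the lowest corner
```

Applying this to every cell yields a flow-direction layer, i.e. a topology computed from, rather than stored alongside, the attribute values.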
Figure 2.5 summarizes the steps that need to be taken when going from a perception of reality to a set of data models that can be used in a computerized information system. In some applications the decision to opt for an entity-based approach or a field-based approach may be clear-cut. In others it may be a matter of opinion depending on the aims of the user.

Interconversion between an entity-based vector or raster representation and a continuous representation is technically possible if the original phenomena have been clearly identified. Raster-vector conversion is covered in Chapter 4. No amount of technology, however, can make up for differences in interpretation that are made before the phenomena are recorded. If scientist X perceives the landscape as being made up of sets of crisp entities represented by polygons, his view of the world is functionally different from that of scientist Y, who prefers to think in terms of continuous variation. Both approaches may be distortions of a complex reality which cannot be described completely by either model.

Figure 2.2 illustrates how the preferred choice of entities or continuous fields may vary between applications and within disciplines. Generally speaking, those disciplines concerned with the inventory and recording of static aspects of the landscape opt for the entity approach; disciplines dealing with studies of pattern and process requiring dynamic data models opt for continuous, differentiable fields.

In most GIS all locational data and attribute values are deemed to be exact. Everything is supposed to be known; there is no room for uncertainty. Very often this comes not because we are unable to cope with statistical uncertainty, but rather because it costs too much to collect or process the data needed to give us the information about the error bands that should be associated with each attribute for each data unit. But in principle there is no reason why information on quality cannot be added to the data.

In essence, the intellectual level of the simple crisp entity models of spatial phenomena is little different from that of children's plastic building blocks. These toys obey the basic axioms of information systems, including that which says that it is possible to create a wide variety (possibly an infinite variety?) of derived objects by combining various blocks in different ways. Logically this is no different from combining sets of points, lines, and areas from a GIS to make a new map. Given enough bricks one can build houses, recreate landscapes, or even construct life-sized models of animals like giraffes and elephants.

And this is the point. The giraffe built out of plastic blocks can be the size, the colour, and the shape of a real giraffe, but the model does not, and cannot, have the functions of a giraffe. It cannot walk, eat, sleep, procreate, or breathe because the basic units of which it is built (the blocks or database elements) are not capable of supporting these functions. While this is a trivial example, the same point can be made for many database units that are used to supply geographical data to drive analytical or process oriented models. No amount of data processing can provide true functionality unless the basic units of the data models have been properly selected.

Nine factors to consider when embarking on spatial analysis with a GIS

The following nine, not necessarily independent, questions concerning spatial data are of fundamental importance when choosing data models and database approaches for any given application:

1. Is the real world situation/phenomenon under study simple or complex?
2. Are the kinds of entities used to describe the situation/phenomenon detailed or generalized?
3. Is the data type used to record attributes Boolean, nominal, ordinal, integer, real, or topological?
4. Do the entities in the database represent objects that can be described exactly, or are these objects complex and possibly somewhat vague? Are their properties exact, deterministic, or stochastic?
5. Do the database entities represent discrete physical things or continuous fields?
6. Are the attributes of database entities obtained by complete enumeration or by sampling?
7. Will the database be used for descriptive, administrative, or analytical purposes?
8. Will the users require logical, empirical, or process-based models to derive new information from the database and hence make inferences about the real world?
9. Is the process under consideration static or dynamic?
Questions
1. Develop simple data models for use in the following applications:
A road transport information system
The location of fast food restaurants
The incidence of landslides in mountainous terrain
The dispersion of pollutants in groundwater
An emergency unit (police, fire, ambulance)
A tourist information system
The monitoring of vegetation change in upland areas
The monitoring of movement of airborne pollutants, such as the 137Cs deposited
by rain from the Chernobyl accident in 1986.
Consider the sources of data, the kinds of phenomena being represented, the data
models, the data types, and the main requirements of the users.
THREE
Geographical Data
in the Computer
To take advantage of the possibilities a computerized spatial modelling tool may provide, it is essential to understand how the data models used to represent geographical phenomena are coded in these devices. This chapter describes the ways in which spatial data may be efficiently coded into a computer to support the operations of a GIS. Computers code data and carry out instructions using a series of switches which exist in one of two states, 'on' or 'off'. These states are coded by the numbers 1 and 0 respectively, and the base 2 binary system is therefore the fundamental coding basis for all computing. Therefore geographical data need to be converted into discrete records in the computer using these switches to represent the location, presence or absence, type, etc. of the phenomenon. The main data structures used to date are termed vector and raster, and they divide space into either a series of points, lines, and areas, or into a regular tessellation. Recently the object-oriented data structure has been used, which employs the same basic units as the vector and raster systems but structures the entities into linked or hierarchical forms.

When entered into the computer the data need to be organized to allow efficient accessing, retrieval, and manipulation. Systems for organizing data in the computer range from simple lists or indexed files, through to highly structured databases based on hierarchical, network, relational, and object-oriented schemata. These structures determine how the data are organized into data records and so control the way they are held in the computer. The various combinations impose limitations on the data representation and handling which affect the analysis and modelling of the information. Different ways of coding and storing the various data structures in a computer are discussed. In recent years research has highlighted the possibilities of using object-oriented database structures in GIS which support a more realistic structuring for entity data.
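The on/off switches described above amount to base-2 digits, so any discrete record ultimately reduces to a bit pattern. A tiny illustration (the coordinate value is arbitrary):

```python
# Illustrative sketch: an integer grid coordinate held as 16 binary
# 'switches', 0 = off and 1 = on. The value 137 is an arbitrary example.

x_coordinate = 137
bits = format(x_coordinate, "016b")      # sixteen switches as a string
assert bits == "0000000010001001"        # 128 + 8 + 1 == 137
assert int(bits, 2) == x_coordinate      # the pattern decodes back
```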
A computer version of a map may be no more than a digital image of a paper document stored on an electronic storage device such as a CD-ROM, or it may be an interactive display of information held in a complex GIS. The latter form of map may have been made using procedures that are different from conventional topographical or thematic maps and may use other kinds of symbolism to convey its message, so it is essential to understand how the end-product relates to the original observations of the phenomena in question.

The computer provides a much greater range of possibilities for recording, storing, retrieving, analysing, and displaying spatial data than a conventional map. Whereas the conventional paper map was a general purpose database created by a survey at a given point in time, the computer can select any view of the data for any given purpose. The computer allows us to add data, to retrieve data, and to compute new information from existing data, so a computerized spatial database need not remain static, but may be used to model changes in spatial phenomena over time. These new possibilities raise questions about the nature of space, about spatial data, and about how spatial phenomena should be described in terms of conceptual models.

If our aim is to automate the map-making process, to provide an electronic analogue of the conventional map (in which the graphic representation of spatial phenomena is coded by standard symbols and shading), then the conceptual models required for computerizing the data differ little from those used in conventional mapping. The main task is to devise ways of coding the formalized geographical data models so that the computer can handle them. The links between conceptualization, data storage, data retrieval, and visualization are similar to those used in conventional map-making, with the computer doing the hard work of retrieving and drawing the maps.

If our aim, however, is to extend our abilities to handle spatial data in such a way that our view of the world is not limited by the constraints of the printed map, if we wish to extract subsets of spatial data for specific purposes, or to link information on spatial phenomena to the physical processes governing their appearance and distribution, then the computer provides many opportunities beyond the automation of existing tasks. A computer provides the means to interact with data in ways that are impossible with printed information. Data can be changed, retrieved, recomputed, and displayed not just as a reflection of new information that has been collected from the real world, but also in response to numerical models of spatial and temporal processes. The GIS is then more than a simple automator of existing tasks; it provides both an archive of spatial data in digital form and a tool for exploring the interactions between process and pattern in spatial and temporal phenomena.

When geographical data are entered into a computer it would be very convenient if the GIS recognized the same conceptual models the user is accustomed to (discussed in Chapter 2). This would allow the interactions between the user and the computer to be both possible and intuitive. However, human perception of space is frequently not the most efficient way to structure a computer database and does not account for the physical requirements of storing and repeatedly using digital information. Computers for handling geographical data therefore need to be programmed to represent the phenomenological structures in an appropriate manner, as well as allowing the information to be written and stored on magnetic devices in an addressable way. Unlike many other kinds of data handled routinely by modern information systems, geographical data are complicated because they refer to the position, attributes, and the internal and external topological connections of the phenomena recorded. The topological and spatial aspects of geographical data distinguish them from the administrative records handled by data processing systems used for banking, libraries, airline bookings, or medical records. In addition, GIS must display the data using graphical components and form, such as colour, symbols, shading, and line type and width, that are interpretable by people. Therefore, as a result of the nature, volume, and complexity of geographical data, a number of logical computer schemata have been developed to pack data onto devices for efficient storage, updating, and retrieval. These schemata are a continuance of the model/structure development processes discussed in Chapter 2 and are stages (d) and (e) of Box 2.1 (reproduced below for your convenience).

Once data have been stored in the database, they can be retrieved and transformed, again according to formalized logical and mathematical rules. New data may be created from old; data can be used over and over again without deteriorating in quality, though their intrinsic quality may affect the quality of the results (Chapters 6, 7, 8, 9, and 10 cover these aspects of geoinformation processing). First, it is important to consider how computers store data.
People represent numbers and do arithmetic using the decimal system with the base 10. Each position in the written form of a number indicates how many units of the power of 10 correspond to that position. So the first column to the left of the decimal represents multiples of unity (10⁰), the second multiples of 10 (10¹), the third multiples of 100 (10²), and so on. Computers use a different counting system. They store data in the form of arrays of switches that have only two states; they are ‘on’ or they are ‘off’, and these states are coded by the numbers 1 and 0 respectively. Therefore computers code data and do computations using binary arithmetic, which is base 2. Box 3.1 demonstrates the binary system and arithmetic. In the following discussion the subscript ₁₀ is used to indicate a number in the decimal system and the subscript ₂ a number in the binary system.

Binary numbers and computer representation of numbers and text

As with ordinary numbers, the data in the binary system are represented using a series of ‘columns’, and in a computer each column is used to represent a switch. The switches, which are really small magnetized areas on a computer disk or memory, are usually grouped in packets of eight, known as a byte. Several bytes can be linked together to make a computer word. Words are grouped together in a data record and records are grouped together in a computer file. Sets of computer files can be grouped together hierarchically in directories or subdirectories.

With a byte of eight bits we can represent 256 numbers from 0 to (2⁸ − 1), i.e. from 0 to 255. For example, the sequence of eight switches 11111111₂ means:

2⁷*1 + 2⁶*1 + 2⁵*1 + 2⁴*1 + 2³*1 + 2²*1 + 2¹*1 + 2⁰*1 = 255₁₀

Remember that any positive number raised to a zero power equals 1.

If we combine 2 bytes in a 16-bit word it is possible to code numbers from 0 to 65 535. However, it is also useful to be able to code positive and negative numbers, so only the first fifteen bits (counting from the right) are used for coding the number and the sixteenth bit (2¹⁵) is used to determine the sign. This means that with sixteen bits we can code numbers from −32 767 to +32 767. A number lacking a fractional component is called an integer, so numbers in the range −32 767 to +32 767 are called 16-bit integers. Frequently it is necessary to code numbers that are larger or smaller than −32 767 to +32 767, or numbers that have fractional components, so computer
BOX 3.1.

Multiplying by 2 is quickly achieved by shifting the columns one place to the left:

2 * 0101 is 1010

Similarly, dividing by 2 shifts the columns one place to the right:

1110 / 2 = 0111

Of course, if the rightmost bit is a 1, division results in it being removed from the word, so that simply multiplying by 2 (the left shift) again does not give the number one started with unless other checks are added.
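As a quick sketch, Python's shift operators reproduce the rules of Box 3.1 and the byte and word ranges quoted in the surrounding text:

```python
# Left and right shifts as multiplication and division by 2 (cf. Box 3.1).
x = 0b0101                      # 5 in decimal
assert x << 1 == 0b1010 == 10   # 2 * 0101 is 1010

y = 0b1110                      # 14 in decimal
assert y >> 1 == 0b0111 == 7    # 1110 / 2 = 0111

# If the rightmost bit is a 1, the right shift discards it, so a left
# shift afterwards does not restore the original number:
z = 0b0111                      # 7
assert (z >> 1) << 1 == 6       # the low bit has been lost

# The byte and 16-bit word ranges quoted in the text:
assert int("11111111", 2) == 2**8 - 1 == 255   # one byte
assert 2**15 - 1 == 32767                      # 15 magnitude bits + 1 sign bit
```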
words with more bits are needed. Real numbers with positive and negative values and decimals require 32-bit or even 64-bit words (so-called double precision) in which some of the bits are reserved for the decimal part (the part smaller than 1) and the rest are used for larger numbers. This method of coding numbers means that the accuracy with which any number may be coded is a function of the number of bits used. This means numbers are not coded exactly, which may lead to rounding errors when the user attempts to work with numbers that are larger than those allowed. Chapter 9 discusses the problems of rounding errors in GIS in more detail.

Data organization and coding using hexadecimals

So far we have considered the coding of decimal numbers into machine usable form, but computers are also used to store text characters. One of the most commonly used systems for coding alphanumeric data is known as the American Standard Code for Information Interchange (ASCII) and is based on the hexadecimal system.

This system works with units of 4 bits, which is equivalent to counting to the base 16 (2⁴). Hexadecimal numbers may be represented by single characters
(as in the series 0–9) by replacing the numbers 10, 11, 12, 13, 14, 15 by the symbols A, B, C, D, E, F respectively. The column/place system is used just as with binary or decimal numbers. For example:

the decimal number 17 is hexadecimal 11 and binary 10001;
the decimal number 182 is hexadecimal B6 and binary 10110110.

The hexadecimal system also provides a convenient basis for coding alphanumeric data such as the letters

The ASCII system is by no means the only way of coding text, and many other systems are not only possible, but have been devised for coding text characters, diagrams, images, sound, and other information. The organization of the bits is known as its format; clearly, unless the format is known, information cannot be extracted from the string of bits.

Data volumes
Table 3.1. Characteristics of basic spatial entities in vector and raster mode

Vector units
  Point: X,Y,(Z) records; topology implicit
  Node: X,Y,(Z) records; topology explicit
  Simple line: N[X,Y,(Z)] records; topology implicit
  Complex line (arc): N[X,Y,Z], M[records]; topology explicit
  Polyhedron (solid body): M[lines] records; topology explicit

Raster units
  Grid cell (pixel): row, column; single value; topology implicit
  Line: N[pixels]; single values or pointer to records; topology implicit
  Polygon: M[pixels]; single values or pointer to records; topology implicit
  Voxel: row, column, layer; single values or pointer to records; topology implicit
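As an illustrative sketch of Table 3.1 (the coordinates, labels, and attribute codes are invented, not from the book), the two families of units can be held as simple Python structures:

```python
# Vector units: explicit coordinates plus a label for each entity.
well = ("point", [(5.0, 3.0)], "oil well")
pipeline = ("line", [(5.0, 3.0), (9.0, 4.5)], "oil_pipeline")
refinery = ("polygon", [(9, 4), (11, 4), (11, 6), (9, 6)], "oil_refinery")

# Raster units: one value per cell; location is implicit in row/column.
grid = [[0, 0, 0, 0],
        [0, 7, 7, 0],    # attribute code 7 marks the refinery's pixels
        [0, 7, 7, 0],
        [0, 0, 0, 0]]

kind, coords, label = pipeline
assert kind == "line" and coords[0] == (5.0, 3.0)   # location stored explicitly
assert grid[1][1] == 7                              # location implied by position
```

The contrast in the table shows up directly: the vector records carry their coordinates, while the raster cell's position is never stored at all.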
Points, lines, and areas: vector data structures

A data structure that uses points, lines, or polygons to describe geographical phenomena is known as a vector data structure. Vector units are characterized by the fact that their geographical location may be independently and very precisely defined, as may be their topological relationships. They usually admit no internal variation (unless they are composed of simpler units) so that their attributes refer to the whole unit. A spatial phenomenon is modelled in terms of its geographic location and attributes. An oil well could be represented by a point unit consisting of a single XY coordinate pair and the label ‘oil well’; a section of oil pipeline could be represented by a line unit consisting of a starting XY coordinate and an end XY coordinate and the label ‘oil_pipeline’; an oil refinery could be represented by a polygon unit covering a set of XY coordinates plus the label ‘oil_refinery’. The labels could be the actual names as given here, or they could be numbers that cross-reference with a legend or a table of additional information, or they could be special symbols.

Simple spatial units are rarely sufficient for efficient data handling. It is very useful to be able to handle more complex phenomena as though they were single units in the information system. The simplest complex object is the ‘string’, ‘chain’, or ‘arc’ (GIS terminology is rich in synonyms and near-synonyms) that is used to represent a curved or non-straight line. When lines join, as at a river junction or road intersection, they must be linked by a special kind of point called a node, whose function is to carry information about the way the lines are linked together and how traffic could flow from one to the other. If lines cross without being joined by nodes the computer cannot distinguish between a bridge, an underpass, or a crossroads.

Nodes are also used to carry information about how the boundary arcs of polygons link up and provide information about left and right polygon neighbours and enclosed islands. The methods used to link arcs into polygon networks are described in more detail later in this chapter. Attributes of vector units are stored in computer files as records or tuples that may be linked to them by pointers.

Typical operations on the geometric data combine layers of vector information to derive answers to Boolean type questions or to quantify relations (geometric or topological) between various units within and across layers. Attribute-based operations allow searching for the occurrence of a particular set of values, or for new values to be computed from existing data.

Grid cell: raster data structures

Spatial phenomena may also be represented by sets of regular shaped tessellated units as described in Chapter 2. The simplest form is the square cell (pixel) and the tessellated regular grid is known as a raster data structure. The location of entities is defined with direct reference to the grid network, with each pixel associated with a square parcel of land on the earth’s surface. The resolution or scale of the raster data is then the relation between the pixel size and that of the cell on the ground. Whereas in vector data structures the topology between different units is explicitly recorded through database pointers (explained later in this chapter), in raster databases this is only implicitly coded through the attribute values in the pixels.

The variation of the phenomena may be represented in the grid array with a different real-valued number per pixel, with each attribute being described by a separate overlay. In more complex raster structures, cell values could be referenced by a pointer to a record carrying the value of many separate attributes. New values may be derived from existing ones (e.g. by classification) or by carrying out various kinds of spatial search or equivalent operation. Measured data are rarely available to generate these surfaces directly, so interpolation techniques are often used to convert sampled points into continuous surfaces (see Chapter 5).

Typical operations include Boolean retrieval and neighbourhood querying. In addition, mathematical modelling is relatively easy to undertake with the raster data structure by combining attribute data from various layers for the same or neighbouring raster cells.

Computer storage of vector and raster data

The storage requirements of these different data types are summarized in Table 3.2.

Table 3.2. Storage requirements of vector and raster units

Vector units
  geometric data: x,y,z coordinates are usually scalar or real data types using 32 or 64 bits
  attribute data: highly variable and may be binary, nominal, ordinal, integer or real, or directional, using 16, 32, or 64 bits
  topological data: usually small numbers, so 16 bits are often enough

Raster units
  grid locations and size: 32-bit real numbers
  attributes: 16-bit integers; 32- or 64-bit reals for mathematical modelling
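A rough back-of-the-envelope sketch (the grid size, coordinate counts, and arc counts are invented) shows how the word sizes of Table 3.2 translate into data volumes:

```python
# Storage estimate for one raster overlay: one 16-bit integer per pixel.
rows, cols = 2000, 2000
raster_bits = rows * cols * 16
assert raster_bits // 8 == 8_000_000          # 8 megabytes per overlay

# Storage estimate for a vector polygon layer: 64-bit x and y per
# coordinate plus a 16-bit topological pointer per arc.
n_coords, n_arcs = 500, 20
vector_bits = n_coords * 2 * 64 + n_arcs * 16
assert vector_bits // 8 == 8_040              # roughly 8 kilobytes
```

The three-orders-of-magnitude gap is why the choice between raster and vector codings matters for storage as well as for analysis.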
Before considering in detail the ways in which spatial entities may be stored efficiently in the computer, we must first consider the general issues of organizing data for optimal storage and access. Although it is not essential for users of GIS to understand in detail how the data may be ordered inside the computer, any more than the driver of a car needs to know about the workings of the internal combustion engine, a little knowledge of data modelling and data structuring methods will help them to understand better how the systems work, and what their limitations and advantages might be. The following section presents only a brief introduction. Interested readers should consult a standard work on information storage and retrieval such as Date (1995), Salton and McGill (1983), Ullman (1980), Whittington (1988), or Wirth (1976).
as the computer programs are known that control data input, output, storage, and retrieval from a digital database.

Simple lists

The simplest form of database is a simple list of all the items. As each new item is added to the database it is simply placed at the end of the file, which gets longer and longer. It is very easy to add data to such a system, but retrieving data is inefficient. For a list containing n items, it takes an average of (n + 1)/2 search operations to find the item you want. So, for an information system containing 10 000 descriptions of items on cards, given that it takes 1 second to read the card name or number, it would take an average of (10 000 + 1)/2 seconds, or about an hour and a half, to find the card you want.

Most people know that searching through unstructured lists trying to find ‘the needle in the haystack’ is very inefficient. It is thus an obvious step to order or structure the data and to provide a key to that structure in order to speed up data retrieval.

Ordered sequential files

Words in a dictionary or names in a telephone book are structured alphabetically. Addition of a new item means that extra room must be created to insert it, but the advantages are that stored items may be reached very much faster. Very often ordered sequential files are accessed by binary search procedures. Instead of beginning the search at the beginning of the list, the record in the middle is examined first. If the key value (e.g. the sequence of letters in a word) matches, then the middle record is the one being sought. If the values do not match, a simple test is made to see whether the required item occurs before or after the middle element. The appropriate half of the file is retained and the search repeated, until the item has been located.

Simple sequential and ordered sequential files require that data be retrieved according to a key attribute. In the case of a dictionary, the key attribute is the spelling. But in many applications, particularly in geographical information, the individual items (pixels, points, lines, or areas) will have not only a key attribute such as an identification number or a name, but will also carry information about associated attributes. Very often it is the information about the associated attributes that is required, not just the name. For example, we may have an ordered list of soil profiles that has been structured by soil series name, but we would like to retrieve information about soil depth, drainage, pH, texture, or erosion. Unless we adopt another database strategy, our search procedures revert to those of the simple sequential list.

With indexed files, access to the original data file can be speeded up in two ways. If the data items in the files themselves provide the main order of the file, then the files are known as direct files (see Table 3.3a). The location of items in the main file may also be specified according to topic, which is given in a second file, known as an inverted file (see Table 3.3b). Just as in a book, examination of the index can determine the items (pages) which satisfy the search request.

In the direct file, the record for each item contains sufficient information for the search to jump over unnecessary items. For example, consider a data file containing soil series names that have been ordered alphabetically. Each item contains not only the series name and other information but also a number indicating the storage location of series names beginning with that same letter. The search for a particular record is then made much simpler by constructing a simple index file that lists the correspondence between first letter of series name and storage location. The search proceeds by a sequential search of the index, followed by a sequential search of the appropriate data block. The average number of search steps is then (n₁ + 1)/2 + (n₂ + 1)/2, where n₁ is the number of
Table 3.3. (a) A direct file index: the first letter of each series name points to the storage location of the first record with that letter (n_a, n_b, ... are the numbers of records per letter), alongside the data file of records A1, A2, ..., B1, ...:

A -> 1
B -> n_a + 1
C -> n_a + n_b + 1
...

(b) Soil records with their associated attributes (S = series, pH, De = depth, Dr = drainage, T = texture, E = erosion).

(c) An inverted file: each attribute value lists the items that possess it:

Deep: 1, 4, 5
Shallow: 2, 3, 6
Good drainage: 1, 2, 4
Poor drainage: 3, 5, 6
Sandy: 1, 3
Clay: 2, 4, 5, 6
Eroded: 2, 4
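The inverted file of Table 3.3(c) behaves like a Python dictionary mapping attribute values to item numbers. The sketch below follows the table's item numbers; the set operations on it are an illustration, not part of the original text:

```python
# Inverted file: attribute value -> set of item numbers (cf. Table 3.3c).
inverted = {
    "deep":          {1, 4, 5},
    "shallow":       {2, 3, 6},
    "good drainage": {1, 2, 4},
    "poor drainage": {3, 5, 6},
    "sandy":         {1, 3},
    "clay":          {2, 4, 5, 6},
    "eroded":        {2, 4},
}

# Retrieval on associated attributes is a direct lookup, and compound
# queries become set operations rather than sequential searches:
assert inverted["shallow"] & inverted["poor drainage"] == {3, 6}
assert inverted["clay"] - inverted["eroded"] == {5, 6}

# Adding a new record means the index must be updated as well:
inverted["sandy"].add(7)
assert 7 in inverted["sandy"]
```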
steps in the index and n₂ is the number of items in the data block referenced by the index.

The use of the inverted file index requires first that it be constructed by performing an initial sequential search on the data for each topic. The results are then assembled in the inverted file or index, which provides the key for further data access (see Table 3.3c).

Indexed files permit rapid access to databases. Unfortunately they have inherent problems when used with files in which records are continually being added or deleted, such as often happens with interactive mapping systems. Addition or deletion of a record in a direct file means that both the file and its index must be modified. When new records are written to a file accessed by an inverted file index, the new record does not have to be placed at a special position; it may be simply added to the end of the file, but the index must be updated. File modification may be an expensive undertaking with large data files, particularly in an interactive environment.

A further disadvantage of indexed files is that very often data can only be accessed via the key contained in the index files; other information may only be retrievable using sequential search methods.
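By way of illustration (the soil series names are invented), the binary search procedure described for ordered sequential files can be sketched in Python; for n items it needs about log₂(n) steps rather than the (n + 1)/2 of a sequential scan:

```python
def binary_search(items, key):
    """Look up key in a sorted list, examining the middle record first
    and keeping the appropriate half of the file each time."""
    low, high = 0, len(items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == key:
            return mid, steps
        if items[mid] < key:
            low = mid + 1        # sought item lies after the middle element
        else:
            high = mid - 1       # sought item lies before the middle element
    return -1, steps             # not present

series = sorted(["clay", "gley", "loam", "peat", "podzol", "rendzina", "sand"])
index, steps = binary_search(series, "podzol")
assert series[index] == "podzol"
assert steps <= 3                # 7 items: at most 3 halvings are needed
```

Note that the procedure only works because the file is kept in key order, which is exactly the maintenance cost of ordered sequential files discussed above.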
As the above examples show, data on real world entities are written into computer files, which are essentially simple or organized lists. The basic unit of a file is the data record or tuple, which contains all the relevant information for each entity. Depending on the kinds of data collected, the data records may all be of the same length, or may be of variable length, and measures are taken to ensure that both types can be efficiently encoded (Box 3.2). Spatial databases, however, contain many files with data on related aspects of the same entities, or data on entities that because of their spatial proximity or connectivity have to be linked or grouped together. It is essential to organize the way these files are stored and linked in the computer to model real world phenomena and to ensure efficient storage and retrieval of data. A computer program that is designed to store and manage large amounts of data is called a database management system (DBMS).

Modern DBMS use many methods for efficiently storing and retrieving data (e.g. Date 1995, Worboys 1995) but all are based on three fundamental ways of organizing information that also reflect the logical models used to model real world structures: these are known as the hierarchical, network, and relational schemata, and they are all used in GIS. Recently a fourth structure has been used for CAD-CAM and GIS, which is called Object-Orientation (referred to by some as ‘O-O’ for short). O-O is a further development of the network model which enables the interconnectivity and ownership relations between spatial entities to be modelled effectively. Object orientation is now used in some commercial GIS, although research on developing the concept is still ongoing.

Database modelling is the task of designing a database that performs efficiently, contains correct information, has a logical structure, and is as easy as possible to maintain and extend (Worboys et al. 1990). In order to understand how this can be achieved in GIS it is first necessary to examine the fundamental principles behind the different organizational structures and to see how they can be used for spatial information that has been recorded using either the exact entity or the continuous field data models.

The hierarchical database structure

When the data have a parent/child or one-to-many relation, such as areas for various levels of administration, soil series within a soil family, or pixels within a polygon, hierarchical methods provide quick and convenient means of data access. Hierarchical systems of data organization are well known to the environmental sciences as they are the methods used for plant and animal taxonomies, soil classifications, and so on. They assume that each part of the hierarchy can be reached using a key (a set of discriminating criteria) that fully describes the data structure. The assumption is also made that there is a good correlation between the key attributes (discriminating criteria) and the associated attributes that the items may possess.

Figure 3.1a presents a simple example of a 2-polygon map and Figure 3.1b shows how it could be coded according to a hierarchical model. The main
BOX 3.2. DATA RECORDS

In all kinds of database structures, the data are written in the form of records. The simplest kind of record is a one-dimensional array of fixed length, divided into a number of equal partitions, as shown in (a). This record is ideal when all items have the same number of attributes, for example when a number of soil profiles have been sampled and analysed for a standard range of cations.
[Diagram: (a) fixed-length records R1 to R4 stored at the regular addresses X, X+J, X+2J, X+3J, X+4J; (b) variable-length records R1 to R3, each preceded by a header H1, H2, H3, at the addresses X+1, X+L1+2, X+L1+L2+3, ending at X+L1+L2+L3+3]
Fixed length records are inconvenient, however, when the attributes are of variable length, and when the set of attributes measured is not common to all items. For example, not all soil profiles have the same number of horizons, and not all polygon boundaries have the same number of coordinates. In these situations variable length records are used. Each record has a ‘header’, an extra attribute that contains information about the type of information in the sub-record and the amount of space it takes up, as shown in (b). A series of records make up a file.
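As a sketch of the two record layouts in Box 3.2 (the field contents and one-byte header format are invented for illustration), Python's struct module can pack fixed- and variable-length records into bytes:

```python
import struct

# (a) Fixed-length records: every record packs the same fields, so the
# i-th record always starts at byte offset i * record_size.
fixed_fmt = "<hhh"                    # three 16-bit integers per record
record_size = struct.calcsize(fixed_fmt)
file_a = b"".join(struct.pack(fixed_fmt, ph, depth, drainage)
                  for ph, depth, drainage in [(52, 120, 1), (61, 45, 2)])
assert record_size == 6
assert struct.unpack_from(fixed_fmt, file_a, 1 * record_size) == (61, 45, 2)

# (b) Variable-length records: a one-byte header gives the length of the
# record body, so the reader knows how far to jump to reach the next record.
def pack_variable(payloads):
    return b"".join(bytes([len(p)]) + p for p in payloads)

file_b = pack_variable([b"loam", b"podzolic soil"])
first_len = file_b[0]                 # header of the first record
assert file_b[1:1 + first_len] == b"loam"
rest = file_b[1 + first_len:]         # header of the second record follows
assert rest[1:1 + rest[0]] == b"podzolic soil"
```

The fixed layout allows direct jumps to any record; the variable layout saves space but must be read header by header.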
advantages of the model are its simplicity and ease of access via keys that define the hierarchy. They are easy to understand and to update and expand and can be useful for organizing data in mass storage systems (e.g. see Kleiner and Brassel 1986). Data access via the key attributes is easy and quick, but unfortunately is very difficult for associated attributes. Consequently, hierarchical systems are good for data retrieval if the structure of all possible queries is known beforehand. This is commonly the case with bibliographic, bank, or airline retrieval systems. For environmental data, however, the exploratory nature of many retrieval requests cannot be accommodated by the rigid hierarchy and the user rejects the system as impossibly inflexible. For example, Beyer (1984) reports that the Royal Botanical Gardens in Kew initially set up a hierarchical database for their herbarium collection of more than a million items based on the rules of plant taxonomy. This seemed sensible until the director wished to make a trip to Mexico to gather new material for the herbarium and asked the database which plants Kew already had from that land. Unfortunately for the suppliers and the users of the database, the director received no answer because the attribute
[Figure 3.1(b): hierarchical coding of the map: Map -> Polygons -> Lines -> Points]
‘place of collection’ is not part of the plant taxonomy key!

Further disadvantages of hierarchical database structures are that large index files have to be maintained, and certain attribute values may have to be repeated many times, leading to data redundancy, which increases storage and access costs. Figure 3.1b shows this redundancy, as each coordinate pair would have to be repeated twice, and coordinates 3 and 4 would have to be repeated four times because line c has to be repeated twice. Hierarchical structures are also wasteful of space; for example, if an operation were made to give polygons I and II the same name there is no easy way to suppress the display of line c, which would become unnecessary for display.

The network database structure

In hierarchical structures, travel within the database is restricted to the paths up and down the taxonomic pathways. In many situations much more rapid linkage is required, particularly in data structures for graphics features where adjacent items in a map need to be linked together even though the actual data about their coordinates may be written in very different parts of the database.

Both redundancy and linkage problems are avoided by the compact network structure shown in Figure 3.1c, in which each line and each coordinate need appear only once. With this structure it is a simple matter to suppress the printing of line c whenever it is referenced by polygons having the same name, thus making map generalization easier. Very often in graphics, network structures are used that have a ring pointer structure. Ring pointer structures (Figure 3.1d) are very useful ways of navigating around complex topological structures such as polygon networks.

Network database structures are very useful when the relations or linkages can be specified beforehand. They avoid data redundancy and make good use of available data. The disadvantages are that the database is enlarged by the overhead of the pointers, which in complex systems can become quite a substantial part
of the database. These pointers must be maintained every time a change is made to the database, and the building and maintenance of pointer structures may be a considerable overhead for the database system.

The relational database structure

In its simplest form the relational database structure stores no pointers and has no hierarchy. Instead, the data are stored in simple records, known as tuples, which are sets of fields each containing an attribute; tuples are grouped together in two-dimensional tables, known as relations, somewhat in the manner of a spreadsheet program. Each table or relation is usually a separate file. The pointer structures in network structures and the keys in hierarchical structures are replaced by data redundancy in the form of identification codes that are used as unique keys to identify the records in each file (Figure 3.2a).

Data are extracted from a relational database through the user defining the relation that is appropriate for the query. This relation is not necessarily already present in the existing files, so the controlling program uses the methods of relational algebra to construct the new tables. These rules are often encoded in the so-called Structured Query Language (SQL); see Date (1995) and Chapter 6. Figure 3.2a shows that the relational structure of the simple map of Figure 3.1a also contains much redundancy, and methods known as normalization are used to create more efficient codings. For example, addressing the lines as sets of straight line segments (arcs) simplifies the relational structure without loss of information (Figure 3.2b).

Relational databases have the great advantage that their structure is very flexible and may meet the demands of all queries that can be formulated using the rules of Boolean logic and of mathematical operations. They allow different kinds of data to be searched, combined, and compared. Addition or removal of data is easy too, because this just involves adding or removing a tuple, or even a whole table (as shown in Figure 3.3). Querying across different relational tables is made possible by joining them through common fields. This is good for situations where all records have the same number of attributes and there is no natural hierarchy. However, where the relationships between tables are complex and a number of joins are needed, operations take a
[Figure 3.3: several relational tables of tuples, each row carrying attributes A1 to A7, one table per attribute set; a new attribute set is simply added as a new table]
Figure 3.3. Adding new data to a relational system is just a matter of defining new tables
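As a minimal relational sketch (the table layouts, identifiers, and values are invented), tuples in separate tables can be joined through a common identification field, and adding data is just adding tuples or tables:

```python
# Two relations (tables of tuples) sharing the key field polygon_id.
polygons = [("P1", "arable"), ("P2", "forest")]              # (polygon_id, landuse)
arcs = [("a", "P1"), ("b", "P2"), ("c", "P1"), ("c", "P2")]  # (arc_id, polygon_id)

# A join constructed through the common field, in the manner of
# SQL's SELECT ... FROM arcs JOIN polygons ON ... :
joined = [(arc, pid, use)
          for (arc, apid) in arcs
          for (pid, use) in polygons
          if apid == pid]
assert ("c", "P1", "arable") in joined and ("c", "P2", "forest") in joined

# Adding new data is just a matter of adding tuples or whole tables:
polygons.append(("P3", "water"))
soils = [("P1", "loam"), ("P2", "podzol")]                   # a brand-new relation
assert len(polygons) == 3 and soils[0][1] == "loam"
```

The join is re-computed for each query rather than stored, which is the flexibility, and the cost, that the surrounding text describes.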
considerable amount of computer time even on fast computers. Consequently, relational database systems have to be very skilfully designed in order to support the search capabilities with reasonable speed, which is why they are expensive. They were first used for GIS in the early 1980s (Abel 1983, Lorie and Meier 1984, van Roessel and Fosnight 1984) and they are now established as major tools for handling the attributes of spatial entities in many well-known commercial GIS (see Hybrid databases, below).

The object-oriented database structure

Object-oriented concepts originated in programming languages such as Simula (Dahl and Nygaard 1966) and Smalltalk (Goldberg and Robson 1983), and the application of these ideas to databases was stimulated by the problems of redundancy and sequential search in the relational structure. In GIS their use has been stimulated by the need to handle complex spatial entities more intelligently than as simple point, line, and polygon primitives, and also by the problems of database modifications when analysis operations like polygon overlay are carried out (see Chapter 7). The approach is being applied increasingly in a number of fields, although there are many different formalisms of the concept with no clear agreement on the definition amongst the computing community (Worboys et al. 1990). The terms 'object-based' or 'object-centred' are more nebulous in meaning than the definable characteristics of 'object-oriented programming languages' (Jacobsen et al. 1992).

Object-oriented database structures, developed using object-oriented programming languages, combine the speed of hierarchical and network approaches with the flexibility of relational ones by organizing the data around the actual entities as opposed to the functions being processed (Chance et al. 1995, Kim and Lochovsky 1989). In the relational structure, each entity is defined in terms of its data records and the logical relations that can be elucidated between the attributes and their values. In object-oriented databases, data are defined in terms of a series of unique objects which are organized into groups of similar phenomena (known as object classes) according to any natural structuring. Relationships between different objects and different classes are established through explicit links.

The characteristics of an object may be described in the database in terms of its attributes (called its state) as well as a set of procedures which describe its behaviour (called operations or methods). These data are encapsulated within an object, which is defined by a unique identifier within the database. This identifier remains the same whatever changes are made to the values that describe the object's characteristics (Bhalla 1991).
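A minimal sketch of these ideas, assuming nothing beyond the text: an object couples state (attributes) with behaviour (methods), belongs to an object class grouping similar phenomena, and keeps a fixed identifier however its attribute values change. The class and attribute names are invented for illustration:

```python
import itertools

_ids = itertools.count(1)

class SpatialObject:
    """Base object class: every object gets a permanent unique identifier."""
    def __init__(self, kind):
        self.oid = next(_ids)   # unique identifier, never changes
        self.kind = kind        # part of the object's state

class Parcel(SpatialObject):
    """An object class grouping similar phenomena (land parcels)."""
    def __init__(self, land_use, area_ha):
        super().__init__("parcel")
        self.land_use = land_use   # state (attributes)
        self.area_ha = area_ha

    def reclassify(self, new_use):
        """A method: behaviour encapsulated with the data."""
        self.land_use = new_use

p = Parcel("forest", 12.5)
original_id = p.oid
p.reclassify("pasture")          # the state changes...
assert p.oid == original_id      # ...but the identifier does not
```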
and their use with entity-based data has been highlighted (Raper and Livingstone 1995). The application of this type of structuring is explored more fully later in this chapter.

New developments in database structures

Alternatives to the database structures mentioned above have been explored by various researchers seeking to represent more effectively and flexibly the spatial (and temporal) nature of geographical data. For example, Van Oosterom (1993) has described a nested approach whereby different scales of representation of data are stored in a database in which links are made between the same geographically referenced areas.

One active area of research has been in the use of the deductive database approach. This extends a logic-based approach to information storage, querying, and processing using programming languages such as Prolog. A deductive database stores both the data and the logic that defines a fact or which expresses a relationship; data records are made up of an extension (which is similar in form to a relational database) and an intension part (which is a virtual relation expressed logically using other relations) (Yearsley and Worboys 1995, Worboys 1995, Quiroga et al. 1996). The advantage of the deductive database approach is that it allows more complex objects and modelling to be constructed than may be undertaken using relational, network, or hierarchical structures, and so offers great potential for spatial data handling.

The development of GIS based on these database structures is still essentially at the theoretical stage at present.
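The extension/intension split can be sketched in a few lines: a set of stored facts (the extension) plus a recursive rule (the intension) that defines a virtual relation which is derived at query time rather than stored. The watercourse relation used here is invented for illustration, not taken from the text:

```python
# Extension: stored base facts, comparable in form to a relational table.
flows_into = {("brook", "stream"), ("stream", "river"), ("river", "sea")}

def upstream_of(a, b):
    """Intension: a is upstream of b if it flows into b directly,
    or into some intermediate watercourse that is upstream of b
    (a recursive, Prolog-style rule)."""
    if (a, b) in flows_into:
        return True
    return any(x == a and upstream_of(y, b) for x, y in flows_into)

print(upstream_of("brook", "sea"))   # True: derived, never stored
```

The tuple ("brook", "sea") never appears in the extension; it is a virtual relation computed from the rule, which is the essence of the deductive approach.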
It should be apparent that the four basic database structures (hierarchical, network, object-oriented, and relational) all have something to offer for spatial information systems. Hierarchical systems allow large databases to be divided up easily into manageable chunks, but they are inflexible for building new search paths and they may contain much redundant data. Network systems contain little redundant data, if any, and provide fast, directed, if inflexible, links between related entities. Object-oriented systems permit relations, functionality, persistence, and interdependence to be built into one system, at the expense of more complex programming tools and heavier demands on computing power. Relational systems are open, flexible, and adaptable, but may suffer from large data volumes, redundancy, and long search times. Consequently it is not surprising that these techniques are often used together in spatial information systems to complement each other, rather than assigning all the work to one of them.

A hierarchical approach is often useful for dividing spatial data into manageable themes, or into manageable areas, so that continuous, seamless mapping becomes possible. The network approach is ideal for topologically linked vector lines and polygons. The relational approach is good for retrieving objects on the basis of their attributes, or for creating new attributes and attribute values from existing data. Object orientation is useful when entities share attributes or interact in special ways.
Figure 3.5. Data planes and feature layers in (a) raster and (b) vector data structures. The overlays represent attributes A, B, C, ... that may be combined in modelling as U = f(A, B, C, ...)
When raster data structures are used to represent a continuous surface, each cell has a unique value and it takes a total of nrows × mcolumns values to encode each overlay, plus general information on map projection, grid origin, grid size, and data type. The most storage-hungry data layer is one where the data type is a scalar and each cell contains a real number, as in the case of altitude matrices for digital elevation and other continuous surfaces obtained from interpolation. Other data types will require less space because they can code their data in fewer bits (as discussed earlier).

When raster structures are used to represent lines or areas in which the pixels everywhere have the same value, it is possible to effect considerable savings in the storage requirements for the raster data, providing of course that the data structures are properly designed. The structures described in Figure 3.6a and b may use array coordinates to reduce the actual quantity of numbers stored, with the limitation that all spatial operations must be carried out in terms of array row and column numbers. These systems do not encode the data in the form of a one-to-many relation between mapping unit value and cell coordinates, so compact methods for encoding cannot be used.

With the fourth structure each overlay is stored as a separate file with a general header containing information such as map projection, cell size, numbers of rows and columns, and data type; this is followed by a simple list of values which are ordered according to the sequence of rows and columns (Figure 3.6d). This is obviously more efficient as coordinate values are not stored for each cell and generic geometry and display values are written in the header of the overlay. PCRaster (Wesseling et al. 1996) uses this structure.

Compact methods for storing raster data

The third structure given in Figure 3.6c references the sets of points per region (or mapping unit) and allows a variety of methods of compact storage to be used. There are four main ways in which the spatial data for a mapping unit or polygon may be stored more economically: these are chain codes, run-length codes, block codes, and quadtrees.

Chain codes  Consider Figure 3.7. The boundary of region A may be given in terms of its origin and a sequence of unit vectors in the cardinal directions. These directions can be numbered (East = 0, North = 1, West = 2, South = 3). For example, if we start at cell row = 10, column = 1, the boundary of the region is coded clockwise by:

0, 1, 0², 3, 0², 1, 0, 3, 0, 1, 0³, 3², 2, 3³, 0², 1, 0⁵, 3², 2², 3, 2³, 3, 2³, 1, 2², 1, 2², 1, 2², 1, 2², 1³

where a superscript indicates the number of times a step in that direction is repeated.
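The chain-coding idea can be sketched with the direction numbering given in the text (East = 0, North = 1, West = 2, South = 3). The region traced here is an invented rectangular block, not region A of Figure 3.7, and collapsing repeated steps into (direction, count) runs stands in for the superscript notation:

```python
# Unit steps in (row, col) form; rows increase southwards.
MOVES = {0: (0, 1), 1: (-1, 0), 2: (0, -1), 3: (1, 0)}

def trace(origin, codes):
    """Replay a chain code from an origin, returning the path of corners."""
    r, c = origin
    path = [(r, c)]
    for d in codes:
        dr, dc = MOVES[d]
        r, c = r + dr, c + dc
        path.append((r, c))
    return path

def run_length(codes):
    """Collapse a chain code into (direction, count) runs, the compact
    'superscript' form used in the text."""
    runs = []
    for d in codes:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return [(d, n) for d, n in runs]

# Clockwise walk round a block 3 cells wide and 2 cells deep:
chain = [0, 0, 0, 3, 3, 2, 2, 2, 1, 1]
print(run_length(chain))             # [(0, 3), (3, 2), (2, 3), (1, 2)]
print(trace((10, 1), chain)[-1])     # (10, 1): a closed boundary
```

A closed boundary must return to its origin, which gives a cheap consistency check on a stored chain code.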
Figure 3.6. Four different ways of creating a raster data structure: (a) each cell is referenced directly; (b) each overlay is referenced directly; (c) each mapping unit is referenced directly (label, display symbol, attribute, and the set of X, Y coordinate pairs of its points); (d) each overlay is a separate file with a general header (title, scale, number of rows N, number of columns M, cell size, projection, X and Y minima, pointer to legend) followed by the sequence of N × M values
Two-dimensional orderings

During the 1980s there was much interest in the use of quadtrees in GIS (Martin 1982, Mark and Lauzon 1984) and it is clearly a technique that has much to offer. An authoritative description of the algorithms used for computing perimeters and areas, and for converting from raster to quadtree and other representations, is given by Samet (1990a, 1990b). Quadtrees have many interesting advantages over other methods of raster representation. Standard region properties may be easily and efficiently computed. Quadtrees are 'variable resolution' arrays in which detail is represented only when available, without requiring excessive storage for parts where detail is lacking (Figures 3.11a-d). In 3D systems, the octree is the analogue of the 2D quadtree.

The largest problems associated with quadtrees appear to be that the tree representation is not translation-invariant: two regions of the same shape and size may have quite different quadtrees. Consequently shape analysis and pattern recognition are not straightforward, which is a problem for objects that move or change.

Compact raster structures are efficient for data analysis when the spatial units (pixels, lines, or polygons) are exact, static entities. Quadtrees with sufficient levels of nesting can provide as fine a spatial resolution of bounded entities as vector methods of coding. Compact raster structures provide few advantages for handling the continuous fields encountered in remotely sensed images, digital elevation models, interpolated surfaces, and numerical spatial modelling, because each grid cell takes a different value. They are also of little value for modelling movement or change. When data stored in block or run-length codes are used in analyses with data treated as continuous surfaces they have to be converted to the full, simple raster format. Now that computer storage space and processing speeds have greatly increased it is not so necessary today to use compact raster structures for operational coding, though the methods are useful for archiving large amounts of similar data.

Summary: raster data structures. If each cell represents a potentially different value, then the simple
Figure 3.12. Alternative ways of encoding data at different levels of resolution: (b) row-prime; (c) Peano-Hilbert; (d) Morton
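The Morton ordering named in Figure 3.12 can be sketched by interleaving the bits of the row and column numbers into a single key, so that cells close together on the grid tend to stay close together in the one-dimensional ordering. The grid size and bit width here are illustrative choices:

```python
def morton(row, col, bits=8):
    """Interleave the bits of row and col into a Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)       # column bits: even positions
        key |= ((row >> i) & 1) << (2 * i + 1)   # row bits: odd positions
    return key

cells = [(r, c) for r in range(4) for c in range(4)]
cells.sort(key=lambda rc: morton(*rc))
print(cells[:8])   # the first half of the Z-shaped walk over a 4x4 grid
```

Sorting the cells by this key produces the characteristic Z-shaped traversal, visiting each 2 × 2 block completely before moving on.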
data and size of memory required. When 'regions' (i.e. areas of uniform value) are present, as is assumed to be the case in many thematic maps, data storage requirements may be considerably reduced by using chain codes, run-length codes, block codes, or quadtrees. Run-length codes appear to be most efficient when the pixel size is large with respect to the area of the regions being displayed and stored; as resolution improves and pixel numbers per region increase, however, block codes and quadtrees become increasingly attractive. The quadtree representation has the added advantage of variable resolution. The ease of subsequent processing varies with the data structure used.

A vector database is built up from what the user perceives as a number of feature planes which are used to separate different classes of phenomena (shown in Figure 3.5b). The units are represented as crisp world objects using a coordinate space that is assumed to be continuous, not quantized as with the raster structure, allowing all positions, lengths, and dimensions to be defined precisely. In fact this is not exactly possible, because of the limitations that the length of a computer word places on the exact representation of a coordinate, and because all display devices have a basic step size or resolution. Besides the assumption of mathematically exact coordinates, vector methods of data storage use
well-known and frequently used methods of structuring polygon data.

The aim of a polygon data structure is to describe the topological properties of areas (that is, their shapes, neighbours, and hierarchy) in such a way that the associated attributes of these basic spatial building blocks may be displayed and manipulated as thematic map data. Before describing the ways in which a polygon data structure may be constructed it would be useful to state the requirements of polygon networks that geographic data impose. First, all the component polygons on a map will each have a unique shape, perimeter, and area. There is no single standard basic unit as is the case in raster systems. Even for the most regular or regularly laid-out American street plan it will be unwise to assume that all or even some of the blocks have exactly the same shape and size. For soil and geological maps uniformity of shape and size is clearly most unlikely. Second, geographical analyses require the data structure to be able to record the neighbours of each polygon in the same way that the stream network required connectivity. Third, polygons on thematic maps are not all at the same level: islands occur in lakes that are themselves on larger islands, and so on.

used to tell the user what each polygon is are then held as a set of simple text entities. While this method has the advantages of simplicity it has many disadvantages. These are: (a) lines between adjacent polygons must be digitized and stored twice, which can lead to serious errors in mismatching, giving rise to slivers and gaps along the common boundary; (b) there is no neighbourhood information; (c) islands are impossible except as purely graphical constructions; (d) there are no easy ways to check if the topology of the boundary
is correct or whether it is incomplete ('dead ends') or makes topologically inadmissible loops ('weird polygons'); see Figure 3.16. The simple polygon structure may be extended such that each polygon is represented by a number of chains, but this does not avoid the basic problems.

Polygons with point dictionaries  With this representation all coordinate pairs are numbered sequentially and are referenced by a dictionary that records which points are associated with each polygon (Figure 3.17a). The point dictionary database has the advantage that boundaries between adjacent polygons are unique, but the problem of neighbourhood functions still exists. Also, the structure does not easily allow boundaries between adjacent polygons to be suppressed or dissolved if a renumbering or reclassification should result in them both being allocated to the same class. The problem of island polygons still exists, as do the problems of checking for weird polygons and dead ends. As with simple polygons, polygons may be used with chain dictionaries (Figure 3.17b). This has the advantage that over-defined chains, resulting from continuous or stream digitizing, may be reduced in size by weeding algorithms (see Chapter 4) without having to modify the dictionary. Polygon attributes are linked by pointers to data tables (Figure 3.17c).

Polygon systems with explicit topological structures  Islands and neighbours may only be properly handled by incorporating explicit topological relationships into the data structure. The topological structure may be built up in one of two ways: by creating the topological links during data input, or by using software to create the topology from a set of interlinked chains or strings. In the first case, the burden of creating the topology is thrust on the operator; the second often relies on considerable amounts of computing power; and both methods result in an increase in the amount of data that needs to be stored to describe the full polygon structure.
Figure 3.17. Three different ways of incorporating simple topology in polygon nets
One of the first attempts to build explicit topological relationships into a geographic data structure is the Dual Independent Map Encoding (DIME) system of the US Bureau of the Census. The basic element of the DIME data file is a simple line segment defined by two end-points; complex lines are represented by a series of segments. The segment has two pointers to the nodes and codes for the polygon on each side of the segment. As the nodes do not point back to segments, nor do segments point to adjacent segments, laborious searches are needed to assemble the outlines of polygons. Moreover, the simple segment structure makes the handling of complex lines very cumbersome because of the large data redundancy.

A fully topological polygon network structure  The structure shown in Figure 3.18 may be built up from a set of boundary chains or strings that have been digitized in any order and in any direction. The system allows islands and lakes to be nested to any level; it allows automatic checks for weird polygons and dead ends; and automated or semi-automated association of non-spatial attributes with the resulting polygons. Neighbourhood searches are fully supported. Although differing in details, the system about to be described is similar to systems used in the Harvard Polyvrt program (Peuker and Chrisman 1975).

Digitizing polygon boundaries (digitizing is described in Chapter 4) and creating polygon topology are best treated separately. The procedures used to build the boundary topology should only need to make two assumptions of the input data, namely that the polygon boundaries have been coded in the form of arcs, and that the polygon names or other records used to link the graphical to the attribute data are digitized in the form of identifiable point entities somewhere within each polygon boundary.

Stage 1. Linking arcs into a boundary network. The arcs are first sorted according to their extents (minimum and maximum X and Y coordinates occupied by the arc) so that arcs topologically close to one another are also together in the data file. This saves time when searching for adjacent arcs. The arcs are then examined to see which others they intersect. Junction points are built at the end of all arcs that join, and the arc data records are extended to contain pointers and angles. Arcs that cross at places other than the end-points are automatically cut into new arcs and the arc pointers are built.

Stage 2. Checking polygons for closure. The resulting network is then checked for closure by scanning the modified arc records to see if they all have pointers to and from at least one other arc. The other arc may be the arc itself in the case of single islands defined by a single arc. All arcs failing to pass the test may be 'flagged' (i.e. brought to the attention of the operator) by causing them to be displayed in a particular way, or otherwise be removed from the subset of arcs to be used for the polygon network.

Stage 3. Linking the lines into polygons. The first step in linking lines into polygons is to create a new 'envelope' polygon from the outer boundary of the map (Figure 3.18). This envelope entity consists of records containing:
(a) a unique identifier
(b) a code that identifies it as an envelope polygon
(c) a ring pointer
(d) a list of pointers to the bounding arcs
(e) its area
(f) its extents (minimum and maximum XY coordinates of the bounding rectangle).

The envelope polygon will not be seen by the user; its sole purpose is in building the topological structure of the network. It is created by following arcs around the outer boundary in a clockwise direction and choosing the most left-hand arc at each junction. The unique identifier of every arc is recorded and stored along with the other data, and a flag is set to indicate that each arc has been traversed once.

Once the envelope polygon has been built, each individual polygon may now be created. This is done by starting at the same place as before, but this time the clockwise searches involve choosing the most right-hand arc at each junction. A tally must be kept of the number of times an arc has been traversed; once it has been traversed twice it falls out of the search. Arriving back at the starting-point, one has identified all the component lines. At the same time a check is made on the cumulative turning angle (Figure 3.18), and if this is not 360° there has been a digitizing fault and the polygon is weird. (The weird polygons should have been filtered out by the intersection-seeking step in stage 1, but if the arcs must be linked to manually entered nodes then this check is essential.) As with the envelope polygon, each polygon entity receives several sets of information:

(a) a unique identifier
(b) an ordinary polygon code
(c) a ring pointer from the envelope polygon. At the same time the identifier of this polygon is written in the ring pointer of the envelope polygon
(d) a list of all bounding arcs. At the same time, the polygon's unique identifier is written into the record of the line
(e) a ring pointer to the adjacent polygon in the network
(f) minimum and maximum XY coordinates (extents) of the bounding rectangle.

The search proceeds to the next polygon in the same network at the same level in the hierarchy, and so on until all individual polygons have been built up. When the last polygon in the net has been traced its ring pointer (e) is set pointing back to the envelope polygon. This ensures that all bounding lines are associated with two polygons.

The same procedure is followed for all 'islands' and 'unconnected subcontinents'. Once all bounding arcs have been linked into polygons, the 'islands' and 'subcontinents' must be arranged in the proper topological hierarchy. This may be done by first sorting them into increasing area and then testing to see if an 'island' falls within an 'envelope' of the next largest size. A quick test may be made by comparing the extents of the two envelope polygons. Once a match has been found, the problem is now to locate the exact polygon in which the island lies. The matching of the extents is then repeated for each component polygon in the 'continent'. Once a match has been found, a 'point-in-polygon' routine is used to see if the island is totally within the polygon (see Box 3.4). If an overlap is found, it signals an error in the database or the intersection procedures (stage 1) and indicates that the operator must take remedial action. If there is no overlap, then a pointer is written from the network polygon to the envelope polygon of the enclosed island. If no overlap and no matching is found, it means that the two polygon networks are independent of each other. The ring pointer structure of envelope polygons, network polygons, island envelope polygons, and island network polygons allows an infinite amount of nesting. Moreover, the nesting only needs to be worked out once; thereafter the network may be traversed easily by following the pointers.

Stage 4. Computing polygon areas. The next stage involves computing the area of the individual polygons by the trapezoidal rule (see Box 3.3). Polygons in geographical data may have many hundreds of coordinates in the bounding arcs and many island polygons (imagine the geological map of an area with lakes and drumlins), so it is usually more efficient to compute areas once, subtracting the areas of enclosed islands as necessary, and then to store this area as an associated attribute.

Stage 5. Associating non-graphic attributes to the polygons. The last stage of building the database is to link the polygons to the associated attributes that describe what they represent. This may be done in several ways. The first is to digitize a unique text entity within each polygon area, either as part of the data entry, or interactively after polygons have been formed. This text may then be used as a pointer to the associated attributes that may or may not be stored with the graphic data. The text may be used for visual display; it is linked to the polygon by the use of a point-in-polygon search (see Box 3.4). The second is to instruct the computer
to write the unique identifier of each polygon at the centre of each polygon; at the same time the computer prints a list of all polygon identifiers. This list may then be merged with a file containing the other non-graphic attributes of the polygons, which may then be cross-referenced through the unique polygon identifiers.

Complex software is needed to construct the topologically sound polygon network described above, but the resulting data structure of Figure 3.19 has the following advantages:

(a) the polygon network is fully integrated and is free from gaps and slivers and excessive amounts of redundant coordinates
(b) all polygons, arcs, and associated attributes are part of an interlinked unit so that all kinds of neighbourhood analyses are possible. Note that the system just described also allows the arcs to have non-graphic attributes associated with them as well
(c) the number of continent-island nestings is unlimited
(d) the locational accuracy of the database is limited only by the accuracy of the digitizer and the length of the computer word.

Editing and updating the polygon net  Vector polygon networks may be edited by moving the coordinates of individual points and nodes, by changing the polygon attributes, and by cutting out or adding sections of lines or even whole polygons. Changing coordinates or associated attributes requires no modification to the topology and is easy. Modifying the
Box 3.4. Point-in-polygon search

At least two separate steps in the creation of the polygon network involve the general problem of point-in-polygon search, namely the checks to see if a small polygon is contained by a larger one, and the association of a given polygon with a digitized text label. The figures below show two aspects of the point-in-polygon algorithms.

First, a quick comparison of the coordinates of the point with the polygon extents quickly reveals whether a point is likely to be in or not. So point (a) may easily be excluded because it is outside the polygon's minimum bounding rectangle, but (b) and (c) cannot. To check if points (b) and (c) are in the polygon, a horizontal line is extended from the point. If the number of intersections of this line with the polygon envelope (in either direction) is odd, the point is inside the polygon.

To check if an island polygon P2 is inside P1, first check the extents; P1 is then divided into a number of horizontal bands and the first and last point of each band is treated as the point (b) above. If the number of points for each line is odd, then the polygon P2 is completely enclosed.

Problems may occur if any segment of a boundary is exactly horizontal and has exactly the same Y coordinate as the point, but these may be easily filtered out. Haralick (1980) lists an alternative procedure for finding the polygon containing a point that is based on a binary search of monotone arcs, that is, sequences of arcs in which the order of the vertices in the arc is the same as the order of the vertices projected onto a line.
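The odd/even intersection test described in the box can be sketched as follows; the square polygon and test points are invented for illustration. The half-open comparison `(y1 > y) != (y2 > y)` also quietly handles the horizontal-edge problem mentioned above, since a horizontal edge never satisfies it:

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: extend a horizontal ray to the right of (x, y)
    and count crossings with the polygon edges; odd means inside."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):               # edge straddles the ray's Y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:                    # crossing to the right
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))   # True
print(point_in_polygon(5, 2, square))   # False
```

A quick extents (bounding-rectangle) comparison, as the box suggests, would normally be run first so that most points never reach this edge-by-edge test.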
network by cutting out or adding lines and polygons requires local recalculation of topology and rebuilding of the database. Consequently these kinds of data structures are not efficient for spatial patterns that are constantly changing.

Special purpose vector data structures: the Triangular Irregular Network (TIN)

An important, and much used, vector polygon structure is the TIN. It is built by joining known point values into a series of triangles based on a Delaunay triangulation. The triangulation allows a variable density and distribution of points to be used which reflects the changes in attribute values within an area, as shown in Figure 3.20a. The structure model regards the nodes of the network as primary units. The topological relations are built into the database by constructing pointers from each node to each of its neighbouring nodes. The neighbour list is sorted clockwise around each node starting at north. The world outside the area modelled by the TIN is represented by a
dummy node on the 'reverse side' of the topological sphere on to which the TIN is projected. This dummy node assists with describing the topology of the border points and simplifies their processing.

Figure 3.20b shows a part of the network data structure (three nodes and two triangles) used to define a TIN. The database consists of three sets of records called a node list, a pointer list, and a trilist (triangle list). The node list consists of records identifying each node and containing its coordinates, the number of neighbouring nodes, and the start location of the identifiers of these neighbouring nodes in the pointer list. Nodes on the edge of the area have a dummy pointer set to -32 000 to indicate that they border the outside world.

The node list and pointer list contain all the essential altitude information and linkages and so are sufficient for many operations, but for operations such as slope mapping, hill shading, or associating other attributes with the triangles, it is necessary to be able to reference the triangles directly. This is done by using the trilist to associate each directed edge with the triangle to its right. In Figure 3.20b, triangle T2 is associated with three directed edges held in the pointer list, namely from node 1 to 2, from node 2 to 3, and from node 3 to 1.

Nodes located in areas of greatest change help to reduce errors within the derived form. In DEMs, for example, values along ridges and valleys help to ensure that the resulting elevation surface does not have anomalies such as rivers which flow upstream. In using these irregularly spaced points TINs avoid the redundancies of the regular grid and provide efficient means for computing derived data such as slope.

Developments in vector data structures to improve data access times and efficiency

The vector data model is a relatively efficient means of storing the geometric information of geographical data with only pertinent coordinate values recorded;
Figure 3.20. Triangular irregular networks based on the Delaunay triangulation provide a vector data model of a continuous field. For each node the structure records its coordinates, a pointer into the neighbour list, a neighbour count, the neighbour pointers, and the trilist entries (triangles T1, T2)
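The three record sets named in the text (node list, pointer list, and trilist) can be sketched for a small invented mesh of two triangles sharing an edge. The node identifiers, coordinates, and neighbour orderings are made up for illustration; only the -32 000 'outside world' sentinel and the edge-to-right-hand-triangle rule follow the text:

```python
OUTSIDE = -32000   # dummy pointer marking the outside world

# Node list: id -> (x, y, z, start index in pointer list, neighbour count).
nodes = {
    1: (0.0, 0.0, 10.0, 0, 3),
    2: (1.0, 0.0, 12.0, 3, 4),
    3: (0.0, 1.0, 11.0, 7, 4),
    4: (1.0, 1.0, 14.0, 11, 3),
}

# Pointer list: each node's neighbours stored contiguously.
pointers = [2, 3, OUTSIDE,      # neighbours of node 1
            4, 3, 1, OUTSIDE,   # neighbours of node 2
            1, 2, 4, OUTSIDE,   # neighbours of node 3
            2, 3, OUTSIDE]      # neighbours of node 4

# Trilist: each directed edge maps to the triangle on its right.
trilist = {(1, 2): "T1", (2, 3): "T1", (3, 1): "T1",
           (2, 4): "T2", (4, 3): "T2", (3, 2): "T2"}

def neighbours(nid):
    """Read a node's neighbour ids out of the pointer list."""
    start, count = nodes[nid][3], nodes[nid][4]
    return pointers[start:start + count]

print(neighbours(2))        # [4, 3, 1, -32000]: node 2 borders the outside
print(trilist[(3, 2)])      # 'T2': the triangle to the right of edge 3->2
```

The shared edge between nodes 2 and 3 appears twice in the trilist, once in each direction, which is what lets the structure answer triangle-adjacency queries directly.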
the main problems have been associated with accessing the data, particularly the topological and attribute information. For example, the interrelationships between the basic vector units point, line, or polygon are uniquely identified by a pointer or a label; topological structures can only be traversed by means of these unique labels. In early GIS these labels, sometimes called master index pointers, were often held in a sequential list that was the key to accessing the rest of the database. This list had two major problems: first, it was rarely fully consecutive; during editing, gaps appeared, or it increased in length, which meant that searches almost invariably were sequential. Second, the search times increased sharply with the length of the table, which meant that map processing times increased non-linearly with the size of the database. Many technical developments in GIS databases in the 1980s concentrated on the problem of making the processing of spatial data less dependent on the size of the database. The speed of computers today means that storage and accessing problems are not as noticeable as before. Changes in database organization and internal referencing have also helped to improve efficiency.

Data clustering on storage media  The first attempts at improving database access times used ‘brute-force’ computing methods to scan the pointer arrays quickly, or to concentrate the master index array onto a small, contiguous area of disc or core storage. While this undoubtedly brought some improvements, it was at best a palliative and did little to resolve the underlying problems. Another approach involved clustering the master index pointers not only according to entity type, but also to spatial location. The data were also clustered on the disk according to geographical location.

Database indexing using B-trees and R-trees  The importance of indexing a database to speed up querying was emphasized earlier in this chapter. However, the indexes for large or complex databases as in GIS applications may themselves become long and slow to query. Structures have therefore been developed which give hierarchical indexes of indexes so that searching is more efficient and directed. They are known as multi-level indexes and are particularly useful with vector data structures where much of the topological and attribute data are held in index files.

The B-tree structure provides a multi-level index which is structured using ‘internal nodes’ and ‘leaf nodes’, which are analogous respectively to the branches and leaves of a tree. In the example shown in Figure 3.21 a file of street name data is indexed alphabetically. The first index splits the data into groups of letters a–f, g–m, n–s, t–z, which are listed along with pointers to the data in the respective nodes. The next level of the index splits these groups into finer divisions. When a search is initiated it is therefore limited at each stage, so saving time. The B-tree structure adjusts to dynamic changes in the databases through algorithms which alter the organization following the insertion or deletion of records (discussed in more detail in Worboys 1995).

Figure 3.21. A multi-level B-tree index of a street name file, with alphabetical group nodes (a–f, g–m, n–s, t–z, subdivided further at lower levels such as w–x and y–z) pointing to the records.

Any data type which has a linear ordering may be used as the index field in a B-tree. In GIS, index files of number or text string values are useful for searches on attribute records (van Oosterom 1993). However, this structure does not address the spatial (two-dimensional) nature of many of the queries of a GIS. There are various alternatives to the B-tree model which allow geometrical properties of the database to be included in structuring an index. For example, the R-tree (Guttman 1984) divides space into a series of boxes, known as Minimal Bounding Rectangles (MBRs). Figure 3.22 shows a series of hospital locations and the three MBRs used to divide the space. Nodes in the index represent these rectangles, and a search for a particular hospital location would be directed first by an algorithm to one of these. A hierarchical structure of different rectangle sizes of MBRs may be set up using a tree-shaped structure for the index. The search algorithm checks which large rectangle an entity is contained in and then follows the branches of the tree down various levels until the conditions of the query are met.

Quadtrees  The basic concepts of quadtrees were elucidated earlier in this chapter, and their use in reducing the amount of disk space needed to represent raster data structures was described. This data structuring technique may also be used with vector-based data, where the presence or absence of an attribute determines the coding of a defined quadrant (Laurini and Thompson 1992). In some complex quadtree systems the cells may be referenced also in terms of the geometry of the units present, such as an edge or a vertex, or number of points.

The quadtree structures used to code spatial data have been geometrically varied. In many, the hierarchical partitioning of the space has been in terms of regular-shaped squares, as already described. However, with some structures the data themselves have been used to determine the position and shape of the subdivisions. The distribution or position of the points, lines, and vectors may be used to divide the space into irregular shapes at each subdivision, so helping to represent the variations in density and distribution of data values at various scales (Laurini and Thompson 1992). The structural shape of irregular quadtrees is highly dependent on the time various points are inserted into it (Worboys 1995). The main problem with using quadtrees with vector data is the loss of explicit topological referencing and that the precision of point and linear features is limited.

Continuous vector coverages: tiling  The real world is continuous and does not stop at the boundaries of map sheets and computer files. A seamless database simulates the continuity by linking the files for adjacent areas according to a system known as tiling. In theory the world is represented by an infinite set of tiles reaching in both directions. Each tile, or page as it is sometimes called, may reference a certain amount of information; extra details can be accommodated by dividing each main tile into subtiles in a manner similar to that used in quadtree structures. Tiling introduces an extra complication in that all arcs must automatically be terminated and begun at tile boundaries, and topological pointers must not only reference other entities but other tiles as well. The costs are thus a larger database, but one in which searching times may be kept very low by virtue of the positional hierarchy.

In principle, tiling allows limitless maps to be created and stored, as only data from a few tiles need be referenced at any one time, the rest being stored on disk. In practice, the sheer volume of data will exceed the financially permissible disk space, even allowing for the dramatic decrease in hard drive costs, so for country-wide mapping great reliance must be placed on networked databases or storage devices such as CD-ROMs, optical disks, or magnetic tapes for storing much of the total database until it is required.

New developments in vector data structures  One of the problems encountered with conventional GIS structures is that the level of detail or resolution of the data that may be stored and displayed is fixed and determined during the input stage. When working
at different map scales, the detail of the data remains the same, and the user’s perspective is limited by the resolution of the graphics display and the degree of zooming. Van Oosterom (1993) is one of several researchers who have developed the concept of a reactive data structure in which the level of detail and the geometric representation which the user interacts with is determined by the scale of the display. Zooming in to large-scale maps would show more detail than a small-scale representation. A linear entity might be displayed as a polygon in the former case and as a line in the latter.
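The rectangle-first search performed by an R-tree style index can be sketched in Python as follows. The hospital names, coordinates, and the three MBRs are hypothetical, and a real R-tree would also nest MBRs hierarchically and rebalance them on insertion; the sketch shows only the core idea of testing cheap rectangles before touching the entities they contain.

```python
# Each index node: (MBR as (xmin, ymin, xmax, ymax), list of (name, x, y))
index_nodes = [
    ((0, 0, 10, 10),  [('A', 2, 3), ('B', 8, 7)]),
    ((20, 0, 30, 10), [('C', 25, 5)]),
    ((0, 20, 10, 30), [('D', 4, 28), ('E', 9, 21)]),
]

def contains(mbr, x, y):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= x <= xmax and ymin <= y <= ymax

def find(x, y):
    """Direct the search to candidate rectangles first, then scan points."""
    for mbr, points in index_nodes:
        if contains(mbr, x, y):          # cheap rectangle test
            for name, px, py in points:  # only now examine the entities
                if (px, py) == (x, y):
                    return name
    return None

print(find(25, 5))   # 'C' -- only one MBR's contents are examined in detail
```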
As the preceding discussion has shown, the raster and vector methods for spatial data structures are distinctly different approaches to modelling geographical information. The relative merits of both systems may be summed up as follows:

Vector data structures

Advantages
Good representation of entity data models.
Compact data structure.
Topology can be described explicitly—therefore good for network analysis.
Coordinate transformation and rubber sheeting is easy.
Accurate graphic representation at all scales.
Retrieval, updating, and generalization of graphics and attributes are possible.

Disadvantages
Complex data structures.
Combining several polygon networks by intersection and overlay is difficult and requires considerable computer power.
Display and plotting may be time consuming and expensive, particularly for high-quality drawing, colouring, and shading.
Spatial analysis within basic units such as polygons is impossible without extra data because they are considered to be internally homogeneous.
Simulation modelling of processes of spatial interaction over paths not defined by explicit topology is more difficult than with raster structures because each spatial entity has a different shape and form.

Raster data structures

Advantages
Simple data structures.
Location-specific manipulation of attribute data is easy.
Many kinds of spatial analysis and filtering may be used.
Mathematical modelling is easy because all spatial entities have a simple, regular shape.
The technology is cheap.
Many forms of data are available.

Disadvantages
Large data volumes.
Using large grid cells to reduce data volumes reduces spatial resolution, resulting in loss of information and an inability to recognize phenomenologically defined structures.
Crude raster maps are inelegant, though graphic elegance is becoming much less of a problem today.
Coordinate transformations are difficult and time consuming unless special algorithms and hardware are used, and even then may result in loss of information or distortion of grid cell shape.

Only a few years ago the conventional wisdom was that raster and vector data structures were irreconcilable alternatives. This was because raster methods required (for the time) huge computer memories to store and process images at the level of spatial resolution that could be obtained by vector structures with ease. They were also irreconcilable because vector systems were seen as being truer to conventional cartographic images and were therefore essential for high-quality map-making and topographic accuracy. Most early technical developments were undertaken in vector mode simply because this structure was most familiar to cartographers and draughtsmen. Raster systems were seen as being suitable only for overlay analysis where cartographic elegance was not required.

The previous debates on raster versus vector are no longer relevant, and the two have been shown to be not mutually exclusive. It has become clear that what first seemed to be an important conceptual problem is in fact largely a question of technology. The problems of graphical and hard copy presentation and data storage with raster systems have largely been overcome with high-resolution computer screens and printers, relatively cheap media for data storage, and compression techniques. In the late 1970s several workers, notably Peuquet (1977, 1979) and Nagy and Wagle (1979), showed that many of the algorithms that had been developed for polygonal data in vector structures not only had raster alternatives, but that in some cases these were more efficient. For example,
the calculation of polygon perimeters and areas, sums, averages, and other operations within a given radius from a point are all reduced to simple counting operations in raster mode. Some operations of course still remain more suited to one of the data structures: network analyses are much easier in a topological data structure, whilst map intersection and overlay are trivial in raster systems.

Today, many GIS support both vector and raster structures and provide conversion programs too. However, these usually require the data to be in the same structural form for analysis across layers. In recent years an alternative in database organization, the object-oriented methods, allows vector and raster data structures to be used at the same time (discussed later in this chapter), as they treat the various spatial units, whether a point, line, polygon, or pixel, as unique objects.

Earlier in this chapter various database structures were described. All of them offer particular benefits (and drawbacks) for storing the raster and vector data structures just detailed. That said, the database structures used in most GIS systems tend to be the powerful commercial Relational Database Management Systems (RDBMS). More recently object-oriented databases have been used in a number of the new versions of commercial GIS as they offer benefits for applications in a number of fields.

Hybrid relational databases: linking geometric representation to attributes

The availability of commercial RDBMS, such as INFO, ORACLE, INGRES, INFORMIX, and similar products, greatly eased the work of GIS system designers as they were able to apply ready-developed and tested systems to their data handling needs. These databases allowed designers to divide the problems of spatial data management into two parts. The first part was how to represent the geometry and topology of the spatial objects—should this be done using vector or raster data structures? The second part was how to handle the attributes of the spatial objects, which may be done using the commercial RDBMS. The resulting hybrid structures (sometimes referred to as georelational models) have a number of distinct advantages:

(a) attribute data need not be stored with the spatial database but may be kept anywhere on the system, or even on-line via a network,
(b) attribute data can be expanded, accessed, deleted, or updated without having to modify the spatial database,
(c) commercial RDBMS ensure that new developments are incorporated as standard,
(d) data structures may be defined in standard ways using data dictionaries: data can be retrieved using general methods such as SQL (Structured Query Language) that are independent of the RDBMS,
(e) keeping the attribute data in a RDBMS does not interfere with the basic principles of layers in a GIS, and
(f) attributes in a RDBMS can be linked to spatial units that may be represented in a wide variety of ways.

Using these commercial RDBMS, GIS designers have created a variety of hybrid structures, including the following:

1. ARC-NODE—RDBMS. This is probably the most used database system, in which the full vector arc-node topology is used to describe networks of lines and polygon boundaries as explained above. Each spatial unit is identified by a unique number or code and it may be placed in a chosen layer or overlay. The attributes of the spatial unit are stored as records in relational tables that may be handled by the RDBMS. Some of the topological information and attributes such as coordinates, neighbours, areas, coordinates of minimum bounding rectangles, and similar data may also be stored in tables in the RDBMS. Figure 3.23a shows a typical example of such a data structure. One complication with such a system is the creation of new spatial entities and their associated attributes when layers are intersected, because this means building new sets of records and links in the RDBMS.
Figure 3.23. Equivalent hybrid vector (a—left) and raster (b—right) data structures for modelling
crisp polygons
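A minimal sketch of the arc-node–RDBMS link in structure 1: topology and geometry on one side, attribute records on the other, joined through the spatial unit’s unique identifier. All identifiers, field names, and values here are invented for illustration.

```python
# Spatial side: arc id -> (from_node, to_node, left_polygon, right_polygon, coords)
arcs = {
    'a1': (1, 2, 'P1', 'P2', [(0, 0), (5, 0)]),
    'a2': (2, 3, 'P1', 'P2', [(5, 0), (5, 5)]),
}

# Attribute side: a relational table, one record per polygon identifier
attributes = {
    'P1': {'landuse': 'forest', 'area_ha': 12.5},
    'P2': {'landuse': 'urban',  'area_ha': 7.0},
}

def polygon_attributes(arc_id, side):
    """Follow an arc's left/right polygon pointer into the attribute table."""
    from_n, to_n, left, right, coords = arcs[arc_id]
    return attributes[left if side == 'left' else right]

print(polygon_attributes('a1', 'left'))   # {'landuse': 'forest', 'area_ha': 12.5}
```

Intersecting two layers would require new polygon identifiers and new attribute records, which is exactly the complication noted above.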
2. Compact raster—RDBMS. If the spatial objects are represented by sets of pixels instead of topologically linked lines, then the raster equivalent to the above may be created. When most of the spatial data refer to thematic units that are internally homogeneous, such as choropleth map units, then raster compaction methods of run-length codes may be used to save space. Figure 3.23b shows an example of the simple raster hybrid approach for homogeneous polygons.

3. Quadtree—RDBMS. Quadtrees may also be used for data compression, but because they allow the data to be represented at various levels of spatial aggregation they may permit different levels of spatial resolution to be used in different layers. This could be useful when intersecting data on narrow river valleys with broad land planning units, which could be done in vector mode but would be difficult in raster systems that permit only a single cell size or level of resolution.

4. Object—RDBMS. In recent years, an object-oriented approach (discussed earlier) has been adopted in organizing both raster and vector data structures in the same GIS. In these systems the various geometric and attribute data are stored in relational tables (Gahegan and Roberts 1988) and object-oriented programming languages provide analytical functionality as well as a graphical object-based interface to the data. The systems allow the benefits of object-oriented organization of geographical data to be exploited within the well-known relational database environment.
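The run-length coding used in the compact raster structure above can be sketched as follows; the row of thematic codes is invented, but it shows why internally homogeneous units compress well.

```python
def run_length_encode(row):
    """Collapse runs of identical cell values into (value, count) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(r) for r in runs]

def run_length_decode(runs):
    """Expand (value, count) pairs back into the original row of cells."""
    return [value for value, count in runs for _ in range(count)]

row = ['P1'] * 6 + ['P2'] * 3 + ['P1'] * 2
codes = run_length_encode(row)
print(codes)                          # [('P1', 6), ('P2', 3), ('P1', 2)]
assert run_length_decode(codes) == row
```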
Object-oriented databases require geographical data to be defined as a series of atomic units. This obviously favours data defined using the entity conceptual model. Geographical data are characterized by a series of attribute and behavioural values which define their spatial, graphical, temporal, and textual/numeric dimensions (Worboys 1994). Part of the attribute definition will describe the geometric nature (point, line, polygon, or cell) of the object; more than one geometric type may be used to reflect the differences in shape found at different spatial scales.

The attribute and behaviour variables are themselves object classes for which their properties and the methods used on them are defined. Hierarchical relationships may be set up with the various classes; for example the ‘arc’ object may be a subclass of the ‘polygon’ object. Topological links between various object instances and classes are established explicitly through
object pointers and operators such as ‘direction’, ‘intersection’, ‘adjacent-to’, ‘overlaps’, ‘left-of’, or ‘right-of’.

Spatial data organization in object-oriented databases has proved attractive to certain GIS users as it offers a way of modelling the semantics and processes of the real world in a more integrated, intuitive manner than is possible in relational systems (Kidner and Jones 1994). People working with human-made objects, such as utility companies, have found that these systems provide an approach which suits the data types they use and the querying capabilities needed. Hierarchical structuring and the representation of relatively complex relationships between object classes may be controlled directly, so giving flexibility in database updating and changing.

The structuring of the database into a series of self-contained, fundamental units brings with it both problems and possibilities. It is very difficult to break down continuous spatial fields into separate units. How do you break up a hill into a series of distinct objects that may be used in analysis and modelling? The choice of boundary is often subjective (see Chapter 11 and Burrough and Frank 1996). However, where data do support this and an objective discretization may be made, the possibilities of data exchange are increased. The objects may be used in new applications even where a different structuring is required. While reusable object libraries involve a considerable investment of time and money, they have already been shown to give a considerable return (Worboys 1995). Spatial data object libraries may now be accessed across the Internet.

To date the implementation of object-oriented databases in GIS has been limited. The problem is that there are few generic object-oriented database products available to act as an engine to support GIS functionality.

Relational database GIS

Advantages
Modifiability of spatial data after inputting to the system.
Data retrieval and modelling functionality is provided by the DBMS.
Easy data integration from other systems, particularly for the attribute data.
All aspects of the data are stored in specialized file structures.
Ease of use.
Sound theoretical foundation for relational databases.

Disadvantages
Poor handling of temporal data.
Coordinate data tend not to be subject to the rigorous database management as might be applied to attribute data, so issues of security and integrity exist.

Object-oriented database GIS

Advantages
The semantic gap between the real-world objects and concepts and their representation in the database is less than with relational databases.
The storage of both the state and the methods ensures database maintenance is minimized.
Raster and vector data structures may be integrated in the same database.
The data exchange of objects is supported.
Fast querying of the database, especially when complex objects and relationships have to be dealt with, as fewer join operations are needed.
Requires less disk space than relational entities which need to store many more index files.
Enables user-defined functions to be used.
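The class idea (attribute state and methods stored together, hierarchical relationships between classes, and explicit topological operators) can be sketched in Python. The class names and the adjacency operator are hypothetical, and for simplicity both Arc and Polygon are shown here as subclasses of a generic spatial object rather than of each other.

```python
class SpatialObject:
    def __init__(self, name):
        self.name = name
        self.neighbours = set()      # explicit topological links

    def adjacent_to(self, other):
        """A topological operator: record the link on both objects."""
        self.neighbours.add(other.name)
        other.neighbours.add(self.name)

class Polygon(SpatialObject):
    def __init__(self, name, rings):
        super().__init__(name)
        self.rings = rings           # geometric state stored with the object

class Arc(SpatialObject):
    def __init__(self, name, coords):
        super().__init__(name)
        self.coords = coords

p = Polygon('parcel_7', rings=[[(0, 0), (1, 0), (1, 1)]])
a = Arc('boundary_3', coords=[(1, 1), (0, 0)])
p.adjacent_to(a)
print(sorted(p.neighbours))   # ['boundary_3']
```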
Questions
1. Discuss the limitations that computer coding imposes on spatial data structuring.
2. Why are database management systems so important? What are their main
functions?
3. Think of the main digital geographical data that you come across in day-to-day
living. What data models are used to represent the information and why?
4. Why is it important for the user to be aware of the database structure when using
a GIS?
5. Review the different methods used for speeding up data access and compression.
Consider a range of different GIS applications where these techniques are important.
FOUR
Data Input, Verification, Storage, and Output
having many applications including the creation of topographical maps and orthophotomaps by photogrammetry (see Chapter 5). Stereo aerial photographs are a major source of data for the human interpretation and mapping of geology, soil, vegetation, or land cover, and they are also valuable background documents for placing other spatial data in a proper geographical context. Digital photogrammetry and digital orthophoto mapping provide data on terrain elevation and land cover directly in digital form without the need for conversion from a paper analogue document (e.g. Plate 1.1).

A wide range of scanners mounted in satellites or aircraft provide digital data directly. There are systems that are passive receivers of reflected radiation, and the amount of reflected energy is recorded for an

It is most important that all spatial data in a GIS are located with respect to a common frame of reference. For most GIS, apart from local studies, the common frame of reference is provided by one of a limited number of geodetic coordinate systems. The most usual and convenient coordinate system used in GIS is that of plane, orthogonal cartesian coordinates, oriented conventionally north-south and east-west. Because the earth is not flat, but nearly spherical, the longitude or east-west position is related to the Greenwich meridian and the latitude or north-south position is related to the Equator. Since the earth is not even a true sphere but flattened at the poles, geodesists have devised several ellipsoids for mapping the true curved surface of the earth on to a plane, of which the most common are the International, the
The data record the amount of radiation reflected or emitted by features on the earth’s
surface and may be either in analogue (as with aerial photography) or digital form. With
the latter, data are recorded as a series of grid cells (pixels or picture elements) each
coded with a value representing the radiation detected by the sensor from the area of
the earth’s surface covered by the pixel. Simple systems only register a single value
for each of a limited number of wavebands (a range of wavelengths of electromagnetic
radiation) that have been chosen to give as much information as possible about certain
aspects of the earth’s surface such as vegetation, rock and soil minerals, and water. For
example, the scanners on the French SPOT satellite record values for four wavebands
(Band 1: 0.5–0.6 µm, Band 2: 0.6–0.7 µm, Band 3: 0.7–0.8 µm, Band 4: 0.8–1.1 µm) in
order to be able to detect differences in water, vegetation, and rock. Multispectral scan¬
ners now in development record continuous spectra for each pixel and therefore gener¬
ate huge amounts of data.
The spatial resolution, or the area covered by a single pixel, depends on the altitude
of the sensor, the focal length of the lens or focusing system, the wavelength of the
radiation, and other inherent characteristics of the sensor itself. Pixel sizes vary from
a square kilometre for data from meteorological satellites to a few square centimetres
for aircraft-based, high-resolution sensors.
Data collected by remote sensing are affected by atmospheric conditions and irregu¬
larities of the platform, such as tilt and orientation. Geometric and radiometric correc¬
tions are needed prior to data input to the GIS to minimize these distortions. The visual
appearance of the images can be improved by increasing contrast, stretching the range
of grey levels or colours used, and by edge detection, to make it easier to recognize
spatial features.
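A linear contrast stretch, one of the enhancement steps mentioned above, can be sketched as follows; the grey-level values are invented. Each pixel is rescaled so that the darkest input value maps to 0 and the brightest to 255, spreading a narrow range of grey levels over the full display range.

```python
def stretch(pixels, out_min=0, out_max=255):
    """Linearly rescale pixel values to the range [out_min, out_max]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:                       # flat image: nothing to stretch
        return [out_min] * len(pixels)
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]

band = [60, 62, 70, 90, 110]           # a narrow range of grey levels
print(stretch(band))                   # [0, 10, 51, 153, 255]
```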
The UTM divides the world east-west into 60 zones of longitude, each 6° wide. Zones are numbered from west to east, with zone 1 having its western edge on the 180° meridian.

The UTM divides the world north-south into 20 zones of latitude, starting at the equator. Zones are 8° high, except the most northerly and southerly, which are 12° high.

The Easting of the origin of each zone is given a value of 500 000 m. For the southern hemisphere the equator is assigned a Northing of 10 000 000 m; for the northern hemisphere the equatorial value is 0.
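The zone arithmetic of Box 4.2 can be expressed directly; this sketch covers only the longitude zones and their central meridians, not the projection itself.

```python
def utm_zone(lon_deg):
    """UTM longitude zone number (1-60) for a longitude in degrees east."""
    return int((lon_deg + 180) // 6) % 60 + 1

def central_meridian(zone):
    """Longitude of a zone's central meridian, in degrees east."""
    return -180 + 6 * zone - 3

print(utm_zone(0.5))          # 31 (Greenwich lies in zone 31)
print(central_meridian(31))   # 3
```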
Krasovsky, the Bessel, and the Clarke 1880 ellipsoids (Brandenburger and Gosh 1985).

There are three main ways for projecting locations from an ellipsoid onto a plane surface, namely cylindrical projections, azimuthal projections, and conical projections (Maling 1992). The best projection to use depends on the location of the site on the surface of the earth, and cartographers find that cylindrical projections are best for lands between the tropics, conical projections are best for temperate latitudes, and azimuthal projections are best for polar areas. The most widely used, general projection is called the Universal Transverse Mercator (UTM), developed by the US Army in the late 1940s, which is now a world standard for topographic mapping and digital data exchange (see Box 4.2).

Base levels

All elevation data are referenced to mean annual sea levels. These, of course, are not constant over the whole world and may differ by some metres from one side of a continent to the other (e.g. along the Panama canal). All national mapping agencies (NMAs) have defined local base reference levels to suit their conditions, but a problem for international studies is that adjacent countries may have quite different standards.

Georeferencing and GIS

It is essential for GIS and spatial analysis that all data are referenced to the same coordinate system. When working within a single country one usually adopts the coordinate conventions without question, but when building multinational data sets it is very important to ensure cross-border equivalence of ellipsoid, projection, and base level.

Georeferencing raw data

In practice, ground and field surveys are georeferenced in many ways. Ground checks are needed to locate aerial photographs and satellite images correctly. Parcel boundaries for cadastral systems are surveyed accurately on the ground using laser theodolites, as are the locations of infrastructure and utilities (roads, electricity and telephone cables, gas, water, and sewerage pipes). Sources of demographic and socio-economic data such as the census are linked to less accurate, cartographically defined areal units as the basic spatial reference. A similar approach is adopted for data collected by administrative or municipal authorities. In surveys, such as those used in market research or with consumer information such as store cards, postcodes (or zip codes) are used to indicate the
National Mapping Agencies (NMAs), natural resource survey institutes, commercial organizations, and individual researchers are all involved in collecting and disseminating geographical data, both in analogue and digital form. Government agencies are the main collectors, providers, and users of geographical information; of all the data they collect, some 60–80 per cent may be classified as geographical (Lawrence 1997).

Systematic data collectors

Traditionally, government agencies maintain the systematic collection of geographical data. Many countries have national or regional mapping agencies responsible for collecting systematic data on phenomena such as the nature of the terrain, natural resources, human settlements, and infrastructure. In many countries these agencies were, or still are, part of military organizations. The data they collect tend to be general purpose and are employed by a broad range of users. Other surveys are carried out by more specialized agencies that record variables such as land ownership, employment and journey-to-work patterns, soils and geology, rainfall and temperature, river flow and water quality. National mapping agencies have the mandate for defining and maintaining the map
available water capacity, soil depth, and other properties. Kineman (1993) describes a five-year inter-agency project sponsored by the US NOAA and EPA to develop a global ecosystems (spatial) database to help in the investigation of global change that includes data on vegetation cover boundaries, surface climatology, soils, elevation, and terrain. Langas (1997) describes efforts to create a multinational GIS for the Baltic catchment. These databases are aimed at providing information that may be used in a range of applications. In the last few years there have been discussions in many countries concerning the development of integrated national, continental, and global geographical databases. These are in varying stages of implementation and are discussed in more detail in Chapter 12.

In parallel with these developments in the public sector, commercial organizations have started to develop digital geographical datasets to help meet the increasing demand for information. The suppliers tend to develop themed data products aimed at niche markets such as postcode-referenced socio-economic data, transport routes, or elevation terrain representations. Many of the companies use existing data sets collected by other organizations, which when integrated create ‘new’ data with added value (and intellectual property); these data may then be sold as a commercial product free from the copyright restrictions of the original data (Lawrence 1997). For example, it is now possible to buy the details of decadal censuses in several countries from commercial organizations who have taken the published information and combined it with easy-to-use querying and mapping functionality to create new commercial products. Other companies have been able to realize the investment of their own data collection exercises and have sold this information under copyright to a wider audience.
Digital data sets from data suppliers range from small-scale, country- and continent-wide coverages available for nothing on the Internet to expensive, tailored products geared to specialist market sectors (Burrough and Masser 1997). For many local government and business users they provide an essential source of digital data that they can rely on, and mean that these users do not have to cope with the overheads of digitizing or collecting their own data.

Although using existing datasets is attractive, serious attention must be paid to data compatibility when data from different suppliers are combined in one project. There may be differences in projection, scale, base level, and description of attributes that could cause problems. For example, in a survey of desert terrain a nomad is likely to classify the area differently from a geological engineer. Water-bearing capabilities, shade, and grazing productivity are likely to dominate the interpretation of the landscape features by the nomad, whilst economic potential or geological hazards might be the concern of the engineer. Users therefore need to ensure that when using an existing data set, its semantics correspond with theirs. This is not always easy to ascertain, as many data sets do not include any supporting information, known as meta data, about the way the survey was conducted and the information collected.

At a more practical level a user will need to consider the following characteristics of the data to ensure that they are compatible with the application:

• the currency of the data,
• the length of record,
• the scale of the data,
• the georeferencing system used,
• the data collection technique and sampling strategy used,
• the quality of the data collected,
• the data classification and interpolation methods used,
• the size and shape of the individual mapping units.

Where data are used from a number of sources, and particularly where the area of study crosses administrative boundaries, difficulties in data integration are caused by the different geographical referencing systems, data classifications, and sampling strategies of the individual surveys. Users need to be aware of these
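Part of this checklist can be automated by comparing the supporting metadata of candidate datasets before they are combined. The following sketch is illustrative only: the field names and the scale-ratio threshold are assumptions, not taken from any particular GIS or metadata standard.

```python
# Sketch: compare two dataset metadata records for compatibility.
# Field names ("projection", "scale", etc.) are illustrative, not from
# any specific metadata standard.

def compatibility_problems(meta_a, meta_b, max_scale_ratio=2.0):
    """Return a list of warnings for two dataset metadata dictionaries."""
    problems = []
    for key in ("projection", "georeferencing", "classification"):
        if meta_a.get(key) != meta_b.get(key):
            problems.append(
                f"{key} differs: {meta_a.get(key)!r} vs {meta_b.get(key)!r}")
    # Scales are stored as denominators (1:50 000 -> 50000); combining
    # datasets whose scales differ by a large factor is suspect.
    ratio = max(meta_a["scale"], meta_b["scale"]) / min(meta_a["scale"], meta_b["scale"])
    if ratio > max_scale_ratio:
        problems.append(f"scale mismatch: 1:{meta_a['scale']} vs 1:{meta_b['scale']}")
    return problems

soils = {"projection": "UTM-32N", "georeferencing": "grid",
         "classification": "FAO", "scale": 50000}
roads = {"projection": "UTM-32N", "georeferencing": "grid",
         "classification": "national", "scale": 10000}
for warning in compatibility_problems(soils, roads):
    print(warning)
```

A check like this catches only what the metadata record; semantic differences of the nomad-versus-engineer kind still require human judgement.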
The World Wide Web (WWW) is a term sometimes used synonymously with the Internet. It is in fact the main information tool of the Internet, through which data, written using hypertext media onto 'web pages', are accessed. These are available to the connected world through a 'web server' and viewed by users at remote locations using 'web browser' software. For the GIS user the WWW provides data, and is a source of information about the different technologies, about current academic research including on-line journals, and even about employment opportunities.

The Intranet is a local, private linked network which provides all the facilities of the Internet but limits the usage to those directly connected to it. It is used predominantly within organizations for disseminating information and for communication, and because it is separate from the public network, data and connection security are ensured.
(described in more detail later in this chapter) to generate linear and polygonal entities. The resulting mapped images may then be transferred using an exchange format to a general GIS.

New methods of capturing data in a GIS have been brought with the Internet and Intranet (see Box 4.3). Whole libraries of vector, raster, and object data are now being offered on the Internet, as well as directory information on different datasets. To take one example, as of September 1997, Roelof Oddens' Bookmarks (see http://kartoserver.frw.ruu.nl/html/staff/oddens/oddens.htm) listed more than 2,350 sources of new digital maps, more than 50 sources for the Netherlands, 20 interactive routing programs (see Chapter 7) for road navigation, 124 electronic atlases and libraries, 65 on-line map generators, 30 GPS and map projection sites, and many Carto- and Geoservers that provide access to other sites that offer digital data or scanned printed maps, or give direct access to national and international mapping agencies, academic departments, and businesses. More than 200 new sites were added in the period from the end of July to mid-September 1997. Another example, the Internet GIS and Remote Sensing Information site (ftp://ftp.census.gov/pub/geo/gissites.txt), provides a comprehensive list of on-line GIS and remote sensing sites, arranged in alphabetical order. Many of the data are free (for example the 1 km resolution Digital Chart of the World), though increasingly costs are being imposed by the data providers through licensing agreements required to access the datasets. Not all data are up to date; some have historical value, and others provide daily or weekly information.

Web browsing software is used to access data stored at particular provider sites (web servers) and either used 'on-line' using proprietary GIS software, as shown in Figure 4.2d, or downloaded to the user's local computer (known as the client) using a standard transfer format. The proprietary software consists of programs which run on top of web browsers and offer basic GIS functionality such as data integration, spatial browsing, and measurements. Some support the development of spatial querying capabilities within another application; examples include Autodesk MapGuide, Intergraph, and ESRI MapObjects Internet Map Server. At the moment speeds of data transmission and Internet access are the main limits to using this data source.
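Downloading a dataset from a web server to the client amounts to a single HTTP request for a file in a standard transfer format. A minimal sketch using Python's standard library follows; the URL and filename are placeholders, not a real data service.

```python
# Sketch: fetch a dataset file from a web server and save it on the
# client machine. The URL below is a placeholder, not a real service.
import urllib.request

def download_dataset(url, local_path):
    """Copy a remote file to disk and return the number of bytes written."""
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        data = response.read()
        out.write(data)
    return len(data)

# Example (placeholder URL):
# download_dataset("http://example.org/data/elevation.e00", "elevation.e00")
```

Licensing agreements of the kind mentioned above usually sit in front of such a request, typically as authentication that a plain fetch like this does not handle.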
Digitizers A digitizer is an electronic or electromagnetic tablet upon which a map or document is placed (shown in Figure 4.5). Embedded in the table, or located directly under it, is a sensing device that can accurately locate the centre of a pointing device which is used to trace the data points of the map. The most common types currently used are either the electrical orthogonal fine-wire grid or the electromagnetic, and they range in size from 30 x 30 cm (12 x 12 inches) to approximately 1.1 x 1.5 m (40 x 60 inches).

The pointer may be held in a cursor device such as a mouse or a puck, or alternatively in a pen-like appliance; these may be either corded or cordless. Positioning the cursor or pen over a point on the map
Stereoplotters are used extensively in capturing continuous elevation data for digital elevation models and orthophoto maps, and they are described in more detail in Chapter 5.

Entering the attribute data Attribute data (sometimes called feature codes) are those properties of a spatial entity that need to be handled in the GIS, but which are not themselves spatial. For example, a road may be captured as a set of contiguous pixels, or as a line entity, and represented in the spatial part of the GIS by a certain colour, symbol, or data location (overlay). Information describing the type of road (e.g. motorway or dirt track) may be included in the range of cartographic symbols normally available. If other data about the road, such as the width, the type of surface, any specific traffic regulations, the estimated number of vehicles per hour, etc., also need to be recorded, then these have to be included in the GIS. Although these attribute values and associated identifiers may be attached to graphic entities directly on input, it is not efficient to enter large numbers of complex attributes interactively. The data are therefore either stored separately from the spatial information in the GIS, in the case of relational databases, or input along with the spatial description with object-oriented databases.

Attribute data may come from many different sources such as paper records, existing databases, spreadsheets, etc. They may be input into the GIS database either manually or by importing the data using a standard transfer format such as TXT, CSV, or ASCII. Where relational databases are used, an identifier (discussed in more detail in the next section) is included in the attribute record to link the spatial and attribute data together.

Data verification and editing Once the data have been entered it is important to check them for errors—possible inaccuracies, omissions, and other problems—prior to linking the spatial and the attribute data. Figure 4.8 summarizes the complete process of creating a GIS database by hand.

The best way to check for errors in the spatial data is to produce a computer plot or print of the data, preferably on translucent or thin paper, at the same scale as the original. The two maps may then be placed over each other on a light table and compared visually, working systematically from left to right and up and down the map. Missing data, locational errors, and other errors should be clearly marked on the printout. Certain GIS show some topological or identifier errors directly by colour coding the screen image. If the map is a unique drawing, locational errors need only be considered within the map boundary; if the map is one of a series covering a larger area, or the digitized data must link up with map data already in the computer, then the spatial data must also be examined for spatial contiguity across map edges. Certain operations, such as polygon formation, may also indicate errors in the spatial data.

Attribute data may be checked by printing out the files and checking the columns by eye. A better method is to scan the data files with a computer program that can check for gross errors such as text instead of numbers, numbers exceeding a given range, and so on.

Errors that may arise during the capturing of spatial and attribute data (discussed in more detail in Chapter 8) may be grouped as follows:

Spatial data are incomplete or double
• Incompleteness in the spatial data arises through omissions in the input of points, lines, or cells of manually entered data. In scanned data the omissions are usually in the form of gaps between lines where the raster-vector conversion process has failed to join up all parts of a line. Similarly, the raster-vector conversion of scanned data can lead to the generation of unwanted 'spikes'. Digitized lines may also be input more than once, and lines and nodes may not be joined at intersections.

Spatial data are in the wrong place
• Mislocation of spatial data can range from minor placement errors to gross spatial errors. The former are usually the result of careless digitizing; the latter may be the result of origin or scale changes that have somehow occurred during digitizing, possibly as a result of hardware or software faults.

Spatial data are defined using too many coordinate pairs
• As a result of both scanning and digitizing processes, lines in the database may be defined using too many points. These may take up exorbitant amounts of storage space.

Spatial data are at the wrong scale
• If all spatial data are at the wrong scale, then this is usually because the digitizing was done at the wrong scale. With scanned data the problem usually arises during the georeferencing process when incorrect values are used.

Spatial data are distorted
• The spatial data may be distorted because the base maps used for digitizing are not scale correct. Most aerial photographs are not scale correct
Figure 4.8. Creating a topologically correct GIS database by hand: non-spatial attributes and spatial data (from a paper map or field data, linked by unique identifiers) pass through registration to a coordinate system, cleaning up of lines and junctions, weeding out of excess coordinates, correction for distortion, and building of topology; relational tables and entity identifiers are then created and the spatial data linked to the non-spatial data, giving a topologically correct line and polygon database.
over the whole of the image because of tilt of the aircraft, relief differences, and differences in distance to the lens from objects in different parts of the field. All paper maps suffer from paper stretch, which is usually greater in one direction than another. In addition, paper maps and field documents may contain random distortions as a result of having been exposed to rain, sunshine, coffee, beer, and frequent folding. Transformations from one coordinate system to another,
from already digitized points or text entities to the surrounding polygon.

The linkage operation provides an ideal chance to verify the quality of both spatial and attribute data. Screening routines can check that every graphic entity receives a single set of attribute data; they can also check that none of the spatial attributes exceeds its expected range of values, and that no non-sensible combinations of attributes, or of attributes and geographical entities, occur. All geographical entities not passing the screening test may be flagged so that the operator can quickly and easily repair the errors. Correcting attribute errors requires referring back to the original data to check the information and keying in any missed or wrong values.

The programs that link spatial and attribute data may also be used to check that all links have been properly made. The programs should be written in such a way that they flag only the errors. Incorrect links between spatial and attribute data are usually the result of incorrect identification codes being entered in the spatial data during digitizing or scanning processes, or when establishing polygon topology. Editing the spatial data involves correcting any errors in the polygon topology and the identifier codes given to the data. It is much more difficult to spot errors in attribute data when the values are syntactically good but incorrect.

Data structuring

Following the data capture process the data are now in one of two forms—geometrically correct raster data, or topologically and geometrically correct vector data. Modern GIS will accept both kinds of data, but it may be necessary to restructure the spatial data from one form to the other, depending on the application (see Chapters 7 and 8). Vector-to-raster and raster-to-vector conversion have already been discussed earlier in this chapter.

Structuring of the database as a whole is required to optimize the storage requirements and querying performance of the GIS through sensible ordering and data compression techniques, as described in Chapter 3. For example, quadtree encoding of data is supported in at least one commercial GIS, and is carried out using basic menu commands. Efficiency within the database is brought about by indexing the attribute and spatial data or through normalization of the RDBMS (again described in Chapter 3).
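Screening routines of the kind just described are straightforward to write. The sketch below is illustrative only: the identifiers and the attribute range rules are invented for the example, and a real GIS would apply many more checks. It flags entities digitized more than once, entities with no attribute record, orphan attribute records, and attribute values outside their expected range.

```python
# Sketch: screening routines for the spatial-attribute linkage step.
# Identifiers and the attribute range rules are illustrative only.
from collections import Counter

RANGES = {"area_ha": (0.01, 5000.0)}   # expected value range per attribute

def screen(entity_ids, attribute_records):
    """Return a list of flagged problems for the operator to repair."""
    flags = []
    counts = Counter(entity_ids)
    for ident, n in counts.items():
        if n > 1:
            flags.append(f"{ident}: entity digitized {n} times")
        if ident not in attribute_records:
            flags.append(f"{ident}: no attribute record")
    for ident, record in attribute_records.items():
        if ident not in counts:
            flags.append(f"{ident}: attribute record with no entity")
        for field, (low, high) in RANGES.items():
            value = record.get(field)
            if value is not None and not low <= value <= high:
                flags.append(f"{ident}: {field}={value} outside {low}-{high}")
    return flags

polygons = ["P1", "P2", "P2"]                       # P2 digitized twice
records = {"P1": {"area_ha": 12.5},
           "P3": {"area_ha": -4.0}}                 # P3 has no polygon
for flag in screen(polygons, records):
    print(flag)
```

As the text notes, such routines catch only gross and structural errors; syntactically valid but wrong values pass through unflagged.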
Data presentation
Once the database is complete, querying and analysis operations may be undertaken, and these are described in detail in Chapters 7 and 8. The results of data retrieval and analyses need to be either in a form that is understandable to a user or in one that allows data transfer to another computer system. Most GIS include software which supports the production of a range of data output options. Digital or 'computer-compatible' output may be in the form of a file on an optical disk or other storage device that can be subsequently reused or read into another system. Alternatively the data output may involve some form of data transmission over fibre optic or telephone lines, such as the Internet.

Analogue or 'people-compatible' kinds of output are maps, graphs, and tables; the output devices for producing these can be classified into those that produce ephemeral displays on electronic screens and those that produce permanent images on stable base materials such as paper or mylar. The capabilities for producing aesthetically pleasing graphical output have increased in recent years, with many vendors developing special mapping tools which are linked to or integrated within the GIS database. These allow detail such as a map legend, title, orientation indicator (north arrow or coordinate marks), and a scale or scale bar, as well as colour, symbology, and text, to be added easily to the data presentation. They also give choices in displaying the data, such as using smooth isolines, or coloured or shaded choropleth maps. The results are maps of high cartographic quality for output on three types of devices: visual display units (VDUs), plotters, and printers.

Visual display units

The computer screen or visual display unit is the main tool for displaying the results of GIS analysis to people, especially for Internet users who work on-line.
Electronic computer screens come in a wide range of sizes and use conventional cathode ray tube (CRT) technology or arrays of light-emitting diodes (LEDs) to form images on the screen. Within the CRT there are red, green, and blue electron guns which emit beams of light onto coated, light-sensitive particles which make up the screen. When the beam hits one of the particles it emits a coloured light, and the image is built up of a grid of thousands of glowing dots. The beams from the CRT are moved across the screen systematically in horizontal and vertical directions to excite the particles in turn, and the whole screen is scanned or refreshed 60-80 times a second. The LED displays have triple layers, one for each primary colour.

The emission of light from the CRT to a particular part of the screen is controlled by the GIS software and the graphics adapter on the computer. The software sends a signal to the processor, which in turn outputs to the graphics adapter. The latter is a separate card or an addition to the processor board within the computer which controls the signals sent to the CRT and has a small memory capacity for graphics manipulation purposes.

The differences in output resulting from these devices depend on the hardware itself (the VDU and graphics adapter) as well as the GIS display software. They determine the resolution of the screen, which controls the detail that may be displayed, the number of colours used, and the size and scale of the data shown. Often the displays cannot show all the visual detail of a complex database at full resolution, so most systems allow the user to 'zoom in' and display an enlarged part of the database.

Most GIS users will view the results of their analysis as a static map, graph, or table on the screen. Hardcopies of these images may be obtained photographically using special devices, or by committing them to a temporary file in the computer memory (a screen dump) which may subsequently be printed. Recent advances in computer technology, which enable increasing amounts of data to be stored and viewed, permit more complex screen displays in which a series of images may be shown in rapid succession. This dynamic visualization allows the sensation of motion, either of the viewer or of the environment, to be represented, and is particularly useful in GIS modelling applications where changes in the study area are taking place over time. This is a great improvement on previous displays where the values for each successive time step were shown as single static images, making it hard for the user to interpret the results. Very few commercial GIS support dynamic visualization capabilities, and specialist software is needed to generate this type of display.

The perspective, quasi three-dimensional display or block diagram (otherwise known as draping) is a popular method of displaying surficial thematic information in relation to the landforms on which it occurs. The method can be used to provide visual impressions that are impossible to achieve with the usual two-dimensional map format. Examples of perspective displays are given in Plate 1.4 (orthophoto draped over the digital elevation model of the landscape from which it was derived—see Chapter 5), Plates 3.7 and 3.8, and in Figures 5.13, 8.13, and 8.16. Draped images are usually best done in colour because the human eye cannot distinguish sufficient grey scales to make monochrome plots of thematic data fully understandable. The combination of perspective plots and dynamic visualization, as employed in video games, is a powerful means of displaying the results of space-time models in GIS (e.g. Van Deursen and Burrough 1998). Variations within a three-dimensional block of land can be displayed by a range of methods that are marketed by several specialist vendors, but fence mapping and cut-away diagrams are common (e.g. Plates 3.3 and 3.4).

Plotters

Figure 4.10. Large-format map plotter

Plotters are automated draughtsmen: they are output devices for making copies of geographical data on paper or film. They have a paper feeding or holding mechanism, which is usually a roller or flat surface bed, and a drawing 'arm' which houses coloured pens (Figure 4.10). Two-dimensional line images are created by
moving the drawing device and/or the paper horizontally and vertically (x and y). All information is drawn by a series of line-drawing commands given by the software, such as: move drawing device to point XY, set pen down, move pen to X'Y', move pen to X''Y'' . . ., raise pen. The moves to the XY coordinates are given using either absolute or incremental coordinates.

The flexibility and speed of a plotter is determined largely by the amount of preprogrammed information it has for drawing complex shapes such as letters and symbols. 'Smart' plotters will contain preprogrammed, 'hard-wired' character and symbol sets that need only be referenced by simple computer commands. The quality of the plotted image depends largely on the step-size of the motors controlling the drawing device.

Recently there have been major changes in plotter technology which replace the coloured pens with ink or bubble jet drawing devices, formerly found only in the physically smaller printers (see next section). These have the advantage of being faster and quieter, and they allow areal-filled entities as well as linear ones to be drawn quickly and uniformly. They may be used with various media including plain and glossy paper, vellum, and polyester film. Resolutions of 360 dpi for colour and 720 dpi in monochrome are possible. Laser film writers draw maps on photographic materials to even higher levels of accuracy.

Printers

Printers were originally simple raster output devices, such as a line printer or printing terminal, which had to use combinations of different keyboard characters to represent different data values. The original computer printers gave images in which the finest resolution was limited to the size of a physical type, and so were cartographically unappealing with their coarse grid cells. These have been universally superseded by printers which still build up the images by printing values on a line-by-line basis but which use fine black powder (toner) or coloured inks ejected from a print head to show the information. These give output of much higher cartographic quality, with finer resolutions and up to A0 in size. The resolution is determined by the speed of the step movement and the size of the ink jet, with some achieving high-quality output of 720 dpi.

The toner-based printers are known as laser printers and will be familiar to many readers through their use in photocopiers (the xerox process). They work by taking the output from the computer processor and converting it to a laser signal ('on' or 'off') which is temporarily imprinted, using a scanning action, on to an electrically charged drum. The drum is then passed by a container of toner which is itself electrically charged; those points on the drum where the signal is 'on' attract black powder which sticks to form the image. Paper is passed over this drum and the fine powder is transferred on to it. The paper is next passed between two heated plates which cook the toner so that it gives a permanent copy. Recently, coloured laser printers have become available which work on the same principle, but three passes of the laser on to the drum and over the toners are needed for full colour prints.

With inkjet printers the different coloured inks are emitted from fine nozzles in the print head in response to electronic signals from the GIS software. Two main mechanisms have been developed to control the emission. Inkjet printers use electrical methods to transfer ink from cartridges to the paper. Bubblejet printers have a print head which contains a series of nozzles which are approximately one micrometre thick. The ink is released from these nozzles using a deceptively simple mechanism based on the principle that when fluids are heated, bubbles are produced. Each nozzle has a heater which, when pulsed by an electrical current, produces several thousand temperature increases per second. Each of these creates a tiny bubble which exerts pressure within the nozzle, forcing a single, microscopic ink droplet to be ejected. The pressure then drops, a vacuum is created attracting new ink, and the process begins all over again.
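The move/pen-down/pen-up command sequence described above for plotters can be generated directly from a coordinate list. The sketch below uses an invented command syntax for illustration (real plotters use languages such as HP-GL, whose commands differ); it shows both the absolute and the incremental addressing mentioned in the text.

```python
# Sketch: turn a polyline into plotter-style line-drawing commands.
# The command names are illustrative, not a real plotter language.

def polyline_commands(points, incremental=False):
    """Return drawing commands for one polyline, absolute or incremental."""
    (x0, y0), rest = points[0], points[1:]
    commands = [f"MOVE {x0} {y0}", "PEN DOWN"]
    px, py = x0, y0
    for x, y in rest:
        if incremental:
            commands.append(f"DRAW {x - px} {y - py}")   # step relative to last point
        else:
            commands.append(f"DRAW {x} {y}")             # absolute coordinates
        px, py = x, y
    commands.append("PEN UP")
    return commands

for cmd in polyline_commands([(0, 0), (10, 0), (10, 5)]):
    print(cmd)
```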
Data updating
Many geographical data are not inviolate for all time and are subject to change. Few of these changes are so deterministic that they can be performed automatically. For example, political boundaries may change with the whims of parliament, land use and field boundaries may change as the result of reallotment, and soil boundaries change as a result of land improvement or degradation. If these changes in the landscape are not included in the database then its credibility and integrity are undermined; updating is therefore needed. The basic operations just described for editing are used to effect this, so new data are input, existing information is relocated, and new or modified attribute values are introduced. When changes to the spatial structuring are needed, the topology of the dataset will need to be generated again.

Updating is rather more than just modifying an ageing database; it implies resurvey and the processing of new information. Some aspects of the earth's surface, such as rock types, change slowly, and important changes are few, so updating remains a small problem. However, there are other kinds of geographical data where it may be more cost-effective to resurvey completely every few years rather than attempt to update old databases. In comparison, updating the attribute data is trivial, provided the one-to-one links between attribute data records and the spatial entities remain unaltered.
Data storage
Building a digital database is a costly and time-consuming process, and it is essential that the digital map information is transferred from the local disk memory of the computer to a more permanent storage medium where it can be safely preserved. Some databases for topographic, cadastral, and environmental mapping can be expected to have a useful life of 1-25 years, and it is essential to ensure that they are preserved in mint condition.

Digital data are stored on magnetic or optical media; the form is variable and reflects the cost of the media, and where and how often the data are used. In the past data were transferred from a computer's hard disk storage (which was relatively expensive to buy) and archived on media such as 0.5 inch wide 9-track magnetic tapes. These were relatively safe, cheap to use, and reliable because the magnetic particles on the tape do not lose their signals over time. They were stored in a safe area and then loaded onto a tape drive mechanism for reading whenever the data were needed. These portable types of media are known as 'removables'. Today rapid changes in hard disk technology have led to more compact and very much cheaper devices being available. More and more data are now being stored on 'non-removables' (fixed storage media) even when the data are used infrequently on the computer.

Removable storage

Removables are forms of magnetic and optical storage media that may be taken away from the data source computer and used elsewhere. They are used for holding or 'backing up' (for security reasons) data that are currently being used, and for archiving those which are not. The main media available are listed below. The floppy disk and CD ROM are the most familiar products, with most desktop machines providing access drives. The other products require special read-write drives to be fitted and are usually used on the main data storage computers.

Magnetic media are:

• floppy disks—3.5 inch plastic diskettes which contain a flat circular magnetic disk inside and store up to 2.8 megabytes of data, but usually 1.4 megabytes; new ZIP drives provide large-capacity removable magnetic disks holding 100 megabytes to more than 1 gigabyte of data.
• DATs—Digital Audio Tapes, which contain magnetic tape within cassettes or cartridges and come in two main sizes of 4 mm and 8 mm; these can store several gigabytes of data.
• 0.5 inch tapes—large spools of magnetic tape which are stored in metal cases; they were used most extensively ten years ago, though many organizations still use them today, and store about 520 megabytes of data.

Optical media are:

• plastic resin disks, which allow a laser to encode digital data on to them; their most familiar form is used to record music.
• optical disks (synonymously called CDs—compact disks), which are approximately 13 cm (5.25 in) in diameter, are able to store 675 megabytes, and come in a variety of forms which are distinguished by their ability to write data to the disk:
— CD ROMs (Read Only Memory) act as pure storage media, and many data and software are distributed by them;
— WORM optical disks (Write Once Read Many) allow the user to copy the data to the disk only once, after which time the data can only be read from the disk;
— rewritable optical disks are usually held within a plastic housing and allow data to be copied to and from the disk many times—they are effectively large-capacity floppy disks;
— laser disks are large 12 in disks and are usually ROMs.

There are essentially two types of operation used to commit the data to storage on these media. Archiving is the transfer of the data from the computer onto the media, with specialist software or the hardware drives themselves performing compression operations to maximize the amount of information stored. This compression requires the information to be 'restored' prior to using it in the GIS again. The second method, known as Mass Storage Systems (MSS), is now widely used and differs from archiving systems in that the stored data are written in the same format as the original. The data may therefore be accessed directly from the storage media and used in the GIS. Rewritable optical disks are being used increasingly in these operations, and some computers have fittings which are able to hold many of them; these are known as optical juke boxes. The computer is then able to call on any one of these devices to provide data.

Non-removable storage and networks

The falling price of hard disks per megabyte of data has meant that the economies in storage costs formerly made using removables no longer apply. Gigabytes of data may now be stored on a single computer hard disk. The main problem associated with this is sharing the data; users will have either to use only the computer on which the data are stored or to make copies of the data to put on their own. This is obviously inconvenient and leads to a large duplication in data storage, and to difficulties in knowing which is the most current version of the database. Where there are a number of users wanting to access the same data (and software), organizations have used computer networks to minimize these problems.

There are two main types of network—Local Area Networks (LANs) and Wide Area Networks (WANs). LANs are where computers are linked together at one site; WANs are networks which link geographically remote sites. The computers are joined using either wired (copper-based or fibre optic cable) or wireless (radio, microwave, laser, and infrared) links. Wireless links support point-to-point connections and are used mostly with WANs, whereas wired links are used to physically join the computers together in LANs and, where feasible, WANs.

The configuration of computers on a network may be described using the following types:

• Peer-to-peer—two computers of equal power are joined for the sharing of files. This type of network is little used in GIS environments because transferring the large data files involved seriously affects the performance of the computers.
• Client-server—one or more dedicated, powerful computers (servers) are used to store the data and software. The computers linked to the server (the clients) have their own processing power but access data and software from the central store. The server manages file transfer and storage, central printing, and, with some systems, supports database querying and manipulation tasks.
• Central processing systems—these are principally associated with mainframe and mini networks and consist of an extremely powerful central processing computer, which stores all the data and software, and a series of dumb terminals through which the user accesses them. All the processing is done by the main computer.
Data Input, Verification, Storage, and Output
Networks used with GIS are of types 2 and 3 with always made available. Problems are encountered if
a general move these days towards the client-server more than one user is updating or using a version of
approach. This provides efficient data transfer and a database at the same time. There are also problems
printing capabilities although the GIS software is associated with giving a large number of people
usually installed on the clients because of its size. access and the ability to change a very valuable data
Network based storage is not without its problems. resource. Access and usage limits therefore have to
The principal ones are associated with multiple access¬ be imposed.
ing of the data and ensuring the latest version is
Questions
1. Explain how you would assemble the geographical data needed for the following
applications.
• The location of fast food restaurants
• The incidence of landslides in mountainous terrain
• The dispersion of pollutants in groundwater
• An emergency unit (police, fire, ambulance)
• A tourist information system
• The monitoring of vegetation change in upland areas
• The monitoring of movement of airborne pollutants, such as the ¹³⁷Cs deposited
by rain from the Chernobyl accident in 1986.
2. What steps would you take to limit the introduction of errors in (a) the digitizing
and (b) the scanning of spatial data?
3. Look at the computers you have near to you. What devices and media are used for
storing or backing up their data? How would you improve the safety factor for this?
4. What are the most important aspects of a map for communicating information?
What makes a good map? How should you choose colours and grey scales for display¬
ing data?
FIVE
Creating Continuous
Surfaces from Point Data
This chapter explains methods of creating discretized, continuous surfaces for
mapping the variation of attributes over space. Data sources are commonly ob¬
servations at sparsely distributed point samples such as groundwater wells, soil
profiles, meteorological stations or presence/absence data for vegetation or counts
of animals, people or marketing outlets for basic spatial units such as grids or
administrative areas. The results are usually interpolated to regular grids and can
be displayed as colour or grey scale maps or by contour lines. The chapter de¬
scribes spatial sampling strategies and methods of spatial prediction including
global methods of classification and regression and local deterministic interpola¬
tion methods such as Thiessen polygons, inverse distance weighting and thin-plate
splines. Each method is illustrated by worked examples that provide a basic intro¬
duction to the mathematics and demonstrate when the method is applicable.
Digital elevation models and digital orthophoto maps are examined as special cases
of continuous surfaces.
Examples of (a) are the conversion of scanned images (documents, aerial photographs, or remotely sensed images) from one gridded tessellation with a given size and/or orientation to another. This procedure is known generally as convolution.
Examples of (b) are the transformation of a continuous surface from one kind of tessellation to another (e.g. TIN to raster or raster to TIN or vector polygon to raster).
Examples of (c) are the conversion of data from sets of sample points to a discretized, continuous surface. We must distinguish situations with dense sampling networks from those with sparse sampling networks, or data collected along widely spaced transects. Dense sampling networks are common when creating hypsometric surfaces to represent variations in the elevation of the land surface (Digital Elevation Models—DEMs; also known as Digital Terrain Models—DTMs) from aerial photographs and satellite imagery, where the source data are cheap and the attribute can be observed directly. Sparse sampling networks are often imposed by the costs of borings, laboratory analyses, and field surveys—most often, the spatial variation of the attribute of interest cannot be seen and must be derived indirectly. The many uses of continuous surfaces in spatial modelling are explained in Chapter 8.
The continuous surfaces obtained from interpolation can be used as map overlays in a GIS or displayed in their own right. The surfaces can be represented by the data models of contour lines (isopleths), discrete regular grids, or by irregular tiles. The data structures used are regular grids (raster), contour lines, or triangular irregular networks (TINs). Because the interpolated surfaces vary continuously over space, regular gridded interpolations must be represented by a data structure in which each grid cell can take a different value: therefore space-saving raster data structures such as run-length codes and quadtrees cannot easily be used. The original data may be collected as point samples distributed regularly or irregularly in space (and/or time), or they can be taken from already gridded surfaces such as remotely sensed images and images from document scanners. Attributes predicted by interpolation are usually expressed by the same data type as those measured, but some interpolation methods provide means to estimate indicator functions that show a probability that a given value is exceeded, or that a given class may occur.
The rationale behind spatial interpolation and extrapolation is the very common observation that, on average, values at points close together in space are more likely to be similar than points further apart. In general, two observation points a few metres apart are more likely to have the same altitude than points on two hills some kilometres apart. However, classification is also a popular method for predicting values at unsampled locations from estimates of map unit means or the central concepts of taxonomic classes, irrespective of any spatial association between the measured values within the classes. When mean attribute values for ‘homogeneous’ classes or pieces of land have been computed, for example for a given map unit on a geological, soil, land cover, or vegetation map, all information on the short-range variation of the attributes has been lost. Most methods of interpolation attempt to use this local information to provide a more complete description of the way an attribute varies within that area. If this variation can be captured successfully, we may expect that our estimates of the value of any given attribute at unvisited sites will be better than those obtained from class averages alone. The maps so constructed should result in smaller errors when used for subsequent overlay analyses and quantitative modelling in the GIS.
• Stereo aerial photos or overlapping satellite images using photogrammetry.
• Scanners in satellites or aeroplanes and document scanners.
• Point samples of attributes measured directly or indirectly in the field on random, structured, or linear sampling patterns, such as regular transects or digitized contours.
• Digitized polygon/choropleth maps.
Many data for interpolation come from sampling a complex pattern of variation at relatively few points. These measurements are often known as ‘hard data’. When data are sparse, it is very useful to have information on the physical processes or phenomena that may have caused the pattern, which are known as ‘soft information’, and which can assist interpolation. In many cases, however, the physical process is unknown and we must make do with various kinds of assumptions about the nature of the spatial variation of the attribute in question. These include assumptions about the smoothness of the variation sampled, and statistical assumptions concerning the probability distribution and statistical stationarity (see next chapter).
The location of sample points can be critical for subsequent analysis. Ideally, for mapping, samples should be located evenly over the area. A completely regular sampling network can be biased, however, if it coincides in frequency with a regular pattern in the landscape, such as regularly spaced drains or trees, and for this reason statisticians have preferred to have some kind of random sampling for computing unbiased means and variances. Completely random location of sample points has several drawbacks too. First, every point has to be located separately, whereas a regular grid needs only the location of the origin, the orientation, and spacing to fix the position of every point. This is much easier in wooded or difficult terrain, even with GPS. Second, complete randomization can lead to an uneven distribution of points unless very many points can be measured, which is usually prohibited by costs.
Figure 5.1 presents the main options available. A good compromise between random and regular sampling is given by stratified random sampling where individual points are located at random within regularly laid out blocks or strata. Cluster (or nested) sampling can be used to examine spatial variation at several different scales. Regular transect sampling is often used
[Figure 5.1 panels: (a) regular sampling, (b) random sampling, (c) stratified random sampling, (d) cluster sampling, (e) transect sampling, (f) contour sampling]
Figure 5.1. Different kinds of sampling net used to collect spatial data from point locations
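Two of the sampling nets in Figure 5.1 can be sketched in a few lines of Python (a minimal illustration; the function names and coordinates are ours, not the book's):

```python
import random

def regular_grid(x0, y0, spacing, nx, ny):
    """Regular sampling: the origin, orientation (axis-aligned here),
    and spacing fix the position of every point."""
    return [(x0 + i * spacing, y0 + j * spacing)
            for i in range(nx) for j in range(ny)]

def stratified_random(x0, y0, cell, nx, ny, rng=None):
    """Stratified random sampling: one point located at random
    within each regularly laid out block (stratum)."""
    rng = rng or random.Random(0)
    return [(x0 + (i + rng.random()) * cell, y0 + (j + rng.random()) * cell)
            for i in range(nx) for j in range(ny)]

grid = regular_grid(0.0, 0.0, 10.0, 5, 5)        # 25 points at 10 m spacing
strat = stratified_random(0.0, 0.0, 10.0, 5, 5)  # 25 points, one per block
```

Cluster and transect nets can be built the same way by nesting points around centres or spacing them along a line.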
to survey profiles of rivers, beaches, and hillsides, and digitizing contour lines is a common method of sampling printed maps to make digital elevation models.

The support

The support is the technical name used in geostatistics for the area or volume of the physical sample on which the measurement is made. If the sample is a kilogram of soil taken from a soil profile pit, then the support would be approximately 10 × 10 cm in area and about 5 cm thick. If the sample is a litre of groundwater extracted from a sampling tube then the support has the dimensions of the column. Because analytical laboratory methods usually homogenize the samples by grinding or mixing, all internal structure or variation is lost. Therefore the measurement refers to the area or volume of material sampled and not to a larger or smaller area or volume.
When data collected on a given support are used to predict values of the same attributes at unsampled locations then the predictions refer to locations that also have that support, unless procedures such as bulking or spatial averaging are used to relate the observations to larger areas or volumes. These procedures are known collectively as ‘upscaling’ procedures. The simplest upscaling procedure is to collect a bulk sample consisting of several small samples taken within a defined area around the geometrically located sampling ‘point’ and to homogenize this bulked sample before laboratory analysis. For example, if 10 sub-samples were collected and mixed within an area of 10 × 10 m the support would then be a square of the same dimensions.
Enlarging the support by bulked sampling is sensible when the short-range spatial variation of the attribute in question is so large that long-range variations cannot be clearly distinguished. It is also useful when data collected by different methods need to be combined, such as soil or water information and information collected by remote sensors. Most remote sensing scanners collect data which are recorded as single grid values (pixel values) and pixels can vary in size from a few centimetres to several kilometres. The numbers recorded are area-weighted averages of the radiation received, so the pixel area defines the size of the support. Ground measurements, however, are usually made at much smaller locations and will detect variations within the pixels, unless suitably bulked. If the support sizes of both sets of observations are not matched it may be difficult to combine the data sets for modelling or spatial analysis (Burrough 1991a).
In demographic studies the support is often an irregularly shaped area determined by a census district or postcode area. These vary in size and shape (and sometimes over time) so they provide a difficult basis for
interpolation. Frequently the data collected are not measurements of a single attribute such as elevation or barometric pressure, but counts of people or socio-economic indicators expressed on a nominal or ordinal scale. For these kinds of data interpolation may serve to bring data from different kinds of support (postcodes and census districts) to a common base.

Terminology

Throughout this book we shall use the following terminology for data sampled at point locations. The attribute value at a data point is denoted by z(xᵢ), where the subscript i indicates one of a number n of possible measurements that are geographically referenced to the coordinates x of any convenient cartesian grid. A predicted value at an unsampled location is indicated by ẑ(x₀).

Exact and inexact interpolators. An interpolation method that predicts a value of an attribute at a sample point which is identical to that measured is called an exact interpolator. This is the ideal situation, because it is only at the data points that we have direct knowledge of the attribute in question. All other methods are inexact interpolators. The statistics of the differences (absolute and squared) between measured and predicted values at data points, z(xᵢ) − ẑ(xᵢ), are often used as an indicator of the quality of an inexact interpolation method.
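The quality measure just described can be written as a short function (an illustrative sketch; the names are ours):

```python
def interpolation_error(measured, predicted):
    """Mean absolute and mean squared differences z(x_i) - zhat(x_i)
    at the data points; both are zero for an exact interpolator."""
    diffs = [z - zhat for z, zhat in zip(measured, predicted)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    mse = sum(d * d for d in diffs) / n
    return mae, mse

# An exact interpolator reproduces the measurements at the data points:
exact = interpolation_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # (0.0, 0.0)
inexact = interpolation_error([1.0, 3.0], [2.0, 2.0])           # (1.0, 1.0)
```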
Global interpolation
Methods of interpolation can be divided into two groups, called global and local interpolators. Global interpolators use all available data to provide predictions for the whole area of interest, while local interpolators operate within a small zone around the point being interpolated to ensure that estimates are made only with data from locations in the immediate neighbourhood, and fitting is as good as possible.
Global interpolators are mostly used not for direct interpolation but for examining, and possibly removing, the effects of global variations as caused by major trends or the presence of various classes of land that may indicate areas that have different average values. Once the global effects have been taken care of, the residuals from the global variations can be interpolated locally.
Global methods are usually simple to compute and are often based on standard statistical ideas of variance analysis and regression. Classification methods use easily available soft information (such as soil types or administrative areas) to divide the area into regions that can be characterized by the statistical moments (mean, variance) of the attributes measured at the locations within those regions.
Regression methods explore a possible functional relation between attributes that are easy to measure and the attribute to be predicted. These methods can be based only on the geographical coordinates of the sample points (in which case the method is called trend surface analysis), or on some relationship with one or more spatially variable attributes (in which case the empirical regression model is often called a transfer function).
Methods such as Fourier transforms and wavelets are also used for interpolation (particularly in the analysis of remotely sensed imagery—Lillesand and Kiefer 1987; Buiten and Clevers 1993), but as they require large amounts of data at many different levels of resolution they are not considered here.
When spatial data are sparse it is sometimes convenient to assume that the observations are taken from a statistically stationary population (i.e. the mean and variance of the data are independent of both the location and the size of the support). Alternatively, we can decide that these observations sample some coordinated spatial change. If we opt for the former, we automatically select a classificatory approach to spatial prediction, implying that the spatial structure of variation is determined by these externally defined spatial units. If we opt for classification, we can compute our predictions by using well-known, standard analysis of variance (ANOVA) methods.
Classification by homogeneous polygons assumes that within-unit variation is smaller than that between units; the most important changes take place at boundaries. This conceptual model is commonly used in soil and landscape mapping to define ‘homogeneous’ soil units, landscape units, ecotopes, etc., where ‘objects’ like soil mapping units, river terraces, catchment areas, hillsides, breaks of slope, etc. have been recognized as useful features for carrying information about other aspects of the landscape.
The simplest statistical model is the ANOVA model:

z(x₀) = μ + αₖ + ε    (5.1)

where z is the value of the attribute at location x₀, μ is the general mean of z over the domain of interest, αₖ is the deviation between μ and the mean of unit k, and ε is the residual (pooled within-unit) error, sometimes known as noise.
This model assumes that for each class k the attribute values are normally distributed; ideally, each contains a distinct mode. The mean attribute value per class k is μ + αₖ, which is estimated from a set of independent samples that are assumed to be spatially
s = √[(1/n) Σᵢ (zᵢ − m)²]

Obviously it is useful to know s and to ensure that it is as small as possible to obtain the best precision.
The square of the standard deviation is called the variance σ². The variance is a very useful quantity because variances computed from different sources of variation can be added together to combine estimates of uncertainty from different sources. So, if we have a map with pₖ polygons, containing nᵢ observations per polygon, we can compute three variances, namely σ²_T, the total variance of all observations, which is made up of σ²_B, the between-class variance, and σ²_W, the within-class variance (pooled over all classes). These variances are estimated by our sample, which computes:

s²_T = s²_B + s²_W

The value of dividing the data according to the pₖ polygons is indicated by the statistical significance of the variance ratio s²_B/s²_W for the appropriate degrees of freedom. The
[Figure: histogram of a normally distributed variable; horizontal axis in standard deviations from the mean, −3 to +3]
degrees of freedom are given by pₖ − 1 for s²_B and by n − pₖ for s²_W, and can be found in statistical tables of F.
The magnitude of the variation ‘explained’ by the soft data classification into pₖ polygons is given by s²_B/s²_T, which in regression is called R². The larger R², the more of the variation is explained by the pₖ classes, and the more likely that the mean values of each class mₖ will be a good global predictor.
The variation within each polygon can be explored by subtracting the data values zᵢ from the mean mₖ of the polygon in which they are situated.
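The decomposition above can be checked numerically. The sketch below works with sums of squares, which add exactly (the variance estimates in the text divide these by their degrees of freedom); the data values are invented for illustration:

```python
def anova_decomposition(classes):
    """classes: dict mapping map-unit id -> list of observed values.
    Returns SS_T, SS_B, SS_W (total, between-, and within-class sums
    of squares, with SS_T = SS_B + SS_W) and R^2 = SS_B / SS_T."""
    all_vals = [v for vals in classes.values() for v in vals]
    grand_mean = sum(all_vals) / len(all_vals)
    ss_t = sum((v - grand_mean) ** 2 for v in all_vals)
    ss_b = sum(len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
               for vals in classes.values())
    ss_w = sum((v - sum(vals) / len(vals)) ** 2
               for vals in classes.values() for v in vals)
    return ss_t, ss_b, ss_w, ss_b / ss_t

# Two well-separated classes: most of the variation lies between them.
ss_t, ss_b, ss_w, r2 = anova_decomposition({1: [1.0, 2.0, 3.0],
                                            2: [7.0, 8.0, 9.0]})
# ss_t = 58.0, ss_b = 54.0, ss_w = 4.0
```

The class mean, SS_B/SS_T as R², and the relative variance SS_W/SS_T discussed below all follow directly from these three quantities.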
independent. The mean within-class variance is given by the variance of ε, and is assumed to be the same for all classes.
The relative variance (σ²_W/σ²_T), where σ²_W is the pooled within-class variance and σ²_T the total sample variance, is a measure of the goodness of the classification. Both can be estimated when all map units contain more than one point sample. The lower the relative variance, the better the classification. The analysis of variance is set out in Table 5.1. The significance of the success of the classification can be tested with the usual statistical F-test on the variance ratio with degrees of freedom m, n − k.

Assumptions

This approach makes the following assumptions about the spatial variation:
• variations in the value of z within the map units are random and not spatially contiguous;
• all mapping units have the same within-class variance (noise), which is uniform within the polygons;
• all attributes are normally distributed;
• all spatial change takes place at boundaries, which are sharp, not gradual.
Note that these assumptions need not necessarily hold. Some map units or individual occurrences of a given map unit might be internally more variable than others. The assumption of within-class variation being random implies that differences cannot be mapped out at larger mapping scales, which is usually not true: soil maps are made over a wide range of nested scales, with differences being seen at all scales. Data are not necessarily normally distributed and may be distributed lognormally, rectangularly, hyperbolically, or in other ways. In some cases, normalization by computing natural logarithms, logit transformations, or other suitable transformations may be necessary (Box 5.2). When transformation is necessary, it is more sensible to treat each map unit, or indeed each delineation of each map unit, as a separate entity and to compute individual means and standard deviations if there are sufficient data.
Spatial prediction of zinc levels in the topsoil by the ANOVA method will be illustrated using soft information from a flooding frequency map with three classes: 1 frequent flooding (annual), 2 flooding every 2–5 years, and 3 flooding less than every 5 years. As the zinc is carried to the site by polluted river sediment, a reasonable hypothesis is that frequently flooded sites will have greater concentrations of zinc.
Figure 5.3 shows that, as a group, the zinc measurements do not appear to be normally distributed with a single mode. They could be log-normally distributed or they could be a multimodal distribution with different means and variances. To see the effect of
Figure 5.2. (a) Location of sites of example data set, (b) flood frequency classes, and (c) zinc levels
per flood frequency class
Flood frequency class   Mean zinc (ppm)   Standard deviation
1                       769.76            423.17
2                       264.97            176.62
3                       205.77            105.32
When variation in an attribute occurs continuously over a landscape it may be possible to model it by a smooth mathematical surface. There are several ways of doing this: all of them fit some form of polynomial equation to the observations at the data points so that values at unsampled locations can be computed from their coordinates.
The simplest way to model long-range spatial variation is by a multiple regression of attribute values versus geographical location. The idea is to fit a polynomial line or surface, depending on whether our data are in one or two dimensions, by least squares through the data points, thereby minimizing the sum of squares for z(xᵢ) − ẑ(xᵢ). It is assumed that the spatial coordinates (x, y) are the independent variables, and that z, the attribute of interest and the dependent variable, is normally distributed. Also, it is assumed that the regression errors are independent of location, which is often not the case.
As a simple example, consider the value of an environmental attribute z that has been measured along a transect at points x₁, x₂, ..., xₙ. If, apart from minor variation, the value of z increases linearly with location x, its long-range variation can be approximated by the regression model

z(x) = b₀ + b₁x + ε    (5.2)

where b₀ and b₁ are the polynomial coefficients known respectively as the intercept and the slope in simple regression. The residual ε (the noise) is assumed to be normally distributed and independent of the x values.
In many circumstances z is not a linear function of x but may vary in a more complicated way. Quadratic or still higher order polynomial models such as

z(x) = b₀ + b₁x + b₂x² + ε    (5.3)

can be used. By increasing the number of terms it is possible to fit any set of points by a complicated curve exactly, thereby reducing ε to zero.
In two dimensions the polynomials derived by multiple regression on x and y coordinates are surfaces of the form

f{(x, y)} = Σ (b_rs · xʳ · yˢ), summed over r + s ≤ p    (5.4)
of which the first three are:

b₀                                          flat       (5.5)
b₀ + b₁·x + b₂·y                            linear     (5.6)
b₀ + b₁·x + b₂·y + b₃·x² + b₄·xy + b₅·y²    quadratic  (5.7)

The integer p is the order of the trend surface. There are P = (p + 1)(p + 2)/2 coefficients that are normally chosen to minimize

Σᵢ₌₁ⁿ {z(xᵢ) − f(xᵢ)}²    (5.8)

where x is the vector notation for (x, y). So a horizontal plane is zero order, an inclined plane is first order, a quadratic surface is second order, and a cubic surface with 10 parameters is third order.
Finding the coefficients is a standard problem in multiple regression, so the computations are easy with standard statistical packages. Trend surfaces can be displayed by estimating the value of z(x) at all points on a regular grid. Contour threading algorithms can be used to prepare output for a pen plotter. Examples of trend surfaces computed for the untransformed zinc data are given in Figure 5.4.
The advantage of trend surface analysis is that it is a technique that is superficially easy to understand, at least with respect to the way the surfaces are calculated. Broad features of the data can be modelled by low-order trend surfaces, but it becomes increasingly difficult to ascribe a physical meaning to complex, higher-order polynomials. The surfaces are highly susceptible to edge effects, waving at the edges to fit the points in the centre of the area, with the result that second-order and higher surfaces may reach ridiculously large or small values just outside the area covered by the data. Because it is a general interpolator, the trend surface is very susceptible to outliers in the data. Trend surfaces are smoothing functions, rarely passing exactly through the original data points unless these are few and the order of the surface is large. It is implicit in multiple regression that the residuals from a regression line or surface are normally distributed independent errors. The deviations from a trend surface are almost always to some degree spatially dependent; in fact, one of the most fruitful uses of trend surface analysis has been to reveal parts of a study area that show the greatest deviation from a general trend (Burrough et al. 1977; Davis 1986). The main use of trend surface analysis, then, is not as an interpolator within a region, but as a way of removing broad features of the data prior to using some other local interpolator.

The significance of a trend surface. The statistical significance of a trend surface can be tested by using the technique of analysis of variance to partition the variance between the trend and the residuals from the trend. Let n be the number of observations, so there are (n − 1) degrees of freedom associated with the total variation. The degrees of freedom for regression, m, are determined by the number of terms in the polynomial regression equation, excluding the b₀ coefficient.
For a linear regression, z(x, y) = b₀ + b₁x + b₂y, the degrees of freedom for regression m = 2. The degrees of freedom for the deviations from the regression are given by (n − 1) − m. Table 5.4 presents the terms in the analysis of variance; the reader will note that this table is very similar to Table 5.1 for classification.
The regression line or surface can be considered to be analogous to the classes in the usual analysis of variance methods. The variance ratio, or ‘F-test’, estimates whether the amount of variance taken up by the regression differs significantly from that expected for an equivalent number of sites with the same degrees of freedom drawn from a random population. Just as ‘relative variance’ estimated how much of the variance remained after classification, so a regression goodness of fit (R²) may also be calculated as:

R² = MS_d/MS_t    (5.9)

Using trend surfaces to model the concentration of zinc over the floodplain as a continuous surface yields different maps depending on the order of the regression surface chosen (Figure 5.4a–d). The goodness of fit (R²) values show that even the higher-order surfaces with 21 (fifth order) or 28 (sixth order) coefficients do not fully represent all the variation in the data. Indeed, the higher-order surfaces (4th, 5th, 6th—the last two not shown) predict negative values of zinc in the south-eastern corner of the area, which is clearly a serious distortion of reality.

Order:  1     2     3     4     5     6
R²:     .183  .475  .560  .687  .767  .802

Although the R² values improve with the order of the regression, at first sight we have no way of judging whether increasing the order of the polynomial significantly increases the fit of the data. The significance of improvement can also be estimated using an analysis of variance, as shown in Box 5.3. Even if significantly better fits can be obtained with higher-order polynomials, it is not physically sensible to choose a trend surface that has no physical explanation.
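The least-squares fit of Equations 5.4–5.8 can be sketched with NumPy (a minimal illustration, not the book's software; the synthetic data are ours):

```python
import numpy as np

def trend_surface_design(x, y, p):
    """Design matrix with one column per term x^r * y^s, r + s <= p,
    giving P = (p + 1)(p + 2)/2 columns (Eq. 5.4)."""
    cols = [(x ** r) * (y ** s)
            for r in range(p + 1) for s in range(p + 1 - r)]
    return np.column_stack(cols)

def fit_trend_surface(x, y, z, p):
    """Coefficients minimising the sum of squares {z(x_i) - f(x_i)}^2."""
    A = trend_surface_design(x, y, p)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

# A first-order surface recovers an exactly inclined plane:
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)
z = 2.0 + 3.0 * x - 1.5 * y
b = fit_trend_surface(x, y, z, p=1)  # column order here: 1, y, x
```

Evaluating the fitted polynomial on a regular grid of (x, y) then gives the displayable surface described in the text.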
Figure 5.4. Simple global trend surfaces for untransformed zinc data
Box 5.3. ESTIMATING THE SIGNIFICANCE OF USING A HIGHER DEGREE POLYNOMIAL IN TREND SURFACES

source                     sums of squares   degrees of freedom   mean square   variance ratio
Regression degree p + 1    SS_rp+1           m                    MS_rp+1       MS_rp+1/MS_dp+1
Deviation from p + 1       SS_dp+1           n − m − 1            MS_dp+1

As an example, to test for significant improvement in fit from order 2 to order 3 we compute the analysis of variance (source, sums of squares, degrees of freedom, mean square, variance ratio) for each surface. [Numerical tables for the order 2 and order 3 surfaces not recoverable here.]
Both surfaces are significant according to an F-test on the variance ratio. To see if there is a significant improvement when increasing the order of the regression surface from two to three we compute the same analysis for the extra terms.
The variance ratio of 4.27 (degrees of freedom 4, 88) is larger than the 1% tabulated value of 3.56, so we conclude that the third-order surface is significantly better.
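The variance-ratio computation in Box 5.3 can be expressed as a small function (a sketch under our own variable names; it implements the table above, not any particular package):

```python
def improvement_f_ratio(ss_r_low, m_low, ss_r_high, m_high, ss_d_high, n):
    """Variance ratio testing whether raising the trend surface from
    order p to order p + 1 significantly improves the fit.
    ss_r_*: regression sums of squares at each order;
    m_*: regression degrees of freedom; ss_d_high: residual sum of
    squares of the higher-order surface; n: number of observations."""
    extra_df = m_high - m_low                     # extra polynomial terms
    ms_extra = (ss_r_high - ss_r_low) / extra_df  # mean square of the gain
    ms_dev = ss_d_high / (n - m_high - 1)         # residual mean square
    return ms_extra / ms_dev

# With invented numbers: 4 extra terms, 88 residual degrees of freedom.
f = improvement_f_ratio(100.0, 5, 140.0, 9, 880.0, 98)  # -> 1.0
```

The resulting ratio is compared against the tabulated F value for (m_high − m_low, n − m_high − 1) degrees of freedom.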
Consideration of Figures 5.2 and 5.4 shows that for the example data set there is a clear geographical relation between the zinc content of the soil and the distance to the river. From other studies (e.g. Leenaers et al. 1989a, 1989b) it is known that heavy metal pollutants in floodplain soils depend on several factors, the most important of which are the distance to the source (i.e. the river) and the elevation of the floodplain. In this area, coarse polluted sediments are deposited with sand in the river levees and fine polluted sediments settle out in low-lying areas where flooding is frequent and inundation persists for longer periods. Areas less frequently flooded receive a smaller pollutant load, as the example of mapping using flood frequency classes showed. Because the attributes distance to the river and floodplain elevation are cheap to map, it is possible that we could improve the spatial predictions of zinc content if we could derive an empirical model of the relation between zinc and these independent variables.
The regression model is of the form

z(x) = b₀ + b₁P₁ + b₂P₂ + ε    (5.10)

where b₀ ... bₙ are regression coefficients and P₁ ... Pₙ are independent properties. In this case P₁ is distance to river and P₂ is elevation.
Figure 5.5. Scattergrams of independent and dependent terms in regression of zinc on distance to
river and floodplain elevation
Figure 5.6. Results of computing zinc levels by regression from maps of floodplain elevation
and distance to river
Figure 5.5 presents histograms for the independent variables of Distance to the river (DRiver—metres), floodplain elevation (elevation—metres), and the dependent zinc concentration (ppm); the histograms of logarithmically transformed DRiver and zinc are also included. Inspection of Figure 5.5 indicates that it is best to use the logarithmically transformed variables in the multiple regression. Methods of multiple regression are available in most standard statistical packages.

For the 98 data points of the Meuse zinc example, multiple regression of ln(zinc) against elevation and ln(Distance to river) gives the following relation:

ln(zinc) = 10.000 - 0.292(elevation) - 0.333(ln DRiver) + 0.394(resid error)    (5.11)

R² = 0.749

Such a regression model is often called a transfer function (Bouma and Bregt 1989); it can be computed easily in most GIS (see Chapter 7). The source maps can be obtained in several ways and may be in vector or raster format (see Chapters 7 and 8). Figure 5.6 shows the results.

The same procedure may be used with many other sets of independent and dependent variables, such as temperature and altitude, rain with distance from the sea, vegetation composition as a function of moisture surplus, number of clients and income levels, etc. Geographical coordinates and associated attributes can be combined in one regression to use as much information from the data as possible. The most important point to consider is that the regression model makes physical sense. Note that all regression transfer models are inexact interpolators.
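A transfer function of this form can be fitted with any least-squares routine. The sketch below uses synthetic records generated to mimic the published coefficients of equation 5.11; the variable names and the data are illustrative assumptions, not the Meuse observations.

```python
import numpy as np

# Hypothetical sample records: elevation (m), distance to river (m), zinc (ppm),
# generated so that ln(zinc) roughly follows equation 5.11 plus noise.
rng = np.random.default_rng(1)
elev = rng.uniform(5.0, 10.0, 98)
dist = rng.uniform(10.0, 1000.0, 98)
zinc = np.exp(10.0 - 0.292 * elev - 0.333 * np.log(dist)
              + rng.normal(0.0, 0.394, 98))

# Regress ln(zinc) on elevation and ln(distance to river).
X = np.column_stack([np.ones_like(elev), elev, np.log(dist)])
b, *_ = np.linalg.lstsq(X, np.log(zinc), rcond=None)
resid = np.log(zinc) - X @ b
r2 = 1.0 - resid.var() / np.log(zinc).var()
print(b, r2)

# Applying the transfer function to map cells is then just:
predicted_zinc = np.exp(X @ b)
```

In a GIS the same expression would be evaluated cell by cell on the elevation and distance rasters (see Chapter 7).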
All the methods presented so far have imposed an external, global spatial structure on the interpolation. In all cases short-range, local variations have been dismissed as random, unstructured noise. Intuitively, this is not sensible as one expects the value at an unvisited point to be similar to values measured close by. Therefore people have sought local methods of interpolation that use the information from the nearest data points directly. For this approach, interpolation involves (a) defining a search area or neighbourhood around the point to be predicted, (b) finding the data points within this neighbourhood, (c) choosing a mathematical function to represent the variation over this limited number of points, and (d) evaluating it for the point on a regular grid. The procedure is repeated until all the points on the grid have been computed.

The following issues need to be addressed:
• The kind of interpolation function to use.
• The size, shape, and orientation of the neighbourhood.
• The number of data points.
• The distribution of the data points: regular grid or irregularly distributed?
• The possible incorporation of external information on trends or different domains.

We examine all these points in terms of different interpolation functions, specifically:
• Nearest neighbours
• Inverse distance weighting
• Splines (exact and thin plate smoothing) and other non-linear functions (e.g. Laplacian)
• Optimal functions using spatial covariation.

All these local functions smooth the data to some degree in that they compute some kind of average value within a window or search distance. The first three kinds of function are examined below; optimal functions are the subject of Chapter 6.
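Steps (a) to (d) can be sketched as a simple windowed average, one possible choice of mathematical function for step (c); the sample points, grid nodes, and search radius below are invented for illustration.

```python
import math

def window_average(samples, grid_xy, radius):
    """Steps (a)-(d): for each grid node, find the data points within a
    search radius and average them (a simple unweighted local interpolator)."""
    out = []
    for gx, gy in grid_xy:
        # (a) + (b): the neighbourhood is a circle of the given radius.
        near = [z for x, y, z in samples
                if math.hypot(x - gx, y - gy) <= radius]
        # (c) + (d): evaluate the chosen function (here, the mean) at the node.
        out.append(sum(near) / len(near) if near else None)
    return out

samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 30.0), (5.0, 5.0, 100.0)]
print(window_average(samples, [(0.5, 0.0), (5.0, 5.0)], radius=2.0))
# -> [20.0, 100.0]
```

Nodes with no data point inside the window are returned as `None`, which is one reason the size and shape of the neighbourhood matter.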
Thiessen (otherwise known as Dirichlet or Voronoi) polygons take the classification model of spatial prediction to the extreme whereby the predictions of attributes at unsampled locations are provided by the nearest single data point. Thiessen polygons divide a region up in a way that is totally determined by the configuration of the data points, with one observation per cell. If the data lie on a regular square grid, then the Thiessen polygons are all equal, regular cells with sides equal to the grid spacing; if the data are irregularly spaced, then an irregular lattice of polygons results (Figure 5.7). The lines joining the data points show the Delaunay triangulation, which is the same topology as a TIN (Chapter 3).

Thiessen polygons are often used in GIS and geographical analysis as a quick method for relating point data to space; a common, but implicit use of Thiessen polygons is the assumption that the meteorological data for any given site can be taken from the nearest climate station. Unless there are many observations (which usually there are not), this assumption is not really appropriate for gradually varying phenomena like rainfall, temperature, and air pressure because (a) the form of the final map is determined by the distribution of the observations, and (b) the method maintains the choropleth map fiction of homogeneity within borders and all change at borders. As there is only one observation per tile, there is no way to estimate within-tile variability, short of taking replicate observations.

Figure 5.7. An example of a Thiessen polygon net and the equivalent Delaunay triangulation
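The nearest-single-data-point rule behind Thiessen polygons is simple enough to state directly in code; the sample coordinates and zinc values below are invented for illustration.

```python
import math

def thiessen_predict(samples, x, y):
    """Nearest-neighbour (Thiessen) prediction: the value at (x, y) is the
    value of the closest data point, i.e. the point whose tile contains (x, y)."""
    return min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))[2]

# Hypothetical observations: (x, y, zinc ppm)
samples = [(0.0, 0.0, 1200.0), (1.0, 0.0, 300.0), (0.0, 1.0, 550.0)]
print(thiessen_predict(samples, 0.1, 0.2))   # nearest point is (0, 0) -> 1200.0
print(thiessen_predict(samples, 1.0, 0.0))   # at a data point -> 300.0 (exact)
```

Because the prediction at any data point is the observed value itself, the second call illustrates why Thiessen polygons are exact predictors.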
Figure 5.8. (a) data locations, (b) Thiessen polygons, (c) zinc levels per Thiessen polygon
for the zinc data, (d) pycnophylactic interpolation of zinc from the Thiessen polygon data
An advantage of Thiessen polygons is that they can be easily used with qualitative data like vegetation classes or land use if all you need is a choropleth map and do not mind the strange geometrical pattern of the boundaries. Because all predictions equal the values at the data points, Thiessen polygons are exact predictors.

Figure 5.8a-c demonstrates the use of Thiessen polygons for mapping the zinc content of the floodplain soil. Instead of computing averages for externally defined units, the spatial variation of zinc over the floodplain is mapped simply by assigning the measured values to their closest neighbouring cells (Figure 5.8b,c). The Thiessen polygon map suggests that the zinc map obtained from information on flooding frequency polygons masks considerable spatial variation of zinc levels, particularly within the flood frequency classes 2 and 3 (cf. Figure 5.2c).
Figure 5.9. Population counts displayed (a) uniformly for administrative areas and (b) by pycnophylactic interpolation
The pycnophylactic method can be applied to the zinc data or other data sampled at points such as rainfall amounts, by converting the original data into a density function. This is done by computing the total amount of zinc within each Thiessen polygon to obtain a pseudo count of the zinc ‘population’ for each polygon. The resulting pattern (Figure 5.8d) is similar to that obtained by other smooth interpolators (see sections 5.2, 5.3, and Figures 5.9 and 6.6) but, though continuous, the method is obviously not an exact interpolator, in the sense that the values interpolated for individual grid cells differ from the original measurements at the same cell. The highest and lowest values obtained by pycnophylactic interpolation are respectively much greater, and much less, than those actually measured, which for sampled data like zinc could bring problems of interpretation.
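The core of the method, smoothing a density surface while preserving each zone's total, can be sketched as below. This is a minimal sketch assuming a raster of zone identifiers and one total per zone; the two-zone grid, iteration count, and function name are illustrative choices, not Tobler's published algorithm in full.

```python
import numpy as np

def pycnophylactic(zone_id, zone_total, n_iter=100):
    """Sketch of mass-preserving (pycnophylactic) smoothing: repeatedly
    average each cell with its four neighbours, clip negative densities,
    then rescale every zone so that its summed 'mass' is preserved."""
    dens = np.zeros(zone_id.shape)
    for z, total in zone_total.items():
        cells = zone_id == z
        dens[cells] = total / cells.sum()          # start from uniform density
    for _ in range(n_iter):
        padded = np.pad(dens, 1, mode='edge')      # zero-gradient border
        smooth = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                  padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        smooth = np.clip(smooth, 0.0, None)        # no negative densities
        for z, total in zone_total.items():        # restore each zone's total
            cells = zone_id == z
            s = smooth[cells].sum()
            if s > 0:
                smooth[cells] *= total / s
        dens = smooth
    return dens

zone_id = np.array([[0, 0, 1, 1]] * 4)             # two hypothetical zones
dens = pycnophylactic(zone_id, {0: 80.0, 1: 160.0})
print(dens[zone_id == 0].sum(), dens[zone_id == 1].sum())  # totals preserved
```

The rescaling step is what makes the surface 'volume preserving': however much the smoothing redistributes mass within a zone, the zone's pseudo count is restored at every iteration.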
Inverse distance methods of interpolation combine the ideas of proximity espoused by the Thiessen polygons with the gradual change of the trend surface. The assumption is that the value of an attribute z at some unvisited point is a distance-weighted average of data points occurring within a neighbourhood or window surrounding the unvisited point. Typically the original data points are located on a regular grid or are distributed irregularly over an area and interpolations are made to locations on a denser regular grid in order to make a map.

Weighted moving average methods compute

ẑ(x0) = Σ (i = 1 to n) λi · z(xi)   with   Σ (i = 1 to n) λi = 1    (5.15)

where the weights λi are given by φ(d(x0, xi)). A requirement is that φ(d) → 0 as d → ∞, which is given by the commonly used reciprocal or negative exponential functions d^-r, e^-d, and e^-d². The most common form of φ(d) is the inverse distance weighting predictor, whose form is:

ẑ(x0) = Σ (i = 1 to n) z(xi) · di^-r / Σ (i = 1 to n) di^-r    (5.16)

where the x0 are the points where the surface is to be interpolated and the xi are the data points. Because in equation 5.16 φ(d) → ∞ as d → 0, the value for an interpolation point that coincides with a data point must be simply copied over. The simplest form of this is called the linear interpolator, in which the weights are computed from a linear function of distance between sets of data points and the point to be predicted. Inverse distance interpolation is commonly used in GIS to create raster overlays from point data. Once the data are on a regular grid, contour lines can be threaded through the interpolated values and the map can be drawn as either a vector contour map or as a raster shaded map.

Figures 5.10a,b,c show the zinc data interpolated with inverse distance weighting with values of r of 1, 2, and 4. Note that the form of the resulting map is strongly dependent on the value of r. Conclusions drawn about the levels of zinc pollution when r = 1 could be quite different from those when r = 4.

Inverse distance interpolation is forced to be an exact interpolator because it produces infinities when d = 0 (i.e. at the data points), so if the output grid coordinates equal those of a sampling point the observed, unsmoothed values must be copied over. The form of the map also depends on the clustering in the data and on the presence of outliers. Inverse distance interpolations commonly have a ‘duck-egg’ pattern around solitary data points with values that differ greatly from their surroundings, though this can be modified to a certain extent by altering the search criteria for the data points to account for anisotropy. The method has no inbuilt method of testing for the quality of predictions so the map quality can only be assessed by taking extra observations. Note that these must be for the same support as the original observations, though it is arguable that since inverse distance interpolation smooths within a zone proportional to the value of r, this is the true resolution of the interpolation and not the sample support.
Figure 5.10. (a-c) Inverse distance interpolation showing the effect of the weighting parameter
on the results; (d) thin plate spline mapping of zinc
Splines
Before computers were used to fit curves to sets of data points, draughtsmen used flexible rulers to achieve the best locally fitting smooth curves by eye. The flexible rulers were called splines. The flexible rulers were held in place by weights on pegs at the data points while the line was drawn (Pavlidis 1982). The modern equivalent is the plastic coated flexible ruler sold in most office equipment shops. It can be shown that the line
denote the constraints on the spline. When r = 0, there are no constraints on the function; when r = 1 the function is continuous without any constraints on its derivatives. If r = m + 1, the interval x0, xk can be represented by a single polynomial, so r = m is the maximum number of constraints that leads to a piece-wise solution. For m = 1, 2, or 3, a spline is called linear, quadratic, or cubic. The derivatives are of order 1, 2, . . ., m - 1, so a quadratic spline must have one continuous derivative at each knot, and a cubic spline must have two continuous derivatives at each knot. For a simple spline where r = m there are only k + m degrees of freedom. The case of r = m = 3 has particular significance because the term spline was first used for cubic piece-wise polynomial functions. The term bicubic spline is used for the three-dimensional situation where surfaces instead of lines need to be interpolated. Equation 5.18 means that adjacent polynomials have the same value at a break point for a given j = r.

Because of difficulties of calculating simple splines over a wide range of separate sub-intervals, such as might be the case with a digitized line, most practical applications use a special kind of spline called B-splines. B-splines are themselves the sums of other splines that by definition have the value of zero outside the interval of interest (Pavlidis 1982). B-splines, therefore, allow local fitting from low-order polynomials in a simple way.

B-splines are often used for smoothing digitized lines for display, such as the boundaries on soil and geological maps where cartographic conventions expect smooth, flowing lines. Pavlidis notes that complex shapes, such as those occurring in text fonts, can be more economically defined in terms of B-splines than in sets of data points. The use of B-splines for smoothing polygon boundaries, however, can lead to certain complications, particularly when computing areas and perimeters. If the area of a polygon is calculated from digitized data points using the trapezoidal rule (Box 3.3), it will be different from the area that results from smoothing the boundaries with B-splines. Another problem may arise when high-order B-splines are used to smooth sinuous boundaries that also include sharp, rectangular corners (Figure 5.11b).

A problem with using splines for interpolation is whether one should choose the break points to coincide with the data points or to be interleaved with them. Different results for the interpolated spline may result from the two approaches (Figure 5.11c). Note that with splines the maxima and minima do not necessarily occur at the data points.

Figure 5.11. (a) . . . mean local changes; (b) exact splines round off sharp corners; (c) choosing to locate break points at or between data points has a large effect on the resulting spline
Thin plate splines as surface interpolators

Exact splines are commonly used in GIS and drawing packages for smoothing curves and contour lines to improve appearance where it is assumed that exact interpolation is required. When data have been sampled in two or three dimensions, effects of natural variation and measurement errors may be such that an exact spline may produce local artefacts of excessively high or low values. These artefacts can be removed by using thin plate splines, in which the exact spline surface is replaced by a locally smoothed average. As with line interpolation, spline methods of surface interpolation assume that an approximate function should pass as close as possible to the data points while also being as smooth as possible.

When the data contain a source of random error, i.e.:

y(xi) = z(xi) + ε(xi)    (5.19)

where z is the measured value of an attribute at point xi and ε is the associated random error, the spline function p(x) should pass ‘not too far’ from the data values, and so the smoothing spline is the function f that minimizes

Λ(f) + Σ (i = 1 to n) wi [f(xi) - y(xi)]²    (5.20)

The term Λ(f) represents the ‘smoothness’ of the function f and the second term represents its ‘proximity’ or ‘fidelity’ to the data. The weights wi are chosen to be inversely proportional to the error variance:

wi = p / Var[ε(xi)]    (5.21)

where the value of p reflects the relative importance given by the user to each characteristic of the smoothing splines.

Smoothing, thin plate splines are frequently used for interpolating elevation to create digital elevation models where it is necessary to interpolate large areas quickly and efficiently—see Hutchinson 1995, Mitasova et al. 1995. Both texts demonstrate their use for multivariate interpolation of attributes such as annual mean rainfall from a trivariate spline function of longitude, latitude, and elevation.

Using splines for the zinc data

Figure 5.10d shows the variation of zinc as mapped by splines. Note that the spline surface tends to ‘draw up’ the areas around sample sites with large zinc values.

Advantages and disadvantages of splines

Because splines are piece-wise functions using few points at a time, the interpolating values can be quickly calculated. Test data for smooth surfaces show predictions are very close to the values being interpolated, providing the measurement errors associated with the data are small (Mitasova et al. 1995). In contrast to trend surfaces and weighted averages, splines retain small-scale features. Both linear and surficial splines are aesthetically pleasing and quickly produce a clear overview of the data. The smoothness of splines means that mathematical derivatives can easily be calculated for direct analysis of surface geometry and topology (see Chapter 8 and Figure 5.14a). The incorporation of linear parametric sub-models (regression models) makes the interpolation of dependent variables on point supports easy.

Some disadvantages of splines have already been mentioned. Others are that there are no direct estimates of the errors associated with spline interpolation, though these may be obtained by a recursive technique known as ‘jack-knifing’ (Dubrule 1984). The most critical disadvantage may be that thin plate splines provide a view of reality that is unrealistically smooth; in some situations, such as the estimation of attribute values for numerical models, this property could generate misleading results.
We have explained several different kinds of interpolation technique, but how do they compare? Which produces the most consistent results, and when should one method be preferred to another? To give a first insight, Table 5.5 compares all methods in terms of the minimum and maximum interpolated values, and the proportions of the area estimated to be above the thresholds of 500, 1000, and 1500 ppm zinc.
Method                          Minimum       Maximum       Per cent area  Per cent area  Per cent area
                                value (ppm)   value (ppm)   >500 ppm       >1000 ppm      >1500 ppm
Flood frequency
  (untransformed)               206           770           11.96          0.00           0.00
Trend surface—order 1           -98           791           38.53          0.00           0.00
Trend surface—order 2           109           1350          31.88          3.37           0.00
Trend surface—order 3           -13           1256          33.26          3.66           0.00
Trend surface—order 4           -100          1500          28.30          5.88           0.69
Regression on distance
  + elevation                   142           1164          17.82          0.64           0.00
Thiessen polygons               113           1839          28.11          9.65           5.31
Pycnophylactic method           0             2296          28.35          10.15          8.87
Inverse distance r = 1          136           1541          28.92          1.16           0.03
Inverse distance r = 2          114           1827          28.82          3.99           0.61
Inverse distance r = 4          113           1839          28.47          7.05           2.49
Thin plate splines              15            1994          30.44          6.74           1.87
Comparing these results we see that global trend surfaces of orders higher than 3 are unreliable. Trend surfaces are therefore really only useful for describing broad geographical trends (cf. Burrough et al. 1977). Pycnophylactic methods and thin plate splines stretch the maximum and minimum values above and below the recorded values, but inverse distance with r = 2 gives results closest to the original data. The range of predictions of the areas exceeding the three arbitrary limits produced by all these methods is greatest for the values exceeding 1000 ppm; without independent data and a review of the performance of the methods on other data sets it is impossible to say which method is generally the best. The inverse distance and thin plate splines seem to produce the most ‘natural’ looking surfaces, but this may just be a reflection of our cultural preferences, as these surfaces are always much smoother than the underlying reality.
This section examines the digital elevation model (DEM) as a special case of an interpolated, continuous surface that has many uses in GIS. DEMs were originally computed as a precursor of the orthophoto map, but today they have many other applications (Box 5.4), including the eye-catching ability to display spatial data over the underlying landform (e.g. Plate 1.4). This section explains how DEMs are created and how they can be modelled in the computer, leading to derived products such as block diagrams and digital orthophotos. Other products, created from the mathematical derivatives of DEMs, are explained in Chapter 8 as part of the spatial analysis of continuous surfaces.

Methods of representing DEMs

The variation of surface elevation over an area can be modelled in many ways. DEMs can be represented
either by mathematically defined surfaces or by point or line images. Line data can be used to represent contours and profiles, and critical features such as streams, ridges, shorelines, and breaks in slope. In GIS, DEMs are modelled by regular grids (altitude matrices) and triangular irregular networks (TINs). The two forms are inter-convertible and the preference for one or the other depends on the kind of data analysis that needs to be carried out.

Altitude matrices are the most common form of discretized elevation surface. Originally they were derived from quantitative measurements of stereoscopic aerial photographs made on analytical stereoplotters such as the GESTALT GPM-II (Kelly et al. 1977). The DEM was a by-product needed to produce scale-correct orthophoto maps (see below). Alternatively, the altitude matrix can be produced by interpolation from irregularly or regularly spaced data points in the same way as other quantitative data.

Because of the ease with which matrices can be handled in the computer, in particular in raster-based geographical information systems, the altitude matrix has become the most available form of DEM. Britain, Australia, the United States of America, and indeed much of the world are completely covered by coarse matrices derived from 1:250 000 scale topographic maps (grid cells with sides of 1-5 km are available on the Internet). Higher-resolution matrices based on 1:50 000 or 1:25 000 maps and aerial photography are becoming increasingly available for these and other countries.

Figures 5.12 and 5.13 give examples of altitude matrix DEMs. Figure 5.12 shows a DEM with a cell size of 30 m × 30 m. Elevation data may be displayed in several ways—Figure 5.12 shows a grey scale image of simple elevation and a grey scale image of surface aspect (Chapter 8 explains how this is derived). By choosing an appropriate grey scale, the image appears to be illuminated from the north (top), thereby providing a sense of three-dimensional relief. Figure 5.13 shows a similar image from another area, but now the aspect information is ‘draped’ over the DEM to yield a block diagram that portrays the relief naturally. Altitude matrices are the starting-point for deriving much useful information about landform, such as slope, profile convexity, solar irradiance, lines of sight, and surface topology, as explained in Chapter 8.

The Triangular Irregular Network (TIN)

Although altitude matrices are useful for calculating contours, slope angles and aspects, hill shading, and automatic basin delineation (see Chapter 8), the regular grid system is not without its disadvantages. These disadvantages include (a) the large amount of data redundancy in areas of uniform terrain, (b) the inability to adapt to areas of differing relief complexity without changing the grid size, and (c) the exaggerated
Figure 5.12. (Left) Grey-scale altitude matrix (pixel size 30 x 30 m); (right) Aspect map shaded
to enhance effect of shaded relief. Data courtesy A. Skidmore
Figure 5.13. Shaded relief data draped over a block diagram creates a strong impression of the
relief (data courtesy S. M. de Jong)
emphasis along the axes of the grid for certain kinds of computation such as line-of-sight calculations.

The Triangular Irregular Network (or TIN) was designed by Peuker and co-workers (Peuker et al. 1978) for digital elevation modelling that avoids the redundancies of the altitude matrix and which at the same time would also be more efficient for many types of computation (such as slope) than systems of that time that were based only on digitized contours. A TIN is a terrain model that uses a sheet of continuous, connected triangular facets based on a Delaunay triangulation of irregularly spaced nodes or observation points (Figure 5.7). Unlike altitude matrices, the TIN allows extra information to be gathered in areas of complex relief without the need for huge amounts of redundant data to be gathered from areas of simple relief. Consequently, the data capture process for a TIN can specifically follow ridges, stream lines, and other important topological features that can be digitized to the accuracy required. TINs provide efficient, accurate storage of elevation data at the expense of introducing a triangular discretization that may hinder some kinds of spatial analysis, such as the derivation of surface geometry and topology.

TINs are modelled with a topological vector structure similar to those used for polygon networks, and the TIN data structure was explained in Chapter 3. The main difference with vector polygons is that the TIN does not have to make provision for islands or holes. Peuker et al. (1978) demonstrated that the TIN structure can be built up from data captured by manual digitizing or from automated point selection and triangulation of dense raster data gathered by automated orthophoto machines. They have also shown that the TIN structure can be used to generate maps of slope, shaded relief, contour maps, profiles, horizons, block diagrams, and line of sight maps, though the final map images retain an imprint of the Delaunay triangulation (Figure 5.14b). Information about surface cover can be incorporated by overlaying and intersecting the TIN structure with the topological polygon structure used for many thematic maps of discrete variables. The TIN structure has been adopted for at least one commercially available geographical information system.

Converting altitude matrices to TINs

Lee (1991a) has reviewed four different methods for converting gridded elevation data to a TIN. The aim of the conversion, as he saw it, was to extract the smallest possible set of irregularly spaced elevation points that provides a maximum of information about topographic structures such as peaks, ridges, valley bottoms, and breaks of slope. He evaluated three methods (known respectively as the VIP—‘very important points’—method, the HT—hierarchy transform—method, and the DH—drop heuristic—method) and showed that each had different advantages and disadvantages. For example, the VIP method was best at locating critical points (peaks or pits), while the HT method has a very efficient data structure at the expense of producing long, thin triangles. The DH method minimizes the loss of information every time a data point is dropped from the grid, but it requires much larger computing resources. Lee found that the absolute performance of all three methods depends on the choice of parameter values and so was unable to make absolute recommendations as to which algorithm should be used. The choice seems to depend on the user’s aims—detection of extremes, efficient data storage and retrieval, or minimizing loss of information; these are difficult properties to trade off against each other.

Recently Dutton (1996) has proposed an interesting alternative compact method for modelling planetary relief based on recursive polyhedral tesselation of a sphere into equilateral triangular facets. The method is known as the Geodetic Elevation Model of Planetary Relief because it attempts to bring the whole of the earth’s surface into one system. Horizontal coordinates are implicit in the hierarchy of nested triangles and only elevations are stored, using single bit flags to quantize height changes. It is claimed that an entire terrain can be coded using less than one bit of data for each triangular facet.

Converting TINs to altitude matrices

The transformation of elevation data from TIN to raster is achieved by laying a regular grid of the required resolution and orientation over the TIN. Each grid cell is visited in turn, checked to see which are the nearest TIN apices, and a linear or bicubic average of the TIN heights is computed.

Data sources for DEMs

Unlike many other kinds of quantitative data, elevation of the land is easy and cheap to measure. Data sources include direct measurement in the field with theodolite and GPS, stereo aerial photographs, scanner systems in aeroplanes and satellites, and the digitizing of contour lines on paper maps. For systematic mapping, elevation data are derived by the methods
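Linear averaging over the apices of the enclosing TIN facet, as used when converting a TIN to an altitude matrix, amounts to barycentric (plane) interpolation within a triangle; the facet coordinates and heights below are a hypothetical example, not data from the text.

```python
def plane_height(tri, x, y):
    """Linear interpolation of height inside one TIN facet: barycentric
    weights of (x, y) with respect to the triangle's three apices."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3

# Hypothetical facet with apices at heights 100, 200, and 300 m
tri = [(0.0, 0.0, 100.0), (1.0, 0.0, 200.0), (0.0, 1.0, 300.0)]
print(plane_height(tri, 0.25, 0.25))   # -> 175.0
```

A full TIN-to-raster routine would first locate the facet containing each grid cell centre and then apply this function (or a bicubic variant) cell by cell.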
Figure 5.14. Digital elevation models of Canoe Valley generated (a) from splines and (b) from a TIN
(data courtesy of A. de Roo and T. Poiker)
of photogrammetry (ASPRS 1980—Manual of Photogrammetry) from overlapping stereoscopic aerial photographs and satellite imagery. For special purposes, other kinds of scanners can be used, such as airborne laser interferometry for high-accuracy surface measurements (Plates 3.5, 3.6; Hazelhoff, pers. comm.). Sonar scanners mounted on boats, submarines, or hovercraft are also used for surveying the elevation patterns of sea and lake beds, or ground-penetrating radar and seismic technologies for mapping the elevation of sub-surface layers.

As there is often an abundance of data, local, simple (linear) methods of interpolation are often better than complex interpolation methods because they do not need to make assumptions about the spatial interactions and they are quick to compute. When stereo aerial photographs and satellite images are the source of elevation data we have complete coverage of the landscape at the level of resolution of the image. The creation of a DEM is then the extraction of data to create a proper hypsometric surface at a level of resolution appropriate for the application, including the geometric correction and removal of distortion with respect to the chosen spheroid, projection, and orientation. Distortions in the data caused by tilt and wobble in the viewing platform (aeroplane or satellite), variations in land elevation, and atmospheric effects must also be removed to give a good product.

Creating digital elevation models from stereo aerial photography and satellite images

Makarovic (1976) distinguished several methods of photogrammetric sampling for DEMs. Selective sam-
points along the rows and columns is calculated. The second differences are then calculated. These carry …

Figure 5.16. Interpolating from digitized contour lines with a simple search circle can create serious distortions and errors in the interpolated surface (see also Plate 4.5)

The most common line model of terrain is the set of contour lines on printed maps, and digitizing the
contours provides a ready source of data for digital terrain models. Great efforts have been made by National Mapping Agencies to capture them automatically using scanners (see Chapter 4). This is in spite of arguments that digitizing existing contours produces poorer quality DEMs than direct photogrammetric measurements (e.g. Yoeli 1982). Unfortunately, digitized contours are not especially suitable for computing slopes or for making shaded relief models, and so they must be converted to an altitude matrix.

Unsatisfactory results are often obtained when people attempt to create their own DEMs by digitizing contours and then using local interpolation methods like inverse distance weighting and kriging (see next chapter) to interpolate the digitized contours to a regular grid. The problem is not so much the assumptions behind the computation of the interpolation weights as the geometry of the search algorithms. Apart from the problem of assuming that smooth contour lines are true representations of terrain and that locational errors are only caused by paper stretch and digitizer placement, interpolating digitized contours to a regular grid can result in the creation of severe artefacts in the resulting surface. Curiously enough, the more care that is taken to digitize a contour line with many sampled points, the greater the problem.

The reasons for the errors can be seen in Figure 5.16. Accurately digitized contour lines return sets of data points all having the same z value. Estimating a z value for an unsampled location on a grid involves finding a certain number of data points within a given search radius and then computing a weighted average. If all data points have the same z value, the result will also be that very same z value. The net result is that narrow areas bordering each contour zone are all interpolated with the same z value so that each contour is transformed into a ‘padi’ (or rice) terrace. The problem is usually greater in areas of low relief, where contour lines are further apart and the chance that the search algorithm only catches data from one contour line is greatest.

Of course, other errors result from the stepped ‘padi’ DEMs; computing slopes (Chapter 8) often yields a map with unnatural tiger stripes (Plate 4.5). Very often these errors are overlooked, because their origin is not understood. For example, the Walker data set used by Isaaks and Srivastava (1989) appears to be corrupted by errors resulting from interpolating from digitized contours, as evidenced by banding in their figures 5.8, 5.9c, and 5.10c.

A better solution to contour line problems is to use another interpolation method that is designed to deal with the kinds of data yielded by digitizing contours. If this is not available, the best strategy is to thin the digitized contour points to the very minimum, to add extra points to the data set to indicate peaks, ridges,
valley bottoms, and breaks of slope, and to interpolate with a large search window that uses many data points. This will be computationally more demanding but it provides better results than simple interpolation methods.

Ceruti (1980) and Yoeli (1984) described algorithms that were specifically designed for interpolating contour lines to altitude matrices, and Oswald and Raetzsch (1984) presented a system, known as the Graz Terrain Model, for generating discrete altitude matrices from sets of polygons representing contours that have been digitized either manually or by raster scanning, supplemented by drainage and ridge lines. The Graz Terrain Model generates an altitude matrix from digitized contours in several steps. First a grid of an appropriate cell size is overlaid on the digital image of the contour envelopes (contour polygons) and the ridge and drainage lines. All cells lying on or immediately next to a contour line are assigned the height value of that contour. All other cells are assigned a value of −1. These other cells are assigned a height value in the following step, which is a linear interpolation procedure working within rectangular subsets or windows of the raster database. Interpolation usually takes place along four search lines oriented N-S, E-W, NE-SW, NW-SE. The interpolation proceeds by computing the local steepest slope for the window as a simple function of the difference between the heights of cells that already have a height value assigned. For each window the slopes are grouped in four classes. Beginning with the class of steepest slope, unassigned cells within the window are assigned a height; the procedure is repeated for the other slope classes, excepting flat areas, which are computed separately after all steep parts of the DEM have been computed. Oswald and Raetzsch (1984) claim that this interpolation method, the ‘sequential steepest slope algorithm’, is a robust and useful technique. Note that commercial GIS may include procedures for interpolating from contours but do not disclose the algorithms, so the user has no idea of the quality of the result. Carrara et al. (1997) report a comparison of several widely available methods for generating DEMs from digitized contours in which commercial methods performed well.

Geometry correction for altitude matrices and other raster data

All scanned imagery collected from airborne and space platforms contains geometrical distortions because of the curvature of the earth, tilt and wobble in the platform, atmospheric effects, and the variation in elevation of the land surface. Scanned paper maps contain distortions induced by paper stretch and folding. All these sources of error need to be removed before the data can be correctly registered to an accurate geometrical reference base.

The methods used to correct the geometry of a discretized surface, possibly to produce in addition one with a different level of resolution, cell size, or orientation from the original, are collectively known as convolution. The problem is solved by assuming that the original gridded data are discrete samples from a continuous statistical surface. A set of values for a new tessellation is obtained by using a local interpolation technique to convert data from the original array to the new. For large raster arrays this is a very intensive computing problem because every output pixel must be computed separately using data from its neighbours. Consequently, warping or transforming large raster images used to take considerable amounts of computer time, but technological developments in computer processors and memories have reduced the time required to rotate a large raster array from several hours to a fraction of a second. Warping involves two separate processes. The first is the computation of the addresses of the output pixels relative to the input data matrix. The general equation for a first-order warp (translation, rotation, scale) given by Adams et al. (1984) is:

u = a0 + a1 a2 x + a1 a3 y
v = b0 + b1 b2 x + b1 b3 y     5.22

where x, y are the original pixel coordinates, u, v are the new coordinates, a0 and b0 are translation values, a1 and b1 are x and y scale values respectively, and a2, b2 and a3, b3 are dependent on the angle of rotation θ as given by

a2 = cos θ   b2 = −sin θ
a3 = sin θ   b3 = cos θ     5.23

For warps that are not coplanar, such as in the problem of fitting satellite imagery to the curved surface of the earth to match a conventional map projection such as the Universal Transverse Mercator, higher-order warps must be used. Several methods are used for the local interpolation. The simplest, and most limited, is to interpolate the cell value on the warped surface from its closest neighbour on the original surface. A better alternative is a bilinear interpolator in which the new value is computed from the four input pixels surrounding the output pixel (Figure 5.17a). The best interpolator is probably the cubic convolution
Figure 5.17. (a) Bilinear interpolation from four input pixels to compute a new value at cell 0; (b) cubic convolution using the sinc function (sin x / x)
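The two warping steps described here, computing new pixel addresses with equations 5.22 and 5.23 and resampling with a bilinear interpolator as in Figure 5.17a, can be sketched as follows. This is a minimal illustration in Python with NumPy; the function and variable names are ours, not taken from any GIS package.

```python
import numpy as np

def first_order_warp(x, y, a0=0.0, b0=0.0, sx=1.0, sy=1.0, theta=0.0):
    """New coordinates (u, v) of pixel (x, y) under a first-order warp
    (translation a0, b0; scales a1 = sx, b1 = sy; rotation theta),
    following equations 5.22 and 5.23."""
    a1, b1 = sx, sy
    a2, b2 = np.cos(theta), -np.sin(theta)
    a3, b3 = np.sin(theta), np.cos(theta)
    u = a0 + a1 * a2 * x + a1 * a3 * y
    v = b0 + b1 * b2 * x + b1 * b3 * y
    return u, v

def bilinear(grid, u, v):
    """Interpolate a value at fractional position (column u, row v) from
    the four input pixels surrounding it (cf. Figure 5.17a)."""
    j0, i0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - j0, v - i0
    return ((1 - du) * (1 - dv) * grid[i0, j0]
            + du * (1 - dv) * grid[i0, j0 + 1]
            + (1 - du) * dv * grid[i0 + 1, j0]
            + du * dv * grid[i0 + 1, j0 + 1])
```

A full warp visits every output pixel, uses the transform to find its address in the input matrix, and then resamples; nearest-neighbour interpolation simply rounds u and v instead of weighting the four neighbours.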
(Figure 5.17b), which uses a neighbourhood of 16 pixels and a weighted-sum approach based on a two-dimensional version of the sinc (sin x / x) function.

Digital orthophotos

Chapter 8 explains how many useful products can be derived from altitude matrices, but perhaps the most useful digital cartographic product that can be obtained from a DEM is the modern digital orthophoto. Digital orthophoto maps are being used increasingly to provide geometrically correct, highly detailed photographic images for deriving data on land use and land cover, and as an informative background to data on utilities, municipal administration, and environmental studies (Plates 1.1-1.4). The orthophoto is a photomap that is geometrically correct in the same way that a topographical map
is geometrically correct with respect to map scale and projection. Orthophoto maps are photogrammetric products that were designed initially to reduce the costs and to speed up large-scale topographic mapping in areas where full ground surveys had not been carried out, where there were financial constraints on producing full topographic coverage at large map scales, or for updating maps at short notice. Unlike topographic maps, in which the terrain features are depicted in standard codes and symbols on a scale-correct base map using a standardized map projection and map legend, orthophotos are aerial photo mosaics that have been geometrically corrected to a standard scale and projection. Orthophoto map scales are usually larger than 1 : 25 000. Orthophoto maps and digital products may carry a limited amount of topographic information such as contour lines, administrative boundaries, and thematic data, which are overprinted on the image. Until recently, orthophotos were only available as paper photomaps, but now they can be provided as very high-resolution raster digital files in 32-bit colour (see Plates 1.1-1.4).

Remotely sensed satellite images (Thematic Mapper, SPOT) can also be corrected for scale and projection and used as surrogates for topographic maps. Because of the coarser spatial resolution the resulting maps are usually at smaller scales than orthophoto maps. Satellite images are currently used in GIS and remote sensing systems to provide background information for thematic data. Analysis of the spectral information in the satellite images can also supply useful and important information about land use, vegetation, soil conditions, hydrology, and so on. Image analysis techniques are used to help identify and classify geographical objects from the spectral information in the images. Very similar analysis techniques can also be used with spectral data obtained from airborne sensors of non-visible radiation (e.g. radar, infra-red), which may give better spatial resolution than the satellite images. Remote sensing is often used as an affordable means of obtaining information about temporal changes on the earth's surface. Remotely sensed images and their classified products are usually primarily available as digital databases; they can also be printed on paper or film using high-quality film writers.

The main difference between the orthophoto map (or satellite image map) and the topographic map is that the former contains photographic information on all aspects of the landscape that are visible at the level of resolution used, while the topographic map presents structured, classified information about selected aspects of the landscape. The orthophoto map is easier to use for orientation and object recognition; the topographic map is a rapid key to the structure, location, and connectivity of predefined objects.

Digital orthophoto maps are produced by scanning black-and-white or colour aerial photographs at a resolution varying between 200 µm and 10 µm. The images are corrected for distortion by mathematical correction methods using information from ground control points (at least six per aerial photo), camera optics, colour balance, and terrain elevation differences, which requires an accurate DEM. The photomosaic is made by merging the corrected digital images from adjacent photographs, adjusting them for differences in grey-scale intensity or in colour balance. Topographic information on administrative boundaries and other features can be added from the GIS database or digitized by hand, and the digital orthophoto can be distributed on CD-ROM or as a hardcopy made with a laser film writer.

Full-colour digital orthophoto images are stored with a data accuracy of 32 bits per pixel. The volume of data can be reduced enormously by recoding the image using only 8 bits per pixel and by image compression techniques, which means that orthophoto images can be easily viewed on personal computers or distributed on the Internet, and can be incorporated as a data layer in most geographical information systems.

Digital orthophoto maps can be produced at a wide range of scales or, perhaps better, levels of resolution. As a rule of thumb, the scale of the map is usually reckoned to be about a maximum of three times the scale of the aerial photographs. Scales of 1 : 1000-1 : 3500 are used for urban areas; scales of 1 : 50 000 can be used for regional mapping.

Updating orthophotos. Producing the DEM for the geometric correction of aerial photographs is the most expensive part of making digital orthophotos. However, since in most cases the shape of the landscape does not change over time, the DEM does not need to be recomputed every time new aerial photography is flown, and the existing surface can be used to support the correction of new photography. When the digital data are in a GIS it is very easy to compare the original situation (e.g. land use) with the new situation, and quantitative estimates of change can easily be made. In certain situations, such as open-cast mining, the erosion of dunes on the coast, or civil engineering cut-and-fill operations for construction, DEM updating allows volume changes to be computed quickly and accurately. If imagery is available for several years the whole process of change can be studied, which could be of value in coastal or fluviatile areas.

Much structured topographic data can be collected directly from aerial photographs, particularly for large- to medium-scale applications. Structured data on invisible aspects of the landscape, such as buried pipelines and surveyed parcel boundaries, must be collected by field survey. Modern GIS permit structured vector data to be viewed and analysed with raster data, such as scanned data from satellite images or digital orthophotos. Consequently the structured topographic data, which shows an abstract view of the world, can be seen in its proper context. This can be useful for checking on data quality and correctness and for seeing the relations between structured data and other aspects of the location.
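The volume-change computation mentioned above reduces to a cell-by-cell difference between the old and new altitude matrices, summed and scaled by the cell area. A minimal sketch (the two small DEMs and the cell size are invented for illustration):

```python
import numpy as np

def volume_change(dem_old, dem_new, cell_size):
    """Net volume change (cut negative, fill positive) between two DEMs
    on the same grid, in cubic units of the elevation and cell size."""
    diff = np.asarray(dem_new, dtype=float) - np.asarray(dem_old, dtype=float)
    return diff.sum() * cell_size ** 2

# e.g. a 2 m grid where one cell has been excavated by 1.5 m:
old = np.array([[10.0, 10.0], [10.0, 10.0]])
new = np.array([[10.0, 10.0], [8.5, 10.0]])
print(volume_change(old, new, cell_size=2.0))   # -6.0 cubic metres
```

In practice the two DEMs must first be registered to the same grid and geometry, exactly as described above for raster warping.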
Questions
1. Compare and contrast the different methods for creating spatial classifications in
physical sciences such as hydrology, soil science, and geology with those encountered
in the human sciences, such as demography, epidemiology, and politics.
2. Explore the assumptions inherent in the ANOVAR method and how these affect
the integrity of the classification method of spatial prediction.
3. Explain the assumptions behind trend surface analysis and show how these may
seriously affect the quality of the results. Consider residual errors, outliers, assump¬
tions of stationarity (homogeneity of variation) and independence. Does a large value
of R² necessarily mean that unsampled points will be predicted correctly? How would
you check the quality of the predictions independently?
4. Discuss the kinds of spatial process encountered in natural and social sciences
(physical and human geography, ecology, hydrology and meteorology, demography,
etc.) that lead to spatial variation that can be modelled by a global trend or regression
surface.
5. Explain how you would set up a method of objective testing that would compare
the predictive quality of different interpolation techniques, such as inverse distance
weighting versus thin-plate splines or regression surfaces.
6. Explain why the TIN may be unsuitable for modelling the continuous variation of
physical attributes other than elevation measured at point locations.
131
SIX
Optimal Interpolation
Using Geostatistics
When data are abundant, most interpolation techniques give similar results. When
data are sparse, however, the assumptions made about the underlying variation
that has been sampled and the choice of method and its parameters can be crit¬
ical if one is to avoid misleading results. Geostatistical methods of interpolation,
popularly known as kriging, attempt to optimize interpolation by dividing spatial
variation into three components: (a) deterministic variation (different levels or
trends) that can be treated as useful, soft information, (b) spatially autocorrelated,
but physically difficult to explain variations, and finally (c) uncorrelated noise. The
character of the spatially correlated variation is encapsulated in functions such
as the autocovariogram and (semi) variogram, and these provide the information
for optimizing interpolation weights and search radii. Experimental variograms
are computed from sample data in one, two, or three spatial dimensions. These
experimental data are fitted by one of a limited number of variogram models,
which serve to provide data for computing interpolation weights.
Geostatistical methods provide great flexibility for interpolation, providing
ways to interpolate to areas or volumes larger than the support (block kriging),
methods for interpolating binary data (indicator kriging), and methods for incor¬
porating soft information about trends (universal kriging) or stratification (strati¬
fied kriging). All these methods of interpolation yield smoothly varying surfaces
accompanied by an estimation variance surface. In contrast to smooth interpola¬
tors, the methods of conditional simulation, given the variogram and the original
observations, provide the best estimators of data values at unsampled points.
The resulting surfaces need not be smooth, but have lower estimation errors
than kriging predictions. Combining soft information and conditional simulation
is useful for computing data for raster-based environmental models.
Finally, the information in the variogram can be used to help optimize sampling
schemes for mapping from point data.
When comparing all the maps made in Chapter 5 of the zinc concentration in the floodplain soils we see that the different methods return large differences in the global patterns, the amount of local detail, and in the minimum and maximum values predicted. These differences were summed up in Table 5.5 in terms of the estimates given by the various methods of the percentages of the area thought to be above 500, 1000, and 1500 ppm zinc respectively. Clearly, if we were faced with all these different results and must make a decision on the costs of cleaning up soil with more than a certain level of zinc, it would be very difficult to choose. None of the methods of interpolation discussed so far can provide direct estimates of the quality of the predictions made in terms of an estimation variance for the predicted value at unsampled locations. In all cases, the only way to determine the goodness of the predictions would be to compute estimates for a set of extra validation points that had not been used in the original interpolation. Apart from research studies to judge the quality of performance of a technique (e.g. Laslett and McBratney 1990a, 1990b), this is rarely done because it costs money.

A further objection to all methods so far is that there is no a priori method of knowing whether the best values have been chosen for the weighting parameters or if the size of the search neighbourhood is appropriate. As we have seen, the control parameters of trend surfaces and inverse distance weighting can be varied to produce quite different maps which in turn provide different estimates of the distribution of the variable being mapped. Moreover, no method studied so far provides sensible information on:

• the number of points needed to compute the local average,
• the size, orientation, and shape of the neighbourhood from which those points are drawn,
• whether there are better ways to estimate the interpolation weights than as a simple function of distance,
• the errors (uncertainties) associated with the interpolated values.

These questions led the French geomathematician Georges Matheron and the South African mining engineer D. G. Krige to develop optimal methods of interpolation for use in the mining industry. The methods are now being increasingly used in groundwater modelling, soil mapping, and related fields, and packages for geostatistical interpolation are becoming important modules of commercial GIS.

Geostatistical methods for interpolation start with the recognition that the spatial variation of any continuous attribute is often too irregular to be modelled by a simple, smooth mathematical function. Instead, the variation can be better described by a stochastic surface. The attribute is then known as a regionalized variable; the term applies equally to the variation of atmospheric pressure, elevation above sea level, or the distribution of continuous demographic indicators. Interpolation with geostatistics is known as kriging, after D. G. Krige.

Geostatistical methods provide ways to deal with the limitations of deterministic interpolation methods listed above, and ensure that the prediction of attribute values at unvisited points is optimal in terms of the assumptions made. An optimal policy is a rule in dynamic programming for choosing the values of a variable so as to optimize the criterion function (Bullock and Stallybrass 1977). The interpolation methods developed by Matheron, Krige, and their co-workers are optimal in the sense that the interpolation weights are chosen so as to optimize the interpolation function, i.e. to provide a Best Linear Unbiased Estimate (BLUE) of the value of a variable at a given point. The same theory can be used for optimizing sample networks.

Regionalized variable theory assumes that the spatial variation of any variable can be expressed as the sum of three major components (Figure 6.1). These are (a) a structural component, having a constant mean or trend; (b) a random, but spatially correlated component, known as the variation of the regionalized variable; and (c) a spatially uncorrelated random noise or residual error term. Let x be a position in 1, 2, or 3 dimensions. Then the value of a random variable Z at x is given by

Z(x) = m(x) + ε′(x) + ε″     6.1

where m(x) is a deterministic function describing the ‘structural’ component of Z at x, ε′(x) is the term denoting the stochastic, locally varying but spatially dependent residuals from m(x), the regionalized
Figure 6.1. Regionalized variable theory divides complex spatial variation into (i) average
behaviour such as differences in mean levels (upper) or a trend (lower), (ii) spatially correlated, but
irregular (‘random’) variation, and (iii) random, uncorrelated local variation caused by measurement
error and short range spatial variation
variable, and ε″ is a residual, spatially independent Gaussian noise term having zero mean and variance σ². Note the use of the capital letter to indicate that Z is a random function and not a measured attribute z.

The first step is to decide on a suitable function for m(x). In the simplest case, where no trend or drift is present, m(x) equals the mean value in the sampling area and the average or expected difference between any two places x and x + h separated by a distance vector h will be zero:

E[Z(x) − Z(x + h)] = 0     6.2

where Z(x), Z(x + h) are the values of random variable Z at locations x, x + h. Also, it is assumed that the variance of differences depends only on the distance between sites, h, so that

E[{Z(x) − Z(x + h)}²] = E[{ε′(x) − ε′(x + h)}²] = 2γ(h)     6.3

where γ(h) is known as the semivariance. The two conditions, stationarity of difference and variance of differences, define the requirements for the intrinsic hypothesis of regionalized variable theory. This means that once structural effects have been accounted for, the remaining variation is homogeneous in its variation so that differences between sites are merely a function of the distance between them. We can rewrite equation 6.1 as:

Z(x) = m(x) + γ(h) + ε″     6.4

in order to show the equivalence between ε′(x) and γ(h).

If the conditions specified by the intrinsic hypothesis are fulfilled, the semivariance can be estimated from sample data:

γ(h) = (1/2n) Σᵢ₌₁ⁿ {z(xᵢ) − z(xᵢ + h)}²     6.5

where n is the number of pairs of sample points of observations of the values of attribute z separated by distance h (see Box 6.1). A plot of γ(h) against h is known as the experimental variogram. The experimental variogram is the first step towards a quantitative description of the regionalized variation. The variogram provides useful information for interpolation, optimizing sampling, and determining spatial patterns. To do this, however, we must first fit a theoretical model to the experimental variogram.
Covariance(1) = [(1−3)(3−3) + (3−3)(6−3) + (6−3)(5−3) + (5−3)(3−3) + (3−3)(1−3) + (1−3)(2−3) + (2−3)(3−3)]/7 = [0 + 0 + 6 + 0 + 0 + 2 + 0]/7 = 8/7 = 1.14

Semivariance(1) = [(1−3)² + (3−6)² + (6−5)² + (5−3)² + (3−1)² + (1−2)² + (2−3)²]/7 = [4 + 9 + 1 + 4 + 4 + 1 + 1]/7 = 24/7 = 3.43
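The arithmetic of this worked example can be reproduced directly. The sketch below (our own code, not from the book) computes the lag-1 covariance and, following equation 6.5, the semivariance γ(h) with its factor 1/2n; note that the figure 3.43 above is the mean squared difference, i.e. 2γ(1):

```python
import numpy as np

# The transect of the worked example: values 1, 3, 6, 5, 3, 1, 2, 3 (mean 3)
z = np.array([1, 3, 6, 5, 3, 1, 2, 3], dtype=float)

def lag_stats(z, lag):
    """Covariance and semivariance (equation 6.5) at an integer lag
    along a one-dimensional transect of regularly spaced samples."""
    m = z.mean()
    head, tail = z[:-lag], z[lag:]
    n = len(head)                                # number of pairs
    cov = np.sum((head - m) * (tail - m)) / n
    gamma = np.sum((head - tail) ** 2) / (2 * n)
    return cov, gamma

cov, gamma = lag_stats(z, 1)
print(round(cov, 2))        # 1.14, as in the covariance line above
print(round(2 * gamma, 2))  # 3.43, the mean squared difference at lag 1
```

Repeating the call for lag = 2, 3, … gives the experimental variogram as a table of γ(h) against h.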
Figure 6.2 shows a typical experimental variogram of data from a not too smoothly varying attribute, such as a soil property. The curve that has been fitted through the experimentally derived data points displays several important features. First, at large values of the lag, h, it levels off. This horizontal part is known as the sill; it implies that at these values of the lag there is no spatial dependence between the data points because all estimates of variances of differences will be invariant with sample separation distance. Second, the curve rises from a low value of γ(h) to the sill, reaching it at a value of h known as the range. This is the critically important part of the variogram because it describes how inter-site differences are spatially dependent. Within the range, the closer sites are together the more similar they are likely to be. The range gives us an answer to the question posed in weighted moving average interpolation about how large the window should be. Clearly, if the distance separating an unvisited site from a data point is greater than the range, then that data point can make no useful contribution to the interpolation; it is too far away.

The third point shown by Figure 6.2 is that the fitted model does not pass through the origin, but cuts the y-axis at a positive value of γ(h). According to equation 6.5 the semivariance is zero when h = 0, because the difference between a point and itself is by definition zero. The positive value of γ(h) as h → 0 is an estimate of ε″, the residual, spatially uncorrelated noise. ε″ is known as the nugget; this is the variance of

Figure 6.2. An example of a simple transitional variogram with range, nugget, and sill
measurement errors combined with that from spatial variation at distances much shorter than the sample spacing, which cannot be resolved.

The form of the variogram can be quite revealing about the kind of spatial variation present in an area, and can help to decide how to proceed further. When the nugget variance is important but not too large, and there is a clear range and sill, a curve known as the spherical model can be fitted. The exponential model

γ(h) = c0 + c1{1 − exp(−h/a)}     6.7

is often a good choice. If the variation is very smooth and the nugget variance ε″ is very small compared to the spatially dependent random variation ε′(x), then the variogram can often best be fitted by a curve having an inflection, such as the Gaussian model,

γ(h) = c0 + c1{1 − exp(−h²/a²)}     6.8

Figure 6.3. Examples of the most commonly used variogram models: (a) spherical; (b) exponential; (c) linear; and (d) Gaussian
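Equations 6.7 and 6.8, together with the spherical model of Figure 6.3a, can be written down directly as functions of the lag h, with nugget c0, structural variance c1, and range parameter a. This is a sketch; the spherical formula used here is the standard one from the geostatistical literature.

```python
import numpy as np

def exponential(h, c0, c1, a):
    """Exponential model, equation 6.7: gamma(h) = c0 + c1*(1 - exp(-h/a))."""
    return c0 + c1 * (1.0 - np.exp(-h / a))

def gaussian(h, c0, c1, a):
    """Gaussian model, equation 6.8: gamma(h) = c0 + c1*(1 - exp(-h**2/a**2))."""
    return c0 + c1 * (1.0 - np.exp(-(h ** 2) / a ** 2))

def spherical(h, c0, c1, a):
    """Spherical model (Figure 6.3a): reaches the sill c0 + c1 exactly at
    the range h = a, unlike the two asymptotic models above."""
    h = np.asarray(h, dtype=float)
    inside = c0 + c1 * (1.5 * (h / a) - 0.5 * (h / a) ** 3)
    return np.where(h < a, inside, c0 + c1)
```

In all three the sill is c0 + c1 and the intercept at h → 0 is the nugget c0.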
steep with h indicates a trend in the data that should be modelled separately.

Variogram estimation and modelling is extremely important for structural analysis and for interpolation. The variogram models cannot be any haphazardly chosen function, as they must obey certain mathematical constraints (see Isaaks and Srivastava 1989). Also, the models must not be fitted to the experimentally estimated semivariances for the various lags without taking the numbers of pairs of points at each lag into consideration. Because the data are used repeatedly when estimating variograms, the effective degrees of freedom are greatest for the shortest lag and then decrease in a complex way with the lag (cf. Taylor and Burrough 1986). Consequently the fitting of variogram models usually proceeds using a weighted least squares method where the weights are computed from the numbers of pairs. Even so, variogram fitting is an interactive process requiring considerable judgement and skill. Modern geostatistics software usually includes interactive routines for variogram fitting (e.g. Pebesma 1996, Pannatier 1996).
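The weighted least-squares fit described here can be sketched in a few lines: for a fixed range parameter a, the exponential model of equation 6.7 is linear in c0 and c1, so we can search a over a grid and solve the weighted linear problem at each step. The experimental values and the numbers of pairs per lag below are invented for illustration.

```python
import numpy as np

def exponential(h, c0, c1, a):
    # Exponential variogram model, equation 6.7
    return c0 + c1 * (1.0 - np.exp(-h / a))

def fit_exponential_wls(lags, gamma, n_pairs, a_grid):
    """Weighted least-squares fit of the exponential model, weighting each
    lag by its number of pairs. For each trial range parameter a the model
    is linear in (c0, c1), solved by weighted linear least squares; the
    trial with the smallest weighted residual is kept."""
    w = np.sqrt(n_pairs.astype(float))
    best = None
    for a in a_grid:
        basis = 1.0 - np.exp(-lags / a)
        A = np.column_stack([np.ones_like(lags), basis]) * w[:, None]
        (c0, c1), *_ = np.linalg.lstsq(A, gamma * w, rcond=None)
        resid = np.sum(n_pairs * (gamma - (c0 + c1 * basis)) ** 2)
        if best is None or resid < best[0]:
            best = (resid, c0, c1, a)
    return best[1], best[2], best[3]

# Hypothetical experimental variogram: lags, semivariances, pairs per lag
lags    = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
gamma   = np.array([0.9, 1.6, 2.1, 2.4, 2.55, 2.6])
n_pairs = np.array([400, 380, 350, 300, 250, 180])

c0, c1, a = fit_exponential_wls(lags, gamma, n_pairs, np.linspace(0.5, 5.0, 91))
```

Because the short lags carry the most pairs, they dominate the fit, exactly as the weighting argument above requires.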
The variogram is an essential step on the way to deter¬ general mean. These distances can be modified by
mining optimal weights for interpolation. When the anisotropy, which modifies the shape of the search
nugget variance e" so dominates the local variation neighbourhood from a circle to an ellipse.
that the experimental variogram shows no tendency The presence of a hole effect in the experimental
to diminish as h —> 0, the interpretation is that the data variogram (a dip in semivariance at distances greater
are so noisy that interpolation is not sensible. In this than the range) may indicate a pseudo-periodic pat¬
situation, the best estimate of z(x) is the overall mean tern caused by long-range variation over a study area
computed from all sample points in the region of in¬ that is too small to encompass the total range of vari¬
terest without taking spatial dependence into account: ation. True periodicity will give a variogram with a
interpolation is meaningless and a waste of time and periodic variation in the sill that matches the wave¬
money. length of the pattern, providing the original field sam¬
A noisy variogram, in which the experimentally pling is in step with the periodicity.
derived semivariances are scattered, suggests that too If the range is large then long-range variation dom¬
few samples have been used to compute y(h). A rule inates: if it is small, then the major variation occurs
of thumb suggests that possibly at least 50-100 data over short distances. Anisotropy in the experimental
points are necessary to achieve a stable variogram, variogram suggests a directional effect in pattern, but
depending on the kind of spatial variation encoun¬ directional differences can occur if there are insuffi¬
tered, though smooth surfaces require fewer points cient samples to get robust estimates in all directions.
than those with irregular variation: smoother vario¬ A variogram that can be fitted by a Gaussian vario¬
grams can also be obtained by increasing the size of gram model indicates a smoothly varying pattern,
the search window. such as often occurs with elevation data. A variogram
The range of the variogram provides clear informa¬ modelled by a spherical variogram model has a clear
tion about the size of the search window that should transition point which implies one pattern is dom¬
be used. If the distance from a data point to an un¬ inant. The choice of an exponential variogram model
sampled point exceeds the range, then it is too far away may suggest that the pattern of variation shows a
to make any contribution; if all data points are gradual transition over a spread of ranges or that
further away than the range, the best estimate is the several patterns interfere.
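The diagnostic value of these model shapes is easier to see when the models are written out as functions. A minimal sketch in Python (standard model forms; the parameter names c0 for the nugget, c1 for the partial sill, and a for the range are our own labels, not the book's notation):

```python
import numpy as np

def spherical(h, c0, c1, a):
    """Spherical model: a clear transition (the range) at h = a."""
    h = np.asarray(h, float)
    g = c0 + c1 * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= a, g, c0 + c1)

def exponential(h, c0, c1, a):
    """Exponential model: a gradual approach to the sill."""
    h = np.asarray(h, float)
    return c0 + c1 * (1.0 - np.exp(-h / a))

def gaussian(h, c0, c1, a):
    """Gaussian model: very smooth short-range behaviour."""
    h = np.asarray(h, float)
    return c0 + c1 * (1.0 - np.exp(-(h / a) ** 2))

# the spherical model reaches its sill c0 + c1 exactly at the range
print(float(spherical(10.0, 2.5, 7.5, 10.0)))   # 10.0
```

The spherical model reaches its sill exactly at h = a, while the exponential and Gaussian models approach it asymptotically, which is one way of reading the observation above that an exponential fit suggests a spread of ranges.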
Optimal Interpolation Using Geostatistics
define a zone whose midpoint is h from its centre. This wheel is placed over a data point and all data points falling in the tyre are used to estimate the contribution of (z_i − z_j)² from all pairs (Figure 6.4). As a general rule of thumb to avoid edge effects, it is not usually sensible to compute a variogram for more lags than a total separation distance of one-half the dimensions of the study area. If we ignore directional effects, the resulting variogram is known as isotropic: it averages the variogram over all directions. However, as Figure 6.4 shows, it is easy to compute the variogram for specific directions β. These are known as anisotropic variograms, and if different in range or sill for different values of β they may indicate that the spatial variation varies with direction. This could happen in sediments perpendicular or parallel to a river, for example.

Figure 6.4. Circular search window for estimating semivariance in any direction β

By computing (z_i − z_j)² for all possible directions one creates a variogram surface, which can be plotted as an ellipsoid map with centre point 0,0. The major axis of the variogram is the longer axis of the ellipse, and its orientation is given by the angle of that axis with respect to north.
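The tyre-shaped search zone of Figure 6.4 is straightforward to sketch in code: pairs of points are binned by separation distance and, optionally, by direction β within an angular tolerance. This is an illustrative implementation, not the book's software; all names are ours:

```python
import numpy as np

def experimental_variogram(xy, z, lags, lag_tol, beta=None, ang_tol=None):
    """Estimate 0.5 * mean[(z_i - z_j)^2] over pairs whose separation lies
    within lag_tol of each lag (the 'tyre' of Figure 6.4), optionally
    restricted to direction beta +/- ang_tol (radians)."""
    n = len(z)
    gam, counts = [], []
    for h in lags:
        sq, m = 0.0, 0
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = xy[j] - xy[i]
                if abs(np.hypot(dx, dy) - h) > lag_tol:
                    continue
                if beta is not None:
                    ang = np.arctan2(dy, dx)
                    # fold to a half-circle: variogram pairs are direction-symmetric
                    diff = abs((ang - beta + np.pi / 2) % np.pi - np.pi / 2)
                    if diff > ang_tol:
                        continue
                sq += (z[i] - z[j]) ** 2
                m += 1
        gam.append(0.5 * sq / m if m else np.nan)
        counts.append(m)
    return np.array(gam), np.array(counts)

# isotropic variogram of a small synthetic data set with a trend in x
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 10.0, size=(30, 2))
z = xy[:, 0] + rng.normal(0.0, 0.1, 30)
g, c = experimental_variogram(xy, z, lags=[1.0, 3.0, 5.0], lag_tol=1.0)
```

Because the synthetic field has a trend, the estimated semivariance grows with lag; passing a direction β turns the same routine into the anisotropic estimator discussed above.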
The straightforward approach to computing the variogram is to assume that all data are located in the same cover class or domain. In this case the computed variogram is a global model that determines the interpolation weights over the whole area. This need not necessarily be so: different parts of an area may have important differences in land cover, rock type, soil type, flooding frequency, or land tenure pattern, each of which has its own, unique spatial correlation structure or pattern. For example, the pattern of distribution of heavy metals in river floodplains is likely to be quite different from that on the hillsides bordering the river, because the sources of heavy metals and the transport processes will be quite different.

When faced with the problem of important differences in domain it may be sensible to see if the study area can be better modelled by a set of domain-specific variograms rather than a single global model. The main problem is usually insufficient data: should one risk using an inappropriate, but well-defined global variogram, or several poorly defined, local models? The pragmatic approach is to attempt to use local models whenever possible because then for each spatial unit (polygon or mapping unit) the variogram for any given variable becomes a new, unique attribute of that polygon, and its parameters can be stored in the relational database (Pebesma 1996; Laurini and Pariente 1996, in Burrough and Frank 1996; Voltz and Webster 1990).
\[ \hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i\, z(x_i) \qquad (6.12) \]

\[ \hat{\sigma}_e^2 = \sum_{i=1}^{n} \lambda_i\, \gamma(x_i, x_0) + \phi \qquad (6.13) \]
Box 6.2. Computing kriging weights for the unsampled point z(x0) at (x = 5, y = 5) in Figure 6.5

Let the spatial variation of the attribute sampled at the five points be modelled by a spherical variogram with parameters c0 = 2.5, c1 = 7.5, and range a = 10.0. The data at the five sampled points are:

i  x  y  z
1  2  2  3
2  3  7  4
3  9  9  2
4  6  5  4
5  5  3  6

In matrix terms we have to solve A · λ = b, i.e. λ = A⁻¹ · b, where A is the matrix of semivariances between pairs of data points, b is the vector of semivariances between each data point and the point to be predicted, and λ is the vector of weights. φ is a Lagrange multiplier for solving the equations.

Start by creating a distance matrix between the data points:

i      1      2      3      4      5
1    0.0    5.099  9.899  5.000  3.162
2    5.099  0.0    6.325  3.606  4.472
3    9.899  6.325  0.0    5.0    7.211
4    5.000  3.606  5.0    0.0    2.236
5    3.162  4.472  7.211  2.236  0.0

and the vector of distances between the data points and the unknown site:

i
1  4.243
2  2.828
3  5.657
4  1.0
5  2.0

Substitute these numbers into the variogram to get the corresponding semivariances (matrices A and b):

A =  i    1      2      3      4      5      6
     1  2.500  7.739  9.999  7.656  5.939  1.000
     2  7.739  2.500  8.667  6.381  7.196  1.000
     3  9.999  8.667  2.500  7.656  9.206  1.000
     4  7.656  6.381  7.656  2.500  4.936  1.000
     5  5.939  7.196  9.206  4.936  2.500  1.000
     6  1.000  1.000  1.000  1.000  1.000  0.000

Note the extra row and column (i = 6) to ensure that the weights sum to 1.

b =  1  7.151
     2  5.597
     3  8.815
     4  3.621
     5  4.720
     6  1.000
A⁻¹ =  i     1      2      3      4      5      6
       1  -.172   .050   .022  -.026   .126   .273
       2   .050  -.167   .032   .077   .007   .207
       3   .022   .032  -.111   .066  -.010   .357
       4  -.026   .077   .066  -.307   .190   .030
       5   .126   .007  -.010   .190  -.313   .134
       6   .273   .207   .357   .030   .134  -6.873
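Box 6.2 can be reproduced numerically. The sketch below is our own NumPy code, not the book's; following the box, the diagonal of A keeps the nugget value c0 = 2.5:

```python
import numpy as np

# Data of Box 6.2: five samples (x, y) with values z; prediction point (5, 5).
pts = np.array([[2., 2.], [3., 7.], [9., 9.], [6., 5.], [5., 3.]])
z = np.array([3., 4., 2., 4., 6.])
x0 = np.array([5., 5.])
c0, c1, a = 2.5, 7.5, 10.0            # spherical variogram parameters

def gamma(h):
    # spherical model; note gamma(0) = c0, matching the diagonal of A in Box 6.2
    h = np.asarray(h, float)
    g = c0 + c1 * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= a, g, c0 + c1)

n = len(z)
A = np.ones((n + 1, n + 1))
A[n, n] = 0.0                          # extra row/column forces the weights to sum to 1
A[:n, :n] = gamma(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2))
b = np.ones(n + 1)
b[:n] = gamma(np.linalg.norm(pts - x0, axis=1))
sol = np.linalg.solve(A, b)            # [lambda_1 .. lambda_5, phi]
lam, phi = sol[:n], sol[n]
z_hat = lam @ z                        # prediction, eq. 6.12
s2 = lam @ b[:n] + phi                 # kriging variance, eq. 6.13
```

Solving gives weights that sum to 1, with the largest weight on point 4, the sample nearest to (5, 5); equations 6.12 and 6.13 then give the prediction and its kriging variance.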
The quantity γ(x_i, x_j) is the semivariance of z between the sampling points x_i and x_j; γ(x_i, x_0) is the semivariance between the sampling point and the unvisited point x_0. Both these quantities are obtained from the fitted variogram. The quantity φ is a Lagrange multiplier required for the minimization. The method is known as ordinary kriging: it is an exact interpolator in the sense that when the equations given above are used, the interpolated values, or best local average, will coincide with the values at the data points. In mapping, values will be interpolated for points on a regular grid that is finer than the spacing used for sampling, if that was itself a grid. The interpolated values can then be converted to a contour map using the techniques already described. Similarly, the estimation error σ̂_e², known as the kriging variance, can also be mapped to give valuable information about the reliability of the interpolated values over the area of interest. Often the kriging variance is mapped as the kriging standard deviation (or kriging error), because this has the same units as the predictions, and this convention is followed in this chapter. Box 6.2 and Figure 6.5 show how the equations for ordinary kriging are set up and solved.

Example: Figure 6.6 presents the results of using ordinary kriging for the untransformed zinc data. The map of prediction variances is also shown. Compare the prediction map with the inverse distance maps and spline map (Figure 5.10) and note that it is smoother than these, but also avoids the local large values that are linked to isolated data points. The standard deviation map clearly shows how prediction variance is linked to the density of the data points. Though comparisons cannot strictly be made because variance analysis and kriging use different statistical models (see de Gruijter and Ter Braak 1990), note that the maximum kriging standard deviation of 458 ppm is similar to the maximum within-class standard deviation of class 1 of the flooding frequency map (Figure 5.2c), but that this maximum only occurs in the kriging map in areas where there are no data points. In most of the areas covered by class 1 (standard deviation 423 ppm) the kriging prediction standard error is about 140 ppm. These results suggest that kriging prediction that takes account of gradual spatial change appears to be much better than that given by global, crisp, choropleth mapping.
Figure 6.6. Results of interpolating the Maas data set using ordinary kriging. Above: variogram with fitted exponential model (semivariance against distance); (a) kriging predictions of zinc levels, (b) kriging standard deviations.
Cross validation is the practice of using the kriging equations retrospectively to check the variogram model. It involves computing the moments of the distribution of (ẑ(x_i) − z(x_i)) for all data points, when each data point is successively left out and predicted from the rest of the data (Figure 6.7). The procedure is designed only to test the variogram for self-consistency and lack of bias, indicated by a mean difference near zero and a z-score (not to be confused with the z data values!) of one.
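The leave-one-out loop can be sketched generically; `predict` here is any point predictor returning an estimate and a prediction variance (for kriging, a solver like the one sketched for Box 6.2), and the naive stand-in below is only for illustration:

```python
import numpy as np

def cross_validate(pts, z, predict):
    """Leave each data point out in turn, predict it from the rest, and
    return the mean error and the mean squared z-score.  `predict(pts, z, x0)`
    must return (z_hat, prediction_variance)."""
    errors, zsq = [], []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        z_hat, s2 = predict(pts[keep], z[keep], pts[i])
        errors.append(z_hat - z[i])
        zsq.append((z_hat - z[i]) ** 2 / s2)
    # near-zero mean error and a mean squared z-score near 1 indicate a
    # self-consistent, unbiased variogram model
    return float(np.mean(errors)), float(np.mean(zsq))

def naive_predict(pts, z, x0):
    # stand-in predictor for illustration: global mean and sample variance
    return z.mean(), z.var(ddof=1)

me, mz = cross_validate(np.zeros((5, 2)), np.array([1., 2., 3., 4., 5.]), naive_predict)
```

With real kriging weights the same wrapper reports exactly the two diagnostics named above: the mean difference and the squared z-score.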
Block kriging

Clearly, kriging fulfils the aims of finding better ways to estimate interpolation weights and of providing information about errors. The resulting map of interpolated values may not be exactly what is desired, however, because the point kriging, or simple kriging, equations (6.13 and 6.14) imply that all interpolated values relate to the support, which is the area or volume of an original sample. Very often, as in sampling for soil or water quality, this sample is only a few centimetres across. Given the often large amplitude, short-range variation of many natural phenomena like soil or water quality (Burrough 1993), ordinary point kriging may result in maps that have many sharp spikes or pits at the data points. This can be overcome by modifying the kriging equations to estimate an average value z(B) of the variable z over a block of land B. This is useful when one wishes to estimate average values of z for experimental plots of a given area, or to interpolate values for grid cells of a specific size for quantitative modelling (Figure 6.8).

Figure 6.8. Predicting z for blocks of different size

The average value of z over a block B is given by

\[ z(B) = \frac{\int_B z(x)\,dx}{\text{area } B} \qquad (6.15) \]

The minimum variance is now

\[ \hat{\sigma}^2(B) = \sum_{i=1}^{n} \lambda_i\, \gamma(x_i, B) + \phi - \gamma(B, B) \qquad (6.17) \]

and is obtained when

\[ \sum_{j=1}^{n} \lambda_j\, \gamma(x_i, x_j) + \phi = \gamma(x_i, B) \quad \text{for all } i \qquad (6.18) \]
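In practice the point-to-block semivariances γ(x_i, B) and the within-block term γ(B, B) in equations 6.17 and 6.18 are approximated by discretizing B into a small grid of points; a sketch under that assumption (the function names, 4 × 4 discretization, and Box 6.2 parameters are our illustrative choices):

```python
import numpy as np

def gamma(h, c0=2.5, c1=7.5, a=10.0):
    # spherical variogram (parameters as in Box 6.2); gamma(0) = c0 here
    h = np.asarray(h, float)
    g = c0 + c1 * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= a, g, c0 + c1)

def discretize(block, k=4):
    """k x k regular grid of points inside a block (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = block
    xs = xmin + (np.arange(k) + 0.5) * (xmax - xmin) / k
    ys = ymin + (np.arange(k) + 0.5) * (ymax - ymin) / k
    return np.array([(x, y) for x in xs for y in ys])

def gamma_point_block(x, block_pts):
    """Approximate gamma(x, B): mean semivariance from a point to the block."""
    return float(gamma(np.linalg.norm(block_pts - x, axis=1)).mean())

def gamma_block_block(block_pts):
    """Approximate gamma(B, B): mean semivariance within the block."""
    d = np.linalg.norm(block_pts[:, None, :] - block_pts[None, :, :], axis=2)
    return float(gamma(d).mean())

B = discretize((4.0, 4.0, 6.0, 6.0))
g_xB = gamma_point_block(np.array([2.0, 2.0]), B)
g_BB = gamma_block_block(B)
```

Substituting these averaged semivariances into the point kriging system yields block predictions whose variance (eq. 6.17) is reduced by the γ(B, B) term, which is why block kriging maps are smoother than point kriging maps.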
Simple kriging
Figure 6.10. A comparison of ordinary point kriging and co-kriging using log-transformed data

version of the distance matrix into the matrix of semivariances A (see Box 6.2), taking into account the variation of semivariance with direction. Figure 6.11 presents the variograms computed in a NW-SE direction (perpendicular to the river) and in a NE-SW direction (parallel to the river). Note the double variogram fitted to the major axis (NE-SW), which is an attempt to force both variograms to have the same nugget. Figure 6.12 shows the effect of incorporating anisotropy into the interpolation, producing long, linear streaks parallel to the river. The results are plausible, but in spite of the clear variograms they look rather contrived compared to the other maps.
Figure 6.12. (a) Zinc levels from anisotropic variogram; (b) anisotropic kriging standard deviation
Plates (colour plate section):

1.1 Digital orthophoto of the city of Antwerp, Belgium. 1.2 Digital orthophoto of a city with added vector line data to indicate administrative boundaries and data for utility mapping. 1.3 Digital orthophoto of semi-rural area with motorway including added vector line data. 1.4 Digital orthophoto draped over a digital elevation model to provide a perspective view of the landscape. 1.5 Large-scale digital orthophoto of urban scene including overlaid vector data on roads and building outlines for a utilities database. 1.6 Vector data for the road and building outlines for Plate 1.5.

2.6 The spatial prediction of marketing opportunities: whisky consumption in Liverpool (red indicates households consuming most whisky). 2.7 The distribution of socio-economic group 'stylish singles' over the UK. 2.8 The distribution of the opportunities for selling packaged holidays in NE London. Plates 2.5-2.8 courtesy CCN Marketing.

3.3 Cross-section through sediments of the river Rhine. 3.4 An example of fence-mapping to show continuous variation of sediment composition in three dimensions.
Frequently, the data points are not the only source of information about the distribution of z, and we may be able to draw on other knowledge that can help with interpolation. The main sources are: (a) an appropriate stratification into clearly different domains, (b) data from a cheap-to-measure co-variable that has been sampled at many more data points, and (c) a physical or empirical spatial model of a driving process.

Stratified kriging

Figure 6.13. Computing the variograms of zinc for flood frequency zones separately. (a) Variogram for flood frequency class 1; (b) variogram for flood frequency classes 2 and 3. Note that the sill for class 1 is nearly 10 times higher than for classes 2 and 3.

Co-kriging

Often data may be available for more than one attribute per sampled location. One set (U) may be expensive to measure and therefore is sampled infrequently while another (V) may be cheap to measure and has more observations. If U and V are spatially correlated then it may be possible to use the information about the spatial variation of V to help map U. This can be done by using an extension of the normal kriging technique known as co-kriging. Co-kriging can produce predictions for both points and blocks in analogy to ordinary kriging (Isaaks and Srivastava 1989). Besides the variograms describing the non-structural variation of both U and V, co-kriging needs information on the joint spatial covariation of both
Figure 6.14. Prediction of zinc levels using stratified variograms in Figure 6.13. (a) Major flood class divisions; (b) ordinary kriging of zinc (ppm) within flood classes; (c) kriging standard deviation within flood classes.
variables. For any pair of variables U and V the cross-semivariance γ_UV(h) at lag h is defined as:

\[ 2\gamma_{UV}(h) = E[\{Z_U(x) - Z_U(x+h)\}\{Z_V(x) - Z_V(x+h)\}] \qquad (6.20) \]

where Z_U, Z_V are the values of U, V at places x, x + h. The cross-variogram is estimated directly from the sample data using:

\[ \hat{\gamma}_{UV}(h) = \frac{1}{2n(h)} \sum_{i=1}^{n(h)} \{z_U(x_i) - z_U(x_i+h)\}\{z_V(x_i) - z_V(x_i+h)\} \qquad (6.21) \]

where n(h) is the number of data pairs of observations of z_U, z_V at locations x_i, x_i + h for the distance vector h. Cross-variograms can increase or decrease with h depending on the correlation between U and V. When cross-variograms are fitted, the Cauchy-Schwartz relation:

\[ |\gamma_{UV}(h)| \le \sqrt{\gamma_U(h)\,\gamma_V(h)} \quad \text{for all } h > 0 \qquad (6.22) \]

must be checked to guarantee a positive co-kriging estimation variance in all circumstances (Deutsch and Journel 1992; Isaaks and Srivastava 1989).

A co-kriged estimate is a weighted average in which the value of U at location x_0 is estimated as a linear weighted sum of co-variables V_k. If there are k variables, k = 1, 2, 3, . . ., V, and each is measured at n_k places x_ik, i = 1, 2, 3, . . ., N_k, then the value of one variable U at x_0 is predicted by:

\[ \hat{z}_U(x_0) = \sum_{k=1}^{V} \sum_{i=1}^{n_k} \lambda_{ik}\, z(x_{ik}) \quad \text{for all } V_k \qquad (6.23) \]

To avoid bias, i.e. to ensure that E[\hat{z}_U(x_0) - z_U(x_0)] = 0, the weights λ_ik must sum as follows:

\[ \sum_{i=1}^{n_k} \lambda_{ik} = 1 \ \text{for } V_k = U \quad \text{and} \quad \sum_{i=1}^{n_k} \lambda_{ik} = 0 \ \text{for all } V_k \ne U \qquad (6.24) \]

The first condition implies that there must be at least one observation of U for co-kriging to be possible. The interpolation weights are chosen to minimize the variance:

\[ \hat{\sigma}^2_U(x_0) = E[\{\hat{z}_U(x_0) - z_U(x_0)\}^2] \qquad (6.25) \]

There is one equation for each combination of sampling site and attribute, so for estimating the value of variable U at site x_0 the equation for the g-th observation site of the k-th variable is:

\[ \sum_{l=1}^{V} \sum_{i=1}^{n_l} \lambda_{il}\, \gamma_{uv}(x_{il}, x_{gk}) + \phi_k = \gamma_{uv}(x_0, x_{gk}) \qquad (6.26) \]

for all g = 1 to n_v and all k = 1 to V, where φ_k is a Lagrange multiplier. These equations together make up the co-kriging system.
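Equation 6.21 translates directly into code for co-located samples of U and V; an illustrative sketch (the function name and synthetic data are ours):

```python
import numpy as np

def cross_variogram(xy, zu, zv, lags, tol):
    """gamma_uv(h) after eq. 6.21, for co-located samples of U and V:
    1/(2 n(h)) * sum over pairs within tol of lag h of
    (zu_i - zu_j) * (zv_i - zv_j)."""
    n = len(zu)
    out = []
    for h in lags:
        s, m = 0.0, 0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(np.hypot(*(xy[i] - xy[j])) - h) <= tol:
                    s += (zu[i] - zu[j]) * (zv[i] - zv[j])
                    m += 1
        out.append(s / (2.0 * m) if m else np.nan)
    return np.array(out)

# two positively correlated synthetic variables sharing a trend in x
rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 10.0, (40, 2))
zu = xy[:, 0] + rng.normal(0.0, 0.2, 40)
zv = 2.0 * xy[:, 0] + rng.normal(0.0, 0.2, 40)
g_uv = cross_variogram(xy, zu, zv, lags=[2.0, 5.0], tol=1.5)
```

As the text notes, the estimate can be positive or negative depending on the correlation of U and V; here the two variables are positively correlated, so the cross-semivariance is positive and grows with lag, and a fitted model would still have to satisfy the Cauchy-Schwartz check of equation 6.22.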
regression models, principal component transformations, reciprocal averaging, or fuzzy k-means. Sometimes constraints must be applied, e.g. the sum of all fuzzy k-means values must equal 1, so the kriging equations need to be modified. See Chapter 11 and de Gruijter et al. (1997).
Probabilistic kriging
Given the wide variations in results among the different kriging and deterministic interpolation methods, some users may decide that for their purposes they are not really interested in the best estimates of z(x_0), but only in the probability that the value of the attribute in question exceeds a certain threshold. For them, the probability of a threshold being exceeded may be sufficient to support a decision to mine, to commission operations to clean up polluted soil, or to set up a marketing base. Indicator kriging is a non-linear form of ordinary kriging in which the original data are transformed from a continuous scale to a binary scale (1 if z < T, and 0 otherwise, or vice-versa). Variograms are computed for the binary data in the usual way, and ordinary kriging proceeds with the transformed data. The resulting maps display continuous data in the range 0-1 indicating the probability that T has been exceeded or not exceeded, as the case may be. Computing variograms and interpolating for other thresholds provides insight into how the probabilities of threshold exceedance vary with threshold levels.

Figure 6.16 presents indicator variograms for cut-offs in zinc levels at 500 ppm and 1000 ppm measured on all 98 points. The resulting maps (Figures 6.17a,b) show the probability that the zinc level exceeds the given threshold. The results are clear and easy to understand and could easily be incorporated in a GIS database for decision-making where the managers do not want to have to understand the intricacies of kriging.

Indicator kriging can easily be combined with soft information, such as the flood frequency classes: Figure 6.17c,d show the probabilities of zinc levels exceeding 500 ppm within the areas defined by the flood frequency classes, and these figures provide some interesting insights into the process of flooding and contamination around frequently flooded creeks.

There are several published examples of indicator kriging and indicator co-kriging in soil science, soil pollution, and geology (Journel and Posa 1990, Okx and Kuipers 1991, Solow 1986).

Figure 6.16. Indicator variograms of zinc: (a) for the > 500 ppm level, (b) for the > 1000 ppm level

If the original data do not follow any simple distribution (multimodal Gaussian or lognormal), disjunctive kriging provides a nonlinear, distribution-dependent estimator (Armstrong and Matheron 1986a, 1986b, Olea 1991, Rivoirard 1994). When the data are not normally
Figure 6.17. (a) Indicator kriging: probability of zinc > 500 ppm; (b) indicator kriging: probability of zinc > 1000 ppm; (c) flood frequency classes; (d) indicator kriging with stratification in flood frequency classes 1 and 2 + 3 (500 ppm)
distributed the data are transformed to normality by a linear combination of Hermite polynomials. The method is computationally demanding, but may be useful for attributes with unusual distributions, such as can be found in pollution studies (e.g. see Burrough 1993).
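The indicator transform described above is a one-liner; everything else reuses the ordinary kriging machinery. The text allows either polarity; the sketch below uses the exceedance convention, matching the probability-of-exceedance maps of Figure 6.17 (the data values are illustrative, not the Maas samples):

```python
import numpy as np

def indicator(z, threshold):
    """Binary transform behind indicator kriging: 1 where z exceeds the
    threshold, 0 otherwise.  Variograms and ordinary kriging are then
    applied to the 0/1 values, and the kriged surface reads as a
    probability of exceedance."""
    return (np.asarray(z) > threshold).astype(float)

zinc = np.array([210., 480., 1350., 90., 770.])   # illustrative values (ppm)
i500 = indicator(zinc, 500.0)                      # -> [0., 0., 1., 0., 1.]
```

Repeating the transform, the variogram computation, and the kriging for several thresholds gives the family of exceedance-probability maps discussed above.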
Simulation
Simple Monte Carlo methods can be used to simulate …

Unfortunately very few earth science processes …
Figure 6.18. Monte Carlo simulation: (a) original data; (b) ordinary kriging map; (c) mean of 100 realizations of a stationary noise field having the same mean and variance as the data; (d) mean of 100 realizations of a simulation conditioned to pass through the data points and with the same variogram as the original data

Figure 6.19. Four of 100 realizations of zinc levels produced by conditional simulation
5. Repeat steps 1-4 until sufficient realizations have been created.
6. Compute surfaces for the mean value and the standard deviation from all realizations if required.
7. Run the environmental model with each realization to see how results vary with the different inputs.

Conditional simulation is more computer intensive than either kriging or co-kriging. For example, ordinary point kriging (Figure 6.6) on a 100 MHz Pentium computer with 16 Mbyte RAM takes about 1 minute but conditional simulation takes about 2.2 minutes per realization on the same machine, or 3.5 hours in all for 100 realizations. It is sensible to use the method when the variogram is known and when most likely estimates, rather than smoothed averages, are needed to fill data cells for modelling.

Figure 6.20. Conditional simulation can be carried out for whole areas or can be stratified according to flood frequency classes, which yields much more precise results. Conditional simulation of zinc levels, based on 100 realizations: (a,b) using only the hard data; (c,d) using hard data and 'soft' stratification on flood frequency; (a,c) mean simulated values; (b,d) standard deviations.

The differences between ordinary kriging, simple simulation, and conditional simulation are shown in Figure 6.18. Usually one simulates at least 500 realizations of the random surfaces, which can be used as different inputs for a Monte Carlo analysis of models (see Chapter 10). Figure 6.19 shows four realizations from only 100 simulations of the variation of zinc content and Figure 6.20 shows the mean and standard
deviation surfaces for all 100 realizations. Note that the standard deviation surface produced by conditional simulation is not tied to the distribution of data points, as with kriging. Note also that the mean surface of the conditional simulations should be very similar to the ordinary kriging interpolation (if enough replicates are used it should be the same). With fewer than 500 realizations it is possible that the size of the standard deviations is substantially different from ordinary kriging, and this could be an artifact of computing too few realizations.

Conditional simulation can also be combined with soft information in an attempt to get still better precision. Figure 6.20c,d shows the results of computing an average surface and standard deviation surface for 100 realizations using the variograms and stratification for the two flood frequency classes of Figure 6.14a.
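For a Gaussian model of the variation, an unconditional realization (the raw material that conditioning then ties to the data points) can be sketched with a Cholesky factor of the covariance matrix implied by the variogram, C(h) = sill − γ(h). This is our illustrative shortcut; production geostatistics software normally uses sequential simulation instead:

```python
import numpy as np

def simulate(grid_pts, c0, c1, a, n_real, seed=0):
    """Draw n_real unconditional realizations of a zero-mean Gaussian field
    whose spherical variogram has nugget c0, partial sill c1 and range a."""
    d = np.linalg.norm(grid_pts[:, None, :] - grid_pts[None, :, :], axis=2)
    gam = np.where(d <= a, c0 + c1 * (1.5 * d / a - 0.5 * (d / a) ** 3), c0 + c1)
    np.fill_diagonal(gam, 0.0)              # gamma(0) = 0 by definition
    C = (c0 + c1) - gam                     # covariance = sill - semivariance
    L = np.linalg.cholesky(C + 1e-9 * np.eye(len(C)))   # small jitter for stability
    rng = np.random.default_rng(seed)
    return L @ rng.standard_normal((len(C), n_real))

xs = np.linspace(0.0, 10.0, 8)
grid = np.array([(x, y) for x in xs for y in xs])    # 64 grid nodes
fields = simulate(grid, c0=2.5, c1=7.5, a=10.0, n_real=100)
```

Each column is one realization honouring the chosen variogram; averaging many such (conditioned) realizations is what produces the mean and standard deviation surfaces of Figure 6.20.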
Geostatistical methods are not one, but a wide range of techniques that rely on an understanding of the underlying spatial correlation structure of the data and use that to guide interpolation. Theoretical assumptions include ideas of stationarity, the intrinsic hypothesis, and normality, and these are sometimes difficult to meet with real data. The validity of the results depends not only on the method but also on assumptions about uniformity of your area. Geostatistical methods have the advantage of providing estimates for points or blocks; information on spatial anisotropy, nested scales of variation, and external information can be combined to get the most out of expensive data.

Burrough (1993b) reviews work comparing the quality of geostatistical interpolation with other methods. The general conclusions are that geostatistical methods are generally superior when there is sufficient data to estimate a variogram because, unlike splines, such methods do not treat noise as part of the signal. The presence of large outliers can seriously affect the performance of all prediction methods, largely by distorting the estimated variogram, and several authors recommend the removal of outliers by methods of exploratory data analysis (EDA; Haslett et al. 1990, Gunnink and Burrough 1997) or the use of robust methods (Cressie and Hawkins
Table 6.1. Results of the geostatistical interpolation methods

Method                                                 Minimum      Maximum      Per cent area  Per cent area  Per cent area
                                                       value (ppm)  value (ppm)  >500 ppm       >1000 ppm      >1500 ppm
Ordinary point kriging with isotropic variogram           119          1661         29.28           4.91           0.36
The same, but with transformation to
  natural logarithms                                      122          1348         20.24           1.45           0.00
Ordinary point kriging with anisotropic variogram         127          1202         31.43           3.92           0.00
Ordinary point kriging within major
  flood categories                                        116          1817         15.56           3.53           0.85
Co-kriging on ln(distance to river)                       144          1300         16.20           0.97           0.00
Universal point kriging with a regression
  model on ln(distance) and elevation                     114          1483         18.89           2.33           0.00
Conditional simulation with one
  general variogram                                       140          1606         28.89           3.96           0.21
Conditional simulation with stratification
  according to flood frequency and 2 variograms           105          1800         15.11           3.53           0.86
1980). Ordinary kriging is least successful when abrupt boundaries are present, though stratification into clearly different areas may bring considerable improvement as shown in this chapter. Conventional choropleth mapping is generally poor at predicting site-specific values when spatial variation is gradual, and splines can behave unpredictably. When data are sparse, however, the classification approach usually yields the best prediction (see Chapter 10).

Comparing the results of the different interpolation methods in Chapters 5 and 6

In Table 5.5 we compared the results of the deterministic methods of interpolation, and Table 6.1 does the same for the geostatistical methods. Although there are no absolute standards for comparison, such as an independent data set for validation, we note that both deterministic methods and geostatistical methods show a wide range in results (cf. Englund 1990). Generally all methods (deterministic and geostatistical) that use only the data from the point observations estimate larger proportions of the area above the critical threshold values given in the table than those that use external information such as flooding classes and distance/elevation regression models.

Those methods where a kriging standard deviation map can be easily computed seem to show that stratified kriging has produced by far the best results (Tables 6.1 and 6.2). In general, the greater the information from data points and external sources (flood frequency classes), the lower the prediction standard deviations. Geostatistical prediction is effective at reducing interpolation errors where spatial variation is continuous; in combination with the stratification, even better prediction errors can be achieved. Whether these results are generally true requires testing with an independent data set, and this is deferred until Chapter 10. The evidence strongly suggests that using geostatistics and soft data together greatly improves the predictive power of GIS.

Table 6.3 gives a comparative overview of all methods of interpolation discussed in both Chapters 5 and 6.
This chapter has been about methods of interpolation, their advantages and disadvantages and their strengths and weaknesses. Clearly, even with the best methods one is severely limited by the data, and though many GIS users may not collect their own data, for those that do it is worth considering if different numbers of data or arrangements of data points might not yield better information. This idea is further explored in Chapter 10; here we show how the variogram can be used to relate sample spacing (and layout) to the size of the cells used for block kriging.

Note that in ordinary block kriging (equations 6.17, 6.18) the prediction errors of z_B are controlled only by the variogram and the sampling configuration. Therefore, once the variogram is known it is possible to design sampling strategies that will result in any required minimum interpolation error. In particular, the prediction error σ²_B for a block of land B depends on:

(i) The form of the variogram (linear, spherical, or other function), the presence of anisotropy or non-normality, and the amount of nugget variance or noise.
(ii) The number of neighbouring data points that are used to compute the point or block estimate.
(iii) Sampling configuration: which is most efficient, irregular sampling, or a regular square grid or a triangular grid?
(iv) The size of the block of soil for which the estimate is made: is it an area equivalent to the support or a larger block of land?
(v) How the sample points are arranged with respect to the block.

These points are reviewed by Burrough (1991a) based on original work by Webster and colleagues (reported in Webster and Oliver 1990); here we only consider points (ii) and (v), namely the number of neighbouring data points on a regular grid and their sample spacing needed to yield a maximum value of σ²_B for a given block.

Once the variogram is known, one can calculate the combination of block size and sample spacing on a regular triangular or rectangular grid to yield predictions of block averages with a given prediction error. The prediction variance of a block B is the expected squared difference between the kriging prediction of the block mean and the true value. The standard error of the block mean is the square root of the prediction variance of the block B. The prediction variance is given by:

\[ \sigma_B^2 = E\{[\hat{z}_B - z_B]^2\} = 2\sum_{i=1}^{n} \lambda_i\, \gamma(x_i, B) - \gamma(B, B) - \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \lambda_j\, \gamma(x_i, x_j) \qquad (6.31) \]
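Putting equations 6.18 and 6.31 together, one can tabulate the block prediction variance as a function of grid spacing; a simplified sketch kriging from only the four grid nodes nearest the block (the variogram parameters and this configuration are illustrative assumptions, not the book's worked case):

```python
import numpy as np

def gamma(h, c0=0.0, c1=10.0, a=10.0):
    # spherical variogram; the parameters here are purely illustrative
    h = np.asarray(h, float)
    g = c0 + c1 * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= a, g, c0 + c1)

def block_variance(spacing, block_half=1.0, k=4):
    """sigma^2_B (eq. 6.31) for the mean of a square block centred at the
    origin, kriged from the four nearest nodes of a square sampling grid."""
    s = spacing / 2.0
    pts = np.array([[-s, -s], [-s, s], [s, -s], [s, s]])
    xs = np.linspace(-block_half, block_half, k)
    bp = np.array([(x, y) for x in xs for y in xs])       # discretized block
    n = len(pts)
    A = np.ones((n + 1, n + 1)); A[n, n] = 0.0
    A[:n, :n] = gamma(np.linalg.norm(pts[:, None] - pts[None, :], axis=2))
    b = np.ones(n + 1)
    b[:n] = [gamma(np.linalg.norm(bp - p, axis=1)).mean() for p in pts]
    g_BB = gamma(np.linalg.norm(bp[:, None] - bp[None, :], axis=2)).mean()
    sol = np.linalg.solve(A, b)
    lam, phi = sol[:n], sol[n]
    # equivalent Lagrangian form of eq. 6.31 (cf. eq. 6.17)
    return float(lam @ b[:n] + phi - g_BB)

v_wide = block_variance(8.0)    # sparse sampling grid
v_tight = block_variance(2.0)   # dense sampling grid
```

Evaluating `block_variance` over a range of spacings is exactly the design exercise described above: choose the widest spacing whose predicted σ²_B still meets the required interpolation error.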
[Table: a two-page comparison of the main characteristics of the interpolation methods discussed in Chapters 5 and 6 (including pycnophylactic interpolation), describing for each method whether the output is a gridded surface or contours, whether variation is treated as gradual or abrupt, whether the method is local or global, deterministic or stochastic, whether it conserves volumes, and the kinds of surfaces and sampling densities for which it is best suited. The table itself is not recoverable from the scan.]
All the interpolations and simulations presented in this chapter were carried out using GStat (Pebesma 1996) and PCRaster (Wesseling et al. 1996). Other useful sources of cheap or free geostatistical software such as GSLIB (Deutsch and Journel 1992) and others are given in Varekamp et al. (1996). Some statistical packages (e.g. SPLUS, GENSTAT, etc.) also include geostatistical methods. Pannatier (1996) has published an excellent set of interactive, Windows-based programs for modelling variograms and covariograms. The reader is advised to beware of incompletely documented software in general GIS packages!
Questions
1. Explain why it is so important in kriging to have a good model of the variogram.
2. Compare ordinary point kriging and thin plate splines as methods for interpolating elevation data to make a DEM.
3. Examine the costs and benefits of the different ways in which kriging interpolation
can be assisted by using extra hard and soft information (hint—read Chapter 10).
5. Explain how you might use block kriging to ensure that point data are interpolated
to a grid that has the same spacing and level of spatial generalization as a Thematic
Mapper remote sensing image.
SEVEN
Having gone to the trouble of collecting data and building a spatial database, the next issue is how to use these data to provide information to answer questions about the real world. This involves a wide range of methods of data manipulation, from simple data retrieval and display to the creation and application of complex models for the analysis and comparison of different planning scenarios.
Some transformation capabilities, such as those necessary for data cleaning or updating, for changing scales or projections, or for interpolation, were discussed in Chapters 4, 5, and 6. Geographical information systems, however, provide a large range of analysis capabilities that can be used in many ways. These analytical capabilities will usually be organized in modular commands so that each kind of analysis can be performed separately, or combined with others to build data analysis models. The actual user interface can be provided through typed commands, windows with preprogrammed buttons, or as statements in a
Analysis of Discrete Entities in Space
• Operations using in-built spatial topology.
• Operations to model spatial interactions over a connected net.
All these operations can result in new attributes, which can be attached to the original entities, thereby increasing the size and value of the database. Certain operations also create new spatial entities, requiring the database to be expanded to include these new items. Data that have been retrieved from the database can be displayed on the screen, plotted as a paper map, or written as a computer file for future processing.
belong to A but not to B, and XOR (symbol v) is the exclusive OR, or the set of objects that belong to one set or another, but not to both. These simple set relations are often portrayed visually in the form of Venn diagrams (Figure 7.2). Note that logical operations can be applied to all data types, be they Boolean, nominal, ordinal, or scalar.
Two simple examples illustrate the principles. Consider first a spatial database used by a real estate agent. A typical retrieval query from a prospective buyer might be the following: ‘Please show me the locations of all houses costing between $200 000 and $300 000 with 4 bedrooms and plots measuring at least 300 m²’. If the data set contains the attributes ‘cost’, ‘number of bedrooms’, ‘area of plot’, and location, a map of the desired premises is easily produced by a multiple AND query on the specified attributes to highlight the matching plots:
IF COST GE $200 000 AND COST LT $300 000 AND N BEDROOM = 4 AND PLOT AREA GE 300 THEN
The selected entities are given a Boolean value of 1 (true) if they match the specifications, and a 0 (false) if not. Display of the results follows by assigning a new colour to entities with ITEM = 1 (cf. Plate 2.1).
Now consider a query in land suitability classification. In a database of soil mapping units, each mapping unit may have attributes describing texture and pH of the topsoil. If set A is the set of mapping units
Figure 7.2. Venn diagrams showing the results of applying Boolean logic to the union and
intersection of two or more sets. In each case the shaded zones are ‘true’
called Oregon loam (nominal data type), and if set B is the set of mapping units for which the top soil pH equals or exceeds 7.0 (scalar data type), then the data retrieval statements work as follows:
X = A AND B finds all occurrences of Oregon loam with pH >= 7.0.
X = A OR B finds all occurrences of Oregon loam, and all mapping units with pH >= 7.0.
X = A XOR B finds all mapping units that are either Oregon loam, or have a pH >= 7.0, but not in combination.
X = A NOT B finds all mapping units that are Oregon loam where the pH is less than 7.0.
Selected entities can also be renamed and/or given a new display symbol (Figure 7.3a) by statements such as: ‘Give the designation “Suitable” to all mapping units with soil texture = “loam” and pH >= 5.5’. This is a particular instance of the logical statement ‘IF condition(C) THEN carry out specified task’.
Note that when AND and OR are mixed, the result depends on the order of evaluation: the result of A AND B OR C depends on the priority of AND with respect to OR. Parentheses are usually used to indicate clearly the order of evaluation when there are more than two sets (Figure 7.2). For example, if set C contains mapping units of poorly drained soil, then X = (A AND B) OR C returns all mapping units that are either (a) Oregon loam with a pH >= 7.0 or (b) units of poorly drained soil. The relation X = A OR (B AND C) returns (a) all Oregon loam mapping units and (b) those mapping units with a combination of pH >= 7.0 and poor drainage.
Note also that Boolean operations may require an exact match in attributes to return data and they take no account of errors or uncertainty unless that is specifically incorporated into the definitions of the sets.
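The four retrieval statements map directly onto set operations. A minimal sketch in Python follows; the mapping-unit table and all attribute values (including the soil series names other than Oregon loam) are invented for illustration:

```python
# Hypothetical mapping-unit table: unit id -> (soil series, topsoil pH)
units = {
    1: ("Oregon loam", 7.4),
    2: ("Oregon loam", 6.1),
    3: ("Bashaw clay", 7.8),   # series names other than Oregon loam are invented
    4: ("Bashaw clay", 5.9),
}

# set A: Oregon loam units; set B: units with topsoil pH >= 7.0
A = {u for u, (series, ph) in units.items() if series == "Oregon loam"}
B = {u for u, (series, ph) in units.items() if ph >= 7.0}

print(sorted(A & B))   # A AND B: Oregon loam with pH >= 7.0
print(sorted(A | B))   # A OR B: either condition
print(sorted(A ^ B))   # A XOR B: one set or the other, but not both
print(sorted(A - B))   # A NOT B: Oregon loam with pH < 7.0
```

With these values the intersection contains only unit 1, and A NOT B contains only unit 2; renaming or recolouring the selected units would then be a matter of updating a display attribute for the returned ids.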
Figure 7.3. When retrieving entities on attributes alone, or computing new attributes from old, the
spatial units of the map do not change shape, only colour or shading (top). The same occurs when
point-in-polygon search is used to find enclosed point objects that are used to compute area
averages (bottom)
If the value of the attribute ‘elevation’ is set at 2000 m above sea level to define the class ‘mountain’ then hills with elevations up to 1999.999999999 . . . m will be rejected. This is not a problem with ordinal and nominal data types but it can present problems when working with scalar data types that represent quantities like elevation, pH, clay content, soil depth, atmospheric pressure, salinity, population, and so on, that are subject to various sources of measurement error and uncertainty. If the error bands on these data span the boundary values of sets then strict application of Boolean rules may yield results that are counterintuitive (see Chapter 11).
Spatial aspects of Boolean retrieval on
MULTIPLE ATTRIBUTES OF SINGLE ENTITIES
Carrying out logical retrieval and reclassification on the non-spatial attributes of spatial entities has little effect on the map image, except in terms of symbolism and boundary removal. Computing a new attribute or condition requires the preparation of a legend and a recolouring or reshading of the selected entities (see Figure 7.3a and Plate 2.1). When selection leads to adjacent polygons receiving the same code it may be sensible to dissolve the boundaries between them, thereby achieving a form of map generalization (Figure 7.4). Figure 7.5 illustrates the use of this option to simplify a complex soil map.
Boolean operations are not only applicable to the non-spatial attributes of the geographical elements, because they can also be applied to the geographic location and attributes derived from the spatial or topological properties of the geographic entities. For example, one might wish to find all mapping units exceeding 5 ha in areas having soil with clay loam texture in combination with a pH > 7.0. More complicated searches may involve the shapes of areas, the properties of the boundaries of areas, or the properties of neighbouring areas, such as the areas of woodland bordering urban areas. In these cases, the results of the search would have an effect on the spatial patterns.
Figure 7.4. If, during a retrieval or recoding operation, two adjacent polygons receive the same
new code, boundaries between them can be dissolved, leading to map generalization
[Figure 7.5 legend: sod-podzolic, sod-gley, meadow-swamp, peatbog, sand, alluvium, low-midgley, high-gley, wood, water, urban, unclassified]
Figure 7.5. Using reclassification as a means of map generalization. Left: original soil map with 95
different units; right: reclassified map with 12 units. Note that reclassification preserves the original
geometry and that the legend and shading codes only apply to the simplified map
Simple and complex arithmetical operations
ON ATTRIBUTES OF SINGLE ENTITIES
New attributes can be computed using all normal arithmetical rules (+, /, *, logarithms, trigonometric functions, exponents, and all combinations thereof including complex mathematical models). Arithmetical and trigonometrical operations can obviously only be used on scalar data and certain kinds of ordinal data. Arithmetical operations on Boolean data types and nominal data types are nonsensical and therefore not allowed (an expression such as X = sqrt(London) has no meaning). Arithmetical operations can be very simple, or very complicated, but in all cases the operation is the same—a new attribute (a result) is computed from existing data.
Some hypothetical examples of computing new attributes for a given administrative area (polygon) or point location are:
• Population increase = Population 1990 - Population 1980
• Total spending power = Average income x number of persons
• Average wheat yield per farm = (Total yield)/(Number of farms)
• Predicted wheat yield = f(crop), where f(crop) is a complex mathematical model that computes wheat yield as a function of the soil, moisture, and nutrient properties of a site (point entity)
• Class allocation = Result(multivariate classification), where (multivariate classification) might be any statistical or multicriteria analysis of the numerical attributes of the entity.
For a set of river catchments the proportion of precipitation discharging through the outlet can be computed by dividing the cumulative annual river discharge measured at the outlet by the annual precipitation for each catchment.
Arithmetical operations can be easily combined with logical operators:
IF (a + b)/(c) > TEST VALUE THEN CLASS = GOOD
The statistical analysis of attributes
There is no space in this book to explain the principles of univariate and multivariate statistical analysis but readers requiring details of these methods should consult a standard text such as Davis (1986) for geology and earth sciences, Jongman et al. (1995) for ecology, and Haining (1990) for the environmental and social sciences. In this chapter we assume that statistical methods are but one set of procedures out of many for computing new attributes.
Simple statistical analysis can be used to compute means, standard deviations and correlations, and regression. The operations can be applied to a set of attributes attached to single entities or to any set of entities that are retrieved by a logical search. For example, compute the mean and variance of nitrate levels for all groundwater monitoring sites included in a Province P (e.g. Figure 7.3b), or compute the average takings for all fast food outlets in the postcode districts that include railway stations.
As many GIS do not provide more than simple statistical functions, the user who wishes to carry out statistical computations is advised to choose a system in which the attributes of spatial entities can be analysed using a standard statistical package. Linking a GIS to a spreadsheet program or standard statistical package such as SPSS, Statistica, or S-Plus is an easy way to provide a wide range of logical, arithmetical, and statistical operations for transforming the attribute data of entities held in a GIS without having to have these computational tools in the GIS. Both spreadsheets and statistical packages include useful graphics routines for plotting graphs, histograms, and other kinds of statistical charts. The use of hypertext facilities to link database, graphs, and map displays is providing powerful exploratory data analysis tools by which users can examine patterns in the probability distributions and correspondences in attribute data in relation to the spatial distribution of the entities to which they are attached (Gunnink and Burrough 1997, Haslett et al. 1990).
Most statistical packages provide at least the following procedures for statistical data analysis:
• Basic statistics—means, standard deviations, variances, skewness, kurtosis, maxima and minima, etc.
• Non-parametric statistics—median, mode, upper, and lower quartiles.
• Histograms, 2D and 3D scatter plots, box and whisker plots, stem and leaf plots.
• Univariate and multivariate analysis of variance.
• Linear regression and correlation.
• Principal components and factor analysis.
• Cluster analysis.
• Canonical analysis.
• Discriminant analysis.
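The attribute computations and the IF . . . THEN classification described above can be sketched as follows; the administrative areas and all attribute values are invented, and the 5 per cent growth threshold is an arbitrary choice for the example:

```python
import statistics

# Hypothetical attribute table for administrative areas (all numbers invented)
areas = {
    "A": {"pop_1990": 52000, "pop_1980": 48000, "avg_income": 21000},
    "B": {"pop_1990": 30500, "pop_1980": 31000, "avg_income": 18500},
    "C": {"pop_1990": 77000, "pop_1980": 70000, "avg_income": 24000},
}

for rec in areas.values():
    # new attributes computed from existing ones, as in the text
    rec["pop_increase"] = rec["pop_1990"] - rec["pop_1980"]
    rec["spending_power"] = rec["avg_income"] * rec["pop_1990"]
    # arithmetic combined with a logical operator:
    # IF relative growth > threshold THEN CLASS = GOOD
    growth = rec["pop_increase"] / rec["pop_1980"]
    rec["class"] = "GOOD" if growth > 0.05 else "POOR"

# logical retrieval of entities by the newly computed class
good = [name for name, rec in sorted(areas.items()) if rec["class"] == "GOOD"]
print(good)

# a simple statistic over the attributes of a retrieved set of entities
mean_income = statistics.mean(r["avg_income"] for r in areas.values())
print(mean_income)
```

In a real system the same operations would be issued through the GIS command language or a linked statistical package, but the pattern—compute, classify, retrieve, summarize—is identical.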
The results of computing new attributes or reclassification are usually displayed by reshading or recolouring the entities (Figure 7.2). As with Boolean selection, the spatial properties of the entities (location, shape, form, topology) do not change, except in the case that neighbouring entities are reclassified as being the same, when generalization can take place. Note that if the data are in raster form, these operations are carried out on each pixel separately, unless the data structure uses a map unit-based approach to raster data coding (see Chapter 3). The following sections give examples of spatial analysis based entirely on the derivation of new attributes.
. . . less couples, and pensioners). These attributes yield 108 possible combinations of classes which are recoded to ten core classes for the identification of characteristic socio-economic types. Multivariate methods of class allocation are used to assign a basic spatial unit to one of these ten classes, which are also linked to empirical sales data and consumer preference attributes obtained by questionnaire. Simple logical retrieval of spatial units in terms of their class and attributes provides maps at local or national level that show the spatial distribution of market opportunities and brand preferences (Plates 2.5-2.8).
resist the effects of erosion by rain. The four attributes, available moisture, available oxygen, nutrients, and erosion susceptibility, are known as land qualities (LQi), and they may be derived from primary soil and land data using simple logical transfer functions derived by agronomists and soil experts. The overall suitability of a site (its land quality for the use intended) is determined by the most limiting of the land characteristics—a case of worst takes all (FAO 1976, McRae and Burnham 1981).
Figure 7.6 illustrates the complete procedure using data from a soil series map and report and a digital elevation model of a small part of the Kisii District in Kenya (Wielemaker and Boxem 1982). The study area covers some 1406 ha (3750 x 3750 m) of the area mapped by the 1:12 500 detailed soil survey of the Marongo area (Boerma et al. 1974) and was chosen for its wide variety of parent material (seven major geological types), relief (altitude ranges from 4700 to 5300 feet a.s.l.—1420-1600 m) and soil (12 mapping units). Detailed soil survey information describes parent material, soil series, soil depth to weathered bedrock, stoniness and rockiness, and tabular information relating soil series to land qualities. Each attribute was digitized as a separate polygon overlay and converted to a 60 x 60 array of 62.5 m square cells. The digital elevation model was obtained by interpolation from digitized contours and spot heights and converted to local relief (minimum 40 m, maximum 560 m). Information about the climate, the chemical status of the soil, the land use, and cultural practices is also available (Wielemaker and Boxem 1982). Slope lengths (needed for estimating erosion) were interpreted from stereo aerial photographs and digitized as a separate overlay.
. . . holder maize, which is determined by the land qualities nutrient supply, oxygen supply, water supply, and erosion susceptibility. These land qualities can be ranked by assigning values of 1, 2, 3 respectively to the following classes:
No limitation assign 1
Moderate limitation assign 2
Severe limitation assign 3
The rules for deriving the land qualities are: water availability is a Boolean union (AND) of soil depth and soil series (B); oxygen availability and nutrient availability can be derived directly from the soil series information by recoding using a lookup table (L). Erosion susceptibility or hazard can be determined as a Boolean union (AND) of slope classes and soil series. Examples of the rules are:
If soil series is S1 then assign nutrient quality W3 from lookup table Lw.
If soil series is S2 and slope class is ‘flat’ assign erosion susceptibility 1.
If soil series is S2 and slope class is ‘steep’ assign erosion hazard value 3.
Once the individual land qualities have been assigned, the overall suitability per polygon or pixel is determined by the land quality with the most limiting (largest) value:
Suitability = maximum(LQwater, LQoxygen, LQnutrients, LQerosion)
This gives suitabilities of poor—at least one serious limitation; moderate—no severe but at least one moderate limitation; and good—no limitations.
It is simple to repeat the analysis for the same area using other values of the conversion factors relating soil to the land qualities to see how the areas of suitable land may increase as limitations are dealt with by irrigation, mulching, or terracing. This scenario exploration can be achieved by replacing the real data with values of the land characteristics that represent a more degraded or an improved situation. In this way one can use the logical model to explore how the suitability of an area depends on the different factors. One must not forget, however, that the results are no better than the data and the insight in the land evaluation procedure allow.
Temporal modelling of attributes of discrete entities
A fundamental problem with simple rule-based modelling is that it takes no account of changes of the soil/landscape resource over time. We might expect that the growing of maize on bare plots might lead to soil erosion and degradation of the nutrient status, which, even on initially suitable sites, might lead to reclassification to a poorer suitability class. So it would be useful to be able to estimate the suitability of the study area for smallholder maize for a time forty years hence, assuming that in the interim period the farmers have been growing maize over the whole of the area simply to feed the population, which in 1985 was growing at more than 4 per cent per annum. Investigating which simple conservation methods the farmers can implement themselves in order to protect the soil from erosion, and determine in which parts of the area these
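The ‘worst takes all’ rule and the lookup-table recoding can be sketched in Python. The table entries below are invented stand-ins for the real transfer functions from the soil survey report; only the structure (lookup, then take the maximum limitation) follows the text:

```python
# Hypothetical lookup tables relating soil series (and slope/depth classes)
# to land-quality scores: 1 = no limitation, 2 = moderate, 3 = severe.
# All entries are invented for illustration.
NUTRIENTS = {"S1": 2, "S2": 1}
OXYGEN = {"S1": 1, "S2": 1}
WATER = {("S1", "deep"): 1, ("S1", "shallow"): 3,
         ("S2", "deep"): 1, ("S2", "shallow"): 2}
EROSION = {("S1", "flat"): 1, ("S1", "steep"): 2,
           ("S2", "flat"): 1, ("S2", "steep"): 3}
LABEL = {1: "good", 2: "moderate", 3: "poor"}

def suitability(series, depth_class, slope_class):
    """'Worst takes all': overall class = most limiting (largest) land quality."""
    lq = [WATER[(series, depth_class)],
          OXYGEN[series],
          NUTRIENTS[series],
          EROSION[(series, slope_class)]]
    return LABEL[max(lq)]

print(suitability("S2", "deep", "flat"))
print(suitability("S2", "shallow", "steep"))
print(suitability("S1", "deep", "flat"))
```

Applied cell by cell to the 60 x 60 raster overlays, this single function yields the suitability map; scenario exploration amounts to editing the lookup tables (e.g. removing the water limitation to represent irrigation) and re-running the classification.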
[Figure 7.6 key: BI = Boolean intersection; L = look-up table; MIN = minimize. Suitability classes: poor, moderate, good]
Figure 7.6. The flowchart of operations for ‘top down’ land evaluation for determining the
suitability of land to grow maize
Figure 7.7. Flowchart of operations of land evaluation including an erosion model to update
estimates of erosion susceptibility over time
conservation methods are most likely to be successful would be an added bonus.
This problem is common: data exist for the present situation, or for some time in the recent past. The landscape changes, data become obsolete, and the changes will be complex and not totally predictable. We can gather insight into what might happen in the future by introducing the concept of time into the data analysis in order to explore quickly and easily what might happen.
In this example we assume that erosion is the greatest problem for continuing success with maize growing, in spite of the fact that erosion is not a serious problem in the Kisii area today. It is well known, however, that erosion is often a result of intensifying agriculture, and in the Kisii area the local farmers had already been encouraged to make use of maize stalks to construct ‘trash lines’ along the contours in order to reduce soil losses (Wielemaker and Boxem 1982). So the main problem is one of how to estimate soil losses. This can be done by including an empirical erosion model into the land evaluation procedure in order to estimate (a) how soil depths might be reduced by erosion, (b) absolute rates of erosion, and (c) the benefits of the trash lines. The information about new soil depths, rates of erosion, and the benefits of trash lines could be incorporated into a new, future ‘soil conditions for maize’ overlay that would replace the erosion overlay used in the previous example. At the same time, the intersection of the new depth and the soil series maps could give new estimates of the nutrient availability, oxygen availability, and water availability (Figure 7.7). The simplest way to introduce time into the modelling process is to repeat the calculations for t time steps, using the results of one time step as input for the next.
Two different empirical erosion models were used to simulate the soil losses, namely the Universal Soil Loss Equation (USLE) developed by Wischmeier and Smith (1978) and the Soil Loss Estimation Model for Southern Africa (SLEMSA) developed by Stocking (1981). These empirical methods were used because they are well known, easy to compute, and the data for both were already in the database (see Boxes 7.2 and 7.3). The disadvantage of these empirical models is that they are a gross oversimplification of the real
The Universal Soil Loss Equation (USLE—Wischmeier and Smith 1978) predicts erosion losses for agricultural land by the empirical relation:
A = R * K * L * S * C * P
where A is the annual soil loss in tonnes ha⁻¹, R is the erosivity of the rainfall, K is the erodibility of the soil, L is the slope length in metres, S the slope in per cent, C is the cultivation parameter, and P the protection parameter.
R factor: R = 0.11abc + 66, where a is the average annual precipitation in cm, b is the maximum day-precipitation occurring once in 2 years in cm, and c is the maximum total precipitation of a shower of one year occurring once in 2 years, also in cm.
L factor: L = (l/22.1)^1/2, where l is the slope length in metres (Wischmeier and Smith 1978).
S factor: S = 0.0065s² + 0.0454s + 0.065, where s is the slope in per cent (Smith and Wischmeier 1957).
The Soil Loss Estimation Model for Southern Africa (SLEMSA—Elwell and Stocking 1982).
Control variables:
E  Seasonal rainfall energy (J/m²)
F  Soil erodibility (index)
i  Rainfall energy intercepted by crop (per cent)
S  Slope steepness (per cent)
L  Slope length (m)
Submodels:
Bare soil condition  K = exp[(0.4681 + 0.7663F) ln E + 2.884 - 8.1209F]
Crop canopy  C = exp[-0.06i]
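The USLE relations in Box 7.2, together with the time-stepping idea from the text, can be combined in a short sketch. All input values are invented, and the bulk-density conversion from tonnes per hectare to metres of soil depth is an assumption added for the example; a real run would also update the slope and depth inputs between steps:

```python
import math

def r_factor(a_cm, b_cm, c_cm):
    """Rainfall erosivity, R = 0.11*a*b*c + 66 (Box 7.2)."""
    return 0.11 * a_cm * b_cm * c_cm + 66.0

def usle_soil_loss(R, K, slope_len_m, slope_pct, C, P):
    """Annual soil loss A (t/ha) from the USLE as given in Box 7.2:
    A = R*K*L*S*C*P with L = (l/22.1)^(1/2) and
    S = 0.0065*s^2 + 0.0454*s + 0.065."""
    L = math.sqrt(slope_len_m / 22.1)
    S = 0.0065 * slope_pct ** 2 + 0.0454 * slope_pct + 0.065
    return R * K * L * S * C * P

# time stepping: feed each year's eroded depth back into the soil-depth
# attribute, using one step's result as input for the next (all values invented)
R = r_factor(a_cm=150.0, b_cm=6.0, c_cm=5.0)
depth_m = 1.2                       # starting soil depth for one grid cell
for year in range(40):
    loss_t_ha = usle_soil_loss(R, K=0.2, slope_len_m=50.0,
                               slope_pct=8.0, C=0.5, P=1.0)
    # convert t/ha to metres of depth, assuming a bulk density of 1300 kg/m3
    depth_m -= loss_t_ha * 1000.0 / (10000.0 * 1300.0)
print(round(depth_m, 3))
```

Run per grid cell over the whole raster, the loop produces exactly the kind of new soil-depth overlay that Figure 7.7 feeds back into the land evaluation.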
problems of estimating erosion, but specialist models are often location specific or require data that are not to be found in general purpose databases.
The erosion models are point operations; they compute the soil loss per entity (which in this case is a grid cell) without taking into account the soil losses and gains over neighbouring grid cells. The effects of the trash lines on the erosion were estimated by reclassifying the slope length overlay and repeating the simulations. In all, the simulations resulted in twelve new overlays, of new soil depth, of new rockiness, and rate of erosion for both models with and without the effects of the trash lines. For a given model, subtraction of the overlay of the rate of erosion with trash lines from the overlay estimated using no trash lines gives a map of the benefit of terraces (Figure 7.8). Subtract-
Figure 7.8. Results of evaluating the Flowchart of Figure 7.7 with two different erosion models
(SLEMSA and USLE) and with different land protection practices (no protection and mulch terraces)
ing the overlays of the rates of erosion estimated under the same conditions by each of the two models yields an overlay of the difference in estimates provided by the models.
The results of the simulations suggest that there will be a slow degradation of the landscape in its suitability for maize over the period—the best areas decline from 14.1 per cent to 11.6 per cent, and marginal areas from 51.3 per cent to 50.6 per cent over the forty years. The simulations suggest that erosion will be concentrated in certain areas, and that in some of these the degree of erosion will be little modified by trash lines. The sensible conclusion would be to use these areas for some other, more permanent crop that would support the local population in other ways.
At first sight, the results seem quite convincing and realistic, but we have no way of knowing if they are anywhere near the truth. It is well known that empirical equations cannot easily be transported from one country to another, in spite of the many efforts that have been made to find truly ‘universal’ models. A further problem is that in all these overlay operations it has been assumed that the original data are absolute and invariate. This is clearly not the case, even for the relatively detailed soil survey in Kisii. The problem of error levels and error propagation in geographical information processing, particularly in the cartographic modelling methods using choropleth thematic maps, is seldom discussed by the developers and users of many geographical information systems. The problem of errors is clearly so important with respect to the decisions that may be taken as a result of geographical analyses that this whole question is discussed more fully in Chapters 9 and 10.
Figure 7.9. Polygon overlay leads to an increase in the number of entities in the database.
(a) simple overlay—all boundaries are retained; (b) second map covers the first and changes
the map detail locally; (c) the covering map is used to cut out a small part of the first map
Figure 7.10. Polygon overlay can lead to a large number of spurious small polygons that have no
real meaning and must be removed
Operations on one or more entities that are linked by directed pointers (object orientation)

Logical operations on lines and polygons that result in entities being split or removed from the database impose heavy computational costs on a spatial database because the number of entities may depend on the operations being carried out. In practical terms, if two simple polygons intersect to create a third, then the third and the attributes it inherits from the two original polygons must be added to the database. If reclassifying two adjacent polygons results in the removal of a common, unnecessary boundary, then two polygons must be removed and one added to the database. In practice, the number of polygons added or removed may be large and indeterminate, so it is difficult to say just how great this overhead is. To save modifying the original database, the changes are often only computed on a subset of the original data, and the results are stored in a separate file or folder.

In hybrid-relational GIS, adding and removing polygons means modifying both the spatial data and the attribute data separately. Modifying the spatial data is more than just adding or deleting an entry in a table because all the topological connections need to be recomputed. Most commercial GIS have adopted compromise solutions, taking care of the modification of the spatial representations with their own software and using commercial relational database programs for storing the associated attributes.

The advantage of network-relational hybrid databases is that in principle there is no limit to the kind and number of analysis queries that can be defined. Object-oriented GIS attempt to get around these computational problems by incorporating a large amount of information to structure the data in such a way that data volumes do not greatly change as queries are carried out. This means that the most common data retrieval and analysis options need to be thought out beforehand, which is why constructing an object-oriented database can take much time.
Operations of the type 'A is within/beyond distance D from B', where D is a simple crow's flight distance, are carried out with the help of a buffering command. This is used to draw a zone around the initial entity where the boundaries of the zone or buffer are all distance D from the coordinates of the original entity. If that is a point entity, then the zone is a circle; if a straight line, a rectangle with rounded ends; if an irregular line or polygon, an enlarged version of the same (Figure 7.11). The buffer is in effect a new polygon that may be used as a temporary aid to spatial query or that is itself added to the database. The determination of whether an entity is inside/outside or overlaps the buffer zone is then carried out using the operations just described (including the problems of polygon overlay! See Figure 7.7), and logical or mathematical operations on those entities proceed as before.

Typical examples of using the zoning/buffering command with other analysis options are the following:

• 'Determine the number of fast food restaurants within 5 km of the White House.'
• 'Investigate the potential for water pollution in terms of the proximity of filling stations to natural waterways.'
• 'Compute the total value of the houses lying within 200 m of the proposed route for a new road.'
• 'Compute the proportion of the world population that lives within 100 km of the sea.'
• 'Compute the number of cattle grazing within 5 km of a waterhole.'
• 'Determine the potential amount of arable land within 1 hour's walk from a Neolithic village.'
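As a minimal sketch of the idea (not the command syntax of any particular GIS), the 'within distance D' test for a circular buffer around a point entity can be written directly; the coordinates below are invented for illustration:

```python
import math

def within_buffer(entity_xy, candidates, d):
    """Return the candidate points that fall inside a circular buffer of
    radius d (simple crow's-flight distance) around a point entity."""
    ex, ey = entity_xy
    return [p for p in candidates
            if math.hypot(p[0] - ex, p[1] - ey) <= d]

# hypothetical restaurant coordinates (km) around a reference point
restaurants = [(1.0, 2.0), (3.0, 3.0), (7.0, 1.0)]
nearby = within_buffer((0.0, 0.0), restaurants, 5.0)
```

Buffers around lines or polygons need a point-to-segment distance instead of a point-to-point one, but the inside/outside decision is the same.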
Figure 7.11. Generating buffer zones around exact entities such as points, lines, or polygons yields
new polygons which can be used in polygon overlay to select defined areas of the map
crow's flight path based on simple buffering. Plate 2.2 compares travel times computed from simple crow's flight distance obtained by buffering with those obtained from accumulated times over the connected road network: the comparison of travel times depends on the traffic regimes at different times of the day. This is part of an analysis of the accessibility of built-up areas for emergency services (Figure 7.13; Ritsema Van Eck 1993). Plates 2.3 and 2.4 show results of an analysis of the travel times endured by commuters in the west of the Netherlands depending on whether they travel by private car or public transport.
Spatial entities can be retrieved and new attributes can be computed by a wide range of logical and numerical methods. The numerical procedures can also be applied to inclusion and intersection problems, to proximity analysis, and to the analysis of relations over topological connections. The methods can be combined to create complex models for addressing many different kinds of spatial problem. Note that many data analysis operations are not commutative, so the sequence in which the commands are executed is very important. While it can be very informative to sit in front of the computer browsing through a database to see what is there or how different procedures work (e.g. with Exploratory Data Analysis: Haslett et al. 1990, Gunnink and Burrough 1997), informal procedures are best for simple data retrieval and transformations. However, when a complex series of commands must be used frequently to retrieve and transform data it is sensible to create a structured command file (using a DOS or UNIX interface or a command language like the ARC-INFO AML) that can be reviewed, modified, and used by several persons. Such a set of commands constitutes a 'model' or a 'procedure' which can be stored in the GIS, referenced directly by an icon or a name, and used on other databases to carry out the same set of operations.

None of the methods of analysis presented in this chapter pay any attention to data quality or errors; there is a tacit assumption that all data and all relations are known exactly. In spite of this (a topic to which we return in Chapters 9 and 10), spatial modelling with GIS has great value for exploring different scenarios. Rather than letting land use planners loose in situations in which they have little or no experience (and the increasing pressure on land in developing countries is giving rise to situations which few have the necessary expertise to handle) it is surely preferable to train them on digital models of the landscapes. Just as pilots learn to fly on flight simulators and architects and road designers build maquettes in order to broaden their experience and develop their ideas, so the land use planner can learn from 'mistakes' made on digital landscape models before irretrievable errors are made in the landscape itself.
Questions
1. Develop a simple entity-based model to analyse the effects of land use change
annually.
2. Work out a GIS-based system for the optimum location of (a) fire stations, (b)
banks, (c) health care services in cities.
4. Develop a GIS system for helping to manage the demand for building materials
required for constructing a new suburb.
Further reading

Batty, M., and Xie, Y. (1994a). Modelling inside GIS. Part 1: model structures, exploratory spatial data analysis, and aggregation. International Journal of Geographical Information Systems, 8: 291–307.
Batty, M., and Xie, Y. (1994b). Modelling inside GIS. Part 2: Selecting and calibrating urban models using ARC-INFO. International Journal of Geographical Information Systems, 8: 451–70.
Goodchild, M. F., Parks, B. O., and Steyaert, F. T. (1993). Environmental Modeling with GIS. Oxford University Press.
Goodchild, M. F., Steyaert, F. T., Parks, B. O., Johnston, C., Maidment, D., Crane, M., and Glendinning, S. (eds.) (1996). GIS and Environmental Modeling: Progress and Research Issues. GIS World Books, Fort Collins, Colo., 486 pp.
EIGHT
Spatial Analysis Using Continuous Fields
As we explained in Chapter 2, there are two main ways of representing continuous fields. The first is the Delaunay triangulation (the TIN of digital elevation modelling); the second is the more common altitude matrix or grid used in raster GIS and image analysis. Delaunay networks are often used outside GIS to support the finite element modelling of dynamic flow processes in groundwater movement (MODFLOW: McDonald and Harbaugh 1988), discharge over floodplains (e.g. Gee et al. 1990), or air quality (Fedra 1996). Finite element modelling (FEM) is not usually part of the standard generic tool kit of most GIS, although Kuniansky and Lowther (1993) report the generation of finite element meshes by stand-alone macro programs in a vector GIS. Numerical models that use FEM are usually loosely coupled to the GIS, with the GIS being used to assemble the data and pass them to the model via an interface. The results from the model are returned to the GIS, converted to a square grid or contour lines for ease of handling, and then overlaid on digitized base maps for display.
Spatial Analysis Using Continuous Fields
Finite element modelling is outside the scope of this book. Here we describe the operations for spatial analysis of continuous fields that are represented by the regular square grid, where each attribute is represented by a separate overlay, and each grid cell is allowed to take a different, scalar value (Chapter 3). Note that although in many examples we often refer to an altitude matrix (i.e. a gridded digital elevation model), the z attribute can represent any other continuously varying attribute or regionalized variable, such as levels of pollutants in soil, atmospheric pressure, annual precipitation, an index of marketing potential, population density, or the costs of access to a given location.
[Figure: point operations on gridded overlays. Two input grids are combined cell by cell (e.g. G1 + G2), and single grids are transformed by point functions such as +, −, ×, /, sin, cos, tan, ln, log, min, and max.]
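The point operations illustrated in the figure can be sketched in a few lines of plain Python (the grids and the choice of functions below are invented for illustration):

```python
import math

def point_op(grid_a, grid_b, fn):
    """Apply a function cell by cell to two gridded overlays of equal size."""
    return [[fn(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]

g1 = [[2, 3], [1, 3]]                                   # two small overlays
g2 = [[3, 2], [4, 6]]
total = point_op(g1, g2, lambda a, b: a + b)            # G1 + G2, cell by cell
roots = [[math.sqrt(v) for v in row] for row in total]  # a point function
```

Because every cell is treated independently, any unary or binary function can be substituted for the addition without changing the structure of the operation.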
Interpolation
Interpolation is the prediction of a value of an attribute z at an unsampled site (x0) from measurements made at other sites xi falling within a given neighbourhood. As shown in Chapters 5 and 6, interpolation is used to create discretized continuous surfaces from observations at sparsely located points, or for resampling a grid to a different density or orientation, as in remote sensing images. Interpolation can be seen as a particular class of spatial filtering where the input data are not necessarily already located on a continuous grid. All other methods discussed here assume that the grid has already been created. Remember that the range of the variogram can be used to define the radius of the interpolation search (Chapter 6).

Interpolation is often a complicated operation and, while interpolation operations could in principle be expressed in a mathematical command language, most users will encounter specialist packages, so standard terminology cannot be used.
or for a 5 × 5 window:

1/65 2/65 3/65 2/65 1/65
2/65 3/65 4/65 3/65 2/65
3/65 4/65 5/65 4/65 3/65
2/65 3/65 4/65 3/65 2/65
1/65 2/65 3/65 2/65 1/65

…plex map (Figure 8.6), but note that smoothing a gridded image with a modal filter is a different kind of operation than the procedure of generalizing a map by reclassifying the attributes and merging the soil polygons given in Chapter 6.
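A weighted low-pass filter of this kind can be sketched as a straightforward convolution; only the two outer rows of the 5 × 5 kernel survive in the scan above, so the full kernel below is reconstructed on the assumption of symmetry (the weights then sum to 65/65 = 1), and edge cells are simply left unfiltered:

```python
# 5 x 5 low-pass kernel (weights sum to 1; reconstructed assuming symmetry)
KERNEL = [[w / 65.0 for w in row] for row in
          [[1, 2, 3, 2, 1],
           [2, 3, 4, 3, 2],
           [3, 4, 5, 4, 3],
           [2, 3, 4, 3, 2],
           [1, 2, 3, 2, 1]]]

def smooth(grid, kernel):
    """Convolve the interior cells of a grid with the kernel (edges kept)."""
    k = len(kernel) // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(k, rows - k):
        for j in range(k, cols - k):
            out[i][j] = sum(kernel[a][b] * grid[i + a - k][j + b - k]
                            for a in range(len(kernel))
                            for b in range(len(kernel)))
    return out
```

Because the weights sum to one, a uniform surface passes through the filter unchanged; on a noisy surface the centre-weighted average suppresses cell-to-cell variation.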
Figure 8.6. Smoothing a complex polygon map with median smoothing aggregates areas but does not reduce the number of classes (cf. Figure 7.5). (a) original; (b) low-pass 3 × 3 window; (c) high-pass = original − low-pass
0 1 0
1 -4 1
0 1 0
Because the gridded surface is supposed to be mathematically continuous, it is in principle possible to derive the mathematical derivatives at any location. In practice, because the surface has been discretized, the derivatives are approximated either by computing differences within a square filter or by fitting a polynomial to the data within the filter.

The two first order derivatives are the slope and the aspect of the surface; the two second order derivatives are the profile convexity and plan convexity (Evans 1980). Slope is defined by a plane tangent to the surface as modelled by the DEM at any given point, and comprises two components, namely gradient, the maximum rate of change of altitude, and aspect, the compass direction of this maximum rate of change. The terms just used follow the terminology of Evans (1980); many authors use 'slope' to mean 'gradient' as just defined (e.g. Peuker et al. 1978, Marks et al. 1984) and 'exposure' for 'aspect' (e.g. Marks et al. 1984). Gradient and aspect are sufficient for many purposes, being the first two derivatives of the altitude surface or hypsometric curve, but for geomorphological analysis the second differentials, convexity (the rate of change of slope, expressed as plan convexity and profile convexity) and concavity (i.e. negative convexity), are also useful (Evans 1980). Gradient is usually measured in per cent, degrees, or radians, aspect in degrees (converted to a compass bearing), while convexity is measured in degrees per unit of distance (e.g. degrees per 100 m).

The much-used second-order finite difference method (Fleming and Hoffer 1979, Ritter 1987, Zevenbergen and Thorne 1987) uses a second order finite difference algorithm fitted to the 4 closest neighbours in the window. This gives the slope by

tan S = [(∂z/∂x)² + (∂z/∂y)²]^0.5    (8.7)

where z is altitude and x and y are the coordinate axes. The aspect is given by

tan A = −(∂z/∂y)/(∂z/∂x)    (−π < A < π)    (8.8)

Zevenbergen and Thorne (1987) show how these attributes and the convexity and concavity are computed from a six-parameter quadratic equation fitted to the data in the kernel (see Box 8.1).

A third-order finite difference estimator using all eight outer points of the window, given by Horn (1981), is, for the east–west gradient,

[∂z/∂x] = [(z_{i+1,j+1} + 2z_{i+1,j} + z_{i+1,j−1}) − (z_{i−1,j+1} + 2z_{i−1,j} + z_{i−1,j−1})]/8δx    (8.9)

and for the south–north gradient

[∂z/∂y] = [(z_{i+1,j+1} + 2z_{i,j+1} + z_{i−1,j+1}) − (z_{i+1,j−1} + 2z_{i,j−1} + z_{i−1,j−1})]/8δy    (8.10)

Alternative methods fit a multiple regression to the nine elevation points in the 3 × 3 window and derive the slope and aspect from that.
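The 4-neighbour second-order method can be sketched as follows; rows are taken here as the y direction and columns as x, and the aspect sign convention is one reading of equation (8.8) among several used by different systems:

```python
import math

def slope_aspect(dem, row, col, cell_size):
    """Slope (tan S) and aspect at one cell by the 4-neighbour second-order
    finite difference method (cf. equations 8.7 and 8.8)."""
    dzdx = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell_size)
    dzdy = (dem[row + 1][col] - dem[row - 1][col]) / (2.0 * cell_size)
    tan_s = math.sqrt(dzdx ** 2 + dzdy ** 2)   # equation 8.7
    aspect = math.atan2(-dzdy, dzdx)           # one convention for 8.8
    return tan_s, aspect

# synthetic plane rising 0.1 m per metre eastwards: tan S should be 0.1
dem = [[0.1 * x for x in range(3)] for _ in range(3)]
tan_s, aspect = slope_aspect(dem, 1, 1, 1.0)
```

Horn's eight-point estimator (equations 8.9 and 8.10) differs only in replacing the two central differences by the weighted sums over the outer cells.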
[Box 8.1 residue: a 3 × 3 elevation kernel and the curvatures derived from the six-parameter quadratic fit]

Profile curvature: PrC = 2(DG² + EH² + FGH)/(G² + H²)
Plan curvature: PlC = −2(DH² + EG² − FGH)/(G² + H²)
(concave = positive; convex = negative)
Given the variety of methods available for computing slope and aspect it is useful to know which is best. Skidmore (1989) reviewed six methods of estimating slope and aspect, including those given above. He concluded that both the second and third methods given above were superior to the simple algorithm given in equation (8.6), but that there was little difference in the results returned by Horn's method and the polynomial method. Hodgson (1995) used Morrison's (1971) synthetic test surface to study algorithms using four or eight neighbours, and showed that with those data algorithms employing only four neighbouring cells were consistently more accurate at estimating slope and aspect values than eight-neighbour methods such as Horn's. Recently Jones (1997) has carried out another analysis of eight algorithms for computing
slope and aspect using both real and synthetic DEM surfaces. His order of preference for slope and aspect algorithms, as tested by RMS residual error values derived from the difference between that generated by the algorithm and the true values for the test surfaces, is as follows:

1. Fleming and Hoffer (1979)/Ritter (1987)/Zevenbergen and Thorne (1987), 4-neighbours (best for smooth surfaces)
2. Horn (1981), 8-neighbours (better than 1 for rough surfaces)
3. 'One over distance weighting' (see Chapter 5), 8-neighbours
4. Sharpnack and Atkins (1969), 8-neighbours
5. Ritter's diagonal method (Ritter 1987), 4-neighbours
6. 'Simple method', 3 cells
7. Maximum downward gradient method (Travis et al. 1975), centre cell plus 1 neighbour
8. Least squares regression, centre cell plus 8 neighbours.

The reassuring news is that Horn's method and the Zevenbergen and Thorne algorithm are used by several well-known commercial GIS, so that there is general agreement on the better algorithms. According to Jones, the worst algorithms are consistently the Maximum Downward Gradient and the Multiple Linear Regression methods.

Displaying maps of slope and aspect. After the appropriate derivative has been calculated for each cell in the altitude matrix the results may need to be classified in order to display them clearly on a map. This is very often achieved by means of a lookup table in which the appropriate classes and their colour or grey scale representation have been defined. The value in each cell is compared with the lookup table, and the appropriate grey tone or colour is sent to the output device. Today, with high-resolution display devices, it is easy to obtain good images that the human eye can appreciate, but care is still needed to choose the best representation. For visual appreciation, display of the thematic data (slope, aspect, etc.) draped over a
digital elevation model is very effective. Figure 8.9 gives examples of the slope, profile convexity, and plan convexity for a small catchment with moderate relief having a 30 × 30 m resolution.

Aspect maps can be displayed by nine classes: one for each of the main compass directions N, NE, E, SE, S, SW, W, NW, and one for flat terrain. An alternative is to use a continuous, circular grey scale which is chosen so that NE-facing surfaces are lightest: this gives a realistic impression of a 3D surface (Figure 5.12).

Slope often varies quite differently in different regions and, although adherents of standard classification systems usually want to apply uniform class definitions, the best maps are produced by calibrating the class limits to the mean and standard deviation of the frequency distribution at hand. Six classes, with class limits at the mean, the mean ±0.6 standard deviations, and the mean ±1.2 standard deviations, usually give very satisfactory results (Evans 1980; see also Mitasova et al. (1995) for original ways of displaying slope information).

It is a general feature of maps derived from altitude matrices that the images are more noisy than the original surface: in general, roughness increases with the order of the derivative. The images can be improved by drawing them as shaded or coloured maps on a laser plotter, or the derivatives can be smoothed by a low-pass filter before the results are plotted. Smoothing the DEM with a low-pass filter before computing the derivatives also reduces noise, but at the expense of removing the extremes from the data, and underestimates of slope angles will result. If necessary, the results can be further smoothed for display by interpolating to a finer grid.

In systems using integer arithmetic it is unwise to interpolate the altitude matrix to too fine a grid because the problem of quantization noise in the data (rounding off) may lead to numerical problems when estimating the slopes (Horn 1981). Interpolation from digitized contour lines can also give serious systematic errors which show up in the derivatives (Chapter 5, Plate 4.5).

Slope and aspect maps can also be prepared from TIN DEMs by computing the slope or aspect for each triangular facet separately and then shading it according to the gradient class.
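The mean-and-standard-deviation class limits described above can be sketched as a small helper (the slope values below are invented; the population standard deviation is used):

```python
import math

def slope_class_limits(values):
    """Five class limits at the mean and the mean +/- 0.6 and +/- 1.2
    standard deviations, giving six display classes (after Evans 1980)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [mean - 1.2 * sd, mean - 0.6 * sd, mean,
            mean + 0.6 * sd, mean + 1.2 * sd]

def classify(value, limits):
    """Return a class index 0-5: the number of limits the value exceeds."""
    return sum(value > lim for lim in limits)

slopes = [2.0, 4.0, 6.0, 8.0]        # hypothetical slope values (degrees)
limits = slope_class_limits(slopes)
classes = [classify(s, limits) for s in slopes]
```

The class indices would then be looked up in a colour or grey-scale table for display.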
Before drainage basins and drainage networks could be analysed quantitatively, they had to be laboriously copied from aerial photographs or printed topographical maps. Besides being tedious, this work inevitably led to an increase in the errors in the data. In areas of gentle relief it is not always easy to judge by eye on aerial photographs where the boundary of a catchment should be, and under thick forest it may be difficult even to see the streams. Even on very detailed topographical maps the drainage network as represented by the drawn blue lines may seriously underestimate the actual pattern of all potential water courses. It could be useful, for example, to be able to separate water-carrying channels from dry channels at different times of the year, with the information about water coming from remote sensing and the channels from a DEM.

Drainage networks and streams, catchments (or watersheds), and drainage divides or ridges are important properties of real landscapes that contribute to the understanding of material flows. They can be built in to a TIN or an altitude matrix by direct digitizing, but they can also be derived automatically from the altitude matrix (Band 1986, Hutchinson 1989, Jenson and Domingue 1988, Marks et al. 1984, McCormack et al. 1993, Morris and Heerdegen 1988). Automatic derivation of the drainage network has provided new tools for hydrologists (Beven and Moore 1994, Maidment 1993, 1995, Moore et al. 1991, Moore et al. 1993, Moore 1996, van Deursen 1995) to estimate the flow of water and sediment over landscapes and to link dynamic models of hydrological processes to GIS.

The following steps are required when automatically deriving drainage networks from altitude matrices.

1. Determine routing. The flow of material over a gridded surface is determined by considering the direction of steepest downhill descent. There are several algorithms for calculating this, called the D8 or 8-
of two strategies: cutting through or filling up. Cutting through one or more layers of boundary cells to find the next downstream cell requires enlarging the size of the search window to find a cell or series of cells of the same elevation as, or lower than, the core (pit) cell. Once this path has been found the appropriate topological links are written into the cells along the path, irrespective of their true elevation. Filling up involves increasing the elevation of the core cell until it is equal to one or more of its neighbours, and then examining if the neighbour drains downhill to another destination. If this does not happen the elevation is increased again until a linkage is found.

Pit removing is an interactive process regarded by some as a necessary evil. Hutchinson (1989) has developed a spline interpolator for ensuring that pits do not occur, but it is not always sensible to remove all pits automatically because closed and semi-closed depressions may be real features in some landscapes. MacMillan et al. (1993) give a detailed description of an application in Alberta, Canada, where the study of surface water storage after the spring snow melt depends on the presence of many partially interconnected depressions. Van Deursen (1995) provides a method for terminating the pit removal process for given values of depression volume, area, or depth, and in terms of the received precipitation (i.e. for some levels of run-off the depression behaves as a pit and for others it is full of water and is part of the channel); McCormack et al. (1993) provide a feature-based approach for assigning drainage directions on plateaux and depressions.

Once pits and plateaux have been identified the DEM can be adjusted so that they are removed. For large and complex data sets with large-area pixels a practical alternative is to digitize the river network, convert the river vectors to grid cells, and then 'burn in' the river cells at a lower level in the drainage network.

Example of a generic command for extracting surface topology from a gridded DEM:

    lddmap = (dem.map, a, p1, p2, p3, p4, ...)

where lddmap is the derived topology, a is the algorithm used, and p1, p2, p3, ... are parameters for removing pits according to their outflow depth, core volume, core area, etc.
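The 'filling up' strategy can be sketched for a single pit cell; real algorithms iterate over the whole DEM and follow the spill path downstream, so this is the core step only, on an invented 3 × 3 elevation window:

```python
def fill_pit(dem, i, j):
    """'Filling up': raise a pit cell to the level of its lowest neighbour,
    so that flow can continue downstream. A sketch of one step only."""
    neighbours = [dem[i + di][j + dj]
                  for di in (-1, 0, 1) for dj in (-1, 0, 1)
                  if not (di == 0 and dj == 0)]
    lowest = min(neighbours)
    if dem[i][j] < lowest:       # the cell is a pit
        dem[i][j] = lowest       # fill up to the spill level
    return dem

dem = [[5, 5, 5],
       [5, 1, 5],
       [5, 5, 4]]
fill_pit(dem, 1, 1)   # the centre pit is raised to its lowest neighbour
```

After this step the routing algorithm would be re-run to check that the raised cell now drains to a neighbour, repeating until a linkage is found.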
…model because it explicitly contains information about the connectivity of different cells. This knowledge

S(c_i) = S(c_i) + Σ_u (c_u)    (8.12)
(c) cumulative upstream elements, before pit removal; (d) cumulative upstream elements, after pit removal
flow over an ideal surface. For example, it is easy to compute a mass balance for each cell in terms of … and Kirkby 1979). Figure 8.12a shows a wetness map draped over the DEM from which it was derived.
Other products that can be derived from the local drain direction map

Stream channels can be defined as cells having more than N contributing upstream elements. Stream channels can be determined by using a Boolean operator such as:

Streams = if(upstreamelements ≥ 50 then 1 else 0)    (8.18)

This creates a binary map in which all cells with 50 or more upstream elements are defined as belonging to the set of 'streams' (Figure 8.12c).

Ridges. By definition, ridges have no upstream elements, so selecting all cells with an upstream elements value of 1 provides a first estimate of ridges (Figure 8.12d).

Catchments. Because all cells that drain through a given cell are part of the catchment of that cell, counting upstream over the ldd automatically computes the area and defines the catchment of the cell. A catchment mask can be computed by assigning a 'one' to all cells in the catchment and a 'zero' to those outside. This can be used as a 'cookie cutter' to identify catchment-specific data from remotely sensed imagery or other sources at the same level of resolution. Using a high-pass or edge filter yields a linear catchment boundary which can be vectorized by converting the cell representation to a chain code (Chapter 3).

The Slopelength operator is similar to the accumulate operator, but it computes a new attribute of a cell as the sum of the original cell value and the upstream cells multiplied by the distance travelled over the network, d_u:

S(c_i) = S(c_i) + Σ_u (c_u · d_u)    (8.17)

Figure 8.13a shows slope lengths computed in this way. The distance travelled can be a simple Euclidean distance depending on the size of the cells (1 × unit cell size for N–S and E–W; 1.414 for diagonals), or it can include a friction term to deal with resistances within the cells on the network (see Spreading with Friction, below).
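Counting upstream elements over an ldd and thresholding them into streams can be sketched as follows; the ldd is represented as a simple cell-to-downstream-cell mapping, and the network and threshold N = 3 are invented for illustration:

```python
def upstream_elements(ldd):
    """Count contributing upstream elements for every cell of a local
    drain direction (ldd) map given as {cell: downstream_cell or None}.
    Each cell counts itself, as in the accumulate operator."""
    def depth(c):
        return 0 if ldd[c] is None else 1 + depth(ldd[c])
    counts = {c: 1 for c in ldd}
    for c in sorted(ldd, key=depth, reverse=True):  # push counts downstream
        if ldd[c] is not None:
            counts[ldd[c]] += counts[c]
    return counts

# a tiny network: cells a and b drain into c, and c into the outlet d
ldd = {"a": "c", "b": "c", "c": "d", "d": None}
counts = upstream_elements(ldd)
streams = {c: 1 if n >= 3 else 0 for c, n in counts.items()}
```

Processing cells from the ridges downwards guarantees that each cell's count is complete before it is passed to its downstream neighbour; the same traversal, with the count replaced by a one/zero mask, yields the catchment 'cookie cutter'.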
Figure 8.13. Deriving slope lengths from a DEM: (a) along the LDD; (b) by spreading perpendicular to streams
Clumping
Very often the result of a Boolean selection or classification on the attributes of cells will be sets of cells that are spatially contiguous but which cannot be identified as being part of a spatial 'entity'. The clump operator examines every cell to see if any of its immediate neighbours in a 3 × 3 window have the same class; if so, then both cells are assigned to the same clump and given a value that identifies that clump as distinct from others. The result is that each contiguous group of cells is aggregated into a larger spatial unit, which could be useful for many purposes. For example, identifying all 'ridges' via the upstream element map may create several loose aggregations of cells that belong to different ridges. Applying the clump operator will identify each cell with a specific clump.
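In image-processing terms the clump operator is connected-component labelling; a flood-fill sketch over an 8-cell neighbourhood, with an invented binary ridge map, might look like this:

```python
def clump(grid):
    """Assign a unique clump id to each contiguous group of cells of the
    same class (8-cell neighbourhood), as a clump operator does."""
    rows, cols = len(grid), len(grid[0])
    ids = [[0] * cols for _ in range(rows)]
    next_id = 0
    for si in range(rows):
        for sj in range(cols):
            if ids[si][sj]:
                continue                 # cell already belongs to a clump
            next_id += 1
            ids[si][sj] = next_id
            stack = [(si, sj)]
            while stack:                 # flood-fill one clump
                i, j = stack.pop()
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and not ids[ni][nj]
                                and grid[ni][nj] == grid[si][sj]):
                            ids[ni][nj] = next_id
                            stack.append((ni, nj))
    return ids

ridges = [[1, 1, 0],
          [0, 0, 0],
          [0, 1, 1]]
labels = clump(ridges)   # the two ridge groups receive different ids
```
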
This is not a window operation, but a continuous analogue of the dilation or buffering operations on exact entities. Whereas dilation (or buffering) of exact entities is usually limited to isotropic and isomorphic spreading (a buffer around a circle is just a larger circle: Figure 8.14), spreading over a continuous surface can be carried out heterogeneously to reflect the variations in resistance to the spreading process (Figure 8.15).

In non-isotropic spreading, two components contribute to the cumulation of values from the starting-point. The first is distance, counted as cell steps or in real units. The second depends on the attributes of the cells through which the distance accumulation takes place. The larger the value of the 'friction' attribute, the greater the accumulation of 'distance' when traversing a cell. The result is that the effective spreading distance accumulates much faster where resistance is greatest, so that geometrically longer paths may be 'cheaper' ways to reach a given destination.

…from any other linear feature such as roads or railways). Figure 8.13b gives an example of computing 'distance from the stream' as a buffer for which the friction has been computed using the slope map.

Commands to create slope length and inverse slope length:

    slopelength = spread(strm, 0, slp.map);
    slmx = mapmaximum(slopelength);
    report slopelength = ((1 - (slopelength/slmx)) * slmx);

where strm is the map of stream locations, slp.map is the map of slopes, and slopelength is the resulting slope length perpendicular to streams.
Figure 8.15. Spreading through a resistance function
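Spreading with friction is in effect a shortest-path accumulation, and can be sketched as a Dijkstra-style search over the grid; the convention used below (moving into a cell costs the step length times that cell's friction value) is one simple choice among several, and the friction surface is invented:

```python
import heapq

def spread(sources, friction):
    """Accumulate effective distance from source cells over a friction
    surface (non-isotropic spreading over a grid)."""
    rows, cols = len(friction), len(friction[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = [(0.0, i, j) for i, j in sources]
    for _, i, j in heap:
        dist[i][j] = 0.0
    heapq.heapify(heap)
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[i][j]:
            continue                      # stale queue entry
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if di == dj == 0:
                    continue
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    step = 1.414 if di and dj else 1.0
                    nd = d + step * friction[ni][nj]
                    if nd < dist[ni][nj]:
                        dist[ni][nj] = nd
                        heapq.heappush(heap, (nd, ni, nj))
    return dist

friction = [[1, 1, 5],
            [1, 1, 5],
            [1, 1, 5]]
dist = spread([(0, 0)], friction)   # the high-friction column is 'farther'
```

Because the search always expands the cheapest frontier cell first, geometrically longer paths through low-friction cells can indeed turn out cheaper than short paths through high-friction cells, as the text describes.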
upstream elements for such a surface can indicate the best routes to follow. Instead of spreading over all cells, non-isotropic spreading can be confined to follow the routes defined by the ldd map, or by a subset of the ldd. When the resistance varies over the network this can reveal areas where flow problems might accumulate.
Three strongly related methods concern the computation of the paths of light between a light source on or above the DEM and its effect at other locations. Whereas all the spatial analysis methods discussed so far concern the attribute value of cells, or the differences in attribute values between cells in the plane of the map overlay, the following operators are concerned with establishing new attributes that refer to the three-dimensional form of the continuous surface. The methods are for the computation of line of sight (determining the viewshed), for computing surfaces with shaded relief for quasi-3D display, and for computing the diurnal or annual inputs of solar energy.

This is the simplest operation; the aim is to determine those parts of the landscape that can be seen from a given point. Intervisibility is often coded as a binary variable: 0 invisible, 1 visible. The collective distribution of all the 'true' points is called the viewshed. Over large distances it is necessary to take the curvature of the earth into account, and the transparency of the atmosphere may also be important.

Determining intervisibility from conventional contour maps is not easy because of the large number of profiles that must be extracted and compared. Intervisibility maps can be prepared from altitude matrices and TINs using tracking procedures that are variants of the hidden line algorithms already mentioned. The site from which the viewshed needs to be calculated is identified on the DEM and rays are sent out from this point to all points in the model. Points (cells) that are found not to be hidden by other cells are coded accordingly to give a simple map (Figure 8.16). Because DEMs are often encoded directly from aerial photographs, the heights recorded may not take into account features such as woods or buildings in the true landform, and the results may need to be interpreted with care. In some cases the heights of landscape elements may be built into a DEM in order to model their effect on intervisibility in the landscape. Viewsheds can also be calculated from TINs (De Floriani and Magillo 1994; Lee 1991b).

Estimating the intervisibility of sites is an important GIS application in simulators for pilot training, the location of microwave transmission stations, the appreciation of scenery, or the location of forest fire warning stations. The effects of errors in the DEM on the computation of viewshed have been studied by Fisher (1995).

Cartographers have developed many techniques for improving the visual qualities of maps, particularly with respect to portraying relief differences in hilly and mountainous areas. One of the most successful of these is the method of relief shading that was developed largely by the Austrian and Swiss schools of cartography and which has its roots in chiaroscuro, the technique developed by Renaissance artists for using light and shade to portray three-dimensional objects. These hand methods relied on hand shading and airbrush techniques to produce the desired effect; consequently the end-product, though often visually very striking, was very expensive and was very dependent on the skills of the cartographer who, one suspects, was often also something of a mountaineer.

As digital maps became a possibility, many cartographers realized that it might be possible to produce shaded relief maps automatically, accurately, and reproducibly. Horn (1981) has given an extensive and thorough review of the developments and methods that have been tried. The principle of automated
Spatial Analysis Using Continuous Fields
shaded relief mapping is based on a model of what the terrain might look like were it to be made of an ideal material, illuminated from a given position. The final results (e.g. Chapter 5, Figure 5.12b) resemble an aerial photograph because of the use of grey scales and continuous-tone techniques for portrayal, but the shaded relief map computed from an altitude matrix differs from aerial photographs in many ways. First, the shaded relief map does not display terrain cover, only the digitized land surface. Second, the light source is usually chosen as being at an angle of 45° above the horizon in the north-west, a position that has much more to do with human faculties for perception than with astronomical reality. Third, the terrain model is usually smoothed and generalized because of the data-gathering process and will not show the fine details present in the aerial photograph.

The shaded relief map can be produced very simply. All that is required are the estimates of the orientation of a given surface element (i.e. the components of slope) and a model of how the surface element will reflect light when illuminated by a light source placed 45° high to the NW. The apparent brightness of a surface element depends largely on its orientation with respect to the light source and also on the material. Glossy surfaces will reflect more light than porous or matt surfaces. Most discussion in the development of computed shaded relief maps seems to have been generated by the problem of how to estimate reflectance (Horn 1981).

According to Horn (1981), the following method is sufficient to generate shaded relief maps of reasonable quality. The first step is to compute the slopes p, q at each cell in the x (east-west) and y (south-north) directions, as given in equations 8.9 and 8.10 above. These values are then converted to a reflectance value using an appropriate 'reflectance map'. This is a graph relating reflectance to the slopes p, q for the given reflectance model used. Horn suggests that the following formulations for reflectance give good results:

(i) R(p,q) = 1/2 + 1/2 (p′ + a)/b

where p′ = (p₀p + q₀q)/√(p₀² + q₀²)   8.19

is the slope in the direction away from the light source. For a light source in the 'standard cartographic position' (45° above the horizon in the NW), p₀ = 1/√2 and q₀ = −1/√2. The parameters a and b allow the choice of grey values for horizontal surfaces and the rate of change of grey with surface inclination; a = 0 and b = 1/√2 are recommended.

(ii) R(p,q) = 1/2 + 1/2 (p′ + a)/√(b² + (p′ + a)²)   8.20

maps all possible slopes in the range 0-1.

Some formulations for reflectance are computationally complex and it may be more efficient to create a lookup table for converting slopes to reflectance. The reflectance value for each cell is then converted …
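Horn's linear reflectance map (formulation (i) above) is easy to apply to a gridded DEM. The sketch below, in Python with NumPy, estimates p and q by central differences, one common choice (the book's equations 8.9 and 8.10 are not reproduced here), and converts them to grey values; the sign convention for the y axis, and the wraparound treatment of edges, are assumptions of this illustration.

```python
import numpy as np

def hillshade(dem, cell_size, a=0.0, b=1/np.sqrt(2)):
    """Shaded relief using Horn's linear reflectance map (equation 8.19).

    The light source is in the 'standard cartographic position',
    45 degrees above the horizon in the NW: p0 = 1/sqrt(2), q0 = -1/sqrt(2).
    """
    # slopes p (x direction) and q (y direction) by central differences;
    # np.roll wraps at the edges, a crude but simple boundary treatment
    p = (np.roll(dem, -1, axis=1) - np.roll(dem, 1, axis=1)) / (2 * cell_size)
    q = (np.roll(dem, -1, axis=0) - np.roll(dem, 1, axis=0)) / (2 * cell_size)
    p0, q0 = 1/np.sqrt(2), -1/np.sqrt(2)
    # slope in the direction away from the light source (equation 8.19)
    p_prime = (p0 * p + q0 * q) / np.hypot(p0, q0)
    r = 0.5 + 0.5 * (p_prime + a) / b
    return np.clip(r, 0.0, 1.0)      # grey values in [0, 1]
```

A horizontal surface (p = q = 0) maps to the mid-grey value 0.5, as the choice a = 0 intends.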
Irradiance mapping
as a complex function of atmospheric absorbers and scatterers (clouds, dust). Atmospheric transmittance also increases with altitude because of a decrease in the number of absorbers and scatterers.

For an atmospheric transmittance T₀ the direct irradiance I on a slope is given by:

I = cos i S₀ exp(−T₀/cos θ₀)
  = [cos θ₀ cos β + sin θ₀ sin β cos(φ₀ − A)] S₀ exp(−T₀/cos θ₀)   8.21

where cos i is the cosine of the angle of solar illumination on the slope, S₀ is the exoatmospheric solar flux, θ₀ is the solar zenith angle, φ₀ is the solar azimuth, A is the azimuth of the slope, and β is the slope angle. Because both β and A are derived from digital elevation data, equation 8.21 describes the spatial variation of the dominant component of incident solar radiation (irradiance). As cos i varies with time of day and year, it is possible to compute both the spatial and temporal variation of irradiance (Figure 8.18). Note that the sky view factor limits the direct irradiation when the sun falls below the horizon and this needs to be included in the calculations (i.e. direct irradiation is only computable for those parts of the terrain directly illuminated by, or in the 'viewshed' of, the sun).

Note that it is a simple matter to integrate the daily or monthly estimates of irradiance for a whole season or year, and thereby to create a map that distinguishes sites in terms of the energy inputs for plant growth, home heating, or rock weathering. Combining a classified map of warm and cold sites with a map of site wetness derived from the upstream elements (see equation 8.14) enables maps of warm-wet, warm-dry, cold-wet, and cold-dry conditions to be made. Further combination with soil information and vegetation indices from classified satellite images can quickly provide a detailed reconnaissance model of the landscape which can be used for hypothesis generation and fieldwork planning.
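Equation 8.21 translates directly into code. A minimal sketch follows; the default values for S₀ and T₀ are illustrative assumptions, not values from the text, and angles are taken in radians.

```python
import numpy as np

def direct_irradiance(beta, aspect, sun_zenith, sun_azimuth,
                      S0=1367.0, T0=0.25):
    """Direct solar irradiance on a slope (equation 8.21).

    beta        : slope angle (radians)
    aspect      : azimuth A of the slope (radians)
    sun_zenith  : solar zenith angle theta_0 (radians)
    sun_azimuth : solar azimuth phi_0 (radians)
    S0          : exoatmospheric solar flux (W m-2); 1367, the solar
                  constant, is used here only as an illustrative default
    T0          : atmospheric transmittance coefficient (assumed value)
    """
    cos_i = (np.cos(sun_zenith) * np.cos(beta)
             + np.sin(sun_zenith) * np.sin(beta)
             * np.cos(sun_azimuth - aspect))
    cos_i = np.maximum(cos_i, 0.0)        # self-shaded slopes receive nothing
    return cos_i * S0 * np.exp(-T0 / np.cos(sun_zenith))
```

Because the arguments can be NumPy arrays, the same function evaluates the whole grid at once, which is how the spatial variation of irradiance would be mapped in practice.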
Table 8.1. Summary of attributes that can be computed from DEMs and their application

Elevation | Height above mean sea level or local reference | Potential energy determination; climatic variables (pressure, temperature), vegetation and soil trends, material volumes, cut and fill calculations

Slope | Rate of change of elevation | Steepness of terrain, overland and sub-surface flow, land capability classification, vegetation types, resistance to uphill transport, correction of remotely sensed images

Aspect | Compass direction of steepest downhill slope | Solar irradiance, evapotranspiration, vegetation attributes, correction of remotely sensed images

Profile curvature | Rate of change of slope | Flow acceleration, zones of enhanced erosion/deposition, vegetation, soil and land evaluation indices

Plan curvature | Rate of change of aspect | Converging/diverging flow, soil water properties

Local drain direction (ldd) | Direction of steepest downhill flow | Computing attributes of a catchment as a function of stream topology; assessing lateral transport of materials over a locally defined network

Upstream elements/area / specific catchment area | Number of cells/area upstream of a given cell; upslope area per unit width of contour | Catchment areas upstream of a given location (if the outlet, area of whole catchment), volume of material draining out of catchment

Stream length | Length of longest path along ldd upstream of a given cell | Flow acceleration, erosion rates, sediment yield

Stream channel | Cells with flowing water / cells with more than a given number of upstream elements | Flow intensity, location of flow, erosion/sedimentation

Ridge | Cells with no upstream contributing area | Drainage divides, vegetation studies, soil, erosion, geological analysis, connectivity

Wetness index | ln(specific catchment area/tan(slope)) | Index of moisture retention

Stream power index | specific catchment area × tan(slope) | Measure of the erosive power of overland flow

Sediment transport index (LS factor) | (n + 1)(Aₛ/22.13)ⁿ(sin β/0.0896)ᵐ, with Aₛ the specific catchment area and β the slope angle | Characterizes erosion and deposition processes (cf. USLE)

Catchment length | Distance from highest point to outlet | Overland flow attenuation

Viewshed | Zones of intervisibility | Stationing of microwave transmission towers, fire watch towers, hotels, military applications

Irradiance | Amount of solar energy received per unit area | Vegetation and soil studies, evapotranspiration, location of energy-saving buildings, shaded relief
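Two of the compound indices in Table 8.1 can be computed per cell as soon as slope and specific catchment area grids are available. A small sketch (taking slope in degrees is a convention of this illustration, not of the table):

```python
import numpy as np

def wetness_index(spec_catchment_area, slope_deg):
    """Topographic wetness index ln(As / tan(beta)) from Table 8.1."""
    tan_b = np.tan(np.radians(slope_deg))
    return np.log(spec_catchment_area / tan_b)

def stream_power_index(spec_catchment_area, slope_deg):
    """Stream power index As * tan(beta) from Table 8.1."""
    return spec_catchment_area * np.tan(np.radians(slope_deg))
```

In practice flat cells (slope = 0) must be screened out or given a small minimum slope before taking the logarithm.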
This chapter has demonstrated that there is a large range of products that can be derived from continuous surfaces that have been discretized as regular grids (Table 8.1). Some derived data, such as slope, aspect, line of sight, hill shading, and irradiance, can also be easily obtained from triangular irregular networks, but it is not usual to find systems using TINs that provide facilities for deriving the topology of the surface and for computing material flows over these derived networks. Frequently, the users of TINs will have input their networks explicitly as topologically connected lines or objects.

The most commonly encountered continuous field is the digital elevation model, and most of the derivatives mentioned above have a direct bearing on the use and interpretation of terrain elevation. The operators presented can be used on any continuous field, however, such as remotely sensed images or the results of interpolation or spatial modelling, as will be described in the following examples.
The rest of this chapter presents several examples that demonstrate the extra value of including spatial operations on continuous fields in GIS analysis. The provision of operators that can be used to analyse spatial interactions and the creation of surface topology opens up many new possibilities, particularly those that involve the transport of materials or fluids over space. One first thinks of hydrology, where the movement of water over drainage basins and catchments is an essential part of the process, but there are many other areas where adding spatial interaction to models that previously operated only at points or on an entity-by-entity basis (lumped models) provides new predictive power and insights. Note that in this chapter, with the exception of the computation of irradiance, we limit the discussion to static modelling of spatial interactions. Linking time series data with spatial models opens up many more exciting possibilities, but goes beyond the scope of this book. Dynamic modelling in a GIS environment is explained in the companion volume (Burrough and van Deursen 1998).

The areas examined include (a) simple hydrological modelling of catchments, (b) the modelling of erosion hazards with additional help from remote sensing, (c) the optimization of timber extraction from forests, and (d) the post-Chernobyl investigation of links between summer flooding and increased 137Cs levels in soils and vegetation in riparian areas of the northern Ukraine. The aim of the examples is to demonstrate how the spatial operators can be used rather than to present fully developed spatial models.

Spatial analysis in surface water hydrology

From the material presented in this chapter it is clear that the analysis of continuous fields provides much information for hydrological modelling, in particular the provision of slope and aspect information, the derivation of drainage nets, the computation of hydrological indices, and the delineation of catchments. Together with other data on potential and actual evaporation, land cover, and soil types, a GIS can provide much input data for running existing surface water hydrological models such as TOPMODEL (Beven et al. 1984) and ANSWERS (De Roo et al. 1992). Beven and Moore (1994) and Maidment (1993, 1996) provide more details.

Even with a simple raster GIS and a limited command language structure it is possible to carry out
simple analysis of hydrological surface processes. Knowing the topological linking between grid cells not only allows material transport from cell to linked cell to be calculated, but also allows that transfer to be modified by some other property of the cell. For example, accumulating the water surplus from a mass balance for each cell over the net allows one to model the movement of water surpluses from cell to cell as a flow process (Box 8.2).

Figures 8.19a,b show how the surplus rainfall from simulated rainfall events high up or low down in the catchment of Figure 8.11 will activate the drainage system down to the catchment outlet. When combined with data on land cover, infiltration, and evaporation,
such an analysis can be useful for examining scenarios of rainfall run-off under different kinds of land use.

A similar approach allows the easy estimation of pollutant dispersal from point sources if the pollutant is allowed to leak into streams. The procedure is shown in Figure 8.20 as flowchart and example.

Modelling soil erosion hazards

Until relatively recently, soil erosion has been treated by many site managers as a static process, in the sense that each site has been evaluated separately and little attention has been paid to the overland transport of sediment. As shown in the previous chapter, soil erosion models like the USLE (Wischmeier and Smith 1978), SLEMSA (Stocking 1981), or the Morgan, Morgan, and Finney Method (Morgan et al. 1984) are site-specific point models. Their calculations ignore both the transport and deposition of eroded soil. Models incorporating overland transport of sediment have been developed but specialists may be needed to run them, particularly if they are written as self-contained modules in FORTRAN or similar programming languages. Also, using site-specific models like the USLE (see Chapter 7) to compute soil erosion over landscapes modelled as sets of independent entities (polygons or pixels) ignores all spatial interactions between the polygons or cells, and clearly this approach is physically unrealistic. This aspatial approach to soil erosion has been reinforced by the many field experiments using Wischmeier plots, in which the plot of land being studied for its erosion susceptibility has been artificially cut off from the surrounding landscape by a wall of bricks or tiles.

Erosion as a process, rather than as a descriptor, is more properly treated using a data model of continuous variation than as an attribute of crisp entities. The simplest way to introduce sediment transport into
erosion modelling is to treat eroded soil as a material that can flow over the ldd network in a manner similar to water. The potential erosion for each cell may be computed using the point models. Then the transport capacity of each cell determines how much of the potentially erodible soil can be moved to the neighbouring cell. If the cell has attributes that retard overland flow then little material will be transferred, and if the cell has bare land that cannot offer much resistance to lateral flow, then much or all of the potentially erodible soil will be transported. As each cell is topologically connected to upstream neighbours it will also receive sediment. If the amount of sediment received is greater than that discharged, then deposition occurs; if not, then there is net erosion. The transport capacity of the network depends on geometrical aspects of the landform, such as slope, length of slope, and roughness, and also on the potential and kinetic energy of the water falling on it.

It is fairly easy to add a transport component to the USLE once an ldd network has been established, and the model immediately demonstrates the phenomenon of deposition in valley bottoms, which the point model could never manage. By highlighting combinations of steep slopes, shallow, unprotected soils, and aggressive surface flow, the modified models are much better at indicating the location of erosion potential over large areas than their point equivalents (cf. Desmet and Govers 1996; Mitasova et al. 1996). De Jong (1994) also demonstrated the benefit of using surface topology to improve the prediction of soil erosion hazards from DEMs, remotely sensed land cover data, and numerical modelling with his model called SEMMED.

SEMMED. When assessing erosion hazard for large areas it is useful if data from remote sensing can be used as at least one main input to the model, because its continuous variation and resolution in both space and time may more appropriately reflect the nature of spatial and temporal variation than measurements made only at a few point locations that are linked to a priori soil units (polygons). SEMMED uses continuous function analysis to improve the prediction power of erosion models by using topological linkages and data from remote sensing.

SEMMED is based on the Morgan, Morgan, and Finney Method to predict annual soil losses (Morgan et al. 1984), modified to take account of overland flow. SEMMED separates the point process of soil erosion into a water phase and a sediment phase. The model considers soil erosion to result from the detachment of soil particles by raindrop impact and the transport of these particles by overland flow, but does not consider splash transport or detachment by run-off.

The water phase uses mean annual rainfall to determine the energy of the rainfall for splash detachment and the volume of run-off. The volume of overland flow is computed using the 'mean rain per rain day' and the 'soil moisture storage capacity'. The water phase does not include exchange of water between the grid cells.

The sediment phase of the model considers soil particle detachment by rainsplash and transport of soil particles by run-off. Detachment is modelled as a function of rainfall energy using the results of an empirical study of Meyer (1981), modified to account for rainfall interception by a vegetative cover (Laflen and Colvin 1981). In the sediment phase the transport capacity is calculated as a function of the volume of overland flow, the slope steepness, and the effect of crop cover. USLE C-factor values (Wischmeier and Smith 1978) are used to account for the effects of crop cover.

Data sources

Climatological data. A distributed approach to model erosion requires distributed rainfall data, but these are not always available, though radar mapping of rainfall is improving greatly. Lumped climatological data for the Ardeche province were gathered from published sources (CMD 1984). As the multi-temporal satellite images used in this study were acquired in 1984, the annual rainfall figures of that year were used.

Soil physical properties. The French regional soil map (Bornand et al. 1977) at a scale of 1 : 100 000 was digitized and rasterized into a spatially continuous raster map with a cell size of 30 × 30 m, matching the pixel size of the TM images. Soil properties required to compute the soil moisture storage capacity were gathered from field data supplemented by literature data and were assigned as attributes to the soil polygon entities of the digital soil map.

Landsat TM. Vegetation data derived from Landsat TM using the NDVI relation (Tucker 1979) were used to compute cell-specific crop cover, evapotranspiration, and interception.

Topographical factors. Contours of the 1 : 25 000 topographical map (IGN 1985) were digitized and interpolated to yield a gridded DEM (Figure 8.21a). Slope steepness was calculated from this DEM using the window procedure given in equation (8.6). The topological network of potential …
1. Compute mean rainfall per rain day = annual rainfall volume / number of rain days.

2. Compute soil moisture storage capacity for each grid cell.

3. Compute overland flow per raster cell by combining annual rainfall, mean rain per rain day, and the soil moisture storage capacity.

4. Derive the local drain direction map of the area.

5. Accumulate overland flow over the ldd net. A correction was made to this map for the river channels: erosion by overland flow stops as soon as the overland flow reaches the river channel.

6. Combine the map of overland flow per raster cell (step 3) with the map of potential overland flow (step 5); a correction was made for the amount of water that infiltrates in the (saturated) top soil by subtracting a map with the saturated infiltration capacity per cell (the infiltration map was produced by reclassifying the digital soil map).

7. Combine the result of step 6 with the slope steepness and the map of the vegetation factor to yield a distributed transport capacity map.
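Steps 1-3 above can be sketched with the Morgan, Morgan, and Finney overland-flow relation Q = R exp(−Rc/R0), where R is annual rainfall, R0 the mean rain per rain day (step 1), and Rc the soil moisture storage capacity (step 2). That SEMMED uses exactly this functional form is an assumption of this sketch.

```python
import math

def overland_flow(annual_rain_mm, rain_days, storage_capacity_mm):
    """Annual overland flow per cell, following steps 1-3 above.

    Q = R * exp(-Rc / R0), the Morgan, Morgan, and Finney relation:
    the larger the soil moisture store relative to a typical rain day,
    the less rain runs off overland.
    """
    r0 = annual_rain_mm / rain_days                               # step 1
    return annual_rain_mm * math.exp(-storage_capacity_mm / r0)   # step 3
```

Applied cell by cell over the rasterized rainfall and soil maps, this yields the per-cell overland flow that step 5 then accumulates over the ldd net.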
entry point in the south-east, and the location of valuable trees. The distribution of trees could have been determined by field survey or by aerial photo interpretation or classification of remotely sensed images.

The steps in the analysis. 1. The usual slope and drainage analyses yield maps of slopes and the major streams (in this case more than 100 upstream cells; cells are 30 × 30 m).

2. The costs of traversing any given cell are computed as a point operation as the sum of (i) the basic costs to clear a path or drive along the road, (ii) the extra costs incurred by steep terrain, and (iii) the extra costs incurred by having to cross stream bottoms.
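The per-cell costs of step 2 become the friction surface for the spreading operation of step 3 below. A minimal Dijkstra-style sketch over a 4-connected grid; charging each move the mean of the two cell costs is one common convention (half-cell steps are another), and is an assumption of this illustration.

```python
import heapq

def spread_costs(cost, start):
    """Cumulative cost of access from `start` over a friction surface.

    cost  : 2D list of per-cell traverse costs (the 'friction')
    start : (row, col) entry point
    Returns a grid of least cumulative costs, analogous to Figure 8.24b.
    """
    rows, cols = len(cost), len(cost[0])
    acc = [[float('inf')] * cols for _ in range(rows)]
    acc[start[0]][start[1]] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > acc[r][c]:
            continue                       # stale entry, already improved
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (cost[r][c] + cost[nr][nc]) / 2
                if nd < acc[nr][nc]:
                    acc[nr][nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return acc
```

Tracing, for each cell, the neighbour through which its least cost was reached would recover the set of optimal paths used in steps 3 and 4.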
Figure 8.22. Flow chart for optimizing timber extraction from a forest
Figure 8.23. The components of the costs of timber extraction: (a) tree locations, (b) access road, (c) slopes, (d) streams
Figure 8.24a shows the total costs per cell for the whole area.

3. The cumulative costs of access are computed by a spreading operation starting at the entry to the forest, using the cost of cell traverse as a friction. The full set of optimal paths over the resulting continuous surface is then derived in just the same way as a stream net is derived for a topographical DEM. Figure 8.24b shows the cumulative costs of access to reach all cells along the optimum paths.

4. The subset of the optimum paths needed to reach the trees is achieved by accumulating 'tree flow' from the tree locations over the cumulative costs of access.

5. The cumulative increase in timber value over these optimum paths can be computed by calculating the current market price of timber by multiplying the tree locations map by the current price and using this as a friction value when spreading upstream over this subset of optimal paths. The resulting surface (Figure 8.25) gives the cumulative increase in timber value as a function of optimum cost distance from the entry point. Subtracting the costs of access from this map of cumulative tree values gives a map that shows the
Figure 8.24. (a) Total access costs per cell; (b) cumulative costs to reach each cell from exit point
areas that can be profitably extracted (Figure 8.26). The analysis can be repeated for different market prices to see how the profitability of timber extraction varies both spatially and in gross monetary value.

Note how the optimum cost paths can be modified … would be easy to modify the procedure to show not just tree locations but variations in timber value over the area due to different tree species or different stages of maturity. The analysis could also be linked to a tree growth model to show how the exploitation of the forest could be conducted in a sustainable way.

The redistribution of Chernobyl 137Cs by flooding

This example presents an interesting problem: data on the same attribute, in this case radiocaesium levels in soil, were collected from sets of point locations in 1988 and 1993 (Burrough et al. 1996). Remarkably, many
[Map legends: land use (fodder grass, pasture, woodland/rough, vegetable garden/flax, cereals, potatoes, fodder, built-up, peat digging, water); soils (sod-podzolic, sod-gley, meadow-swamp, peatbog, sand, alluvium, low-mid gley, high-gley); and flood proximity (annually flooded, < 1 km from floods, 1-2 km, 2-3 km).]

Figure 8.27. Input maps for modelling the redistribution of 137Cs by flooding
a large background level of radiocaesium with strong spatial and temporal variability. The topography is extremely flat and the main drainage is by meandering river channels that drain an area of approximately 6500 km² in a northerly direction, uniting north of the Ukraine-Belarus border to flow east as the River Pripyat, which joins the Dnieper River at Chernobyl. The whole Belarus-Ukraine border area is characterized by extensive areas of soddy gley and peaty soils which are marked on modern and historical 1 : 3 000 000 and 1 : 500 000 maps. The actual study site measures some 10.5 × 12.5 km and covers 76.5 km² of the Dubrovitsa District of the Rovno region.

A database of sampling locations, natural and artificial drainage, areas affected by annual spring floods, field parcel numbers, soil series, and land use maps was created by digitizing 1 : 10 000 scale paper source maps (Figure 8.27a,b). A gridded database with a resolution of 50 × 50 m was created from the basic map data for quantitative spatial analysis and interpolation. Samples of contaminated soil were collected in 1988, 1993, and 1994 at 72, 87, and 47 sites respectively and were analysed for 134Cs and 137Cs. The location of each sample was digitized (Figure 8.28a,b).

Correction for 134Cs and decay. To study temporal changes in the levels of 137Cs, which had possibly been
Figure 8.28. (a,b) Location of data points in 1988 and 1993; (c,d) interpolation of 137Cs levels for 1988 and 1993; (e,f) normalized differences computed for 1988-93 and 1993-94, respectively
caused by hydrological processes, the raw data were first corrected for the decay rates of the two isotopes using an isotopic ratio for 1986 of 137Cs/134Cs = 2 : 1. All analyses refer to radiocaesium data that have been converted to 1986 equivalents (Table 8.2). The data are strongly positively skewed and were normalized by converting to natural logarithms. All spatial analysis uses transformed data but the results have been back-transformed for display and ease of interpretation.

Risk of flooding. Simple buffering of the drainage map gave a map of site proximity to (or risk of) flooding for each location in the study area with respect to major rivers, lakes, and canals (Figure 8.27c).

Geostatistical analysis: comparing data between the years. Maps of 137Cs levels interpolated by ordinary kriging (Figure 8.28c,d) showed that in each year the highest levels occur in the zone that is annually flooded or within 1 km from water. However, because the soil samples were collected each year at different locations, the differences in the 137Cs levels from 1988 to 1993 could be due to different samples from a highly variable population. We decided to use conditional simulation to examine the hypothesis that measurements in 1988 could have returned 137Cs levels as large as actually found in 1993 if samples had been taken in 1988 at the 1993 sample sites.

Using the 1988 data, 100 simulations were made of the expected value of 137Cs at each of the 87 data points sampled in 1993. The procedure was repeated to predict the values at the 1994 sampling sites from the 1993 data. The 100 simulated values at each grid point were averaged to determine the expected value and its standard deviation.

The simulated predictions and their standard deviations were used to compute a normalized index of increase or decrease over the years between the two sampling periods, defined as

NDIF(x,y) = (M(x,y) − P(x,y)) / SDIF(x,y)   8.22

where M(x,y) is the measured value at a location, P(x,y) the simulated prediction, and SDIF(x,y) its standard deviation.
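The decay correction and the normalized index of equation 8.22 can be sketched as follows; the half-lives are the standard physical values (approximately 30.2 years for 137Cs and 2.06 years for 134Cs), supplied here as illustrative constants rather than taken from the text.

```python
import math

CS137_HALFLIFE_Y = 30.2   # years, standard physical value
CS134_HALFLIFE_Y = 2.06   # years, standard physical value

def to_1986_equivalent(activity, halflife_y, years_since_1986):
    """Decay-correct a measured activity back to its 1986 value."""
    decayed_fraction = math.exp(-math.log(2) * years_since_1986 / halflife_y)
    return activity / decayed_fraction

def ndif(measured, predicted, sd):
    """Normalized difference index of equation 8.22 for one cell."""
    return (measured - predicted) / sd
```

After one 137Cs half-life, a measured activity of 50 units back-corrects to 100 units in 1986 equivalents; applying `ndif` cell by cell to the simulated predictions reproduces the index mapped in Figure 8.28e,f.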
Table 8.3. Proportions of soils in the annually flooded areas and the 1 km zone
Soil group | Area (ha) | Area in flood zone (ha) | % of unit in flood zone
Table 8.4. Proportions of landcover classes in annually flooded areas and the 1 km zone
Land use | Area (ha) | Area in flood zone (ha) | % of unit in flood zone
Figure 8.29 shows the normalized indices for 1988-93 plotted against distance from water or annual flooding and demonstrates the close association between proximity to flooding and large (and variable) 137Cs levels. The spatial distributions of the normalized indices for both 1988-93 and 1993-4 were obtained by interpolating the normalized indices (Figure 8.28e,f). These were overlaid in the GIS with the maps of 137Cs levels in milk before and after a major flood in July 1993, confirming the direct effect of flooding on uptake into the foodchain, particularly on unimproved pastures on peaty soils. After the study was nearly completed we found a set of NOAA images on the Internet which confirmed the flood of July 1993 and provided more support for the movement of 137Cs by floodwaters in the Ukraine (Figure 8.30).
Conclusions
This chapter has shown a wide range of generic methods for deriving new data from attributes modelled as continuous surfaces. Table 8.5 summarizes the functional capabilities of a GIS that can deal both with exact entities and continuous fields. Frequently, but not always, the continuous surfaces represent landform. These methods have applications in many fields, not just hydrology, but also erosion and land
Table 8.5. Functional capabilities of GIS for analysis of entities and continuous fields
Compute statistics | Compute statistics (means, areas, enclosures, etc.) for points/lines/polygons/cells

Overlay analysis | Overlay and combine several maps in either vector (polygon) or raster (grid) mode using Boolean (AND/OR/NOT) logic and arithmetical functions/filters to manipulate both the geographic and the attribute data

Network analysis | Find shortest path along a road network with weighted route parameters for traffic density. Link specific entities (roads, cables) to each other and follow the network through

Digital terrain analysis | Represent landform as a 3D surface. Compute slope, aspect, intervisibility, rate of change of slope, shaded relief, direction of flow; determine watershed boundaries

Models | Ability to interface with simulation models
degradation studies, forest management, and soil and water pollution. Although all the examples relate to physical problems of the landscape, these kinds of analyses can also be applied to any study in which one or more attributes can be modelled as a continuous surface, such as surfaces of market potential, exposure to disease, or economic well-being.
Questions
1. Compare and contrast the effects of (a) spatial filtering and (b) polygon reclassification and entity merging, for generalizing a soil or land use map.
2. Explain why digital elevation models are essential prerequisites for modelling
environmental processes. Derive a standard procedure for estimating the range of
ecological sites in an area from gridded DEMs before commencing fieldwork.
3. Explore the assumptions and difficulties in collecting data for a regional erosion
model such as SEMMED.
4. Devise a suitable set of spatial analysis operations for deriving the best location
of hiking trails in a national park, taking into account the wish to have good views
but to avoid difficult or dangerous terrain.
5. Explain how you would use the tools presented in this chapter to investigate how
the following organisms or societies might exploit a given part of the earth’s surface:
(a) Asterix’s village in Brittany.
(b) Black mould growing on the bathroom wall.
(c) Owners of fast food outlets in a city.
(d) Polynesian seafarers in the Pacific Ocean.
(e) The dispersion of plants and animals along hedgerows.
(f) Manufacturers of solar heating systems.
6. Discuss the physical factors affecting the transport capacity TC for the transport
of sediment over a landscape and explain how they affect the flow or storage of
sediment.
NINE
Errors and Quality Control
ous fields (e.g. Isaaks and Srivastava 1989, Journel 1996, Deutsch and Journel 1992, Cressie 1991) and it is time to link the ideas developed in these areas to provide a sound basis for understanding the role of uncertainty in spatial data and spatial data analysis.

Data accuracy is often grouped according to thematic accuracy, positional accuracy, and temporal accuracy (Aalders 1996), but errors in spatial data can occur at various stages in the process from observation to presentation. Errors in perception (improper identification) can occur at the conceptual stage. Errors and approximations in determining the geographical location depend on surveying skills, the provision of hardware (GPS satellites, laser theodolites, etc.), and the choice of map projections and spheroids. Errors in the measurement of attributes, due to variation in the phenomenon in question, the accuracy of the measurement device, or observer bias, can occur during the recording of the primary data. For phenomena treated as continuous fields, the density of samples, their support (the physical size of the sample), and the completeness of the sampling are all sources of uncertainty.

Errors can creep in when data are stored in the computer because too little computer space is allocated to store the high-precision numbers needed to record data to a given level of accuracy. Some data may be so expensive or difficult to collect that one must make do with a few samples and rely on inexact correlations with other, cheaper-to-measure attributes, and so inevitably uncertainties arise. The logical or mathematical rules ('models' or interpolations) used to derive new attributes from existing data may be flawed or may involve computational methods that lead to rounding errors. When data that have been measured on different entities, or sampled on different supports, are combined, the differences in spatial resolution may be so great that simple comparisons cannot be made. Finally, in the visual presentation of results, users can obtain erroneous impressions if the semiotic language is not clear, if colours and shading are inappropriate, if displays are too crowded, or if it is just too difficult to get a clear result.

The usual view of errors and uncertainties is that they are bad. This is not necessarily so, however, because it can be very useful to know how errors and uncertainties occur, how they can be managed and possibly reduced, and how knowledge of errors and error propagation can be used to improve our understanding of spatial patterns and processes. Linking a good understanding of spatial uncertainties to numerical methods of modelling and interpolation can provide useful tools for optimizing sampling (and thereby improving value for money) and for identifying the weak and strong parts of spatial analysis. A good understanding of errors and error propagation leads to active quality control.

Many GIS users conduct data analysis using the techniques presented in Chapters 7 and 8 under the implicit assumption that all data are totally error free. By 'error free' is meant not only the absence of factually wrong data caused by faulty survey or input, but also statistical error, meaning free from variation. In other words, the arithmetical operation of adding two maps together by means of a simple overlay implies that both source maps can be treated as perfect, completely deterministic documents with uniform levels of data quality over the whole study area. This view is imposed to a large extent by the absence of information about data quality, the exact concepts embodied in most databases and retrieval languages (though it can be otherwise), a lack of understanding of how errors are propagated, and the absence of GIS tools for error evaluation.

Many field scientists and geographers know from experience that carefully drawn boundaries and contour lines on maps are elegant misrepresentations of changes that are often gradual, vague, or fuzzy (Burrough and Frank 1996). People have been so conditioned to seeing the variation of the earth's surface portrayed either by the stepped functions of choropleth maps (sharp boundaries) or by smoothly varying mathematical surfaces (see Chapter 2) that they find it difficult to conceive that reality is otherwise. Besides the 'structure' that has been modelled by the boundaries or the isolines, there is very often a residual unmapped variation that occurs over distances smaller than those resolvable by the original survey. Moreover, the spatial variation of natural phenomena is not just a local noise function or inaccuracy that can be removed by collecting more data or by increasing the precision of measurement, but is often a fundamental aspect of nature that occurs at all scales, as the proponents of fractals have pointed out (see Mandelbrot 1982, Burrough 1983a,b, 1984, 1985, 1993a, Goodchild 1980, Lam and De Cola 1993).

It is very important to understand the nature of errors in spatial data and the effect they may have on the quality of the analyses made with GIS.
The first part of this chapter explores the sources of errors in spatial data, and the factors affecting their quality, both with respect to entity-based and continuous field-based models of spatial phenomena. The second examines the factors that affect the quality of spatial data, while the third covers the development and understanding of errors associated with transforming entity and field-based data from one representation to another (vector-raster), by line digitizing, and through polygon overlay. Chapter 10 presents a statistical approach to the understanding of error propagation in numerical modelling in the context of the kinds of spatial analysis presented in Chapters 6 and 7, and shows how a proper understanding of uncertainties can be used for optimizing sampling and spatial analysis.
Box 9.1 shows the main factors governing the errors that may be associated with geographic information processing. The word 'error' is used here in its widest sense, to include not only 'faults' but also the statistical concept of error, meaning 'variation'. The 'errors' include faults that are obvious and easy to check, but there are more subtle sources of error that can often only be detected while working intimately with the data. The most difficult sources of 'error' are those that can arise as a result of carrying out certain kinds of processing; their detection requires an intimate knowledge not only of the data, but also of the data models, the data structures, and the algorithms used. Consequently they are likely to evade most users. Many of these aspects of 'error', or more correctly 'data quality', are being addressed through international agreements (cf. Aalders 1996).

Accuracy of content
The accuracy of content is the problem of whether the attributes attached to the points, lines, and areas of the geographic database are correct or free from bias. We can distinguish between qualitative accuracy, which refers to whether nominal variables or labels are correct (for example, an area on a land use map might be wrongly coded as 'wheat' instead of 'potatoes'), and quantitative accuracy, which refers to the level of bias in estimating the values assigned (for example, a badly calibrated pH meter might consistently estimate all pH values 1 unit high). Ensuring accuracy is a matter of having reliable, documented input and transformation procedures.

Measurement errors
Poor data can result from unreliable, inaccurate, or biased observers or apparatus. The reader should clearly understand the distinction between accuracy and precision. Accuracy is the extent to which an estimated value approaches the true value and is usually estimated by the standard error. In statistical terminology, precision is a measure of the dispersion (usually measured in terms of the standard deviation) of observations about a mean. Precision also refers to the ability of a computer to represent numbers to a certain number of decimal digits.

Field data
The surveyor is a critical factor in the quality of data that are put into many geographical information systems. Well-designed data collection procedures and standards help reduce observer bias. The human factor is most important in data collection methods relying on intuition, such as in soil or geological survey, where an interpretation is made in the field, or from aerial photographs or seismographs, of the patterns of variation in the landscape or subsurface. The user should realize that some observers are inherently more perceptive or industrious than others: 'the quality of soil surveyors varies from the two minute job of an irresponsible aerial photo interpreter to that of the surveyor whose sampling plan suggests that he is planting onions' (Smyth, quoted in Burrough 1969). Very large differences in the appearance of a map can result from differences in surveyor or from mapping methods, as studies by Bie and Beckett (1973) and
Box 9.1. The main factors governing the errors that may be associated with geographic information processing

1. Currency
   Are data up to date?
   Time series
2. Completeness
   Areal coverage: is it partial or complete?
3. Consistency
   Map scale
   Standard descriptions
   Relevance
4. Accessibility
   Format
   Copyright
   Cost
5. Accuracy and precision
   Density of observations
   Positional accuracy
   Attribute accuracy: qualitative and quantitative
   Topological accuracy
   Lineage: when collected, by whom, how?
6. Sources of errors in data
   Data entry or output faults
   Choice of the original data model
   Natural variation and uncertainty in boundary location and topology
   Observer bias
   Processing
   Numerical errors in the computer
   Limitations of computer representations of numbers
7. Sources of errors in derived data and in the results of modelling and analysis
   Problems associated with map overlay
   Classification and generalization problems
   Choice of analysis model
   Misuse of logic
   Error propagation
   Method used for interpolation
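The distinction between accuracy and precision drawn in the Measurement errors section can be illustrated with simulated observations. The 'true' value and the two instruments below are entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
true_value = 7.0    # hypothetical true pH of a soil sample

# Instrument A: precise but inaccurate (reads 1 unit high, small spread).
a = true_value + 1.0 + rng.normal(0.0, 0.05, 1000)

# Instrument B: accurate but imprecise (no bias, large spread).
b = true_value + rng.normal(0.0, 0.5, 1000)

print("A: mean %.2f  sd %.2f" % (a.mean(), a.std()))   # biased mean, tight spread
print("B: mean %.2f  sd %.2f" % (b.mean(), b.std()))   # unbiased mean, wide spread
```

The bias of instrument A cannot be detected from its spread alone; only calibration against the true value reveals it.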
more recently Legros et al. (1996) for soil survey and Salome et al. (1982) for geomorphology have clearly demonstrated.

In large survey organizations it should be possible to determine and record the qualities of each surveyor, an extra attribute that could be stored with the data themselves. Such procedures might be resisted by the staff as a slur on professional expertise, but the best method for improving observer quality is to improve all aspects of the data-gathering process, such as standardizing observational techniques and data recording forms and by developing a joint commitment between survey management and staff to work to the highest possible standards.

The increasing use in many field sciences of automated sampling devices linked to electronic data loggers means that if all is operating properly then the accuracy and the precision of the data are good. Data from electronic sampling devices can be automatically screened for extreme values indicating
malfunctioning. New sampling devices in areas such as geoengineering and pollution science mean that observations can be made in situ of materials that otherwise must be analysed in the laboratory (Rengers 1994).

Laboratory errors
Intuitively, one expects the quality of laboratory determinations to exceed those made in the field. Although determinations carried out within a single laboratory using the same procedure may be reproducible, the same cannot be said of analyses performed in different laboratories. The results of a major worldwide laboratory exchange program carried out by the International Soil Reference and Information Centre in Wageningen (van Reeuwijk 1982, 1984) showed that variation in laboratory results for the same soil samples could easily exceed ±11 per cent for clay content, ±20 per cent for cation exchange capacity (±25 per cent for the clay fraction only), ±10 per cent for base saturation, and ±0.2 units for pH. The implications for the results of numerical modelling are enormous! Laboratory analyses should be improving in reproducibility thanks to the wider use of automated laboratory equipment, but no amount of laboratory technology will make up for poorly collected or poorly prepared samples.

Locational accuracy
The locational accuracy of geographic data depends largely on the type of data under consideration. Topographical data are usually surveyed to a very high degree of positional accuracy that is appropriate for the well-defined objects, such as roads, houses, land parcel boundaries, and other features, that they record. With modern techniques of electronic surveying and GPS the position of an object on the earth's surface can now be recorded to millimetre accuracy. In contrast, the position of soil or vegetation unit boundaries often reflects the judgement of the surveyor about where a dividing line, if any, should be placed. Very often, vegetation types grade into one another over a considerable distance as a result of transitions determined by microclimate, relief, soil, and water regimes. Changes in slope class or groundwater regime are also unlikely to occur always at sharply defined boundaries.

Positional errors can result from poor fieldwork, through distortion or shrinkage of the original paper base map, or through poor-quality vectorizing after raster scanning (Dunn et al. 1990, Bolstad et al. 1990). Local errors can often be corrected by interactive digitizing on a graphics workstation, while general positional errors can be corrected by various kinds of transformation, generally known as 'rubber-sheeting' techniques, that have been described in Chapter 4. The combination of modern hardware and software for error detection has greatly improved the quality of digitizing in recent years.

The success of rubber-sheeting methods for correcting geometrical distortion depends largely on the type of data being transformed, and the complexity of the transformations. Many methods work well for simple linear transformations but break down when complex shrinkages must be corrected. The methods do not necessarily work well when the original map consists largely of linked, straight-line segments. For example, some years ago, attempts were made at the Netherlands Soil Survey Institute to use rubber-sheeting methods to match a digitized version of an early nineteenth-century topographic map on to a modern 1 : 25 000 topographical sheet for the purpose of assessing changes in land use. The road pattern of the area in question was similar to that of a rigid girder structure. When submitted to the rubber-sheeting process, the road lines were not stretched but the structure crumpled at the road intersections, in much the same way that a bridge or crane made from Meccano might crumple at the joins!

Many thematic maps, particularly those of natural properties of the landscape such as soil or vegetation, do not take into account local sources of spatial variation or 'impurities' that result from short-range changes in the phenomena mapped. This problem has been the subject of much research, particularly in soil survey, soil physics, and groundwater studies (e.g. Beckett and Webster 1971, Bouma and Bell 1983, Nielsen and Bouma 1985, Burrough 1993b). The problems are as much associated with paradigms of soil classification and mapping that are spatially simplistic as with the natural variation of the soil, which was incompletely understood (Burrough et al. 1997). Cartographic conventions forced soil scientists to map soils as crisply delineated, homogeneous areas. Information about gradual change within boundaries, and boundaries of varying width, could not be represented on conventional chorochromatic maps. These maps have been diligently digitized and the digital soil polygon has been presented to GIS users as an entity
that is spatially as well-defined as a cadastral unit. Unfortunately, the truth is often otherwise: these crisp polygons are really crude, but convenient, approximations, and a major problem concerns the lack of information about the difference between these models and reality.

Initially, conventional soil series maps at scales of 1 : 25 000-1 : 50 000 were characterized in terms of the 'impurities' within the units delineated (Soil Survey Staff 1951), which were supposed to be no more than 15-25 per cent. Impurities were defined as observations that did not match the full requirements as specified in the map legend. Many studies (e.g. see Beckett and Webster 1971 or Burrough 1993b for a review) have shown that not only was the 15 per cent a wild guess, but that the concept of 'impurity' had little meaning. By varying the legend, the definition of just what was a matching observation, and so the purity, could be manipulated at will. Subsequent work has demonstrated the natural variation of soil and shown its importance for understanding pollution problems or for optimizing soil fertilization in precision agriculture (e.g. Burrough 1993b, Goode 1997). There is increasing information on the variability of soil, and other natural phenomena such as water quality or species composition, which as yet may only be available to specialists.

It is important to realize that the unseen spatial variation of phenomena like soil, lithology, or water quality can contribute greatly to the relative and absolute errors of the results produced by modelling and map overlay. More details about how to estimate how these errors propagate through numerical models are given below in this chapter.
recent years with the development of digital information systems has it been possible to have the original field observations available for further processing. Large-scale maps not only show more topological detail (spatial resolution) but usually have more detailed legends (e.g. a soil map of scale 1 : 25 000 and larger usually depicts soil series legend units, while a soil map of scale 1 : 250 000 will only display soil associations; see Vink 1963 for details). It is important that the scale of the source maps matches that required for the study: small-scale maps could have insufficient detail and large-scale maps may contain too much information that becomes a burden through the sheer volume of data. Many survey organizations provide their mapped information at a range of scales and the user should choose that which is most appropriate to the task in hand.

Density of observations
Much has been written about the density of observations needed to support a map or interpolation (e.g. Vink 1963, Burrough 1993b, Webster and Burgess 1984), yet there are still organizations that produce maps without giving any information whatsoever about the amount of ground truth upon which it is based. This attitude is changing: the Netherlands Winand Staring Institute provides its contract survey clients with maps showing the location and classification of all soil observations; the UK Land Resources Development Centre has published maps showing the density and location of sample points and transects in surveys (see for example the Reconnaissance Soil Survey of Sabah, Acres et al. 1976).

Although the actual density of observations may be a reasonable general guide to the degree of reliability of the data, it is not an absolute measure, as statistical studies of soil variation have shown. A rough guide to pattern is given by the 'sampling theorem' originating from electronic signal detection, which specifies that at least two observations per signal element need to be made in order to identify it uniquely. There has also been considerable work in photogrammetry to estimate the densities of observations that need to be made from aerial photographs on a stereoplotter in order to support reliable digital elevation models (Makarovic 1975, Ayeni 1982).

In short, sampling density is only a rough guide to data quality. It is also important to know whether the sampling has been at an optimum density to be able to resolve the spatial patterns of interest, and this subject is treated in Chapter 6 and in the next chapter.

Relevance
Not all data used in geographical information processing are directly relevant for the purpose for which they are used, but have been chosen as surrogates because the desired data do not exist or are too expensive to collect. Prime examples are the electronic signals from remote sensors that are used to estimate land use, biomass, or moisture, or observations of soil series based on soil morphology that are used to predict soil fertility, erosion susceptibility, or moisture supply. Provided that the links between the surrogates and the desired variables have been thoroughly established, then the surrogates can be a source of good information.

The calibration of surrogates is a major part of remote sensing technology. Briefly, a number of pixels on the image is selected for use as a 'training set'. The variation of reflectance of each frequency band recorded is displayed in the form of a histogram; the practice is to select a training set of pixels that return narrow, unimodal distributions. These training set pixels are calibrated by 'ground-truth' observations so that the set of pixels can be equated with a crop type, a soil unit, or any other definable phenomenon. The remaining pixels in the image are then assigned to the same set as the training set using allocation algorithms based on discriminant analysis (minimum distance in multivariate space of the original frequency bands), maximum likelihood, or parallelepiped classifiers (see for example Estes et al. 1983, Lillesand and Kiefer 1987).

Data format, data exchange, and data standards
There are three kinds of data format of importance. First there are the purely technical aspects of how data can be written on magnetic media for transfer from one computer system to another. This includes aspects such as the kind of medium (digital tape, floppy disk, compact disk), the density of the written information (tape block lengths, number of tracks and the density), the type of characters used (ASCII or binary), and the lengths of records. For data lines, it is essential that the speed of transmission of the two computers is matched, but most modems ensure that this is automatic.
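The minimum-distance allocation rule mentioned above for calibrated training sets can be sketched as follows. The two spectral bands, class means, and pixel values are invented for illustration:

```python
import numpy as np

# Invented mean reflectances of three calibrated training classes
# in two spectral bands.
class_means = np.array([[ 30.0,  80.0],    # water
                        [ 90.0, 120.0],    # crop
                        [160.0,  60.0]])   # bare soil
class_names = ["water", "crop", "bare soil"]

def classify(pixels):
    """Assign each pixel to the class whose mean is nearest in the
    multivariate space of the spectral bands (Euclidean distance)."""
    d = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    return d.argmin(axis=1)

pixels = np.array([[ 35.0,  75.0],
                   [100.0, 110.0],
                   [150.0,  70.0]])
labels = [class_names[i] for i in classify(pixels)]
print(labels)    # → ['water', 'crop', 'bare soil']
```

Maximum likelihood and parallelepiped classifiers differ only in the decision rule; the calibration step against ground truth is the same.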
tium (Schell 1995a). Note that interoperability issues are forcing people to think of the conceptual problems of exchanging data as the first step, rather than solely concentrating on technical arguments.

As well as the problems inherent in the data, indicated above, there are other sources of unseen error that can originate in the computer. The most easily forgotten, yet critical, aspect of computer processing is the abil-
There is a simple and very revealing test of calculation precision that demonstrates just how computer word length can affect the results of a calculation (Gruenberger 1984). The number 1.0000001 is squared 27 times (equivalent to raising 1.0000001 to the 134 217 728th power). The table shows the results of performing this calculation on a personal computer with an 80486 processor using Microsoft QuickBasic with 4-byte or 8-byte precision. After 27 squarings the single-precision result has acquired a cumulative error of more than 1300 per cent! Clearly, the programmer must avoid situations in which the results of a calculation depend on accuracies of representation that exceed the number of digits available for representing the numbers.
Squarings   4-byte result   8-byte result        4-byte as % of 8-byte
1           1               1.00000020000001     100.0000038418561
2           1               1.00000040000006     100.0000076837067
3           1.000001        1.00000080000028     100.0000153673913
4           1.000002        1.000001600001201    100.000030734694
5           1.000004        1.000003200004962    100.0000614690337
6           1.000008        1.000006400020163    100.00012293665
7           1.000015        1.000012800081287    100.0002458676304
8           1.000031        1.000025600326416    100.0004917125829
9           1.000061        1.000051201308209    100.0009833344561
10          1.000122        1.000102405237993    100.0019663060907
11          1.000244        1.000204820962818    100.003931161034
12          1.000488        1.000409683877262    100.0078565185848
13          1.000977        1.000819535595404    100.0157136544184
14          1.001955        1.0016397428294      100.0314297315305
15          1.003913        1.003282174415347    100.0628687841718
16          1.007841        1.006575121499587    100.125771907418
17          1.015744        1.013193475221909    100.2517048611989
18          1.031735        1.026561018232249    100.5040404509439
19          1.064478        1.053827524154032    101.0106167959662
20          1.133113        1.110552450664617    102.0314517808015
21          1.283945        1.233326745677186    104.1041728221037
22          1.648514        1.521094861602679    108.3767906630347
23          2.717598        2.313729577994073    117.4552872926174
24          7.385337        5.353344560084633    137.9574445444612
25          54.54321        28.65829797898773    190.3225694558652
26          2974.962        821.2980430524522    362.2268061013606
27          8850397         674530.4755217875    1312.082599848986
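The calculation in Box 9.2 is easy to reproduce; here numpy's float32 and float64 types stand in for the 4-byte and 8-byte variables of the original Basic program (the exact low-precision digits may differ slightly from the table, depending on how the original interpreter rounded intermediate results):

```python
import numpy as np

single = np.float32(1.0000001)
double = np.float64(1.0000001)
for _ in range(27):              # 27 squarings = the 134 217 728th power
    single = single * single
    double = double * double

print(double)                    # close to 674 530, as in the table
print(single)                    # millions: the 4-byte result has exploded
print(100.0 * float(single) / float(double))   # cumulative error in per cent
```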
ity of the computer to be able to store and process data at the required level of precision. The precision of the computer word for recording numbers has important consequences both for arithmetical operations and for data storage.

Many people do not appreciate that use of computer variables and arrays having insufficient precision can lead to serious errors in calculations, particularly when results are required that must be obtained by subtracting or multiplying two large numbers. For example, the 'shorthand' method of estimating the variance of a set of numbers involves adding all the numbers together, squaring the result, and dividing by the number of numbers. This 'constant' is then subtracted from the sum of the squares of all the numbers to obtain the sum of squared deviations. Box 9.2
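The failure of the 'shorthand' variance formula is easy to demonstrate. The three observations below are invented, with a large mean and a spread of only ±2, so that the final subtraction cancels almost all the digits a 4-byte variable can hold:

```python
import numpy as np

# Invented observations: large mean, tiny spread (true variance = 8/3).
x = np.array([10000000.0, 10000002.0, 10000004.0], dtype=np.float32)
n = np.float32(len(x))

# 'Shorthand' one-pass formula: (sum of squares - (sum)^2 / n) / n.
shorthand = (np.sum(x * x) - np.sum(x) ** 2 / n) / n

# Two-pass formula: mean squared deviation from the mean.
two_pass = np.mean((x - np.mean(x)) ** 2)

print(two_pass)     # close to 2.667, the correct population variance
print(shorthand)    # nonsense: the squares need about 15 digits, float32 has 7
```

The two-pass formula survives because the deviations from the mean are small numbers that the short word length can represent exactly.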
Table 9.1. The relation between computer word length and digital range and precision
mas of the kind shown in Figure 9.1. While it is unlikely that a user will require a precision better than 10 m for an area of 1000 x 1000 km, data from sensors like the French satellite SPOT with its 10 m resolution, which may be used to supply data for the resource inventory, mean that the 32-bit floating point representation in the GIS is stretched to the limit. Moreover, it may be necessary in the inventory to refer to ground control points that have been located with much greater precision.

Chrisman (1984b) and Tomlinson and Boyle (1981) have pointed out that locational precision is critical when the user wishes to interface different kinds of data sets that have been acquired at different scales and to different levels of precision. These problems are greater when working with established inventories that may have been geometrically processed using old 16-bit systems than when all data must be collected for specific projects, because often in the latter case the data are collected from scratch.

Through national and international agreements and improvements in hardware and software, information on the quality of digital data is becoming an important part of the data itself.

As already noted, most procedures commonly used in geographic information processing assume implicitly that (a) the source data are uniform, (b) digitizing procedures are infallible, (c) map overlay is merely a question of intersecting boundaries and reconnecting a line network, (d) boundaries can be sharply defined and drawn, (e) all algorithms can be assumed to operate in a fully deterministic way, and (f) class intervals defined for one or other 'natural' reason are necessarily the best for all mapped attributes. These ideas result from the traditional ways in which data were classified and mapped. They have presented large technical difficulties for the designers of geographical information systems, but rarely have these problems been looked at as a consequence of the way in which the various aspects
Figure 9.3. The mixed pixel problem occurs when grid cells are too large to resolve spatial details (legend classes: arable, grassland, woodland, orchard, park, urban)
pixel problem leads to the dilemma of whether to classify a cell according to the class covering the geometric reference point of the cell (the centre or the south-west corner) or according to the dominant class occurring in the cell. In remotely sensed and other scanned imagery, the problem is complicated because the cell value is a weighted average of the information reaching the sensor not only from the area covered by the pixel but also from nearby surrounding areas.

Vectors to fine rasters. Converting polygons from vector to raster representation when using grid cells smaller than the polygons (see Chapter 4) brings with it the problem of topological mismatch when the smooth polygon boundaries are approximated by grid cells. Although high-quality raster scanners and plotters have largely removed the problem of loss of information through rasterizing from the visualization area of GIS, there are still many instances where thematic data, originally in vector polygon form, need to be rasterized to match data on regular grids, such as those obtained by remote sensing, or for some of the analysis examples in Chapter 7. Therefore it is necessary to estimate the seriousness of the problems of mismatch caused. Piwowar et al. (1990) examined several algorithms for vector to raster conversion for the quality of the results, the accuracy, the lateral displacement of boundaries, and their speed of operation. They concluded that not all algorithms worked equally well; some are fast but cause distortion, while others take more time but produce better results.

Note that in the following discussion of vector to raster conversion, the polygons are regarded as exact entities with precisely located boundaries; the errors of conversion are therefore merely the result of representing a geographical area by one geometry or another. Errors of misidentification or of the inability to define exactly what the area comprises are not treated here.

Statistical approaches to estimating the errors of vector to raster conversion. Frolov and Maling (1969) considered the problem of error arising when a grid cell is bisected by a 'true' boundary line. They assumed that the boundary line could be regarded as a straight line drawn randomly across a cell. The mean square area of the cut-off portion of each bisected boundary cell i (the error variance) can be estimated by

V_i = aS^4    (9.1)

where V is the error variance, S is the linear dimension of the (square) cell, and a is a constant. Frolov and Maling calculated the value of a as 0.0452, but subsequent work reported by Goodchild (1980) suggests that a better value is a = 0.0619.

The error variance in an estimate of area for any given polygon is given by a summation of all the errors from all the bounding cells. If m cells are intersected by the boundary, the error variance will be

V = maS^4    (9.2)

with standard error

SE = (ma)^(1/2) S^2    (9.3)

assuming that the contributions of each cell are independent. Goodchild (1980) suggests that this assumption should not always be regarded as valid.

The number of boundary cells m can be estimated from the perimeter of the polygon. Frolov and Maling (1969) showed that m is proportional to N^(1/2), where N is the total number of cells in the polygon. The standard error of the area is then estimated by k N^(1/4) a^(1/2) S^2. Because the estimate of the polygon area is A = N S^2, the standard error as a percentage of the estimate is proportional to N^(-3/4) (Goodchild 1980), i.e.

SE = k a^(1/2) N^(-3/4)    (9.5)

If the variable is cell side S instead of cell number N, the percentage error depends on S^(3/2). Goodchild (1980) reports studies that have verified these relationships empirically.

The constant k depends on the polygon shape, long thin shapes having more boundary cells than a circular form of the same area. Frolov and Maling (1969) give values of k for various standard shapes, using the independent straight line hypothesis.

Switzer's method. Switzer (1975) presented a general solution to the problem of estimating the precision of a raster image that had been made from a vector polygon map. His analysis does not deal with observational or location errors, but assumes that error is solely a result of using a series of points located at the centres of grid cells to estimate an approximate grid version of the original map. Switzer's method deals essentially with ideal choropleth maps, i.e. thematic maps on which homogeneous map units are separated by infinitely thin, sharp boundaries. The method assumes that a 'true' map exists, against which an estimated map obtained by sampling can be compared. Realizing that the 'true' map is often unknown or unknowable, Switzer showed that by applying certain assumptions and by using certain summary
Errors and Quality Control
statistics, errors of mismatch could be estimated from the estimated or gridded map itself.

The analysis begins by assuming that a map M has been partitioned into k homogeneous map units, or colours. Each of the k map units may be represented on the map by one or more sub-areas. This 'true' map is estimated by laying an array of n basic sampling cells over the map. Here we shall only consider the situation where the array of sampling cells is regular and congruent, and each cell is defined by a single sampling point at the cell midpoint. The map units on the 'true' map are denoted M1, M2, ... Mk, and on the estimated map by M̂1, M̂2, ... M̂k. Each cell on the estimated map is allocated to a map unit M̂i if the sampling point in the cell falls within map unit Mi on the 'true' map. This is the procedure commonly used when converting a vector polygon network to raster format. For the purposes of this analysis we shall, like Switzer, assume the total area of the map is scaled to unity, i.e. A(M) = 1.

The degree of mismatch of the estimated map is a function of two independent factors: (a) the complexity of the true map, and (b) the geometrical properties of the sampling net. Considering first the complexity of the map, we can define a quantity P_ij(d) as the probability that a random point is in true map unit i and that the cell centre point is in true map unit j when the points are separated by distance d. Switzer derived the following expression for the percentage overlap O_ij for each pair of rasterized map units i, j using square grid cells,

O_ij = 0.60 P_ij(n^{-1/2}) - 0.11 P_ij(2n^{-1/2})    9.5

(Note that the values of the coefficients differ from Switzer's published formula; the corrections are given by Goodchild (1980).) The total error for all k map units is given by

O = Σ_{i=1}^{k} O_ij    9.6

Box 9.3 shows how O can be estimated in practice.

Box 9.3. The P_ij(n^{-1/2}) and P_ij(2n^{-1/2}) probabilities are estimated from a frequency count as follows:
1. Estimate the total number of cell pairs at distance d = 1 cell width. The total number of ordered pairs at a given distance d (in cell widths) on a √n x √n grid is 4√n(√n - d).
2. For each pair of mapping units i and j, count the number of cell pairs along and up and down the grid that lie in different mapping units (TALLY_ij).

An example  Figure 9.4a shows the boundaries of a simple thematic map depicting soil or geological units. Assuming that they form a true map, what will be the relative mismatch errors arising from digitizing it using grid rasters of different sizes, as shown in Figures 9.4a,b for 16 x 16 and 32 x 32 grids, respectively? Table 9.2 gives the results. For a grid measuring 16 x 16 cells, there are 960 cell pairs at distance d = 1. The total number of cell pairs straddling a boundary lead to frequency estimates for each mapping unit. For a distance d = 2, the number of pairs is 896. Entering the frequency estimates into equation (9.6) leads to an estimated mismatch of 9.5 per cent. Using the 32 x 32 cell grid leads to an estimate of 4.1 per cent.
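The counting scheme of Box 9.3 can be sketched in code. The following is a minimal illustration, not the book's software: the three-unit 'true' map defined by classify() is a hypothetical stand-in for the map of Figure 9.4a, and the coefficients are those of equation (9.5) as corrected by Goodchild (1980).

```python
import numpy as np

def classify(x, y):
    """A hypothetical three-unit choropleth map on the unit square,
    standing in for the 'true' map of Figure 9.4a."""
    if y > 0.6 + 0.2 * np.sin(3 * x):
        return 0
    return 1 if x < 0.5 else 2

def switzer_mismatch(g):
    """Estimate the percentage mismatch of a g x g cell-centre raster
    using O = 0.60*P(n^-1/2) - 0.11*P(2n^-1/2), summed over unit pairs."""
    grid = np.array([[classify((i + 0.5) / g, (j + 0.5) / g)
                      for i in range(g)] for j in range(g)])

    def p_mismatch(d):
        # ordered cell pairs at distance d cell widths, along rows and columns
        total = 4 * g * (g - d)
        tally = 2 * ((grid[:, d:] != grid[:, :-d]).sum()
                     + (grid[d:, :] != grid[:-d, :]).sum())
        return tally / total

    return 100 * (0.60 * p_mismatch(1) - 0.11 * p_mismatch(2))

for g in (16, 32):
    print(f"{g} x {g} grid: estimated mismatch = {switzer_mismatch(g):.1f} per cent")
```

Doubling the grid from 16 x 16 to 32 x 32 should roughly halve the estimate, in line with the behaviour reported for Table 9.2.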
Bregt et al.'s method  Bregt et al. (1991) developed an elegant method for estimating the error associated with vector-raster conversion called the double-conversion method, because it involves rasterizing the map twice. First the vector to raster conversion is carried out using the desired target raster size; this produces what they call the base raster. The map is then rasterized to a very much smaller grid and the two are compared. Those cells in the fine raster differing from those on the base raster provide an estimate of the error in the base raster.

As already noted, the methods of Switzer, Goodchild, and Bregt et al. to estimate mismatch assume implicitly that a 'true' map exists that has homogeneous (uniform) mapping units, and infinitely sharp boundaries. In practice, however, even the best-drawn maps are not perfect, and extra errors are introduced by the digitizing process, as authors such as Blakemore (1984), Bolstad et al. (1990), Dunn et al. (1990), and Poiker (1982) have pointed out. Consider the problem of boundary width and location (the problem of within-map unit homogeneity will be dealt with later) on a
Table 9.2. The results of using Switzer's method on the map in Figure 9.3

Polygon pair:               1,2    3,1    3,2
Mismatch (16 x 16 grid):  0.020  0.020  0.008
Mismatch (32 x 32 grid):  0.010  0.010  0.004
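The N^{-3/4} law derived above can also be illustrated numerically. The sketch below is illustrative only (a disc of radius 10 units, cell-centre rasterization, with the error averaged over random grid offsets); the shape, sizes, and trial counts are assumptions, not taken from the text.

```python
import math
import random

def mean_area_error(radius, S, trials=200):
    """Mean absolute percentage error of the rasterized area of a circle
    of the given radius on a grid of cell side S (cell-centre rule),
    averaged over random grid offsets."""
    true_area = math.pi * radius ** 2
    errors = []
    span = int(radius / S) + 2
    for _ in range(trials):
        ox, oy = random.random() * S, random.random() * S
        inside = sum(1 for i in range(-span, span + 1)
                       for j in range(-span, span + 1)
                       if (ox + i * S) ** 2 + (oy + j * S) ** 2 <= radius ** 2)
        errors.append(abs(inside * S * S - true_area) / true_area * 100)
    return sum(errors) / len(errors)

random.seed(42)
# Halving S quadruples N, so the percentage error should fall by roughly
# 4**(3/4), about 2.8, if SE is proportional to N**(-3/4).
for S in (1.0, 0.5, 0.25):
    print(f"cell side {S:4.2f}: mean area error = {mean_area_error(10.0, S):.3f} per cent")
```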
Table 9.3. Relations between rasterizing method, cell size, and rasterizing error
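The relation between cell size and rasterizing error summarized in Table 9.3 can be explored with a toy version of Bregt et al.'s double-conversion method. Everything here is an illustrative assumption (a disc standing in for a vector polygon, a factor-8 fine grid); it is a sketch of the idea, not of Bregt et al.'s implementation.

```python
import numpy as np

def inside(x, y):
    """A hypothetical 'vector' polygon: a disc of radius 0.3 at (0.5, 0.5)."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.3 ** 2

def rasterize(g):
    """Cell-centre rasterization of the polygon on a g x g grid."""
    c = (np.arange(g) + 0.5) / g
    X, Y = np.meshgrid(c, c)
    return inside(X, Y)

def double_conversion_error(g_base, factor=8):
    """Rasterize at the base size and again on a much finer grid, and
    report the per cent of fine cells that disagree with the base raster."""
    base = rasterize(g_base)
    fine = rasterize(g_base * factor)
    # blow the base raster up to the fine resolution for comparison
    base_up = np.repeat(np.repeat(base, factor, axis=0), factor, axis=1)
    return (base_up != fine).mean() * 100

for g in (16, 32, 64):
    print(f"base raster {g} x {g}: estimated error = {double_conversion_error(g):.2f} per cent")
```

As expected, the estimated error shrinks as the base cell size decreases.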
L = (ΣX)/0.6366    9.9
associated with an employment office area. The mismatch errors and ambiguities were relatively larger for long, thin polygons and for employment areas having narrow protuberances or insets than for large, broadly circular areas. The study resulted in a considerable amount of validation and checking of the databases to ensure that the errors brought about by the grid-cell point geocoding were removed. Perkal's epsilon assumes the boundary is real, the problem merely being one of knowing its location. Sometimes it is not the location but the existence of the boundary that is in doubt, and then other methods must be used—see Chapter 11.

McAlpine and Cook (1971) were among the first to investigate the problem when working with land resources data, and their method still holds. They considered two maps of the same locality containing respectively m1 and m2 (m1 > m2) initial map segments (polygons) that are overlaid to give a derived map having n segments. To simplify the problem, they experimented by throwing a single hexagon with random orientation and displacement over a mosaic of hexagons. The trials were done using a hexagon of side 0.5, 1, 2, and 3 times the sides of the mosaic hexagons. They found that the number of derived polygons n on the derived map could be estimated by
(Figure: an original line together with a first and a second digitized version of it.)
Spurious polygons are in fact equivalent to the mismatch areas resulting from rasterizing a polygon. Their total area should decrease as digitizing accuracy increases, but the greater problem is their removal to avoid nonsense on the final map. They can be removed by erasing one side on a random basis, after screening the polygon for minimum area, or the two end-points can be connected by a straight line and both sides dissolved. A more sophisticated approach is to consider all points within a given distance from the complex arc as estimates of the location of a new line, and then fit a new line by least squares or maximum likelihood methods. Unless one version of the digitized boundary can be taken to be definitive, it is highly likely that the complex line will be moved from its topographically 'true' position. The net result of overlaying a soil map (having not very exact boundary locations) with a county boundary map (topographically exact boundaries) may be that the topographic boundaries become distorted unless the user specifies that they should remain constant.

Adopting exact paradigms of exact boundaries or smooth contour lines for spatial entities presents problems for converting from one representation to another and for entity overlay and intersection, but methods exist to estimate the errors that are involved in these actions. The degree of error caused by forcing spatial phenomena into possibly inappropriate, exact, crisply defined entities has received less attention but may be a major source of unseen errors and information loss. Geographical phenomena with uncertain boundaries are covered in Chapter 11. (See also Burrough and Frank 1996.)
Questions
1. Review the different methods that can be used to determine errors in spatial data.
Consider a range of different GIS applications and assign appropriate error analysis
techniques to each application.
2. Design a metadata system for recording the results of data quality and error propagation as active aspects of a spatial data set.
3. Review four practical situations where lack of information about errors could
be critical for the acceptance of the results of GIS analyses.
4. Compile lists of the sources of errors for each of the examples of GIS analysis given
in Chapters 6 and 7 and classify these errors using the terms given in this chapter. For
each example decide which source of error is most likely to be critical for successful
analysis.
TEN
Error Propagation in
Numerical Modelling
This chapter reviews the sources of statistical uncertainty in spatial data, particularly with reference to the creation and use of continuous fields and the propagation of errors through numerical models. Following an exposition of how errors can accrue from spatial mismatch between support size, sampling interval, and spatial correlation structures, the chapter presents both theory and tools for computing error propagation. Monte Carlo methods for error analysis are followed by the statistical theory of error propagation. Comparative studies of interpolation errors and errors propagated through regression models linked to simple cost-benefit analysis show users how to choose the best techniques for predicting heavy metal pollution in different circumstances of data abundance. Given knowledge on errors and error propagation, an intelligent GIS would be able to check the intrinsic quality of the results of GIS modelling and advise users how best to achieve the precision of the result they require.
the surface temperature variations were causally linked to crop response. The second aim was to relate the airborne measurements of reflected radiation to ground-based measurements, which are not only unaffected by atmospheric absorption but are cheaper and easier to carry out at any time.

The experiment was carried out under clear weather conditions at midday on 26 March 1986. Thermal infra-red imagery was obtained by flying a Daedalus DS 1260 multispectral scanner using the 8-14 μm spectral window (Daedalus Enterprises Inc., PO Box 1869, Ann Arbor, Michigan 48106, USA) in a light aircraft (Plate 4.2). The altitude of the aircraft carrying the scanner yielded a pixel size of 1.5 x 1.5 m, which means that all variations within a block of this size were averaged out to yield a single data value (a weighted average) for this support.

Ground data on reflected radiation and soil properties were collected at the same time as the aerial survey. For this comparison we use only the thermal infrared emission of the soil surface and information about the texture of the 0-5 mm surface layer of soil. The field data were collected at points spaced 4 m apart along two 200 m long transects (50 observation points) located perpendicular to the expected main temperature gradients. The transects were clearly marked so that they could easily be located on the airborne thermal imagery. The thermal infrared emission data for the soil surface were collected by a portable Heimann KT-15 radiometer mounted on a tripod with a spatial resolution of 0.03 m^2 (c. 0.17 x 0.17 m) for the same 8-14 μm spectral window. Soil texture samples were obtained by bulking 9 subsamples taken within an iron grid measuring 1.00 x 1.00 m laid over the soil at each sample site, thereby effectively creating a support for the texture data of 1 square metre. In addition, surface soil temperatures for the 0-5 cm layer were measured with a needle thermocouple with digital readout.

Recapitulating, the survey provided data for two 50-point transects but the data had been collected using different support sizes. The airborne infrared scanner data had a resolution of 1.5 x 1.5 m, the Heimann infrared data had a resolution of 0.17 x 0.17 m, and the soil texture data had an effective support size of 1.00 x 1.00 m. All data were first examined for normality, and none were found to be seriously non-normally distributed.

The effects of the different support sizes became evident when the different data were plotted to examine the spatial variation. Figure 10.1 shows the plots of the original data and their variograms, which have been normalized to make comparison easier. Clearly the two infrared measurements show little relation to each other. The airborne data reflect the coherent spatial variation over the field but the Heimann data show only short-range variation or noise; a variogram model cannot be fitted to these data. Observations with the thermocouple demonstrated why this is so. The experimental field had been ploughed and had plough ridges some 15-20 cm high, some 25-30 cm apart. One side of the ridges was warmed by the low March sun to yield surface soil temperatures of 20-21 degrees Celsius; the other side of the plough ridges was in shadow and temperatures were only 14-15 degrees Celsius. Consequently the short-range differences in temperature were causing the large fluctuations in the Heimann readings and swamped any weak variations across the field. Because the Daedalus scanner pixels were large enough to average out these short-range variations they returned information about the longer-range variations over the field. The soil texture data, having been bulked to a support similar in size to the airborne data, show a similar pattern and variogram to the Daedalus scanner data and suggest that the gradual variations in surface temperature across the field are linked to spatial variations in surface texture of the soil.

Lessons learned  The lesson provided by this example is that it is essential to know the size of the support that has been used to collect 'point' data or bulked samples. This information should be supplied as a matter of course in the metadata files describing the provenance and quality of publicly and commercially available databases (see Chapter 12), but most often this information is not easy to come by.

Error propagation theory (1): Monte Carlo simulation

Given that we are (a) aware of the possibility of statistical errors in the data, and (b) we have the means to quantify these errors using ordinary statistics or geostatistics, stochastic simulation, or retrospective validation with independent data, the question arises as to how we can use this information to quantify and then reduce the inaccuracies that may accrue in the results of numerical modelling. Put simply, if a new attribute U is defined as a function of inputs A1, A2, ... An, we want to know what is the error associated with U, and what are the contributions from each A_i to that error? If we can solve this problem we can
Figure 10.1. Top left: plots of effective soil surface temperature against location, as measured by the Daedalus airborne scanner and the Heimann ground instrument. Top right: plots of Daedalus data and per cent clay for bulked samples from the soil surface. Bottom left: experimental variogram of Heimann data. Bottom right: variograms of Daedalus data and per cent clay
attach a pedigree to the results of modelling and can compare the results of scenarios with confidence.

The simplest, but most computer-intensive, approach to error propagation is to treat each attribute as having a Gaussian (normal) probability distribution function (PDF) with known mean μ and variance σ^2 for each entity or cell. In the simplest case we would use a single PDF for all cells, and we assume stationarity. If more information about spatial contiguity is available we can use conditional simulation to estimate cell-specific PDFs that reflect the location of known data points and the spatial correlation structure of the attributes (Chapter 6).

The arithmetical operation to derive new data is then carried out not with the single mean values but using a value that is drawn from the PDFs for each cell. To take care of the variation within the PDFs the calculations are repeated many times (at least 100 times) in order to compute the mean result per entity or pixel and its standard deviation. The technique is popularly known as the Monte Carlo method, because of its reliance on chance.

Although the Monte Carlo method is computer-intensive (known as a 'brute force' technique) it provides interesting information about how possible errors in the data can affect the results of numerical operations in different parts of a geographic area. Consider the operations described in Chapter 8 for computing the derivatives of a raster DEM, such as slope. Elevation data are frequently accompanied by specifications of the RMS (root mean squared) error, which is assumed to apply uniformly over the whole domain. The effect of different levels of RMS error in the DEM on calculations of the slope and the topological drainage net can be investigated as follows. An error field with mean = 0 and variance σ^2 is generated for a given stand-
Figure 10.2. The effect of adding an RMS error to the DEM on the estimates of slope (degrees):
(a) ± 5 cm, (b) ± 10 cm, (c) ± 20 cm, (d) ± 40 cm. Grey scale shows relative errors
ard deviation, say ± 1.0 m. This is added to the original DEM and the slope operation is carried out using the algorithm of choice. The result is stored and the calculation repeated 100 times, each time yielding a different realization of the derived slopes. Averaging the 100 results yields a mean slope map and a map of the standard deviation of the slope for each cell. Dividing the standard deviation map by the mean slope map yields a map of relative error. The procedure can be repeated for other values of σ^2.

Figure 10.2 presents results for the Catsop catchment (see Appendix 3) for RMS errors of 5, 10, 20, and 40 cm respectively. It clearly shows the strong effect of the relative errors in areas of low relief, implying that there it will be more difficult to obtain clear-cut results.

Not only slope, but other derivatives such as local drainage networks and upstream contributing catchments are also very sensitive to errors in the elevation data. Figure 10.3 presents the results of an analysis to examine how RMS errors in the DEM affect the size of the upstream contributing area. The procedure is simply to compute an error surface with a given RMS error and add this to the DEM, and then to compute the local drainage network and the upstream contributing area by the usual procedures such as the D8 algorithm (Chapter 8). Repeating the simulation 100 times and averaging for different levels of RMS error clearly shows how sensitive derived attributes such as contributing upstream area are to errors in elevation. In some parts of the area the drainage lines are strongly concentrated while in other areas the average cumulative upstream area suggests drainage is diffuse. Note that in this case even a small RMS error of ±5 or 10 cm (which is negligible in comparison to the surface roughness created by ploughing) can strongly affect
Figure 10.3. The variation in the cumulative upstream elements as a result of different levels of RMS error on the DEM: (a) deterministic solution; (b) with ± 5 cm error; (c) with ± 10 cm error; (d) with ± 20 cm error
the direction of the most probable drainage in areas of low relief.

Such information has major implications for the calibration and validation of numerical surface flow models (Desmet 1997). If small relative errors strongly influence the location of computed stream channels then calibrating a model for these channels alone may optimize it for only one out of a large range of possible alternatives. For validating a model, the same holds. If validation measurements are made in locations where the probability is large that a stream flow of magnitude F corresponds to a given upstream contributing area, then the validation will be robust. If validation measurements are made in areas where the probabilities are lower, then the chance that the predictions and the validation measurements will converge is naturally smaller. This explains why in hydrological modelling a simple validation based on stream flow at the outlet of the catchment will be robust, while validation measurements made at randomly chosen locations within the catchment may be subject to considerable error.

Clearly modelling using single numbers instead of probability distributions can produce errors of unknown magnitude, particularly if the relative errors are large and it is not known if the single numbers in the database represent modal or average conditions or
some large or small deviation from the mean. This is why the latest trends in stochastic modelling in hydrology and other areas of environmental science (De Roo et al. 1992; Fisher 1991; Journel 1996) are towards the use of Monte Carlo methods for exploring the complete range of outputs that a model can generate as a result of the probability distributions of both the input data and the model parameters (those numbers in a model used to configure it for application in a given area: the coefficients in a regression model are simple examples). Experience is demonstrating that the methods of conditional simulation aided by 'soft' information about geology, soil, land use, or other controlling variables provide the best inputs for Monte Carlo simulation (Bierkens 1994; Bierkens and Burrough 1993a, 1993b; Gomez-Hernandez and Srivastava 1990; Journel 1996).

be better known than A_i ± δa_i. A_i could be the value of readily available water in a soil mapping unit that is assumed to be statistically homogeneous. We wish to combine the readily available soil water with an estimate of irrigation effectiveness A_j, with an error A_j ± δa_j. If the attributes A_i and A_j are statistically independent, and if δa_i and δa_j are each of the order of 20 per cent, it can be shown that the error of total available water δu in the computation of U = (A_i + A_j) is of the order of 28 per cent. For cartographic overlay operations involving more than two steps, the increase in error can be explosive.

Box 10.1 presents the partial differential equations for the simple theory of error propagation. Using these equations we can examine how errors propagate through simple bivariate models with a1 = 10 ± 1 and a2 = 8 ± 1.
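Before turning to the analytical theory, the Monte Carlo procedure described above (add a random error field to the DEM, recompute slope, average the realizations) can be sketched as follows. The 50 x 50 sinusoidal surface, the 10 m cell size, and the wrap-around finite differences are all illustrative assumptions, not the Catsop data or the book's algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_deg(dem, cell=10.0):
    """Slope in degrees from central differences (edges wrap for brevity)."""
    dzdx = (np.roll(dem, -1, axis=1) - np.roll(dem, 1, axis=1)) / (2 * cell)
    dzdy = (np.roll(dem, -1, axis=0) - np.roll(dem, 1, axis=0)) / (2 * cell)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

def mc_slope_error(dem, rms, n=100):
    """Add a Gaussian error field (mean 0, sd = rms) n times and return
    the mean and standard deviation maps of the derived slope."""
    sims = np.stack([slope_deg(dem + rng.normal(0.0, rms, dem.shape))
                     for _ in range(n)])
    return sims.mean(axis=0), sims.std(axis=0)

# a smooth hypothetical DEM, heights in metres
t = np.linspace(0.0, 1.0, 50)
dem = 100.0 + 40.0 * np.outer(np.sin(np.pi * t), np.cos(np.pi * t))

for rms in (0.05, 0.10, 0.20, 0.40):            # RMS errors in metres
    mean_s, sd_s = mc_slope_error(dem, rms)
    rel = sd_s / np.maximum(mean_s, 1e-9)        # relative error per cell
    print(f"RMS +/-{rms:4.2f} m: median relative slope error = {np.median(rel):.3f}")
```

As in Figure 10.2, the relative error grows with the RMS error and is largest where the mean slope is small.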
Error propagation theory (2): Analytical approaches to error propagation

Though Monte Carlo methods of error analysis are straightforward and can be adapted to many kinds of

Sum or difference—no correlation. Let u = a1 ± a2 ± ..., then ∂u/∂a1 = 1, ∂u/∂a2 = ± 1. By equation (10.4.2),

S_u = √(S_a1^2 + S_a2^2)    10.2

so S_u = √(1 + 1) = 1.41.
If a_i and a_j are 100 per cent positively correlated, the error in u can be as much as, but not more than, the sum of the errors of a_i and a_j. If a_i and a_j are negatively correlated, the error in u, S_u, could be less than if they were uncorrelated.

Product—no correlation. Let u = a1 · a2, then

S_u = √(a2^2 · S_a1^2 + a1^2 · S_a2^2) = √(64·1 + 100·1) = √164 = 12.8

and u = 8 x 10 = 80.

Powers. Let u = a1^2, then

S_u = √{(2a1)^2 · S_a1^2} = √(20^2 · 1) = √400 = 20

and u = 10^2 = 100.
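These hand calculations are easy to verify numerically; the short sketch below simply re-evaluates the three rules for a1 = 10 ± 1 and a2 = 8 ± 1.

```python
import math

a1, s1 = 10.0, 1.0   # a1 = 10 +/- 1
a2, s2 = 8.0, 1.0    # a2 = 8 +/- 1

# sum or difference, no correlation: u = a1 +/- a2
su_sum = math.sqrt(s1 ** 2 + s2 ** 2)

# product, no correlation: u = a1 * a2
u_prod = a1 * a2
su_prod = math.sqrt(a2 ** 2 * s1 ** 2 + a1 ** 2 * s2 ** 2)

# power: u = a1 ** 2
u_pow = a1 ** 2
su_pow = math.sqrt((2 * a1) ** 2 * s1 ** 2)

print(f"sum:     su = {su_sum:.2f}")                      # sqrt(2)   = 1.41
print(f"product: u = {u_prod:.0f}, su = {su_prod:.1f}")   # sqrt(164) = 12.8
print(f"power:   u = {u_pow:.0f}, su = {su_pow:.0f}")     # 20, i.e. 20 per cent
```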
Not only has the absolute error increased, but the relative error (= 20/100 = 20%) has also doubled.

Logarithmic and other relations. Let

u = C ln a1    10.9

then ∂u/∂a1 = C/a1, so

S_u = √{(C^2/a1^2) · S_a1^2} = C · S_a1/a1    10.10

Equation (10.10) shows that increase or decrease in error depends solely on the ratio of C : a1.

If

u = C sin a1

then

and the uncertainty is

S_u = √(13600 + 400) = 118.32 currency units    10.16

The error in the Universal Soil Loss Equation

Consider the effects of error propagation in one of the modelling exercises discussed in Chapter 7, namely the simulation of soil erosion in Kisii, Kenya. The aim of this study was to estimate the amount of erosion that might occur in a given area under a land use scenario of smallholder maize, grown over a period of forty years. The amount of erosion was estimated using the Universal Soil Loss Equation (USLE—Wischmeier and Smith 1978):
Consider a multiple regression model in which the a_i values are raster maps of input attributes and the b_i values are the coefficients:

u = b0 + b1·a1 + b2·a2 + ... + bn·an    10.21

Contributions to the error in output map u come from the errors associated with the model coefficients b_i and from the errors associated with the spatial variation and measurement errors (nugget) of the a_i input attributes.

Errors associated with the model coefficients  It is not always easy to determine the errors associated with model coefficients, but for multiple regression models the errors are given by the standard errors of the coefficients b_i and their correlations. All these terms should be provided by a good statistical package.

Errors accruing from spatial variation  These can be derived in several ways, including intuition, and sampling combined with conventional or spatial statistics. Intuitive estimates of the standard deviation of attribute values can provide error estimates if there are no hard data. Alternatively, sampling within spatial entities like soil or vegetation polygons can provide useful information on the estimated standard deviations of the attributes and their correlations. If there are sufficient data to interpolate to raster overlays using methods like kriging or conditional simulation then the standard deviation per cell is a possible source of the required uncertainties. Both co-kriging and co-conditional simulation (Deutsch and Journel 1992) can be used to generate correlation surfaces, though if the data are reasonably evenly distributed over the area, the simple correlation coefficient r_ij between the different attributes will suffice.

The balance of errors  The errors δu in the output maps of u accrue from all sources, be they coefficients or variables, and it is useful to know which of these is the major source of uncertainty. This knowledge can be used to improve either the model or the data collection if deemed necessary. Comparing error analyses for different models (e.g. a single term regression versus a multiple term regression) would allow users to decide whether collecting data on a second attribute was worth the time and money. Comparing error contributions between model and data provides a useful basis for optimizing survey effort between model calibration and mapping.

The theory behind ADAM  Equation (10.21) above can be generalized to

y = g(z1, z2, ..., zn)    10.22

where g is a continuously differentiable function from R^n into R. The arguments z_i of g may consist of both the input attributes A_i relating to an entity or grid cell and the model coefficients b_i. The problem is to determine the error in y caused by errors in the z_i. To do this we assume that y, z_i are realizations of random variables Y, Z_i. For notational convenience we define vectors μ = [μ1, μ2, ..., μn]^T and Z = [Z1, Z2, ..., Zn]^T.

The simplest way to compute the quantities of interest, namely the mean and variance of Y, is to use a Taylor series expansion of g around μ, neglecting higher-order terms (see Heuvelink et al. 1989 for details).

Note that when the correlation between the arguments Z_i is zero, the variance of Y is

σ_Y^2 = Σ_{i=1}^{n} σ_i^2 (∂g/∂z_i(μ))^2    10.23

This means that the variance of Y is the sum of parts, each attributable to an input Z_i. This partitioning property allows one to analyse the contribution of each input, be it regression coefficient or variable, to the final error.

ADAM uses this technique to apply the error propagation to each entity, which in the case of a raster map is the grid of pixels, but in other implementations could be polygons or other entities in vector mode (Wesseling and Heuvelink 1991). The regression model coefficients and their errors can be computed from sample data and the same point samples can be used to create raster input maps of cell values and their errors. ADAM computes the model output and the errors as maps, and also maps of the spatial variation of the error contribution from all coefficients and input variables (Figure 10.4).

An example of computing error propagation in regression modelling with ADAM

Chapters 5 and 6 presented a range of numerical methods for predicting the concentrations of heavy metals polluting the part of the flood plain of the River Maas in the south of the Netherlands. The options included regression modelling based on a multiple regression analysis of heavy metal concentration, dis-
Figure 10.4. Flow chart of operations when using the ADAM procedure to follow the accumulation of errors in regression models in GIS
tance from the river, and relative elevation and dir¬ and relative elevation [-0.624] and a positive cor¬
ect geostatistical interpolation by ordinary kriging or relation between ln(distance) and relative elevation
co-kriging. Though several authors (Lam 1983, Laslett [0.481]. The multiple regression
et al. 1987, Leenaers et al. 1990, Englund 1990) have
compared the performance of different interpolation In(zinc) = B0 + Bx- In(distance)
techniques this has rarely been done in a cost-benefit + B2 • elevation + £ 10.24
context. We use the ADAM error propagation tool and
independent validation samples to compare different was fitted with an R2 = 0.74, when the regression was
interpolation techniques in order better to understand based on 102 samples, so this appears to be a reason¬
the circumstances under which approach gives not able model of the dependence of zinc concentration
the best interpolation, but the best value for money. on the other two attributes.
In this example we use the same basic data set as in
Chapters 5 and 6 (but from a slightly larger area). This Procedure We compare the following options for
time the complete data set of 155 soil samples is split mapping the zinc content of the floodplain soils:
into a validation set of 53 samples drawn at random
and 102 samples which are used for interpolation. Sta¬ (a) Use choropleth maps of soil type or flood
tistical comparisons of the two sub-data sets suggested frequency and compute means and standard
that they were not significantly different samples of deviations per mapping unit, assuming within
the study area and that the validation data would pro¬ unit variation is given by the mean and vari¬
vide a good test of the predictions. ance of the included sample sites.
Preliminary analysis (Chapter 5) showed that the (b) Map the zinc content by applying the bivariate
zinc content was strongly correlated with the attrib¬ linear regression equation (10.24) to maps
utes ‘distance to the river’ and ‘elevation’. Because of distance to the river and relative elevation
both zinc content and ‘distance to river’ were strongly that have been obtained by ordinary kriging.
log-normally distributed, they were transformed to The kriging error maps for the distance and
natural logarithms and all further statistical analysis elevation provide the sources of the errors for
uses the log-normally transformed data. Correlation the attributes.
analysis showed strong negative correlations between (c) Interpolate the zinc content directly using
ln(zinc) and ln(distance) [-0.768], between ln(zinc) ordinary kriging
[Figure 10.5: flow chart — sample data undergo exploratory statistical analysis; heavy metal concentrations are then mapped either by direct interpolation (compute variogram, interpolate, direct kriging error) or by regression modelling (determine the regression model, interpolate the surrogate data, use the regression model to convert surrogate data to maps of heavy metal concentrations); error propagation using ADAM, validation of the results, and costs/number of samples complete the comparison.]
Figure 10.5. The procedures for mapping heavy metals in floodplain soils by direct interpolation or regression modelling
Figure 10.5 summarizes the procedures for options (b) and (c).

The costs of each method depend on the number of observations and types of samples used. Distance measurements are very cheap because they can be read off a map, or generated by a buffer command; elevation data can be read cheaply off a detailed map or can be recorded more accurately in the field, where the major cost is the cost of labour. Determining the levels of heavy metals in soil samples is expensive because samples must be located and collected by hand, and laboratory analyses are expensive. Therefore it is possible that under some circumstances better results might be obtained by using fewer direct measurements of the zinc content of the soil and more of the cheaply measurable site attributes.

To see how the quality of the results depends on the numbers of samples, the data set of 102 samples was used to create several smaller data sets, namely two sets of 51 samples, two sets of 27 samples, and two sets of 14 samples. The costs of collecting and processing the data for each data set were computed using commercial rates for the attributes in question.

Each method was applied to all data sets to predict values for 20 x 20 m grid cells; results for the data sets of the same size were pooled. Error propagation using ADAM, kriging standard errors, and independent validation by the 53 test samples were used as criteria of success.

For regression modelling it was assumed that the input maps were always those interpolated from the 102 sites, and the variograms of both attributes are given in Figure 10.6. All data were interpolated by block kriging to a 20 x 20 m grid (Figure 10.7). The coefficients and correlations of the regression model were determined for each sub-data set separately, using 14, 27, 51, or 102 zinc data respectively to provide a range of models with different goodness of fit.
Figure 10.6. Variograms of key attributes in the cost/benefit analysis of interpolation. Top left: floodplain elevation using 102 samples; top centre: ln(zinc) using 102 samples; top right: ln(zinc) using 27 samples; bottom left: ln(distance to river) using 102 samples; bottom centre: ln(zinc) using 51 samples; bottom right: ln(zinc) using 14 samples
ADAM used each model to compute a map of zinc corresponding to the number of zinc samples used; these maps were validated using the 53 test data. Box 10.2 shows the control file for the linear regression model with the model coefficients, their standard errors and correlations, and the model results and error surfaces. ADAM recommended that this model could be approximated by a second order Taylor series. Box 10.3 shows the results of automatically translating the control file into a set of commands in the PCRaster raster modelling system (Wesseling et al. 1996); this shows the difficulties of trying to write error propagation routines by hand. The results of computing the ln(zinc) surfaces by regression from example sets of 27 and 102 data points, respectively, are shown in Figures 10.8b and 10.8d.

The balance of errors  ADAM apportioned the error of the inputs according to contributions from the distance to the river, the elevation, and the model coefficients, and presented them as continuous maps (Figure 10.9). These maps show that at grid cells coinciding with sample locations, the model is the largest source of error, sometimes reaching 65 per cent. The contributions from 'distance to river' are greatest where this attribute has not been measured, such as at the eastern boundary of the area. The contributions from 'elevation' also peak at the sample points, possibly because this property has a low overall variation, and at the sample sites the local deviations are significant in terms of the small overall range of relief over the site and its large nugget compared to the error in the logarithm of distance from the river. The errors propagated by the regression models based on example sets of 27 and 102 data points are given in Figures 10.8f and 10.8h.

Direct interpolation  For mapping the ln(zinc) directly, variograms were computed for each subset of the data and used for interpolating from the same data set. No variograms could be modelled for the data sets with 14 sites; variograms based on 27 data were poorly defined (see Figure 10.6). Figures 10.8a and 10.8c, and Figures 10.8e and 10.8g, show the maps and error surfaces obtained by ordinary kriging for example data sets of 27 and 102 samples, respectively. Figure 10.8 shows that ordinary kriging using 102 samples gives the best combination
Figure 10.7. Interpolated surfaces (predictions and errors) used as inputs for the regression modelling of ln(zinc): (a) predictions of ln(distance to river); (b) prediction variance for ln(distance to river); (c) predictions of elevation (m); (d) prediction variance for elevation
of spatial resolution and low predicted errors, but kriging using only 27 sites is clearly unsatisfactory and worse than either regression model. The differences between the two regression model results are small, implying that increasing the number of zinc samples from 27 to 102 has not much improved the maps.

Similar studies were carried out using the maps of soil types and flood frequency. All maps were validated by computing the validation standard deviation VSD = √[Σ(z(x_i) − ẑ(x_i))²/53] using the 53 data points set aside for this purpose. For each map produced by interpolation or modelling, the mean prediction standard deviation (PSD) was calculated from the error surfaces shown in Figure 10.8e-h.

Costs and benefits  Figure 10.10 shows how the mean prediction standard deviation (PSD) and the validation error (VSD) vary with survey costs for the flood frequency map, the regression modelling, and the ordinary kriging of zinc. Figure 10.10 shows
/*********************************************/
/* predict ln zinc content                   */
/* of Maas soils (lnzkp)                     */
/* linear regression on lnDM & RA - 102 sites */
/*********************************************/
/* model parameters */
co1: random(normal, 9.973, 0.299);
co2: random(normal, -0.333, 0.033);
co3: random(normal, -0.291, 0.041);
ro(co1, co2): -0.014;
ro(co1, co3): -0.860;
ro(co2, co3): -0.487;
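The control file above declares the regression coefficients as correlated normal variables. What ADAM computes analytically can be approximated by brute-force Monte Carlo simulation; a sketch for a single grid cell, using the coefficient means, standard deviations, and correlations as read from the control file (the input values and their kriging errors are placeholders, not survey data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Coefficient statistics as given in the ADAM control file (Box 10.2);
# the standard deviations are as reconstructed from the OCR'd listing
means = np.array([9.973, -0.333, -0.291])
sds = np.array([0.299, 0.033, 0.041])
corr = np.array([
    [1.000, -0.014, -0.860],
    [-0.014, 1.000, -0.487],
    [-0.860, -0.487, 1.000],
])
cov = corr * np.outer(sds, sds)

# Placeholder inputs for one cell: kriged prediction and kriging error
ln_dist_est, ln_dist_sd = 5.0, 0.1
elev_est, elev_sd = 8.0, 0.2

n = 20000
b = rng.multivariate_normal(means, cov, size=n)   # correlated coefficients
x1 = rng.normal(ln_dist_est, ln_dist_sd, size=n)  # uncertain inputs
x2 = rng.normal(elev_est, elev_sd, size=n)
ln_zinc = b[:, 0] + b[:, 1] * x1 + b[:, 2] * x2

print(f"mean = {ln_zinc.mean():.3f}, sd = {ln_zinc.std():.3f}")
```

Repeating this for every cell reproduces, at much greater computational cost, the error surface that ADAM derives with its Taylor-series approximation.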
clearly that when only a few samples (<20) can be afforded the simple flood frequency maps score best (and the predicted standard errors are similar to the validation errors), but using more data to improve the precision of the means and standard deviations of the mapping units does not improve spatial prediction. Clearly, as shown in Chapter 6, there is much spatial variation within the flood frequency classes which is not detected by computing simple area-average statistics.

Once more than 20 samples are used the empirical regression model performs better than the flood frequency map (the validation results are much better than predicted!) but once the spatial trends described by the regression have been modelled correctly, little further improvement in either prediction or validation is obtained by using more than 35 sites to estimate the regression model.

With insufficient data (<50 samples in this case) useful variograms cannot be modelled; interpolation
/**************************************/
/*                                    */
/* This script is generated by ADAM,  */
/* an error propagation tool          */
/* developed at the                   */
/* University of Utrecht              */
/* by C. G. Wesseling &               */
/* G. B. M. Heuvelink                 */
/*                                    */
/**************************************/
tmp0.$$$ = -0.291*ra102.sd;
tmp1.$$$ = sqr(tmp0.$$$);
da102ra.sd = cont(sqrt(tmp1.$$$));
tmp2.$$$ = -0.333*lndm102.sd;
tmp3.$$$ = sqr(tmp2.$$$);
da102dm.sd = cont(sqrt(tmp3.$$$));
tmp4.$$$ = lndm102.est*0.033;
tmp5.$$$ = ra102.est*0.041;
tmp6.$$$ = (((0.299*tmp4.$$$)*-0.014 + ((0.299*tmp5.$$$)*-0.86)) +
((tmp4.$$$*tmp5.$$$)*-0.487));
tmp5.$$$ = ((sqr(0.299) + sqr(tmp4.$$$)) + sqr(tmp5.$$$)) + sqr(0.365);
zmod102.sd = cont(sqrt(((2.0*tmp6.$$$) + tmp5.$$$)));
lznm102.est = cont((9.973 + (-0.333*lndm102.est)) + (-0.291*ra102.est));
tmp6.$$$ = tmp6.$$$ + ((tmp0.$$$*tmp2.$$$)*0.487);
tmp0.$$$ = sqr(lndm102.sd)*sqr(0.033);
tmp2.$$$ = (-0.487*(0.033*0.041))*(0.487*(ra102.sd*lndm102.sd));
tmp4.$$$ = sqr(ra102.sd)*sqr(0.041);
lznm102.sd =
cont(sqrt((((2.0*tmp6.$$$) + (tmp1.$$$ + (tmp3.$$$ + tmp5.$$$))) + (0.25*(tmp4.$$$
+ (tmp2.$$$ + (tmp4.$$$ + (tmp2.$$$ + (tmp2.$$$ + (tmp0.$$$ + (tmp2.$$$ +
(tmp0.$$$ + (tmp4.$$$ + (tmp2.$$$ + ((tmp2.$$$ + (tmp2.$$$ + (tmp0.$$$ +
(tmp0.$$$ + tmp2.$$$)))) + tmp4.$$$)))))))))))))));
[Figure 10.8 scales: ln(zinc) (ln(ppm)) from 5.08 to 7.49; estimation standard deviation (ln(ppm)) from 0.19 to 0.90.]
Figure 10.8. The comparison of predictions of ln(zinc) in floodplain soils made by regression and by ordinary point kriging, using different numbers of samples: (a) kriging using 27 zinc samples for variogram and interpolation; (b) regression modelling using 27 zinc samples and ln(distance to river) and elevation based on 102 samples; (c) kriging using 102 samples for variogram and interpolation; (d) regression modelling using 102 zinc samples and ln(distance to river) and elevation based on 102 samples. Maps e-h show the respective error surfaces for maps a-d, expressed as standard deviations of ln(zinc)
[Figure 10.9 scale: 0.08 to 0.64.]
Figure 10.9. The contributions of inputs to the errors in the maps obtained by regression modelling of ln(zinc) from 102 samples: (a) contributions of the regression model; (b) contributions of ln(distance to river); (c) contributions of elevation
[Figure 10.10: PSD/VSD (0.4-0.9) plotted against survey cost for regression prediction, kriging prediction, kriging validation, flood map prediction, and flood map validation.]
Figure 10.10. Cost-quality comparisons for the different interpolation methods, expressed in terms of their own prediction standard deviations (PSD) and standard deviations obtained from 53 independent validation data (VSD)
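The two criteria compared in Figure 10.10 are straightforward to compute once a prediction map and its error surface exist; a minimal sketch with hypothetical data (the observed and predicted values and the error surface are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical validation data: observed ln(zinc) and predictions at 53 sites
z_obs = rng.normal(6.0, 0.8, size=53)
z_pred = z_obs + rng.normal(0.0, 0.4, size=53)

# Validation standard deviation: VSD = sqrt(sum((obs - pred)^2) / 53)
vsd = np.sqrt(np.mean((z_obs - z_pred) ** 2))

# Mean prediction standard deviation from a (hypothetical) kriging error map
error_surface = rng.uniform(0.2, 0.9, size=(50, 40))
psd = error_surface.mean()

print(f"VSD = {vsd:.3f}, PSD = {psd:.3f}")
```

Comparing PSD (the model's own error estimate) with VSD (the independent check) is what reveals, for example, that the flood-frequency predictions validate better than they predict.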
At the beginning of this chapter we demonstrated that sample networks and sample sizes (the support) perform best when they are tuned to the natural scales of variation. This is stated empirically by the sampling theorem (which was originally derived from experience in electronic signal processing): observations must be made at a frequency and resolution (i.e. support) that matches the spatial correlation structure of the phenomenon in question. If sampling frequency and size of sample are ill-chosen then the patterns of variation cannot be resolved and will largely appear as noise. As many natural phenomena vary at many different scales (i.e. have fractal-like behaviour: Burrough 1983a,b, 1993a, Mandelbrot 1982) most sampling networks will pick up some structured information, but they may not be providing the best quality data for the investments made. In Chapter 5 we pointed out the use of bulking in sampling to remove short-range spatial variation before interpolation, and in Chapter 6 we showed that with block kriging we can make predictions for a block of land larger than the support with a smaller kriging variance. We also showed in Chapter 6 (e.g. equation 6.17) that because the block kriging variance depends on (a) the variogram, (b) the block size, and (c) the configuration of the data points, it is possible, if the variogram is known, to derive quantitative relations between within-block variance and sample spacing on a regular grid. These relations can be used to choose the best sample spacing for a given PSD for a given budget. McBratney and Webster (1981) wrote a program called OSSFIM for this purpose.

The lessons to be learned

Though these results are not necessarily general for all rivers and for all kinds of pollution modelling, the study demonstrates the value of correctly identifying the spatial correlation structures (patterns) and their relation to physical processes. They demonstrate the advantages of tuning both the interpolation and the sampling density to the spatial correlation structure of the attributes being mapped, and show that costs can be saved by using appropriate techniques.
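The within-block relation mentioned above can be sketched numerically: for a given variogram model, the variance of point values inside a square block of side B is the mean semivariance between pairs of points in the block, which can be estimated by Monte Carlo integration. A sketch with a spherical model whose nugget, sill, and range are purely illustrative (not the Maas variograms):

```python
import numpy as np

rng = np.random.default_rng(3)

def spherical(h, nugget=0.1, sill=0.5, a=300.0):
    """Spherical variogram model (hypothetical parameters; h in metres)."""
    h = np.asarray(h, dtype=float)
    g = np.where(
        h < a,
        nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3),
        sill,
    )
    return np.where(h == 0.0, 0.0, g)

def within_block_variance(side, n=20000):
    """Mean semivariance between random point pairs in a side x side block."""
    p = rng.uniform(0.0, side, size=(n, 2))
    q = rng.uniform(0.0, side, size=(n, 2))
    h = np.hypot(*(p - q).T)
    return float(spherical(h).mean())

for side in (20.0, 60.0, 100.0):
    print(f"B = {side:5.0f} m: within-block variance = {within_block_variance(side):.3f}")
```

Larger blocks contain more of the spatial variation, so the within-block variance rises with B; programs such as OSSFIM turn this kind of relation (plus the data configuration) into curves of kriging variance against sample spacing and block size.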
Increasing block size, however, reduces prediction errors, but the consequence is that one must be prepared to deal with larger blocks of land. If interpolated data are to be used in numerical models then it is sensible to match the sizes of the cells in the interpolated surface to the sizes of cells used in the model, and this has direct consequences for error propagation. The relation between block size and error can also be important when decisions must be made about how land must be treated. For example, if the level of zinc pollution is deemed to be too high, then the soil may have to be removed or treated. Small units (blocks) are desirable because they can be cleaned up for less money than large blocks. If the errors associated with the means of small blocks are too large, however, there is a reasonable chance that blocks that really fall below the acceptable threshold levels must be treated and blocks that are really above that level are missed.

Using the variogram to optimize the zinc sampling net

The comparison of costs and benefits of different methods of interpolation for the zinc data was based on the original sampling pattern, which was the same in all cases (except for sub-sampling). If we had known the variograms of the input data (elevation, distance-to-river, zinc) before the field survey had been carried out, and if we had been prepared to accept interpolated mean values for a block of land of side B instead of for areas the size of the support, we could have computed the best possible sample spacing for mapping for any given level of sampling investment.

Figure 10.11 (top) shows the normalized variograms of ln(zinc) and elevation. Clearly, the spatial correlation structure for zinc is stronger than elevation, as shown by the larger nugget for elevation and the greater deviation of estimated semivariance from the model. Figure 10.11 (middle) shows the graphs of sampling spacing versus block size obtained using OSSFIM for both attributes: one sees that for a given number of samples, better results can be obtained from ln(zinc) than elevation, but the advantage of the spatial structure of zinc is offset by the cost of measuring it. So if we are going to use maps of elevation to help predict zinc it would be useful to compute the sample spacing on a square grid and the block size combinations that would bring about improvements.

Figure 10.11 (bottom) shows how the normalized kriging variance for these two attributes varies with sample spacing (and hence costs). It demonstrates that in this area, sampling elevation on a 50 m net and computing 100 m block averages yields a relative precision that is as good as sampling zinc on a 150 m net for the same block size. This means that when sufficient zinc samples can be afforded it will be better in this case to opt to interpolate zinc directly, and not to compute from a regression model involving elevation, even though the elevation is recorded on a net that is three times as dense as the sample net for zinc. Put another way, increasing sampling density on a regular square grid for zinc brings more benefit per sampling point than when measuring elevation.

Geostatistics in error propagation

Geostatistics have an important role to play in reducing errors in numerical modelling with GIS. Geostatistics can provide information on interpolation uncertainty and can be used to provide optimal estimates of attribute values for spatial blocks of given size and known variance. Examination of variograms can show whether different data are spatially compatible. Knowledge of the variogram can be used to optimize the geometry of regular sampling patterns used for interpolation mapping. Geostatistics linked to error propagation by Monte Carlo or analytical methods can be used to follow the propagation of errors through numerical modelling.

Applying this knowledge to the costs and sample numbers used in the comparative study allows us to compare the levels of PSD and VSD obtained for kriging from the original sampling net with the maximum expected kriging standard deviations for a range of block sizes on regular square grids having as many data points as were actually used (14, 27, 51, 102). These results are shown in Figure 10.12. They suggest that in this case optimizing the sampling net to a square grid for mapping would not bring great benefits, as compared with the validation of the actual point kriging interpolation. If the variogram was unknown, sampling on sparser grids would also hinder variogram estimation as there would be no close pairs of points to compute semivariance at short lags. Referring back to Chapter 6, Table 6.2, although we
[Figure 10.11 series: elevation at 20 m, 60 m, and 100 m block sizes; ln(Zn) at 20 m, 60 m, and 100 m block sizes.]
Figure 10.11. The relations between variogram (top), sample spacing and block size (middle), and the relative efficiency of sampling networks on a square grid (bottom) for mapping ln(zinc) and floodplain elevation
[Figure 10.12 series: kriging prediction; kriging validation; OSSFIM at 20 m, 40 m, 60 m, and 80 m block sizes; x-axis: cost (thousands).]
Figure 10.12. Cost-quality comparison of the results of ordinary kriging using the actual sample net with (theoretical) optimized networks
cannot directly compare results computed on a log scale with untransformed data, there is a strong suggestion that the best way to improve mapping in this area is to include 'soft' information, for example by stratifying according to flooding frequency rather than opting for a regular square grid. The reason in this case is that the spatial resolution of the flood frequency map more than makes up for a sparse sampling network. This demonstrates the value of using geostatistics and GIS together. As an exercise, readers might like to work out for themselves whether this is really so.
Intelligent GIS

Using error analysis for optimizing spatial modelling in GIS

This and the previous chapter have presented many aspects of how errors accrue in spatial data and spatial data analysis. We have dealt with several methods for assessing loss of information through errors and how to cope with problems like vector-raster conversion, error propagation in numerical modelling, the matching of correlation structures, and optimizing sample networks. The challenge now is to put these error tracking methods together in such a way that an intelligent GIS can indicate to a user whether a particular spatial analysis problem can be solved to an intrinsic level of quality, and if not, to advise on the steps to be taken to correct matters. An intelligent GIS would include formal rules to help the user choose the best set of procedures and tools to solve the problem within the constraints of data, data quality, cost, and accuracy. A really intelligent GIS would be able to carry out error propagation studies before a major data-crunching operation to estimate if the methods and data chosen were likely to yield the results intended. It would report to the user where the major sources of error come from and would present him or her with a set of options which would achieve better results.

In spite of pessimism by Dunn et al. (1990), we believe that systematic error checking in GIS modelling is now possible. This chapter has demonstrated that there are methods available for following errors through GIS operations, and that once error propagation is understood, the user will be able to select those methods that lead to better results. The options for improvement include:

(a) using better methods for spatial interpolation or using numerical models instead of simple logic
End note
Questions
1. Discuss the value and the difficulties of getting a good idea of spatial correlation
structures when using GIS for environmental modelling.
2. How do you think the errors in the output of any given model will depend (a) on
the uncertainties in the regression equation, (b) the spatial variation in the data? How
would you investigate this quantitatively?
3. How would you design a cost-benefit study to relate the prediction errors to (a)
numbers of model calibration points? (b) numbers of sites used for interpolation? (c)
the kinds of attributes used to predict zinc content?
4. For each of the examples of GIS analysis given in Chapters 7 and 8, work out the
sources of errors and develop flow charts to show how uncertainties propagate through
the models.
ELEVEN

Fuzzy Sets and Fuzzy Geographical Objects
Chapter 2 explained that in order to model geographical phenomena it is first necessary to divide the world either into crisp entities (land parcels, administrative units, soil mapping units, or ecotopes) or into continuous fields (atmospheric pressure, temperature gradients, groundwater levels, population density) that are discretized, usually as a regular grid. These fundamental spatial entities (points, lines, polygons, pixels) are described by their location, attributes, and topology. A set of axioms (Chapter 2, pp. 28-9) provides the ground rules for manipulating spatial entities with the usual rules of logic, which includes mathematics. Paramount for any form of data retrieval and manipulation is the division of data into groups, or sets: those entities that match selection criteria, and those that do not.

In the process of classification and retrieval we unconsciously use the basic laws of thought, first developed by Aristotle, namely:
• The law of identity (everything is what it is: a house is a house)
• The law of non-contradiction (something and its negation cannot both be true: a house cannot be both a house and not a house), and
• The principle of the excluded middle (every statement is true or false: this house is lived in or it is not).

The principle of the excluded middle ensures that all statements in conventional logic can have only two values, true or false, which can be coded as zero or one. This assumption lies at the heart of most of mathematics and computer science (Barrow 1992), so naturally the paradigm is very deeply embedded in our tools for computation and data retrieval. The principles of two-valued logic make class overlap, the concept of partial membership of a set, and the concept of partial truths impossible.

Many geographical phenomena, however, are not simple clear-cut 'entities'. The patterns produced by natural processes vary over many spatial and temporal scales, and the ensemble entities are defined not by one but many interacting attributes. Consequently, it is often a very difficult practical problem to partition the real world into unique, non-overlapping sets. Figure 2.2 showed that when our perception of reality yields something that is neither clearly an entity nor a continuous field, we use simple pragmatism to force the observation into one or the other mode, because until recently we had no means in GIS, apart from statistics, for dealing with entities that are not crisply defined, nor for groups that are not mutually exclusive.

The realization that one could invent all manner of different, self-consistent forms of logical reasoning... had a liberating effect upon thinkers struggling with problems that seemed to defy traditional forms of argument. (Barrow 1992: 18)

Much effort has been expended on how best to define standards for geographical data, and the discussions lie at the heart of issues of interoperability (see Chapter 12). The fact is that by limiting the rules of logic to binary decisions we limit the retrieval and classification of data to those situations in which only a complete match is possible. In real life we often make compromises based on the degree with which an object meets our specifications: if the object is almost what we want we will gladly make do. For example, you want to buy a new house. Your specifications are that it must have three bedrooms, have a garden that is at least 300 m², is located within 10 minutes' drive of the station and shops, is no more than 20 minutes' walk from your work, and costs no more than £150 000. This sounds pretty exact, but what do you mean by '10 minutes' drive'? Is this an 'average' figure, and if so how is it measured? Can you specify the traffic conditions? Would you consider other options outside the specification, such as a house with 290 m² of ground costing £155 000 with four bedrooms if it matched other criteria?

Similar problems have not only bedevilled the definition and classification of natural geographical phenomena such as rock types, soil or vegetation classes, but also affect the derivation of socio-economic groupings, decisions in law courts, the definitions of nationality, and even the borders of the nation state (Burrough and Frank 1996).

Guilty, not guilty, or not proven, m'lud? Some legal systems (e.g. in Scotland) provide for uncertainty, while others allow only binary decisions.

Even though we think we have defined classes exactly, we may not be able to assign individuals to the correct groups, either because the rules are ambiguous or measurements cannot be made with sufficient accuracy. Differences between soil surveyors on how to partition landscapes can lead to different data, and hence decisions based on them could be intrinsically unreliable (see e.g. Bie and Beckett 1973, Legros et al. 1996). The differences in interpreting the discretization rules adopted in data collection and classification persist through the database and have a direct effect on the presentation and interpretation of the results of modelling (Chapter 2).
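The contrast between strict two-valued retrieval and the compromises of the house example can be sketched in code. The criteria follow the example in the text, but the graded scoring scheme (linear ramps combined by taking the minimum) is illustrative, not something the text prescribes:

```python
def boolean_match(house):
    """Strict two-valued retrieval: every criterion must hold exactly."""
    return (
        house["bedrooms"] >= 3
        and house["garden_m2"] >= 300
        and house["drive_min"] <= 10
        and house["price"] <= 150_000
    )

def graded_match(house):
    """Graded retrieval: score each criterion in [0, 1] and combine."""
    def ramp(value, good, bad):
        # 1 when value is at least as good as `good`, 0 beyond `bad`,
        # linear in between (direction inferred from the arguments)
        if good < bad:
            return max(0.0, min(1.0, (bad - value) / (bad - good)))
        return max(0.0, min(1.0, (value - bad) / (good - bad)))

    scores = [
        ramp(house["bedrooms"], 3, 1),
        ramp(house["garden_m2"], 300, 200),
        ramp(house["drive_min"], 10, 20),
        ramp(house["price"], 150_000, 170_000),
    ]
    return min(scores)  # the weakest criterion dominates

# The 'almost right' house from the text: 290 m2 garden, £155 000, 4 bedrooms
almost = {"bedrooms": 4, "garden_m2": 290, "drive_min": 9, "price": 155_000}
print(boolean_match(almost), graded_match(almost))  # False 0.75
```

The Boolean query rejects the house outright because one criterion fails by 10 m², while the graded query reports a high degree of match, which is much closer to how a real buyer would reason.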
Because formal thought processes in Western logic have traditionally emphasized the paradigm of truth versus falsehood, which is implemented in binary or Boolean logic, we have very little formal training in how to deal with overlapping concepts. The law of the excluded middle and its role in mathematical proof (reductio ad absurdum) has been of paramount importance in scientific and philosophical development. The rules of logic used in computer query languages are all based on exact ideas of truth or falsehood, which implies that all objects in a database fit the same paradigm. In environmental data this is not necessarily so.

Conventional logic sometimes leads to unsolvable paradoxes. A statement such as 'Bongo says that he always lies' can be neither true nor false, because if the statement is true, it is incompatible with always lying. Such paradoxes can be solved by admitting partial truths, or by admitting class overlap, which has naturally happened in some cultures (e.g. see Barrow 1992, Burrough and Frank 1996). Also, we often allow class overlap and partial truths in natural language when we are not forced by scientific or legal considerations to work with exact concepts.

In natural language we have the full freedom to add modifiers to divide up the middle ground between potentially overlapping classes, and any difficult situation can be decided locally. For example, consider the discrimination between tall and short people. At first, we might consider that a person who is 190 cm (6 ft 2.5 inches) in length is tall, and one who is 150 cm (4 ft 11 inches), short (Figure 11.1). But is a person who is 170 cm long, tall or short? If the tall/short class boundary is fixed halfway between these two extremes, we cannot decide, so in practice we take the middle ground and create a new set of classes: short, average, tall. This creates new problems for classifying persons 154 or 186 cm in length, but we can easily create new classes in the areas of overlap to deal with the below average and above average cases. This process can continue as long as necessary until we have reached a generally acceptable system. Similar qualitative approaches allow us to deal with concepts such as baldness, the busyness of a holiday beach, or the perception of climatic zones.

Many users of geographical information have a clear notion (or central concept) of what they need. Land evaluators and planners can usually define the ideal requirements for a particular kind of land use, but they are often unsure about just where the boundaries between 'suitable' and 'unsuitable' classes of land should be drawn. In many situations they also formulate their requirements generally in terms such as 'Where are areas of soil suitable for use as experimental fields?', 'How much land can be used for smallholder maize?', 'Which areas are under threat of flooding?' or 'Which parts of the wetlands suffer from polluted discharges?' Such imprecisely formulated requests must then be translated in terms of the basic units of information available. Furthermore, not all information stored in the simple data model is exact. Many data collected during field survey are described in seemingly vague terms: soils can be recorded as being 'poorly drained', having 'moderate nutrient availability', or being 'slightly susceptible to soil erosion'; vegetation can be described as vital, partially vital, and so on. Even though standard manuals define these terms with more precision, in practice they retain a strong flavour of qualitative ambiguity.

[Figure 11.1: successive subdivisions of the height scale (140-210 cm): short/tall; short/average/tall; very short/below average/average/above average/very tall.]

In natural resource inventory and modelling it is not sensible to permit users to define their classes in a totally ad hoc way, because we need to agree on a limited number of standardized classes to facilitate training, research, and general information transfer. Clearly, we need methods for specifying how we must deal with imprecise information and borderline cases.
Fuzzy Sets and Fuzzy Geographical Objects
… must consider grouping both in attribute space and in geographical space. Grouping in attribute space determines whether all entities are of the same kind, and the problem of determining a group identity is one of choosing a classification based on attributes that leads to unambiguous, non-overlapping classes with clear rules for allocation. Grouping in geographical space determines whether entities (or multivariate fields) of similar kind occupy contiguous regions.

Very often, when setting up taxonomies of natural or man-made entities, we have concentrated only on building class definitions from attributes, with the implicit, and not unreasonable, assumption that similar entities will cluster together. Very often this may be so, but there is a growing and large literature devoted to the study of short-range natural variation, which can be particularly acute in soils, groundwater quality, and other aspects of natural and cultural landscapes (Burrough 1993b).

In the following text we first examine how to deal with imprecision in overlapping attribute classes. We then illustrate how this understanding can be extended to geographical phenomena that are expressed either as continuous fields or as entities like polygons.
Although other cultures have concepts of overlapping groups (see Barrow 1992), until the late nineteenth century Western thought had no formal means to handle such logical monsters. The wider introduction of ideas on multi-valued logic started in 1965, when Zadeh (1965) introduced the idea of 'fuzzy sets' to deal with inexact concepts in a definable way. The term 'fuzzy' has been seen by some as unfortunate, because it suggests an image of woolly, unstructured thought, which is therefore unscientific and bad. This is definitely not the case, but the name has stuck. Since the 1960s, however, the theory of fuzzy sets has been developed to the point where useful, practical tools are available for use in other disciplines (Zadeh 1965, Kandel 1986, Kauffman 1975).

Fuzziness is a type of imprecision characterizing classes that for various reasons cannot have, or do not have, sharply defined boundaries. These inexactly defined classes are called fuzzy sets. Fuzziness is often a concomitant of complexity. It is appropriate to use fuzzy sets whenever we have to deal with ambiguity, vagueness, and ambivalence in mathematical or conceptual models of empirical phenomena. If one accepts that spatial processes interact over a wide range of spatial scales in ways that cannot be completely predicted (cf. Chapters 6 and 10), then one can appreciate the need for the 'fuzzy' concept in geographical information. Fuzzy set theory is a generalization of, and not a replacement for, the better-known abstract set theory which is often referred to as Boolean logic.

Fuzziness is not a probabilistic attribute, in which the degree of membership of a set is linked to a given statistically defined probability function. Rather, it is an admission of possibility that an individual is a member of a set, or that a given statement is true. The assessment of the possibility can be based on subjective, intuitive ('expert') knowledge or preferences, but it could also be related to clearly defined uncertainties that have a basis in probability theory. For example, uncertainty in class allocation could be linked to the possibility of measurement errors of a certain magnitude.

Crisp sets

Conventional or crisp sets allow only binary membership functions (i.e. true or false): an individual is a member or it is not a member of any given set. Fuzzy sets, however, admit the possibility of partial membership, so they are generalizations of crisp sets to situations where the class boundaries are not, or cannot be, sharply defined, as in the case of tall and short people given above. The same is true for an environmental property such as 'internal soil drainage', which embraces all conditions from total impermeability to excessively free draining. To state just what is, and what is not, 'a moderately well-drained soil' requires not strict allocation to an exactly defined class, but a qualitative judgement that by implication allows the possibility of partial membership.

A crisp set is a set in which all members match the class concept and the class boundaries are sharp. The degree to which an individual observation z is a member of the set is expressed by the membership function
MF, which for crisp (Boolean) sets can take the value 0 or 1. Note that z is used here as a general attribute value, as in regionalized variable theory (Chapter 6). Formally, we write:

MF_B(z) = 1   if b_1 ≤ z ≤ b_2
MF_B(z) = 0   if z < b_1 or z > b_2        (11.1)

where b_1 and b_2 define the exact boundaries of set A. For example, if the boundaries between 'unpolluted', 'moderately polluted', and 'polluted' soil were to be set at b_1 = 50 units and b_2 = 100 units, then the membership function given in (11.1) defines all 'moderately polluted soils'.

A fuzzy set is defined mathematically as follows: if Z denotes a space of objects, then the fuzzy set A in Z is the set of ordered pairs

A = {z, MF_A(z)}   for all z ∈ Z        (11.2)

where the membership function MF_A(z) is known as the 'grade of membership of z in A' and z ∈ Z means that z belongs to Z. Usually MF_A(z) is a number in the range 0 to 1, with 1 representing full membership of the set (e.g. the 'representative profile' or 'type') and 0 non-membership. The grades of membership of z in A reflect a kind of ordering that is not based on probability but on admitted possibility. The value of MF_A(z) of object z in A can be interpreted as the degree of compatibility of the predicate associated with set A and object z; in other words, MF_A(z) of z in A specifies the extent to which z can be regarded as belonging to A. So, the value of MF_A(z) gives us a way of giving a graded answer to the question 'to what degree is observation z a member of class A?'. Figure 11.2 uses the method of Venn diagrams to illustrate the difference between crisp and fuzzy sets.

Figure 11.2. Images of fuzzy sets (left) and Boolean (crisp) sets (right)

Put simply, in fuzzy sets the grade of membership is expressed in terms of a scale that can vary continuously between 0 and 1. Individuals close to the core concept have values of the membership function close to or equal to 1; those further away have smaller values. Note that this immediately gets around the problem of the principle of the excluded middle (truth is not absolute), and individuals can, to different degrees, be members of more than one set. The problem is to determine the membership function unambiguously.

In many cases the boundary values of crisp sets are chosen either (a) on the basis of expert knowledge (e.g. boundary values of discriminating criteria chosen by custom, law, or an external taxonomy), or (b) by using methods of numerical taxonomy. Classes based on expert knowledge are usually imposed or imported classes that are set up without direct reference to the local data set. They may approximate 'natural' divisions, but they are not optimal in any statistical sense. Only two parameters, the lower and upper boundary values, are needed. These classes are used a great deal in practical science and administration.

So-called natural classification methods produce classifications which are locally optimized to match the data set. In practice, the choice of classification method, parameter values, etc. can strongly affect the results of the classification. Methods of numerical taxonomy are mainly research tools that are data driven.

Both options are also possible with fuzzy sets. The first and simpler approach uses an a priori membership function with which individuals can be assigned a membership grade. This is known as the Semantic Import Approach or Model (SI). This is an analogue of (a) above. The second is analogous to cluster analysis and numerical taxonomy in that the value of the membership function is a function of the classifier used. One frequently used version of this model is known as the method of fuzzy k-means.

Both methods can be usefully applied to environmental data, as is explained in the rest of this chapter. The semantic import model and other principles of fuzzy logic are described first, and then the method of fuzzy k-means.
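To make the contrast concrete, here is a minimal sketch (not from the book) of the crisp membership function of equation (11.1), together with one possible graded alternative: a linear fuzzy membership with transition half-width d and crossover 0.5 at the class boundaries, as the SI model later in the chapter prescribes. The boundary values 50 and 100 are the 'moderately polluted soil' example from the text; the linear form itself is an assumption here.

```python
# Illustrative sketch of equation (11.1): crisp membership for 'moderately
# polluted' soil with boundaries b1 = 50, b2 = 100 units, plus one possible
# graded alternative (a linear fuzzy MF; this particular form is an assumption).

def mf_boolean(z, b1, b2):
    """Crisp (Boolean) membership: 1 inside [b1, b2], otherwise 0."""
    return 1 if b1 <= z <= b2 else 0

def mf_fuzzy_linear(z, b1, b2, d):
    """Linear fuzzy membership: 1 on the kernel, 0.5 exactly at b1 and b2."""
    if b1 + d <= z <= b2 - d:
        return 1.0
    if z <= b1 - d or z >= b2 + d:
        return 0.0
    if z < b1 + d:
        return 0.5 + (z - b1) / (2.0 * d)
    return 0.5 + (b2 - z) / (2.0 * d)

print(mf_boolean(49.9, 50, 100))                     # 0: wholly excluded
print(round(mf_fuzzy_linear(49.9, 50, 100, 10), 3))  # 0.495: almost half a member
```

An observation just outside the crisp boundary is rejected outright, while the graded function records that it very nearly belongs to the class.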
The semantic import approach is useful in situations where users have a very good, qualitative idea of how to group data, but for various reasons have difficulties with the exactness associated with the standard Boolean model. The choice of fuzzy sets does not mean that one is opting out. The selection of boundaries for crisp sets and of class intervals can be an objective or a subjective process (e.g. see Burrough 1986, Evans 1977) depending on the way in which scientists agree to define classes: very often much thought goes into selecting sensible boundaries between class intervals. The same is just as true for assigning the membership function of a fuzzy set. The membership function should ensure that the grade of membership is 1.00 at the centre of the set, and that it falls off in an appropriate way through the fuzzy boundaries to the region outside the set, where it takes the value 0. The point where the grade of membership = 0.5 is called the 'crossover point'. The membership function must be defined in such a way that these conditions hold, so not all functions are possible.

Suitable membership functions for use with the SI approach

Just as there are various types of probability distribution (normal, lognormal, rectangular, hyperbolic, Poisson, etc.), so there can be different kinds of fuzzy membership functions. These are used to determine the membership value at the edges of the set. Most common are the linear MF_F and the 'sinusoidal' MF_F. The linear MF_F is given by a pair of sloping lines that peak at MF = 1 for the central concept of the set, c, and have MF values of 0.5 at the boundaries (Figure 11.3b). The slope of the line gives the width of the fuzzy transition zone. Note that areas inside the sloping lines, but outside the Boolean rectangle, are zones of partial truth.

Figure 11.3. Boolean and fuzzy membership functions for the SI method
The sinusoidal MF_F is given by:

MF_F(z) = 1 / (1 + a(z - c)^2)   for 0 ≤ z ≤ P        (11.3)

where A is the set in question, a is a parameter governing the shape of the function, and c defines the value of the property z at the central concept. By varying the value of a, the form of the membership function and the position of the crossover point can be fitted to Boolean sets of any width (Figure 11.3c). We call this membership function 'Model 1'.

For example, consider the universe of soil depths from 0 to 200 cm, in which we wish to distinguish …

… essentially half-widths of the transition zones, because they give the width inside the Boolean boundary to the value of z corresponding with the central concept. This also ensures that selections made by Boolean and fuzzy sets can be strictly compared and that at the boundaries, b_1, b_2, MF_F = 0.5. Model 2 is defined by three equations:

MF_F(z) = 1 / (1 + ((z - b_1 - d_1) / d_1)^2)   if z < b_1 + d_1        (11.4a)

MF_F(z) = 1   if b_1 + d_1 ≤ z ≤ b_2 - d_2        (11.4b)

MF_F(z) = 1 / (1 + ((z - b_2 + d_2) / d_2)^2)   if z > b_2 - d_2        (11.4c)
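In code, Models 1 and 2 are direct transcriptions of equations (11.3) and (11.4a-c). This is an illustrative sketch, not the book's software; the upper transition (11.4c) is inferred by symmetry with the printed lower transition (11.4a), and the parameter values in the example are invented.

```python
# Illustrative transcription of equations 11.3-11.4. The form of the upper
# transition (11.4c) is inferred by symmetry with the lower one (11.4a).

def mf_model1(z, a, c):
    """Model 1 (eq. 11.3): peak of 1 at the central concept c."""
    return 1.0 / (1.0 + a * (z - c) ** 2)

def mf_model2(z, b1, d1, b2, d2):
    """Model 2 (eqs. 11.4a-c): MF = 1 on the kernel [b1 + d1, b2 - d2],
    falling to 0.5 at the Boolean boundaries b1 and b2."""
    if z < b1 + d1:                                   # lower transition (11.4a)
        return 1.0 / (1.0 + ((z - b1 - d1) / d1) ** 2)
    if z <= b2 - d2:                                  # kernel (11.4b)
        return 1.0
    return 1.0 / (1.0 + ((z - b2 + d2) / d2) ** 2)    # upper transition (11.4c)

# At the Boolean boundary itself the membership is exactly the crossover value:
print(mf_model2(50.0, b1=50.0, d1=5.0, b2=100.0, d2=5.0))  # 0.5
```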
Compared with Boolean classification, where only the boundary values have to be chosen, the extra problems the semantic import approach brings are (a) choosing between a linear, sinusoidal, or other function for assessing class membership and (b) selecting the values of the dispersion indices d_1 and d_2. Therefore the Semantic Import approach needs more information than the conventional Boolean method, but this is amply repaid by the extra sensitivity in data analysis, as will be demonstrated below.
… languages in relational database management systems have been modified to accept continuous logical operations (e.g. Kollias and Voliotis 1991). The basic operations on fuzzy subsets are similar to, and are a generalization of, the AND/OR/NOT/XOR and other operations used for Boolean sets. Readers are referred to Kandel (1986), Kauffman (1975), or Klir and Folger (1988) for more details.

Box 11.2 gives the main logical operations for fuzzy sets. Note that the integral sign does not mean 'summation' but 'for the universe of objects Z'. For our purposes, the most used operations will be union (maximize), intersection (minimize), negation (complement), and the convex combination (weighted sum). The concept of the convex combination is useful when linguistic modifiers such as 'essentially' and 'typically' are to be used, or when different attributes can compensate for each other. This idea is particularly appropriate in land evaluation, as we shall see. Union, intersection, or convex combination lead to the computation of a new MF value, which we call the joint membership function value or JMF.

It is now easy to see how an individual entity can be a member of one or more sets. Figure 11.5 shows the membership functions for two adjacent classes of clay texture of soil. Observation 1 is clearly in class A, and has a membership value of 1.0 for class A and 0.01 for class B. Observation 2 is near the class boundaries: it has a membership value of 0.4 in class A and a membership value of 0.7 in class B. The JMF for union (OR) for observation 1 is 1.0, and for observation 2 is 0.7; for intersection (AND) the JMF is 0.01 for observation 1 and 0.4 for observation 2. Note that if the two classes have different transition widths, then the membership values for a given observation do not have to sum to 1 for the Semantic Import approach. The same is true for classes defined using different attributes.

Box 11.3 demonstrates how union and intersection work with classes of different attributes. Suppose we have data from soil profiles that give the clay content for three layers (0-20 cm, 30-40 cm, and 70-80 cm) to an accuracy of ±5 per cent. Let us define a 'heavy textured soil' as one with a clay content of >30 per cent in the first layer, >35 per cent in the second layer, and >40 per cent in the third layer. As before we use a sinusoidal function with a dispersion value of 5 per cent (Model 3). In practice it may be necessary to distinguish profiles in which only one of the three horizons is 'heavy' (clay anywhere in the profile) from profiles where all three layers are heavy (clay throughout). The first situation is a union (OR), and requires selection of the maximum MF value; the second is an intersection (AND), and requires selection of the minimum. The right-hand columns of the table show which profiles would be selected, and which rejected, by both categories. Note that if the measurement error is indeed ±5 per cent, measured clay values 5 per cent less than the Boolean boundary (corresponding to MF_F = 0.2) would be possible members of the set of 'heavy soils', though they would be completely rejected by the Boolean selection.

Figure 11.6 shows the same analysis, but now applied to interpolated surfaces of the clay content for the Lacombe data to distinguish the situation of 'clay in some part of the area' (Figure 11.6a,b) from 'clay throughout the profile' (Figure 11.6c,d). Clearly, fuzzy classification is a trivial operation in geographical information systems that provide 'cartographic algebra'. Note that the operation would be essentially the same were the clay content of the three layers expressed as soil polygons, but the appearance of the result would be dictated by the form of the mapping units.

If data are originally on a point support, one must decide whether first to interpolate all data to a
Box 11.2. Logical operations on fuzzy sets

1. Two fuzzy sets, A and B, are said to be equal (A = B) iff MF_A(z) = MF_B(z) for all z ∈ Z (where iff means if and only if).

2. A is contained in B (A ⊂ B) iff MF_A(z) ≤ MF_B(z) for all z ∈ Z.

3. The union of fuzzy sets A and B is the smallest fuzzy subset containing both A and B:

   A ∪ B = ∫_Z (MF_A(z) ∨ MF_B(z))/z

   The intersection of A and B is given by the minimum:

   A ∩ B = ∫_Z (MF_A(z) ∧ MF_B(z))/z

   The product AB is given by:

   AB = ∫_Z (MF_A(z) · MF_B(z))/z

9. If A_1, …, A_k are fuzzy subsets of Z, and w_1, …, w_k are non-negative weights summing to unity, then the convex combination of A_1, …, A_k is a fuzzy set A whose membership function is the weighted sum:

   MF_A(z) = w_1 MF_A_1(z) + … + w_k MF_A_k(z)
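Applied to membership values at a single object, the operations in Box 11.2 reduce to a few one-liners. This is an illustrative sketch, not the book's notation; the numerical values are those quoted for observation 2 of Figure 11.5, and the weights in the convex combination are invented.

```python
# Illustrative sketch of the Box 11.2 operations at a single object:
# union = maximize, intersection = minimize, negation = complement,
# convex combination = weighted sum.

def fuzzy_union(*mfs):         # OR: maximize
    return max(mfs)

def fuzzy_intersection(*mfs):  # AND: minimize
    return min(mfs)

def fuzzy_not(mf):             # NOT: complement
    return 1.0 - mf

def convex_combination(mfs, weights):
    """Weighted sum; weights must be non-negative and sum to unity."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * m for w, m in zip(weights, mfs))

mf_a, mf_b = 0.4, 0.7  # observation 2: memberships in classes A and B
print(fuzzy_union(mf_a, mf_b))                                   # 0.7
print(fuzzy_intersection(mf_a, mf_b))                            # 0.4
print(round(convex_combination([mf_a, mf_b], [0.25, 0.75]), 3))  # 0.625
```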
common grid before carrying out fuzzy classification, or to classify the profile data first and then interpolate the membership functions. The second option is difficult for Boolean classes unless they are interpolated as indicator functions (see Chapter 6), but not for fuzzy MF values, though there may be theoretical problems (see Gruijter et al. 1997) because the fuzzy … computational factors may decide which route is the best. For many users of geographical information systems it will be easier to apply the classification models to data that have already been interpolated.

Fuzzy classification with ordinal data
Figure 11.6. (a,b) Boolean and fuzzy union of sets to show where clay occurs anywhere in any layer; (c,d) Boolean and fuzzy intersection to show where clay occurs throughout the soil profile
Figure 11.7. Comparison of results of top-down land capability classification using conventional
rules (left) and the fuzzy equivalent (right)
… through the grid cell, g is the area covered by each grid cell, and β is the slope of the land at the grid cell.

Because maps of upstream contributing areas and the wetness indices derived from them vary continuously, there are no simple ways to extract features that could be termed 'boundaries'. Even contour lines do not make sense when the attribute in question is both continuous and clustered along limited pathways. Fuzzy sets are particularly appropriate for handling this kind of data, as the following example shows.

Figure 11.8 shows the derivation of a map showing where the higher values of the topographically derived wetness index combine with subsoil clay content. The limits of the clay content are taken as 40 per cent with a threshold of 5 per cent, and the wetness index limit is 14 with a threshold of 6. Figure 11.8c shows the Boolean retrieval of all wetness zones with index ≥ 14 lying on subsoil clay ≥ 40 per cent, and Figure 11.8d shows the fuzzy equivalent. Note that the fuzzy retrieval retains the whole of the drainage net in the areas of heavy clay, thereby preserving the topological continuity of the drainage, which is lost in the Boolean solution.
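The Boolean and fuzzy retrievals just described can be sketched on toy rasters. The two-by-two grids below are invented, not the real data; the one-sided membership function and the limits (wetness index 14 with threshold 6, clay 40 per cent with threshold 5) follow the text.

```python
# Sketch (invented toy grids, not the real data) of Boolean vs fuzzy
# intersection of two continuous rasters, with the limits quoted in the
# text: wetness index 14 (threshold 6) and subsoil clay 40% (threshold 5).

def mf_lower(z, b, d):
    """One-sided sinusoidal membership for 'z exceeds b'; crossover 0.5 at b."""
    if z >= b + d:
        return 1.0
    return 1.0 / (1.0 + ((z - b - d) / d) ** 2)

wetness = [[15.0, 12.0], [13.5, 9.0]]
clay = [[45.0, 41.0], [38.0, 20.0]]

boolean = [[1 if w >= 14 and c >= 40 else 0 for w, c in zip(rw, rc)]
           for rw, rc in zip(wetness, clay)]
fuzzy = [[min(mf_lower(w, 14, 6), mf_lower(c, 40, 5)) for w, c in zip(rw, rc)]
         for rw, rc in zip(wetness, clay)]

print(boolean)  # [[1, 0], [0, 0]]: only one cell survives the crisp AND
print(fuzzy)    # near-boundary cells keep intermediate memberships
```

Cells that narrowly fail one crisp limit are discarded outright by the Boolean intersection, while the fuzzy intersection keeps them with intermediate memberships; this is the mechanism by which connectivity of the drainage net is preserved.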
Figure 11.8. Using fuzzy methods for intersecting data layers with continuous data preserves detail
and connectivity which is lost with Boolean selection
As we have seen, a major problem with the Boolean model is that it assumes that the attributes have been described and measured exactly. In many natural phenomena this is not possible because of measurement errors and spatial variation (Burrough 1986, 1989, Goodchild 1989). Errors also increase because the spatial variation cannot be seen in its entirety, but must be reconstructed from data gathered at sample locations. The resulting maps (in either vector or raster form) suffer from both measurement and interpolation errors, and it is impossible to assume that the database is error free. Consequently, when uncertain data are used in logical or quantitative models, the results will also contain errors (Heuvelink et al. 1989), especially when these have values near the boundaries of the Boolean classes. Using the Monte Carlo methods given in Chapter 9, Heuvelink and Burrough (1993) analysed the propagation of errors through Boolean and fuzzy reclassification and retrieval, and demonstrated that fuzzy selection and reclassification was far less sensitive to small errors in the data, particularly when data values are near critical class boundaries.

The effects of attribute errors on data retrieval and classification

To understand how errors in the data affect the results of both Boolean and continuous classification, we replace the deterministic attribute z by a random variable Z, with a certain probability distribution function. We will denote the standard deviation of Z by σ_Z, which is a measure of the 'error' in Z.

Figure 11.9 illustrates several possible situations that can arise when an observation is to be placed in a class whose boundaries on the attribute Z are b_1 and b_2. Figure 11.9a shows the standard Boolean case when there is no error, so σ_Z = 0. The attribute Z is in effect deterministic, so the individual observation either falls entirely within the class boundaries (right-hand bar) or it falls outside (left-hand bar). The corresponding values of the membership function are 1 or 0 respectively. Because σ_Z is zero the membership value MF(Z) is also error-free. Figure 11.9b shows the same situation where the individual observation is now classified by a continuous membership function. The right-hand observation is within the class kernel so MF(Z) = 1. The left-hand observation is outside the kernel and the Boolean boundaries, so it returns an MF(Z) < 0.5.

Now consider what happens when σ_Z is non-zero. In Figure 11.9c, the probability density p_Z of Z takes the characteristic bell-shaped form and lies well within the limits of the Boolean class. The probability that Z really falls outside the class limits is vanishingly small, so for all practical purposes MF(Z) = 1. The same situation occurs with the continuous membership function when p_Z falls within the class kernel (Figure 11.9d). Equally clear-cut results follow when p_Z lies well outside the class limits. So in these cases the error in the observation does not cause an error in the classification result. This is in itself a result that is worth noting.

The results are less than clear-cut when p_Z straddles the Boolean class boundaries or the continuous transition zones. Figures 11.9e,f show the situation when the mean of Z equals b_1. In the Boolean case (Figure 11.9e) the distribution of MF(Z) becomes a discrete distribution with two possible values, 0 and 1. In this case the chances are equal that Z falls within or outside the class boundary, so the probability of obtaining each value is 0.5. In the continuous case shown in Figure 11.9f, the mean of Z equals b_1 and the distribution band is narrower than the transition zone. In this case MF(Z) is continuously distributed just as Z is. The mean of MF(Z) is about 0.5 and, because the membership function varies steeply at b_1, the standard deviation of MF(Z) is much larger than for Z, though it is still substantially smaller than the Boolean standard deviation in Figure 11.9e.

Figures 11.9g,h show the situation when p_Z mainly lies inside the Boolean limit b_1, but still with considerable overlap. In the Boolean case (Figure 11.9g) the effect is to unbalance the probabilities of returning a membership value of 0 or 1; the probability of returning a value of 1 is increased. In the continuous case p_Z runs into the class kernel, so the resulting distribution is a mixed distribution: it is the combination of a continuous and a discrete distribution. There is a definite chance that the membership value will be exactly 1 in some cases, though values less than 1 can also occur. Although not shown, the situation where p_Z is broader than the transition zone will generally yield mixed distributions of MF(Z) with peaks at 0 and/or 1.
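The contrast just described can be reproduced with a small Monte Carlo experiment in the spirit of Heuvelink and Burrough (1993). This sketch uses invented boundary and error values: Z is drawn from a normal distribution centred exactly on the class boundary b_1, and the spread of the resulting Boolean and fuzzy membership values is compared.

```python
# Monte Carlo sketch (invented values): Z ~ N(b1, sigma) centred on a class
# boundary; compare the spread of Boolean vs fuzzy membership values.
import random

def mf_boolean(z, b1, b2):
    return 1.0 if b1 <= z <= b2 else 0.0

def mf_fuzzy(z, b1, d1, b2, d2):
    """Model-2 style membership: crossover 0.5 at b1 and b2."""
    if z < b1 + d1:
        return 1.0 / (1.0 + ((z - b1 - d1) / d1) ** 2)
    if z <= b2 - d2:
        return 1.0
    return 1.0 / (1.0 + ((z - b2 + d2) / d2) ** 2)

def std(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

random.seed(42)
b1, b2, d = 50.0, 100.0, 10.0
draws = [random.gauss(b1, 2.0) for _ in range(5000)]  # mean of Z equals b1
sd_bool = std([mf_boolean(z, b1, b2) for z in draws])
sd_fuzzy = std([mf_fuzzy(z, b1, d, b2, d) for z in draws])
print(sd_bool > sd_fuzzy)  # True: the fuzzy MF propagates far less error
```

The Boolean membership flips between 0 and 1 (standard deviation near 0.5), while the fuzzy membership varies smoothly around 0.5, so its standard deviation is much smaller, exactly the behaviour sketched in Figures 11.9e,f.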
Figure 11.9. The effects of measurement errors on the results of Boolean classification (left
column) and fuzzy classification (right column)
So far we have concentrated on classifying attributes, but many kinds of mapping involve the delineation of similar areas of land, often by examining its external appearance, either in the field or from aerial photographs or satellite images. Observable features, such as colour changes, breaks of slope, or edges of texture patterns, are used to infer 'boundaries'. The delineations are first drawn in pencil, then checked by making point observations within similarly classified areas, and then finalized in ink. Conventionally, the field sheets are redrawn as printing masters by cartographers who draw all soil, ecotope, or geological boundaries with a line of standard width (usually 0.2 mm) irrespective of the original nature of the boundary in the field. These artefacts are later digitized to yield digital choropleth maps on which, in theory at least, the drawn boundaries have zero width. This imposition of cartographic neatness suppresses useful information about the nature of spatial change, and has misled generations of users of choropleth maps into thinking that soil, vegetation, or geological boundaries are always and everywhere sharp, and that the units are always homogeneous.

All field scientists know that spatial variations in soil, vegetation, or geology can occur abruptly or gradually. A dike intrusion can give rise to geological boundaries that are sharp at the scale of centimetres, but variations in texture of alluvial or aeolian deposits can occur over hundreds of metres or kilometres. It is not difficult to indicate whether a boundary identified during fieldwork or aerial photo interpretation is sharp or diffuse, because that can often readily be seen. It is also not difficult to attach an attribute indicating the relative or the estimated width of a boundary (sharp, medium, diffuse) to the arcs of a polygon when soil maps are digitized in a vector system, nor is it impossible to draw boundary lines in different styles to indicate their width; in fact all these have been possible for years (cf. Burrough 1980). The fact is that, apart from some concerns about the accuracy of boundary location, until recently few had thought it necessary and useful to include information about the nature of the boundary in a choropleth map (Lagacherie et al. 1996). Indeed, one objection to so doing was that it was impossible to compute the area of a polygon with diffuse boundaries.

By using the methods of fuzzy logic on polygon boundaries it is very simple to incorporate information about the nature of those boundaries, and also to calculate sensible area measures. There are at least two separate approaches: the map-unit approach and the individual boundary approach.

The map unit approach

The simplest approach is to assume that a type of boundary (diffuse or abrupt) can be uniquely attributed to each kind of map unit or polygon. For example, soil units in narrow floodplains might have sharp, well-defined boundaries, whereas boundaries between loess and coarser aeolian deposits might occur over several hundred metres. Information about the type of boundary can be converted to parameters for a fuzzy membership function of type Model 2, 3, or 4 (equation 11.4a,b,c), which is now applied to the distance from the drawn boundary. The widths of the transition zones must be chosen with the scale of the map in mind. For example, a sharp boundary on a 1:25 000 map drawn 0.2 mm thick covers 50 m, so the width of the spatial transition zone centred over the drawn boundary location would be 25 m. A diffuse boundary at the same scale might extend over 500 m and have a transition zone width of 250 m.

Fuzzy transition zones can be computed from polygon boundaries by first spreading isotropically and outwards from the original delineation (cf. Chapter 7) and then applying an SI model to indicate the external gradation of MF value from well inside the polygon (MF = 1) to the outside. As an example, consider the soil map fragment in Figure 11.10 (taken from the Soil Map of the Mabura Hill Forest Reserve in Guyana, Jetten 1994). Figure 11.10a shows the plateau phase of the plinthisols. This unit has fairly diffuse boundaries, and as a first approximation let us assume that the true boundary has a locational uncertainty extending over 500 m, i.e. a transition zone of width 250 m.

The procedure to compute a fuzzy boundary for a single polygon in a raster representation is simple. One extracts the boundary with an edge filter and then applies a spread command to compute the zones around the boundary (see Chapter 8). The parameters of the membership function are selected so that the locations corresponding to the original drawn boundary are at the crossover value, i.e. MF = 0.5. The membership function is then applied so that those sites well within …
Figure 11.10. (a) Plinthisols, plateau, with diffuse fuzzy boundary (d_1 = 250 m); (b) boundary widths defined per polygon for all polygons; (c) Plinthisols, plateau, with local internal boundary widths of sharp (50 m), medium (100 m), and diffuse (250 m); (d) boundary widths defined per boundary section for units 1, 7, and 8
… the original boundary receive a membership value of 1, those sites inside, but near, the boundary receive a membership value between 0.5 and 1, and those sites outside the boundary receive a membership value below 0.5 concomitant with their distance from the boundary. Figure 11.10a shows the result, with the original boundary given in white. Clearly, it is now trivial to compute the area of the polygon that corresponds to any level of the membership function, so previously voiced doubts about estimating the area of polygons with fuzzy boundaries are no longer valid.

The procedure can be repeated for all map units, varying the width of the boundary to match the unit (Table 11.1). When all polygons are displayed with fuzzy boundaries it is not sensible to display both the inner and the outer zones of transition for them all, because the resulting map is difficult to read. Therefore it is better to confine the display of the boundary JMF to the internal transition zone (JMF ≥ 0.5). Figure 11.10b shows the result for the different map units when the boundary widths have been assigned according to Table 11.1.

The individual boundary approach

Commonly, areas of soil or vegetation do not have the same kind of boundary all the way round the occurrence. For example, a raised river terrace may have a diffuse boundary on the upper side and a sharp boundary on the lower side where the river has cut it away. Other boundaries might be sharp for part of their length and diffuse for others. It is not difficult to add attributes to boundary sections and to spread these individually. The results are then pooled and fuzzy membership functions can be computed as before.
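The raster procedure described above for the map-unit approach (edge filter, spread, membership function with MF = 0.5 on the drawn boundary) can be sketched as follows. This is an illustrative implementation, not the book's GIS commands; the spread step here is a simple breadth-first distance in whole cells, and the mask and parameter values are invented.

```python
# Sketch (not the book's GIS commands) of the map-unit raster procedure:
# find a polygon's boundary cells (edge filter), 'spread' distance outwards
# and inwards from them, then apply an SI-style membership function with
# the crossover MF = 0.5 at the drawn boundary.
from collections import deque

def fuzzy_boundary(mask, cell_size, half_width):
    rows, cols = len(mask), len(mask[0])

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc

    # edge filter: polygon cells with at least one neighbour outside
    boundary = {(r, c) for r in range(rows) for c in range(cols)
                if mask[r][c] and any(not mask[nr][nc] for nr, nc in neighbours(r, c))}

    # breadth-first 'spread': distance (in metres) from the drawn boundary
    dist = {b: 0.0 for b in boundary}
    queue = deque(boundary)
    while queue:
        r, c = queue.popleft()
        for nr, nc in neighbours(r, c):
            if (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + cell_size
                queue.append((nr, nc))

    # signed distance -> membership: 0.5 on the boundary, 1 well inside, 0 well outside
    mf = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            signed = dist[(r, c)] if mask[r][c] else -dist[(r, c)]
            mf[r][c] = min(1.0, max(0.0, 0.5 + signed / (2.0 * half_width)))
    return mf

mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
mf = fuzzy_boundary(mask, cell_size=50.0, half_width=50.0)
print(mf[2][2])  # 1.0: polygon core
print(mf[1][1])  # 0.5: on the drawn boundary (crossover)
print(mf[0][0])  # 0.0: outside, beyond the transition zone
```

Summing membership values over all cells (times cell area) then gives the kind of graded area estimate discussed above; a production system would use a proper Euclidean distance transform rather than this grid-step spread.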
Table 11.1. Boundary transition widths assigned to the map units

Map unit                 Width (m)
Albic arenosol           50
Gleyic arenosol          50
Histosol                 25
Haplic ferralsol         50
Dystric fluvisol         25
Plinthisol (valley)      50
Plinthisol (plateau)     100
Plinthisol (hillside)    100
Figure 11.10c shows the result of recognizing three kinds of boundary for the plinthisols: sharp (50 m), medium (100 m), and diffuse (250 m). Figure 11.10d shows this procedure applied to more units.

Besides making the map image more realistic and providing the means to compute which parts of a polygon can really be regarded as part of the core area, fuzzy boundaries can also be useful when derived maps are made from original soil, geological, vegetation, or land use maps. Conventionally, derived maps are made by converting the map unit class into a number or code representing the representative value of the attribute in question. The result is that if adjacent map units have large, differing values of an attribute, the transformed map shows a discontinuity at the boundary between them. If the boundary is sharp, then the result is realistic, but if the boundary is really quite …
diffuse, the result is an ugly artefact which possibly can be removed by pycnophylactic interpolation (Chapter 5).
Information about the gradual variation of an attribute over the boundary between two dissimilar map units can be provided if the boundary membership functions of the two polygons are used to compute a weighted estimate of the attribute values over the boundary zone:

\[ z(x) = \frac{MF_A(x)\,z_A + MF_B(x)\,z_B}{MF_A(x) + MF_B(x)} \tag{11.7} \]

Figure 11.11 shows the weighting procedure used to compute variation in soil moisture across the transition from Plinthisol plateau to hillside soil units, which have a soil moisture supply of 480 mm/a and 400 mm/a respectively and boundaries of transition width 250 m, and Figure 11.12 shows a close-up of the results compared with the usual Boolean lookup table equivalence.

Recapitulation: fuzzy polygon boundaries
The SI approach can be used to add information about the abruptness of boundaries to a polygon database. Information about how sharp or diffuse the boundaries are can be incorporated on a polygon-by-polygon or on an individual line segment basis. The results give a sensible picture of spatial variation across and along boundaries which can be used to improve the information content of maps derived from the polygon database by simple transfer functions.
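The weighted estimate over the boundary zone amounts to a convex combination of the two attribute values. A minimal sketch (the function name and interface are invented for illustration):

```python
def weighted_attribute(mf_a, mf_b, z_a, z_b):
    """Weighted estimate of an attribute across a fuzzy boundary.

    mf_a, mf_b: boundary membership values of the two adjacent
    polygons at a location; z_a, z_b: their attribute values.
    """
    return (mf_a * z_a + mf_b * z_b) / (mf_a + mf_b)

# Plateau/hillside soil moisture example (480 and 400 mm/a):
# midway through the transition zone both memberships are 0.5,
# so the estimate is simply the mean.
z = weighted_attribute(0.5, 0.5, 480.0, 400.0)   # -> 440.0
```

In the core of either polygon one membership dominates and the estimate reverts smoothly to that polygon's representative value.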
cluster and MF = 0 to all others. With fuzzy k-means, membership values may range between 0 and 1. Box 11.4 lists the algorithms used for calculating MF and the class centres. In ordinary fuzzy k-means the k memberships of each object sum to 1. This causes a loss of degrees of freedom that may be compensated for by introducing an extragrade class (de Gruijter and McBratney 1988). The parameter q is the so-called fuzzy exponent, which determines the amount of overlap in the cluster model. For q → 1, allocation is crisp and no overlap is allowed. For large q there is complete overlap and all clusters are identical. Ideally q should be chosen to match the actual amount of overlap, which is generally unknown. Because distance is an overall measure of similarity, departure from the class centre in one attribute may be compensated by close correspondence in another. The effective relative weights of individual attributes in this evaluation are determined by the type of distance measure used.
The net result of fuzzy k-means clustering is that individual multivariate objects (points, lines, or polygons) are assigned an MF value with respect to each of k overlapping classes. The centroid of each class is chosen optimally with respect to the data. In the variant of the technique used here, the MF values sum to 1, rather than individually as is the case with the SI approach. This means that rather than all sets having equal value, as with SI, the sets are ranked according to importance. This has obvious implications for the procedures for manipulating sets given in Box 11.2 and means that procedures such as intersection and convex combination need to be carried out with care on the sets derived by fuzzy k-means.
Because the values of the MFk are in effect new attributes, the geographical distribution of each set can be displayed by conventional methods of mapping, including interpolation. However, because of class overlap it is not always easy to read a map when more than one interpolated surface is displayed, particularly when displays are limited to black and white. New methods for visualizing the results of fuzzy k-means classification on maps are clearly needed.

Examples of using fuzzy k-means and kriging to map overlapping, multivariate classes
(a) Heavy metal pollution on the Maas floodplains. Chapters 5 and 6 demonstrated interpolation methods with a data set of soil samples taken from a regularly flooded area of the River Meuse floodplain in the south of the Netherlands, and data from a slightly larger area were used to illustrate costs and benefits in Chapter 10. Here we also use the larger subset of the data, but this time include all attributes (elevation,
Box 11.4. Equations for calculating memberships and class centres in fuzzy k-means

1. Membership \(\mu_{ic}\) of the ith object to the cth cluster in ordinary fuzzy k-means, with d the distance measure used for similarity, and the fuzzy exponent q determining the amount of fuzziness:

\[ \mu_{ic} = \frac{\left[(d_{ic})^2\right]^{-1/(q-1)}}{\sum_{c'=1}^{k}\left[(d_{ic'})^2\right]^{-1/(q-1)}} \tag{11.4.1} \]

2. Membership \(\mu_{ic}\) of the ith object to the cth cluster in fuzzy k-means with extragrades, d and q as above, and with \(\alpha\) as a weighting factor between extragrade and ordinary classes:

\[ \mu_{ic} = \frac{\left[(d_{ic})^2\right]^{-1/(q-1)}}{\sum_{c'=1}^{k}\left[(d_{ic'})^2\right]^{-1/(q-1)} + \left(\frac{1-\alpha}{\alpha}\right)^{-1/(q-1)}\left[\sum_{c'=1}^{k}(d_{ic'})^{-2}\right]^{1/(q-1)}} \tag{11.4.2} \]

3. Membership \(\mu_{i*}\) of the ith object to the extragrade cluster in fuzzy k-means with extragrades:

\[ \mu_{i*} = \frac{\left(\frac{1-\alpha}{\alpha}\right)^{-1/(q-1)}\left[\sum_{c=1}^{k}(d_{ic})^{-2}\right]^{1/(q-1)}}{\sum_{c'=1}^{k}\left[(d_{ic'})^2\right]^{-1/(q-1)} + \left(\frac{1-\alpha}{\alpha}\right)^{-1/(q-1)}\left[\sum_{c'=1}^{k}(d_{ic'})^{-2}\right]^{1/(q-1)}} \tag{11.4.3} \]

4. jth attribute value \(\bar{z}\) of the cth cluster centre in ordinary fuzzy k-means:

\[ \bar{z}_{cj} = \frac{\sum_{i=1}^{n}(\mu_{ic})^q\,z_{ij}}{\sum_{i=1}^{n}(\mu_{ic})^q} \tag{11.4.4} \]

5. jth attribute value \(\bar{z}\) of the cth cluster centre in fuzzy k-means with extragrades: as equation 11.4.4, with the memberships \(\mu_{ic}\) of equation 11.4.2.
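The ordinary fuzzy k-means of Box 11.4 (equations 11.4.1 and 11.4.4, without the extragrade class) can be sketched as an alternating iteration. This is an illustrative sketch only, assuming Euclidean distance and random initial centres; it is not the implementation used for the case studies below.

```python
import numpy as np

def fuzzy_kmeans(data, k, q=1.5, n_iter=100, seed=0):
    """Ordinary fuzzy k-means (equations 11.4.1 and 11.4.4).

    data: (n, p) array of attribute values; k: number of classes;
    q: fuzzy exponent (q -> 1 gives crisp allocation).
    Returns (memberships (n, k), class centres (k, p)).
    """
    rng = np.random.default_rng(seed)
    centres = data[rng.choice(len(data), k, replace=False)]
    for _ in range(n_iter):
        # squared distances d_ic^2 of each object to each centre
        d2 = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)          # guard against zero distance
        w = d2 ** (-1.0 / (q - 1.0))        # eq. 11.4.1 numerators
        mf = w / w.sum(axis=1, keepdims=True)
        # eq. 11.4.4: centres as MF^q-weighted means of the data
        wq = mf ** q
        centres = (wq.T @ data) / wq.sum(axis=0)[:, None]
    return mf, centres
```

Each row of the returned membership matrix sums to 1, reflecting the constraint discussed above; the Mahalanobis or diagonal distance variants mentioned in the text would replace the Euclidean `d2` line.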
distance from the river, heavy metal concentrations (Cd, Zn, Pb, Hg), and organic material) to illustrate the multivariate fuzzy k-means approach.
Following the original flooding frequency map (see Chapter 5), the attribute data were classified into three classes with a fuzzy overlap q = 1.5. The fuzzy k-means classification yielded a new data set of memberships to each of the three classes for each site sampled. The scatter plot of membership values for the three classes (Figure 11.14) shows that the data are generally well separated in attribute space, with only a few observations scoring intermediate membership values. Table 11.2 presents summary statistics for the classes. Figures 11.15a,b,c show the interpolated surfaces for the three classes. There is a strong suggestion of a relation between the three fuzzy classes and the flooding
Figure 11.14. Scatter diagrams of membership values for soil pollution data
frequency classes, namely that flood frequency class 1 covers similar areas to fuzzy class 3, flood frequency class 3 covers a similar area to fuzzy class 1, and fuzzy class 2 occupies areas to the south and north. Note the distribution of these classes over the part of the area used in Chapters 5 and 6, which is outlined in Figure 11.15e. In that area only two fuzzy classes are really important, which was one of the conclusions reached in the ANOVA analysis of the flood frequency mapping of the smaller area.

(b) Rainforest types. A similar exercise was carried out on the results of a vegetation survey of 252 plots, each of 0.05 ha, on a 100 × 100 m grid in the rainforest in Guyana (Jetten 1994). The original species presence/absence data were first grouped into abundance classes before being analysed by a fuzzy k-means procedure which yielded three stable classes, termed 'Dry Evergreen Forest', 'Mixed Forest', and 'Wet Forest' (Jetten 1994). Table 11.3 presents summary statistics of these classes.
Class memberships were interpolated to a 50 m grid and the maps are shown in Figure 11.16. Unlike the pollution classes these are not spatially well correlated, as can be seen from the ratio of sill to nugget variances for the fitted variograms. Clearly, although the attrib-
Figure 11.15. (a,b,c) Maps of interpolated fuzzy k-means membership values for the three soil pollution classes; (d) map of confusion indices; (e) crisp map showing where each class is dominant—the dashed line box outlines the area used in Chapters 5 and 6 to illustrate interpolation; (f) map of zinc levels obtained by stratified kriging for comparison with (e)
Table 11.3. Statistics of fuzzy classes, rainforest species
Figure 11.16. (c) Wet Forest; (d) Confusion index (membership scale from 0 to 1)
ute classification is little different in quality from the floodplain data in terms of minimum, maximum, and mean membership values, the spatial distribution of the forest classes is not clear. This leads us to consider the issue of spatial contiguity in fuzzy classes as an attribute that is worthy of more study in its own right.
spatial contiguity of the interpolated classes. The floodplain pollution classes yield well-defined zones of large CI where dominance of one class is replaced by another (Figure 11.15d). In this case the zones of large CI are so thin that they can easily be refined to polygon boundaries. An effective procedure is to compute those cells having the local maximum profile convexity and extract them as a Boolean data type. Reclassifying all cells within these boundaries yields the choropleth map of Figure 11.15e. The striking similarity of the three mapping units obtained this way to the original flooding frequency classes is shown by comparing Figure 11.15e with the flooding frequency-stratified kriging map of zinc (Figure 11.15f). Clearly, when surfaces of continuous membership functions are strongly contiguous, computing the confusion index is a useful way to delineate the zones that are dominated by any given class.
Applying the same method to the rainforest data shows that the different classes in 'tree space' are not strongly separated in geographical space—indeed there is considerable overlap and few clear 'boundaries' (Figure 11.16d). Because of the driving processes, the spatial covariance of these classes is much weaker than for the pollution data. We conclude that a polygonal mapping of 'tree classes' based on the attributes chosen and the resolution of the sampling scheme would not be a very successful way of describing the spatial variation of the forest.

Confusion indices and the SI approach
Although the geographical expression of the confusion index has been presented above using the results of fuzzy k-means analyses, the same methods can be used to explore the spatial expression of SI classes. Indeed, by repeating the SI classification for a range of transition zone widths d_i, one can examine the relations between geographical location and the core of a class in attribute space. If the transition zone widths d_i are zero, there is no confusion, because there is no overlap in attribute space. Classes may be spatially disjoint, however. As overlap increases, so confusion will become more widespread. Areas in the core of a class will not experience an increase in confusion as d increases: areas near the edge of a class will. So, by computing confusion indices for a range of d values (class overlap) one can obtain maps of geographical locations that are in the core of the attribute classes, and therefore stable with respect to class overlap, and areas that are boundary cases, both geographically and in attribute space. If the result is a map in which the CI is low everywhere, we may conclude that delineation is not useful. So the CI is a locally tuned form of Perkal's epsilon (Chapter 9).
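The confusion index and the derived crisp map can be sketched from a stack of interpolated membership surfaces. This is a hedged example: the CI is computed here as the ratio of the second-largest to the largest membership in each cell, one common formulation that may differ in detail from the book's definition, and the boundary-refinement step via profile convexity is omitted.

```python
import numpy as np

def confusion_index(mf_stack):
    """Confusion index from a stack of membership maps.

    mf_stack: (k, rows, cols) array of interpolated class memberships.
    CI -> 1 where two classes compete for dominance, CI -> 0 in the
    core of a class.
    """
    s = np.sort(mf_stack, axis=0)            # ascending along the class axis
    return s[-2] / np.maximum(s[-1], 1e-12)  # second-largest / largest

def hard_classes(mf_stack):
    """Crisp map of the dominant class in each cell (cf. Figure 11.15e)."""
    return np.argmax(mf_stack, axis=0)
```

Repeating the computation for a range of transition widths d, as described above, just means rebuilding `mf_stack` for each d and comparing the resulting CI maps.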
As with the SI approach, the overlapping classes and gradual boundaries resulting from the fuzzy k-means approach appear to be more congruent with reality than conventional methods. The fuzzy membership values carry through more of the initial information than crisp classification (McBratney et al. 1992). Membership values can easily be interpolated over space and the resulting variogram analysis and patterns demonstrate clearly if a given attribute class also has a coherent geographical distribution.
In contrast to the SI approach with predefined classes and class boundaries, the fuzzy k-means approach yields locally optimal classes which are not necessarily based on assumptions of linearity, unlike conventional data-reduction techniques such as principal component and correspondence analysis. While linear correlations may sometimes be enhanced by means of suitable transformation functions (logarithmic, polynomial, etc.), attributes of soil, sediment, or water often co-vary in a complex, inherently non-linear manner. The fuzzy k-means approach captures this non-linear covariation (de Gruijter and McBratney 1988, Odeh et al. 1990). Moreover, both continuous and categorical attribute data can be easily combined (Odeh et al. 1990). In general, ordination techniques are sensitive to deviations from normality of the frequency distributions of the attributes, including bimodality and outlying values (Vriend et al. 1988). Because of the fuzziness allowed, the fuzzy k-means approach appears to deal accurately with both ends of the spectrum from co-variation to discrete clustering.

Disadvantages
As with most parametric methods, the greatest difficulties come with choosing the values of the control parameters to obtain the best results. With the SI approach, the user must choose the kind of membership function, boundary values, and transition widths. With fuzzy k-means using multiple attributes, the goodness of the fuzzy clustering obtained is difficult to visualize and evaluate. This relates to the choice of k, the number of classes, and q, the amount of overlap or fuzziness allowed. The added difficulty is that the optimal fuzziness may depend on the number of classes and vice versa. A formal approach using diagnostic functionals is possible (Bezdek 1981; Roubens 1982), but not always successful. Scientific insight or compliance with existing classification schemes may help (Vriend et al. 1988; McBratney and de Gruijter 1992; Frapporti et al. 1993). Other problems with fuzzy k-means are the choice of attributes and which distance measure to apply, which determine the degree and shape of fuzziness in the model and the way different attributes are compromised. It should be noted that the degree of fuzziness is always assumed equal for all classes.
While the fuzzy k-means procedure may seem to result in less arbitrarily shaped fuzzy boundaries than the SI approach, this is only because the method is less direct. Adequate cartographic techniques to map k classes (or k + 1 with extragrades) concurrently still have to be developed for both SI and fuzzy k-means (but see van der Wel and Hootsmans, 1993).

Figure 11.17. Flowchart governing the choice of fuzzy classification using either the SI or the fuzzy k-means approach

Conclusions—SI or fuzzy k-means?
The decision to use SI or fuzzy k-means approaches in continuous classification depends on the context of the problem and also on the level of prior information. In situations where a well-defined and functional
classification scheme exists, the straightforward SI approach offers great advantages, but the precise definition of the fuzzy boundaries requires some special attention. Studies reported here have shown that SI continuous classes are more robust and less prone to errors and extremes than simple Boolean classes that use the same attribute boundaries. The results can be mapped showing the relations between gradual changes in attribute values/membership functions in both data space and geographical space.
In the fuzzy k-means approach, the criteria that distinguish between the ultimate classes are a result of the analysis rather than input to the model. These criteria may be complex non-linear functions of the original attributes, thus there is no clear-cut relation between membership values and attribute values. While objects having identical attribute values will end up with identical memberships, the reverse is not generally true because of the mutual compensation possible among attributes. Since data reduction is one of the main reasons for using class memberships instead of the original attribute values, some loss of information will always be faced.
The fuzzy k-means approach is appropriate when information about the number and definition of classes is lacking. Fuzzy k-means methods yield sets of optimal, overlapping classes that can also be mapped in data space and in geographical space. The results can be used as (a) a once-only analysis of a limited area to identify the lineage of (spatial) variability (cf. Vriend et al. 1988), (b) a means to set up classes so that membership functions can be interpolated, which in turn can predict attribute values at unsampled locations (McBratney et al. 1992), or (c) a means better to define the simple membership functions that are to be used in straightforward SI analyses (Figure 11.17). To our knowledge no one has yet published an example of the latter, but the procedure parallels that using numerical classification and discriminant analysis by Burrough and Webster (1976) to set up and improve a reconnaissance classification for soil survey (Acres et al. 1975).
Questions
1. List 20 geographical phenomena that cannot be modelled satisfactorily by crisp
entities as explained in Chapter 2. Suggest alternative data models for these phenom¬
ena and discuss the value of the fuzzy set approach in each case.
2. Explain how you would go about measuring the width of geographical boundaries in practice: (a) in a landscape; (b) in a city.
3. Explore the value of using fuzzy membership functions in sequential data retrieval
operations and sieve mapping.
TWELVE
Current Issues and Trends in GIS
In this book we have explored many of the conceptual, computational, and mathematical principles of spatial data analysis that form the bases of many GIS. Starting with geographical phenomena and the way people perceive them and build conceptual models of space, we proceeded to demonstrate how the different data models of exact entities and continuous fields are modelled in the computer, and how databases of digital spatial data can be created. We showed how these data can be retrieved and analysed, and how many different methods of spatial analysis provide means to derive new information from raw data, and how logical and numerical models that use these data can provide new understanding, insights, and also the development of new hypotheses for testing. In short, we have seen that the integration of computer technology and geographical data can provide the user with a powerful tool for analysis in the natural and social sciences that greatly extends the capabilities of conventional maps or images.
Since the publication of the first edition of this book (Burrough 1986) the technology (and the associated field of study) has matured considerably, although it is fascinating to find that GIS researchers are still tackling subjects that were major issues ten years ago. For example, the development of efficient storage techniques for huge volumes of geographical data which allow the full dimensional nature of the information to be represented continues to be an important research area. The systems available today are still evolving and the changes in the last few years in data handling and in products are altering the way GIS are used and by whom. At the same time there has been a great increase in the provision and use of geographical data and a growing awareness by organizations of its importance. This has been given impetus by the
Current Issues and Trends in GIS
demands for an improved understanding of the interactions between people and environment in many investigations, which requires huge amounts of spatial and temporal data to be handled (Burrough 1996b, Schell 1995a, 1995b).
The book has said little about the people who use these systems and the whole industry that has grown up around them. The increasingly complex data handling and analyses provided by GIS mean that they are used in policy development, decision-making, and research in a range of political, economic, and academic settings. These systems are being adopted by a broader base of users than ever before and there are large international markets for this technology. GIS are no longer the sole domain of the developed world, with many developing countries using GIS for many applications including cadastral mapping and resource analysis. The use and development of GIS has been supported by an information infrastructure of dedicated books, journals and magazines, conferences and exhibitions, and professional organizations which allow new theoretical, technological, and application developments to be disseminated to a wide audience.
Changes in GIS in the next few years will be governed by (a) developments in information technology, (b) various international and industrial standardization agreements, and (c) the expansion of data availability and the provision of associated infrastructures. These will all bring changes in the GIS user community and in the way technology is perceived, and the following is a summary of the main trends.
Changes in technology
Ten years ago a commercial, off-the-shelf GIS was a monolithic all-embracing system with many volumes of manuals that explained the various functions and capabilities; a training period measured in many days, weeks, or even months was required to use it. Today many GIS tools are different and consist of a range of specialized subproducts, such as for cartographic production, database access and retrieval, spreadsheets and spatial data analysis tools, geostatistical interpolation, remote sensing image handling, or computational modelling, which are integrated to a greater or lesser extent. This widespread integration has been made possible by the standardization of data exchange protocols, both between different modules as well as with major data storing and handling tools, such as large relational database management systems. Non-specialists, such as managers and politicians, are now in a position to use spatial data in their work without having intermediaries to tackle the complexities of the previous systems. On the other hand, there is a growing distinction between the highly skilled professionals who build GIS databases and who are skilled in complex spatial analysis, and the majority of users.
GIS have benefited greatly from developments in the IT industry of faster and relatively cheaper computer processors which support their data handling needs. Personal computers and networking have moved GIS from a back-room 'techy' peripheral activity to a desktop application. The current move towards 64-bit computer processors will allow faster data processing and greater memory usage as well as enabling concurrent processing to take place. Whilst these developments will no doubt allow users of GIS to do more complex, data-hungry tasks more quickly, there are likely to be more key changes in systems in the future.
Until recently, most GIS were set up for local, or single main user applications, or for work on a limited project area. As long as the collection, storage, analysis, presentation, and dissemination of geographical information was limited to a single institute, government organization, or business unit, differences in hardware, software, and data standards (including data models, database implementation, exchange standards, etc.) were of little importance. The 'best' system for any particular application was determined by examining the functionality, precision, speed, and price-performance relations relative to the user's requirements. GIS were evaluated by 'benchmark tests', as if they were no more than machines on an engineering shop floor. The vendors of GIS protected their interests by not publishing the algorithms, data structures, and procedures, so that each system remained largely cut off from every other one (Burrough 1996b, Burrough and Masser 1997).
It is no surprise therefore that two topics of current interest to members of the geographical information community are Open GIS and Interoperability (Schell 1995a, 1995b). They are closely interrelated and refer to the need to create systems which support the efficient description, storage, access, and transfer of geographical data throughout organizations, countries, and even globally. These developments will allow more data to be accessible to a wider audience, promoting more use of geographical information in problem solving, and will lead to considerable efficiencies in the data capture process.
The development of ideas of Open GIS, supported in many countries through government action (e.g. the US Federal Data Committee and the National Spatial Data Initiative—NSDI), international collaboration through organizations such as EUROGI (European Umbrella Organization for Geographic Information), and commercial ventures (the Open GIS Consortium), will allow the GI user community to acquire data that may be efficiently exchanged and shared between different software systems. The OGC is concerned with developing technology to exchange data efficiently and to provide links to the larger IT business, so overcoming the current problems of proprietary system databases. Data or GIS purchased through an organization supporting these standards and technology will be readily transferable.
The developments in interoperability aim to remove current constraints on using specific hardware and software platforms for particular data or tasks. For example, by setting standards for the database parts of GIS, software from various vendors can be used to query data resident on many different computers. Many commercial systems can now input, or at least display, geographical data in a wide range of formats. The problems of data exchange then pass from the technical domain—can I read your tapes, or can you input the data I am sending over the net?—to the semantic domain. This concerns not the data per se, but how they were perceived, recorded, and modelled. Being able to read in data is no guarantee that they make sense.
The Internet and the World Wide Web will continue to influence GIS use and system developments, although their dynamic, uncontrolled nature makes trends somewhat difficult to predict. There is no doubt that both will continue to grow as important sources of information about data and software availability, and for their dissemination. The networks are also likely to become an important GIS 'operating environment' in their own right, with the development of more dynamic interactive tools. Developments in programming languages such as JAVA (of Sun Microsystems) will be useful in this context as they allow applications (known as Java applets) to be written which are automatically activated by the main web browsing software packages. These applets are easily downloaded from the web server to the local client computer, so ensuring faster interaction with the Internet. One of the real benefits of the JAVA language is that the program code is directly portable to various computer platforms running under UNIX, Macintosh, Windows 95, Windows NT, and so on. This will relieve web developers from the need to write different application programs for the various platforms.
Current GIS are most successful at handling static, easily identifiable units in geographical space, and data structuring and handling of more complex data, particularly with a temporal component, continues to challenge GIS developers. In the last few years, database research has been dominated by developments in the object-oriented approach and whilst this approach has benefited the structuring of entity data (especially of easily defined objects, such as property parcels and utility infrastructure, and offers possibilities for spatio-temporal data handling), there are difficulties in applying it to continuous field data where permanent relations may not exist.
More flexibility and efficiency may be possible through systems that allow multi-scale representation using a nested hierarchical approach in data storage and analysis. These support varying degrees of generalization and abstraction in the data and are likely to gain in popularity as they reflect better our understanding of spatial scales of geographical data. The developments in logic-based databases also offer potential in representing data in a more intuitive manner but this is still an active research area (Worboys 1995).
Three-dimensional GIS are becoming increasingly available (cf. Plates 3.3, 3.4) but the problem of representing the temporal nature of geographical information remains a major hurdle to be overcome. Many current GIS database structures can only represent static 2D information and can only represent temporal changes by either a series of different datasets of an area, or by multi-temporal attributes. This leads to a number of problems including redundancy in the database and clumsiness in integrating the different datasets in any analysis (Langran 1992). It becomes obvious from current developments and reported research that there is still considerable need
for theoretical and technical developments in the modelling of geographical phenomena (Burrough and Frank 1996).
Many human-computer interfaces for GIS now include some means of writing high-level programs (macros) for repetitive tasks. With effort they can sometimes be used for space-time modelling, but are often difficult to use. Recent developments in raster GIS (Mitasova et al. 1996, van Deursen and Burrough 1998) are providing powerful, easy-to-use dynamic modelling tools that are themselves a high-level programming language (cf. Wesseling et al. 1996).
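A dynamic model of the map-algebra kind these languages support can be sketched as whole-grid operations iterated over time steps. Everything in the sketch below (the ponding model, its names, and its numbers) is invented for illustration and taken from none of the systems cited:

```python
import numpy as np

def run_model(rain, infil_cap, n_steps):
    """Minimal dynamic raster model: every statement operates on
    whole grids, and the model iterates over time steps.

    rain: rainfall per time step (mm) per cell;
    infil_cap: infiltration capacity per time step (mm) per cell.
    """
    ponded = np.zeros_like(rain, dtype=float)
    for _ in range(n_steps):
        infiltration = np.minimum(rain, infil_cap)   # map-algebra minimum
        ponded += rain - infiltration                # excess accumulates
    return ponded

rain = np.array([[5.0, 1.0], [3.0, 0.5]])    # mm per time step
cap = np.full((2, 2), 2.0)                   # infiltration capacity
depth = run_model(rain, cap, n_steps=3)      # ponded water after 3 steps
```

Real dynamic modelling languages add neighbourhood operations such as drainage routing to the same scheme, which is what makes them a high-level programming language in their own right.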
Data for GIS have been a perennial problem limiting GIS adoption and use, although one of the major trends of the last five years has been the growth in their provision. As the geographical information industry has developed, users realized that cost savings could be made if they could trade spatial data: if another organization has created a digital version of the data you need, why spend time and money redoing the work? In practice this data transfer sometimes cost as much as or more than the original digitizing because of the lack of data exchange formats available to support it, as discussed in Chapter 4. Until recently, the approach of many governments and other organizations to implementing standards for data transfer and quality has been in most cases rather ad hoc, and this lack of firm policy decisions has led to a poorly controlled, fragmented market in geographical data. Users of data are not always sure of the accuracy or precision of the information, and given the burgeoning provision of data from so many sources across the globe on the Internet this has many serious repercussions. Data from different sources may be combined technically very easily, but in terms of quality, collection standards, georeferencing, etc. they may be incompatible, leading to spurious analysis results.
Various new initiatives are beginning to address these problems. The developments of Open GIS and their implications for data exchange have already been discussed. In parallel there have been various spatial data initiatives, at the national, continental, and global scale, which attempt to ease data availability and integration problems by setting up agreements, guidelines, rules, and policies for the orderly exchange and distribution of geographical data. These initiatives involve establishing large, usually virtual, spatial databases, and countries such as the USA and Australia have already made progress with national spatial data infrastructures. There have also been a number of continental and international initiatives (e.g. CORINE 1992, EUROSTAT 1996, UNEP/FAO 1994, Digital Chart of the World 1991) which attempt to collate data from many sources and bring them to common levels of geometry and classification (RIVM 1994). Developments in standards for metadata (data about data) that include information about the source, method of recording, resolution, date of collection, data model, and data type are making it easier for users to judge if the data offered are suitable and reliable for their purpose.
These developments are very important if some of the organizational and environmental problems being faced by countries today are to be tackled. Many situations result from actions and accidents in one country which affect the lives and natural resources of persons living in others. The consequences of natural, industrial, economic, and political actions are not restricted by administrative boundaries, whether local or international. A particularly poignant demonstration of the international dimension of environmental hazards occurred in Europe in 1986, when the cloud of radioactive waste from the Chernobyl accident roamed the northern hemisphere, ignoring national boundaries and government mandates, to deposit its load on areas far from the source (Karaoglou et al. 1996). This, and problems such as the uncontrolled movement of refugees in zones of human conflict, need to be addressed through qualitative and quantitative analysis using integrated spatial datasets covering the whole of an affected area rather than just that of one particular country or province.
The difficulties and problems of assembling a uniform geographic database over several administrative areas such as provinces or countries are legion (cf. Langas 1997). The georeferencing data may refer to different coordinate systems, base levels, or map
295
Current Issues and Trends in GIS
projections. Attribute data may have been collected using different sampling methods or sizes of sample. Laboratory methods, if used to analyse soil, water, air, or biological materials, may be different in different countries or may be subject to varying degrees of error due to differences in local expertise. Some data, in particular social data or land cover data, may be linked to administrative units that are defined differently and have different spatial resolution in different countries (Burrough 1996b, Burrough and Masser 1998).

In addition there are problems such as data ownership, legal liability, and standards, not to mention the myriad political and institutional issues that need to be resolved. Initiatives to standardize data collection and referencing, such as the basic geodetic frameworks for determining geographic location, or elevation data, or thematic data on the location of natural objects such as rivers, coasts, and lakes, and anthropogenic features such as roads, railways, towns, and cities, often meet with resistance in defending home territories.

These various developments are important in terms of saving money and improving efficiency in an industry in which the costs of collecting, storing, and exchanging data far exceed those of data exploitation (Burrough and Masser 1998). As a consequence, markets for geographical information are opened up, bringing sometimes complex and voluminous amounts of data to a wider public and leading to an improvement in decision-making and, importantly, in our knowledge and understanding of the spatial distribution of phenomena on the earth's surface.

The semantic problems of data integration and exchange remain more intractable than the technical ones. The problems begin with culture, language, and perception and extend into the realms of spatial and temporal analysis. Better, more powerful theoretical concepts are needed than those embedded in the current commercial GIS: this is the domain of GI Science (Burrough 1996b, Burrough and Masser 1998).
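As an illustration of the metadata idea discussed in this chapter, the sketch below shows a minimal, hypothetical metadata record covering the elements mentioned (source, method of recording, resolution, date of collection, data model, and data type), together with a crude suitability check. The field names and the helper function are invented for illustration and are not drawn from any particular metadata standard.

```python
# A hypothetical minimal metadata record for a spatial dataset.
# Field names are illustrative only, not from any metadata standard.
dataset_metadata = {
    "source": "National Mapping Agency",
    "method_of_recording": "digitized from 1:50 000 topographic sheets",
    "resolution": "25 m grid cell",
    "date_of_collection": "1994-06",
    "data_model": "raster",
    "data_type": "categorical (land cover classes)",
}

def is_suitable(meta, required_model, keywords):
    """Crude screening: does the offered dataset use the data model
    we need, and does its type description mention what we are mapping?"""
    return (meta["data_model"] == required_model
            and any(k in meta["data_type"] for k in keywords))

print(is_suitable(dataset_metadata, "raster", ["land cover"]))  # True
```

A real metadata catalogue would of course carry far more (lineage, accuracy statements, georeferencing), but even this small record lets a user make the suitability judgement the text describes.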
GIS users have always ranged across a number of disciplines, reflecting the wide application of geographical data. The systems have been used in studies of the spatial distribution of animals, minerals, and vegetables as well as industrial location, voting patterns, and ancient and modern human habitation. Until recently, most GIS were limited in their scope in terms of both the spatial area and the study field covered. The systems tended to be used by a number of specialists who were familiar with the software and were responsible for the technology and the database. The results generated by GIS analysis, usually transferred in paper form, were used by a much wider group of people in their work.

The trends in GIS technology towards a series of specialist subproducts, in tandem with the increased provision of digital data, have brought about a change in users. In some areas, for example spatial modelling, GIS use continues to be a specialist (some might say elitist) activity. However, the development of more specialized modules offering querying and mapping capabilities supports new sets of users who do not necessarily have GIS technical know-how. The easy-to-use interfaces allow users, who have a good understanding of the data, to interact with the system directly, much as they would a spreadsheet. This means that they do not have to communicate their analytical needs to a second person. Intranet developments will also offer organizations an efficient way of providing geographical data and system capabilities to a wide range of people in a user-friendly way. For many users of the current Internet/Intranet data and technology, GIS are essentially digital atlases or gazetteers. These users tend not to collect data and many do not make maps. They are essentially a new type of user: the 'spatial browser'.

One of the marked changes in the user community has been the increasing number of GIS professionals. They offer consultancy and advice to users on purchasing, maintaining, and using systems and are proficient in GIS as a generic idea, and not just as a specific system type. These professionals work for companies ranging from one-man bands to divisions within major international consultancy organizations. Some GIS professionals are specialized within a particular field such as hydrology
or market research and are proficient at using systems to solve analytical problems in these areas. Various GIS professional organizations have been established in a number of countries and provide a forum for user and system vendor interaction.

The GIS user community is now more worldwide than ever and Internet developments will ensure that trend continues. Their applications are in a wide range of fields and projects, and most interestingly across a variety of cultures. The employment of these systems has met with varying degrees of success, with data availability and, importantly, institutional and societal issues usually being more influential than the technology itself. There have been a number of studies on the impact of GIS on organizations (e.g. Huxhold and Levinsohn 1995) and more recently the larger influences of this type of technology on the thinking of communities have been studied (Bijker et al. 1987).

The concepts embodied in GIS reflect the thinking of certain classes of technically oriented people in relatively few parts of the world. The power of the ideas is evident from the speed at which these ideas are assimilated (e.g. the first edition of this book has been translated into Japanese, Chinese, and Thai, and there is an ARC-INFO in every country). Instead of enhancing cultural richness, however, bringing in a system that has been developed largely within the cultural boundaries of an alien mindset often means that the recipients treat the imported system as the only reality and apply it indiscriminately (Burrough 1996b). Technology, as it embraces the concepts of progression and development, is often seen as something that must always be followed slavishly.

GIS is an important tool for use in solving problems which require the manipulation and analysis of geographical data. Many of today's ambitions of improving economic and social well-being and the environment go hand in hand, and require a spatial awareness and understanding of processes and form. The Earth Summit at Rio recognized this, and many of the actions approved by governments include references to spatial data and geographical information (Thacher 1996) in areas such as soil erosion, marine pollution, biological diversity, and breeding stocks, drinking water supplies, climate change, poverty, urbanization, or demographic change. GIS will continue to provide answers, but their successful use will depend on the questions posed, the data available, the institutional and political support necessary to bring about any changes, and of course, on the skills and insights of those who use them.
Appendix 1. Glossary of Commonly Used GIS Terms
Absolute georeference, the referencing in space of the location of a point using a predefined coordinate system such as latitude and longitude or a national grid.
Abstraction, the division of real world phenomena into individual, distinct items.
Acceptance test, a test for evaluating a newly purchased system's performance and conformity to specifications.
Access time, a measure of the time interval between the instant that data are called from storage and the instant that delivery is complete.
Accuracy, conformance to a recognizable standard; can often mean the number of bits in a computer word available in a given system. The statistical meaning of accuracy is the degree with which an estimated mean differs from the true mean.
Addressability, the number of positions (pixels) in the X and Y axes on a VDU or graphics screen.
Addressable point, a position on a visual display unit (VDU) that can be specified by absolute coordinates.
Algorithm, a set of rules for solving a problem. An algorithm must be specified before the rules can be written in a computer language.
Aliasing. 1. The occurrence of jagged lines on a raster-scan display image when the detail exceeds the resolution of the screen. 2. In Fourier analysis the effect of wavelengths shorter than those sampled by the observation points on the form of the estimated power spectrum.
Altitude matrix, a grid of elevation values.
Alphanumeric code, machine processable letters, numbers, and special characters. Hence alphanumeric screen, alphanumeric keyboard for displaying and entering alphanumeric characters.
American National Standards Institute (ANSI), an association formed by the American Government and Industry to produce and disseminate widely used industrial standards.
American Standard Code for Information Interchange (ASCII), a widely used industry standard code for exchanging alphanumeric codes in terms of bit-signatures.
Analogue. 1. Representation of information by a continuously varying signal (contrast with discretized signals such as digital data). 2. A representation of a physical variable or phenomenon by another variable which displays proportional relationships over a specified range; e.g. using a map to describe an area on the earth's surface.
Anisotropic, an adjective to describe a spatial phenomenon having different physical properties or actions in different directions.
Application, a task addressed by a computer system.
Application program or package, a set of computer programs designed for a specific task.
Arc, a complex line connecting a sequence of coordinate points. Also known as a chain or string.
Archival storage, magnetic and optical media (tapes, removable disks) used to store programs and data outside the normal addressable memory units of the computer.
Area, a fundamental unit of geographic information (a geographic primitive) which is a measure of a particular extent of the earth's surface (see point, line, and polygon).
Array, a series of addressable data elements in the form of a grid or matrix.
Array processor, special hardware for high-speed processing of data encoded on a matrix.
Assembler, a computer program that converts programmer-written instructions into computer-executable (binary) instructions.
Assembly language, a low-level (primitive) programming language that uses mnemonics rather than English-like statements.
Attribute, non-graphic information associated with a point, line, or area element in a GIS.
Autocorrelation, autocovariance, statistical concepts expressing the degree to which the value of an attribute at spatially adjacent points varies with the distance or time separating the observations.
Automated cartography, the process of drawing maps with the aid of computer driven display devices such as plotters and graphics screens. The term does not imply any information processing.
Background processing/mode. Tasks such as printing are given a lower priority by the computer than those requiring direct user interaction.
Backup, making a copy of a file or a whole disk for safe keeping in case the original is lost or damaged.
BASIC (Beginner's All-purpose Symbolic Instruction Code), a simple, high-level computer programming language for inexperienced computer users.
Baud rate, a measure of the speed of data transmission between a computer and other devices, equivalent to bits per second.
Benchmark test, a test to evaluate the capabilities of a computer system in terms of the customer's requirements.
Binary arithmetic, the mathematics of calculating in powers of 2.
Binary coded decimal, the expression of each digit of a decimal number in terms of a set of bits.
Binary trees, a data compression and indexing technique which subdivides the spatial or attribute data into a series of levels.
Bit, the smallest unit of information that can be stored and processed in a computer. A bit may have two values, 0 or 1; i.e. yes/no, true/false, on/off.
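The binary coded decimal entry above can be made concrete with a short sketch (illustrative only; the helper name is invented):

```python
def to_bcd(n):
    """Encode each decimal digit of n as its own 4-bit group
    (binary coded decimal), returned as space-separated bit groups."""
    return " ".join(format(int(d), "04b") for d in str(n))

print(to_bcd(493))  # 0100 1001 0011
```

Each digit (4, 9, 3) is represented independently in four bits, in contrast with pure binary, where 493 would be a single eleven-bit number.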
Bit map, a pattern of bits (i.e. on/off) on a grid stored in memory and used to generate an image on a raster scan display.
Bit plane, a gridded memory in a graphics device used for storing information for display.
Bits per inch (BPI), the density of bits recorded on a magnetic tape; 800, 1600, and 6250 are common standards for old, large tapes.
Block codes, a compact method of storing data in raster databases which simplifies the region into a series of two-dimensional blocks of various dimensions, e.g. 2x2, 3x3, etc. The record in the database records the origin, the bottom left, and the side of each square.
Block kriging, the prediction of attribute values for square blocks of land using the kriging methods of geostatistical interpolation.
Boolean operators, operators based on logic which are used to query two or more sets of data. The operators allow inclusion, exclusion, intersection, and differences in the data to be determined.
Break points, points on the graph of a continuous variable (e.g. splines) where there is an abrupt change in direction.
Buffering, the creation of a zone of specified width around a point, line, or area. The buffer is a new polygon which is used in queries to determine which entities occur within or outside the defined area.
Bug, an error in a computer program or in a piece of electronics that causes it to function improperly.
Byte, a group of contiguous bits, usually 8, that represent a basic unit of information which is operated on as a unit. The number of bytes is used to measure the capacity of memory and storage units, e.g. 256 kbytes, 300 Mbytes.
C++, a high-level programming language often used to write graphics programs.
Cadastral map, a map showing the precise boundaries and size of land parcels.
Cartography, the art and science of drawing charts and maps.
Cartridge tape, a type of magnetic memory tape enclosed in a plastic cartridge.
Cathode ray tube, the electronic device used in an electronic screen which controls the display of information or graphics.
CD-ROM, a compact disk, used as a Read Only Memory device.
Cell, the basic element of spatial information in the raster/grid description of spatial entities (see pixel).
Central processing unit (CPU), the part of the computer that controls the whole system.
Chain, see arc.
Chain codes, a compact method of storing data in raster databases which describes the boundary of a region as a sequential series of north, south, east, or west directional vectors on a grid, and the number of cells in each direction.
Character, an alphabetical, numerical, or special graphic symbol that is treated as a single unit of data.
Chorochromatic map, a map in which the area is divided into a series of zones which are each displayed using a single colour or shading.
Choropleth map, a map consisting of a series of single-valued, uniform areas separated by abrupt boundaries.
Classification, the process of assigning items to a group or set according to their attributes.
Clump, the fusion of neighbouring cells which are of the same class into larger units.
Code, a set of specific symbols and rules for representing data and programs so that they can be understood by the computer. See ASCII, FORTRAN, PASCAL, etc.
Co-kriging, estimation of a regionalized variable using observations of that variable supplemented by observations of one or more additional variables from within the same geographical area, thereby reducing the estimation variance if the original variable has been undersampled.
Colour display, a computer screen or VDU capable of displaying maps and results in colour.
Command, an instruction sent from the keyboard or other control device to execute a computer program.
Command language, an English-like language for sending commands for complicated program sequences to the computer.
Command Language Interpreter (CLI), a computer program for converting English-language-like commands into instructions for the computer.
Compiler, a computer program that translates a high-level programming language, such as FORTRAN, C++, or PASCAL, into machine-readable code.
Composite map, a single map created by joining together several maps that have been digitized separately.
Computer assisted/aided cartography (CAC), the use of computer hardware and specific software for making maps and charts.
Computer graphics, a general term embracing any computing activity that results in graphic images.
Computer word, a set of bits (typically 16 or 32) that occupies a single storage location and is treated by the computer as a unit of information.
Computing environment, the total range of hardware and software facilities provided by a computer and its operating system.
Conceptual model, the abstraction, representation, and ordering of phenomena using the mind.
Conditional simulation, the simulation of a single random function that honours the data values at the sampling points.
Configuration, a particular combination of computer hardware and software for a certain class of application tasks.
Confusion index, a measure of the relative dominance of the membership values assigned to an individual for two or more fuzzy classes.
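The chain codes entry above describes a boundary as a run of directional vectors with cell counts; a minimal sketch of the idea follows (the function and direction names are invented for illustration):

```python
# Illustrative sketch of a chain code: a region boundary stored as
# (direction, cell count) pairs, where directions are unit grid steps.
MOVES = {"E": (1, 0), "N": (0, 1), "W": (-1, 0), "S": (0, -1)}

def trace(start, chain):
    """Follow a chain code from a starting cell corner, returning
    every boundary vertex visited (including the start)."""
    x, y = start
    path = [(x, y)]
    for direction, count in chain:
        dx, dy = MOVES[direction]
        for _ in range(count):
            x, y = x + dx, y + dy
            path.append((x, y))
    return path

# A 2 x 2 square traced anticlockwise from its lower-left corner;
# a closed boundary returns to the starting point.
square = [("E", 2), ("N", 2), ("W", 2), ("S", 2)]
print(trace((0, 0), square)[-1])  # (0, 0)
```

The compactness comes from storing only four direction symbols and run lengths instead of the full coordinate list.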
Connectivity, the linking of different spatial, mostly linear, units into complex chains.
Console, a device that allows the operator to communicate with the computer.
Contiguous, adjacent spatial units touching to form an unbroken chain or surface.
Contour, a line connecting points of equal elevation.
Convolution, the conversion of values from one grid to another which is different in terms of size or orientation.
Cross-hatching, the technique of shading areas on a map with a given pattern of lines or symbols.
Crossover point, the point at which there is a 50 per cent possibility that something belongs to a particular fuzzy class.
Cross validation, a validation method in which observations are dropped one at a time from a sample size n, and n estimates are computed from the remaining (n-1) observations. The statistics of the estimates are used to evaluate the goodness of fit. In geostatistics this is done using the kriging equations to check the variogram with respect to the sample data.
Cursor, a visible symbol guided by the keyboard, a joystick, a tracking ball, or a digitizer, usually in the form of a cross or a blinking symbol, that indicates a position on a computer screen or VDU.
Data analysis models, a series of commands that when combined perform a particular kind of data analysis.
Data link, the communication lines and related hardware and software systems needed to send data between two or more computers over telephone lines, optical fibres, satellite networks, or cables.
Data model, the abstraction and representation of real world phenomena according to a formalized, conceptual schema, which is usually implemented using the geographical primitives of points, lines, and polygons, or discretized continuous fields.
Data record, see tuple.
Data structure, the organization of data in ways suitable for computer storage and manipulation.
Data types, the classification of different data according to their characteristics. For example Boolean (0/1), nominal, ordinal, integer, scalar (real), directional, or topological data types according to their function and degree of precision.
Database, a collection of interrelated information, usually stored on some form of mass-storage system such as magnetic tape or disk. A GIS database includes data about the position and the attributes of geographical features that have been coded as points, lines, areas, pixels, or grid cells.
Database Management System (DBMS), a set of computer programs for organizing the information in a database. Typically, a DBMS contains routines for data input, verification, storage, retrieval, and combination.
Debug, to remove errors from a program or from hardware.
Debugger, a program that helps a programmer to remove programming errors.
Delaunay triangulation, the graph obtained by joining pairs of points whose polyhedra are Thiessen (Voronoi/Dirichlet) divisions of the plane.
DEM, see Digital Elevation Model.
Device, a piece of equipment external to the computer designed for a specific function such as data input, data storage, or data output.
Differentiable continuous surface, the representation of a continuously varying phenomenon using scalar or integer data so that the rate of change across and within the area may be derived.
Digital, the representation of data in discrete, quantized units or digits.
Digital Elevation Model (DEM), a quantitative model of a part of the earth's surface in digital form. Also digital terrain model (DTM).
Digitize, (noun) a pair of XY coordinates; (verb) to encode map coordinates in digital form.
Digitizer, a device for entering the spatial coordinates of mapped features from a map or document to the computer. A pointer device, a cursor, puck, or mouse is used to locate key points.
Dirichlet tessellation, see Thiessen polygons.
Discretization, the process of dividing an area into a series of self-contained units.
Disjunctive kriging, a non-linear distribution-dependent estimator for regionalized variables that do not have simple (Gaussian) distributions. It is the most demanding kriging method in terms of computer resources, mathematical understanding, and stationarity conditions.
Disk, a storage medium consisting of a spinning disk coated with magnetic material for recording digital information.
Diskette, see floppy disk.
Distributed processing, the placement of hardware processors where needed, instead of concentrating all computing power in a large central CPU.
Dot-matrix plotter, a plotter of which the printing head consists of many, closely spaced (100-200 per inch) wire points that can write dots on the paper to make a map. Also known as an electrostatic plotter or matrix plotter.
Double precision, typically refers to the use in 32-bit word computers of a double word of 64 bits to represent real numbers to a precision of approximately 16 significant digits.
DPI (dots per inch), a measurement of the density of dots used to print or scan an area, with larger values representing more detail and a finer resolution.
Drift, a trend in the data.
Drum scanner, a scanning device for converting maps to digital form, in which the Y-axis movements are governed by the rotation of the drum.
Edit, to remove errors from or to modify a computer file of a program, a digitized map, or a file containing attribute data.
Element, a fundamental geographical unit of information, such as a point, line, area, or pixel. May also be known as an 'entity'.
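The leave-one-out procedure in the cross validation entry can be sketched in outline; here a simple mean predictor stands in for a real interpolator such as kriging, purely for illustration (the function names are invented):

```python
def leave_one_out_errors(values, predictor):
    """Drop each observation in turn, predict it from the remaining
    n-1 observations, and collect the prediction errors."""
    errors = []
    for i, observed in enumerate(values):
        rest = values[:i] + values[i + 1:]
        errors.append(predictor(rest) - observed)
    return errors

def mean_predictor(rest):
    # Stand-in for a real spatial interpolator such as kriging.
    return sum(rest) / len(rest)

print(leave_one_out_errors([2.0, 4.0, 6.0], mean_predictor))
# [3.0, 0.0, -3.0]
```

Summary statistics of these errors (mean error, mean squared error) are what a geostatistician would inspect to judge how well a fitted variogram model describes the sample data.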
Ellipsoid, a mathematical model for the shape of the earth, taking account of flattening at the poles.
Entities, distinct units of a real world phenomenon.
Exact interpolator, an interpolation method that predicts a value of an attribute at a sample point that is identical to the observed value.
Experimental variogram, an estimate of a (semi-)variogram based on sampling.
Extrapolation, the estimation of the values of an attribute at unsampled points outside an area covered by existing measurements.
Feature planes, a series of separate different classes of phenomena which are often used as the basis of the different overlays.
Field. 1. A type or class of data. 2. Within a database, a set of records containing information (cf. tuple).
File, a collection of related information in a computer that can be accessed by a unique name. Files may be stored on tapes or disks.
Filter, in raster graphics, a mathematically defined operation for removing long-range (high-pass) or short-range (low-pass) variation. Used for removing unwanted components from a signal or spatial pattern.
Finite difference modelling, a numerical modelling technique used with data held in regular grid form in which algebraic equations are used to solve changes in a variable at each location.
Finite element modelling, a numerical modelling technique used with data held in irregular grid (usually triangular) form in which algebraic equations are used to solve changes in a variable at each location.
Flatbed plotter, a device for drawing maps whereby the information is drawn by the plotting head being moved in both the X and Y directions over a flat, fixed surface. Draws with a pen, light beam, or scribing device.
Floating point, a technique for representing numbers without using a fixed-position decimal point in order to improve the calculating capability of the CPU for arithmetic with real numbers.
Floating point board, a printed circuit board placed in the CPU in order to speed up arithmetic operations for real numbers. (The alternative is to use special software, which is usually much slower.)
Floppy disk, a cheap, low-capacity storage medium, usually measuring 3.5 inches in diameter, capable of storing up to 2.0 Mbytes of data. Also known as a diskette or floppy.
Font, symbolism used for drawing a line, or the name of a typeface used for displaying text.
Format, the way in which data are systematically arranged for transmission between computers, or between a computer and a device. Standard format systems are used for many purposes.
FORTRAN (FORmula TRANslation), a high-level programming language, much used in computer graphics. Recent improvements, embodied in FORTRAN 77, have made structured programming and interactive data input much easier.
Fourier analysis, a method of dissociating time series or spatial data into sets of sine and cosine waves.
Fractal, an object having a fractional dimension; one which has variation that is self-similar at all scales, in which the final level of detail is never reached and never can be reached by increasing the scale at which observations are made.
Fuzzy set, a set of objects in which the membership of the set is expressed in terms of a continuous membership function having values between 0 and 1. Unlike Boolean sets, fuzzy sets can overlap and an individual can be a member of the overlapping sets to different degrees.
Gap, the distance between two graphic entities (usually lines) on a digitized map. Gaps may arise through errors made while digitizing or scanning the lines on a map.
Generalization, the process of reducing detail on a map as a consequence of reducing the map scale. The process can be semi-automated for certain kinds of data, such as topographical features, but requires more insight for thematic maps.
Geocoding, the activity of defining the position of geographical objects relative to a standard reference grid.
Geodetical surveying, the determination of the position of points on the earth's surface accounting for its curvature, rotation, and gravitational field.
Geographical data, data that record the location and a value characterizing the phenomenon.
Geographical data model, a formalized schema for representing data which have both location and characteristic.
Geographical information system, a set of computer tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world for a particular set of purposes.
Geographical primitives, the smallest units of spatial information: in vector form these are points, lines, and areas (polygons); in raster form they are pixels (2D) and voxels (3D).
GPS (Global Positioning System), a set of satellites orbiting the earth used to help determine geographic location anywhere on the earth by means of portable electronic receivers.
Graphics tablet, a small digitizer (usually measuring 30 x 30 cm) used for interactive work.
Greyscales, levels of brightness (or darkness) for displaying information on monochrome display devices.
Grid. 1. A set of regularly spaced sample points. 2. A tessellation by squares. 3. In cartography, an exact set of reference lines over the earth's surface. 4. In utility mapping, the distribution network of the utility resources, e.g. electricity or telephone lines.
Grid map, a map in which the information is carried in the form of regular squares. Also called a raster.
Hardcopy, a copy of a graphics or map image on paper or other stable material.
Hard data, data obtained by direct measurement.
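The fuzzy set and crossover point entries can be illustrated with a simple linear membership function; the class ('steep slope') and its limits are assumed values for illustration, not drawn from the text:

```python
def steep_membership(slope_deg, lower=5.0, upper=15.0):
    """A linear fuzzy membership function for the class 'steep slope':
    0 at or below `lower` degrees, 1 at or above `upper`, and graded
    linearly in between. The class limits are illustrative assumptions."""
    if slope_deg <= lower:
        return 0.0
    if slope_deg >= upper:
        return 1.0
    return (slope_deg - lower) / (upper - lower)

# The crossover point (membership 0.5) lies midway between the limits:
print(steep_membership(10.0))  # 0.5
```

Unlike a Boolean classification, which would force a 10-degree slope wholly into 'steep' or 'not steep', the graded membership value records partial belonging, and overlapping memberships in two classes are what the confusion index measures.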
Hardware, the physical components of a GIS—the computer, plotters, printers, VDUs, and so on.

Hexadecimal system, the representation of numbers and letters using base 16 alphanumeric values.

Hidden line removal, a technique in 3D perspective graphics for suppressing the appearance of lines that ordinarily would be obscured from view.

Hierarchical database structure, a method of arranging computer files or other information so that the units of data storage are connected by a hierarchically defined pathway. From above to below, relations are one-to-many.

High-level language, a computer programming language using command statements, symbols, and words that resemble English-language statements. Examples are FORTRAN, PASCAL, C++, PL/1, COBOL, BASIC.

Histogram, a diagram showing the number of samples that fall in each contiguously defined size class of the attribute studied.

Hole effect, a condition in which the variogram does not increase monotonically beyond the range. The cause may be real or pseudo periodicities in the sample data.

Host computer, the primary or controlling computer in a data network.

Hypsometry, the measurement of the elevation of the earth's surface with respect to sea level.

Indexed files, files of data records in which pointers, based on a particular ordering such as alphabetic, are used to quicken accessing and searching instructions.

Indicator kriging, a kriging interpolation method which is non-linear and in which the original data are transformed from a continuous to a binary scale.

Inexact interpolator, interpolation methods that provide estimates at data locations that are not necessarily the same as the original measurements.

Input, (noun) the data entered to a computer system; (verb) the process of entering data.

Input device, a hardware component for data entry; see digitizer, keyboard, scanner, tape drive.

Integer, a number without a decimal component; a means of handling such numbers in the computer which requires less space and proceeds more quickly than with numbers having information after the decimal point (real numbers).

Interactive, a GIS system in which the operator can initiate or modify program execution via an input device and can receive information from the computer about the progress of the job.

Interface, a hardware and software link that allows two computer systems or a computer and its peripherals to be connected together for data communication.

Interpolation, the estimation of the values of an attribute at unsampled points from measurements made at surrounding sites.

Intersection. 1. Geometric: the crossing of lines or polygons to form new units. 2. Logic: the combination of data from two Boolean sets using the AND operator.

Intrinsic hypothesis, a form of spatial stationarity less restrictive than second order stationarity, in which the stationarity requirements are confined to the first differences and not the underlying regionalized variable. The intrinsic hypothesis is useful for modelling regionalized variables in which the form of the variogram is a function of domain size.

Isoline, a line which joins points of equal value.

Isopleth map, a map displaying the distribution of an attribute in terms of lines connecting points of equal value; see contour, contrast with choropleth map.

Isotropic, an adjective to describe something that has the same physical properties or actions in all directions.

Join. 1. (verb) to connect two or more separately digitized maps; 2. (noun) the junction between two such maps, sometimes visible as a result of imperfections in the data.

Justification (right, left, or centre), the relative position of a text string or symbol on the map to the location at which it has been digitized.

Key file, in some CAD/CAM systems, a file containing the codes defining the operation of certain keyboard functions, or menu commands. In DBMS, a file containing information about search paths or indexes used to access data.

Keyboard, the device used for typing information into the computer.

Kriging, name (after D. G. Krige) for a suite of interpolation techniques that use regionalized variable theory to incorporate information about the stochastic aspects of spatial variation when estimating interpolation weights.

Kriging variance, a measure of the uncertainty of estimation for values predicted by kriging.

LANDSAT, a series of earth resource scanning satellites launched by the United States of America.

Laser printer, a printer in which the information is written onto a light-sensitive drum or material using a laser.

Layer, a logical separation of mapped information according to theme. Many geographic information systems and CAD/CAM systems allow the user to choose and work on a single layer or any combination of layers at a time.

Legend, the part of a map explaining the meaning of the symbols used to code the depicted geographical elements.

Library, a collection of standard, often used computer subroutines, or symbols in digital form.

Light pen, a hand-held photosensitive interactive device for identifying elements displayed on a refreshed computer screen.

Line, one of the basic geographical primitives, defined by at least two pairs of XY coordinates.

Line printer, a printer that prints a line of characters at a time.

Linear interpolator, describes a method whereby the weights assigned to different data points are computed using a linear function of distance between sets of data points and the point to be predicted.

Local drain direction (ldd), the direction of steepest downhill slope as determined from a gridded DEM.
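The kriging and hole-effect entries above both rest on the semivariance of paired samples. As a minimal sketch (code and sample values are mine, not the book's), an experimental semivariance at lag h for a regularly spaced transect is half the mean squared difference between samples h steps apart:

```python
# Illustrative sketch, not from the book: experimental semivariance for a
# regularly spaced 1-D transect, gamma(h) = sum((z_i - z_{i+h})^2) / (2 N(h)).

def semivariance(z, lag):
    """Return the experimental semivariance of samples z at an integer lag."""
    pairs = [(z[i] - z[i + lag]) ** 2 for i in range(len(z) - lag)]
    return sum(pairs) / (2 * len(pairs))

values = [2.0, 2.5, 3.0, 4.0, 3.5, 3.0]          # hypothetical transect
variogram = {h: semivariance(values, h) for h in range(1, 4)}
```

Plotting the resulting semivariances against lag h gives the experimental variogram to which the models listed under "Semivariogram model" are fitted.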
Lookup table, an array of data values that can be quickly accessed by a computer program to convert data from one form to another, e.g. from attribute values to colours.

Machine language, instructions coded so that the computer can recognize and execute them.

Macro, a text file containing a series of frequently used operations that can be executed by a single command. Can also refer to a simple high-level programming language with which the user can manipulate the commands in a GIS.

Magnetic media, tape or disks coated with a magnetic surface used for storing electronic data.

Mainframe, a large computer supporting many users.

Map. 1. A hand-drawn or printed document describing the spatial distribution of geographical features in terms of a recognizable and agreed symbolism. 2. A collection of digital information about a part of the earth's surface.

MAP (Map Analysis Package), a computer program written by C. D. Tomlin for analysing spatial data coded in the form of grid cells.

Mapping unit, a set of areas drawn on a map to represent a well-defined feature or set of features. Mapping units are described by the map legend.

Map projection, the basic system of coordinates used to describe the spatial distribution of elements in a GIS.

Mass storage system, auxiliary, large-capacity memory for storing large amounts of data. Usually optical or magnetic disk or tape.

Maximum likelihood, a method embodying probability theory for fitting a mathematical model to a set of data.

Mean, the average, or most likely value.

Menu, a list of available options displayed on the computer screen that the user can choose from by using the keyboard or a device such as a mouse.

Metadata, information about the provenance, resolution, availability, age, ownership, price, copyright, and other matters concerning digital spatial data that is easily available to potential users.

Minicomputer, a medium-sized, general purpose single processor computer often used to control GIS (see workstation).

Model. 1. A representation of attributes or features of the earth's surface in a digital database. 2. A set of algorithms written in computer code that describe a given physical process or natural phenomenon of the earth's surface. 3. A function fitted to an experimental variogram derived from sample data. 4. A statistical distribution or a conceptualization of spatial variation.

Modem (MOdulator-DEModulator), a device for the interconversion of digital and analogue signals to allow data transmission over telephone lines.

Module, a separate and distinct piece of hardware or software that can be connected with other modules to form a system.

Morton ordering, a technique for reducing the geographical referencing of grid data to one dimension by following a set 'Z' shape directional pattern through the cells.

Mouse, a hand-steered device for entering data or commands to a computer.

Nested sampling, the measurement of data at a series of points whose locations are hierarchically structured.

Network. 1. Two or more interconnected computer systems for the implementation of specific functions. 2. A set of interconnected lines or arcs.

Network database structure, a method of arranging data in a database so that explicit connections and relations are defined by links or pointers of a many-to-many type.

Node, the point at which arcs (lines, chains, strings in a polygon network) are joined. Nodes carry information about the topology of the polygons.

Noise, irregular variations or error, usually short range, that cannot be easily explained or associated with major mapped features or process.

Non-differentiable continuous surface, representation of a continuously varying phenomenon using binary, nominal, or ordinal data types that do not support the calculation of the rate of change across and within an area.

Non-linear kriging, see indicator kriging and disjunctive kriging.

Non-removable storage, computer data storage which cannot be moved easily and includes hard disks.

Non-transitive variogram, a variogram in which the sill is not reached within the domain of interest (see intrinsic hypothesis).

Normalization, methods that are used to reduce redundancy and improve efficiency in a database.

Nugget, in kriging and variogram modelling, that part of the variance of a regionalized variable that has no spatial component (variation due to measurement errors and short-range spatial variation at distances within the smallest inter-sample spacing).

Numerical taxonomy, quantitative methods for classifying data using computed estimates of similarity.

Object code, a computer program that has been translated into machine readable code by a compiler.

Object-oriented database structure, the organization of data within a database defined by a series of predefined objects and their properties and behavioural characteristics.

ODYSSEY, computer program developed at the Laboratory for Computer Graphics, Harvard, for overlaying polygon networks.

Operating system (O/S), the control program that coordinates all the activities of a computer system.

Optimal estimator, an estimator for minimizing the value of a given criterion function; in kriging this is the estimation variance.

Ordered sequential files, a file of data records which are in sequence according to some structuring method such as the alphabet.

Ordinary kriging, a method for interpolating data values from sample data using regionalized variable theory in which the prediction weights are derived from a fitted variogram model.
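The Morton ordering entry above can be sketched in code. The hypothetical morton_key function below (mine, not the book's) interleaves the bits of a cell's column and row numbers, so the 2-D grid reference collapses to one index that traces a 'Z' through every 2 × 2 block of cells:

```python
# Illustrative sketch, not from the book: a Morton (Z-order) key built by
# interleaving the bits of a cell's column and row numbers.

def morton_key(col, row, bits=16):
    """Interleave the bits of col and row into a single Z-order index."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)       # even bit positions: column
        key |= ((row >> i) & 1) << (2 * i + 1)   # odd bit positions: row
    return key

# The four cells of the top-left 2x2 block receive keys 0, 1, 2, 3 in a 'Z':
block = [morton_key(c, r) for r in (0, 1) for c in (0, 1)]
```

Sorting cells by such a key keeps cells that are close on the map close together in one-dimensional storage, which is why the technique is used for grid databases.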
Orthophoto, a scale-correct photomap created by geometrically correcting aerial photographs or satellite images.

Output, the results of processing data in a GIS; maps, tables, screen images, tape files.

Overlay. 1. (verb) the process of stacking digital representations of various spatial data on top of each other so that each position in the area covered can be analysed in terms of these data; 2. (noun) a data plane containing a related set of geographic data in digital form.

Package, a set of computer programs that can be used for a particular generalized class of applications.

Paint, to fill in an area with a given symbolism on a raster display device (see cross-hatching).

PASCAL, a high-level computer programming language.

Peano-Hilbert ordering, a technique for reducing the geographical referencing of grid data to one dimension by following a recursive route through the cells.

Pen plotter, a device for drawing maps and figures using a computer-steered pen.

Performance, the degree to which a device or system fulfils its specifications.

Peripheral, a hardware device that is not part of the central computer.

Photogrammetry, a series of techniques for measuring position and altitude from aerial photographs or images using a stereoscope or stereoplotter.

Photomosaic, a collection of aerial photographs which are joined to form a contiguous view of an area.

Pit, a depression on the surface of a digital elevation model that may represent a real feature or which may be an artefact of the gridding.

Pixel, contraction of picture element; smallest unit of information in a grid cell map or scanner image.

Plotter, any device for drawing maps and figures.

Polygon, a multi-sided figure representing an area on a map; a geographic primitive.

Polygon overlay and intersection, the creation of new polygons (entities) by the process of overlaying and intersecting the boundaries from two or more vector representations of area entities.

Polynomial, an expression having a finite number of terms of the form ax + bx² + … + nxᵐ.

Precision. 1. Degree of accuracy of numerical representation; generally refers to the number of significant digits of information to the right of the decimal point. 2. Statistical: the degree of variation about the mean.

Principal component analysis (PCA), a method of analysing multivariate data in order to express their variation in terms of a minimum number of principal components or linear combinations of the original, partially correlated variables.

Prism. 1. A polygonal solid; a polyhedron having parallel, polygonal, and congruent bases and sides that are parallelograms. 2. Sometimes used in GIS to indicate a 3D solid body delineated by irregular polygonal faces.

Probability, the chance of an event or occurrence.

Probability distribution function, a real-valued function (in the range 0,1) whose integral over a set gives the probability of a random variable having a value within the set.

Program, a set of instructions directing the computer to perform a task.

Proximity, the closeness of one item to another.

Puck, a hand-held device for entering data from a digitizer which usually has a window with accurately engraved cross-hairs, and several buttons for entering associated data.

Quadrant, a quarter of a circle measured in units of 90 degrees.

Quadratic polynomial, one in which the highest degree of terms is 2.

Quadtree, a data structure for thematic information in a raster database that seeks to minimize data storage.

Range. 1. In arithmetic, the difference between the largest and smallest values in a set. 2. In geostatistics, the distance at which a transitive variogram ceases to increase monotonically.

Raster, a regular grid of cells covering an area.

Raster data structure, a database containing all mapped, spatial information in the form of regular grid cells.

Raster display, a device for displaying information in the form of pixels on a computer screen or VDU.

Raster map, a map encoded in the form of a regular array of cells.

Rasterization, the process of converting an image of lines and polygons from vector representation to a gridded representation.

Raster-to-vector conversion, see vectorization.

Real data, numbers that have both an integer and a decimal component (scalars).

Real time, tasks or functions executed so rapidly that the user gets an impression of continuous visual feedback.

Realization, an equi-probable result of stochastic simulation based on a known probability distribution function.

Record, a set of attributes relating to a geographical entity; a set of related, contiguous data in a computer file.

Redundancy, the inclusion of data in a database that contribute little to the information content.

Region, a set of loci or points having a certain value of an attribute in common.

Regionalized variable, a single-valued function defined over a metric space (a set of coordinates) that represents the variation of natural phenomena that are too irregular at the scale of interest to be modelled analytically.

Relational database structure, a method of structuring data in the form of sets of records or tuples so that relations between different entities and attributes can be used for data access and transformation.

Relative georeferencing, the referencing in space of the location of a point to a local base station rather than to a global grid.

Removable storage, computer data magnetic or optical storage media which are portable and include examples such as floppy disks, DAT tapes, and CD-ROMs.
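The Overlay entry above describes stacking data planes so each position can be analysed across layers. A minimal sketch (my own example, not the book's code) of raster overlay with the Boolean AND operator:

```python
# Assumed example, not from the book: two co-registered Boolean raster
# layers are stacked cell by cell with AND, so every grid position is
# analysed in terms of both data planes at once.

def overlay_and(grid_a, grid_b):
    """Cell-by-cell Boolean AND of two equally sized raster layers."""
    return [[a and b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]

suitable_soil  = [[1, 1, 0],       # hypothetical layer: 1 = suitable soil
                  [0, 1, 1]]
gentle_slope   = [[1, 0, 0],       # hypothetical layer: 1 = gentle slope
                  [1, 1, 1]]
suitable_sites = overlay_and(suitable_soil, gentle_slope)
```

The same cell-by-cell pattern extends to any other overlay operator (OR, arithmetic, maximum, and so on) on co-registered grids.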
Resampling, technique for transforming a raster image from one particular scale and projection to another.

Resolution, the smallest spacing between two displayed or processed elements; the smallest size of feature that can be mapped or sampled.

Response time, the time that elapses between sending a command to the computer and the receipt of the results at the workstation.

R-trees, a spatial indexing technique which groups entities according to their proximity by using minimum bounding rectangles. Hierarchies of rectangles may be established. When querying the database any search is directed to the rectangle and any subsequent lower-level ones which contain the item of interest.

Run-length codes, a compact method of storing data in raster databases which simplifies the grid on a row-by-row basis by coding the start and end values of contiguous cells for each class.

Sampling, the technique of obtaining a series of measurements to obtain a satisfactory representation of the real world phenomenon being studied.

Scale, the relation between the size of an object on a map and its size in the real world.

Scanner, a device for converting images from maps, photographs, or from part of the real world into digital form. The scanning head is made up of a light or other energy source and a sensing device which records digital values of light reflected back from the surface.

Scenario, a result of a numerical simulation model in which certain input data may be given values to represent conditions not yet observed. Scenarios are often used to compare forecasts of how landscape changes may turn out.

Semivariogram. 1. Given two locations x and (x + h), a measure of one-half of the mean square differences (the semivariance) produced by assigning the value z(x + h) to the value z(x), where h (known as the lag) is the inter-sample distance. 2. A graph of semivariance versus lag h.

Semivariogram model, one of a series of mathematical functions that are permitted for fitting the points on an experimental variogram (linear, spherical, exponential, Gaussian, etc.).

Sill, the maximum level of semivariance reached by a transitive semivariogram.

Simple kriging, an interpolation technique in which the prediction of values is based on a generalized linear regression under the assumption of second order stationarity and a known mean.

Simulation, using the digital model of the landscape in a GIS for studying the possible outcome of various processes expressed in the form of mathematical models.

Sink, see pit.

Sliver, a narrow gap between two lines created erroneously by digitizing or by the vectorization software of a scanner.

Smoothing, a set of procedures for removing short-range, erratic variation from lines, surfaces, or data series.

Smoothing spline, a method of fitting a smooth polynomial function through erratic data to capture the long-range variation and to suppress local components.

Spatial data model, see geographical data model.

Spheroid, a geometric representation of the shape of the earth.

Soft data, data obtained by inspection, intuition, or from other parties. Not measured directly and therefore often judged to be less reliable than hard data.

Software, general name for computer programs and programming languages.

Source code, a computer program that has been written in an English-language-like computer language. It must be compiled to yield the object code before it can be run on the computer.

Spike. 1. An overshoot line created erroneously by a scanner and its raster-vector software. 2. An anomalous data point that protrudes above or below an interpolated surface representing the distribution of the value of an attribute over an area.

Spline, a polynomial curve or surface used to represent spatial variation smoothly.

Stationarity, a statistical name for expressing degrees of invariance in the properties of random functions; it refers to the statistical model, and not to the data. Most commonly used to indicate invariance in the mean and variance, but also in the variance of first differences (see intrinsic hypothesis).

Statistical moments. First order is the mean; second order are the variance, the covariance, and the semivariance.

Stereo plotter, a device for extracting information about the elevation of landform from stereoscopic aerial photographs. The results are sets of X, Y, and Z coordinates.

Stochastic imaging, see conditional simulation.

Stochastic simulation, simulation using a probabilistic model to generate a range of allowable data values.

Storage, the parts of the computer system used for storing data and programs (see archival storage, magnetic media).

Stratified kriging, interpolation by any kriging method within a set of strata or divisions of the land into different classes.

Structured Query Language (SQL), a standard language for interrogating and managing relational databases.

Support, the term used in geostatistics for the area or volume of the sample on which measurements are made (e.g. a volume of soil or water, or a pixel in a remotely sensed image).

SYMAP (SYnagraphic MAPping program), the original grid-cell mapping program developed by Howard T. Fisher at Harvard.

Syntax, a set of rules governing the way statements can be used in a computer language.

Tablet, a small digitizer used for interactive work on a graphics workstation.

Tape drive, a device for reading and writing digital information from and to magnetic tapes.
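The run-length codes entry above describes compressing each raster row by storing the start and end cells of every contiguous run of a class. A minimal sketch (example and function name are mine, not the book's):

```python
# Illustrative sketch, not from the book: run-length coding of one raster
# row, storing each contiguous run of a class as (class_value, start, end)
# instead of every individual cell.

def run_length_encode(row):
    """Return the row as a list of (class_value, start_cell, end_cell) runs."""
    runs, start = [], 0
    for i in range(1, len(row) + 1):
        if i == len(row) or row[i] != row[start]:
            runs.append((row[start], start, i - 1))
            start = i
    return runs

row = [3, 3, 3, 7, 7, 3, 3, 3, 3]   # hypothetical row of class codes
codes = run_length_encode(row)      # three runs replace nine cell values
```

The saving grows with the size of the contiguous patches, which is why run-length codes suit thematic rasters with large uniform regions.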
Terminal, a device, usually including a computer screen or VDU and a keyboard, for communicating with the computer.

Tessellation, the process of dividing an area into smaller, contiguous tiles with no gaps in between them.

Text editor, a program for creating and modifying text files.

Thematic map, a map displaying selected kinds of information relating to specific themes, such as soil, land use, population density, suitability for arable crops, and so on. Many thematic maps are also choropleth maps, but when the attribute is modelled by a continuous field, representation by isolines or colour scales is more appropriate.

Thiessen polygons, a tessellation of the plane such that any given location is assigned to a tile according to the minimum distance between it and a single, previously sampled point. Also known as Dirichlet tessellation or Voronoi polygons.

Tile, a part of the database in a GIS representing a contiguous part of the earth's surface. By splitting a study area into tiles, considerable savings in access times and improvements in system performance can be achieved.

Tiling, the creation of a seamless spatial coverage by joining contiguous areas (tiles) together.

Topographical map, a map showing the surface features of the earth's surface (contours, roads, rivers, houses, etc.) with great accuracy and detail relative to the map scale used.

Topology, a term used to refer to the continuity of space and spatial properties, such as connectivity, that are unaffected by continuous distortion. In the representation of vector entities, connectivity is defined explicitly by a directed pointer between records describing things that are somehow linked in space (for example a junction between two roads). In regular and irregular tessellations of continuous surfaces (e.g. grids) the topological property of connectivity between different locations may only be implicitly defined by the spatial rate of change of attribute values over the grid. The topology (connectivity) of gridded surfaces can be revealed by computing first, second, or higher order derivatives of the surface (see Chapter 8).

Transect, a set of sampling points arranged along a straight line.

Transfer function. 1. A numerical method of transferring spatial data from one projection to another. 2. A numerical model for computing new attribute values from existing data using regression models or other algorithms.

Transform, the process of changing the scale, projection, or orientation of a mapped image.

Transitive variogram, a semivariogram having a range and a sill.

Trend surface analysis, methods for exploring the functional relationship between attributes and the geographical coordinates of the sample points.

Triangular Irregular Network (TIN), a vector data structure for representing geographical information that is modelled as a continuous field (usually elevation) which uses tessellated triangles of irregular shape (see Delaunay triangulation).

Tuple, a set of values of attributes pertaining to a given item in a database. Also known as a record.

Turnkey system, a GIS or CAD/CAM system of hardware and software that is designed, supplied, and supported by a single manufacturer ready for use for a given class of work.

Union. 1. Databases: the joining of two or more datasets together. 2. Boolean logic: the joining of two sets using the 'OR' operator.

Universal kriging, simple kriging of the residuals of a regionalized variable after systematic variation has been modelled by a drift or trend surface.

UNIX, a computer operating system.

Upstream element map, a map showing the cumulative catchment areas for each cell according to the topology of the local drain direction map.

Utility, a term for system capabilities and features for processing data.

Utility mapping, a special class of GIS applications for managing information about public infrastructure such as water pipes, sewerage, telephone, electricity, and gas networks.

Variogram, common term for semivariogram.

Vector. 1. Physics: a quantity having both magnitude and direction. 2. GIS: the representation of spatial data by points, lines, and polygons.

Vector data structure, a means of coding and storing point, line, and areal information in the form of units of data expressing magnitude, direction, and connectivity.

Vectorization, the conversion of point, line, and area data from a grid to a vector representation.

Vector-to-raster conversion, see rasterization.

Viewshed, those parts of the landscape that can be seen from a particular point.

Visual display unit (VDU), a computer screen used for graphical display.

Voronoi polygon, see Thiessen polygons.

Voxels, three-dimensional, cubic units of space.

Weighted moving average, value of an attribute computed for a given point as an average of the values at surrounding data points taking account of their distance or importance.

Window, an area (usually square) that is used to capture data needed to compute derived attributes from an original map.

Windows 95, Windows NT, operating systems used with personal computers and workstations.

Word, a set of bits (typically 16 or 32) that occupies a single storage location and is treated by the computer as a unit of information.

Workstation, a minicomputer or high-level personal computer used for local computations; it is often connected to other computers by a network. The operating system used for workstations is often Unix or Windows NT.

Zero, the origin of all coordinates defined in an absolute system; where the X, Y, and Z axes intersect.

Zoom, a capability for proportionately enlarging or reducing the scale of a figure or maps displayed on a computer screen or VDU.
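The weighted moving average entry above can be sketched in code. In this assumed illustration (mine, not the book's), the weights are inverse squared distances, so nearer samples count more; other weighting schemes fit the same pattern:

```python
# Assumed illustration, not from the book: a weighted moving average in
# which the estimate at an unsampled point is the average of surrounding
# sample values, weighted by inverse squared distance.
import math

def weighted_moving_average(point, samples, power=2):
    """samples: list of ((x, y), value) pairs. Returns the estimate at point."""
    num = den = 0.0
    for (x, y), value in samples:
        d = math.hypot(x - point[0], y - point[1])
        if d == 0:
            return value                 # point coincides with a sample
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

samples = [((0, 0), 10.0), ((2, 0), 20.0)]   # hypothetical data points
estimate = weighted_moving_average((1, 0), samples)
```

Unlike kriging, the weights here come from distance alone rather than from a fitted variogram, which is why the glossary treats the two as distinct interpolators.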
Appendix 2. A Selection of World Wide Web Geography
and GIS Servers
This appendix lists sources of free or cheap software which can be used to carry out many or most of the operations reviewed in the text, so that readers can create their own coursework based on the software of their choice. It also gives a very select list of World Wide Web sites where data and information are readily available. In several cases, the sites given are themselves lists of tens or even hundreds of GIS, GPS, Remote Sensing, or Digital Cartography sites, so the reader should be able to find a very wide variety of material very easily indeed.

Sources of software that can be used to support the material in this book

GIS and Geostatistics

Utrecht University: PCRaster
http://www.frw.ruu.nl/pcraster.html
Source of downloadable software for raster mapping and dynamic modelling via ftp://pop.frw.ruu.nl/pcraster/version2/

Gstat (Geostatistics—E. Pebesma). Information, dynamic demonstrations, software, and manuals can be obtained from http://www.frw.uva.nl/~pebesma/gstat/

Global Positioning Systems (GPS)

gopher://unbmvsi.csd.unb.ca:1570/iEXEC%3aCANSPACE — CANSPACE, Canadian Space Geodesy Forum archive (lots of GPS info).
http://www.ggrweb.com — GeoWeb, online resources for GIS/GPS/RS.
http://ageninfo.tamu.edu/geoscience.html — GIS/Remote Sensing/GPS/Geosciences top level page at Texas A&M.
telnet://fedworld.gov — GPS Information Center, U.S. Coast Guard (login: select database 34).
ftp://unbmvsi.csd.unb.ca/PUB.CANSPACE.GPS.INFO.SOURCES — GPS Information Sources (maintained by Richard Langley).
finger gps@geomac.se.unb.ca — GPS satellite current constellation status.
http://sideshow.jpl.nasa.gov/mbh/series.html — NASA, Global GPS Time Series.

General lists and sources of information on GIS, digital data, digital cartography, remote sensing and other sites
Appendix 3. Data Sets Used in the Examples
This appendix provides information on some of the data sets used in Chapters 5, 6, 8, and 10, where this is not given in the text.

The digital elevation models used for slope mapping and insolation in Chapter 8

The small DEM has a cell size of 30 x 30 m, an area of 6.04 km², a minimum elevation of 175 m, and a maximum elevation of 560 m; the large DEM has a cell size of 60 x 60 m, an area of 175.5 km², a minimum elevation of 160 m, and a maximum elevation of 677.5 m.

The Catsop digital elevation model used in Chapter 10

This has a cell size of 10 x 10 m, an area of 41.6 ha, a minimum elevation of 80 m, and a maximum elevation of 112 m. Further details are given in de Roo et al. (1989).

Data: Guyana rainforest

The soil map data come from the Tropenbos Ecological Reserve at Mabura and were obtained by conventional soil survey using aerial photographs and field survey (Jetten 1994). The forest inventory of tree species was carried out in the Waraputra Compartment of the Demerara Timbers Limited concession 20 km east of the Tropenbos Reserve (Jetten 1994, Ter Steeg et al. 1993).

The soil pollution data set used in Chapters 5 and 6

Note: all coordinates and elevations refer to local, not absolute positions.

Soil samples were collected as bulked samples within a radius of 5 m. Flood frequency classes and soil types were obtained from Dutch Ministry of Public Works surveys. For more details see Burrough et al. (1996). The following file is in GEOEAS format.
References
Aalders, H. J. G. L. (1996). Quality metrics for GIS. In Armstrong, M. P. and Hopkins, L. D. (1983). Fractal
M. J. Kraak and M. Molenaar (eds.), Advances in GIS enhancement for thematic display of topologically stored
Research II, Proc. 7th International Symposium on Spatial data. Proc. AUTOCARTO 6, Ottawa, Canada.
Data Handling, 12-16 Aug. 1996, Delft, The Netherlands, pp. 5B1-5B10.

Abel, D. J. (1983). Towards a relational database for geographic information systems. Workshop on Databases in the Natural Sciences, CSIRO Division of Computing Research, 7-9 Sept. 1983, Cunningham Laboratory, Brisbane, Queensland, Australia, 225 pp., mimeo.

-and Mark, D. M. (1990). A comparative analysis of some two-dimensional orderings. International Journal of Geographical Information Systems, 4: 21-31.

Acres, B. D., Bower, R. P., Burrough, P. A., Folland, C. J., Kalsi, M. S., Thomas, P., and Wright, P. S. (1976). The Soils of Sabah. Land Resources Division, Ministry of Overseas Development, London, 5 vols.

Adams, J., Patton, C., Reader, C., and Zamora, D. (1984). Fast hardware for geometric warping. Proc. 3rd Australian Remote Sensing Conference, Queensland.

AGI (1996). AGI Online Glossary of GIS. The Internet. (http://www.geo.ed.ac.uk/root/agidict/html/welcome.html)

Aldenderfer, M., and Maschner, H. D. G. (1996). Anthropology, Space and Geographic Information Systems. Oxford University Press, New York, 294 pp.

Aldred, B. K. (1972). Point in polygon algorithms. IBM Ltd., Peterlee, UK.

Almelio, C. F. (1974). Charge coupled devices. Scientific American, 230(2): 22-31.

Alonso, W. (1968). Predicting best with imperfect data. J. Am. Inst. Planners (APA Journal), July: 248-55.

Altman, D. (1994). Fuzzy set theoretic approaches for handling imprecision in spatial analysis. International Journal of Geographical Information Systems, 8: 271-90.

American Society of Photogrammetry (1960). Manual of Aerial Photo Interpretation. Washington, DC.

American Society of Photogrammetry and Remote Sensing (1980). Manual of Photogrammetry. Washington, DC.

Arctur, D., and Woodsford, P. (1996). Introduction to Object-Oriented GIS Technology Workshop Outline. Laser-Scan, Cambridge.

Armstrong, M., and Matheron, G. (1986a). Disjunctive kriging revisited: Part I. Mathematical Geology, 18: 711-27.

-(1986b). Disjunctive kriging revisited: Part II. Mathematical Geology, 18: 729-41.

Aronoff, S. (1989). Geographic Information Systems: A Management Perspective. WDL Publications, Ottawa, Canada.

Ayeni, O. O. (1982). Optimum sampling for Digital Terrain Models. Photogrammetric Engineering and Remote Sensing, 48(11): 1687-94.

Band, L. E. (1986). Topographic partition of watersheds with digital elevation models. Water Resources Research, 22: 15-24.

Barrow, J. D. (1992). Pi in the Sky. Oxford University Press, Oxford, 317 pp.

Batty, M., and Xie, Y. (1994a). Modelling inside GIS: Part 1. Model structures, exploratory spatial data analysis, and aggregation. International Journal of Geographical Information Systems, 8: 291-307.

-(1994b). Modelling inside GIS: Part 2. Selecting and calibrating urban models using ARC-INFO. International Journal of Geographical Information Systems, 8: 451-70.

Beckett, P. H. T. (1971). The cost-effectiveness of soil survey. Outlook in Agriculture, 6(5): 191-8.

-and Burrough, P. A. (1971). The relation between cost and utility in soil survey IV. Journal of Soil Science, 22: 466-80.

-and Webster, R. (1971). Soil variability—a review. Soils and Fertilisers, 34: 1-15.

Beek, K. J. (1978). Land Evaluation for Agricultural Development. International Institute for Land Reclamation and Improvement, Pub. 23, Wageningen.

Beers, B. J. (1995). FRANK: The Design of a New Land-surveying System Using Panoramic Images. Delft University Press, Delft.

Berge, H. F. M. ten, Stroosnijder, L., Burrough, P. A., Bregt, A. K., and de Heus, M. J. (1983). Spatial variability of physical soil properties influencing the temperature of the soil surface. Agricultural Water Management, 6: 213-26.

Berry, J. K. (1993). Beyond Mapping: Concepts, Algorithms, and Issues in GIS. GIS World Books, GIS World, Inc., Fort Collins, Colo., 246 pp.

Beven, K., and Kirkby, M. J. (1979). A physically-based, variable contributing area model of basin hydrology. Hydrological Science Bulletin, 24: 1-10.
References
-and Moore, I. D. (eds.) (1994). Terrain Analysis and Distributed Modelling in Hydrology. Advances in Hydrological Processes, Wiley, Chichester, 249 pp.

-Scholfield, N., and Tagg, A. F. (1984). Testing a physically-based flood forecasting model (TOPMODEL) for three UK catchments. Journal of Hydrology, 69: 119-43.

Beyer, R. I. (1984). A database for a botanical plant collection. In B. M. Evans (ed.), Computer-aided Landscape Design: Principles and Practice. The Landscape Institute, Scotland, pp. 134-41.

Bezdek, C. J. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 256 pp.

-Erhlich, R., and Full, W. (1984). FCM: the fuzzy c-means clustering algorithm. Computers and Geosciences, 10: 191-203.

Bhalla, N. (1991). Object-oriented data models: a perspective and comparative review. Journal of Information Science, 17: 145-60.

Bie, S. W., and Beckett, P. H. T. (1970). The costs of soil survey. Soils and Fertilisers, 33: 203-17.

-(1973). Comparison of four independent soil surveys by air-photo interpretation, Paphos area (Cyprus). Photogrammetria, 29: 189-202.

Bierkens, M. F. P. (1994). Complex Confining Layers: A Stochastic Analysis of Hydraulic Properties at Various Scales. Royal Dutch Geographical Association/Faculty of Geographical Sciences, Utrecht University, Utrecht.

-and Burrough, P. A. (1993a). The indicator approach to categorical soil data. (I) Theory. Journal of Soil Science, 44: 361-8.

-(1993b). The indicator approach to categorical soil data. (II) Application to mapping and landuse suitability analysis. Journal of Soil Science, 44: 369-81.

Bijker, W. E., Hughes, T. P., and Pinch, T. J. (eds.) (1987). The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology. MIT Press, Cambridge, Mass.

Blakemore, M. (1984). Generalisation and error in spatial databases. Cartographica, 21: 131-9.

Boerma, P. N., Henneman, G. R., Kauffman, J. H., and Verwey, H. E. (1974). Detailed soil survey of the Marongo area. Preliminary Report No. 3, Training Project in Pedology, Agricultural University, Wageningen.

Bolstad, P. V., Gessler, P., and Lillesand, T. M. (1990). Positional uncertainty in manually digitized map data. International Journal of Geographical Information Systems, 4: 399-411.

Bornand, M., Legros, J. P., and Moinereau, J. (1977). Carte Pedologique de France. No. 19: Privas. Centre Nationale de Recherche Agronomiques (CNRA), Versailles.

Bouille, F. (1978). Structuring cartographic data and spatial processes with the hypergraph-based data structure. In G. Dutton (ed.), First International Advanced Study Symposium on Topological Data Structures for Geographic Information Systems, vol. v. Laboratory for Computer Graphics and Spatial Analysis, Harvard, Cambridge, Mass., 17-21 Oct. 1977.

Bouma, J., and Bell, J. P. (eds.) (1983). Spatial Variability. Special Issue, Agricultural Water Management, 6(2/3).

-and Bregt, A. K. (1989). Land Qualities in Space and Time. Proc. Symposium organized by the International Society of Soil Science (ISSS), Wageningen, The Netherlands, 22-6 Aug. 1988. PUDOC, Wageningen, 356 pp.

Boyle, A. R. (1981). Concerns about the present applications of computer-assisted cartography. Cartographica, 18: 31-3.

-(1982). The last ten years of automated cartography: a personal view. In D. Rhind and T. Adams (eds.), Computers in Cartography. British Cartographic Society, London, pp. 1-3.

Brandenburger, A. J., and Gosh, S. K. (1985). The world's topographic and cadastral mapping operation. Photogrammetric Engineering and Remote Sensing, 51: 437-44.

Bregt, A. K., and Beemster, J. G. R. (1989). Accuracy in predicting moisture deficits and changes in yield from soil maps. Geoderma, 43: 301-10.

-Denneboom, J., Gesink, H. J., and van Randen, Y. (1991). Determination of rasterizing error: a case study with the soil map of the Netherlands. International Journal of Geographical Information Systems, 5: 361-8.

Brinkman, R., and Smyth, A. J. (1973). Land Evaluation for Rural Purposes. Summary of an Expert Consultation, Wageningen, the Netherlands, 6-12 Oct. 1972. International Institute for Land Reclamation and Improvement, Wageningen, Pub. 17.

Brunt, M. (1967). The methods employed by the Directorate of Overseas Surveys in the assessment of land resources. Actes du 2me Symp. Intl. de Photo Interpretation, Paris 1966.

Buiten, H. J., and Clevers, J. G. P. W. (1993). Land Observation by Remote Sensing: Theory and Applications. Gordon & Breach, Reading, UK.

Bullock, A., and Stallybrass, O. (1977). Fontana Dictionary of Modern Thought. Fontana Books, London.

Burgess, T. M., and Webster, R. (1980). Optimal interpolation and isarithmic mapping of soil properties I. The semivariogram and punctual kriging. Journal of Soil Science, 31: 315-31.

-(1984). Optimal sampling strategies for mapping soil types. I. Distribution of boundary spacings. Journal of Soil Science, 35: 641-54.

Burrough, P. A. (1969). Studies in soil survey methodology. Unpublished D.Phil. thesis, Oxford University.
Burrough, P. A. (1975). Message sticks used by Murut and Dusun people in Sabah. Journal of the Malaysian Branch of the Royal Asiatic Society, 48(2): 119-23.

-(1980). The development of a landscape information system in the Netherlands, based on a turn-key graphics system. Geoprocessing, 1: 257-74.

-(1983a). Multiscale sources of spatial variation in soil I. The application of fractal concepts to nested levels of soil variation. Journal of Soil Science, 34: 577-97.

-(1983b). Multiscale sources of spatial variation in soil II. A non-Brownian fractal model and its application in soil survey. Journal of Soil Science, 34: 599-620.

-(1984). The application of fractal ideas to geophysical phenomena. Bull. Inst. Mathematics and Its Applications, 20(3/4): 36-42.

-(1985). Fakes, facsimiles and fractals: fractal models of geophysical phenomena. In S. Nash (ed.), Science and Uncertainty. IBM UK Ltd/Science Reviews, Northwood, pp. 151-70.

-(1986). Principles of Geographical Information Systems for Land Resources Assessment. Oxford University Press, Oxford, 194 pp.

-(1989). Fuzzy mathematical methods for soil survey and land evaluation. Journal of Soil Science, 40: 477-92.

-(1991a). Sampling designs for quantifying map unit composition. Chapter 7 in M. Mausbach and L. Wilding (eds.), Spatial Variabilities of Soils and Landforms. Soil Science Society of America Special Publication No. 28, Madison, Wis., pp. 89-125.

-(1991b). Soil information systems. In D. J. Maguire, M. F. Goodchild, and D. W. Rhind (eds.), Geographical Information Systems: Principles and Applications, vol. ii. Longman Scientific and Technical, Harlow, pp. 153-69.

-(1992). Development of intelligent geographical information systems. International Journal of Geographical Information Systems, 6: 1-12.

-(1993a). Fractals and geostatistical methods in landscape studies. In N. Lam and Lee de Cola (eds.), Fractals in Geography. Prentice Hall, Englewood Cliffs, NJ, pp. 87-121.

-(1993b). Soil variability: a late 20th century view. Soils & Fertilizers, 56: 529-62.

-(1996a). Opportunities and limitations of GIS-based modeling of solute transport at the regional scale. Chapter 2 in D. L. Corwin and K. Loague (eds.), Special SSSA Publication: Application of GIS to the Modeling of Non-Point Source Pollutants in the Vadose Zone. American Society of Agronomy.

-(1996b). A European view of Global Spatial Data Infrastructure. In C. Chenez (ed.), Proc. Emerging Global Spatial Data Infrastructure, a conference held under the patronage of Dr Martin Bangemann, European Commission, EUROGI/Deutscher Dachverband fur Geoinformation/Atlantic Institute/Open GIS Consortium/Federal Data Committee/Federation Internationale des Geometres (Commission 3), 4-6 Sep. 1996, Konigswinter, Germany.

-and Frank, A. U. (eds.) (1996). Geographical Objects with Indeterminate Boundaries. Taylor & Francis, London, 345 pp.

-and Masser, F. I. (1998). European Geographic Information Infrastructures: Opportunities and Pitfalls. Taylor & Francis, London.

-and Webster, R. (1976). Improving a reconnaissance soil classification by multivariate methods. Journal of Soil Science, 27: 554-71.

-Beckett, P. H. T., and Jarvis, M. G. (1971). The relation between cost and utility in soil survey I-III. Journal of Soil Science, 22: 359-94.

-Brown, L., and Morris, E. C. (1977). Variations in vegetation and soil pattern across the Hawkesbury Sandstone plateau from Barren Grounds to Fitzroy Falls, New South Wales. Australian J. Ecol., 2: 137-59.

-Gillespie, M., Howard, B., and Prister, B. (1996). Redistribution of 137Cs in Ukraine wetlands by flooding and runoff. In K. Kovar and H. P. Nachnebel (eds.), Application of Geographic Information Systems in Hydrology and Water Resources Management. IAHS Publications No. 235, IAHS Press, Institute of Hydrology, Wallingford, pp. 269-78.

-MacMillan, R. A., and van Deursen, W. P. A. (1992). Fuzzy classification methods for determining site suitability from soil profile observations and topography. Journal of Soil Science, 43: 193-210.

-van Gaans, P., and Hootsmans, R. J. (1997). Continuous classification in soil survey: spatial correlation, confusion and boundaries. Geoderma, 77: 115-36.

Carrara, A., Bitelli, G., and Carla, R. (1997). Comparison of techniques for generating digital terrain models from contour lines. International Journal of Geographical Information Science, 11: 451-74.

Carter, J. R. (1989). On defining the geographic information system. In W. J. Ripple (ed.), Fundamentals of Geographical Information Systems: A Compendium. ASPRS/ACSM, Falls Church, Va., pp. 3-7.

Carver, S. J. (1991). Integrating multi-criteria evaluation with geographical information systems. International Journal of Geographical Information Systems, 5: 321-40.

-and Brunsdon, C. F. (1994). Vector to raster conversion error and feature complexity: an empirical study using simulated data. International Journal of Geographical Information Systems, 8: 261-70.

Ceruti, A. (1980). A method of drawing slope maps from contour maps by automatic data acquisition and processing. Computers and Geosciences, 6: 289-97.
Chance, A., Newell, R. G., and Theriault, D. G. (1995). Smallworld GIS: an object-oriented GIS—issues and solutions. Smallworld Technical Paper no. 3.

Chrisman, N. R. (1984a). The role of quality information in the long-term functioning of a geographic information system. Cartographica, 21: 79-87.

-(1984b). On storage of coordinates in geographic information systems. Geo-Processing, 2: 259-70.

Christian, C. S., and Stewart, G. A. (1968). Aerial surveys and integrated studies. Proc. Toulouse Conference, 21-8 Sept. 1964. Natural Resources Research VI, UNESCO, Paris.

Cliff, A. D., and Ord, J. K. (1981). Spatial Processes: Models and Applications. Pion, London.

CMD (1984). Annales meteorologiques Ardeche 1984. Comite Meteorologique Departemental, Aubenas, France.

Cook, B. G. (1978). The structural and algorithmic basis of a geographic database. In G. Dutton (ed.), Harvard Papers on Geographic Information Systems: First International Advanced Study Symposium on Topological Data Structures for Geographic Information Systems, vol. iv. Laboratory for Computer Graphics and Spatial Analysis, Graduate School of Design, Harvard University.

databases. In Proc. Workshop on Databases in the Natural Sciences, CSIRO Division of Computing Research, 7-9 Sept. 1983, Cunningham Laboratory, Brisbane, Queensland, pp. 175-86.

Corine (1992). Corine Land Cover Project. Environmental Agency, Copenhagen.

Costa-Cabral, M., and Burges, S. J. (1993). Digital Elevation Model Networks (DEMON): a model of flow over hillslopes for computation of contributing and dispersal areas. Water Resources Research, 30(6): 1681-92.

Couclelis, H. (1992). People manipulate objects (but cultivate fields): beyond the raster-vector debate in GIS. In A. U. Frank, I. Campari, and U. Formentini (eds.), Theories and Methods of Spatio-Temporal Reasoning in Geographic Space. Lecture Notes in Computer Science 639, Springer Verlag, Berlin, pp. 65-77.

Cowen, D. J. (1988). GIS versus CAD versus DBMS: what are the differences? Photogrammetric Engineering and Remote Sensing, 54: 1551-4.

Cressie, N. A. C. (1991). Statistics for Spatial Data. Wiley, New York, 900 pp.

-and Hawkins, D. M. (1980). Robust estimation of the variogram. Mathematical Geology, 12: 115-25.

Dahl, O.-J., and Nygaard, K. (1966). SIMULA—an ALGOL-based simulation language. Comm. ACM, 9: 671-8.

Dale, P. F., and McLaughlin, J. D. (1988). Land Information Management. Oxford University Press, Oxford, 266 pp.

Date, C. J. (1995). An Introduction to Database Systems, 6th edn. Addison-Wesley, Reading, Mass.

David, B., van den Herrewegen, M., and Salge, F. (1996). Conceptual models for geometry and quality of geographic information. In P. A. Burrough and A. U. Frank (eds.), Geographic Objects with Indeterminate Boundaries. Taylor & Francis, London.

Davidson, D. A., Theocharopoulos, S. P., and Bloksma, R. J. (1994). A land evaluation project in Greece using GIS and based on Boolean and fuzzy set methodologies. International Journal of Geographical Information Systems, 8: 369-84.

Davis, J. C. (1986). Statistics and Data Analysis in Geology, 2nd edn. Wiley, New York, 646 pp.

Davis, L. S. (1976). A survey of edge-detection techniques. CGIP, 4(3): 248-70.

De Bakker, H. (1979). Major Soils and Soil Regions in The Netherlands. Junk, The Hague/PUDOC, Wageningen, 203 pp.

De Floriani, L., and Magillo, P. (1994). Visibility algorithms on triangulated digital terrain models. International Journal of Geographical Information Systems, 8: 13-41.

De Gruijter, J. J., and McBratney, A. B. (1988). A modified fuzzy k-means method for predictive classification. In H. H. Bock (ed.), Classification and Related Methods of Data Analysis. Elsevier, Amsterdam, pp. 97-104.

-and Marsman, B. (1985). Transect sampling for reliable information on mapping units. In J. Bouma and D. Nielsen (eds.), Spatial Analysis of Soil Data. PUDOC, Wageningen.

-and Ter Braak, C. J. F. (1990). Model-free estimation from spatial samples: a reappraisal of Classical Sampling Theory. Mathematical Geology, 22: 407-15.

-Walvoort, D., and van Gaans, P. F. M. (1997). Continuous soil maps—a fuzzy set approach to bridge the gap between aggregation levels of process and distribution models. Geoderma, 77: 169-96.

De Jong, S. M. (1994). Applications of Reflective Remote Sensing for Land Degradation Studies in a Mediterranean Environment. Netherlands Geographical Studies 177, Koninklijk Nederlands Aardrijkskundig Genootschap, University of Utrecht, 237 pp.

-and Riezebos, H. Th. (1997). SEMMED: a distributed approach to soil erosion modelling. In A. Spiteri (ed.), Remote Sensing 96: Integrated Applications for Risk Assessment and Disaster Prevention for the Mediterranean. Balkema, Rotterdam.

Dell'Orco, P., and Ghiron, M. (1983). Shape representation by rectangles preserving fractality. Proc. AUTOCARTO 6, Oct. 1983, Ottawa.

Department of Environment (DoE) (1987). Handling Geographic Information. HMSO, London.
De Roo, A. P. J., Hazelhoff, L., and Burrough, P. A. (1989). Soil erosion modelling using 'ANSWERS' and geographical information systems. Earth Surface Processes and Landforms, 14: 517-32.

-and Heuvelink, G. B. M. (1992). Estimating the effects of spatial variability of infiltration on the output of a distributed runoff and soil erosion model using Monte Carlo methods. Hydrological Processes, 6: 127-43.

Desmet, P. J. J. (1997). Effects of interpolation errors on the analysis of DEMs. Earth Surface Processes and Landforms, 22: 563-80.

-and Govers, G. (1996). Comparison of routing algorithms for digital elevation models and their implications for predicting ephemeral gullies. International Journal of Geographical Information Systems, 10: 311-31.

De Soto, H. (1993). The missing ingredient. Economist, 11 Sept.

Deutsch, C., and Journel, A. G. (1992). GSLIB Geostatistical Handbook. Oxford University Press, New York.

Dilke, O. A. W. (1971). The Roman Land Surveyors: An Introduction to the Agrimensores. David and Charles, Newton Abbot.

Douglas, D. H., and Peuker, T. K. (1973). Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Canadian Cartographer, 10: 112-22.

Dozier, J. (1980). A clear-sky spectral solar radiation model for snow-covered mountainous terrain. Water Resources Research, 16: 709-18.

-and Frew, J. (1990). Rapid calculation of terrain parameters for radiation modelling from digital elevation data. IEEE Transactions on Geoscience and Remote Sensing, 28: 963-9.

Dubayah, R. (1992). Estimating net solar radiation using Landsat Thematic Mapper and digital elevation data. Water Resources Research, 28: 2469-84.

-and Rich, P. M. (1995). Topographic solar radiation models for GIS. International Journal of Geographical Information Systems, 9: 405-19.

Dubrule, O. (1983). Two methods with different objectives: splines and kriging. Mathematical Geology, 15: 245-55.

-(1984). Comparing splines and kriging. Computers & Geosciences, 10: 327-38.

Duffield, B. S., and Coppock, J. T. (1975). The delineation of recreational landscapes: the role of a computer-based information system. Trans. Inst. British Geographers, 66: 141-8.

Dunn, R., Harrison, A. R., and White, J. C. (1990). Positional accuracy and measurement error in digital databases of land use: an empirical study. International Journal of Geographical Information Systems, 4: 385-97.

Dutton, G. H. (1981). Fractal enhancement of cartographic line detail. American Cartographer, 8(1): 23-40.

-(1996). Encoding and handling geospatial data with hierarchical triangular meshes. In M. J. Kraak and M. Molenaar (eds.), Advances in GIS Research II: Proceedings 7th International Symposium on Spatial Data Handling, 12-16 Aug. 1996, Delft, The Netherlands, pp. 8B15-28.

Egenhofer, M. J., and Herring, J. R. (eds.) (1995). Advances in Spatial Databases. Springer Verlag, Berlin.

Elwell, H. A., and Stocking, M. A. (1982). Developing a simple yet practical method of soil-loss estimation. Tropical Agriculture, 59: 43-8.

Englund, E. J. (1990). A variance of geostatisticians. Mathematical Geology, 22: 417-55.

Estes, J. (1995). What GIS is, where it came from and what it does. Proc. Cambridge Conference for National Mapping Organisations 1995, Workshop Paper 2, 1/18.

-Hajic, E., and Tinney, L. R. (1983). Fundamentals of image analysis: analysis of visible and thermal infrared data. Chapter 24 in R. N. Colwell (ed.), Manual of Remote Sensing. American Society of Photogrammetry, Falls Church, Va., 2440 pp.

EUROSTAT (1996). The spatial dimension of the European statistical system. EUROSTAT, Luxembourg.

Evans, I. S. (1977). The selection of class intervals. Transactions Institute of British Geographers (NS), 2: 98-124.

-(1980). An integrated system of terrain analysis and slope mapping. Zeitschrift fur Geomorphologie, Suppl. 36: 274-95.

Fabos, J. G., and Caswell, S. J. (1977). Composite landscape assessment: metropolitan landscape planning model METLAND. Res. Bull. 637, Massachusetts Agric. Experimental Station, University of Massachusetts at Amherst.

FAO (1976). A framework for land evaluation. Soils Bull. 32, FAO Rome and Int. Inst. Land Reclam. Improv. Pub. 22.

Fedra, K. (1996). Distributed models and embedded GIS: strategies and case studies of integration. In M. F. Goodchild, L. T. Steyaert, B. O. Parks, C. Johnston, D. Maidment, M. Crane, and S. Glendinning (eds.), GIS and Environmental Modeling: Progress and Research Issues. GIS World Books, Fort Collins, Colo., pp. 413-18.

Fisher, H. T. (1978). Thematic cartography: what it is and what is different about it. Harvard Papers in Theoretical Cartography, Laboratory for Computer Graphics and Spatial Analysis, Harvard, Cambridge, Mass.

Fisher, P. F. (1991). Modelling soil map-unit inclusions by Monte Carlo simulation. International Journal of Geographical Information Systems, 5: 193-208.

-(1995). An exploration of probable viewsheds in landscape planning. Environment and Planning B: Planning and Design, 22: 527-46.

-(1996). Reconsideration of the viewshed function in terrain modelling. Geographical Systems, 3: 33-58.
Fitzgerald, R. W., and Lees, B. G. (1996). Temporal con¬ Gee, D. M., Anderson, A. G., and Baird, L. (1990). Large-
text in floristic classification. Computers and Geosciences, scale floodplain modelling. Earth Surface Processes and
22:981-94. Landforms, 15: 513-23.
Fix, R. A., and Burt, T. P. (1995). Global Positioning Sys¬ Geertman, S. C. M., and Ritsema Van Eck, J. R. (1995).
tem: an effective way to map a small area or catchment. GIS and models of accessibility potential: an applica¬
Earth Surface Processes and Landforms, 20: 817-27. tion in planning. International Journal of Geographical
Information Systems, 9: 67-80.
Fleming, M. D., and Hoffer, R. M. (1979). Machine
Processing of Landsat MSS Data and DMA topographic Goldberg, A., and Robson, D. (1983). Smalltalk-80.
data for forest cover type mapping LARS Technical Report Addison-Wesley, Reading, Mass.
062879, Laboratory for Applications of Remote Sensing, Gomez-Hernandez, J. J., and Journel, A. G. (1992). Joint
Purdue University, West Lapayette, Indiana. sequential simulation of multigaussian fields. In A. Soares
Frank,A. U. (1988). Requirements for a database manage¬ (ed.), Proc. Fourth Geostatistics Congress, Troia, Portugal.
ment system for a GIS. Photogrammetric Engineering and Quantitative Geology and Geostatistics (5), Kluwer Aca¬
Remote Sensing, 54: 1557-64. demic Publishers, Dordrecht, pp. 85-94.
-and Srivastava, R. M. (1990). ISIM3D: an ANSI-C
-and Campari, I. (eds.) (1993). Spatial Information
three-dimensional multiple indicator conditional simu¬
Theory: A Theoretical Basis for GIS. Springer Verlag,
lation program. Computers and Geosciences, 16(4): 395-
Berlin.
440.
-— -and Formentini, U. (eds.) (1992). Theories and
Goodchild, M. F. (1978), Statistical aspects of the polygon
Methods of Spatio-Temporal Reasoning in Geographic
overlay problem. In G. Dutton (ed.), Harvard Papers
Space. Lecture Notes in Computer Science 639. Springer
on Geographic Information Systems vi. Addison-Wesley,
Verlag, Berlin, pp. 66-11.
Reading, Mass.
Franklin, W. R. (1984). Cartographic errors symptomatic
-M. F. (1980). Fractals and the accuracy of geo¬
of underlying algebra problems. Proc. Int. Symposium on
graphical measures. Mathematical Geology, 12: 85-98.
Spatial Data Handling, 20-4 Aug. 1984, Zurich, Switzer¬
-(1989). Modeling error in objects and fields. In M. F.
land, pp. 190-208.
Goodchild and S. Gopal (eds.), Accuracy of Spatial
Frapporti, G., Vriend, S. P., and van Gaans, P. F. M. Databases. Taylor 8c Francis, London, pp. 107-13.
(1993). Hydrogeochemistry of the shallow Dutch ground
-and Gopal, S. (1989). The Accuracy of Spatial
water: interpretation of the national ground water qual¬
Databases. Taylor 8c Francis, London.
ity monitoring network. Water Resources Research, 29:
2993-3004. -Parks, B. O., and Steyaert, L. T. (1993). Environmen¬
tal Modeling with GIS. Oxford University Press, Oxford.
Freeman, G. T. (1991). Calculating catchment area with
-Steyaert, L. T., Parks, B. O., Johnston, C.,
divergent flow based on a regular grid. Computers and
Maidment, D., Crane, M., and Glendinning, S. (eds.)
Geosciences, 17: 413-22.
(1996). GIS and Environmental Modeling: Progress and
Freeman, H. (1974). Computer processing of line-drawing Research Issues. GIS World Books, Fort Collins, Colo., 486
images. Computing Surveys, 6: 54-97. pp.
Frolov, Y. S., and Maling, D. H. (1969). The accuracy of Goode, J. (ed.) (1997). Precision agriculture: spatial and
area measurements by point counting techniques. Car¬ temporal variability of environmental quality. Wiley,
tographic J. 6: 21-35. Chichester (Ciba Foundation Symposium 210).
Gaans, P. F. M. van, Vriend, S. P., van der Wal, J., and Goudie, A. S., Atkinson, B. W., Gregory, K. J., Simmons,
Schuiling, R. D. (1986). Integral rock analysis, a new I. G., Stoddart, D. R., and Sugden, D. (eds.) (1988). The
approach to lithogeochemical exploration. Application: Encyclopaedic Dictionary of Physical Geography. Blackwell
Carboniferous sediments of a coal exploration drilling, Reference, Oxford, 528 pp.
Limburg, the Netherlands. Report to the Commission of Graham, I. (1994). Object Oriented Methods. 2nd edn.,
the European Communities, contract MSM-073-NL(N), Addison-Wesley, Wokingham.
87 pp. Grothenn, D., and Stanfenbiel, W. (1976). Das Stand-
-——- and Frapporti, G. (1992). Oxidation of pyrite arddatenformat zum Austausch Kartografischer Daten.
in the subsoil of Noord-Brabant. Setting, causes, and Nachrichten aus dem Karten- und Vermessungswesen,
effects for ground water quality. H20, 25: 736-45 (in 1(69): 25-49.
Dutch). Gruenberger, F. (1984). Computer recreations. Scientific
Gahegan, M. N., and Roberts, S. A. (1988). An intelligent, American, 250(4): 10-14.
object-oriented geographical information system. Inter¬ Gunnink, J. L., and Burrough, P. A. (1997). Interactive
national Journal of Geographical Information Systems, 2(2): spatial analysis of soil attribute patterns using Explanat¬
101-10. ory Data Analysis (EDA) and GIS. In M. Fischer, H. J.
317
References
Scholten, and D. Unwin (eds.), Spatial Analytical Perspect¬ Holroyd, F., and Bell, S. B. M. (1992). Raster GIS: Models
ives on GIS. Taylor 8c Francis, London, pp. 87-100. of raster encoding. Computers and Geosciences, 18: 419-
Guptill, S. C. (1991). Spatial data exchange and standard¬ 26.
ization. In D. J. Maguire, M. F. Goodchild, and D. W. Hopkins, L. D. (1977). Methods for generating land suit¬
Rhind (eds.), Geographical Information Systems, i: Prin¬ ability maps: a comparative evaluation. American Insti¬
ciples: Longman Scientific and Technical, Harlow, Essex, tute of Planners Journal (Oct): 386-400.
pp. 515-30. Horn, B. K. P. (1981). Hill shading and the reflectance map.
-and Morrison, J. (1995). (eds.). The Elements of Spa¬ Proc. IEEE 69(1): 14-47.
tial Data Quality. Elsevier, Amsterdam.
Hutchinson, M. F. (1989). A new procedure for gridding
Gutowitz, H. (1991). Cellular Automata: Theory and elevation and stream line data with automatic removal of
Experiment. MIT Press, Cambridge, Mass. spurious pits. Journal of Hydrology, 106: 211-32.
Guttman, A. (1984). R-trees: a dynamic index structure -(1995). Interpolating mean rainfall using thin plate
for spatial searching. Proceedings of the 13th Association smoothing splines. International Journal of Geographical
for Computing Machinery SIGMOD Conference. ACM Information Systems, 9: 385-404.
Press, Boston, New York.
Huxhold, W. E. (1991). An Introduction to Urban Geo¬
Haining, R. (1990). Spatial Data Analysis in the Social graphic Information Systems. Oxford University Press,
and Environmental Sciences. Cambridge University Press, New York, 337 pp.
Cambridge.
-and Levinsohn, A. G. (1995). Managing Geographic
Harvey, D. (1969). Explanation in Geography. Edward Information System Projects. Oxford University Press, New
Arnold, London. York, 247 pp.
Haslett, J., Wills, G., and Unwin, A. (1990). SPIDER— IGN (1985). Carte topographique 1 : 25.000. Institute
an interactive statistical tool for the analysis of spatially Geographique National, Paris.
distributed data. International Journal of Geographical
Isaaks, E. H., and Srivastava, R. M. (1989). An Introduc¬
Information Systems, 4: 285-96.
tion to Applied Geostatistics. Oxford University Press, New
Hawkins, D. M., and Merriam, D. F. (1973). Optimal
York, 561 pp.
zonation of digitized sequential data. Math. Geol. 5: 389-
95. Jackson, M. J., and Woodsford, P. A. (1991). GIS data
capture hardware and software. In D. J. Maguire, M. F.
-(1974). Zonation of multivariate sequences of digitized
Goodchild, and D. W. Rhind (eds.), Geographical In¬
geologic data. Math. Geol. 6: 263-9.
formation Systems, i: Principles. Longman Scientific and
Technical, Harlow, pp. 239-49.
Hearnshaw, H. M., and Unwin, D. J. (1994). Visualization in Geographical Information Systems. Wiley, Chichester, 259 pp.
Herring, J. R. (1992). TIGRIS: a data model for an object-oriented geographic information system. Computers and Geosciences, 18(4): 443-8.
Heuvelink, G. B. M. (1993). Error Propagation in Quantitative Spatial Modelling. KNAG, University of Utrecht, Pub. 163, 160 pp.
—— and Burrough, P. A. (1993). Error propagation in cartographic modelling using Boolean logic and continuous classification. International Journal of Geographical Information Systems, 7: 231-46.
—— —— and Stein, A. (1989). Propagation of errors in spatial modelling with GIS. International Journal of Geographical Information Systems, 3: 303-22.
Hills, G. A. (1961). The ecological basis for land use planning. Res. Rep. 46, Ontario Department of Lands and Forests, Canada.
Hodgkiss, A. G. (1981). Understanding Maps. Dawson, Folkestone.
Hodgson, M. E. (1995). What cell size does the computed slope/aspect angle represent? Photogrammetric Engineering and Remote Sensing, 61: 513-17.
Jacobsen, I., Christerson, M., Jonsson, P., and Overgaard, G. (1992). Object-Oriented Software Engineering: A Use Case Driven Approach. Addison-Wesley, Wokingham, 524 pp.
Jenks, G. F. (1981). Lines, computers, and human frailties. Annals AAG, 71(1): 1-10.
Jenson, S. J., and Domingue, J. O. (1988). Extracting topographic structure from digital elevation data for geographic information system analysis. Photogrammetric Engineering and Remote Sensing, 54: 1593-600.
Jetten, V. (1994). Modelling the effects of logging on the water balance of a tropical rain forest: a study in Guyana. Tropenbos Series 6, University of Utrecht, 183 pp.
Johnston, R. J., Gregory, D., and Smith, D. M. (eds.) (1988). The Dictionary of Human Geography. Blackwell Reference, Oxford.
Jones, K. H. (1997). A comparison of algorithms used to compute hill slopes and aspects as a property of the DEM. (pers. comm.).
Jongman, R. H. G., ter Braak, C. J. F., and van Tongeren, O. F. R. (1995). Data Analysis in Community and Landscape Ecology. Cambridge University Press, Cambridge.
Journel, A. G. (1996). Modelling uncertainty and spatial dependence: stochastic imaging. International Journal of Geographical Information Systems, 10: 517-22.
—— and Huijbregts, C. J. (1978). Mining Geostatistics. Academic Press, London.
—— and Posa, D. (1990). Characteristic behavior and order relations for indicator variograms. Mathematical Geology, 22: 1011-25.
Kandel, A. (1986). Fuzzy Mathematical Techniques with Applications. Addison-Wesley, Reading, Mass.
Karaoglou, A., Desmet, G., Kelly, G. N., and Menzel, H. G. (eds.) (1996). The Radiological Consequences of the Chernobyl Accident. Proceedings of the First International Congress, Minsk, Belarus, EUR 16544 EN, Brussels-Luxembourg.
Kauffman, A. (1975). Introduction to the Theory of Fuzzy Subsets. Academic Press, New York.
Kelly, R. E., McConnell, P. R. H., and Mildenberger, S. J. (1977). The Gestalt photomapping system. Photogrammetric Engineering & Remote Sensing, 43: 1407-17.
Kennedy, M. (1996). The Global Positioning System and GIS. Ann Arbor Press, Inc., Ann Arbor, 268 pp.
Kidner, D. B., and Jones, C. J. (1994). A deductive object-oriented GIS for handling multiple representations. In T. C. Waugh and R. G. Healey (eds.), Advances in GIS Research. Taylor & Francis, London, ii. 882-900.
Kim, W., and Lochovsky, F. H. (eds.) (1989). Object-Oriented Concepts, Databases and Applications. Addison-Wesley, Reading, Mass.
Kineman, J. J. (1993). What is a scientific database? In M. F. Goodchild, L. T. Steyaert, and B. O. Parks (eds.), Environmental Modeling and GIS. Oxford University Press, New York, pp. 372-8.
Kleiner, A., and Brassel, K. E. (1986). Hierarchical grid structures for static geographic data bases. In M. Blakemore (ed.), Proc. Autocarto London. Imperial College, London, Sept.
Klir, G. J., and Folger, T. A. (1988). Fuzzy Sets, Uncertainty and Information. Prentice Hall, Englewood Cliffs, NJ.
Kollias, V. J., and Voliotis, A. (1991). Fuzzy reasoning in the development of geographical information systems. FRSIS: a prototype soil information system with fuzzy retrieval capabilities. International Journal of Geographical Information Systems, 5: 209-24.
Kosko, B. (1994). Fuzzy Thinking: The New Science of Fuzzy Logic. Harper Collins, London.
Kumar, L., Skidmore, A. K., and Knowles, E. (1997). Modelling topographic variation in solar radiation in a GIS environment. International Journal of Geographical Information Science, 11: 475-98.
Kuniansky, E. L., and Lowther, R. A. (1993). Finite element mesh generation from mappable features. International Journal of Geographical Information Systems, 7: 395-405.
Laflen, J. M., and Colvin, T. S. (1981). Effect of crop residue on soil loss from continuous row cropping. Transactions of the American Society of Agricultural Engineers, 24: 605-9.
Lagacherie, P., Andrieux, P., and Bouzigues, R. (1996). Fuzziness and uncertainty of soil boundaries: from reality to coding in GIS. In P. A. Burrough and A. U. Frank (eds.), Geographical Objects with Indeterminate Boundaries. Taylor & Francis, London, pp. 275-86.
Lam, N. (1983). Spatial interpolation methods: a review. American Cartographer, 10: 129-49.
—— and De Cola, L. (1993). Fractals in Geography. Prentice Hall, Englewood Cliffs, NJ.
Lambeck, K. (1988). Geophysical Geodesy. Oxford Science Publications, Oxford, 718 pp.
Langas, S. (1997). Transboundary European GIS databases: a review of the Baltic Sea region experiences. In P. A. Burrough and I. Masser (eds.), European Geographic Information Infrastructures: Opportunities and Pitfalls. Taylor & Francis, London (in press).
Langran, G. (1992). Time in Geographic Information Systems. Taylor & Francis, London.
Lanter, D. P., and Veregin, H. (1992). A research paradigm for propagating error in layer-based GIS. Photogrammetric Engineering & Remote Sensing, 58: 825-33.
Lapedes, D. N. (ed.) (1976). The McGraw-Hill Dictionary of Scientific and Technical Terms. McGraw-Hill Book Company, New York, 1634 pp. + appendices.
Lapidus, D. F. (1987). The Facts on File Dictionary of Geology and Geophysics. Facts on File Publications, New York, 347 pp.
Laslett, G. M., and McBratney, A. B. (1990a). Estimation and implications of instrumental drift, random measurement error and nugget variance of soil attributes: a case study for soil pH. Journal of Soil Science, 41: 451-72.
—— —— (1990b). Further comparison of several spatial prediction methods for soil pH. Soil Sci. Soc. Am. J., 54: 1553-8.
—— —— Pahl, P., and Hutchinson, M. (1987). Comparison of several spatial prediction methods for soil pH. Journal of Soil Science, 38: 325-41.
Laurini, R., and Pariente, D. (1996). Towards a field-oriented language: first specifications. In P. A. Burrough and A. U. Frank (eds.), Geographical Objects with Indeterminate Boundaries. Taylor & Francis, London, 345 pp.
—— and Thompson, D. (1992). Fundamentals of Spatial Information Systems. Academic Press, London.
Lawrence, V. V. (1998). European geoinformation data publishing: understanding the commercial volume industry. In P. A. Burrough and I. Masser (eds.), European Geographic Information Infrastructures: Opportunities and Pitfalls. Taylor & Francis, London.
Leavesley, G. H., Restropo, P. J., Stannard, L. G., Frankoski, L. A., and Sautins, A. M. (1996). MMS: a modeling framework for multidisciplinary research and operational applications. In M. F. Goodchild, L. T. Steyaert, B. O. Parks, C. Johnston, D. Maidment, M. Crane, and S. Glendinning (eds.), GIS and Environmental Modeling: Progress and Research Issues. GIS World Books, Fort Collins, Colo., pp. 155-8.
Lee, J. (1991a). Comparison of existing methods for building triangular irregular network models of terrain from grid digital elevation models. International Journal of Geographical Information Systems, 5: 267-85.
—— (1991b). Analysis of visibility sites on topographic surfaces. International Journal of Geographical Information Systems, 5: 413-29.
Leenaers, H., Burrough, P. A., and Okx, J. P. (1989a). Efficient mapping of heavy metal pollution on floodplains by co-kriging from elevation data. In J. Raper (ed.), Three Dimensional Applications in Geographic Information Systems. Taylor & Francis, London, pp. 37-50.
—— Okx, J. P., and Burrough, P. A. (1989b). Co-kriging: an accurate and inexpensive means of mapping floodplain soil pollution by using elevation data. In M. Armstrong (ed.), Geostatistics, Proceedings of the Third Geostatistics Congress, Avignon, Oct. 1988. Kluwer, pp. 371-82.
—— —— —— (1990). Comparison of spatial prediction methods for mapping floodplain soil pollution. Catena, 17: 535-50.
Lees, B. G. (1996a). Sampling strategies for machine learning using GIS. In M. F. Goodchild et al. (eds.), GIS and Environmental Modeling: Progress and Research Issues. GIS World Books, Fort Collins, Colo., pp. 39-42.
—— (1996b). Neural network applications in the geosciences: an introduction. Computers and Geosciences, 22: 955-7.
Legros, J.-P., Kolbl, O., and Falipou, P. (1996). Délimitation d'unités de paysage sur des photographies aériennes: éléments de réflexion pour la définition d'une méthode de tracé [Delineation of landscape units on aerial photographs: considerations for defining a tracing method]. Étude et Gestion des Sols, 3(2): 113-24.
Levialdi, S. (1980). Finding the edge. In J. C. Simon and R. M. Haralick (eds.), Digital Image Processing. Reidel, Dordrecht, pp. 105-48.
Lillesand, T. M., and Kiefer, R. W. (1987). Remote Sensing and Image Interpretation. Wiley, New York.
Lodwick, W. A., Monson, W., and Svoboda, L. (1990). Attribute error and sensitivity analysis of map operations in geographical information systems. International Journal of Geographical Information Systems, 4: 413-27.
Lorie, R. A., and Meier, A. (1984). Using a relational DBMS for geographical databases. GeoProcessing, 2: 243-57.
Luder, P. (1980). Das ökologische Ausgleichspotential der Landschaft [The ecological compensation potential of the landscape]. Basler Beiträge zur Physiogeografie, Geografisches Institut der Universität Basel.
Lytle, D. J., Bliss, N. B., and Waltman, S. W. (1996). Interpreting the State Soil Geographic Database (STATSGO). In M. F. Goodchild, L. T. Steyaert, B. O. Parks, C. Johnston, D. Maidment, M. Crane, and S. Glendinning (eds.), GIS and Environmental Modeling: Progress and Research Issues. GIS World Books, Fort Collins, Colo., pp. 49-52.
McAlpine, J. R., and Cook, B. G. (1971). Data reliability from map overlay. Proc. Australian and New Zealand Association for the Advancement of Science, 43rd Congress, Brisbane, May 1971, Section 21: Geographical Sciences.
McBratney, A. B., and de Gruijter, J. J. (1992). A continuum approach to soil classification by modified fuzzy k-means with extragrades. Journal of Soil Science, 43: 159-76.
—— and Webster, R. (1981). The design of optimal sampling schemes for local estimation and mapping of regionalized variables: 2. Program and examples. Computers & Geosciences, 7: 335-65.
—— —— (1983). Optimal interpolation and isarithmic mapping of soil properties: V. Co-regionalisation and multiple sampling strategy. Journal of Soil Science, 34: 137-62.
—— de Gruijter, J. J., and Brus, D. J. (1992). Spatial prediction and mapping of continuous soil classes. Geoderma, 54: 39-64.
McCormack, J. E., Gahegan, M. N., Roberts, S. A., Hogg, J., and Hoyle, B. S. (1993). Feature-based derivation of drainage networks. International Journal of Geographical Information Systems, 7: 263-79.
McDonald, M. G., and Harbaugh, A. W. (1988). Techniques of water resources investigations of the United States Geological Survey: modular three-dimensional finite-difference groundwater flow model. Reference Manual, Visual Modflow. USGS, Washington, DC.
McDonnell, R. A. (1996). Including the spatial dimension: a review of the use of GIS in hydrology. Progress in Physical Geography, 21: 159-77.
—— and Kemp, K. K. (1995). The International GIS Dictionary. Geoinformation International, Cambridge.
MacDougal, E. B. (1975). The accuracy of map overlays. Landscape Planning, 2: 23-30.
McHarg, I. L. (1969). Design with Nature. Doubleday/Natural History Press, New York.
Machover, C. (1989). The C4 Handbook: CAD, CAM, CAE, CIM. Tab Books Inc., Blue Ridge Summit, Pa., 438 pp.
McLeish, J. (1992). Number: From Ancient Civilisations to the Computer. Flamingo, London, 266 pp.
MacMillan, R. A., Furley, P. A., and Healey, R. G. (1993). Using hydrological models and geographic information systems to assist with the management of surface water in agricultural landscapes. In R. Haines-Young, D. R. Green, and S. H. Cousins (eds.), Landscape Ecology and GIS. Taylor & Francis, London, pp. 181-209.
McRae, S. G., and Burnham, C. P. (1981). Land Evaluation. Oxford University Press, Oxford.
Maguire, D. J., Goodchild, M. F., and Rhind, D. (eds.) (1991). Geographical Information Systems: Principles and Applications. Longman Scientific and Technical, Harlow, 2 vols.
Maidment, D. R. (1993). Developing a spatially distributed unit hydrograph by using GIS. In K. Kovar and H. P. Nachnebel (eds.), Application of Geographic Information Systems in Hydrology and Water Resources Management: HydroGIS 1993. IAHS Publication No. 211, pp. 181-92.
—— (1996a). GIS/hydrologic models of non-point source pollutants in the vadose zone. In D. L. Corwin and K. Loague (eds.), Application of GIS to the Modeling of Non-Point Source Pollutants in the Vadose Zone. Special SSSA Publication, Soil Science Society of America Inc., Madison, Wis., pp. 163-74.
—— (1996b). Environmental modeling with GIS. In M. F. Goodchild, L. T. Steyaert, B. O. Parks, C. Johnston, D. Maidment, M. Crane, and S. Glendinning (eds.), GIS and Environmental Modeling: Progress and Research Issues. GIS World Books, Fort Collins, Colo., pp. 315-24.
Makarovic, B. (1973). Progressive sampling for digital terrain models. ITC Journal, 1973(3): 397-416.
—— (1975). Amended strategy for progressive sampling. ITC Journal, 1975(1): 117-28.
—— (1976). A digital terrain model system. ITC Journal, 1976(1): 57-83.
—— (1977). Composite sampling for digital terrain models. ITC Journal, 1977(3): 406-33.
Maling, D. H. (1973). Coordinate Systems and Map Projections. George Phillip, London.
—— (1992). Coordinate Systems and Map Projections. Pergamon, Oxford.
Mandelbrot, B. B. (1982). The Fractal Geometry of Nature. Freeman, New York.
Marble, D. F., Calkins, H., Peuquet, D. J., Brassel, K., and Wasilenko, M. (1981). Computer Software for Spatial Data Handling (3 vols.). Prepared by the International Geographical Union, Commission on Geographical Data Sensing and Processing, for the US Department of the Interior, Geological Survey. IGU, Ottawa, Canada, 1043 pp.
Mark, D. M., and Aronson, P. B. (1984). Scale-dependent fractal dimensions of topographic surfaces: an empirical investigation with application in geomorphology and computer mapping. Math. Geol., 16: 671-83.
—— and Lauzon, J. P. (1984). Linear quadtrees for geographic information systems. Proc. IGU Symposium on Spatial Data Handling, 20-4 Aug. 1984, Zurich, pp. 412-31.
Marks, D., Dozier, J., and Frew, J. (1984). Automated basin delineation from digital elevation data. GeoProcessing, 2: 299-311.
Marsman, B., and de Gruijter, J. J. (1984). Dutch soil survey goes into quality control. In P. A. Burrough and S. W. Bie (eds.), Soil Information Systems Technology. PUDOC, Wageningen.
—— —— (1985). Quality of soil maps: a comparison of survey methods in a sandy area. Soil Survey Papers No. 15, Netherlands Soil Survey Institute, Wageningen.
Martin, D. (1996). An assessment of surface and zonal models of population. International Journal of Geographical Information Systems, 10: 973-89.
Martin, J. J. (1982). Organization of geographical data with quadtrees and least squares approximation. Proc. Symposium on Pattern Recognition and Image Processing (PRIP), Las Vegas, Nevada. IEEE Computer Society, pp. 458-65.
Masser, I., and Blakemore, M. (eds.) (1991). Handling Geographical Information: Methodology and Potential Applications. Longman Scientific and Technical, Harlow, 317 pp.
Mausbach, M., and Wilding, L. (1991). Spatial Variabilities of Soils and Landforms. Soil Science Society of America Special Publication No. 28, Madison, Wis.
Mead, D. A. (1982). Assessing data quality in geographic information systems. In C. J. Johannsen and J. L. Sanders (eds.), Remote Sensing for Resource Management. Soil Conservation Society of America, Ankeny, Iowa, pp. 51-62.
Meijerink, A. M. J., de Brouwer, H. A. M., Mannaerts, C. M., and Valenzuela, C. R. (1994). Introduction to the Use of Geographic Information Systems for Practical Hydrology. UNESCO International Hydrological Programme IHP-IV M2.3 / International Institute for Aerospace Survey and Earth Sciences (ITC) Publication No. 23, 243 pp.
Meyer, F. D. (1981). How rain intensity affects interrill erosion. Transactions of the American Society of Agricultural Engineers, 24: 1472-5.
Miesch, A. T. (1975). Variograms and variance components in geochemistry and ore evaluation. Geological Society of America Memoir, 142: 333-40.
Middelkoop, H. (1997). Embanked Floodplains in the Netherlands: Geomorphological Evolution over Various Time Scales. Royal Netherlands Geographical Association/Faculty of Geographical Sciences, Utrecht University, Utrecht.
Miller, E. E. (1980). Similitude and scaling of soil-water phenomena. In D. Hillel (ed.), Applications of Soil Physics. Academic Press, New York.
Milne, P., Milton, S., and Smith, J. L. (1993). Geographical object-oriented databases: a case study. International Journal of Geographical Information Systems, 7: 39-55.
Mitasova, H., and Hofierka, J. (1993). Interpolation by regularized spline with tension: application to terrain modeling and surface geometry analysis. Mathematical Geology, 25: 657-69.
Mitasova, H., Mitas, L., Brown, W. M., Gerdes, D. P., Kosinovsky, I., and Baker, T. (1995). Modelling spatially and temporally distributed phenomena: new methods and tools for GRASS GIS. International Journal of Geographical Information Systems, 9: 433-46.
—— Hofierka, J., Zlocha, M., and Iverson, L. R. (1996). Modelling topographic potential for erosion and deposition using GIS. International Journal of Geographical Information Systems, 10: 629-42.
Monmonnier, M. (1993). How to Lie with Maps. University of Chicago Press, Chicago, 176 pp.
Moore, I. D. (1996). Hydrological modeling and GIS. In M. F. Goodchild, L. T. Steyaert, B. O. Parks, C. Johnston, D. Maidment, M. Crane, and S. Glendinning (eds.), GIS and Environmental Modeling: Progress and Research Issues. GIS World Books, Fort Collins, Colo., pp. 143-8.
—— Grayson, R. B., and Ladson, A. R. (1991). Digital terrain modeling: a review of hydrological, geomorphological and biological applications. Hydrological Processes, 5: 3-30.
—— Turner, A. K., Wilson, J. P., Jenson, S., and Band, L. (1993). GIS and land-surface-subsurface process modelling. In M. F. Goodchild, B. O. Parks, and L. T. Steyaert (eds.), Environmental Modeling with GIS. Oxford University Press, New York, pp. 196-230.
Morgan, R. P. C., Morgan, D. D. V., and Finney, H. J. (1984). A predictive model for the assessment of soil erosion risk. Journal of Agricultural Engineering Research, 30: 245-53.
Morris, G. D., and Heerdegen, R. G. (1988). Automatically derived catchment boundaries and channel networks and their hydrological applications. Geomorphology, 1: 131-41.
Morrison, J. L. (1971). Method-produced error in isarithmic mapping. Technical Monograph No. CA-5, American Congress on Surveying and Mapping, Washington, DC.
Mounsey, H. M. (1991). Multisource, multinational environmental GIS: lessons learnt from CORINE. In D. J. Maguire, M. F. Goodchild, and D. W. Rhind (eds.), Geographical Information Systems, ii: Applications. Longman Scientific and Technical, Harlow, pp. 185-200.
Nagy, G., and Wagle, S. G. (1979). Approximation of polygonal maps by cellular maps. Comm. Ass. Comput. Mach., 22: 518-25.
National Research Council (1994). Promoting the National Spatial Data Infrastructure through Partnerships. National Academy Press, Washington, DC.
Nichols, R. L., Looney, B. B., and Huddleston, J. E. (1992). 3-D digital imaging. Environmental Science and Technology, 26: 642-9.
Nielsen, D. R., and Bouma, J. (1985). Spatial Analysis of Soil Data. PUDOC, Wageningen.
Norbeck, S., and Rystedt, B. (1972). Computer Cartography. Studentlitteratur, Lund, University of Lund, 315 pp.
Nortcliff, S. (1978). Soil variability and reconnaissance soil mapping: a statistical study in Norfolk. Journal of Soil Science, 29: 403-18.
Odeh, I. O. A., McBratney, A. B., and Chittleborough, D. J. (1990). Design of optimal sample spacings for mapping soil using fuzzy-k-means and regionalized variable theory. Geoderma, 47: 93-122.
Okx, J. P., and Kuipers, B. R. (1991). 3-dimensional probability mapping for soil remediation purposes. In J.-J. Harts, H. F. L. Ottens, and H. J. Scholten (eds.), Proc. EGIS '91. EGIS Foundation, Utrecht, pp. 765-74.
Olea, R. A. (1991). Geostatistical Glossary and Multilingual Dictionary. International Association for Mathematical Geology, Studies in Mathematical Geology No. 3, Oxford University Press, New York, 177 pp.
Openshaw, S. (1977). A geographical solution to scale and aggregation problems in region-building, partitioning, and spatial modelling. Institute of British Geographers Transactions, NS 2: 459-72.
—— and Taylor, P. J. (1979). A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In N. Wrigley (ed.), Statistical Methods in the Spatial Sciences. Pion, London, pp. 127-44.
Oswald, H., and Raetzsch, H. (1984). A system for generation and display of digital elevation models. GeoProcessing, 2: 197-218.
Ozernoy, V. M., Smith, D. R., and Sicherman, A. (1981). Evaluating computerized geographic information systems using decision analysis. Interfaces, 11: 92-8.
Pannatier, Y. (1996). Variowin: Software for Spatial Data Analysis in 2D. Statistics and Computing, Springer Verlag, Berlin, 91 pp.
Parker, H. D. (1988). The unique qualities of a geographic information system: a commentary. Photogrammetric Engineering and Remote Sensing, 54: 1547-9.
Parratt, L. G. (1961). Probability and Experimental Errors. Wiley, New York, 255 pp.
Pavlidis, T. (1982). Algorithms for Graphics and Image Processing. Springer Verlag, Berlin.
Pebesma, E. (1996). Mapping Groundwater Quality in the Netherlands. Royal Dutch Geographical Association/Faculty of Geographical Sciences, Utrecht University.
—— and Wesseling, C. G. (1998). GSTAT, a program for geostatistical modelling, prediction and simulation. Computers and Geosciences, 24: 17-31.
Perkal, J. (1966). On the length of empirical curves. Discussion Paper 10, Michigan Inter-University Community of Mathematical Geographers, Ann Arbor.
Peuker, T. K. (now Poiker), and Chrisman, N. (1975). Cartographic data structures. The American Cartographer, 2(1): 55-69.
—— Fowler, R. J., Little, J. J., and Mark, D. M. (1978). The triangulated irregular network. In Proc. of the DTM Symposium, American Society of Photogrammetry/American Congress on Survey and Mapping, St Louis, Missouri, pp. 24-31.
Peuquet, D. J. (1977). Raster data handling in geographic information systems. In G. Dutton (ed.), Proc. 1st Int. Advanced Study Symposium on Topological Data Structures. Laboratory for Computer Graphics and Spatial Analysis, Harvard.
—— (1979). Raster processing: an alternative approach to automated cartographic data handling. Am. Cartogr., 6: 129-39.
—— (1984a). A conceptual framework and comparison of spatial data models. Cartographica, 21: 66.
—— (1984b). Data structures for a knowledge-based geographic information system. In Proceedings of the International Symposium on Spatial Data Handling, 20-4 Aug. 1984, Zurich, pp. 372-91.
—— and Marble, D. F. (eds.) (1990). Introductory Readings in Geographic Information Systems. Taylor & Francis, London, 371 pp.
Piwowar, J. M., LeDrew, E. F., and Dudycha, D. J. (1990). Integration of spatial data in vector and raster formats in geographical information systems. International Journal of Geographical Information Systems, 4: 429-44.
Poggio, T. (1984). Vision by man and machine. Scientific American, 250(4): 68-78.
Poiker, T. K. (formerly Peuker) (1982). Looking at computer cartography. GeoJournal, 6: 241-9.
Powell, B., McBratney, A. B., and MacLeod, D. A. (1991). The application of fuzzy classification to soil pH profiles in the Lockyer Valley, Queensland, Australia. Catena, 18: 409-20.
Quinn, P., Beven, K., Chevalier, P., and Planchon, O. (1991). The prediction of hillslope flow paths for distributed hydrological modelling using digital terrain models. Hydrological Processes, 5: 59-79.
Quiroga, C. A., Singh, V. P., and Iyengar, S. S. (1996). Spatial data characteristics. In V. P. Singh and M. Fiorentino (eds.), Geographical Information Systems in Hydrology. Kluwer Academic, Dordrecht, pp. 65-89.
Raper, J., and Livingstone, D. (1995). Development of a geomorphological spatial model using object-oriented design. International Journal of Geographical Information Systems, 9: 359-84.
—— Rhind, D., and Shepherd, J. (1992). Postcodes: The New Geography. Longman Scientific and Technical, Harlow, 322 pp.
Rengers, N. (ed.) (1994). Engineering Geology of Quaternary Sediments. A. A. Balkema, Rotterdam.
Reumann, K., and Witkam, A. P. M. (1974). Optimizing curve segmentation in computer graphics. In International Computing Symposium 1973. North-Holland, Amsterdam, pp. 467-72.
Rhind, D. (1977). Computer-aided cartography. Trans. Inst. British Geographers, NS 2: 71-96.
—— and Green, N. P. A. (1988). Design of a geographical information system for a heterogeneous scientific community. International Journal of Geographical Information Systems, 2: 171-89.
Ripley, B. D. (1981). Spatial Statistics. Wiley, New York.
Ritsema van Eck, J. R. (1993). Analyse van transportnetwerken in GIS voor sociaal-geografisch onderzoek [Analysis of transport networks in GIS for socio-geographical research]. Netherlands Geographical Studies 164, Koninklijk Nederlands Aardrijkskundig Genootschap, University of Utrecht, 195 pp.
Ritter, P. (1987). A vector-based slope and aspect generation algorithm. Photogrammetric Engineering and Remote Sensing, 53: 1109-11.
RIVM (1994). The Preparation of a European Land Use Database. RIVM Report No. 712401001, Rijksinstituut voor Volksgezondheid en Milieuhygiene, Bilthoven, the Netherlands.
Rivoirard, J. (1994). Introduction to Disjunctive Kriging and Non-linear Geostatistics. Clarendon Press, Oxford.
Robinove, C. J. (1986). Principles of Logic and the Use of Digital Geographic Information Systems. US Geological Survey Circular 977, US Geological Survey, Denver, Colo., 19 pp.
Robinson, A. H., Sale, R., and Morrison, J. (1978). Elements of Cartography (4th edn.). Wiley, New York.
Rose, E. (1975). Érosion et ruissellement en Afrique de l'Ouest: vingt années de mesures en petites parcelles expérimentales [Erosion and runoff in West Africa: twenty years of measurements on small experimental plots]. Cyclo. ORSTOM, Abidjan, Ivory Coast.
Rosenfeld, A. (1980). Tree structures for region representation. In H. Freeman and G. G. Pieroni (eds.), Map Data Processing. Academic Press, New York, pp. 137-50.
—— and Kak, A. (1976). Digital Picture Processing. Academic Press, New York.
Rossiter, D. G. (1996). A theoretical framework for land evaluation. Geoderma, 72: 165-90.
Roubens, M. (1982). Fuzzy clustering algorithms and their cluster validity. Eur. J. Opt. Res., 10: 294-301.
Salge, F. (1997). From an understanding of the European GI economic activity to the reality of a European dataset. In Burrough and Masser 1997.
Salome, A. I., van Dorsser, H. J., and Rieff, P. L. (1982). A comparison of geomorphological mapping systems. ITC Journal, 1982(3): 272-4.
Salton, G., and McGill, M. J. (1983). Introduction to Modern Information Retrieval. McGraw-Hill, New York, 448 pp.
Samet, H. (1990a). Applications of Spatial Data Structures: Computer Graphics, Image Processing and Geographical Information Systems. Addison-Wesley, Reading, Mass.
—— (1990b). The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, Mass.
Scheidegger, A. E. (1970). Stochastic models in hydrology. Water Resources Research, 6: 750-5.
Schell, D. (1995a). What is the meaning of standards consortia? GIS World (Aug.), 82.
—— (1995b). Harnessing change. Editorial in OpenGIS, newsletter included in Geo Info Systems, May.
Scott, J. S. (1980). The Penguin Dictionary of Civil Engineering. Penguin Books, Harmondsworth, 308 pp.
Serra, J. (1968). Les structures gigognes: morphologie mathématique et interprétation métallogénique [Nested structures: mathematical morphology and metallogenic interpretation]. Mineral. Deposita (Berl.), 3: 135-54.
Sharpnack, D. A., and Atkins, G. (1969). An algorithm for computing slope and aspect from elevations. Photogrammetric Engineering and Remote Sensing, 35: 247-8.
Sheehan, D. E. (1979). A discussion of the SYMAP program. Harvard Library of Computer Graphics, Mapping Collection, ii: Mapping Software and Cartographic Databases, Cambridge, Mass., pp. 167-79.
Skidmore, A. K. (1989). A comparison of techniques for calculating gradient and aspect from a gridded digital elevation model. International Journal of Geographical Information Systems, 3: 323-34.
Smedley, B., and Aldred, B. K. (1980). Problems with geodata. In A. Blaser (ed.), Data Base Techniques for Pictorial Applications. Springer-Verlag, Berlin, pp. 539-54.
Smith, D. D., and Wischmeier, W. H. (1957). Factors affecting sheet and rill erosion. Trans. Am. Geophys. Union, 38: 889-96.
Smith, T. R., Menon, S., Starr, J. L., and Estes, J. E. (1987). Requirements and principles for the implementation and construction of large-scale geographic information systems. International Journal of Geographical Information Systems, 1: 13-31.
Sneath, P. H. A., and Sokal, R. R. (1973). Numerical Taxonomy. Freeman, San Francisco, 573 pp.
Soil Survey Staff (1951). Soil Survey Manual. USDA Handbook No. 18, US Govt. Printing Office, Washington, DC.
—— (1976). Soil Taxonomy. US Govt. Printing Office, Washington, DC.
Sokolnikoff, I. S., and Sokolnikoff, E. S. (1941). Higher Mathematics for Engineers and Physicists. McGraw-Hill, New York, 587 pp.
Solow, A. R. (1986). Mapping by simple indicator kriging. Mathematical Geology, 18: 335-51.
—— and Gorelick, S. M. (1986). Estimating monthly streamflow values by co-kriging. Mathematical Geology, 18: 785-809.
Steiner, D., and Matt, O. F. (1972). Computer Program for the Production of Shaded Choropleth and Isarithmic Maps on a Line Printer. User's Manual, Waterloo, Ont.
Steinitz, C., and Brown, H. J. (1981). A computer modelling approach to managing urban expansion. GeoProcessing, 1: 341-75.
Stevens, P. (ed.) (1988). Oil and Gas Dictionary. Macmillan Reference Books, London, 270 pp.
Stienstra, P., and van Deen, J. K. (1994). Field data collection techniques: unconventional sounding and sampling. In N. Rengers (ed.), Engineering Geology of Quaternary Sediments. Balkema, Rotterdam, pp. 41-56.
Stocking, M. (1981). A working model for the estimation of soil loss suitable for underdeveloped areas. Development Studies Occasional Paper No. 15, University of East Anglia, UK.
Switzer, P. (1975). Estimation of the accuracy of qualitative maps. In J. C. Davis and M. J. McCullagh (eds.), Display and Analysis of Spatial Data. Wiley, New York, 380 pp.
Takeyama, T., and Couclelis, H. (1997). Map dynamics: integrating cellular automata and GIS through Geo-Algebra. International Journal of Geographical Information Science, 11: 73-92.
Tang, H. J., and van Ranst, E. (1992). Testing of fuzzy set theory in land suitability assessment for rainfed grain maize production. Pedologie, 42(2): 129-47.
—— Debaveye, J., Ruan, D., and van Ranst, E. (1991). Land suitability classification based on fuzzy set theory. Pedologie, 41(3): 277-90.
Taylor, C. C., and Burrough, P. A. (1986). Multiscale sources of spatial variation in soil: III. Improved methods for fitting the nested model to one-dimensional semi-variograms. Mathematical Geology, 18: 811-21.
Taylor, J. R. (1982). An Introduction to Error Analysis. University Science Books, Oxford University Press, Oxford.
Teicholz, E., and Berry, B. J. L. (1983). Computer Graphics and Environmental Planning. Prentice Hall, Englewood Cliffs, NJ.
Ter Steeg, H., Jetten, V., Polak, M., and Werger, M. (1993). Tropical rainforest types and soils of a watershed in Guyana, South America. Journal of Vegetation Science, 4: 705-16.
Thacher, P. (1996). 'Sustainable Development' needs GIS. Unpublished keynote address, Tenth Annual GIS World Symposium on Geographic Information Systems, Vancouver, British Columbia, Canada, 20 Mar. 1996.
Thapa, K., and Bossler, J. (1992). Accuracy of spatial data used in geographic information systems. Photogrammetric Engineering and Remote Sensing, 58: 841-58.
Tobler, W. (1979). Smooth pycnophylactic interpolation for geographic regions. J. American Statistical Association, 74(367): 519-36.
—— (1995). The resel-based GIS. International Journal of Geographical Information Systems, 9: 95-100.
—— Deichmann, U., Gottsegen, J., and Maloy, K. (1995). The Global Demography Project. NCGIA Technical Report 95-6, National Center for Geographic Information and Analysis, University of California, Santa Barbara, 75 pp.
Tomlin, C. D. (1983). A map algebra. In Proc. Harvard Computer Conf. 1983, 31 July-4 Aug., Cambridge, Mass.
—— (1990). Geographic Information Systems and Cartographic Modeling. Prentice Hall, Englewood Cliffs, NJ, 249 pp.
Tomlinson, R. F. (1984). Geographic information systems: a new frontier. Proceedings of the International Symposium on Spatial Data Handling, 20-4 Aug. 1984, Zurich, pp. 1-14.
—— and Boyle, A. R. (1981). The state of development of systems for handling natural resources inventory data. Cartographica, 18: 65-95.
—— Calkins, H. W., and Marble, D. F. (1976). Computer Handling of Geographic Data. UNESCO, Geneva.
Travis, M. R., Elsner, G. H., Iverson, W. D., and Johnson, C. G. (1975). VIEWIT: computation of seen areas, slope and aspect for land use planning. US Department of Agriculture Forest Service General Technical Report PSW 11/1975, Pacific Southwest Forest and Range Experimental Station, Berkeley, Calif.
Trier, O. D., and Taxt, T. (1994). Evaluation of binarization methods for utility map images. Proceedings ICIP-94, Austin, Texas, vol. ii, pp. 1046-50.
Tucker, C. J. (1979). Red and photographic infrared linear combinations for monitoring vegetation. Remote Sensing of Environment, 8: 127-50.
Ullman, J. D. (1980). Principles of Database Systems. Com-
van Roessel, J. W., and Fosnight, E. A. (1984). A relational approach to vector data structure conversion. In Proc. IGU Int. Symp. on Spatial Data Handling, 20-4 Aug., Zurich, pp. 78-95.
Varekamp, C., Skidmore, A. K., and Burrough, P. A. (1996). Spatial interpolation using public domain software. Photogrammetric Engineering and Remote Sensing, 62: 845-54.
Vink, A. P. A. (1963). Planning of soil surveys in land development. Pub. Int. Inst. Land Reclamation and Improvement 10, 50 pp.
—— (1981). Landschapsecologie en landgebruik [Landscape ecology and land use]. Scheltema en Holkema, Amsterdam.
Voltz, M., and Webster, R. (1990). A comparison of kriging, cubic splines and classification for predicting soil properties from sample information. Journal of Soil Science, 41: 473-90.
Voss, R. (1985). Random fractal forgeries: from mountains to music. In S. Nash (ed.), Science and Uncertainty. IBM UK Ltd/Science Reviews, Northwood, England, pp. 69-88.
Vriend, S. P., van Gaans, P. F. M., Middelburg, J., and de Nijs, A. (1988). The application of fuzzy c-means cluster analysis and non-linear mapping to geochemical datasets: examples from Portugal. Appl. Geochem., 3: 213-24.
Walling, D. E., and Bradley, S. B. (1988). Transport and redistribution of Chernobyl fallout radionuclides by fluvial processes: some preliminary evidence. Environmental Geochemistry and Health, 10: 35-9.
—— Rowan, J. S., and Bradley, S. B. (1989). Sediment-associated transport and redistribution of Chernobyl fallout radionuclides. Sediment and the Environment (Proc. Baltimore Symposium 1989), IAHS Publication
puter Science Press, Potomac, Md. no. 184, pp. 37-45.
UNEP/FAO (1994). Report of the UNEP/FAO Expert Wang, F., Hall, G. B., and Subaryono (1990). Fuzzy in¬
Meeting on harmonizing land cover and land use formation representation and processing in conventional
classifications. GEMS Report series No. 25, UNEP/FAO GIS software: database design and application. Inter¬
Nairobi. national Journal of Geographical Information Systems, 4:
van Deursen, W. P. A. (1995). Geographical Information 261-83.
Systems and Dynamic Models. Ph.D. thesis, Utrecht Uni¬ Webber, R. (1997). Developments in cross-border stand¬
versity, NGS Publication 190, 198 pp. ards for Geodemographic segmentation. In Burrough and
-and Burrough, P. A. (1998). Dynamic modelling with Masser 1997 (in press).
GIS. Oxford (forthcoming). Weber, W. (1978). Three types of map data structures, their
van Oosterom, P. (1993). Reactive Data Structures for ANDs and NOTs, and a possible OR. G. Dutton (ed.),
Geographic Information Systems. Oxford University Press, Harvard Papers on Geographic Information Systems: First
Oxford. International Advanced Study Symposium on Topological
van Reeuwijk, P. (1982). Laboratory methods and data Data Structures for Geographic Information Systems, vol.
quality. Program for soil characterization: a report on the iv. Laboratory for Computer Graphics and Spatial Ana¬
pilot round. Part I. CEC and texture. Proc. 5th Int. Classi¬ lysis, Graduate School of Design, Harvard University.
fication Workshop, Khartoum, Sudan, Nov. 1982, 58 pp. Webster, R. (1968). Fundamental objections to the 7th
-(1984). Idem: Part II. Exchangeable bases, base satu¬ approximation. Journal of Soil Science, 19: 354-66.
ration and pH. International Soil and Reference Centre, -(1973). Automatic soil boundary location from
Wageningen, 28 pp. transect data. Mathematical Geology, 5: 27-37.
325
References
Webster, R. (1977). Quantitative and Numerical Methods -Karssenberg, D., Burrough, P. A., and van
in Soil Classification and Survey. Oxford University Press, Deursen, W. P. A. (1996). Integrating dynamic environ¬
Oxford, 269 pp. mental models in GIS: the development of a Dynamic
-(1978). Optimally partitioning soil transects. Journal Modelling language. Transactions in GIS, 1: 40-8.
of Soil Science, 29: 388-402. Whitten, D. G. A., and Brooks, J. R. V. (1972). The Pen¬
guin Dictionary of Geology. Penguin Books, Harmonds-
-(1980). DIVIDE: A FORTRAN IV program for seg¬
worth, 511 pp.
menting multivariate one-dimensional spatial series.
Computers and Geosciences, 6: 61-8. Whittington, R. P. (1988). Database Systems Engineering.
Oxford University Press, Oxford.
-and Beckett, P. H. T. (1970). Terrain classification
and evaluation using air photography: a review of recent Whittow, J. (1984). The Penguin Dictionary of Physical
work at Oxford. Photogrammetrica, 26: 51-75. Geography. Penguin Books, Harmondsworth, 591 pp.
-and Burgess, T. M. (1984). Sampling and bulking Wielemaker, W. G., and Boxem, H. W. (1982). Soils of
strategies for estimating soil properties in small regions. the Kisii area, Kenya. Agricultural Research Report 922,
Journal of Soil Science, 35: 127-40. PUDOC/Agricultural University, Wageningen.
-and McBratney, A. B. (1987). Mapping soil fertility Wilding, F. P., Jones, R. B., and Schafer, G. M. (1965).
at Broom’s Barn by simple kriging. J.Sci.Food Agric. 38: Variation of soil morphological properties within Miami,
97-115. Celina and Crosby mapping units in West-Central Ohio.
Proc. Soil Sci. Soc. Am. 29(6): 711-17.
Webster, R., and Oliver, M. A. (1990). Statistical Methods
in Soil & Land Resources Survey, Oxford University Press, Wirth, N. (1976). Algorithms + Data = Structures. Prentice-
Oxford. Hall, Englewood Cliffs, NJ.
Wischmeier, W. H., and Smith, D. D. (1978). Predicting
-and Wong, I. F. T. (1969). A numerical procedure for
Rainfall Erosion Losses. Agricultural Handbook 537,
testing soil boundaries interpreted from air photographs.
USDA, Washington DC.
Photogrammetria, 24: 59-72.
Worboys, M. F. (1994). Object oriented approaches to
Weibel, R. et al. (1998). Digital Elevation Models. Oxford
geo-referenced information, International Journal of Geo¬
University Press, Oxford (in press).
graphical Information Systems, 8: 385-99.
Weiden, M. J. J. van der, van Gaans, P. F. M., and Vriend,
-(1995). GIS—a Computing Perspective. Taylor 8c
S. P. (1992). Characterization of contaminated harbour
Francis, Fondon.
soil with statistical tools. Draft proceedings First Con¬
ference of the Working Group on Pedometrics of the -Hearnshaw, H. M., and Maguire, D. J. (1990).
International Society of Soil Science, Pedometrics-92: Object-oriented data modelling for spatial databases.
Developments in Spatial Statistics for Soil Science, International Journal of Geographical Information Systems,
pp. 191-209. 4: 369-83.
Weizenbaum, J. (1976). Computer Power and Human Yager,R. R., and Filev, D. P. (1994). Essentials of Fuzzy
Reason. W. H. Freeman, San Francisco. Modeling and Control. John Wiley, New York.
Wel, F. J. M. van der, andHooTSMANS, R. M. (1993). Visu¬ Yearsley, C. M., and Worboys, M. F. (1995). A deductive
alisation of quality information as an indispensible part model of planar spatio-temporal objects. In P. Fisher
of optimal information extraction from a GIS. Proc. 16th (ed.), Innovations in GIS 2. Taylor & Francis, Fondon,
International Cartographic Conference, Cologne, 2-9 May pp. 43 ff.
1993, pp. 881-97. P. (1982). Ueber digitale Gelandemodelle und deren
Yoeli,
Wellar, B., and Wilson, P. (1993). Contributions of computergesttitzte kartographische und kartometrische
GIS concepts and capabilities to scientific enquiry: initial Auswertung. Vermess. Photogramm. Kulturtech. 2/82: 34-
findings. GIS/LIS Proceedings 1993, pp. 753-67. 9.
-(1995). Impact of GIS on spatial theorizing: a -(1984), Cartographic contouring with computer and
status and progress report. Proceedings GIS/US ’95, ii. plotter. American Cartographer, 11: 139-55.
1026-36. Zadeh, F. A. (1965). Fuzzy sets. Information and Control,
Wentworth, C. K. (1930). A simplified method for deter¬ 8(3): 338-53.
mining the average slope of land surfaces. Am. J. Sci. Zevenbergen, F. W., and Thorne, C. R. (1987). Quant¬
Series, 5,20(117): 184-94. itative analysis of land surface topography. Earth Surface
Wesseling, C. G., and Heuvelink, G. B. M. (1991). Semi¬ Processes and Landforms, 12: 47-56.
automatic evaluation of error propagation in GIS opera¬ Zonneveld, J. I. S. (1973). Convergence en luchfoto-
tions. In }. Harts et al. (eds.), Proceedings EGIS ’91 interpretatie. K.N.A.G. Geografisch Tijdschrift, 7(1): 38-
(Utrecht: EGIS Foundation), 1228-37. 48.
326
Index
continuous fields 10, 20-5, 27, 99, 183 ff.
contour 3-4, 100, 126-8, 300
convolution 99, 128, 300
coordinate system 20, 76-7
copyright issues 227, 296
costs and quality 220 ff., 252-5, 262
crisp sets 268
cross validation 142
crossover point 270, 300
cursor 85-6, 300

data:
  access 42-4
  analysis model 162
  collection, ad hoc 80
  in the computer 41 ff.
  editing 91
  exchange, see interoperability
  formatting, see interoperability
  input 13-14, 39 ff., 84, 302
  link 96-7, 300
  model 21-6, 36, 39, 300; areas 22; lines 22; networks 22; points 22; polygons 22; raster 22-4; vector 24-6; volumes 39
  modelling and spatial analysis 29
  organization 51; in raster form 51 ff.; in vector form 57 ff.
  output 13-14
  plane 51-2
  presentation 13, 92
  provider 79-81
  quality 223, 241
  records 44, 45
  reliability 225-30
  retrieval 14-15
  storage 13-14, 41-4, 95-6, 303
  structure, spatial: raster 40-1, 71-2; vector 40, 71-2
  structuring 51, 92
  transformation 13-15, 105, 107
  transfer, see interoperability
  types 27-8, 300
  updating 95
  verification 89
  volumes 39
database 12, 71-3, 90-2
  indexing 67-9
  Management System (DBMS) 14, 50, 300
  structures: hierarchical 44, 51; network 46-7, 51; object orientation 48-50, 51, 72-3; relational 47-8, 51, 73-4
dead ends 60
Delaunay triangulation 23, 114, 300
differentiable continuous surface 25, 27, 190, 300
Digital Chart of the World 84
digital data:
  acquiring 81
  characteristics of 81
  compatibility issues 82
  data sets 84-92
  sources and sampling 76, 100-1, 295-6, 307-8
  transfer formats 81-2
Digital Elevation Model (DEM):
  altitude matrix 122, 190 ff., 298
  block diagrams 99
  created by interpolation 121 ff.; from digitized contours 100, 126-8; from satellite imagery 129-31; from stereophotos 129-31
  data sources 76, 79, 100
  derived products 190-8, 218, 244-7; detecting ridges and streams 197; determining catchment boundaries 197; drainage networks 194, 275-7; irradiance maps 202-3; line of sight maps 200; slope, aspect, convexity 190-2; shaded relief 200-2
  RMS errors and derived attributes 244-7
  TIN 24, 64-6, 99, 122-5, 193, 306
digital terrain model (DTM), see DEM
digitizing 84-6, 300
digitizers:
  accuracy 86
  programs 86
dilation, see buffering
DIME 61
direct files 42
Dirichlet tessellation, see Thiessen polygons
discretization 24, 300
document scanners, see scanners
double precision 38, 300
DPI (dots per inch) 94, 300
drainage network algorithms 194
draping 93, 99

ellipsoids 76-7, 301
entities 20-6, 58-9, 301
epsilon, see Perkal
errors:
  accessibility of data 227
  age of data 225
  areal coverage 225
  arising from overlay and intersection 237-9
  associated with combining attributes 247-8
  associated with digitizing 89, 234-6
  associated with point-in-polygon search 236-7
  associated with support sizes 242-3
  and costs of data 227
  data format 226
This new edition of the best-selling Principles of Geographical Information Systems for Land Resources
Assessment has been completely revised and brought up to date.
■ New examples, including site location analysis, marketing, route optimization, land degradation, and
the distribution of Chernobyl radionuclides
■ Special attention to data quality and to the principles and use of fuzzy logic
Using real-world examples, this new book provides a comprehensive yet concise introduction to the
theory and practice of GIS for undergraduates and professionals alike, in disciplines ranging from
hydrology to epidemiology and from spatial planning to agriculture.
'everything you need to know about the evolution of GIS in its third decade ... my recommended
course text for many years to come.'
Professor David Unwin, Birkbeck College, University of London
'this is a book written by two scientists who use GIS every day... That makes it an ideal text for
courses in environmental science for students who need to know the principles on which GIS
operates'
Professor Michael Goodchild, University of California, Santa Barbara
9780198233657
OXFORD UNIVERSITY PRESS