

OVERLAY ANALYSIS: AREAS AND SURFACES


TABLE OF CONTENTS
1 The importance of overlay in GIS
2 Learning objectives
3 Area and surface overlay analysis
3.1 A brief history
3.2 Field and feature perspectives
4 Polygon-on-polygon overlay operations
4.1 Topological issues
4.2 Creating new geometries
4.3 Weighted overlay and the vector data model
4.4 Problems with overlay on vector data model
4.4.1 Computational demands
4.4.2 Sliver polygons
5 Area-on-area overlay operations on raster data
5.1 Overlay analysis on raster data is easy!
e-Tutorial Exercise 1
5.2 Difficulties of overlay on raster data models
5.2.1 What do the numbers mean?
e-Tutorial Exercise 2
5.2.2 Constructing area entities
5.2.3 Cell resolution
6 Some more issues in overlay analysis
6.1 Scales of measurement
6.2 Scale and overlay analysis
7 What have you learnt in this lesson?

2005 Nigel Trodd


1 The importance of overlay in GIS


Overlay is a fundamental spatial operation. It is one of the functions that distinguishes GIS from other systems such as CAD and DBMS. The UK Chorley Report (Department of the Environment, 1986) illustrates what a GIS should be able to do by giving the example of an industrial siting case study that uses overlay to 'sieve' the various siting criteria and identify suitable locations. Overlay operators combine data from the same entity type or different entity types. In both cases they create new geometries and can change entity type and/or attribute value. There are four overlay operators in common use:

- point-in-area (also known as point-in-polygon)
- line-in-area
- area-on-area (also known as polygon-on-polygon)
- weighted overlay

In this lesson we will concentrate on two of these operators, namely area-on-area and weighted overlay, which process area and surface entity types respectively. You will find that overlay techniques vary with the data model employed by your GIS, so the results of overlay analysis depend on the data model. In general, techniques to overlay vector data are time-consuming and computationally intensive, whereas overlay of raster data is relatively straightforward, quick and efficient.

2 Learning objectives
Upon completion of this lesson you should be able to:

Identify and explain techniques to perform area-on-area overlay on raster and vector data.

Identify and explain techniques to perform weighted overlay on raster and vector surfaces.

Understand the main weaknesses of these overlay operations as they are implemented in GIS.


3 Area and surface overlay analysis


3.1 A brief history
The principles of area-on-area overlay pre-date GIS. Until the arrival of GIS, map overlay analysis was performed manually by superimposing transparent acetates of map layers on a light table. The stack of acetates was used to visually identify sites that met a number of criteria. In the 1960s the 'quantitative revolution' heralded a new era for spatial analysis. Several influential figures emerged, including Ian McHarg, whose work in landscape ecology is best known for its attempt to explain the distribution of plants by combining information about the environment. His approach was to apply sieve-mapping techniques. This was a substantial step forward in computational spatial analysis, allowing considerably more work to be performed using the computer than was originally possible from field observation and single-coverage cartography alone (Simpson, 1989). At the same time the first GIS prototypes were being developed, and it is no coincidence that early products were designed, in part, to automate this 'sieve mapping'. Speedy and efficient analytical techniques such as these were of particular interest to governments intent on examining spatial relationships across large regions. Work done under the auspices of the Canadian government in developing CGIS (the Canada Geographic Information System) was largely responsible for increasing the prevalence of polygon overlay in GIS.

3.2 Field and feature perspectives


Overlay analysis has generally taken the form of either area-on-area overlay or weighted overlay, depending on your perspective. Area-on-area overlay is concerned with the analysis of particular features and adopts a discrete object (feature) perspective. Its objectives are to determine whether two features overlap (the technical term is to 'intersect') and, if so, to define the identity of the areas formed by the overlap as one or more new area objects. Weighted overlay, by contrast, reflects a field perspective: it combines two or more complete map layers consisting of areas or surfaces. In addition to computing the identity of the new geometries, the objective of weighted overlay is to compute new attribute values. Because the operation processes the complete data set, boundaries from all inputs are retained but broken into shorter fragments wherever boundaries in one input dataset intersect boundaries in another. In both area-on-area and weighted overlay the output entity type is an area or surface respectively, but the overlay operation has generated new geometries and new attributes.

4 Polygon-on-polygon overlay operations


4.1 Topological issues
If you are overlaying two vector map layers, you need to ensure before you start that the input map layers are topologically correct. If they are, then the output maps will also be topologically correct (Figure 1).

Figure 1. Polygon-on-polygon overlay (panels: polygon-on-polygon overlay; new geometries).

In polygon overlay it is necessary to add new intersections (nodes) and create new polygons to retain topology. Overlaying two sets of polygons can produce a large number of new polygons and increase the number of nodes and arcs. In Figure 2, for example, the number of nodes increased by 75% and the number of arcs by 83%. Warning! Increasing the number of input data sets can rapidly increase the number of output features. The algorithms used to compute the location of new nodes are the same as those used for line intersection. Once the new nodes have been identified, the arcs are split at those nodes and the new topology is constructed.
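The node-finding step can be sketched in Python. This is a minimal illustration, assuming straight arc segments given as (x, y) tuples; the function name `segment_intersection` is hypothetical and not taken from any GIS package. It solves the parametric form of the two segments and returns the crossing point that would become a new node, or None.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]

def segment_intersection(p1: Point, p2: Point, q1: Point, q2: Point,
                         eps: float = 1e-12) -> Optional[Point]:
    """Return the point where segment p1-p2 crosses segment q1-q2, or None.

    Solves p1 + t*(p2 - p1) = q1 + u*(q2 - q1) using 2-D cross products.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx              # cross(r, s)
    if abs(denom) < eps:                   # parallel or collinear: no single node
        return None
    qpx, qpy = q1[0] - p1[0], q1[1] - p1[1]
    t = (qpx * sy - qpy * sx) / denom      # position along p1-p2 (0..1)
    u = (qpx * ry - qpy * rx) / denom      # position along q1-q2 (0..1)
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None

# Two arcs crossing in an X: the new node is at (1, 1)
print(segment_intersection((0, 0), (2, 2), (0, 2), (2, 0)))
```

Each arc would then be split at every node returned for it. Collinear, overlapping segments (for example, a shared boundary) return None here and would need separate handling in a real overlay.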

4.2 Creating new geometries


Once the new set of nodes, arcs and polygons has been created, the task is to extract a meaningful set of polygons. It may be desirable to retain only the area that is common to both input features. For example, a farmer wants to know which part of a field has a loam soil. He can overlay the map of loam soil polygons on the field polygon to extract a feature that meets both criteria (loam soil AND in-field).

Figure 2. Creating geometries in overlay operations on the vector data model.

Figure 3. Polygon overlay: intersection (polygon a AND polygon b; the new feature geometry is shown with old boundaries dissolved).

It is worth noting that the variables the farmer is processing are both of categorical (or nominal) data type. This matters because mathematicians have developed a suite of operators for analysing such data, known as Boolean operators, which GIS analysts exploit in area-on-area overlay analysis.
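The farmer's loam AND in-field query can be illustrated with the Sutherland-Hodgman clipping algorithm and the shoelace area formula. This is a swapped-in textbook technique chosen for brevity, not the algorithm any particular GIS uses: it assumes the clip polygon (here the loam polygon) is convex with counter-clockwise vertices, whereas production overlay must also handle non-convex polygons and holes. The coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def shoelace_area(poly: List[Point]) -> float:
    """Area of a simple polygon from its vertex list (shoelace formula)."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def clip_polygon(subject: List[Point], clip: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: return the part of `subject` inside `clip`.

    Assumes `clip` is convex with vertices in counter-clockwise order.
    """
    def inside(p: Point, a: Point, b: Point) -> bool:
        # p is on the left of (or on) the directed clip edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p: Point, q: Point, a: Point, b: Point) -> Point:
        # where segment p -> q meets the infinite line through a and b
        dx, dy = q[0] - p[0], q[1] - p[1]
        ex, ey = b[0] - a[0], b[1] - a[1]
        t = (ex * (a[1] - p[1]) - ey * (a[0] - p[0])) / (ex * dy - ey * dx)
        return (p[0] + t * dx, p[1] + t * dy)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        candidates, output = output, []
        if not candidates:                 # nothing left inside: no overlap
            break
        prev = candidates[-1]
        for cur in candidates:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(crossing(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(crossing(prev, cur, a, b))
            prev = cur
    return output

# Hypothetical coordinates: a rectangular field and a convex loam patch
field = [(0, 0), (4, 0), (4, 3), (0, 3)]
loam = [(2, 1), (6, 1), (6, 5), (2, 5)]
loam_in_field = clip_polygon(field, loam)
print(shoelace_area(loam_in_field))   # area of loam soil inside the field
```

Only the area common to both inputs survives, which is exactly the AND result shown in Figure 3.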


In the example the farmer was analysing two criteria and applied an algorithm to create a new geometry that met both criteria: the area of intersection, polygon a AND polygon b. In other situations you may be more interested in features that meet either criterion (polygon a OR polygon b). This algorithm is known as union, and its effect is to retain all parts of both input polygons in the output feature. Other Boolean operators frequently used in GIS are NOT and XOR.

Figure 4. Polygon overlay: (a) Union, (b) NOT and (c) XOR.
(a) polygon a OR polygon b: union.
(b) polygon a NOT polygon b: only the parts of polygon a that are outside polygon b.
(c) polygon a XOR polygon b: the inverse of intersection within the union, i.e. (polygon a OR polygon b) NOT (polygon a AND polygon b).

As well as the mathematical rigour conveyed by the use of Boolean operators, a strength of basic polygon overlay is that it is intuitive when applied to a vector data model, because we are handling discrete area objects and nominal attributes.
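The area relationships between the four Boolean operators can be checked with a small worked example. This sketch assumes axis-aligned rectangles written as (xmin, ymin, xmax, ymax), so that the intersection area is easy to compute; the other results then follow from area(a OR b) = area(a) + area(b) - area(a AND b), area(a NOT b) = area(a) - area(a AND b) and area(a XOR b) = area(a OR b) - area(a AND b). The function names are hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)

def rect_area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def boolean_areas(a: Rect, b: Rect) -> Dict[str, float]:
    """Output area of each Boolean overlay of two axis-aligned rectangles."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter = rect_area(overlap)                          # zero if they do not meet
    return {
        "AND": inter,                                   # intersection
        "OR": rect_area(a) + rect_area(b) - inter,      # union: overlap counted once
        "NOT": rect_area(a) - inter,                    # parts of a outside b
        "XOR": rect_area(a) + rect_area(b) - 2 * inter, # union minus intersection
    }

print(boolean_areas((0, 0, 4, 3), (2, 1, 6, 5)))   # AND 4, OR 24, NOT 8, XOR 20
```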

4.3 Weighted overlay and the vector data model


In basic area-on-area overlay on a vector data model, the objective was to identify one or more parts of the new geometry that met simple criteria. Areas that did not meet the criteria were discarded, and this was processed as a single task. The objective of weighted overlay is to calculate a new set of values for the complete coverage based on a combination of input values. When working with a vector data model there are two tasks to perform: (i) create a new set of geometries for the entire area, and (ii) compute a new set of attributes for those geometries. The second task is a matter of specifying a mathematical equation to process the input values. The first task, however, requires you to extend the basic polygon overlay operation to consider every intersection between all polygons in every data layer. As you can imagine, this can be computationally demanding, especially if the GIS you are using computes topology 'on the fly' and does not store it in the data structure. As we shall see, this is one of the reasons why weighted overlay is more frequently applied to a raster data model.
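Task (ii), computing the new attributes, can be sketched as a weighted sum applied to each polygon produced by task (i). The layer names, scores and weights below are hypothetical, and a weighted linear combination is only one possible equation; it also assumes the input values are interval-scale scores (see 6.1), since adding nominal codes would be meaningless.

```python
from typing import Dict, List

def weighted_overlay(features: List[Dict[str, float]],
                     weights: Dict[str, float]) -> List[float]:
    """New attribute for each overlay polygon: sum of weight * input value.

    Each feature holds the values it inherited from every input layer;
    weights maps layer name -> weight and is assumed to sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return [sum(w * f[layer] for layer, w in weights.items()) for f in features]

# Hypothetical interval-scale scores for two polygons created by the overlay
polygons = [
    {"slope": 3, "soil": 5, "access": 1},
    {"slope": 1, "soil": 2, "access": 4},
]
weights = {"slope": 0.5, "soil": 0.3, "access": 0.2}
print(weighted_overlay(polygons, weights))   # approximately [3.2, 1.9]
```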

4.4 Problems with overlay on vector data model


4.4.1 Computational demands

The data file produced by polygon overlay may be considerably larger than the originals because lines have been split into smaller segments and new nodes and polygons have been created. Although more file space is required to store the outputs, a more common problem is that some implementations of polygon overlay require large amounts of memory or temporary file space to hold intermediate products during processing. As a result, most GIS are limited in the number of polygons they can handle in a polygon overlay operation. Larger map layers also take longer to process, so it is prudent to develop a strategy to minimise processing time (and memory use). A data processing strategy is particularly important if your GIS has to compute topology 'on the fly' (e.g. ESRI ArcView 3 and ArcGIS 8), because this increases the computational demands. My advice is to design your analysis so that the smallest possible number of features is overlaid.

Example: generate information on the area of coniferous forest in Bavaria.

Poor strategy:
1. Intersect all states in Germany with all land cover types (wait three hours...).
2. Select by attribute to extract coniferous forests in Bavaria.

Smart strategy:
1. Select Bavaria.
2. Reclass by attribute all coniferous forest land cover (and all other non-coniferous forest land cover).
3. Intersect reclassified coniferous forest with Bavaria (wait 5 mins...).
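Why the smart strategy pays off can be seen by counting the polygon pairs a brute-force intersect would examine. This is a toy sketch: the layers are hypothetical (16 states, 1,000 land cover polygons of which 100 are coniferous), it selects coniferous polygons instead of reclassifying and dissolving them, and real GIS use spatial indexes, so actual savings will differ.

```python
from typing import Dict, List

Feature = Dict[str, str]

def select_by_attribute(features: List[Feature], field: str, value: str) -> List[Feature]:
    """Keep only the features whose attribute `field` equals `value`."""
    return [f for f in features if f[field] == value]

def pair_tests(layer_a: List[Feature], layer_b: List[Feature]) -> int:
    """Polygon pairs a brute-force intersect examines (no spatial index)."""
    return len(layer_a) * len(layer_b)

# Hypothetical toy layers
states = [{"name": "Bavaria"}] + [{"name": f"state_{i}"} for i in range(15)]
land_cover = [{"type": "coniferous" if i % 10 == 0 else "other"} for i in range(1000)]

# Poor strategy: intersect everything, then select
poor = pair_tests(states, land_cover)
# Smart strategy (simplified): select first, then intersect
smart = pair_tests(select_by_attribute(states, "name", "Bavaria"),
                   select_by_attribute(land_cover, "type", "coniferous"))
print(poor, smart)   # 16000 100
```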

The smart strategy requires an extra step in processing but substantially reduces the computational load.

4.4.2 Sliver polygons

Rogue or spurious polygons produced as a result of overlay are commonly known as sliver polygons. If you overlay two data sets containing the same area entities that were acquired from different sources, or digitised twice from the same source, you will almost certainly encounter such polygons.

Figure 5. Sliver polygons caused by digitising the same line twice.

The two versions of such boundaries will not be coincident, and as a result large numbers of small sliver polygons will be created by the polygon overlay process.

Figure 6. Sliver polygons along the boundaries of administrative units.


There are two approaches to eliminating them:
1. close them during processing;
2. eliminate them after processing.

Removing them automatically during processing is normally done using a user-defined tolerance: lines that lie closer together than the tolerance are merged. The analyst adjusts the tolerance to find an optimal solution. If the tolerance is too big, lines which are close together but actually separate may be joined.

Figure 7. Setting tolerance to close sliver polygons.
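The tolerance idea can be sketched as vertex snapping: each vertex of one version of a boundary is moved onto the nearest vertex of the other version if it lies within the tolerance. This is a simplified, hypothetical helper (real cluster-tolerance processing also snaps vertices to segments, not just to vertices); the example shows both a correct snap and the too-big-tolerance failure described above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def snap_vertices(line: List[Point], reference: List[Point],
                  tolerance: float) -> List[Point]:
    """Move each vertex of `line` onto the nearest `reference` vertex
    when it lies within `tolerance`; vertices further away are kept."""
    snapped = []
    for p in line:
        nearest = min(reference, key=lambda q: math.dist(p, q))
        snapped.append(nearest if math.dist(p, nearest) <= tolerance else p)
    return snapped

# Hypothetical boundary digitised twice (units: metres)
first = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
second = [(0.0, 0.3), (10.0, -0.2), (20.0, 0.1)]
print(snap_vertices(second, first, tolerance=0.5))      # coincides with `first`

# A genuinely separate line 3 m away is wrongly merged if the tolerance is too big
separate = [(0.0, 3.0), (10.0, 3.0)]
print(snap_vertices(separate, first, tolerance=5.0))    # [(0.0, 0.0), (10.0, 0.0)]
```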

The alternative is to remove slivers after processing. This may speed up the overlay processing itself, but it requires a degree of intelligence for a computer to be able to distinguish between real and sliver polygons. There are several differences between typical (real) polygons and sliver polygons.

Figure 8. Real and sliver polygons (panels: 'Real' polygons; Sliver polygons).

Property             'Real' polygons                                Sliver polygons
Size and shape       Vary                                           Generally small, long and thin
Bounding arcs        Generally more than two                        Generally only two
Attributes           Vary randomly between neighbouring polygons    May alternate between adjacent polygons
Arc intersections    Usually three                                  Generally four

Once the sliver polygons have been identified they can be closed by replacing them with a central line.
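The 'small, long and thin' test from the table of differences above can be approximated with an area threshold and a compactness index, 4πA/P², which is 1 for a circle and approaches 0 for a thin strip. The thresholds and function names below are hypothetical and would need tuning for real data; the other criteria (two bounding arcs, alternating attributes, four-arc intersections) require topology and are not checked here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def area_and_perimeter(poly: List[Point]) -> Tuple[float, float]:
    twice_area, perimeter = 0.0, 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        twice_area += x1 * y2 - x2 * y1            # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter

def is_sliver(poly: List[Point], max_area: float,
              min_compactness: float = 0.2) -> bool:
    """Flag polygons that are both small and long-and-thin."""
    area, perimeter = area_and_perimeter(poly)
    compactness = 4 * math.pi * area / perimeter ** 2   # 1 for a circle
    return area < max_area and compactness < min_compactness

square = [(0, 0), (10, 0), (10, 10), (0, 10)]      # compactness about 0.79
strip = [(0, 0), (10, 0), (10, 0.2), (0, 0.2)]     # compactness about 0.06
print(is_sliver(square, max_area=5.0), is_sliver(strip, max_area=5.0))  # False True
```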

5 Area-on-area overlay operations on raster data


5.1 Overlay analysis on raster data is easy!
If two grids are aligned and have the same grid cell size, overlay operations are relatively easy to perform. A new layer of values is produced from each pair of coincident cells. The values of these cells can be added, subtracted, divided or multiplied; the maximum value can be extracted, the mean value calculated, a logical expression computed, and so on. The output cell simply takes on a value equal to the result of the calculation.

e-Tutorial Exercise 1 (Time: 20 mins)
Let us return to Klinkenberg's excellent demonstrations of GIS operations:
http://www.geog.ubc.ca/courses/klink/java/java_examples.html
Use the Binary Overlays demonstration to investigate the effects of different Boolean operators on two layers.
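The cell-by-cell operations described in 5.1 can be sketched with plain Python lists standing in for two aligned grids of equal cell size. The function `local_overlay` is a hypothetical helper: it checks only that the grid dimensions match and assumes alignment and cell size have already been verified.

```python
import operator
from typing import Callable, List

Grid = List[List[int]]

def local_overlay(a: Grid, b: Grid, op: Callable[[int, int], int]) -> Grid:
    """Apply `op` to every pair of coincident cells in two grids.

    Only the grid dimensions are checked; alignment and equal cell size
    are assumed to have been verified beforehand.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same dimensions")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [2, 1]]
B = [[1, 1], [2, 2]]

added = local_overlay(A, B, operator.add)       # [[2, 3], [4, 3]]
maximum = local_overlay(A, B, max)              # [[1, 2], [2, 2]]
# Logical expression: 1 where both inputs are class 1, otherwise 0
both_one = local_overlay(A, B, lambda x, y: int(x == 1 and y == 1))
print(added, maximum, both_one)
```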

2005 Nigel Trodd

Figure 9. Some mathematical operators for overlay operations on the raster data model (input layers A and B; output layers C, D and E).
Simple addition: A + B = C
Multiplication: A * B = D
Unique conditions: if A = 1 and B = 1 then E = 1; if A = 2 and B = 1 then E = 2; if A = 1 and B = 2 then E = 3; if A = 2 and B = 2 then E = 4.
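The unique-conditions coding in Figure 9 follows the rule E = A + n_A(B - 1), where n_A is the number of classes in layer A; with n_A = 2 it reproduces the four codes listed. This is a sketch assuming both layers are coded from 1 upwards, with hypothetical function names; the decode function shows that each output value maps back to exactly one input combination.

```python
from typing import Tuple

def encode(a: int, b: int, n_a: int) -> int:
    """Unique-conditions code for class a of layer A and class b of layer B.

    Assumes A is coded 1..n_a and B is coded 1, 2, 3, ...
    """
    if not 1 <= a <= n_a or b < 1:
        raise ValueError("classes must be coded from 1 upwards, with a <= n_a")
    return a + n_a * (b - 1)

def decode(e: int, n_a: int) -> Tuple[int, int]:
    """Recover (a, b) from an output code produced by encode()."""
    return (e - 1) % n_a + 1, (e - 1) // n_a + 1

# Reproduce Figure 9 (two classes in each layer)
for a, b in [(1, 1), (2, 1), (1, 2), (2, 2)]:
    print(a, b, encode(a, b, n_a=2))   # E = 1, 2, 3, 4
```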

5.2 Difficulties of overlay on raster data models


The main problems are not technical GIS problems; they are data problems.

5.2.1 What do the numbers mean?

The simplicity of the operators makes the overlay process very easy to implement. Problems usually start when interpreting the outputs. For example, identifying an area that meets criteria on two inputs (intersection) can be done in one of two ways. The most logical approach is to reclass the cell values in each layer as either 0 or 1, to indicate whether or not they meet the criterion, and then multiply. Reclassifying two layers takes extra time, so many analysts will instead multiply the inputs and then reclassify the output. This reduces the effort but may not produce a set of unambiguous output values. Take the farmer who wants to identify the part of a field that has a loam soil: if the loam soil is coded 3 and the field number is 2, then he should look for cells in the output layer with a value of 6. The problem is that other combinations of inputs generate the same value, for example soil 6 in field 1, soil 1 in field 6, or soil 2 in field 3 (assuming those codes are in use). The problem is caused by the analyst, and in my experience it happens far too frequently, with results being published without anyone being aware of the consequences. Perhaps this is because of the widespread availability of such easy-to-use operators. The problem of an output layer not having unique values is not restricted to multiplication. Jenks has illustrated how the same problem can be caused by addition, and it is easy to show that the problem arises with all mathematical operators if the analyst is unaware of the meaning behind the data.

Figure 10. Ambiguities in the output of overlay operations on raster data.
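The farmer's ambiguity can be demonstrated by listing every input combination that multiplies to 6. The code ranges (soils and fields numbered 1 to 6) are hypothetical; the second function shows the reclass-then-multiply approach, which isolates loam (3) in field 2 and nothing else.

```python
from itertools import product
from typing import List, Tuple

SOIL_CODES = range(1, 7)    # hypothetical soil classes 1..6 (loam = 3)
FIELD_CODES = range(1, 7)   # hypothetical field numbers 1..6

def multiply_then_reclass(target: int) -> List[Tuple[int, int]]:
    """Every (soil, field) pair whose product equals `target`."""
    return [(s, f) for s, f in product(SOIL_CODES, FIELD_CODES) if s * f == target]

def reclass_then_multiply() -> List[Tuple[int, int]]:
    """Reclass each layer to 0/1 first (loam = 3, field = 2), then multiply."""
    return [(s, f) for s, f in product(SOIL_CODES, FIELD_CODES)
            if int(s == 3) * int(f == 2) == 1]

print(multiply_then_reclass(6))   # [(1, 6), (2, 3), (3, 2), (6, 1)]
print(reclass_then_multiply())    # [(3, 2)]
```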


e-Tutorial Exercise 2 (Time: 20 mins)
Let us visit Klinkenberg's excellent demonstrations of GIS operations again:
http://www.geog.ubc.ca/courses/klink/java/java_examples.html
Can you solve the problem posed in the CROSSTAB and reclassification demonstration? This requires an understanding of how new output values are created for each unique combination of input values.

5.2.2 Constructing area entities

A problem in the overlay analysis of raster data is the correct identification of area features in the output because, unlike polygon-on-polygon analysis on the vector data model, there is no intuitive geometry. Each cell is processed individually, and the analyst has to create new geometries based only on the new cell attributes. The operator does not distinguish between area entities and surface entities. The analyst is faced with at least two questions: (i) has the mathematical computation produced unambiguous cell values, i.e. does each value have a single, distinctive meaning? and (ii) should diagonally adjacent cells with the same value be part of the same area feature in the output, or should adjacency be defined only in terms of horizontal and vertical neighbouring cells?

5.2.3 Cell resolution

Resolution is the pixel, grid cell or mesh size of spatial data. For example, multispectral remotely sensed data from the SPOT satellite have a resolution of 20 m. This means that each pixel in the image represents a ground area of 20 m by 20 m. Imagine you wish to overlay a SPOT image with a raster representation of urban fox population densities which has been coded with a 10 m pixel size. Will your output have a resolution of 10 m, 20 m or 30 m?
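Question (ii) in the discussion of constructing area entities above can be made concrete with a flood fill that counts area features under 4-connectivity (horizontal and vertical neighbours, often called the rook's case) or 8-connectivity (diagonals included, the queen's case). This is a minimal sketch; `label_regions` is a hypothetical name, and it returns only the number of features rather than a labelled grid.

```python
from collections import deque
from typing import List

def label_regions(grid: List[List[int]], value: int, diagonal: bool) -> int:
    """Count area features made of connected cells equal to `value`."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]              # 4-connectivity
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]        # 8-connectivity
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1                      # a new area feature starts here
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:                    # flood-fill its connected cells
                cr, cc = queue.popleft()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == value):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count

grid = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]]
print(label_regions(grid, 1, diagonal=False))   # 3 separate features
print(label_regions(grid, 1, diagonal=True))    # 1 diagonal feature
```

The same output grid therefore yields a different number of area entities depending on the adjacency rule the analyst chooses.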



6 Some more issues in overlay analysis


6.1 Scales of measurement
GIS gives you immense flexibility in the way you can overlay raster data, probably too much flexibility for the casual user. There are some computations you can perform which simply do not make sense! For instance, imagine you have one map layer coded with different soil types given codes such as 1 (clay) or 5 (loam), and a second map layer with rainfall totals. It is perfectly possible to add, subtract, multiply etc. these two map layers, but in all cases the answers are nonsense. Why? Because the two sets of data have been collected using different scales of measurement. Rainfall is generally measured on what is known as a RATIO scale, whereas soil classes are NOMINAL or categorical data. You can add, subtract, multiply and divide ratio-scale numbers such as rainfall, but you cannot apply these operators to nominal-scale numbers. On a nominal scale the numbers allocated are simply labels: they may as well be the letters A to E. 5 on the soil scale is not five times larger than 1, nor is it one unit more than 4. In fact, the only thing we can say about different values on a nominal scale is that the property is 'different'. Therefore, multiplying soil type by rainfall, or adding it to rainfall, produces a meaningless result. So, although the GIS will let you perform these operations, it will not tell you when they produce meaningless answers. It is important, therefore, to know what scale of measurement has been used for your data. Scales of measurement are summarised below, together with the operations possible on each type of data. The problem is often associated with analysis on raster data models, because many vector GIS are supported by a database that can store categories as alphanumeric values rather than numeric codes; you should nevertheless be aware that knowledge of measurement scale is fundamental to any data processing work.


Scales of measurement summary

Nominal: such as ID number or soil type. Such numbers have no numerical meaning; they simply represent distinct categories. So cities given a reference number, or telephone numbers, are examples of measurements on a nominal scale. The only relationship between numbers on a nominal scale is one of identity.

Ordinal: such as positions in a competition, where the order is important. It is possible to rank data, but we know nothing about other numerical relationships between them. For example, we can rank cities in terms of their population totals, with city number 1 having the highest total. The city with rank 2 will not necessarily have a population half that of the city with rank 1, but we do know that its population is smaller than that of city 1.

Interval: such as temperature measured in degrees Celsius (centigrade) or Fahrenheit. There is no true zero, but intervals between integers are equal. Temperature data, in common with other interval data, can be added and subtracted (for example, to find the daily temperature range from the maximum and minimum), but we cannot say that 20 °C is twice as hot as 10 °C.

Ratio: such as distance. There is a true zero, negative values are not possible, intervals between numbers are equal and order is meaningful. Ratio-scale data can be added and subtracted and also have ratio properties. Thus we can say that 20/10 equals 30/15.

Each scale has the property described by its name and, below the nominal scale, has all the properties of the scale above it. The table below summarises the operations possible on the different types of measurement (adapted from Unwin, 1981):

Table 2. Scales of measurement.

Level      Basic operations                                   Examples
Nominal    Frequency (count); recognition of equality         Name (of person or road), address (postcode), house type (detached, semi-detached, terrace, apartment), colour
Ordinal    Determination of order (rank)                      Grade (A-B-C-D-E), tax band (High rate, Standard rate, Basic rate)
Interval   Addition, subtraction                              Temperature (degrees Celsius), date
Ratio      Addition, subtraction, multiplication, division    Distance, rainfall, income
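Table 2 can be encoded as a guard that a script checks before combining layers. This is a sketch under the assumption that the analyst records each layer's scale of measurement; the enum and dictionary are hypothetical, and the ordering NOMINAL < ORDINAL < INTERVAL < RATIO relies on each scale having all the properties of the one above it.

```python
from enum import IntEnum

class Scale(IntEnum):
    # Ordered so that each scale has all the properties of the ones before it
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest scale on which each operation is meaningful (after Table 2)
MIN_SCALE = {
    "count": Scale.NOMINAL,
    "equality": Scale.NOMINAL,
    "rank": Scale.ORDINAL,
    "add": Scale.INTERVAL,
    "subtract": Scale.INTERVAL,
    "multiply": Scale.RATIO,
    "divide": Scale.RATIO,
}

def operation_allowed(op: str, *layers: Scale) -> bool:
    """True only if every input layer is measured on a scale that supports `op`."""
    return all(scale >= MIN_SCALE[op] for scale in layers)

soil_type, rainfall = Scale.NOMINAL, Scale.RATIO
print(operation_allowed("add", soil_type, rainfall))       # False: soil codes are labels
print(operation_allowed("multiply", rainfall, rainfall))   # True
```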

6.2 Scale and overlay analysis


Berry (1991) identified scale as a cause of error that may be incorporated almost effortlessly into overlay analysis. Overlay analysis can be implemented on any pair of inputs that cover the same spatial extent. Many analysts, however, ignore the consequences of combining data of different scales. Two maps at very different scales are frequently the products of very different data modelling exercises, e.g. the GB Ordnance Survey produces 1:10,000, 1:50,000 and 1:250,000 data products that have been maintained separately and are designed for different purposes.

7 What have you learnt in this lesson?


The integration of spatial data is at the heart of GIS, and area-on-area and weighted overlay epitomise the analysis of multiple data layers. They were among the first operators implemented in GIS in its early years and have attracted considerable attention, both from researchers, who have extended the range of algorithms and investigated the consequences of using different ones, and from software developers, who have improved the efficiency of their implementation. Vector algorithms for area-on-area overlay analysis are elegant and intuitive but computationally demanding. They also produce sliver polygons in their thousands. These problems can be reduced by adopting a set of heuristics and implementing additional processing to clean up the unwanted artefacts. It remains highly desirable to design a smart strategy when using polygon overlay. Even so, I suspect that more overlay analysis is performed on the raster data model. Area-on-area and weighted overlay are simple and quick to apply to the raster data model if the grids are aligned and of equal cell size. The inherent weaknesses of the raster data model become apparent in post-processing, when the analyst may be faced with making some arbitrary decisions about the meaning of the output.

