
Data Quality and Error

Presented By,
S Bensinghdhas, M.E (Design)
Asst. Lecturer
SJCET, Dar Es Salam

Error
Whenever you work with spatial data (or any
data for that matter) you will deal with some
sort of error due to the many steps involved
in creating spatial data.
Spatial data is just an abstraction of what is
really there. Because of this abstraction, we
can expect error due to:
How we conceptualize the data in the first
place
How we collect the data
How we present the data
Additionally, there are other sources of error
such as:
Obvious Errors
Errors in natural variation

Data Quality
GIS IS A GARBAGE MAGNIFIER
GARBAGE IN / GARBAGE OUT

MOST FAILED GIS PROJECTS ARE DUE
TO POOR PLANNING AND POOR DATA
QUALITY

Obvious Error

The errors we just discussed are illustrative of the general
types of obvious errors you would encounter when using
geospatial information. As a geospatial analyst, you will have
to give thought to how to correct those errors before
proceeding with a project.
Also, as a geospatial analyst, you should always approach a
project with the obvious sources of error we just discussed
firmly in your mind. Therefore, when given a task to perform,
and the associated data, the following should act as a good
checklist:
Is the data current?
Were the data mapped at the correct scale? Do they have
the same accuracies?
What is the resolution of the data? Will it support the kinds
of analysis we want to perform?
Do we have all the data for the project areas, or is there
some data missing?
If we need other data sets, are they available, or will we
have trouble getting them?

Components of Data Quality

Positional Accuracy
Attribute Accuracy
Logical Consistency
Resolution
Completeness

Spatial Accuracy

As we previously stated, positional accuracy relates
to the coordinate values for the geographic objects.
But, even positional accuracy is divided into two
different categories:
Absolute accuracy: refers to the actual X,Y
coordinates of a geographic object. If one knows
the correct position of the geographic object, one
can compare it with the position
represented in the geographic database.
Typically, absolute accuracy is measured as the total
distance between the two positions, or as the difference in
the X coordinate and the difference in the Y
coordinate.
Relative accuracy: refers to the displacement of
two or more points on a map (in both the distance
and angle), compared to the displacement of
those same points in the real world.
The figures on the right show two different maps of
the Cornell campus and the City of Ithaca. The top
map, a USGS quadrangle, has an absolute accuracy
of around 40 feet. That is, the coordinates for a
building on the quadsheet are probably within 40 feet
of their real world coordinates. The bottom map, a
photogrammetrically derived map of the same area
has an absolute accuracy of about 2.5 feet.
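The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the coordinates (in feet) are hypothetical and are not taken from the Cornell maps. Both mapped points are shifted by the same offset, so the absolute error is large while the relative error is zero.

```python
import math

def absolute_error(mapped, true):
    """Return (dx, dy, total) offset of a mapped point from its true position."""
    dx = mapped[0] - true[0]
    dy = mapped[1] - true[1]
    return dx, dy, math.hypot(dx, dy)

def distance_and_azimuth(p, q):
    """Distance and azimuth (degrees clockwise from north) from p to q."""
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    azimuth = math.degrees(math.atan2(dx, dy)) % 360.0
    return math.hypot(dx, dy), azimuth

def relative_error(mapped_p, mapped_q, true_p, true_q):
    """Difference in distance and in azimuth between the map and the real world."""
    map_dist, map_az = distance_and_azimuth(mapped_p, mapped_q)
    true_dist, true_az = distance_and_azimuth(true_p, true_q)
    # Wrap the angular difference into the range [-180, 180).
    d_az = (map_az - true_az + 180.0) % 360.0 - 180.0
    return map_dist - true_dist, d_az

# Hypothetical coordinates in feet: both mapped points are shifted by (30, 40).
true_a, true_b = (1000.0, 2000.0), (1000.0, 3000.0)
mapped_a, mapped_b = (1030.0, 2040.0), (1030.0, 3040.0)

print(absolute_error(mapped_a, true_a))                    # offset of point A
print(relative_error(mapped_a, mapped_b, true_a, true_b))  # distance and angle error
```

A uniform shift like this one leaves distances and angles between points untouched, which is why a map can have poor absolute accuracy and still have good relative accuracy.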

Relative Accuracy
Even though the USGS
quadrangle has much less
absolute accuracy than the
photogrammetrically
derived map, if we were
to zoom into an area and
measure the distance
between two points, the
relative distance, and the
angle would be fairly
similar. In this case, the
distance along Tower Road
is only about 15 feet
different, and the azimuth

Positional Accuracy

Attribute Accuracy
Connecticut

New Jersey

Pennsylvania

New York

Logical Consistency
Representation of
data that does not
make sense
Road in the water
Contours that cross
or end
Features on steep
slopes

Resolution
Generalization may
improperly represent
size and shape
Cartographic
Aesthetics
Entire regions may be
eliminated (islands,
peninsulas, etc.)

Completeness
Fragmented
coverage of many
developing
countries
Soils
Vegetation

Must determine
methods for
uniformity

Obvious Errors

The statement "to err is human" is very applicable to creating spatial data. Humans
make a lot of errors. Typing the wrong value into a computer is a common mistake that
humans make. However, there are other sources of obvious error besides human error:
Age: a map is a representation of real-world objects at a given point in time. The
reliability of a dataset typically goes down as it gets older. This is especially true of
data that would frequently change such as housing within a city. Many GIS projects
take years to complete, and it is entirely possible that much of the data collected in
the beginning of a project may be out of date by the end of the project.
Map Scale: In general, larger scale maps show more detail than smaller scale maps.
Also, larger scale maps tend to have greater accuracy than smaller scale maps,
especially maps within the same family such as the differences between 1:250,000,
1:100,000 and 1:24,000 USGS maps. Computers and GIS software really don't care
what data you give them. That being the case, a GIS will process any of your data,
whether the processing is appropriate or not. Therefore, you can combine data from
different scales rather easily; however, doing so may not be a good idea due to the
different accuracies of the products.
Data Format: The way we represent data also presents an obvious source of error.
For example, a raster map of landuse represented by 10 meter grid cells will differ
significantly from a raster map of landuse represented by 100 meter grid cells. The
following is a grid of landuse values around Ithaca, New York. You can see the
differences in representation between a map with 10 meter grid cells, 30 meter grid
cells, and 100 meter grid cells.
Areal Coverage: Many data sets may not have uniform coverage. That is, there
may be pieces missing in one section.
Accessibility: Not all data sets are equally accessible. For example, land resource
data may be available in one country, but considered a state secret in another country.
Also, due to the recent events of September 11, 2001, some data are unavailable
due to security reasons.

Problems with Age


The following maps show the different land cover types between
1968 and 1995. You can see how the data has changed over nearly 30
years, and why using older data might present a problem.

Obvious Sources of Error


Areal Coverage
Many data sets do not have a
uniform coverage of information

NASSAU COUNTY BASEMAP

SUFFOLK COUNTY PARCELS

Problems with Format


You can see the different way in which
data is represented when using different
formats. In this case, 10, 30, and 100
meter grid cells are used.

10 meter

30 meter

100 meter
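The effect of cell size can be sketched by resampling a small categorical grid to coarser cells. The sketch below assumes a majority-vote rule for choosing each coarse cell's value, which is only one of several possible resampling rules; the grid and the land use codes (F = forest, U = urban, W = water) are made up for illustration.

```python
from collections import Counter

def aggregate_majority(grid, factor):
    """Resample a categorical grid to coarser cells by majority vote."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect every fine cell that falls inside this coarse cell.
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(Counter(block).most_common(1)[0][0])
        coarse.append(out_row)
    return coarse

# Hypothetical 4 x 4 land use grid; each string is one row of cells.
fine = ["FFUU",
        "FWUU",
        "FFFU",
        "FFFF"]

coarse = aggregate_majority(fine, 2)
for row in coarse:
    print("".join(row))
```

In this made-up grid the single water cell is outvoted by its forest neighbors and does not appear in the coarser grid at all, which is the kind of loss the 10, 30, and 100 meter maps illustrate.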

Errors Due to Natural Variation

You can see why each of the previous error types is called an
Obvious Error. But there are other types of errors that are not
so obvious, and are oftentimes overlooked. Nonetheless, you will
have to be aware of these kinds of errors too. These errors are
termed errors in natural variation, and take the form of:

Positional Errors Due to Natural Variation: there are
natural variations in materials that might make them less
accurate. For example, a paper map stored in a humid room
will actually change size as it absorbs moisture. The stretching or
shrinking of the material is virtually
unnoticeable by a user, but depending upon the scale of the
map, the real world errors could be quite large.

Variations Due to Equipment: Some equipment may not
measure information correctly, or may have slight variations
from measurement to measurement. For example, a
temperature gauge or pH meter may give slightly different
readings when measuring the same location. If you've ever
measured your blood pressure on one of the automatic
machines in the drug store, you have probably noticed that
two readings taken one after another can be different. While
some of this is based on your own fluctuations in blood
pressure, the machines themselves have some variability.

The variations of measurements are often related to two
important concepts called precision and accuracy.

Errors Resulting from Natural Variations from Original Measurements
Positional Accuracy
Result of poor field work, media shrinkage and
expansion, poor vectorization (line digitizing)
Correction through rubbersheeting

Accuracy of Content
Attribute errors caused by miscoding, or faulty
equipment (thermometer, pH meter)

Sources of Variation in Data:


Data entry or output faults

Errors Resulting from Natural Variations from Original Measurements

Measurement Error

Accuracy vs. Precision


Accuracy: extent to which an estimated value
approaches the true value
Precision: measure of dispersion of
observations about a mean
Accuracy vs. Precision example
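The two definitions can be turned into numbers for a set of repeated readings. The sketch below treats accuracy as the bias of the mean from the true value and precision as the sample standard deviation about the mean; the readings, the true value, and the function name are all hypothetical.

```python
import statistics

def accuracy_and_precision(readings, true_value):
    """Return (bias, spread) for repeated readings of a known quantity."""
    mean = statistics.fmean(readings)
    bias = mean - true_value               # accuracy: how far the mean is from the truth
    spread = statistics.stdev(readings)    # precision: dispersion about the mean
    return bias, spread

true_value = 100.0
precise_but_biased = [104.9, 105.0, 105.1, 105.0]    # tight cluster, wrong place
accurate_but_imprecise = [96.0, 104.0, 98.0, 102.0]  # centered on truth, scattered

print(accuracy_and_precision(precise_but_biased, true_value))
print(accuracy_and_precision(accurate_but_imprecise, true_value))
```

The first instrument repeats itself closely but is consistently about 5 units too high; the second averages out to the true value but no single reading can be trusted to within a few units.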

Laboratory Errors
Results of World-wide Laboratory Exchange
Program
Differences between results for the same soil samples
analyzed in different laboratories exceeded:
11% for clay content

Accuracy and Precision

Accuracy is defined as
displacement of a plotted point
from its true position in relation to
an established standard while
Precision is the degree of
perfection; or repeatability of a
measurement.
For mapping, accuracy is
associated with the position of an
object relative to its true position.
Precision is then the ability to
repeat a measurement, or how
likely you are to return to the same
location time and time again.
The figures to the right illustrate
the differences between accuracy
and precision.
Therefore, if there are natural
variations in either the instruments
used for measurement, or the

Errors Arising Through Processing
Numerical Errors in the Computer
Numerical precision
PC ARC/INFO is Single Precision
Some GIS use integer values to store coordinates, so
large areas may not be stored precisely.
Scaling a triangle
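A rough sketch of the numerical precision problem, assuming coordinates are stored as IEEE 754 single-precision (32-bit) floats. The coordinate value is hypothetical, and the helper names are illustrative only; nothing here is specific to any particular GIS package.

```python
import math
import struct

def to_single(value):
    """Round-trip a Python float (double) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def single_spacing(value):
    """Gap between adjacent single-precision values near `value`.

    Single precision carries a 24-bit significand, so the gap is 2**(e - 23)
    where e is the binary exponent of the value.
    """
    return 2.0 ** (math.floor(math.log2(abs(value))) - 23)

northing = 4701234.56          # hypothetical coordinate in meters
stored = to_single(northing)

print(stored)                    # value actually held in single precision
print(single_spacing(northing))  # smallest step representable at this magnitude
```

At this magnitude a single-precision value can only move in half-meter steps, so centimeter-level detail in the original coordinate is lost as soon as it is stored.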

Faults Arising Through Topological Analysis


Assumes:
Source data is uniform
Digitizing procedures are infallible
Map overlay is only concerned with line intersection
Boundaries can be sharply defined and drawn

Raster to Vector

GIS allows you to convert raster and vector features between one another.
For example, we can take a raster feature and convert it to vector format.
Or, we can take a vector feature and convert it to raster. But, as the
examples show, depending upon the resolution of the features, the
representation of the geographic objects may be quite different. In some
cases, you can see how the raster version of the map actually caused
some buildings to merge together.
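The merging effect can be reproduced with a small sketch. It assumes buildings are axis-aligned rectangles and that a cell is filled whenever any part of a building falls inside it, which is only one possible rasterization rule. The building coordinates and function names are hypothetical.

```python
import math

def rasterize(rects, cell, width, height):
    """Fill every cell that overlaps any rectangle (xmin, ymin, xmax, ymax)."""
    ncols = math.ceil(width / cell)
    nrows = math.ceil(height / cell)
    grid = [[False] * ncols for _ in range(nrows)]
    for xmin, ymin, xmax, ymax in rects:
        for r in range(nrows):
            for c in range(ncols):
                cx0, cy0 = c * cell, r * cell
                if cx0 < xmax and cx0 + cell > xmin and cy0 < ymax and cy0 + cell > ymin:
                    grid[r][c] = True
    return grid

def count_regions(grid):
    """Count groups of filled cells that touch along an edge (4-connectivity)."""
    nrows, ncols = len(grid), len(grid[0])
    seen = [[False] * ncols for _ in range(nrows)]
    regions = 0
    for r in range(nrows):
        for c in range(ncols):
            if grid[r][c] and not seen[r][c]:
                regions += 1
                stack = [(r, c)]
                seen[r][c] = True
                while stack:
                    i, j = stack.pop()
                    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if 0 <= ni < nrows and 0 <= nj < ncols and grid[ni][nj] and not seen[ni][nj]:
                            seen[ni][nj] = True
                            stack.append((ni, nj))
    return regions

# Two hypothetical buildings separated by a 3-unit gap.
buildings = [(0, 0, 8, 10), (11, 0, 20, 10)]

for cell in (1, 10):
    grid = rasterize(buildings, cell, width=20, height=10)
    print(cell, count_regions(grid))
```

With cells smaller than the gap the two buildings stay separate; once the cell size exceeds the gap, the filled cells become adjacent and a raster-to-vector conversion would trace them as a single building.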

Vector Data of Buildings

Vector data converted to raster with 10 grid cells

Raster data converted back to vector, using 10 grid cells

Errors in Data Processing

Digitizing Data: Once again, scale presents a problem with digitized
data. On a soil map drawn at a scale of 1:100,000, a 1 mm wide line (the
thickness of a sharp pencil) would actually represent 100 meters on the
ground. Or, as shown in the example below, the road edge on the USGS
quadrangle is actually 4 meters wide in some spots.
Spatial Analysis: Some GIS functions such as overlay present problems
such as ambiguous locations and sliver polygons. Also,
converting data from raster to vector format will introduce errors.
Each of the examples is shown in the illustrations below.
Figure label: Width of edge of pavement is greater than [value lost] meters
Errors Associated with Spatial Analysis
Errors in Digitizing a Map
Source errors
Distortion
Boundaries drawn on a map have a thickness
A 1 mm line is:
1.25 m wide on a 1:1,250 map
100 m wide on a 1:100,000 map
Estimates show that 10% of a 1:24,000 soil map may
represent the boundary lines alone
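The ground width covered by a drawn line follows directly from the map scale: multiply the line width on paper by the scale denominator. A minimal sketch, with a hypothetical function name:

```python
def ground_width_m(line_width_mm, scale_denominator):
    """Ground width in meters covered by a line drawn on a map.

    A line of width w on a 1:N map covers w * N on the ground;
    dividing by 1000 converts millimeters to meters.
    """
    return line_width_mm * scale_denominator / 1000.0

for denominator in (24_000, 100_000, 250_000):
    print(f"1 mm line on a 1:{denominator:,} map = {ground_width_m(1.0, denominator)} m")
```

The same calculation works for any paper measurement, so it also gives the ground size of a digitizing slip or of a small change in the size of the paper.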

Digital Representation
Curves are approximated by many vertices
Boundaries are not absolute, but should have a
confidence interval

Sliver Polygons

In the following example, there are two polygons. When we overlay the
two of them, the result contains not only the logical intersection
between the two polygons, but also many small polygons that are probably
due to the fact that the two representations of the polygon boundaries are
slightly different. These smaller polygons, or sliver polygons, represent spatial errors
in the data.
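One common way to find slivers after an overlay is to look for polygons that are very thin for their size. The sketch below uses a compactness ratio, 4πA/P² (1.0 for a circle, near 0 for a long thin shape), with an assumed cutoff of 0.1. The cutoff, the function names, and the example polygons are illustrative assumptions and would need tuning against real data.

```python
import math

def area_and_perimeter(ring):
    """Shoelace area and perimeter of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter

def compactness(ring):
    """4*pi*A / P**2: 1.0 for a circle, approaching 0 for thin shapes."""
    area, perimeter = area_and_perimeter(ring)
    return 4.0 * math.pi * area / (perimeter * perimeter)

def is_sliver(ring, threshold=0.1):
    """Flag polygons thinner than the (assumed) compactness threshold."""
    return compactness(ring) < threshold

# Hypothetical overlay output: one real parcel and one thin wedge along its edge.
parcel = [(0, 0), (10, 0), (10, 10), (0, 10)]
wedge = [(0, 0), (100, 0), (100, 0.5)]

print(round(compactness(parcel), 3), is_sliver(parcel))
print(round(compactness(wedge), 4), is_sliver(wedge))
```

A shape test like this separates slivers from legitimately small polygons better than an area cutoff alone, since a sliver can be long enough to have a sizeable area.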

Errors Associated with Spatial Analysis
Boundary Problems
Definitely in
Definitely out
Possibly in
Possibly out
Ambiguous (on the digitized border line)
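The five categories above can be sketched by giving the digitized boundary an uncertainty band. The code assumes a band of half-width eps on either side of the line: points farther than eps from the boundary are "definitely" in or out, points within the band are only "possibly" so, and points on the line itself are ambiguous. The value of eps, the polygon, and the function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0:
        t = 0.0
    else:
        # Project p onto the segment and clamp to its end points.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def inside(p, ring):
    """Ray-casting point-in-polygon test for a simple polygon."""
    x, y = p
    result = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result

def classify(p, ring, eps, tol=1e-9):
    """Classify p against a boundary with an assumed uncertainty half-width eps."""
    n = len(ring)
    d = min(point_segment_distance(p, ring[i], ring[(i + 1) % n]) for i in range(n))
    if d <= tol:
        return "ambiguous"
    side = "in" if inside(p, ring) else "out"
    certainty = "definitely" if d > eps else "possibly"
    return f"{certainty} {side}"

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
for point in [(5, 5), (9.5, 5), (10, 5), (10.5, 5), (15, 5)]:
    print(point, classify(point, square, eps=1.0))
```

In practice eps would come from the known positional accuracy of the source map and the digitizing process rather than being picked by hand.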
