You are on page 1of 17

Point Pattern Analysis in an ArcGIS Environment

Jared Aldstadt and Arthur Getis


San Diego State University

For a number of years in the Departments of Geography at San Diego State University and the University of

Illinois, we have been working on the development of software for a variety of ESDA routines, especially in

the field of spatial association. These software programs make it possible to easily apply such analytical

devices as nearest neighbor, general pattern, and local statistics. We have called our routine Point Pattern

Analysis (PPA), but the actual product is constantly in flux as we attempt to make the approach as user

friendly and comprehensive as we can. Those who have worked on the programs over the years include

Marc Armstrong (Iowa), Laura Hungerford (Illinois), DongMei Chen (Queens), Lauren Scott (ESRI), and

the authors.

Currently we are adapting some of the materials of PPA for use in an ArcGIS environment. We

use Visual Basic for Applications (VBA) to create a visual representation of several PPA routines. We are

now in a position to demonstrate the use of certain local statistics in a fully functioning, uncoupled system.

In this paper we demonstrate our approach on the local G statistic (Ord and Getis 1995). We should point

out that one of our recent endeavors is to create a user-friendly approach to the new O statistic (Ord and

Getis 2001), a spatial statistic designed to describe local clustering in a non-stationary setting.

To better understand what it is that we are attempting to do, we call attention to our on-line

software materials, which represent an earlier approach to 14 different ESDA tools with documentation. It

can be seen at: http://xerxes.sph.umich.edu:2000/cgi-bin/cgi-tcl-examples/generic/ppa/ppa.cgi.

Building on that work, Lauren Scott recently included in an ArcView 3.2 environment several of these

routines. We used these successfully in exercises assigned for social science students and young faculty in
CSISS workshops on pattern analysis the last two summers. Now, we are moving further along by

translating these routines into the ArcGIS environment.

Thus far PPA has been used in about a dozen published research projects. In addition, SpaceStat,

the well-known spatial econometrics package, and SAGE, a British ESDA package, have a few of the PPA

statistics included. Just recently, ClusterSeer, from BioMedware, the health-related software company, has

adopted our approach for certain cluster analyses.

ArcGIS

ArcGIS is the latest version of Environmental Research Systems Institute's (ESRI) suite of GIS products.

ArcGIS is designed as a scalable system for geographic data creation, management, and analysis. ESRI

products have a large user base. The ESRI website states that there are over 1 million ESRI software users

worldwide, and that 50,000 university students receive instruction utilizing ESRI products every year (ESRI

2002).

In previous versions, ESRI's desktop GIS, ArcView, and its enterprise level GIS, ArcInfo, were

very different in terms of the primary geographic data model and user interface. In ArcGIS, however, all of

the products use the same data model, GUI interface, and development environment (Limp 2001). In

addition to ArcView and ArcInfo, there is a medium sized version of ArcGIS called ArcEditor. Each

variety of ArcGIS consists of several individual applications that provide a set of functionality. ArcMap is

a primary component of all three versions and is the interface for data display and analysis. An analysis

tool developed for ArcMap can be used in all versions of ArcGIS.

ArcObjects and VBA

The development platform for the ArcGIS applications is called ArcObjects. ArcObjects is built using

Microsoft's Component Object Model (COM) and can be embedded and extended using any COM

compliant language. Visual Basic and Visual C++ are COM compliant languages that can be used to

extend existing ArcGIS applications or write stand-alone applications that utilize ArcObjects components.

This method of development generally requires that the developer first becomes a competent Visual Basic

or Visual C++ developer in order to exploit ArcObjects.


An easier way to extend ArcGIS is to use VBA, which is embedded in the ArcMap application.

VBA is a scripting language that allows the developer to use the existing framework of ArcMap and add

functionality through the creation of custom tools, menus, and modules (Zeller 2001).

The primary advantage of using VBA to develop spatial analysis tools for ArcMap is ease of

development. The integrated development environment (IDE) allows the programmer to design, code, and

debug within a common environment. Building GUI interfaces is simplified with a drag and drop control

toolbox. Easy integration with common Windows features such as context help, Windows help files, and

file access dialogs allow custom tools to have the same look and feel as larger applications. As with all

scripting languages, the primary drawback of VBA is slow execution compared to compiled code. This

may be especially important in developing computationally intensive statistical analysis tools.

PPA version 1.0

The package is written in the C programming language and includes a variety of pattern analysis routines.

There are both DOS and UNIX versions of the program. A screenshot of the DOS version of the program

running in Windows is shown in Figure 1.

Figure 1: A screenshot of the PPA menu.

There are several limitations that the PPA users face when analyzing their data. PPA accepts input

as an ASCII file in three columns: x coordinate, y coordinate, and value at the point. The output is saved to
an ASCII file and displayed on the screen as text. Often, PPA users must convert their data files to ASCII

before running PPA and again convert their files to a GIS compatible format for display after analysis. The

interface of PPA is text menu based. The lack of a GUI interface and features such as file dialogs create

frustration among users used to windowing environments. Integrating PPA routines into a GIS environment

would eliminate these drawbacks and allow for truly integrated exploratory spatial data analysis (ESDA).

The following routines are available in PPA version 1.0:

• Nearest Neighbor Analysis – A test of significance on the observed mean nearest neighbor distance
in a point pattern

• Refined Nearest Neighbor Analysis – A Monte Carlo test comparing the complete distribution
function of the observed nearest neighbor distances.

• K-function Analysis – Also called second order analysis, a point pattern test based on inter-event
distances and a Monte Carlo significance test.

• Local K-function – A local test based on K-function analysis that examines the clustering of
neighbors around each point individually.

• Weighted K-function analysis – A statistical test based on the permutation of the weights at each
point.

• Knox space time clustering – A method to analyze the clustering of events in both space and time
based on the Poisson distribution.

• Join count statistic – A simple measure of spatial autocorrelation for data measured at the nominal
scale.

• Moran’s I – A measure of global spatial autocorrelation produced by standardizing the spatial


autocovariance by the variance of the data.

• Geary’s c - A measure of global spatial autocorrelation that uses the sum of the squared differences
between pairs of data values as its measure of covariation.

• Getis and Ord’s General G - A multiplicative measure of overall spatial association of values that
fall within a given distance of each other.

• Local Moran’s I – Local Autocorrelation statistic based on Moran’s I.

• Local Getis and Ord G-statistics – These statistics evaluate the extent to which a location is
surrounded by a cluster of high or low values.
PPA in ArcGIS

We have selected the local Getis and Ord G-statistics as a test case for implementing PPA routines in

ArcGIS. This custom tool is written as a VBA macro. The macro uses one form to receive user input, and

an additional dialog box to query the user for an output field name. After computing the statistic for each

observation, the result is saved in the specified field of the original dataset. The complete code listing of

the local G macro is included in Appendix A, and an example of its use follows.

The example will use the local G-statistic to analyze the cancer rates of New Mexico counties.

The data are taken from the National Cancer Institute Biometry Research Group Datasets, and it is available

on their website, http://dcp.nci.gov/bb/datasets.html. The total number of cases diagnosed between 1980

and 1989 were divided by the 1985 population estimate to determine the cancer rate (cases per 10,000

population) of each county. County seats are used as a proxy for population centroids. The data are in the

shapefile format and are loaded into ArcMap. For reference, a polygon layer of the New Mexico counties

is also displayed.

Figure 2: Running a macro in ArcMap


In order to run a macro, the user goes to the Tools menu and selects the Macros option. A popup

menu will be displayed with two more options. Selecting the Visual Basic Editor option will bring up the

VBA editor and allow the user to create, view, and modify macros. Selecting the Macros… option will

display the macros available to the user in the current ArcMap document. The Macros dialog is shown in

Figure 3.

Figure 3: The Macros Dialog

In the Macros dialog box, the user selects the desired macro and clicks an action button along the

right hand side. In this case the only macro available is the LocalG.LocalG macro. The name before the

period indicates the code module that the macro is in, and the name after the period is the name of the

subroutine to run. When the Run button is clicked the macro begins execution.

The Local G Statistic dialog is displayed to prompt the user for input. The first combo box

prompts the user to select a layer to analyze. This combo box will be populated with all the point layers in

the active map. After selecting a point layer, the second combo box is populated with all the numeric fields

available for the specified layer. In our example, the user selects the NM County Seats point layer and the

CancerRate field.

The text box labeled Distance is used to enter a distance for analysis. The distance units are the

same as the units of the map, and in this case the units are meters. The checkbox labeled Star specifies
whether to include the value at the observation itself in the analysis at that point. For the example we have

chosen a distance of 100,000 meters and we have checked the Star checkbox. After specifying the

parameters of analysis the user presses the OK button to continue. At anytime the user may select the

Cancel button to exit the macro.

Figure 4: The Local G Statistic parameter input dialog

Next, the user is prompted for an output field name (Figure 5). A default name is supplied, but the

user may change the name. For display purposes we have changed the field name to Gi*(100km) from the

generated default field name, Gi*(100000).

Figure 5: Output Field dialog box

After the user selects an output field name, the specified G-statistic is calculated for each

observation and stored in the attribute table of the layer being analyzed. The attribute table for our example
dataset is shown in Figure 6. The existing data and our newly created field, the column on the right hand

side of the table labeled Gi*(100km), can be viewed together immediately after execution of the macro.

Figure 6: Attribute Table for NM County Seats dataset

The existing functionality of ArcMap can then be used to display the results graphically using

charts, graphs, and maps. These graphical outputs can be dynamically linked to the data table to further aid

ESDA. The map in Figure 7 shows the local G-statistic values calculated above. Figure 8 is a more
complex map that displays both the cancer rates and the G-statistic value at five different distances for each

county.

Figure 7: The local Gi* statistic value for 100km calculated from cancer rates of New Mexico counties

Looking Ahead

The use of VBA macros to create spatial data analysis tools in ArcMap is relatively easy to learn and quick

to implement. Leveraging the existing file manipulation, data interfaces, and visualization functionality of

ArcMap allows the programmer to concentrate on developing spatial analysis methods. There are many

existing features of ArcObjects that we have not yet implemented. The integration of help files and the

utilization of the graphing objects are planned additions. Still unexplored is the ability of ArcObjects to

create new layers and graphics to aid in the visualization of data and results. We hope to show distance

qualities in analysis by stages. This will provide an outstanding tool for those wanting to learn about the

relevant spatial statistics.


In the case of computationally intensive methods, however, the slow execution of VBA will not

suffice. To perform analysis that use Monte Carlo simulations, such as K-function analysis, or involve a

large numbers of computations, like the O statistic, requires compiled software components. These

components will demand a more intensive development effort. Although ESRI has a large user base, the

expense of purchasing a distribution of ArcGIS makes these tools inaccessible to many institutions,

companies, and individuals. It is important, therefore, to maintain existing free versions of PPA and work

with other vendors and developers of open source spatial analysis tools.

Figure 8: Map of New Mexico Cancer Rates by County and Gi* values for five distances
References

ESRI (2001). ArcObjects Developer Help, Environmental Research Systems Institute, Inc., Redlands,
California

ESRI (2002). Environmental Research Systems Institute, Inc., http://www.esri.com/

Limp, W. Frederick, (2001). Quick Take Reviews: ArcGIS 8.1, GeoWorld,

National Cancer Institute Biometry Research Group Datasets http://dcp.nci.gov/bb/datasets.html

Ord, J.K. and Getis, A., (1995). Local Spatial Autocorrelation Statistics: Distribution Issues and an
Application, Geographical Analysis, 27(4): 286-306

Ord, J.K. and Getis, A., (2001). Testing for Local Spatial Autocorrelation in the Presence of Global
Autocorrelation, Journal of Regional Science, 41(3): 411-432

Getis, Arthur, Chen, Dong Mei, and Aldstadt, Jared (2000). Point Pattern Analysis 1.0,
http://xerxes.sph.umich.edu:2000/cgi-bin/cgi-tcl-examples/generic/ppa/ppa.cgi

Zeller, Michael, (2001). Exploring ArcObjects, Environmental Research Systems Institute, Inc., Redlands,
California
Appendix A: VBA Code for Local G Macro

This appendix is a listing of the code for the macro module that calculates the local G statistic. There are

global variable declarations, two module subroutines (LocalG and getFields), and three form event

subroutines.

Global Variable Declarations

Public star As Boolean


Public gDist As Double
Public valueField As String
Public layerName As String

LocalG Subroutine

Sub LocalG()
'This macro computes the local G statistics for point feature classes
'in ArcMap

'Set document and Map objects


Dim pMxDoc As IMxDocument, pEnumFeat As IEnumFeature
Dim pGeom As IGeometry
Dim pMap As IMap
Set pMxDoc = ThisDocument
Set pMap = pMxDoc.FocusMap
Dim pFeatureLayer As IFeatureLayer
Dim pILayer As ILayer
Dim pFeatureClass As IFeatureClass

'Set a UID for GeoFeatureLayers


Dim pId As New UID
pId = "{E156D7E5-22AF-11D3-9F99-00C04F6BC78E}" 'IGeoFeatureLayer

'Create an enumerator of GeoFeatureLayers


Dim pEnumLayer As IEnumLayer
Set pEnumLayer = pMap.Layers(pId)

'Load the input form


Load frmLocalG

'Populate the layer combo box with point layers


pEnumLayer.Reset
Set pILayer = pEnumLayer.Next

Dim pointExists As Boolean


pointExists = False
Do While Not pILayer Is Nothing
Set pFeatureLayer = pILayer
Set pFeatureClass = pFeatureLayer.FeatureClass
If pFeatureClass.ShapeType = esriGeometryPoint Then
frmLocalG.cmbLayer.AddItem pILayer.Name
pointExists = True
End If
Set pILayer = pEnumLayer.Next
Loop

'If there are no point layers in the map, send a message and exit
If pointExists = False Then
MsgBox "This routine requires a point layer"
Exit Sub
End If

'Show the form


frmLocalG.Show

'If cancel button is selected, exit subroutine


If layerName = "" Then
Exit Sub
End If

'Set the layer to be analyzed


pEnumLayer.Reset
Set pILayer = pEnumLayer.Next

Do While Not pILayer Is Nothing


If pILayer.Name = layerName Then
Exit Do
End If
Set pILayer = pEnumLayer.Next
Loop
Set pFeatureLayer = pILayer
Set pFeatureClass = pFeatureLayer.FeatureClass

'Get the value field index


Dim fieldIndex As Integer
fieldIndex = pFeatureClass.FindField(valueField)

'find or create field for output


'generate the default name string
Dim outFieldName As String
Dim defaultName As String
If star Then
defaultName = "Gi*(" + Str(gDist) + ")"
Else
defaultName = "Gi(" + Str(gDist) + ")"
End If

'Give default name in InputBox for output field name


outFieldName = InputBox("Please enter an output field name.", _
"Output Field", defaultName)

'look for existing field


Dim outFieldIndex As Integer
outFieldIndex = pFeatureClass.FindField(Left(outFieldName, 10))

'if field does not exist, create it


If outFieldIndex < 0 Then
Dim pOutField As IField
Dim pOutFieldEdit As IFieldEdit
Dim pFields As IFields
Dim pFieldsEdit As IFieldsEdit
Set pOutField = New Field
Set pOutFieldEdit = pOutField
Set pFields = pFeatureClass.Fields
Set pFieldsEdit = pFields

With pOutFieldEdit
.Editable = True
.IsNullable = False
.Name = Left(outFieldName, 10)
.AliasName = outFieldName
.Type = esriFieldTypeDouble
.Length = 16
.Precision = 15
.Scale = 4
End With

pFeatureClass.AddField pOutField
Set pOutField = Nothing
outFieldIndex = pFeatureClass.FindField(Left(outFieldName, 10))
End If

'create feature cursor to edit the table


Dim pFeatureBuffer As IFeatureBuffer
Dim pFeatureCursor As IFeatureCursor
Set pFeatureCursor = pFeatureClass.Search(Nothing, False)
Set pFeatureBuffer = pFeatureClass.CreateFeatureBuffer

Dim icount As Integer


icount = pFeatureClass.FeatureCount(Nothing)
ReDim dataset(icount, 4) As Double 'array used for calculations
Dim counter As Integer
Dim sumval As Double, sumsqr As Double
sumval = 0
sumsqr = 0

Dim pFeature As IFeature


Dim nvalue As Double

'read data into array, get coordinates


For counter = 0 To (icount - 1)
Set pFeature = pFeatureClass.GetFeature(counter)
nvalue = pFeature.Value(fieldIndex)
Set pGeometry = pFeature.Shape
Dim pPoint As IPoint
Set pPoint = pFeature.Shape
dataset(counter, 0) = pPoint.x
dataset(counter, 1) = pPoint.Y
dataset(counter, 2) = nvalue
sumval = sumval + nvalue
sumsqr = sumsqr + (nvalue * nvalue)
Next

'Calculate statistic and store values in the table


D = gDist
If star Then

For counter = 0 To (icount - 1)


Set pFeature = pFeatureClass.GetFeature(counter)

Dim s As Double, xbar As Double, sumw As Double, sumwx As Double

xbar = sumval / icount


s = Sqr((sumsqr / icount) - (xbar ^ 2))
sumw = 0
sumwx = 0

For jcounter = 0 To (icount - 1)

Distance = Sqr(((dataset(counter, 0) - _
dataset(jcounter, 0)) ^ 2) + _
(((dataset(counter, 1) - _
dataset(jcounter, 1)) ^ 2)))

If Distance <= D Then


sumwx = sumwx + dataset(jcounter, 2)
sumw = sumw + 1
End If
Next

dataset(counter, 3) = (sumwx - (xbar * sumw)) / _


(s * Sqr(((icount * sumw) - _
(sumw ^ 2)) / (icount - 1)))

pFeature.Value(outFieldIndex) = Round(dataset(counter, 3), 4)


pFeature.Store
Next

Else
For counter = 0 To (icount - 1)
Set pFeature = pFeatureClass.GetFeature(counter)

Dim sn As Double, xbarn As Double


Dim sumwn As Double, sumwxn As Double

xbarn = (sumval - dataset(counter, 2)) / (icount - 1)


sn = Sqr(((sumsqr - (dataset(counter, 2) ^ 2)) / _
(icount - 1)) - (xbarn ^ 2))
sumwn = 0
sumwxn = 0

For jcounter = 0 To (icount - 1)

If jcounter <> counter Then

Distance = Sqr(((dataset(counter, 0) - _
dataset(jcounter, 0)) ^ 2) + _
(((dataset(counter, 1) - _
dataset(jcounter, 1)) ^ 2)))

If Distance <= D Then


sumwxn = sumwxn + dataset(jcounter, 2)
sumwn = sumwn + 1
End If
End If
Next

If sumwxn = 0 Then
dataset(counter, 3) = 0
Else
dataset(counter, 3) = (sumwxn - (xbarn * sumwn)) _
/ (sn * Sqr(((icount * sumwn) _
- (sumwn ^ 2)) / (icount - 1)))
End If

pFeature.Value(outFieldIndex) = Round(dataset(counter, 3), 4)


pFeature.Store
Next
End If 'Star
End Sub
getFields Subroutine

Public Sub getFields(layerName As String)


'This subroutine populates the fields combo box in the frmLocalG when
'a point layer is selected.
Dim pMxDoc As IMxDocument, pEnumFeat As IEnumFeature
Dim pGeom As IGeometry
Dim pMap As IMap
Set pMxDoc = ThisDocument
Set pMap = pMxDoc.FocusMap

Dim pFeatureLayer As IFeatureLayer


Dim pILayer As ILayer
Dim pFeatureClass As IFeatureClass
Dim pFields As IFields
Dim pfield As IField

Dim pEnumLayer As IEnumLayer


Set pEnumLayer = pMap.Layers()

pEnumLayer.Reset
Set pILayer = pEnumLayer.Next

Do While Not pILayer Is Nothing


If pILayer.Name = layerName Then
Exit Do
End If
Set pILayer = pEnumLayer.Next
Loop

Set pFeatureLayer = pILayer


Set pFeatureClass = pFeatureLayer.FeatureClass
Set pFields = pFeatureClass.Fields

For i = 0 To (pFields.FieldCount - 1)
Set pfield = pFields.Field(i)
If pfield.Type = esriFieldTypeDouble Or esriFieldTypeInteger Or _
esriFieldTypeSingle Or _
esriFieldTypeSmallInteger Then
frmLocalG.cmbField.AddItem pfield.Name
End If
Next i
End Sub
Form Event Subroutines

Private Sub btnCancel_Click()


layerName = ""
Unload frmLocalG
End Sub

Private Sub btnOK_Click()


If CDbl(txtDistance.Value) = 0 And chkStar.Value = False Then
MsgBox "Gi cannot be performed for distance = 0."
Exit Sub
End If

layerName = cmbLayer.Value
valueField = cmbField.Value
gDist = CDbl(txtDistance.Value)
star = chkStar.Value
Unload frmLocalG
End Sub

Private Sub cmbLayer_Change()


getFields (cmbLayer.Value)
cmbField.Enabled = True
End Sub

You might also like