
A Series of Tutorials for Teaching Statistical Concepts in an Introductory Course
I. Sampling From an Aerial Photograph
Glenys Bishop
University of Adelaide, Australia

Journal of Statistics Education v.6, n.2 (1998)

Copyright (c) 1998 by Glenys Bishop, all rights reserved. This text may be freely shared
among individuals, but it may not be republished in any medium without express written
consent from the author and advance notification of the editor.

Key Words: Excel; Natural resources science; Proportions; Sampling distribution; Spatial
statistics.

Abstract

This paper outlines one of a series of tutorials developed as part of an introductory statistics
course for Agricultural and Natural Resource Sciences students. Here we compare two
methods of sampling from an aerial photograph to obtain an estimate of the proportion of a
particular type of vegetation. One method, transect sampling, is traditionally used by field
ecologists, while the other is simple random sampling in a plane. Preparation details and
possible extensions to the tutorial are described.

1. Introduction
1 Faced with teaching a compulsory introductory statistics subject to two large groups of
Agricultural and Natural Resource Sciences students, we searched for motivating examples to
be used in both lectures and tutorials. This is the first of several papers describing the tutorials
we have developed. Our target audience consisted of first year Agricultural and Natural
Resource Sciences students, but some of the examples are more widely applicable.

2 Students participate in eleven tutorials throughout the semester. In one tutorial we use the
fruitfly data of Hanley and Shapiro (1994) to teach hypothesis testing, while in another we
use the conditional probability examples of Rossman and Short (1995). These examples have
been well received by the students.

3 In this and subsequent papers, we shall describe some of the tutorials that we have
developed. The tutorials are conducted in computer laboratories, but access to a computer is
not essential for the exercise discussed here.

4 We use Microsoft Excel for calculations, graphics, and statistical functions. However, any
statistical package with a random number generator, or just a table of random digits, will
suffice for this exercise.

2. Teaching Goals
5 The primary goal of this tutorial is to enable students to compare transect sampling,
traditionally used by ecologists, with simple random sampling. We also want to prepare
students for the idea of the sampling distribution of a proportion by recording the values
obtained by all the students.

6 A more detailed objective of this tutorial is to teach students how to select a sample -- in
this case, a sample of points from an aerial photograph. To do this, they must understand the
concept of randomness and how to use either a random number generator or a table of
random digits. In this tutorial we use random number generation from the Excel analysis
tools.

3. Background
7 Rangeland managers and field ecologists often need to estimate the area covered by a
particular vegetation type within a region. Although remote sensing by satellite imaging has
made this task easier, it is often necessary to employ simpler methods. An aerial photograph
provided by courtesy of the Resource Information Group, Department of Environment and
Natural Resources, South Australia, illustrates the types of information that may be sought.

8 The photograph in Figure 1 shows the Sandy Creek Conservation Park and surrounding
areas about 50 kilometres north of Adelaide. The conservation park is in the bottom left
quadrant of the picture. The surrounding area includes roads, a river, cleared and uncleared
farmland, plantations, dams, and a few houses. The whole region is fairly flat.

Figure 1. Aerial Photograph of the Sandy Creek Conservation Park.

9 When non-moving objects are to be counted, transect sampling involves choosing a line or
series of lines along which the counts are to take place. The transects may be chosen
randomly, or the first may be chosen randomly and subsequent transects systematically. They
may be parallel, perpendicular, or at some other angle suitable for the situation.

10 An alternative method is point sampling. Imagine a grid placed over the area such that the
grid lines are as close together as is practical. For instance, they may be a metre or half metre
apart in a conservation park. Coordinate pairs are randomly generated, and the points
represented by those pairs are examined for the presence or absence of the objects of interest.

4. Logistics
11 To assist others in running this tutorial we have prepared details about the materials
required and also guidelines for tutors. They should be read in conjunction with the student
notes shown in the Appendix at the end of this paper.

4.1. Materials

12 Students should be able to easily distinguish undisturbed bushland in the photograph. We
have experimented with photocopies and found that the photographic button on a photocopying
machine produces a clear copy of a photograph. Copies of a photocopy are not clear enough to be
useful. In each tutorial class, we have two or three original photographs, laminated for
protection, so that students can view the fine details.

13 So that students can find randomly generated points, we have glued a measurement scale
to each edge of the photograph before copying it. We obtained these scales by photocopying a
ruler marked in millimetres onto white paper.

4.2. Guidelines for Tutors

14 In the week before the tutorial, students should be advised to familiarise themselves with
the problem as described in the first part of the student notes (see the Appendix). They should
also be told to bring a ruler and a highlighter pen to the class.

15 The tutorial is designed to last for 50 minutes, and it is important to keep things moving as
the main aim of the exercise can only be met when all students have collected two different
samples and calculated estimates.

16 The tutorial can be divided into three parts: definition, execution, and conclusions. First,
divide the students into discussion groups of about four to define the regions they will regard
as undisturbed bushland. Allow five minutes for this discussion, and then hold a five-minute
forum to establish a class definition of undisturbed bushland. This definition must be
operational; that is, students must be able to decide whether any point on the photograph is
undisturbed bushland or not.

17 Two points should emerge from the forum. To compare sampling methods, the same
definition of bushland must be used for both methods. Furthermore, if students' estimates are
to be directly comparable, they must all be using the same definition.

18 The tutorial now moves into the execution stage. Ensure that students clearly highlight the
undisturbed bushland areas on their photocopies of the photograph. (In the past, we have
found that the means of all students' estimates for the two methods differ substantially. This is
probably because many students classify isolated clumps of trees as bushland in simple
random sampling, but not in transect sampling.) Warn students that highlighting and sample
selection should take no more than 20 minutes.

19 The last stage of the tutorial involves discussion about what can be learnt from the data
and reaching conclusions. Collect each student's estimates on the board so that everyone can
see them. The estimates should be arranged in two columns: transect and simple random
sample. Get students to discuss the precision and usefulness of the two methods. Discussion
points can include

 A comparison of five number summaries for the two methods to see which
estimate is less variable or more precise,

 An examination of the fairness of the above comparison, taking into
consideration the amount of effort required to obtain each estimate in class
and how this might change in the field,

 The usefulness of each method in terms of precision -- that is, the range of
class estimates should be small enough to give us a reasonable idea of the
proportion of native bushland, and

 Consideration of improving precision by increasing the number of points or
transects sampled.

5. Extensions
20 If more time is available, there are several issues that can be developed from this tutorial.
The most obvious is the sample size effect for the simple random sample. We have used 20
points because that number was thought to be achievable in the allocated time. As a short cut,
you could ask students to use the first 10 points to obtain an estimate, and then all 20 points.
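
The sample size effect can also be previewed by simulation before the class meets. The sketch below is only an illustration under stated assumptions: the true proportion of bushland (0.3), the number of simulated students, and the function name simulate_class are all hypothetical and do not come from the photograph or from class data. Each simulated student classifies 20 independent random points, and we record the estimate from the first 10 points and from all 20.

```python
import random
import statistics


def simulate_class(n_students=1000, true_p=0.3, seed=1):
    """Simulate estimates from the first 10 points and from all 20 points.

    true_p is a hypothetical proportion of bushland, chosen only for
    illustration; it is not a value measured from the photograph.
    """
    rng = random.Random(seed)
    est10, est20 = [], []
    for _ in range(n_students):
        # 1 = point falls in bushland, 0 = it does not
        flags = [1 if rng.random() < true_p else 0 for _ in range(20)]
        est10.append(sum(flags[:10]) / 10)
        est20.append(sum(flags) / 20)
    return est10, est20


est10, est20 = simulate_class()
print("SD of estimates, n = 10:", round(statistics.stdev(est10), 3))
print("SD of estimates, n = 20:", round(statistics.stdev(est20), 3))
```

Under these assumptions the estimates based on 20 points should be noticeably less variable than those based on 10 points, which is the pattern students can look for in their own class results.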

21 More advanced students could be asked to examine other sampling schemes such as
systematic sampling. In this method, r parallel transects at equal intervals are examined; the
first transect is selected randomly in the range 1 to (180/r). For this example, if ten transects
are to be used, the first is chosen by generating a random number between 1 and 18 (180/10).
Subsequent transects are 18 mm below the previous one.
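
The positions of the systematic transects described above can be generated as in the following minimal sketch. The function name systematic_transects and its parameters are our own illustrative choices, and the sketch assumes that r divides the side length exactly, as it does for 180 mm and ten transects.

```python
import random


def systematic_transects(side_length_mm=180, r=10, rng=random):
    """Return positions (in mm down the side scale) of r equally spaced transects.

    Assumes r divides side_length_mm exactly (180 / 10 = 18 mm here).
    """
    interval = side_length_mm // r
    # Random start between 1 and the interval, inclusive (1 to 18 here)
    first = rng.randint(1, interval)
    # Each later transect is one interval below the previous one
    return [first + k * interval for k in range(r)]


positions = systematic_transects()
print(positions)
```

Because only the first position is random, the whole set of transects is determined by a single random number.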

22 Points can also be chosen systematically instead of randomly. They can be selected at
regular or irregular intervals along parallel transects. The aim is to give representative
coverage of the area, while avoiding following features of the landscape such as streams,
fencelines, and ridges. Methods that simulate what an ecologist might do on foot in a park,
when no aerial photograph exists, could be discussed. Buckland, Anderson, Burnham, and
Laake (1993) discuss some of the practicalities of these methods when introducing principles
of distance sampling.

23 Transect sampling is also used in microscopic work. For example, physiologists count
certain cell types, and engineers examine grain structure in metals. Commonly, a grid
available on the microscope is used to take systematic samples. Photographs taken under the
microscope could be used in place of the aerial photograph or as an extension to illustrate
other uses of transect sampling.

24 In subsequent lectures, when introducing the sampling distribution of a proportion, we
have found it useful to refer to the variety of estimates obtained by students. The number of
points, from a simple random sample of 20, that are in undisturbed bushland may be used as
an example of a binomial variable. However, because points on the transect are not
independent of one another, the number of points out of 400 in undisturbed bushland is not a
binomial variable.

25 As an assignment question, we ask the students to calculate a 95% confidence interval for
the proportion, p, of the whole area that is in undisturbed bushland using their own simple
random sample estimate. We also ask them to explain why the formula for the 95%
confidence interval is inappropriate for the transect method. Most of them can see that the
points are not independent.
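
The assignment calculation might be sketched as follows. The paper does not state which formula students use, so this sketch assumes the usual large-sample (Wald) interval, p-hat plus or minus 1.96 standard errors; the function name wald_interval and the count of 6 bushland points out of 20 are hypothetical values chosen for illustration.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Large-sample interval for a proportion (assumed formula, not from the paper)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Truncate to [0, 1] because a proportion cannot fall outside that range
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# Hypothetical result: 6 of 20 random points fell in undisturbed bushland
lower, upper = wald_interval(6, 20)
print(round(lower, 3), round(upper, 3))
```

With only 20 points the interval is wide, and the large-sample approximation itself is rough, which gives a natural link back to the discussion of sample size. The formula relies on the points being independent, which is why it does not carry over to the transect method.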

Acknowledgments

This work was conducted with aid of a University of Adelaide Teaching Development Grant.
I wish to thank two anonymous referees for their very helpful suggestions.

Appendix: Student Notes


A.1. Learning Objectives
In this tutorial we shall learn

 Some ways of comparing different sampling methods,

 That estimates of the same thing, based on different samples, will vary, and

 How to select a random sample using random numbers.

A.2. The Problem

A common problem in rangeland management is the estimation of areas covered by a
particular vegetation type in a region. In this workshop, we are going to compare two
different rangeland surveying techniques, both of which involve sampling.

Method 1: Line Transect Method

This method is suitable for flat, low-lying scrub with clear delineations among vegetation
types. Walk in a straight line from a random starting point and either count paces or use a tape
measure to find the proportion of the whole traverse that intersects the vegetation type of
interest.

Since we have a good view of the region, we may decide to choose a representative or best
path to take for our estimate, but this leaves us open to the possibility
of biasing our estimate towards a preconceived idea of the area. On the other hand, choosing
a line at random is only the best method when the vegetation type of interest occurs in
random clumps, clearly a rare property in practice.

A common compromise is to take two lines, one at right angles to the other. In this way we
are likely to pick up any clumping.

Method 2: Grid Estimation Method

Given the dimensions of the region, randomly choose a subset of point coordinates to sample.
Walk to each point and decide whether it lies in the vegetation type of interest. Find the
number of points in the vegetation as a proportion of all points examined. In each case, this
proportion is an estimate of the proportion of the total area of the region covered. We can do
something similar in the laboratory by using an aerial photograph.

A.3. Initial Discussion

You have been given a photograph (scale 1:16000) of the Sandy Creek Conservation Park
and surrounds near Gawler taken in March 1979. The aim is to determine how much of the
whole region is "undisturbed bushland" as this has implications for the park fauna.

1. In discussion groups, decide which of the following you will regard as
undisturbed bushland. The original photograph is with the tutor for clearer
inspection.

o River banks

o Trees lining roads

o Paddocks with substantial tree cover

o Plantations

2. Still in groups, discuss reasons why it is important to define the undisturbed
bushland regions before you start.

3. Share your ideas with the whole class.

4. Once the class has settled on a definition, use a highlighter pen to mark
boundaries around all areas of undisturbed bushland. The highlighter
should be at the perimeter of the bushland but not in the bushland.

Eyeball Estimate: Just looking at the photograph, what proportion of the region do
you think is undisturbed bushland? Enter this value on the tutor's data sheet.

A.4. Executing Method 1

1. The left-hand side of the map is 180 mm long. To choose a starting point,
select a random number between 0 and 180.

2. Write down the selected random starting point (in millimetres) for the
left-hand side.

3. Draw a horizontal line across the picture starting at this point. Write down
the length of the line lying in undisturbed bushland -- that is, within the
boundaries you drew previously.

4. Repeat Steps 1 through 3 for a random starting point along the top of the
picture. (N.B. This time you want a starting point between 0 and 220.)

5. Add the two lengths of undisturbed bushland together.

6. The length of the horizontal line is 220 mm, and the length of the vertical
line is 180 mm. Use your answer from Step 5 to estimate the proportion of
the total area in the photograph that is undisturbed bushland.

7. Write your answer from Step 6 on the tutor's data sheet.
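
If you would like to check your arithmetic, the steps above can be sketched in a few lines of Python. This is an illustration only: the function names are our own, whole-millimetre starting points are an assumption, and the bushland lengths of 60 mm and 40 mm are made-up measurements. The estimate divides the total bushland length by the total length of the two lines (220 + 180 = 400 mm).

```python
import random

WIDTH_MM = 220   # length of the horizontal line (top scale)
HEIGHT_MM = 180  # length of the vertical line (side scale)


def transect_starts(rng=random):
    """Pick whole-millimetre starting points for the two lines (Steps 1 and 4)."""
    side_start = rng.randint(0, HEIGHT_MM)  # where the horizontal line begins
    top_start = rng.randint(0, WIDTH_MM)    # where the vertical line begins
    return side_start, top_start


def transect_estimate(horizontal_bush_mm, vertical_bush_mm):
    """Proportion of the two lines lying in bushland (Steps 5 and 6)."""
    return (horizontal_bush_mm + vertical_bush_mm) / (WIDTH_MM + HEIGHT_MM)


side_start, top_start = transect_starts()
print("Start points (mm):", side_start, top_start)
# Made-up measurements: 60 mm and 40 mm of the lines lie in bushland
print("Estimate:", transect_estimate(60, 40))
```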

A.5. Executing Method 2

Imagine a grid of one-millimetre squares drawn on the photograph. Each of the intersections
is a point either lying in undisturbed bushland or not. Since this grid is very fine, the
proportion of the points which lie in undisturbed bushland will be close to the corresponding
proportion of the area. We will sample 20 points at random by selecting random coordinates
from the top and side scales 20 times.

1. Set up a table for your sampled points under the headings Top, Side, and
Bush.

2. Generate 20 top coordinates by selecting 20 random numbers between 0
and 220. Enter these numbers in the Top column of your table.

3. Next to these, in the Side column of your table, enter 20 random numbers
in the range 0 to 180.

4. Find the point on the photograph given by the Top and Side coordinates in
the first row of your table. If the point is in undisturbed bushland, as
marked by your highlighted boundaries, enter a 1 under the Bush heading.
Enter a 0 otherwise.

Suppose your first coordinate pair is Top = 25 mm, Side = 150 mm, and the
point lies in farmland. Then your table would look like this:

Top    Side    Bush
25     150     0

5. Repeat Step 4 for all 20 points.

6. Count the number of 1's in the Bush column and divide the count by 20 to
estimate the proportion of undisturbed bushland in the region.

7. Write your answer on the tutor's data sheet.
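
The same steps can be sketched in Python instead of Excel. This is a minimal illustration, assuming whole-millimetre coordinates; the function names are our own, and the Bush column still has to be filled in by eye from the photograph, so the 0/1 values shown are made up.

```python
import random


def sample_points(n=20, width_mm=220, height_mm=180, rng=random):
    """Generate n random (Top, Side) coordinate pairs in whole millimetres."""
    return [(rng.randint(0, width_mm), rng.randint(0, height_mm))
            for _ in range(n)]


def estimate_proportion(bush_flags):
    """Proportion of sampled points recorded as bushland (1) rather than not (0)."""
    return sum(bush_flags) / len(bush_flags)


points = sample_points()
for top, side in points:
    print(top, side)

# Made-up Bush column for 20 points, entered after inspecting the photograph
bush = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
print("Estimate:", estimate_proportion(bush))
```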

A.6. Summary Discussion

Your tutor will write the class results for the two methods on the board. Get into groups and
perform the following tasks.

 Find the five number summary of estimates for Method 1 in your class.

 Find the five number summary of estimates for Method 2 in your class.

 Use the five number summaries to decide which sampling method is more
precise (i.e., less variable). Give reasons. Consider ways of improving the
precision.

 Decide whether either method is precise enough to be useful.

 Discuss whether it is reasonable to compare a proportion estimated from a
random sample of 20 points with one estimated from two lines with a total
of 400 points. Give reasons. (Hint: You could consider the amount of effort
required to collect each of these samples in the laboratory and in the field.)
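
The five number summaries can be checked with a short Python sketch like the one below. The class estimates shown are made-up values for illustration, and the function name five_number_summary is our own. Textbooks and software define quartiles in slightly different ways, so the quartiles here (Python's "inclusive" method) may differ a little from values found by hand.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points that split the data into quarters
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Made-up class estimates for one method (not real data)
estimates = [0.25, 0.10, 0.30, 0.20, 0.35, 0.15, 0.40]
print(five_number_summary(estimates))
```

Running the function once for each column on the board gives the two summaries to compare.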

References
Buckland, S. T., Anderson, D. R., Burnham, K. P., and Laake, J. L. (1993), Distance
Sampling: Estimating Abundance of Biological Populations, London: Chapman & Hall.

Hanley, J. A., and Shapiro, S. H. (1994), "Sexual Activity and the Lifespan of Male Fruitflies:
A Dataset That Gets Attention," Journal of Statistics Education [Online], 2(1).
(http://jse.amstat.org/v2n1/datasets.hanley.html)

Rossman, A. J., and Short, T. H. (1995), "Conditional Probability and Educational Reform:
Are They Compatible?," Journal of Statistics Education [Online], 3(2).
(http://jse.amstat.org/v3n2/rossman.html)

Glenys Bishop
Statistics Department
University of Adelaide
Australia 5005

gbishop@stats.adelaide.edu.au
