
Simple Random Sampling and Systematic Sampling

Simple random sampling and systematic sampling provide the foundation for almost all of the more complex sampling designs based on probability sampling. They are also usually the easiest designs to implement. These two designs highlight a trade-off inherent in selecting a sampling design: to select sample units at random to minimize the risk of introducing biases into the sample, or to select samples systematically to ensure that sample units are well distributed throughout the population.

Both designs involve selecting n sample units from the N units available in the population and can be implemented with or without replacement.

Simple Random Sampling
When the population of interest is relatively homogeneous, simple random sampling works well, which means it provides estimates that are unbiased and have high precision. When little is known about a population in advance, such as in a pilot study, simple random sampling is a common design choice.

Advantages:

Easy to implement
Requires little knowledge of the population in advance

Disadvantages:

Imprecise relative to other designs if the population is heterogeneous
More expensive than other designs if entities are clumped and the cost to travel among units is appreciable

How it is implemented:

Select n sample units at random from the N available in the population

All units within the sampling universe must have the same probability of being selected, and each and every possible sample of size n drawn from the population must have an equal chance of being selected.

There are many strategies available for selecting a random sample. For large populations, this often involves generating pseudo-random numbers with a computer; for small populations, it might involve using a table of random numbers or even writing a unique identifier for every sample unit in the population on a scrap of paper, placing those scraps in a jar, shaking it, then blindly selecting n scraps of paper from the jar. The approach used for selecting the sample matters little provided there are no constraints on how the sample units are selected and all units have an equal chance of being selected.
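The computer-based approach described above can be sketched in a few lines of Python. This is only an illustration: the function name, the seed argument, and the frame of 286 numbered units are assumptions made for the example, and units are drawn without replacement.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Draw n distinct unit labels from 1..N without replacement.

    The names N, n, and seed are illustrative choices for this sketch.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # random.sample gives every subset of size n the same chance of selection
    return sorted(rng.sample(range(1, N + 1), n))


# Hypothetical frame of 286 numbered sample units, 15 of which are selected
sample = simple_random_sample(N=286, n=15, seed=1)
print(sample)
```

Recording the seed alongside the sample lets the selection be reproduced later, which is convenient when the draw needs to be documented.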

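Estimating the Population Mean
The population mean ($\mu$) is the true average number of entities per sample unit and is estimated with the sample mean ($\hat{\mu}$ or $\bar{y}$), which has an unbiased estimator:

$$\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

where $y_i$ is the value from each unit in the sample and n is the number of units in the sample.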

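The population variance ($\sigma^2$) is estimated with the sample variance ($s^2$), which has an unbiased estimator:

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}$$

Variance of the estimate $\hat{\mu}$ is:

$$\mathrm{var}(\hat{\mu}) = \left(\frac{N - n}{N}\right)\frac{s^2}{n}$$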

The standard error of the estimate is the square root of variance of the estimate, which as always is the standard deviation of the sampling distribution of the estimate. Standard error is a useful gauge of how precisely a parameter has been estimated.

Standard error of $\hat{\mu}$ is:

$$SE(\hat{\mu}) = \sqrt{\left(\frac{N - n}{N}\right)\frac{s^2}{n}}$$

The quantity $\frac{N - n}{N}$ is the finite population correction factor (FPC), which adjusts variance of the estimator (not variance of the population, which does not change with n) to reflect the amount of information that is known about the population through the sample. Practically, the correction factor reflects the proportion of the population that remains unknown. Therefore, as the sample size n approaches the population size N, the finite population correction factor approaches zero, so the amount of variation associated with the estimate also approaches zero.

[Figure: FPC with N = 100; x-axis n from 0 to 100, y-axis FPC from 0 to 1]

When the sample size n is small relative to the population size N, the fraction of the population being sampled, n/N, is small, so the correction factor is close to 1 and has little effect on the estimate of variance (Fig. 2 FPC.xls). If the finite population correction factor is ignored, including those cases where N is unknown, the effect on the variance of the estimator is slight when N is large. When N is small, however, the variance of the estimator can be overestimated appreciably.

Estimating the Population Total
Like the population mean, the total number of entities in the population is another attribute estimated commonly. Unlike the population mean or proportion, estimating the population total requires that we know the number of sampling units in a population, N.

The population total $\tau = \sum_{i=1}^{N} y_i = N\mu$ is estimated with the sample total ($\hat{\tau}$), which has an unbiased estimator:

$$\hat{\tau} = N\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} y_i$$

where N is the total number of sample units in a population, n is the number of units in the sample, and $y_i$ is the value measured from each sample unit.

In studies of wildlife populations, the total number of entities in a population is often referred to as abundance and is traditionally represented with the symbol N. Consequently, there is real potential for confusing the number of entities in the population with the number of sampling units in the sampling frame. Therefore, in the context of sampling theory, we'll use $\tau$ to represent the population total and N to represent the number of sampling units in a population. Later, when addressing wildlife populations specifically, we'll use N to represent abundance to remain consistent with the literature in that field.

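Because the estimator $\hat{\tau}$ is simply the number of sample units in the population N times the mean number of entities per sample unit, $\hat{\mu}$, the variance of the estimate $\hat{\tau}$ reflects both the number of units in the sampling universe N and the variance associated with $\hat{\mu}$. An unbiased estimate for the variance of the estimate $\hat{\tau}$ is:

$$\mathrm{var}(\hat{\tau}) = N^2\,\mathrm{var}(\hat{\mu}) = N^2\,\frac{s^2}{n}\left(\frac{N - n}{N}\right)$$

where $s^2$ is the estimated population variance.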

Example: Estimating a caribou population in Alaska.

Caribou were counted in strip transects that were 1 mile wide. A simple random sample of 15 transects (n) was chosen from the 286 transects potentially available (N). The numbers of caribou counted were 1, 50, 21, 98, 2, 36, 4, 29, 7, 15, 86, 10, 21, 5, 4.

The sample mean number of caribou counted per transect: $\hat{\mu} = \bar{y} = 25.93$

The sample variance: $s^2 = 919.0667$

The estimated variance of the sample mean: $\widehat{\mathrm{var}}(\bar{y}) = \left(\frac{286 - 15}{286}\right)\frac{919.07}{15} = 58.0576$

The estimated standard error of the mean is: $\sqrt{58.06} = 7.62$.

An estimate of the total number of caribou in the area is: $\hat{\tau} = 286(25.9333) = 7417$

An estimate of variance of the estimated total is: $\widehat{\mathrm{var}}(\hat{\tau}) = 286^2(58.0576) = 4{,}748{,}879$

The estimated standard error of the total is: $\sqrt{4{,}748{,}879} = 2179$
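The caribou calculations can be checked with a short script. The sketch below applies the simple random sampling estimators for the mean and total to the 15 transect counts; the function name and dictionary keys are illustrative, not part of the notes.

```python
from math import sqrt


def srs_estimates(y, N):
    """Mean, total, and their variances under simple random sampling."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)  # sample variance
    fpc = (N - n) / N                               # finite population correction
    var_mean = fpc * s2 / n
    total = N * mean
    var_total = N ** 2 * var_mean
    return {
        "mean": mean,
        "s2": s2,
        "var_mean": var_mean,
        "se_mean": sqrt(var_mean),
        "total": total,
        "var_total": var_total,
        "se_total": sqrt(var_total),
    }


counts = [1, 50, 21, 98, 2, 36, 4, 29, 7, 15, 86, 10, 21, 5, 4]
est = srs_estimates(counts, N=286)
for name, value in est.items():
    print(f"{name}: {value:.4f}")
```

Small differences from the hand calculation in the last digits come from rounding intermediate values such as 58.0576 before squaring N.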

Estimating a Population Proportion
If there is interest in the composition of a population, we could use a simple random sample to estimate the proportion of the population p that is composed of elements with a particular trait, such as the proportion of plants that flower in a given year, the proportion of juvenile animals captured, the proportion of females in estrus, and so on. We will consider only classifications that follow binomial trials, which means that either an element in the population has the trait of interest (flowering) or not (not flowering), although extending this idea to more complex settings is straightforward.

In the case of simple random sampling, the population proportion follows the mean exactly; that is, p = $\mu$. If this idea is new to you, convince yourself by working through an example. Say we generate a sample of 10 elements, where 4 have a value of 1 and 6 have a value of 0 (1 = presence of a trait, 0 = absence of a trait). The proportion of the sample with the trait is 4/10 or 0.40 and so is the arithmetic mean, which = 0.40 ([1+1+1+1+0+0+0+0+0+0]/10 = 4/10). Cosmic.

It follows that the population proportion (p) is estimated with the sample proportion ($\hat{p}$), which has an unbiased estimator:

$$\hat{p} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

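Because we are dealing with dichotomous proportions (sample unit does or does not have the trait), the population variance $\sigma^2$ is computed based on variance for a binomial, which is the proportion of the population with the trait (p) times the proportion that does not have that trait (1 - p), or p(1 - p). The estimate of the population variance $s^2$ is: $\hat{p}(1 - \hat{p})$.

Variance of the estimate $\hat{p}$ is:

$$\mathrm{var}(\hat{p}) = \left(\frac{N - n}{N}\right)\frac{s^2}{n - 1} = \left(\frac{N - n}{N}\right)\frac{\hat{p}(1 - \hat{p})}{n - 1}$$

Standard error of $\hat{p}$ is:

$$SE(\hat{p}) = \sqrt{\left(\frac{N - n}{N}\right)\frac{s^2}{n - 1}} = \sqrt{\left(\frac{N - n}{N}\right)\frac{\hat{p}(1 - \hat{p})}{n - 1}}$$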
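As a quick check of the proportion estimators, the sketch below reuses the sample of 10 elements described earlier (4 ones and 6 zeros). The population size N = 100 is a made-up value, since the notes do not give one for that sample, and the function name is illustrative.

```python
from math import sqrt


def srs_proportion(y, N):
    """Estimate a proportion and its variance from 0/1 data under SRS."""
    n = len(y)
    p_hat = sum(y) / n              # the sample proportion is the sample mean
    s2 = p_hat * (1 - p_hat)        # binomial form of the variance estimate
    var_p = (N - n) / N * s2 / (n - 1)
    return p_hat, var_p, sqrt(var_p)


# 1 = trait present, 0 = trait absent; N = 100 is a hypothetical population size
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_hat, var_p, se_p = srs_proportion(y, N=100)
print(p_hat, round(var_p, 4), round(se_p, 4))
```

With such a small sample the standard error is large relative to the estimate, which is the practical reason for deciding on sample size before sampling.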

Systematic Sampling
Occasionally, selecting sample units at random can introduce logistical challenges that preclude collecting data efficiently. If the chance of introducing a bias is low or if ideal dispersion of sample units in the population is a higher priority than a strictly random sample, then it might be appropriate to choose samples nonrandomly. Like simple random sampling, systematic sampling is a type of probability sampling where each element in the population has a known and equal probability of being selected. The probabilistic framework is maintained through selection of one or more random starting points. Although sometimes more convenient, systematic sampling provides less protection against introducing biases in the sample compared to random sampling. Estimators for systematic sampling and simple random sampling are identical; only the method of sample selection differs. Therefore, systematic sampling is used to simplify the process of selecting a sample or to ensure ideal dispersion of sample units throughout the population.

Advantages:

Easy to implement
Maximum dispersion of sample units throughout the population
Requires minimum knowledge of the population

Disadvantages:

Less protection from possible biases
Can be imprecise and inefficient relative to other designs if the population being sampled is heterogeneous

How it is implemented:

Choose a starting point at random
Select samples at uniform intervals thereafter

1-in-k systematic sample
Most commonly, a systematic sample is obtained by randomly selecting 1 unit from the first k units in the population and every kth element thereafter. This approach is called a 1-in-k systematic sample with a random start. To choose k so that a sample of appropriate size is selected, calculate:

k = Number of units in population / Number of sample units required

For example, if we plan to choose 40 plots from a field of 400 plots, k = 400/40 = 10, so this design would be a 1-in-10 systematic sample. The example in the figure is a 1-in-8 sample drawn from a population of N = 300; this yields n = 28. Note that the sample size drawn will vary and depends on the location of the first unit drawn.
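A 1-in-k systematic sample with a random start can be sketched as follows. The sketch assumes the units are numbered 1 to N along a single sequence; the function name, the seed, and the second population of 45 units are illustrative assumptions.

```python
import random


def systematic_sample(N, k, seed=None):
    """Return a 1-in-k systematic sample of unit labels from 1..N."""
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random start among the first k units
    return list(range(start, N + 1, k))  # every kth unit thereafter


# 40 plots from a field of 400 plots: k = 400 / 40 = 10
plots = systematic_sample(N=400, k=400 // 40, seed=7)
print(len(plots), plots[:5])

# When N is not a multiple of k, the sample size depends on the start.
# Hypothetical population of 45 units sampled 1-in-10:
sizes = {len(range(start, 45 + 1, 10)) for start in range(1, 11)}
print(sizes)
```

When N is an exact multiple of k, every start gives the same sample size; otherwise the size differs by one unit depending on where the first unit falls.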

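Estimating the Population Mean
The population mean ($\mu$) is estimated with:

$$\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

The population variance ($\sigma^2$) is estimated with:

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}$$

Variance of the estimate $\hat{\mu}$ is:

$$\mathrm{var}(\hat{\mu}) = \left(\frac{N - n}{N}\right)\frac{s^2}{n}$$

Standard error of $\hat{\mu}$ is:

$$SE(\hat{\mu}) = \sqrt{\left(\frac{N - n}{N}\right)\frac{s^2}{n}}$$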

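Estimating the Population Total
The population total $\tau$ is estimated with:

$$\hat{\tau} = N\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} y_i$$

Variance of the estimate $\hat{\tau}$ is:

$$\mathrm{var}(\hat{\tau}) = N^2\,\mathrm{var}(\hat{\mu}) = N^2\,\frac{s^2}{n}\left(\frac{N - n}{N}\right)$$

Standard error of $\hat{\tau}$ is:

$$SE(\hat{\tau}) = \sqrt{\mathrm{var}(\hat{\tau})} = \sqrt{N^2\,\frac{s^2}{n}\left(\frac{N - n}{N}\right)}$$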

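Estimating the Population Proportion
The population proportion (p) is estimated with the sample proportion ($\hat{p}$), which has an unbiased estimator:

$$\hat{p} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Because we are estimating a dichotomous proportion, the population variance $\sigma^2$ is again computed with a binomial, which is the proportion of the population with the trait (p) times the proportion without that trait (1 - p), or p(1 - p). The estimate of the population variance $s^2$ is: $\hat{p}(1 - \hat{p})$.

Variance of the estimate $\hat{p}$ is:

$$\mathrm{var}(\hat{p}) = \left(\frac{N - n}{N}\right)\frac{s^2}{n - 1} = \left(\frac{N - n}{N}\right)\frac{\hat{p}(1 - \hat{p})}{n - 1}$$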

How Many Samples?
Optimal allocation is an approach to maximize sampling efficiency; that is, to provide high precision for a given amount of sampling effort.

A different question is: how many samples should we take from the population?

First, establish the degree of precision required, B, the bound on the error of estimation, which is the half-width of the confidence interval we wish to attain from sampling. Determine the sample size n required by setting $Z \times SE(\bar{y})$ equal to B and solving this expression for n.

Z is a constant that denotes the upper $\alpha/2$ point of the standard normal distribution, where $\alpha$ is the same value used to establish the width of confidence intervals.

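Population Mean

For simple random sampling, set:

$$B = Z\sqrt{\left(\frac{N - n}{N}\right)\frac{\sigma^2}{n}}$$

and solve for n to get:

$$n = \frac{1}{\dfrac{1}{n_0} + \dfrac{1}{N}}; \qquad n_0 = \frac{Z^2\sigma^2}{B^2}$$

or

$$n = \frac{1}{\dfrac{B^2}{Z^2\sigma^2} + \dfrac{1}{N}}$$

Note that if n will be small relative to N, the population correction factor can be ignored, and the formula for sample size reduces to $n_0$.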

Example: Estimate the average body mass of male freshmen on campus.

Assume that no prior information exists with which to estimate population variance $\sigma^2$, but we know that the mass of most male freshmen is within a range of about 100 pounds and there are N = 1000 students.

How many samples are needed to estimate $\mu$ with a bound on the error of estimation B = 3 pounds using simple random sampling?

Although it is best to have data with which to estimate $\sigma^2$, perhaps from a small pilot study, the range is often approximately equal to $4\sigma$, so one-fourth of the range might be used as an approximate value of $\sigma$:

$$\sigma \approx \frac{\text{range}}{4} = \frac{100}{4} = 25$$

Substituting, with Z = 2 (approximately 95% confidence):

$$n = \frac{1}{\dfrac{3^2}{2^2 \cdot 25^2} + \dfrac{1}{1000}} = \frac{1}{\dfrac{1}{277.78} + \dfrac{1}{1000}} = \frac{1}{0.0036 + 0.001} = 217.4$$

Therefore, about 218 samples are needed to estimate $\mu$ with a bound on the error of estimation B = 3 pounds.
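The sample-size calculation for a mean is easy to script. The sketch below implements the two-step form (first $n_0$, then the finite population adjustment) and reproduces the freshmen example; the function and argument names are illustrative.

```python
import math


def sample_size_mean(N, sigma, B, z=2.0):
    """Sample size needed to estimate a mean to within B under SRS.

    Returns (n0, n): n0 ignores the finite population correction,
    n adjusts for a population of N units.
    """
    n0 = (z * sigma / B) ** 2
    n = 1 / (1 / n0 + 1 / N)
    return n0, n


# Range of about 100 pounds, so sigma is approximated by 100 / 4 = 25
n0, n = sample_size_mean(N=1000, sigma=100 / 4, B=3, z=2)
print(round(n0, 2), round(n, 1), math.ceil(n))
```

Rounding up with `math.ceil` is the conservative choice, because rounding down would give a bound slightly wider than B.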

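Population Total

For simple random sampling, solve for n from:

$$B = Z\,N\sqrt{\left(\frac{N - n}{N}\right)\frac{\sigma^2}{n}}$$

to get:

$$n = \frac{1}{\dfrac{1}{n_0} + \dfrac{1}{N}}; \qquad n_0 = \frac{N^2 Z^2\sigma^2}{B^2}$$

or

$$n = \frac{1}{\dfrac{B^2}{N^2 Z^2\sigma^2} + \dfrac{1}{N}}$$

Again, if N is large relative to n, the population correction factor can be ignored, and the formula for sample size reduces to $n_0$.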

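Example: What sample size is necessary to estimate the caribou population we examined to within B = 2000 animals of the true total with 90% confidence ($\alpha$ = 0.10)?

Using $s^2$ = 919 from earlier and Z = 1.645, which is the upper $\alpha/2$ = 0.10/2 = 0.05 point of the normal distribution:

$$n_0 = \frac{286^2 (1.645)^2 (919)}{2000^2} \approx 51$$

To adjust for finite population size:

$$n = \frac{1}{\dfrac{1}{51} + \dfrac{1}{286}} = 43.3, \text{ which rounds up to } 44$$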
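The same calculation for a population total differs only in the factor $N^2$ in $n_0$. The sketch below reproduces the caribou sample-size example; the function and argument names are illustrative.

```python
import math


def sample_size_total(N, s2, B, z):
    """Sample size needed to estimate a total to within B under SRS.

    Returns (n0, n): n0 ignores the finite population correction,
    n adjusts for a population of N sample units.
    """
    n0 = N ** 2 * z ** 2 * s2 / B ** 2
    n = 1 / (1 / n0 + 1 / N)
    return n0, n


# Caribou example: N = 286 transects, s2 = 919, B = 2000 animals, 90% confidence
n0, n = sample_size_total(N=286, s2=919, B=2000, z=1.645)
print(round(n0, 1), round(n, 1), math.ceil(n))
```

Carrying the unrounded $n_0$ through the adjustment gives a slightly smaller n than starting from 51, but both round up to the same sample size.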


Stratified Random Sampling

The way we have selected sample units thus far has required that we know little about the population of interest in advance of selecting the sample. This approach works best only when the characteristic of interest is relatively homogeneous across the population. If, however, the characteristic is heterogeneous, then estimates based on these designs will be imprecise. If we have ancillary information that is associated with the heterogeneity in the population, we can use alternate designs to select samples, which will yield increased precision for a fixed amount of effort. The first of these designs is stratified random sampling.

A stratified random sample is one obtained by dividing the population elements into mutually exclusive, nonoverlapping groups (strata), then selecting a simple random sample from within each stratum (stratum is the singular of strata). Every potential sample unit can be assigned to only one stratum and no units can be excluded.

Stratifying involves classifying sampling units of the population into relatively homogeneous groups, usually before selecting sample units. Strata are based on information other than the characteristic being measured that is known to or thought to vary with the characteristic of interest in such a way that the characteristic is more homogeneous within strata than among strata. Therefore, any feature that explains variation in the characteristic of interest can be used as a basis for stratifying. For example, if our goal is to estimate the total number of agaves in an area and we know from previous work that agave abundance varies with soil type, we might choose to stratify the population by soil type. Because samples within strata are likely to be more similar than those among strata, sampling error will be lower and estimates generated within strata will have higher precision than estimates from simple random samples drawn from the same population.

As most ecological systems are heterogeneous, stratifying is a common approach for increasing precision in ecological studies. Common strata in ecological studies include elevation, aspect, or other geographic features for studying plant communities, and vegetation communities for studying animal communities. When choosing among potential strata, you should seek to minimize variation within strata and maximize variation among strata.

Stratified random sampling is appropriate whenever there is heterogeneity in a population that can be classified with ancillary information; the more distinct the strata, the higher the gains in precision. The same population can be stratified multiple times simultaneously.

Advantages:

Higher precision of estimates
More efficient
Separate estimates for each stratum

Disadvantages:

Requires ancillary information
Can be more time consuming to plan and implement



How it is implemented:

Divide the entire population into nonoverlapping strata
Select a simple random sample from within each stratum

L = number of strata

$N_i$ = number of sample units within stratum i

N = number of sample units in the population

Estimating the Population Mean
Estimates from stratified random samples are simply the weighted sum of estimates from a series of simple random samples, each generated within a unique stratum. This should be apparent in the estimators below, such as that for the population mean, which is an average of the means from each stratum weighted by the number of sample units within each stratum ($N_i$). With only one stratum, stratified random sampling reduces to simple random sampling.

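The population mean ($\mu$) is estimated with:

$$\hat{\mu} = \frac{1}{N}\left(N_1\hat{\mu}_1 + N_2\hat{\mu}_2 + \cdots + N_L\hat{\mu}_L\right) = \frac{1}{N}\sum_{i=1}^{L} N_i\hat{\mu}_i$$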

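Variance of the estimate $\hat{\mu}$ is again just a weighted average of estimates from a series of random samples, although it looks a bit cumbersome:

$$\mathrm{var}(\hat{\mu}) = \frac{1}{N^2}\left[N_1^2\left(\frac{N_1 - n_1}{N_1}\right)\frac{s_1^2}{n_1} + \cdots + N_L^2\left(\frac{N_L - n_L}{N_L}\right)\frac{s_L^2}{n_L}\right] = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\frac{s_i^2}{n_i}$$

Standard error of $\hat{\mu}$ is:

$$SE(\hat{\mu}) = \sqrt{\frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\frac{s_i^2}{n_i}}$$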

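Estimating the Population Total
Like the mean, estimating a total for a stratified random sample is a matter of summing individual estimates of the total from each stratum, $N_i\hat{\mu}_i$.

The population total $\tau$ is estimated with:

$$\hat{\tau} = N_1\hat{\mu}_1 + N_2\hat{\mu}_2 + \cdots + N_L\hat{\mu}_L = \sum_{i=1}^{L} N_i\hat{\mu}_i$$

Variance of the estimated total $\hat{\tau}$ is:

$$\mathrm{var}(\hat{\tau}) = N^2\,\mathrm{var}(\hat{\mu}) = \sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\frac{s_i^2}{n_i}$$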


Standard error of $\hat{\tau}$ is the square root of $\mathrm{var}(\hat{\tau})$.

Estimating the Population Proportion
Estimating the proportion of the population with a particular trait (p) using stratified random sampling involves combining estimates from multiple simple random samples, each generated within a stratum. The population proportion is estimated with the sample proportion:

$$\hat{p} = \frac{1}{N}\left(N_1\hat{p}_1 + N_2\hat{p}_2 + \cdots + N_L\hat{p}_L\right) = \frac{1}{N}\sum_{i=1}^{L} N_i\hat{p}_i$$

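Variance of the estimate $\hat{p}$ is:

$$\mathrm{var}(\hat{p}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\,\mathrm{var}(\hat{p}_i) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(\frac{N_i - n_i}{N_i}\right)\frac{\hat{p}_i(1 - \hat{p}_i)}{n_i - 1}$$

Standard error of $\hat{p}$ is the square root of $\mathrm{var}(\hat{p})$.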

Example: Simple example of 12 samples taken from a population of 41 entities.

Stratum (i)    N_i    n_i    mean (ybar_i)    s_i^2
1              20     5      1.6              3.3
2              9      3      2.8              4.0
3              12     4      0.6              2.2

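Estimate of the population mean:

$$\bar{y} = \frac{1}{41}\left[20(1.6) + 9(2.8) + 12(0.6)\right] = \frac{64.4}{41} = 1.57$$

Estimate of the population total = 41 × 1.57 = 64.4.

Estimated variance of the estimated population mean is:

$$\widehat{\mathrm{var}}(\bar{y}) = \frac{1}{41^2}\left[20(20 - 5)\frac{3.3}{5} + 9(9 - 3)\frac{4.0}{3} + 12(12 - 4)\frac{2.2}{4}\right] = \frac{322.8}{41^2} = 0.192$$

Estimated variance of the estimated population total = $41^2 \times 0.192 = 322.8$.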
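The stratified estimators are just sums over strata, which makes them straightforward to compute. The sketch below uses the three-stratum example, taking the per-stratum values ($N_i$, $n_i$, mean, $s_i^2$) from the worked calculations; the tuple layout and function name are illustrative.

```python
from math import sqrt

# One tuple per stratum: (N_i, n_i, mean_i, s2_i), read from the worked example
strata = [
    (20, 5, 1.6, 3.3),
    (9, 3, 2.8, 4.0),
    (12, 4, 0.6, 2.2),
]


def stratified_estimates(strata):
    """Mean and total, with variances, for a stratified random sample."""
    N = sum(N_i for N_i, _, _, _ in strata)
    # Total: sum of the stratum totals N_i * mean_i
    total = sum(N_i * mean_i for N_i, _, mean_i, _ in strata)
    # Variance of the total: sum of N_i^2 * FPC_i * s2_i / n_i over strata
    var_total = sum(
        N_i ** 2 * (N_i - n_i) / N_i * s2_i / n_i
        for N_i, n_i, _, s2_i in strata
    )
    mean = total / N
    var_mean = var_total / N ** 2
    return mean, var_mean, total, var_total


mean, var_mean, total, var_total = stratified_estimates(strata)
print(round(mean, 2), round(var_mean, 3), round(total, 1), round(var_total, 1))
print(round(sqrt(var_mean), 3), round(sqrt(var_total), 2))  # standard errors
```

Passing a single stratum to the same function gives the simple random sampling estimates, which matches the remark that stratified sampling with one stratum reduces to simple random sampling.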

Allocating Sampling Effort among Strata
After deciding to use stratified random sampling, we need to decide how to divide sampling effort among different strata; that process is called allocation. When deciding where to expend effort, the question becomes how best to allocate sampling effort among strata so that the sampling process will be the most efficient balance of effort, cost, and precision. Should we allocate the same sampling effort to each stratum? If strata are of different sizes, as is usually the case, should we allocate more effort to larger strata?

There are many strategies for allocating sampling effort, and the more information available about the population of interest, the more efficient the allocation strategy can be. Information on the variability of samples within each stratum, the relative cost of obtaining a sample from each stratum, and the number of sample units in each stratum can all help to increase sampling efficiency. Some of the most common allocation strategies are uniform; proportional to size, variation, or cost; and optimal, which simultaneously considers size, variation, and cost or whichever combination of those is available. All strategies function by creating a simple proportional multiplier by which a fixed number of samples can be allocated among strata.

Uniform Allocation

The simplest allocation strategy is to select the same number of samples from each stratum, which is an ideal approach if there is no information available about variability of units within strata, the cost of sampling is similar for all strata, and strata are of similar size.

Allocation Proportional to Size or Variation

The number of sample units to select from each stratum can be made proportional to the number of sample units (or size) within each stratum. Variation in a stratum often increases with the size of a stratum, so in some cases this can be considered a rough approach for allocating more effort to strata that are likely to be more variable. To allocate proportional to stratum size:

$$n_i = n\left(\frac{N_i}{\sum_{k=1}^{L} N_k}\right) = n\,\frac{N_i}{N}$$

To allocate proportional to the amount of variation among elements within each stratum, as measured by the estimated standard deviation within each stratum:

$$n_i = n\left(\frac{s_i}{\sum_{k=1}^{L} s_k}\right)$$

This approach relies on estimates generated from a previous study or alternatively on the ability to gauge relative differences in variation among strata, such as expecting one stratum to have 1.5 times the variation of another stratum.
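Both proportional rules share the same form: a fixed n is split according to each stratum's share of some weight. The sketch below illustrates this with the stratum sizes and variances from the three-stratum example. The function name is illustrative, and the allocations are left as fractions; rounding them to whole sample units is a separate decision not covered here.

```python
from math import sqrt


def allocate(n, weights):
    """Split n samples among strata in proportion to the given weights."""
    total = sum(weights)
    return [n * w / total for w in weights]


N_i = [20, 9, 12]                        # stratum sizes
s_i = [sqrt(3.3), sqrt(4.0), sqrt(2.2)]  # stratum standard deviations

by_size = allocate(12, N_i)        # proportional to stratum size
by_variation = allocate(12, s_i)   # proportional to stratum variation
print([round(x, 2) for x in by_size])
print([round(x, 2) for x in by_variation])
```

The two rules can disagree: here the smallest stratum is also the most variable, so it receives the fewest samples under size allocation and the most under variation allocation.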

Optimal Allocation

Both allocation approaches above are special cases of the optimal allocation strategy, which estimates the population mean or total with the lowest variance for a given sample size in stratified random sampling. The number of samples selected from each stratum depends on the size, the variation, and the cost ($c_i$) of sampling in each stratum. More sampling effort is allocated to larger and more variable strata, and less to strata that are more costly to sample.

$$n_i = n\left(\frac{N_i s_i / \sqrt{c_i}}{\sum_{k=1}^{L} N_k s_k / \sqrt{c_k}}\right)$$
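Optimal allocation can be sketched the same way, with each stratum weighted by its size times its standard deviation divided by the square root of its cost (the standard form of this rule, assumed here). The per-unit costs below are made-up values for illustration, since the notes give none, and the function name is likewise illustrative.

```python
from math import sqrt


def optimal_allocation(n, N_i, s_i, c_i):
    """Allocate n samples with weights N_i * s_i / sqrt(c_i)."""
    weights = [N * s / sqrt(c) for N, s, c in zip(N_i, s_i, c_i)]
    total = sum(weights)
    return [n * w / total for w in weights]


N_i = [20, 9, 12]                        # stratum sizes
s_i = [sqrt(3.3), sqrt(4.0), sqrt(2.2)]  # stratum standard deviations
c_i = [1.0, 4.0, 1.0]                    # hypothetical cost per sample unit

n_opt = optimal_allocation(12, N_i, s_i, c_i)
n_equal_cost = optimal_allocation(12, N_i, s_i, [1.0, 1.0, 1.0])
print([round(x, 2) for x in n_opt])
print([round(x, 2) for x in n_equal_cost])
```

With equal costs and equal standard deviations the weights reduce to the stratum sizes, which is why allocation proportional to size is a special case of this rule.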
