You are on page 1of 9

Lesson25DistributedMemorySorting

BitonicSort

Thepseudocodeforgeneratingabitonicsequence:

1. createtwobitonicsubsequences
2. Makeoneanincreasingsequence
3. Makeoneintoadecreasingsequence

2
Workforabitonicsort:W
(n)= (nlog
(n))thisalgorithmisnotworkoptimal
bs
2
Spanforabitonicsort:D
(n)= (log
(n))
bs

DistributedBitonicMergeviaBinaryExchange

Withregardstobitonicsorts,everythingboilsdowntodoingbitonicsortsefficiently.

Step1:abitonicsplit,performedinplace
Attheendofstage1therearetwobitonicsubsequences

Foradistributedalgorithm,dividetheprocessesbetweenthe
nodes.

Thisshowstheminimumbetweenthetwoinputs.(Thestartofthesplit.)

Thesecondredlineshowthemaximumbetweenthetwoinputs.Theedges
indicatethedependencebetweenthetwoinputsandoutputs.

Theendresultisabitonicsplit.

Thisshowstheinputs,theoutputs,andthepatternsof
dependencies.

Tolookatthedistributedversionofthethis:
Using4processingnodes.

Inthiscase,communication
Communicationoccursanywhereadependenceedgecrosses
aprocessboundary.
Binaryexchange=twoprocessesswappingdatawitheach
other.

Notethatallcommunicationoccursinthefirstlog(p)stages.


Thereareonlylog(p)stagesthatrequirecommunication.
Intheotherlog(n/P)stagesthereisnoneedforcommunicationbetweenprocesses.

Inacyclicdistributiontherowsofthenetworkareassignedtodifferentprocessesina
roundrobinfashion.Thecommunicationissimilartotheblockscheme.

PickaNetwork

Whichnetworkwouldallowforfullyconcurrentexchangeswithoutcongestion?
Hypercubeandfullyconnectedtopologies.
Thisisbecauseyouneedanetworkwithalinearorbetterbisectionwidth.Thefullyconnected
networkismorethannecessary.

CommunicationCostofaBitonicMerge

Whatisthecommunicationtimeofabitonicmerge,assumingablockdistributed,
binaryexchangeschemeonahypercube.

Communicationtime=(a+(b*(n/p))*log(p))
Recall,thebinaryexchangeschemeonlycommunicatesduringthefirstlog(P)stages.
Eachprocesshastosendn/Pwordsateachstage.

BitonicMergeviaTransposes

Twodistributedbitonicmergeschemeshavebeendiscussed:

Blockdistributionscheme:log(P)stagesofcommunication
andlog(n/P)stagesofcomputation.

Cyclicscheme:log(n/P)stagesofcomputationandlog(P)stages
ofcomputation

Therunningtimeforthetwoarethesame:T
(nP)= log(P)+ (n/P)log(P)
net

Istheresomethingthatwillreducethe term,evenatthecostofincreasingthe term?YES

Startwithacyclictopology:
Thismeansthereisnocommunicationinitially.Thenswitchtoblock,whichmeansno
communicationattheend.Tomakethisworkthedatawillhavetobetransposed(or
shuffled).

Thetransposecanbeseenasanalltoallexchangeorasa
matrixtranspose.

2
Ifwelookatoneprocessnote:itneedstosendP1messages,eachofsizen/P
,toalltheother
processes.

Todeterminetheprocesstime:
1. Assumethenetworkisfullyconnected.

T
(nP)= (P1)+ (n/P)((P1)/P)Fullyconnectednetwork.
trans

Thereisalatencybandwidthtradeoffbetweenthetwo
schemes.
Inpracticeisitveryhardtofortheblockorcyclicschemeto
beatthetransposescheme.

ButterflyTrivia
Nameanotherfamousalgorithmthatfollowsthesame
computationalpattern.
FastFourierTransform

BitonicSortCostComputation

= cost/compare

(n/P )K = totaltimetodothecomparisons,atmergingstageK

Therearelognmergingstages,sothetotalcostforcomputationis:

BitonicSortCostCommunication

Assume:n,ParepowersofnP/n

T
(m)= + m
msg

Whatisthecommunicationtime?
Thekeyquestion:wheredoes
communicationhappen?
ForeachstageK,thesizeofthe
K
bitonicmergeisn
=2
K

Stage4asinglebitonicmergeof
size16.

AssumeablockdistributionwithPprocesses.
CommunicationonlystartswhenK>log(n/P)

Klog(n/P)
P
=2
whenk==log(n)itissimplifiedtoP
K

BitonicMergeinPprocessesofnsizerequires

Timetocommunicate

Sothetimefortheentiresortis:

Thesimplifiedversionis:
O( log(P ) + (n/P )(log 2

P))

LinearTimeDistributedSort,Part1

Anycomparisonbasedalgorithmforsortingscalesto:O(nlogn)

ForthebucketsortO(n)

Todobucketsort:
1. Startbyassumingyouknowthepossiblevalues.R={0,1,2,3,...,m1}
2. Thevaluestobesortedareuniformlydistributedovertherange.
3. DivideRintokbuckets

4. Thebucketsortfirstfiguresoutwhichbucketeachvaluebelongs.

5. Sortwithinineachbucketandconcatenatetheresults.

Howisthisalineartimescheme?
Howmanyelementsareineachbucket?theexpectednumberofelementsineachbucketwill
ben/k.(Assumetheelementsareevenlydistributedacrosstherange)

Thetimetosorteachbucketis:O(n/klog(n/k))

DistributedBucketSort
Itistodistributeassigneachbuckettoacomputenode.
Makethefollowingassumptions:
k=P
elementspernode~n/Pelementspertable
Assumeallnodesknowbucketranges
Assumethenetworkisfullyconnected

Therunningtimeis:O(b*(n/P)+t*(n/P)+(a*P))
Thestepstogetthis:
1. Eachprocessneedstoscanitslistoflocalelementsanddecidewhichelementsgo
where.Thenodescandothisinparallelandtheworkislinear.
2. Thenodesneedtoexchangeelements(analltoalloperation).Eachnodehas~n/P
2
elements.Soexpecttosendn/P
elementstoeverynode.Assumethenetworkisfully
connected.
3. Noweachbucketmustdoalocalsort,ofcostO(n/P)

LineartimeDistributedSort,Part2:SampleSort

Thebucketsorthasamajorflawtheassumptionofauniformdistributionofvaluesacross
thebuckets.Ifyoudonothaveuniformdist.,youwillnotgetanequalnumberofelementsin
eachbucket.

SouseSampling.
Dobucketsort,buttheintervalsvaryaccordingtothedata.Todecidethesizeoftheintervals,
usesampling.

TodoSampleSort:
1. Beginwithdataand,inthiscase,3processes.


2. Assumetheelementsareequallydist.amongthe3processes:

3. Sortthemlocally

4. Eachprocesswillchooseasampleofelementsfromtheirlist.Eachshouldchoosethe
sameequallyspacedelements.

5. Gatherthesamplesintheroot.
6. Sortthesamplesontheroot.

7. SelectP1splitters(inthiscase2)

8. Thesplittersdefinetheglobalbucketboundaries.
9. Forthisexample:

Thefirstbucketwillgetthefirstsplit,the0elements.Thesecondbucket
willgetthesecondsplit,13.Thethirdbucketwillgetthelastsplit,4end.


10. Nowthesplitterswillneedtobebroadcast.
11. Eachnodecanpartitionitselementsusingthesplitters.

12. Thenthenodesexchangevalues.Eachnodewillgetonlythevaluesinitsrange.
13. Theneachnodewilldoalocalsort.

Intherunningtimeforthissamplesort,whatisthelargestasymptoticfunctionofP?(AssumeP
processes)

2
2
2
O(P
)orO(P
logP)...TheroothastosortP
samples.

Ifthesystemistrulymassive,thiscouldbeadelimitedtoscalability.

You might also like