Professional Documents
Culture Documents
BitonicSort
Thepseudocodeforgeneratingabitonicsequence:
1. createtwobitonicsubsequences
2. Makeoneanincreasingsequence
3. Makeoneintoadecreasingsequence
2
Workforabitonicsort:W
(n)= (nlog
(n))thisalgorithmisnotworkoptimal
bs
2
Spanforabitonicsort:D
(n)= (log
(n))
bs
DistributedBitonicMergeviaBinaryExchange
Withregardstobitonicsorts,everythingboilsdowntodoingbitonicsortsefficiently.
Step1:abitonicsplit,performedinplace
Attheendofstage1therearetwobitonicsubsequences
Foradistributedalgorithm,dividetheprocessesbetweenthe
nodes.
Thisshowstheminimumbetweenthetwoinputs.(Thestartofthesplit.)
Thesecondredlineshowthemaximumbetweenthetwoinputs.Theedges
indicatethedependencebetweenthetwoinputsandoutputs.
Theendresultisabitonicsplit.
Thisshowstheinputs,theoutputs,andthepatternsof
dependencies.
Tolookatthedistributedversionofthethis:
Using4processingnodes.
Inthiscase,communication
Communicationoccursanywhereadependenceedgecrosses
aprocessboundary.
Binaryexchange=twoprocessesswappingdatawitheach
other.
Notethatallcommunicationoccursinthefirstlog(p)stages.
Thereareonlylog(p)stagesthatrequirecommunication.
Intheotherlog(n/P)stagesthereisnoneedforcommunicationbetweenprocesses.
Inacyclicdistributiontherowsofthenetworkareassignedtodifferentprocessesina
roundrobinfashion.Thecommunicationissimilartotheblockscheme.
PickaNetwork
Whichnetworkwouldallowforfullyconcurrentexchangeswithoutcongestion?
Hypercubeandfullyconnectedtopologies.
Thisisbecauseyouneedanetworkwithalinearorbetterbisectionwidth.Thefullyconnected
networkismorethannecessary.
CommunicationCostofaBitonicMerge
Whatisthecommunicationtimeofabitonicmerge,assumingablockdistributed,
binaryexchangeschemeonahypercube.
Communicationtime=(a+(b*(n/p))*log(p))
Recall,thebinaryexchangeschemeonlycommunicatesduringthefirstlog(P)stages.
Eachprocesshastosendn/Pwordsateachstage.
BitonicMergeviaTransposes
Twodistributedbitonicmergeschemeshavebeendiscussed:
Blockdistributionscheme:log(P)stagesofcommunication
andlog(n/P)stagesofcomputation.
Cyclicscheme:log(n/P)stagesofcomputationandlog(P)stages
ofcomputation
Therunningtimeforthetwoarethesame:T
(nP)= log(P)+ (n/P)log(P)
net
Startwithacyclictopology:
Thismeansthereisnocommunicationinitially.Thenswitchtoblock,whichmeansno
communicationattheend.Tomakethisworkthedatawillhavetobetransposed(or
shuffled).
Thetransposecanbeseenasanalltoallexchangeorasa
matrixtranspose.
2
Ifwelookatoneprocessnote:itneedstosendP1messages,eachofsizen/P
,toalltheother
processes.
Todeterminetheprocesstime:
1. Assumethenetworkisfullyconnected.
T
(nP)= (P1)+ (n/P)((P1)/P)Fullyconnectednetwork.
trans
Thereisalatencybandwidthtradeoffbetweenthetwo
schemes.
Inpracticeisitveryhardtofortheblockorcyclicschemeto
beatthetransposescheme.
ButterflyTrivia
Nameanotherfamousalgorithmthatfollowsthesame
computationalpattern.
FastFourierTransform
BitonicSortCostComputation
= cost/compare
(n/P )K = totaltimetodothecomparisons,atmergingstageK
Therearelognmergingstages,sothetotalcostforcomputationis:
BitonicSortCostCommunication
Assume:n,ParepowersofnP/n
T
(m)= + m
msg
Whatisthecommunicationtime?
Thekeyquestion:wheredoes
communicationhappen?
ForeachstageK,thesizeofthe
K
bitonicmergeisn
=2
K
Stage4asinglebitonicmergeof
size16.
AssumeablockdistributionwithPprocesses.
CommunicationonlystartswhenK>log(n/P)
Klog(n/P)
P
=2
whenk==log(n)itissimplifiedtoP
K
BitonicMergeinPprocessesofnsizerequires
Timetocommunicate
Sothetimefortheentiresortis:
Thesimplifiedversionis:
O( log(P ) + (n/P )(log 2
P))
LinearTimeDistributedSort,Part1
Anycomparisonbasedalgorithmforsortingscalesto:O(nlogn)
ForthebucketsortO(n)
Todobucketsort:
1. Startbyassumingyouknowthepossiblevalues.R={0,1,2,3,...,m1}
2. Thevaluestobesortedareuniformlydistributedovertherange.
3. DivideRintokbuckets
4. Thebucketsortfirstfiguresoutwhichbucketeachvaluebelongs.
5. Sortwithinineachbucketandconcatenatetheresults.
Howisthisalineartimescheme?
Howmanyelementsareineachbucket?theexpectednumberofelementsineachbucketwill
ben/k.(Assumetheelementsareevenlydistributedacrosstherange)
Thetimetosorteachbucketis:O(n/klog(n/k))
DistributedBucketSort
Itistodistributeassigneachbuckettoacomputenode.
Makethefollowingassumptions:
k=P
elementspernode~n/Pelementspertable
Assumeallnodesknowbucketranges
Assumethenetworkisfullyconnected
Therunningtimeis:O(b*(n/P)+t*(n/P)+(a*P))
Thestepstogetthis:
1. Eachprocessneedstoscanitslistoflocalelementsanddecidewhichelementsgo
where.Thenodescandothisinparallelandtheworkislinear.
2. Thenodesneedtoexchangeelements(analltoalloperation).Eachnodehas~n/P
2
elements.Soexpecttosendn/P
elementstoeverynode.Assumethenetworkisfully
connected.
3. Noweachbucketmustdoalocalsort,ofcostO(n/P)
LineartimeDistributedSort,Part2:SampleSort
Thebucketsorthasamajorflawtheassumptionofauniformdistributionofvaluesacross
thebuckets.Ifyoudonothaveuniformdist.,youwillnotgetanequalnumberofelementsin
eachbucket.
SouseSampling.
Dobucketsort,buttheintervalsvaryaccordingtothedata.Todecidethesizeoftheintervals,
usesampling.
TodoSampleSort:
1. Beginwithdataand,inthiscase,3processes.
2. Assumetheelementsareequallydist.amongthe3processes:
3. Sortthemlocally
4. Eachprocesswillchooseasampleofelementsfromtheirlist.Eachshouldchoosethe
sameequallyspacedelements.
5. Gatherthesamplesintheroot.
6. Sortthesamplesontheroot.
7. SelectP1splitters(inthiscase2)
8. Thesplittersdefinetheglobalbucketboundaries.
9. Forthisexample:
Thefirstbucketwillgetthefirstsplit,the0elements.Thesecondbucket
willgetthesecondsplit,13.Thethirdbucketwillgetthelastsplit,4end.
10. Nowthesplitterswillneedtobebroadcast.
11. Eachnodecanpartitionitselementsusingthesplitters.
12. Thenthenodesexchangevalues.Eachnodewillgetonlythevaluesinitsrange.
13. Theneachnodewilldoalocalsort.
Intherunningtimeforthissamplesort,whatisthelargestasymptoticfunctionofP?(AssumeP
processes)
2
2
2
O(P
)orO(P
logP)...TheroothastosortP
samples.
Ifthesystemistrulymassive,thiscouldbeadelimitedtoscalability.