
Module 3

SIMD COMPUTERS
The traditional von Neumann machine is SISD: it has one instruction stream, one CPU, and one memory.
Single Instruction stream, Multiple Data stream (SIMD): a computer that performs one operation on multiple sets of data. SIMD machines operate on multiple data sets in parallel. This architecture consists of a square grid of processor/memory elements. A single control unit broadcasts instructions, which are carried out in lockstep by all the processors, each one using its own data from its own memory. The array processor is well suited to calculations on matrices.
SIMD Computer organizations
Configuration I is structured with N synchronized PEs, all of which are under the control of one CU. Each $PE_i$ is essentially an ALU with attached working registers and a local memory $PEM_i$ for the storage of distributed data. The CU also has its own main memory for the storage of programs. The function of the CU is to decode all instructions and determine where the decoded instructions should be executed. Scalar or control-type instructions are executed directly inside the CU. Vector instructions are broadcast to the PEs for distributed execution.
Configuration II differs from configuration I in two aspects. First, the local memories attached to the PEs are replaced by parallel memory modules shared by all the PEs through an alignment network. Second, the inter-PE permutation network is replaced by the inter-PE memory alignment network. A good example of a configuration II SIMD machine is the Burroughs Scientific Processor.
An SIMD computer is characterized by the following set of parameters:
C = <N, F, I, M>
N = the number of PEs in the system.
F = a set of data routing functions.
I = the set of machine instructions.
M = the set of masking schemes.
Masking and data routing mechanisms
In an array processor, vector operands can be specified by the registers to be used or by the memory addresses to be referenced. For memory-reference instructions, each $PE_i$ accesses its local $PEM_i$, offset by its own index register $I_i$. The $I_i$ register modifies the global memory address broadcast from the CU. Thus, different locations in different PEMs can be accessed simultaneously with the same global address specified by the CU.
Mesh-Connected Illiac Network
A single-stage recirculating network has been implemented in the Illiac IV array processor with 64 PEs. Each $PE_i$ is allowed to send data to $PE_{i+1}$, $PE_{i-1}$, $PE_{i+r}$, and $PE_{i-r}$, where $r = \sqrt{N}$ and N is the number of PEs.
Formally, the Illiac network is characterized by the following four routing functions:
$$R_{+1}(i) = (i + 1) \bmod N$$
$$R_{-1}(i) = (i - 1) \bmod N$$
$$R_{+r}(i) = (i + r) \bmod N$$
$$R_{-r}(i) = (i - r) \bmod N$$
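The four routing functions can be tried out directly. The following is a minimal sketch, assuming PEs are numbered 0 to N-1 and N is a perfect square; the function name `illiac_routes` and the dictionary keys are illustrative, not part of the original notes.

```python
import math


def illiac_routes(i, n_pes):
    """Return the PEs reachable from PE i in one step of the Illiac network."""
    r = math.isqrt(n_pes)
    if r * r != n_pes:
        raise ValueError("N must be a perfect square so that r = sqrt(N) is an integer")
    return {
        "R+1": (i + 1) % n_pes,
        "R-1": (i - 1) % n_pes,
        "R+r": (i + r) % n_pes,
        "R-r": (i - r) % n_pes,
    }


# Illiac IV case: N = 64, so r = 8.
print(illiac_routes(0, 64))
```

Python's `%` operator returns a non-negative result for a positive modulus, so the wraparound cases such as (0 - 1) mod 64 = 63 need no special handling.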
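Cube Interconnection Networks
The cube network can be implemented as a multistage network for SIMD machines. Formally, an n-dimensional cube network of $N = 2^n$ PEs is specified by the following n routing functions:
$$C_i(a_{n-1} \cdots a_{i+1}\, a_i\, a_{i-1} \cdots a_0) = a_{n-1} \cdots a_{i+1}\, \bar{a}_i\, a_{i-1} \cdots a_0 \quad \text{for } i = 0, 1, 2, \ldots, n-1$$
That is, $C_i$ connects each PE to the PE whose binary address differs from its own only in bit i. In the usual drawing of a three-dimensional cube, vertical lines connect vertices whose addresses differ in the most significant bit position. Vertices at both ends of the diagonal lines differ in the middle bit position. Horizontal lines connect vertices whose addresses differ in the least significant bit position.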
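Complementing one address bit is a single exclusive-OR. A minimal sketch, assuming PE addresses are stored as plain integers; the function name `cube_route` is hypothetical.

```python
def cube_route(address, i, n):
    """Apply cube routing function C_i: complement bit i of an n-bit PE address."""
    if not 0 <= i < n:
        raise ValueError("bit position i must satisfy 0 <= i < n")
    if not 0 <= address < (1 << n):
        raise ValueError("address must fit in n bits")
    return address ^ (1 << i)


# In a 3-cube, PE 000 is connected to 001, 010 and 100.
neighbours = [cube_route(0b000, i, 3) for i in range(3)]
print([format(a, "03b") for a in neighbours])
```

Applying the same $C_i$ twice returns the original address, so every link is bidirectional.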
Inter-PE communications
There are fundamental decisions in designing an appropriate architecture of an interconnection network for an SIMD machine. The decisions are made between operation modes, control strategies, switching methodologies, and network topologies.
Operation mode:
Two types of communication can be identified: synchronous and asynchronous.
Control strategy:
The control-setting functions can be managed by a centralized controller or by the individual switching elements. The latter strategy is called distributed control and the first strategy corresponds to centralized control.
Switching methodology:
The two major switching methodologies are circuit switching and packet switching.
Network topology:
The topologies can be grouped into two categories: static and dynamic. In a static topology, the dedicated buses cannot be reconfigured, but links in the dynamic category can be reconfigured.
SIMD Interconnection Networks
Various interconnection networks have been suggested for SIMD computers. The topological structure of an array processor is mainly characterized by the data routing network used in interconnecting the processing elements. Such a network can be specified by a set of data routing functions.
Based on topology, interconnection networks are classified into:
Static networks
Dynamic networks
Static networks
Topologies in static networks can be classified according to the dimensions required for layout. Examples of one-dimensional topologies include the linear array. Two-dimensional topologies include the ring, star, tree, mesh, and systolic array. Three-dimensional topologies include the completely connected network, the chordal ring, the 3-cube, and 3-cube-connected-cycle networks.
Dynamic networks
There are two classes of dynamic networks: single-stage and multistage.
Single-stage networks
A single-stage network is a switching network with N input selectors (IS) and N output selectors (OS). Each IS is essentially a 1-to-D demultiplexer and each OS is an M-to-1 multiplexer, where 1 ≤ D ≤ N and 1 ≤ M ≤ N. A single-stage network with D = M = N is a crossbar switching network. To establish a desired connecting path, different path control signals are applied to all IS and OS selectors. A single-stage network is also called a recirculating network: data items may have to recirculate through the single stage several times before reaching their final destination. The number of recirculations needed depends on the connectivity of the single-stage network. In general, the higher the hardware connectivity, the smaller the number of recirculations.
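The number of recirculations can be counted for a concrete single-stage network. The sketch below assumes the mesh-connected Illiac network described earlier in these notes (each PE linked to PEs i±1 and i±r, r = sqrt(N)) and uses a breadth-first search; the function name `recirculations` is illustrative only.

```python
import math
from collections import deque


def recirculations(n_pes, source=0):
    """Map each PE to the minimum number of passes needed to reach it from source."""
    r = math.isqrt(n_pes)
    steps = {source: 0}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        # One pass through the single stage moves data along one of four links.
        for j in ((i + 1) % n_pes, (i - 1) % n_pes, (i + r) % n_pes, (i - r) % n_pes):
            if j not in steps:
                steps[j] = steps[i] + 1
                queue.append(j)
    return steps


worst_case = max(recirculations(16).values())
print(worst_case)
```

A crossbar (D = M = N) would need a single pass for any source and destination, which is the trade-off between hardware connectivity and recirculation count stated above.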
Multistage networks
Many stages of interconnected switches form a multistage network. Such networks are described by three characterizing features: the switch box, the network topology, and the control structure. Many switch boxes are used in a multistage network. Each box is essentially an interchange device with two inputs and two outputs. The four states of a switch box are: straight, exchange, upper broadcast, and lower broadcast. Two types of control structure are used in multistage networks: individual stage control and individual box control.
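The four switch-box states can be written as a small function. This is a sketch under the assumption that a box simply maps an (upper, lower) input pair to an (upper, lower) output pair; the state names as strings and the function name `switch_box` are choices made here for illustration.

```python
def switch_box(upper, lower, state):
    """Return the (upper, lower) outputs of a 2x2 switch box in the given state."""
    if state == "straight":
        return upper, lower
    if state == "exchange":
        return lower, upper
    if state == "upper_broadcast":
        return upper, upper  # upper input copied to both outputs
    if state == "lower_broadcast":
        return lower, lower  # lower input copied to both outputs
    raise ValueError(f"unknown switch box state: {state!r}")


for state in ("straight", "exchange", "upper_broadcast", "lower_broadcast"):
    print(state, switch_box("a", "b", state))
```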
PARALLEL ALGORITHMS FOR ARRAY PROCESSORS
The original motivation for developing SIMD array processors was to perform parallel computations on vector or matrix types of data. Parallel processing algorithms have been developed by many computer scientists for SIMD computers. Important SIMD algorithms can be used to perform matrix multiplication, fast Fourier transform (FFT), matrix transposition, summation of vector elements, matrix inversion, parallel sorting, linear recurrence, and Boolean matrix operations, and to solve partial differential equations.
SIMD Matrix Multiplication
Many numerical problems suitable for parallel processing can be formulated as matrix computations. Matrix manipulation is frequently needed in solving linear systems of equations. Important matrix operations include matrix multiplication, L-U decomposition, and matrix inversion. We present below two parallel algorithms for matrix multiplication. The differences between SISD and SIMD matrix algorithms are pointed out in their program structures and speed performances.
Let $A = [a_{ik}]$ and $B = [b_{kj}]$ be n × n matrices. The multiplication of A and B generates a product matrix $C = A \times B = [c_{ij}]$ of dimension n × n. The elements of the product matrix C are related to the elements of A and B by:
$$c_{ij} = \sum_{k=1}^{n} a_{ik} \times b_{kj}$$
There are $n^3$ cumulative multiplications to be performed. A cumulative multiplication refers to the linked multiply-add operation c = c + a × b. The addition is merged into the multiplication because the multiply is equivalent to a multioperand addition. Therefore, we can take the unit time to be the time required to perform one cumulative multiplication, since add and multiply are performed simultaneously. In a conventional SISD uniprocessor system, the $n^3$ cumulative multiplications are carried out by a serially coded program with three levels of DO loops corresponding to the three indices used. The time complexity of this sequential program is proportional to $n^3$, as specified in the following SISD algorithm for matrix multiplication.
An $O(n^3)$ algorithm for SISD matrix multiplication
For i = 1 to n Do
    For j = 1 to n Do
        $c_{ij} = 0$ (initialization)
        For k = 1 to n Do
            $c_{ij} = c_{ij} + a_{ik} \cdot b_{kj}$ (scalar additive multiply)
        End of k loop
    End of j loop
End of i loop
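The triple loop translates almost line for line into a runnable form. A minimal sketch, assuming square matrices held as Python lists of lists with 0-based indices; the operation counter is added here only to make the $n^3$ count visible.

```python
def sisd_matmul(a, b):
    """Multiply two n x n matrices with three nested loops (one multiply-add at a time)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]  # initialization of every c[i][j]
    multiply_adds = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]  # scalar additive multiply
                multiply_adds += 1
    return c, multiply_adds


product, count = sisd_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, count)
```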
Now we want to implement the matrix multiplication on an SIMD computer with n PEs. The algorithm construct depends heavily on the memory allocation of the A, B, and C matrices in the PEMs. Suppose each row vector of a matrix is spread across the PEMs, one element per PEM; column vectors are then stored within the same PEM. This memory allocation scheme allows parallel access to all the elements in each row vector of the matrices. Based on this data distribution, we obtain the following parallel algorithm. The two parallel do operations correspond to vector load for initialization and vector multiply for the inner loop of additive multiplications. The time complexity has been reduced to $O(n^2)$. Therefore, the SIMD algorithm is n times faster than the SISD algorithm for matrix multiplication.
An $O(n^2)$ algorithm for SIMD matrix multiplication
For i = 1 to n Do
    Par for k = 1 to n Do
        $c_{ik} = 0$ (vector load)
    For j = 1 to n Do
        Par for k = 1 to n Do
            $c_{ik} = c_{ik} + a_{ij} \cdot b_{jk}$ (vector multiply)
    End of j loop
End of i loop
The vector load operation is performed to initialize the row vectors of matrix C one row at a time. In the vector multiply operation, the same multiplier $a_{ij}$ is broadcast from the CU to all PEs to multiply all n elements $b_{jk}$, for k = 1, 2, ..., n, of the jth row vector of B. In total, $n^2$ vector multiply operations are needed in the double loops.
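The parallel algorithm can be simulated sequentially to check the operation count. In this sketch each list comprehension stands in for one lockstep operation over all n PEs (PE k holds column k); this mapping, the function name `simd_matmul`, and the counter are assumptions made for illustration, not a model of any specific machine.

```python
def simd_matmul(a, b):
    """Simulate the n-PE algorithm: one 'vector' step updates a whole row of C."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    vector_multiplies = 0
    for i in range(n):
        c[i] = [0 for _ in range(n)]  # vector load: all n PEs clear c[i][k] together
        for j in range(n):
            multiplier = a[i][j]  # broadcast from the CU to every PE
            # vector multiply: PE k computes c[i][k] + a[i][j] * b[j][k]
            c[i] = [c[i][k] + multiplier * b[j][k] for k in range(n)]
            vector_multiplies += 1
    return c, vector_multiplies


product, steps = simd_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, steps)
```

For n = 2 the simulation performs 4 vector multiplies where the sequential version performs 8 scalar multiply-adds, matching the factor-of-n reduction described above.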
If we increase the number of PEs used in an array processor to $n^2$, an $O(n \log_2 n)$ algorithm can be devised to multiply two n × n matrices A and B. Let $n = 2^m$. Consider an array processor whose $n^2 = 2^{2m}$ PEs are located at the $2^{2m}$ vertices of a 2m-cube network. A 2m-cube network can be considered as two (2m-1)-cube networks linked together by $2^{2m-1}$ extra edges. For example, a 4-cube network is constructed from two 3-cube networks by using 8 extra edges between corresponding vertices at the corner positions. Let $(P_{2m-1} P_{2m-2} \cdots P_m P_{m-1} \cdots P_1 P_0)_2$ be the PE address in the 2m-cube. We can achieve the $O(n \log_2 n)$ compute time only if the matrix elements are initially distributed favorably over the PE vertices. The n rows of matrix A are distributed over n distinct PEs whose addresses satisfy the condition
$$P_{2m-1} P_{2m-2} \cdots P_m = P_{m-1} P_{m-2} \cdots P_0$$
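The address condition can be checked mechanically. A minimal sketch, assuming PE addresses are plain integers of 2m bits and that row r of A goes to the PE whose upper and lower halves both equal r; the row-to-PE assignment and the function name `row_holder_addresses` are assumptions for illustration.

```python
def row_holder_addresses(m):
    """List the 2m-bit PE addresses whose upper m bits equal their lower m bits."""
    n = 1 << m  # n = 2**m rows
    return [(row << m) | row for row in range(n)]


# n = 4, m = 2: four PEs out of 16 hold the rows of A initially.
addresses = row_holder_addresses(2)
print([format(a, "04b") for a in addresses])
```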
In a 4 × 4 matrix multiplication (n = 4, m = 2), the four rows of A are first distributed according to this address condition. The four rows of A are then broadcast over the fourth dimension and the front-to-back edges. The n columns of matrix B (or the n rows of matrix $B^t$) are evenly distributed over the PEs of the 2m-cube. The four rows of $B^t$ are then broadcast over the front and back faces. This gives the combined results of the A and $B^t$ broadcasts, with the inner product ready to be computed. The matrix multiplication on a 2m-cube network is formally specified below:
1. Transpose B to form $B^t$ over the m-cubes $x_{2m-1} \cdots x_m 0 \cdots 0$.
2. N-way broadcast each row of $B^t$ to all PEs in the m-cube.
3. N-way broadcast each row of A.
4. Each PE now contains a row of A and a column of B.
Parallel Sorting on Array Processors
An SIMD algorithm is presented for sorting $n^2$ elements on a mesh-connected (Illiac-IV-like) processor array in O(n) routing and comparison steps. This is a speedup of $O(n \log_2 n)$ over the best sequential sorting algorithms, which take $O(n^2 \log_2 n)$ steps to sort $n^2$ elements on a uniprocessor system. Consider an array processor with $N = n^2$ identical PEs interconnected by a mesh network similar to that of the Illiac IV, except that the PEs at the perimeter have two or three rather than four neighbours. In other words, there are no wraparound connections in this simplified mesh network.
Two time measures are needed to estimate the time complexity of the parallel sorting algorithm. Let $t_R$ be the routing time required to move one item from a PE to one of its neighbors, and $t_C$ be the comparison time required for one comparison step. Concurrent data routing is allowed, and up to N comparisons may be performed simultaneously. This means that a comparison-interchange step between two items in adjacent PEs can be done in $2t_R + t_C$ time units (route left, compare, and route right). A mixture of horizontal and vertical comparison-interchanges requires at least $4t_R + t_C$ time units. The sorting problem depends on the indexing scheme used for the PEs. The PEs may be indexed by a bijection from {1, 2, ..., n} × {1, 2, ..., n} to {0, 1, ..., N - 1}, where $N = n^2$. The sorting problem can then be formulated as moving the jth smallest element in the PE array to the PE indexed by j, for all j = 0, 1, 2, ..., N - 1.
Sorting the same array with respect to three different ways of indexing the PEs produces three different patterns.
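One such bijection is the snakelike row-major ordering, which the merge algorithm in this section assumes. A small sketch, using 0-based row and column numbers rather than the 1-based pairs in the text; the function name `snake_index` is illustrative.

```python
def snake_index(row, col, n):
    """Index of the PE at (row, col) in an n x n mesh, snakelike row-major order."""
    # Even rows run left to right, odd rows run right to left.
    offset = col if row % 2 == 0 else n - 1 - col
    return row * n + offset


n = 4
for row in range(n):
    print([snake_index(row, col, n) for col in range(n)])
```

With this ordering, consecutively indexed PEs are always mesh neighbours, which is convenient when every comparison-interchange must take place between adjacent PEs.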
Batcher's odd-even merge sort on a linear array has been generalized by Thompson and Kung to a square array of PEs. Let M(j, k) be a sorting algorithm for merging two sorted j-by-k/2 subarrays to form a sorted j-by-k array, where j and k are powers of 2 and k > 1. The snakelike row-major ordering is assumed in all the arrays. In the degenerate case of M(1, 2), a single comparison-interchange step is sufficient to sort two unit subarrays. Given two sorted columns of length j ≥ 2, the M(j, 2) algorithm consists of the following steps:
The M(j, 2) sorting algorithm
J1: Move all odds to the left column and all evens to the right in $2t_R$ time.
J2: Use the odd-even transposition sort to sort each column in $2jt_R + jt_C$ time.
J3: Interchange on even rows in $2t_R$ time.
J4: Perform one comparison-interchange in $2t_R + t_C$ time.
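Step J2 relies on the odd-even transposition sort. The following is a minimal sequential sketch, assuming a Python list stands in for one column of j PEs; the function name is illustrative. Each of the j phases is one round of comparison-interchanges between adjacent PEs, which is consistent with the $2jt_R + jt_C$ time quoted in J2.

```python
def odd_even_transposition_sort(items):
    """Sort a list with alternating even-pair and odd-pair comparison-interchanges."""
    a = list(items)
    n = len(a)
    for phase in range(n):  # n phases are enough for n elements
        start = phase % 2  # alternate between pairs (0,1),(2,3).. and (1,2),(3,4)..
        # On an array processor all pairs of one phase are compared simultaneously.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([7, 3, 8, 1, 6, 2, 5, 4]))
```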
The M(j, k) algorithm
1. Perform a single interchange step on even rows.
2. Unshuffle each row.
3. Merge by calling algorithm M(j, k/2).
4. Shuffle each row.
5. Interchange on even rows.
6. Comparison-interchange.
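Steps 2 and 4 use the unshuffle and shuffle permutations on each row. A sketch of the two permutations on a single row, assuming an even row length and the convention that unshuffle sends the items in odd positions (counting from 1) to the left half; both the convention and the function names are assumptions here.

```python
def unshuffle(row):
    """Move items at positions 1, 3, 5, ... (1-based) left and 2, 4, 6, ... right."""
    return row[0::2] + row[1::2]


def shuffle(row):
    """Perfect shuffle: interleave the left half of the row with the right half."""
    half = len(row) // 2
    out = []
    for left, right in zip(row[:half], row[half:]):
        out.extend([left, right])
    return out


row = [1, 2, 3, 4, 5, 6, 7, 8]
print(unshuffle(row))
print(shuffle(unshuffle(row)))
```

Shuffle undoes unshuffle, so step 4 restores the column positions that step 2 rearranged before the recursive merge.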
Associative array processing
Associative array processors use CAM (content-addressable memory); that is, data stored in an associative memory are addressed by their contents. Associative memories are also known as content-addressable memory, parallel search memory, and multiaccess memory. The major advantage of associative memory over RAM is its capability of performing parallel search and parallel comparison operations. These are frequently needed in many important applications, such as the storage and retrieval of rapidly changing databases, radar-signal tracking, image processing, computer vision, and artificial intelligence. The major shortcoming of associative memory is its much increased hardware cost. Currently, the cost of associative memory is much higher than that of RAM.
Associative Memory Organizations
The associative memory array consists of n words with m bits per word. Each cell in the array consists of a flip-flop associated with some comparison logic gates for pattern match and read-write control. A bit slice is a vertical column of bit cells of all the words at the same position. Each bit cell $B_{ij}$ can be written in, read out, or compared with an external interrogating signal. The comparand register $C = (C_1, C_2, \ldots, C_m)$ is used to hold the key operand being searched for. The masking register $M = (M_1, M_2, \ldots, M_m)$ is used to enable the bit slices to be involved in the parallel comparison operations across all the words in the associative memory.
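The roles of the comparand and masking registers can be shown with integer bit operations. A minimal sketch, assuming each m-bit word, the comparand C and the mask M are held as Python integers, with a 1 bit in the mask enabling that bit slice; the function name `associative_search` is hypothetical.

```python
def associative_search(words, comparand, mask):
    """Return one match tag per word: True if the word equals C on all unmasked bits."""
    # XOR marks the bits that differ; AND keeps only the enabled bit slices.
    return [((word ^ comparand) & mask) == 0 for word in words]


memory = [0b1010, 0b1000, 0b0110]
# Search for words whose two leftmost bits are 10; the two rightmost bits are masked off.
print(associative_search(memory, comparand=0b1011, mask=0b1100))
```

In hardware every word is compared at once; the list comprehension only imitates that by visiting the words one after another.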
Two memory organizations of associative memory are:
The bit-parallel organization:
In a bit-parallel organization, the comparison process is performed in a parallel-by-word and parallel-by-bit fashion. All bit slices which are not masked off by the masking pattern are involved in the comparison process. In this organization, word-match tags for all words are used.
The bit-serial organization:
This memory organization operates with one bit slice at a time across all the words. The particular bit slice is selected by an extra logic and control unit. The bit-cell readouts are used in subsequent bit-slice operations.
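The bit-serial organization can be sketched as a loop over bit slices that keeps a running match tag per word. This is an illustration under the assumption that words, comparand and mask are m-bit Python integers and that slices are visited from the most significant bit down; the function name `bit_serial_search` and the visiting order are choices made here.

```python
def bit_serial_search(words, comparand, mask, m):
    """Compare one bit slice at a time, carrying match tags between slices."""
    tags = [True] * len(words)
    for pos in range(m - 1, -1, -1):
        if not (mask >> pos) & 1:
            continue  # this bit slice is masked off
        key_bit = (comparand >> pos) & 1
        bit_slice = [(word >> pos) & 1 for word in words]  # one column of bit cells
        # The readout of this slice updates the tags used by later slices.
        tags = [tag and bit == key_bit for tag, bit in zip(tags, bit_slice)]
    return tags


memory = [0b1010, 0b1000, 0b0110]
print(bit_serial_search(memory, comparand=0b1011, mask=0b1100, m=4))
```

The result is the same set of match tags a bit-parallel search would give; the difference is that the work is spread over up to m slice steps instead of a single comparison.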
