Professional Documents
Culture Documents
why wait . . . ?
... let's solve a "real problem"
device: washer
function: fill, agitate, spin
washerPD = 30 mins
step 2:
step 4:
.....
step 2:
throughput:
the rate of which inputs or outputs are processed
(0th year's laundry = 1/90 mins-1= 0.011 mins-1)
(1st year's laundry = 1/60 mins-1= 0.016 mins-1)
okay, back to circuits...
unpipelined 45 1/45
clock cycle
i i+1 i+2 i+3
pipeline stages
convention:
every pipeline stage, hence every K-stage pipeline, has a
register on its output (not on its input)
always:
the clock common to all registers must have a period
sufficient to cover propagation over combinational paths
PLUS (input) register tPD PLUS (output) register tSETUP
2-pipe 4 1/2
3-pipe 6 1/2
considering pipelining
• advantages
– higher throughput than
the corresponding combinatorial device
– different parts of the logic
work on different parts of the problem
• disadvantages
– generally, increases latency
– only as good as the weakest link
step 1:
step 3:
latency = 90 min
step 4:
step 5:
circuit interleaving
one way to overcome
a pipeline bottleneck
is to replicate
the critical element
as many time as needed
and alternate inputs
between the various copies
N-1 registers
latency = 2 clocks
combining interleaving
with pipelining
moves the bottleneck
from the C-element
to the F-element
attachment <student_B3.xxx>
multiplication (positive numbers)
multiplicand A3 A2 A1 A0
multiplier B3 B2 B1 B0
x
ABi called a "partial A3B0 A2B0 A1B0 A0B0
product
A3B1 A2B1 A1B1 A0B1
A3B2 A2B2 A1B2 A0B2
+ A3B3 A2B3 A1B3 A0B3
easy part:
forming partial products (just an AND gate since Bi is either 0 or 1)
hard part:
adding partial products column by column with carry
multiplication
hard part:
adding M N-bit partial products
sequential multiplier
repeat M times {
P P + (BLSB ==1 ? A : 0)
shift P/B right one bit
}
63 0 31 0
product register E E
<<
>>
LSB
finite state machine
sequential multiplier (32-bit ALU)
after initialization
multiplicand
(product register at 0)
and loading the operands;
31 0
E
repeat 32 times:
{if 1==LSB in multiplier,
then add multiplicand;
shift multiplier 1 right;
shift product 1 right;
32 - bit add zero }
ALU multiplier
63 0 31 0
product register E E
<<
<<
>> >>
LSB
finite state machine
sequential multiplier (32-bit ALU)
after initialization
multiplicand
(product register at 0)
and loading the operands;
31 0
E
repeat 32 times:
{if 1==LSB in multiplier,
then add multiplicand;
shift content of
product register 1 right;
32 - bit add zero }
ALU multiplier
63 0
product register E
<<
>>
LSB
FA FA FA FA
A3 A2 A1 A0
B2
FA FA FA FA
A3 A2 A1 A0
B3
FA FA FA FA
P7 P6 P5 P4 P3 P2 P1 P0
pipelined multiplier
A3 A2 A1 A0
B0
"carry save" FA FA FA FA
configuration
A3 A2 A1 A0
B1
FA FA FA FA
A3 A2 A1 A0
B2
FA FA FA FA
A3 A2 A1 A0
B3
FA FA FA FA
FA FA FA FA
P7 P6 P5 P4 P3 P2 P1 P0
summary
• latency (L) = time it takes for given input to effect an output
• throughput (T) = rate at which new outputs appear
• for combinational circuits: L = tPD of device, T = 1/L
• for K-stage pipeline (K > 0):
– always have registers on output(s)
– K registers on every path from input to output
– T = (tPD,reg + tPD,slowest pipeline stage + tSETUP)-1
• to increase throughput: split the slowest stage
• no further splitting possible, use replication/interleaving
– L = KxT
• pipelined latency ≥ combinational latency
• pipelining can be combined chapter
with circuit interleaving 4.5-p332
en 3.3: