Amdahl's Law
The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl's Law. Amdahl's Law defines the speedup that can be gained by using a particular feature. It is used to find the maximum expected improvement to an overall system when only part of the system is improved.
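Speedup is the ratio of the old execution time to the new one:
$$\text{Speedup}_{\text{overall}} = S(p) = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{t_s}{t_p}$$
where $t_s$ and $t_p$ are the execution times before and after the enhancement.
Alternatively,
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
where Fraction_enhanced is the fraction of the original execution time that can use the enhancement, and Speedup_enhanced is how many times faster that portion runs.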
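As a minimal sketch, the alternative form of the formula can be written as a Python function. The names `amdahl_speedup` and `amdahl_limit` are hypothetical and chosen only for illustration. The second function shows the ceiling implied by the formula: however large Speedup_enhanced becomes, the overall speedup cannot exceed 1 / (1 - Fraction_enhanced).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the original execution time
    runs speedup_enhanced times faster (Amdahl's Law)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time, normalized so that the old execution time is 1.
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


def amdahl_limit(fraction_enhanced: float) -> float:
    """Limit of the overall speedup as speedup_enhanced grows without bound."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # Diminishing returns: the unenhanced 60% of the time dominates.
    for s in (2, 10, 100, 1000):
        print(f"Speedup_enhanced = {s:>4}: overall = {amdahl_speedup(0.4, s):.3f}")
    print(f"Upper bound for Fraction_enhanced = 0.4: {amdahl_limit(0.4):.3f}")
```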
Suppose that we want to enhance the processor used for Web serving. The
new processor is 10 times faster on computation in the Web serving
application than the original processor. Assuming that the original processor
is busy with computation 40% of the time and is waiting for I/O 60% of the
time, what is the overall speedup gained by incorporating the enhancement?
Answer
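Fraction_enhanced = 0.4, Speedup_enhanced = 10
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.6 + 0.04} = \frac{1}{0.64} \approx 1.56$$
Similarly, if floating-point (FP) instructions account for 50% of the execution time and the enhancement makes them 1.6 times faster:
$$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23$$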
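The web-serving and FP calculations above can be checked with a few lines of Python. This is only an arithmetic check, and the helper name `amdahl` is hypothetical.

```python
def amdahl(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Web serving: computation is 40% of the time and becomes 10 times faster.
web = amdahl(0.4, 10)
# Floating point: FP work is 50% of the time and becomes 1.6 times faster.
fp = amdahl(0.5, 1.6)

print(f"Web serving speedup: {web:.2f}")
print(f"FP speedup: {fp:.2f}")
```

A processor that is 10 times faster yields only about 1.56 overall. The reason is that the 60% of the time spent waiting for I/O is not affected by the enhancement.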
With f = fraction of the task unaffected and p = speedup of the rest:
$$s = \frac{1}{f + (1 - f)/p} \le \min\left(p, \frac{1}{f}\right)$$
[Figure: Speedup (s), from 0 to 40, versus enhancement factor (p), from 0 to 50, with curves for f = 0.01, 0.02, 0.05, and 0.1]
Amdahl's law: speedup achieved if a fraction f of a task is unaffected and the remaining 1 - f part runs p times as fast.
Computer Architecture, Background and Motivation
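The plot is not reproduced here, so the following sketch tabulates s = 1 / (f + (1 - f)/p) for the plotted values of f. It also checks each value against the bound min(p, 1/f). The function names are hypothetical. The case f = 0 (no unaffected part) is included for comparison; there s = p exactly.

```python
def speedup(f: float, p: float) -> float:
    """Amdahl speedup: fraction f is unaffected, the rest runs p times as fast."""
    return 1.0 / (f + (1.0 - f) / p)


def bound(f: float, p: float) -> float:
    """Upper bound min(p, 1/f); with f == 0 the bound is just p."""
    return p if f == 0 else min(p, 1.0 / f)


fractions = (0.0, 0.01, 0.02, 0.05, 0.1)
factors = (10, 20, 30, 40, 50)

# One row per enhancement factor p, one column per unaffected fraction f.
print("p   " + "  ".join(f"f={f:<5}" for f in fractions))
for p in factors:
    row = "  ".join(f"{speedup(f, p):7.2f}" for f in fractions)
    print(f"{p:<3} {row}")
```

With f = 0.1, even p = 50 gives only about 8.5. That value is already close to the 1/f = 10 ceiling.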
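A processor spends 30% of its time on floating-point (flp) addition, 25% on flp multiplication, and 10% on flp division. Evaluate the following enhancements, each costing the same to implement:
a. Redesign the flp adder to make it twice as fast.
b. Redesign the flp multiplier to make it three times as fast.
c. Redesign the flp divider to make it 10 times as fast.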
Solution
a. Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18
b. Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20
c. Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10
The multiplier redesign yields the largest overall speedup.
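The comparison above can be scripted as a minimal sketch. The dictionary keys and the helper `amdahl` are illustrative names. Because each redesign costs the same, the best choice is simply the option with the largest overall speedup.

```python
def amdahl(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the time runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# (share of execution time, speedup of the redesigned unit) for each option
options = {
    "a. adder": (0.30, 2),
    "b. multiplier": (0.25, 3),
    "c. divider": (0.10, 10),
}

results = {name: amdahl(frac, k) for name, (frac, k) in options.items()}
for name, s in results.items():
    print(f"{name:<14} speedup = {s:.2f}")

# All options cost the same, so pick the one with the largest overall speedup.
best = max(results, key=results.get)
print("Best choice:", best)
```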
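Speedup formula (for a task divided into k parts):
$$S = \frac{1}{\dfrac{f_1}{p_1} + \dfrac{f_2}{p_2} + \cdots + \dfrac{f_k}{p_k}}$$
where part $j$ takes a fraction $f_j$ of the original execution time ($f_1 + f_2 + \cdots + f_k = 1$) and runs $p_j$ times as fast ($p_j = 1$ for an unaffected part).
If a particular fraction is slowed down rather than sped up, use $s_j f_j$ instead of $f_j / p_j$, where $s_j > 1$ is the slowdown factor.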
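The following is a sketch of the generalized formula, including the slowdown rule. It assumes that the fractions f_j are measured on the original execution time and sum to 1. The `Part` class and the `overall_speedup` function are hypothetical names. The example reuses the multiplier case from the previous problem. The 5% slowdown of the remaining work is an invented number, used only to exercise the s_j f_j term.

```python
from dataclasses import dataclass


@dataclass
class Part:
    fraction: float        # f_j: share of the ORIGINAL execution time
    factor: float = 1.0    # p_j if sped up, s_j if slowed down (>= 1 either way)
    slowed: bool = False   # True means this part is slowed down by `factor`


def overall_speedup(parts: list[Part]) -> float:
    """S = 1 / sum(term_j), where term_j = f_j / p_j, or s_j * f_j if slowed down."""
    total = sum(part.fraction for part in parts)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions must sum to 1, got {total}")
    new_time = sum(
        part.factor * part.fraction if part.slowed else part.fraction / part.factor
        for part in parts
    )
    return 1.0 / new_time


# Multiplier (25% of time) made 3x faster; everything else unchanged.
print(f"{overall_speedup([Part(0.25, 3), Part(0.75)]):.2f}")
# Hypothetical: the same change, but it slows the other 75% down by 5%.
print(f"{overall_speedup([Part(0.25, 3), Part(0.75, 1.05, slowed=True)]):.2f}")
```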
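We can also count the number of instructions executed, known as the instruction path length or instruction count (IC). If we know the number of clock cycles and the IC, we can compute the average number of clock cycles per instruction (CPI):
$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$$
When instruction class $i$ executes $\text{IC}_i$ instructions, each taking $\text{CPI}_i$ cycles:
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i, \qquad \text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i$$
Hint: CPI_i should be measured rather than calculated, because it must account for pipeline effects, cache misses, and any other memory-system inefficiencies.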
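Here is a minimal sketch of the CPI formula as a weighted average. The instruction classes, counts, and per-class CPIs below are hypothetical values chosen only to illustrate the arithmetic. In practice, CPI_i would come from measurement, as the hint says.

```python
def average_cpi(counts: dict[str, float], cpis: dict[str, float]) -> float:
    """CPI = sum(IC_i * CPI_i) / IC, where IC = sum(IC_i)."""
    instruction_count = sum(counts.values())
    clock_cycles = sum(counts[cls] * cpis[cls] for cls in counts)
    return clock_cycles / instruction_count


# Hypothetical instruction mix (counts in millions) and per-class CPIs.
counts = {"alu": 50, "load": 20, "store": 10, "branch": 20}
cpis = {"alu": 1.0, "load": 2.0, "store": 2.0, "branch": 1.5}

cpi = average_cpi(counts, cpis)
# Same result via frequencies: sum((IC_i / IC) * CPI_i).
ic = sum(counts.values())
cpi_from_freq = sum(counts[c] / ic * cpis[c] for c in counts)
print(f"CPI = {cpi:.2f}, via frequencies = {cpi_from_freq:.2f}")
```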
Answer
First, observe that only the CPI changes; the clock rate and instruction count remain identical. We start by finding the original CPI with neither enhancement:
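$$\text{CPI}_{\text{original}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction count}} = (4 \times 25\%) + (1.33 \times 75\%) = 2.0$$
$$\text{Speedup}_{\text{new FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new FP}}} = \frac{\text{CPI}_{\text{original}}}{\text{CPI}_{\text{new FP}}} = \frac{2.00}{1.625} = 1.23$$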
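The answer's arithmetic can be written as a short script. This minimal sketch takes the instruction mix (25% FP with CPI 4, 75% other with CPI 1.33) and CPI_new FP = 1.625 directly from the answer. Because the instruction count and clock rate are unchanged, the ratio of the CPIs is the speedup.

```python
# Instruction mix from the answer: (frequency IC_i / IC, CPI_i) per class.
mix = {"FP": (0.25, 4.0), "other": (0.75, 1.33)}

# CPI_original = sum(CPI_i * IC_i / IC)
cpi_original = sum(freq * cpi for freq, cpi in mix.values())

# Taken as given from the answer; its derivation is not shown in this section.
cpi_new_fp = 1.625

# IC and clock rate are identical, so CPU time is proportional to CPI.
speedup_new_fp = cpi_original / cpi_new_fp

print(f"CPI_original   = {cpi_original:.2f}")
print(f"Speedup_new FP = {speedup_new_fp:.2f}")
```

Note that 1.33 x 75% is 0.9975, so CPI_original equals 2.0 only after rounding. The speedup still rounds to 1.23.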
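    250 instructions
    for i = 1, 100 do
      20 instructions
      for j = 1, 100 do
        40 instructions
        for k = 1, 100 do
          10 instructions
        endfor
      endfor
    endfor

Static count = 326 (250 + 20 + 40 + 10 instructions, plus 2 loop-control instructions for each of the three loops)

Dynamic count, from the innermost loop outward:
k loop: (2 + 10 instructions) x 100 iterations = 1200 instructions in all
j loop: (2 + 40 + 1200 instructions) x 100 iterations = 124,200 instructions in all
i loop: (2 + 20 + 124,200 instructions) x 100 iterations = 12,422,200 instructions in all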
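The static and dynamic counts can be verified with a sketch that charges 2 loop-control instructions per iteration of each loop, as the annotations above do. The function names are hypothetical. `dynamic_count_by_simulation` walks the full 100 x 100 x 100 iteration space, while `dynamic_count_closed_form` uses the nested-sum expression.

```python
def static_count() -> int:
    # 250 straight-line instructions, plus each loop body and its 2 control instructions.
    return 250 + (2 + 20) + (2 + 40) + (2 + 10)


def dynamic_count_by_simulation(n: int = 100) -> int:
    count = 250
    for _i in range(n):
        count += 2 + 20              # i-loop control + body instructions
        for _j in range(n):
            count += 2 + 40          # j-loop control + body instructions
            for _k in range(n):
                count += 2 + 10      # k-loop control + body instructions
    return count


def dynamic_count_closed_form(n: int = 100) -> int:
    # Innermost first: k loop = n*12, j loop = n*(42 + k loop), i loop = n*(22 + j loop).
    return 250 + n * (22 + n * (42 + n * 12))


print(static_count(), dynamic_count_closed_form(), dynamic_count_by_simulation())
```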
for i = 1, n
while x > 0
Total dynamic count = 250 + 12,422,200 = 12,422,450 instructions
The MIPS rating (millions of instructions per second) gives an idea of a CPU's speed: for the same instruction set, faster processors have a higher MIPS rating than slower ones.
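This sketch shows how a MIPS rating relates to the quantities defined earlier. It uses the standard relations CPU time = IC x CPI / clock rate and MIPS = IC / (CPU time x 10^6) = clock rate / (CPI x 10^6). The CPI of 2.0 and the 1 GHz clock rate are hypothetical inputs. The instruction count is the dynamic count from the loop example.

```python
def cpu_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_hz


def mips(instruction_count: float, seconds: float) -> float:
    """MIPS = IC / (execution time x 10^6)."""
    return instruction_count / (seconds * 1e6)


ic = 12_422_450          # dynamic instruction count of the loop example
cpi = 2.0                # hypothetical average CPI
clock_hz = 1e9           # hypothetical 1 GHz clock

t = cpu_time(ic, cpi, clock_hz)
print(f"CPU time = {t * 1e3:.2f} ms, MIPS = {mips(ic, t):.0f}")
```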
Class   CPI for M1   CPI for M2   Comments
F       5.0          4.0          Floating-point
I       2.0          3.8          Integer arithmetic
N       2.4          2.0          Nonarithmetic
b. If 50% of the instructions executed are class N, with the rest divided equally between classes F and I, which machine is faster, and by what factor?
Solution
b. Average CPI for M1 = (5.0 x 0.25) + (2.0 x 0.25) + (2.4 x 0.5) = 2.95
Average CPI for M2 = (4.0 x 0.25) + (3.8 x 0.25) + (2.0 x 0.5) = 2.95
The average CPIs are the same, so M1 is 1.2 times as fast as M2 (the ratio of their clock rates).
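The part b comparison can be written as a minimal sketch. The per-class CPIs and the instruction mix come from the table and the question. The sketch assumes that both machines execute the same instruction count. The absolute clock rates are not given in this section, so the sketch uses only the 1.2 clock-rate ratio quoted in the solution.

```python
mix = {"F": 0.25, "I": 0.25, "N": 0.50}       # part b instruction mix
cpi_m1 = {"F": 5.0, "I": 2.0, "N": 2.4}
cpi_m2 = {"F": 4.0, "I": 3.8, "N": 2.0}


def average_cpi(mix: dict[str, float], cpis: dict[str, float]) -> float:
    """Weighted average CPI: sum(frequency_i * CPI_i)."""
    return sum(freq * cpis[cls] for cls, freq in mix.items())


avg_m1 = average_cpi(mix, cpi_m1)
avg_m2 = average_cpi(mix, cpi_m2)

# Assumed same instruction count on both machines, so performance ~ clock rate / CPI.
clock_ratio_m1_to_m2 = 1.2
perf_ratio = clock_ratio_m1_to_m2 * avg_m2 / avg_m1
print(f"CPI M1 = {avg_m1:.2f}, CPI M2 = {avg_m2:.2f}, M1/M2 = {perf_ratio:.2f}")
```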
c. Designers of M1 plan to redesign the machine for better performance. Under the assumptions of part b, which of the following redesign options has the greatest performance impact, and why?
1. Using a faster floating-point unit with double the speed (class F CPI = 2.5)
2. Adding a second integer ALU to reduce the integer CPI to 1.20
3. Using faster logic that allows a clock rate of 750 MHz with the same CPIs
Solution:
Option 1:
Average CPI for M1 = (2.5 x 0.25) + (2.0 x 0.25) + (2.4 x 0.5) = 2.325
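Options 1 and 2 change only M1's CPI, so they can be compared directly. Option 3 changes only the clock rate, so its speedup is 750 MHz divided by M1's original clock rate. That clock rate is not stated in this section. The sketch below (all names hypothetical) therefore does not assume a value. Instead, it computes the clock rate below which option 3 would beat option 1.

```python
mix = {"F": 0.25, "I": 0.25, "N": 0.50}
cpi_m1 = {"F": 5.0, "I": 2.0, "N": 2.4}


def average_cpi(mix: dict[str, float], cpis: dict[str, float]) -> float:
    """Weighted average CPI: sum(frequency_i * CPI_i)."""
    return sum(freq * cpis[cls] for cls, freq in mix.items())


base = average_cpi(mix, cpi_m1)                     # 2.95 (part b)
option1 = average_cpi(mix, {**cpi_m1, "F": 2.5})    # faster FP unit
option2 = average_cpi(mix, {**cpi_m1, "I": 1.20})   # second integer ALU

speedup1 = base / option1
speedup2 = base / option2

# Option 3 speedup = 750 MHz / original clock (CPIs unchanged); it beats
# option 1 only if the original clock is below this break-even value.
break_even_mhz = 750 / speedup1

print(f"Option 1: CPI {option1:.3f}, speedup {speedup1:.3f}")
print(f"Option 2: CPI {option2:.3f}, speedup {speedup2:.3f}")
print(f"Option 3 beats option 1 only if M1's clock < {break_even_mhz:.1f} MHz")
```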