Computer Architecture Unit 1 - Phase 2 PDF

Two Equations to Evaluate Alternatives
Amdahl's Law
The performance gain that can be obtained by improving some portion
of a computer can be calculated using Amdahl's Law.
Amdahl's Law defines the speedup that can be gained by using a
particular feature.
It is used to find the maximum expected improvement to an overall system
when only part of the system is improved.
Improvement rate N = (original execution time) / (new execution time)

New execution time = (time unaffected) + (time affected) / (improvement
factor)
The CPU Performance Equation

Essentially all computers are constructed using a clock running at a
constant rate.
CPU time can then be expressed as a number of clock cycles.
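Amdahl's Law
Speedup is the ratio
$$ \text{Speedup} = \frac{\text{Performance for entire task using the enhancement}}{\text{Performance for entire task without the enhancement}} $$
Alternatively,
$$ \text{Speedup} = \frac{\text{Execution time for entire task without the enhancement}}{\text{Execution time for entire task using the enhancement}} $$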
The speedup from an enhancement depends on two factors:

Fraction_enhanced: the fraction of the computation time in the original
computer that can be converted to take advantage of the enhancement (always ≤ 1).
If 20 seconds of the execution time of a program that takes 60 seconds in
total can use an enhancement, the fraction is 20/60.
Speedup_enhanced: the improvement gained by the enhanced execution mode,
that is, how much faster the task would run if the enhanced mode were used
for the entire program (always > 1).
This value is the time of the original mode over the time of the enhanced
mode.
If the enhanced mode takes 2 seconds for a portion of the program that
takes 5 seconds in the original mode, the improvement is 5/2.
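Thus, the new execution time is the time of the unenhanced portion of the
machine plus the time spent using the enhancement, that is,
$$ \text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right] $$
The overall speedup is the ratio of the two execution times:
$$ \text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} $$
For a multiprocessor with p processors, the speedup is
$$ S(p) = \frac{\text{Execution time using one processor (best sequential algorithm)}}{\text{Execution time using a multiprocessor with } p \text{ processors}} = \frac{t_s}{t_p} $$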
Even with an infinite number of processors, the maximum speedup is limited to
1/f, where f is the fraction of the computation that is not sped up.
Suppose that we want to enhance the processor used for Web serving. The
new processor is 10 times faster on computation in the Web serving
application than the original processor. Assuming that the original processor
is busy with computation 40% of the time and is waiting for I/O 60% of the
time, what is the overall speedup gained by incorporating the enhancement?
Answer
Fraction_enhanced = 0.4, Speedup_enhanced = 10
$$ \text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.6 + 0.04} = \frac{1}{0.64} \approx 1.56 $$
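The Web serving calculation can be checked with a few lines of Python. This is only a sketch: the function name `amdahl_speedup` and its parameter names are illustrative choices, not something defined in the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New time relative to old time: unaffected part plus shrunken enhanced part.
    relative_new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / relative_new_time


# Web serving example: 40% of the time is computation, made 10 times faster.
web_speedup = amdahl_speedup(0.4, 10.0)
print(f"{web_speedup:.2f}")  # prints 1.56
```

Because the 60% of time spent waiting for I/O is untouched, a tenfold faster processor yields only about a 1.56-fold overall gain.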
Amdahl's Law can serve as a guide to how much an enhancement will
improve performance and how to distribute resources to improve cost-performance.
A common transformation required in graphics processors is square root.

Implementations of floating-point (FP) square root vary significantly in
performance, especially among processors designed for graphics.
Suppose FP square root (FPSQR) is responsible for 20% of the execution time
of a critical graphics benchmark. One proposal is to enhance the FPSQR
hardware and speed up this operation by a factor of 10.
The other alternative is just to try to make all FP instructions in the graphics
processor run faster by a factor of 1.6; FP instructions are responsible for half
of the execution time for the application. The design team believes that they
can make all FP instructions run 1.6 times faster with the same effort as
required for the fast square root.
Compare these two design alternatives.
Answer
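We can compare these two alternatives by comparing the speedups:
$$ \text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} \approx 1.22 $$
$$ \text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23 $$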
Improving the performance of the FP operations overall is slightly better
because FP operations account for a larger share of the execution time.
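The comparison of the two design alternatives can be sketched in Python as a ranking over (fraction, factor) pairs. The dictionary layout and the labels used as keys are assumptions made for this illustration.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the time runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# (fraction of execution time affected, speedup of that fraction)
alternatives = {
    "FPSQR 10x faster": (0.2, 10.0),
    "all FP 1.6x faster": (0.5, 1.6),
}

speedups = {name: amdahl_speedup(f, s) for name, (f, s) in alternatives.items()}
best = max(speedups, key=speedups.get)

for name, value in speedups.items():
    print(f"{name}: {value:.2f}")
print("best:", best)
```

A modest improvement to a large fraction of the run time can beat a dramatic improvement to a small fraction, which is what the ranking shows here.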
Performance Enhancement: Amdahl's Law

$$ s = \frac{1}{f + (1 - f)/p} \le \min(p,\ 1/f) $$
where f = fraction unaffected and p = speedup of the rest.
[Figure: speedup (s) versus enhancement factor (p), both axes from 0 to 50, with one curve each for f = 0, 0.01, 0.02, 0.05, and 0.1.]
Amdahl's law: speedup achieved if a fraction f of a task is unaffected
and the remaining 1 - f part runs p times as fast.
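The curves in the plot can be tabulated with a short Python sketch. The function names `speedup` and `upper_bound` are illustrative; the f and p values are the ones labeled in the figure.

```python
def speedup(f: float, p: float) -> float:
    """Amdahl speedup when a fraction f is unaffected and the rest runs p times as fast."""
    return 1.0 / (f + (1.0 - f) / p)


def upper_bound(f: float, p: float) -> float:
    """The bound min(p, 1/f); when f = 0 only p limits the speedup."""
    return p if f == 0 else min(p, 1.0 / f)


for f in (0.0, 0.01, 0.02, 0.05, 0.1):
    row = [speedup(f, p) for p in (10, 20, 30, 40, 50)]
    print(f"f = {f:<4}: " + "  ".join(f"{s:5.1f}" for s in row))
```

Each row flattens as p grows: with f = 0.1 the speedup can never reach 10, however large the enhancement factor becomes.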
A processor spends 30% of its time on floating-point (flp) addition, 25% on
flp multiplication, and 10% on flp division. Evaluate the following
enhancements, each costing the same to implement:
a. Redesign of the flp adder to make it twice as fast.
b. Redesign of the flp multiplier to make it three times as fast.
c. Redesign of the flp divider to make it 10 times as fast.
Solution
a. Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18
b. Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20
c. Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10
The multiplier redesign gives the largest speedup.
Generalized Amdahl's Law

Original running time of a program = 1 = f_1 + f_2 + ... + f_k
New running time after each fraction f_i is sped up by a factor p_i:
$$ \frac{f_1}{p_1} + \frac{f_2}{p_2} + \cdots + \frac{f_k}{p_k} $$
Speedup formula:
$$ S = \frac{1}{\dfrac{f_1}{p_1} + \dfrac{f_2}{p_2} + \cdots + \dfrac{f_k}{p_k}} $$
If a particular fraction is slowed down rather than sped up,
use s_j f_j instead of f_j / p_j, where s_j > 1 is the slowdown factor.
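A minimal Python sketch of the generalized formula follows. The function name `generalized_speedup` is an assumption, and the scenario of applying all three floating-point redesigns at once is an illustration built from the fractions in the earlier example, not a result stated in the slides.

```python
def generalized_speedup(fractions, factors):
    """Speedup when fraction f_i of the original time is sped up by factor p_i.

    A factor below 1 models a slowdown: p_j = 1 / s_j gives the s_j * f_j term.
    """
    if len(fractions) != len(factors):
        raise ValueError("need one factor per fraction")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    new_time = sum(f / p for f, p in zip(fractions, factors))
    return 1.0 / new_time


# Illustrative scenario: add (30%) 2x, multiply (25%) 3x, divide (10%) 10x,
# and the remaining 35% unchanged (factor 1).
combined = generalized_speedup([0.30, 0.25, 0.10, 0.35], [2.0, 3.0, 10.0, 1.0])
print(f"{combined:.2f}")
```

With a single enhanced fraction and everything else at factor 1, the function reduces to the basic form of Amdahl's Law.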
The CPU Performance Equation

Essentially all computers are constructed using a clock running at a constant
rate. Its discrete time events are called ticks, clock ticks, clock periods,
clocks, cycles, or clock cycles.
Clock rate: today measured in GHz
Clock cycle time: clock cycle time = 1 / clock rate
Ex. 1 GHz clock rate = 1 ns cycle time
Thus, the CPU time for a program can be expressed two ways:
$$ \text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} $$
OR
$$ \text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}} $$
where "CPU clock cycles for a program" is the number of clock cycles needed
to execute the program.
We can also count the number of instructions executed: the instruction path
length or instruction count (IC).
If we know the number of clock cycles and the IC, we can calculate the average
number of clock cycles per instruction (CPI).
CPI is computed as
$$ \text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}} $$
so that
$$ \text{CPU clock cycles} = \text{IC} \times \text{CPI} $$
This figure of merit provides insight into different styles of instruction
sets and implementations.
Substituting into the CPU time equations gives
$$ \text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}} $$
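The two equivalent forms of the CPU time equation can be sketched in Python. The function names and the sample numbers (2 million instructions, CPI of 1.5, 1 GHz clock) are hypothetical values chosen only for illustration.

```python
def cpu_time_from_rate(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: (IC x CPI) / clock rate."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz


def cpu_time_from_cycle_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds: IC x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 2,000,000 instructions at CPI 1.5 on a 1 GHz clock.
clock_rate_hz = 1e9
t_rate = cpu_time_from_rate(2_000_000, 1.5, clock_rate_hz)
t_cycle = cpu_time_from_cycle_time(2_000_000, 1.5, 1.0 / clock_rate_hz)
print(t_rate, t_cycle)  # both about 0.003 s
```

Halving any one of the three inputs (IC, CPI, or cycle time) halves the result, which is why the three factors matter equally.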
Processor performance is dependent upon three characteristics:

instruction count,
clock cycles per instruction,
and clock cycle time (or rate).
CPU time depends equally on all three: a given percentage improvement in any
one of them leads to the same percentage improvement in CPU time.
Unfortunately, it is difficult to change one parameter in complete isolation
from the others, because the technologies behind them are interdependent:
Clock cycle time:
Hardware technology and organization;
CPI:
Organization and instruction set architecture;
Instruction count:
Instruction set architecture and compiler technology.
Computer architecture focuses on the CPI and IC parameters.

The total number of processor clock cycles can be calculated as
$$ \text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i $$
IC_i: the number of times instruction i is executed in a program.
CPI_i: the average number of clocks per instruction for instruction i.
This form is useful in designing the processor.
CPU time can then be expressed as
$$ \text{CPU time} = \left( \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \right) \times \text{Clock cycle time} $$
and the overall CPI as
$$ \text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i $$
Hint: CPI_i should be measured, because it must include pipeline effects,
cache misses, and any other memory system inefficiencies.
IC_i / Instruction count represents the fraction of occurrences of that
instruction in a program.
Suppose we have made the following measurements:

Frequency of FP operations = 25%,
Average CPI of FP operations = 4.0,
Average CPI of other instructions = 1.33.
Assume that the design alternative is to decrease the average CPI
of all FP operations to 2.5. Evaluate this alternative using the processor
performance equation.
Answer
First, observe that only the CPI changes; the clock rate and
instruction count remain identical. We start by finding the original CPI
without the enhancement:
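$$ \text{CPI}_{\text{original}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction count}} = (4 \times 25\%) + (1.33 \times 75\%) \approx 2.0 $$
$$ \text{CPI}_{\text{new FP}} = (75\% \times 1.33) + (25\% \times 2.5) \approx 1.62 $$
Improvement rate N = (original execution time) / (new execution time)
$$ \text{Speedup}_{\text{new FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new FP}}} = \frac{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{original}}}{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{new FP}}} = \frac{\text{CPI}_{\text{original}}}{\text{CPI}_{\text{new FP}}} = \frac{2.00}{1.62} \approx 1.23 $$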
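The weighted-CPI calculation for this example can be sketched in Python. The helper name `average_cpi` and the representation of the instruction mix as (fraction, CPI) pairs are assumptions made for this illustration.

```python
def average_cpi(mix):
    """Overall CPI from (fraction of instructions, CPI of that class) pairs."""
    mix = list(mix)
    if abs(sum(fraction for fraction, _ in mix) - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cpi for fraction, cpi in mix)


# 25% FP operations, 75% other instructions at CPI 1.33.
cpi_original = average_cpi([(0.25, 4.0), (0.75, 1.33)])
cpi_new_fp = average_cpi([(0.25, 2.5), (0.75, 1.33)])

# IC and clock cycle time are unchanged, so the speedup is the CPI ratio.
speedup_new_fp = cpi_original / cpi_new_fp
print(f"{cpi_original:.2f} {cpi_new_fp:.2f} {speedup_new_fp:.2f}")
```

The unrounded CPIs are 1.9975 and 1.6225; their ratio still rounds to 1.23.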
Amdahl's Law vs. CPU Performance

The CPU performance equation is often easier to apply than Amdahl's Law,
because it can be difficult to measure the fraction of execution time for
which a set of instructions is responsible.
For an existing processor, measuring execution time and clock speed is
easy;
the challenge lies in discovering the instruction count or the CPI.
Most new processors include counters for both instructions executed
and clock cycles.
Dynamic Instruction Count

How many instructions are executed in this program fragment?
Each "for" consists of two instructions: increment index, check exit condition.

    250 instructions
    for i = 1, 100 do
        20 instructions
        for j = 1, 100 do
            40 instructions
            for k = 1, 100 do
                10 instructions
            endfor
        endfor
    endfor

Static count = 326
Dynamic count:
k loop: 2 + 10 instructions, 100 iterations, 1200 instructions in all
j loop: 2 + 40 + 1200 instructions, 100 iterations, 124,200 instructions in all
i loop: 2 + 20 + 124,200 instructions, 100 iterations, 12,422,200 instructions in all
Total: 250 + 12,422,200 = 12,422,450 instructions
The figure also shows the loop forms "for i = 1, n" and "while x > 0".
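The static and dynamic counts can be reproduced with a small Python sketch. The helper `loop_count` and the constant `LOOP_OVERHEAD` are names introduced here for illustration; the instruction counts are those given in the fragment.

```python
LOOP_OVERHEAD = 2  # increment index + check exit condition, once per iteration


def loop_count(iterations: int, body_instructions: int) -> int:
    """Instructions executed by a loop, including its per-iteration overhead."""
    return iterations * (LOOP_OVERHEAD + body_instructions)


inner = loop_count(100, 10)             # k loop
middle = loop_count(100, 40 + inner)    # j loop, including the k loop
outer = loop_count(100, 20 + middle)    # i loop, including the j loop
dynamic_count = 250 + outer

# Static count: every instruction written once, plus 2 per loop header.
static_count = 250 + 20 + 40 + 10 + 3 * LOOP_OVERHEAD

print(static_count, dynamic_count)  # prints 326 12422450
```

The gap between 326 written instructions and over 12 million executed ones is why performance estimates use the dynamic count.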
Faster Clock ≠ Shorter Running Time
Suppose addition takes 1 ns.
1 GHz: clock period = 1 ns; 1 cycle
2 GHz: clock period = 1/2 ns; 2 cycles
In this example, addition time does not improve in going from a
1 GHz to a 2 GHz clock.
Faster steps do not necessarily mean shorter travel time.
[Figure: the same trip shown as 4 steps and as 20 steps.]
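The addition example can be sketched in Python using integer picoseconds to avoid rounding issues. The function names are hypothetical, and the sketch assumes the clock rate divides 10^12 evenly so that the period is a whole number of picoseconds.

```python
PS_PER_SECOND = 10**12  # picoseconds in one second


def cycles_needed(latency_ps: int, clock_hz: int) -> int:
    """Whole clock cycles needed to cover a fixed circuit latency.

    Assumes clock_hz divides 10**12 evenly (true for 1 GHz and 2 GHz).
    """
    period_ps = PS_PER_SECOND // clock_hz
    return -(-latency_ps // period_ps)  # ceiling division on integers


def operation_time_ps(latency_ps: int, clock_hz: int) -> int:
    """Time the operation occupies, rounded up to whole clock cycles."""
    period_ps = PS_PER_SECOND // clock_hz
    return cycles_needed(latency_ps, clock_hz) * period_ps


ADD_LATENCY_PS = 1000  # the 1 ns addition from the example

for clock_hz in (1_000_000_000, 2_000_000_000):
    print(clock_hz, cycles_needed(ADD_LATENCY_PS, clock_hz),
          operation_time_ps(ADD_LATENCY_PS, clock_hz))
```

Doubling the clock rate doubles the cycle count for the addition, so the time it takes stays at 1000 ps.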
Peak performance is often expressed in units of instructions per second (IPS),
typically as MIPS or GIPS.
For floating-point calculation, floating-point operations per second (FLOPS) is
used as the unit, typically as MFLOPS or GFLOPS.
MIPS is a method of measuring the raw speed of a computer's processor and is
defined as the number of machine instructions (in millions) that a processor
can execute in one second.
MIPS = Clock rate / (CPI × 10^6)
This number gives an idea of the speed of a CPU, as faster processors
have a higher MIPS rating than slower ones.
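As a rough sketch, the MIPS formula and its link to CPU time can be written in Python. The function names and the sample processor (500 MHz, CPI of 2.0, one billion instructions) are hypothetical values used only for illustration.

```python
def mips_rating(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def cpu_time_from_mips(instruction_count: float, mips: float) -> float:
    """Seconds to execute instruction_count instructions at a given MIPS rating."""
    return instruction_count / (mips * 1e6)


# Hypothetical processor: 500 MHz clock with an average CPI of 2.0.
rating = mips_rating(500e6, 2.0)
seconds = cpu_time_from_mips(1e9, rating)
print(rating, seconds)
```

Note that the rating depends on the CPI of the instruction mix actually executed, so the same processor can show different MIPS values on different programs.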
CPI and IPS Calculations

Consider two implementations M1 (600 MHz) and M2 (500 MHz) of
an instruction set containing three classes of instructions:
Class   CPI for M1   CPI for M2   Comments
F       5.0          4.0          Floating-point
I       2.0          3.8          Integer arithmetic
N       2.4          2.0          Nonarithmetic

a. What are the peak performances of M1 and M2 in MIPS?
Solution
a. Peak MIPS for M1 = 600 / 2.0 = 300 (assume all class I)
   Peak MIPS for M2 = 500 / 2.0 = 250 (assume all class N)
b. If 50% of instructions executed are class N, with the rest divided equally
among F and I, which machine is faster? By what factor?
Solution
b. Average CPI for M1 = (5.0 x 0.25) + (2.0 x 0.25) + (2.4 x 0.5) = 2.95
   Average CPI for M2 = (4.0 x 0.25) + (3.8 x 0.25) + (2.0 x 0.5) = 2.95
Average CPIs are the same, so
M1 is 1.2 times as fast as M2 (based on the ratio of clock rates, 600 / 500).
c. Designers of M1 plan to redesign the machine for better performance. With the
assumption of part b, which of the following redesign options has the greatest
performance impact and why?
1. Using a faster floating point unit with double the speed
(class F CPI = 2.5)
2. Adding a second integer ALU to reduce the integer CPI to 1.20
3. Using faster logic that allows a clock rate of 750MHz with the same
CPIs
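Solution:
Baseline (part b):
CPI = 2.95
MIPS = 600 / 2.95 = 203
Option 1:
Average CPI for M1 = (2.5 x 0.25) + (2.0 x 0.25) + (2.4 x 0.5) = 2.325
MIPS = 600 / 2.325 = 258
Option 2:
Average CPI for M1 = (5 x 0.25) + (1.2 x 0.25) + (2.4 x 0.5) = 2.75
MIPS = 600 / 2.75 = 218
Option 3:
CPI = 2.95
MIPS = 750 / 2.95 = 254
Option 1 has the greatest impact, because it yields the highest MIPS rating
for this instruction mix (258, versus 254 for option 3 and 218 for option 2).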
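The three redesign options can be compared with a short Python sketch. The dictionary layout, option labels, and helper names are assumptions made for this illustration; the CPI values, clock rates, and instruction mix come from the problem statement.

```python
MIX = {"F": 0.25, "I": 0.25, "N": 0.50}   # instruction mix from part b
M1_CPI = {"F": 5.0, "I": 2.0, "N": 2.4}   # per-class CPI for M1


def average_cpi(cpi_by_class, mix=MIX):
    """Mix-weighted average CPI."""
    return sum(mix[c] * cpi_by_class[c] for c in mix)


def mips(clock_mhz: float, cpi: float) -> float:
    """MIPS rating from a clock rate in MHz and an average CPI."""
    return clock_mhz / cpi


# Each option: (clock rate in MHz, per-class CPI table)
options = {
    "1: faster FP unit": (600.0, {**M1_CPI, "F": 2.5}),
    "2: second integer ALU": (600.0, {**M1_CPI, "I": 1.2}),
    "3: 750 MHz clock": (750.0, M1_CPI),
}

ratings = {name: mips(clock, average_cpi(cpi)) for name, (clock, cpi) in options.items()}
best = max(ratings, key=ratings.get)

for name, value in ratings.items():
    print(f"Option {name}: {value:.0f} MIPS")
print("best:", best)
```

Changing `MIX` shows how sensitive the ranking is to the workload: with fewer class-F instructions, the faster clock of option 3 would win.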
Prefixes for large units:

Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15
For memory:
K = 2^10 = 1024, M = 2^20, G = 2^30, T = 2^40, P = 2^50
Prefixes for small units:
micro = 10^-6, nano = 10^-9, pico = 10^-12, femto = 10^-15
