You are on page 1of 34

ELEC2300 Computer Organization

Lecture 4: Performance Evaluation


Professor George Yuan Office: Rm. 2527 Email:eeyuan@ust.hk
Note: some of the slides are adapted from Computer Organization and Design. Copyright 1998 Morgan Kaufmann Publishers and Notes of Prof. Pattersons CS152 Class, Copyright 1997 UCB.

OUTLINE
What is the computer performance? How to evaluate the performance?

ELEC152 Computer Organization Fall 2011

Page 2

Which of these airplanes has the best performance? 4 types of airplanes fly between Hong Kong & Shanghai (distance: D mi.)
Airplane Passengers 101 470 132 146 Range (mi) 630 4150 4000 8720 Speed (mph) 598 610 1350 544 Boeing 737-100 Boeing 747 BAC/Sud Concorde Douglas DC-8-50

Time

to perform the task (Execution Time) execution time, response time, latency

D L S

Tasks per day, hour, week, sec, ns. .. 1 S 1 T C S C throughput, bandwidth D D Latency and throughput often are in opposition
ELEC152 Computer Organization Fall 2011 Page 3

Example
Execution time of Concorde vs. 747: Concorde is 1350 mph / 610 mph = 2.2 times faster Throughput of Concorde vs. 747: Boeing is 286700 pmph / 178200 pmph = 1.6 times faster (470*610=286700, 132*1350=178200) Conclusions: Concorde is 2.2 times faster in terms of flying time. 747 is 1.6 times faster in terms of throughput.

ELEC152 Computer Organization Fall 2011

Page 4

Execution Time vs. Throughput Execution time


How long does it take for my job to run? How long does it take to execute a job? How long must I wait for the database query?

Throughput:
How many tasks can the machine run at once? What is the average execution rate? How much work is getting done?

Computer upgrade: 1. P3 -> P4 2. 1 P3 -> 2 P3 We will focus primarily on execution time for a single job.
ELEC152 Computer Organization Fall 2011 Page 5

Definitions
For computer study, 1 performanceX execution time X
" X is n times faster than Y" means

n performance X execution time Y performance Y execution time X


Problem: machine A runs a program in 20 seconds (1 program/20 sec) machine B runs the same program in 25 seconds (1 program/25 sec)
ELEC152 Computer Organization Fall 2011 Page 6

Execution Time
Elapsed time or response time
count everything (disk and memory accesses, I/O , etc.) a useful number, but often not good for comparison purposes

CPU time
Does not count I/O or time spent running other programs can be broken up into system time, and user time

Our focus: user CPU time


time spent executing the lines of code that are "in" our program System CPU time: time the CPU spends executing system (kernal) code in order to run your program, such as, reading files, moving information into and out of virtual memory, etc.

1 performanceX user CPU time X


ELEC152 Computer Organization Fall 2011 Page 7

CPU Time Measurement: Clock Cycles


Instead of reporting execution time in seconds, we often use cycles seconds cycles seconds program program cycle Processor runs machine instructions based on clock
clock cycle time

time

clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec) A 200 Mhz. clock cycle time is
ELEC152 Computer Organization Fall 2011 Page 8

Relating the Metrics CPU time for a program


CPU time = CPU clock cycles * clock cycle time = CPU clock cycles/clock rate

Common ways to improve performance (i.e. shorten CPU execution time):


Reduce number of required CPU clock cycles for a program Shorten clock cycle time (i.e. increase clock rate)

ELEC152 Computer Organization Fall 2011

Page 9

Example-Problem
Description:
A program takes 10 seconds to run on a 400 MHz machine (computer A). We want to design a faster machine (computer B) that can run the same program in 6 seconds. The increase in clock rate affects the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the program.

Problem to solve:
What clock rate should machine B have?

ELEC152 Computer Organization Fall 2011 Page 10

Example - Answer

ELEC152 Computer Organization Fall 2011 Page 11

Cycle Number Calculation


CPU time for a program
CPU time = CPU clock cycles * clock cycle time = CPU clock cycles/clock rate
program compiler assembly program assembler machine instructions ISA processor clock cycles/instruction (CPI) compiler Instruction #

Cycle # = Instruction # CPI


ELEC152 Computer Organization Fall 2011 Page 12

Cycles Per Instruction


Wrong assumption:
# of CPU clock cycles in a program = # of instructions in the program,

Actual situation
For some processors, some instructions may take more cycles than the others: E.g. multiplication takes more cycles than addition Floating point operations takes more cycles than integer operations Memory access takes more cycles than accessing registers Conclusion: not all instructions require the same # of cycles to execute.

Cycle per instructions (CPI) an average number of clock cycles that each instruction in a program takes to execute.
ELEC152 Computer Organization Fall 2011 Page 13

Cycles Per Instruction (CPI)


Definition (for a given program): CPI = (CPU clock cycles)/(instruction count) A program has the same instruction count on two different implementations of the same instruction set architecture, but it may have different CPIs (because an instruction may require different numbers of clock cycles on different implementations). If the number of clock cycles for a program is known, knowing either the instruction count or the CPI can determine the other. CPI provides a measure for comparing implementations. Instruction count can be measured using software tools or simulators.
ELEC152 Computer Organization Fall 2011 Page 14

Cycles Per Instruction


Let there be n different instruction classes (with different CPIs). For a given program, suppose we know:
CPIi = CPI for instruction class i Ci = # of instruction of class I

CPU clock cycles = CPI * instruction count. It can be generalized to


CPU _ clock _ cycles (CPIi Ci )
i 1 n

and

CPI (CPIi Ci ) / Ci
i 1 i 1
ELEC152 Computer Organization Fall 2011 Page 15

CPI Example
Suppose we have two implementations of the same instruction set architecture (ISA) For some program, machine A has a clock cycle time of 1 ns (1 GHz) and a CPI of 2.0. Machine B has a clock cycle time of 2 ns (500MHz) and a CPI of 1.2. Which machine is faster for this program, and by how much? If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

ELEC152 Computer Organization Fall 2011 Page 16

Example - Solution

ELEC152 Computer Organization Fall 2011 Page 17

Relating the metrics


For a given program X running on a machine A
Time = # of instructions seconds = a program program second # of clocks * * # of instructions clock

= instruction count * CPI * clock cycle time = instruction count * CPI / clock rate

The only complete and reliable measure is CPU execution time Other measures are unreliable. E.g. changing the instruction set to lower the instruction count may lead to a larger CPI or an organization with a slower clock rate. Either case can offset the improvement in instruction count.

ELEC152 Computer Organization Fall 2011 Page 18

Example Comparing Code Segments


Description A particular machine has the following hardware facts:
Instruction class A B C CPI for this instruction class 1 2 3

For a given C++ statement, a compiler designer considers two code sequences with the following instruction counts:
Code sequence 1 2
Instruction counts for instruction classes

A 2 4

B 1 1

C 2 1

Problem to solve Which code sequence executes the most instructions? Which is faster? What is the CPI for each sequence?

ELEC152 Computer Organization Fall 2011 Page 19

Example - Answer

ELEC152 Computer Organization Fall 2011 Page 20

A misleading measure - MIPS


There are some performance measures that are famous among computer manufacturers and sellers but are misleading! MIPS (million instructions per second) (meaningless indication of processor speed) MIPS = (instruction count)/(execution time * 106) MIPS depends on
Instruction set (instructions have different capabilities) Program

MIPS can vary inversely with performance Peak performance


ELEC152 Computer Organization Fall 2011 Page 21

Some Processors in MIPS


Processor Motorola 68000 Intel 386DX Intel 486DX PowerPC G2 Intel Pentium Pro ARM 7500FE PowerPC G3 Zilog eZ80 Intel Pentium III AMD Athlon Pentium 4 ARM Cortex A8 Xbox360 IBM Xenon Triple Core Intel Core2 Extreme QX6700 IPS 1MIPS @ 8MHz 8.5MIPS @ 25MHz 54MIPS @ 66MHz 35MIPS @ 33MHz 541MIPS @ 200MHz 35.9MIPS @ 40MHz 525MIPS @ 233MHz 80MIPS @ 50MHz 1354MIPS @ 500MHz 3561MIPS @ 1.2GHz 9726MIPS @ 3.2GHz 2000MIPS @ 1.0GHz 6400MIPS @ 3.2GHz 57063MIPS @ 3.33GHz Year 1979 1988 1992 1994 1996 1996 1997 1999 1999 2000 2003 2005 2005 2005 2006

AMD Athlon 64 3800+ X2(Dual Core) 14564MIPS @ 2.0GHz

ELEC152 Computer Organization Fall 2011 Page 22

Another misleading measure - MFLOPS MFLOPS (million floating-point operations per second): MFLOPS = (# of floating point operations)/(execution time * 106) MFLOPS considers only floating-point operations (addition, subtraction, multiplication, or division operation applied to a number in a single or double precision floating-point representation). MFLOPS depends on:
Floating-point operation (e.g., addition and multiplication differ in complexity) Program

Meaningless if there is little or no floating-point arithmetic.

ELEC152 Computer Organization Fall 2011 Page 23

MIPS example
Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software. The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. What are the execution times for each sequence? What is the MIPS index for this processor based on the two testing sequence?

ELEC152 Computer Organization Fall 2011 Page 24

Summary
Some related terminology:
clock, clock cycle, cycle clock cycle time, cycle time (seconds, us, ns) clock rate, cycle rate (Hz, MHz) CPI (cycles per instruction) MIPS (millions of instructions per second)

Performance is determined by the execution time Execution time calculation:


Execution Time = instruction count * CPI * clock cycle time = instruction count * CPI / clock rate

ELEC152 Computer Organization Fall 2011 Page 25

OUTLINE
What is the computer performance? How to evaluate the performance?

ELEC152 Computer Organization Fall 2011 Page 26

Benchmarks
Execution time calculation: Execution Time = instruction count * CPI * clock cycle time = instruction count * CPI / clock rate Benchmark: a set of specially designed programs to test the performance of a computer Performance best determined by running a real application Benchmarks are application specific
CPU performance, graphics, high-performance computing, objectoriented computing, Java applications, client-server models, mail systems, file systems, Web servers.

SPEC (System Performance Evaluation Cooperative) companies have agreed on a set of real program and inputs valuable indicator of computer performance
Processor (ISA implementation) + compiler
ELEC152 Computer Organization Fall 2011 Page 27

SPEC 89
Compiler enhancements and performance
80 0 70 0

60 0 SPEC performance ratio

50 0

40 0

30 0

20 0

10 0

gcc

e sp re s s o

s p ic e

do du c

n a sa 7

li

e q n to tt

m a trix3 0 0

fp p p p C o m p iler

to m ca tv

B e n ch m ark E n h a n ce d co m pile r

ELEC152 Computer Organization Fall 2011 Page 28

SPEC CPU2000
SPEC ratio Reference: Sun Ultra 5_10 with a 300MHz processor CINT2000, CFP2000 Geometric mean of SPEC ratios

ELEC152 Computer Organization Fall 2011 Page 29

SPEC CPU2000 Benchmarks

ELEC152 Computer Organization Fall 2011 Page 30

SPEC CPU2000 ratings

ELEC152 Computer Organization Fall 2011 Page 31

Amdahl's Law
Execution Time After Improvement = Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement ) Example: "Suppose a program runs in 100 seconds on a machine, with multiplication responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?" How about making the program 5 times faster? Principle: Make the common case fast

ELEC152 Computer Organization Fall 2011 Page 32

Example
Suppose we enhance a machine making all floating-point instructions five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions? We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?

ELEC152 Computer Organization Fall 2011 Page 33

Remember
Performance is specific to a particular program
Total execution time is a consistent summary of performance

For a given architecture performance increases come from:


increases in clock rate (without adverse CPI affects) improvements in processor organization that lower CPI compiler enhancements that lower CPI and/or instruction count

Pitfall: expecting improvement in one aspect of a machines performance to affect the total performance You should not always believe everything you read! Read carefully!
ELEC152 Computer Organization Fall 2011 Page 34

You might also like