
Chapter 3: Principles of Scalable Performance

Performance measures
Speedup laws
Scalability principles
Scaling up vs. scaling down

EENG-630 Chapter 3

Performance metrics and measures

Parallelism profiles
Asymptotic speedup factor
System efficiency, utilization and quality
Standard performance measures


Degree of parallelism

Reflects the matching of software and hardware parallelism
A discrete time function: for each time period, it measures the # of processors used
The parallelism profile is a plot of the DOP as a function of time
Ideally assumes unlimited available resources

Factors affecting parallelism profiles

Algorithm structure
Program optimization
Resource utilization
Run-time conditions
Realistically limited by the # of available processors, memory, and other nonprocessor resources

Average parallelism variables

n — # of homogeneous processors
m — maximum parallelism in a profile
\Delta — computing capacity of a single processor (execution rate only, no overhead)
DOP = i — # of processors busy during an observation period


Average parallelism

The total amount of work performed is proportional to the area under the profile curve:

W = \Delta \int_{t_1}^{t_2} DOP(t)\,dt

W = \Delta \sum_{i=1}^{m} i \cdot t_i


Average parallelism

A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} DOP(t)\,dt

A = \left( \sum_{i=1}^{m} i \cdot t_i \right) \Big/ \left( \sum_{i=1}^{m} t_i \right)
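The discrete form of A can be computed directly from a profile. A minimal sketch in Python, using a made-up profile of (DOP, duration) pairs rather than one from the text:

```python
# Average parallelism A = sum(i * t_i) / sum(t_i) over all observed DOP
# levels i, from a discrete profile of (DOP, duration) pairs.

def average_parallelism(profile):
    total_work = sum(i * t for i, t in profile)  # proportional to area under DOP(t)
    total_time = sum(t for _, t in profile)
    return total_work / total_time

profile = [(1, 2), (4, 3), (2, 1)]   # DOP 1 for 2 units, DOP 4 for 3, DOP 2 for 1
print(average_parallelism(profile))  # (2 + 12 + 2) / 6 ≈ 2.667
```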

Example: parallelism profile and average parallelism


Asymptotic speedup

T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta}

T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\Delta}

S_\infty = \frac{T(1)}{T(\infty)} = \left( \sum_{i=1}^{m} W_i \right) \Big/ \left( \sum_{i=1}^{m} \frac{W_i}{i} \right) = A in the ideal case

(T denotes response time)
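A sketch of the asymptotic speedup in Python, assuming a hypothetical workload vector where W[i-1] is the amount of work executed at DOP i (the capacity \Delta cancels out of the ratio):

```python
# Asymptotic speedup S_inf = sum(W_i) / sum(W_i / i).

def asymptotic_speedup(W):
    T1 = sum(W)                                     # T(1), up to a factor 1/Delta
    T_inf = sum(w / i for i, w in enumerate(W, 1))  # T(infinity), same factor
    return T1 / T_inf

W = [1.0, 0.0, 3.0]           # 1 unit of work at DOP 1, 3 units at DOP 3
print(asymptotic_speedup(W))  # 4 / (1 + 1) = 2.0
```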


Performance measures
Consider n processors executing m
programs in various modes
Want to define the mean performance of
these multimode computers:
Arithmetic mean performance
Geometric mean performance
Harmonic mean performance

Arithmetic mean performance

R_a = \sum_{i=1}^{m} R_i / m — arithmetic mean execution rate (assumes equal weighting)

R_a^* = \sum_{i=1}^{m} f_i R_i — weighted arithmetic mean execution rate

Proportional to the sum of the inverses of the execution times.

Geometric mean performance

R_g = \prod_{i=1}^{m} R_i^{1/m} — geometric mean execution rate

R_g^* = \prod_{i=1}^{m} R_i^{f_i} — weighted geometric mean execution rate

Does not summarize the real performance, since it does not have the inverse relation with the total time.

Harmonic mean performance

T_i = 1/R_i — mean execution time per instruction for program i

T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i} — arithmetic mean execution time per instruction


Harmonic mean performance

R_h = 1/T_a = m \Big/ \sum_{i=1}^{m} (1/R_i) — harmonic mean execution rate

R_h^* = 1 \Big/ \sum_{i=1}^{m} (f_i / R_i) — weighted harmonic mean execution rate

Corresponds to the total # of operations divided by the total time (closest to the real performance).
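The three weighted means can be compared side by side. A small sketch with illustrative rates and weights (not from the text):

```python
# Weighted arithmetic, geometric, and harmonic mean execution rates for
# m programs with rates R_i and weights f_i (weights sum to 1).
import math

def arithmetic_mean(R, f):
    return sum(fi * ri for fi, ri in zip(f, R))

def geometric_mean(R, f):
    return math.prod(ri ** fi for fi, ri in zip(f, R))

def harmonic_mean(R, f):
    return 1.0 / sum(fi / ri for fi, ri in zip(f, R))

R = [10.0, 40.0]  # execution rates (e.g. Mflops) of two programs
f = [0.5, 0.5]    # equal weights
print(arithmetic_mean(R, f))  # 25.0
print(geometric_mean(R, f))   # 20.0
print(harmonic_mean(R, f))    # 16.0
```

Note the ordering R_h ≤ R_g ≤ R_a; the harmonic mean is the one that matches total operations over total time.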

Harmonic Mean Speedup

Ties the various modes of a program to the number of processors used
Program is in mode i if i processors are used
Sequential execution time T_1 = 1/R_1 = 1

S = T_1 / T^* = 1 \Big/ \sum_{i=1}^{n} (f_i / R_i)


Harmonic Mean Speedup Performance


Amdahl's Law

Assume R_i = i and w = (\alpha, 0, 0, \ldots, 1-\alpha)
System is either sequential, with probability \alpha, or fully parallel, with probability 1-\alpha

S_n = \frac{n}{1 + (n-1)\alpha}

Implies S \to 1/\alpha as n \to \infty
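The bound is easy to see numerically. A one-line sketch with an illustrative sequential fraction \alpha = 0.1:

```python
# Amdahl's speedup S_n = n / (1 + (n-1)*alpha) for sequential fraction alpha.

def amdahl_speedup(n, alpha):
    return n / (1 + (n - 1) * alpha)

print(amdahl_speedup(1, 0.1))      # 1.0 — one processor, no speedup
print(amdahl_speedup(10, 0.1))     # ≈ 5.26
print(amdahl_speedup(10**6, 0.1))  # ≈ 10.0 — approaches 1/alpha
```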

Speedup Performance


System Efficiency

O(n) is the total # of unit operations
T(n) is the execution time in unit time steps
T(n) < O(n) and T(1) = O(1)

S(n) = T(1)/T(n)

E(n) = \frac{S(n)}{n} = \frac{T(1)}{nT(n)}

Redundancy and Utilization

Redundancy signifies the extent of matching between software and hardware parallelism:

R(n) = O(n)/O(1)

Utilization indicates the percentage of resources kept busy during execution:

U(n) = R(n)E(n) = \frac{O(n)}{nT(n)}

Quality of Parallelism

Directly proportional to the speedup and efficiency, and inversely related to the redundancy
Upper-bounded by the speedup S(n)

Q(n) = \frac{S(n)E(n)}{R(n)} = \frac{T^3(1)}{nT^2(n)O(n)}

Example of Performance

Given O(1) = T(1) = n^3, O(n) = n^3 + n^2 \log n, and T(n) = 4n^3/(n+3):

S(n) = (n+3)/4
E(n) = (n+3)/(4n)
R(n) = (n + \log n)/n
U(n) = (n+3)(n + \log n)/(4n^2)
Q(n) = (n+3)^2 / (16(n + \log n))
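These closed forms follow mechanically from the definitions above; a numerical sketch that evaluates the definitions and can be checked against the closed forms (natural log is assumed for "log n", consistently on both sides):

```python
# Evaluate S, E, R, U, Q from the definitions, given O(1), O(n), T(1), T(n).
import math

def metrics(n):
    T1 = O1 = n**3
    On = n**3 + n**2 * math.log(n)
    Tn = 4 * n**3 / (n + 3)
    S = T1 / Tn        # speedup
    E = S / n          # efficiency
    R = On / O1        # redundancy
    U = R * E          # utilization
    Q = S * E / R      # quality
    return S, E, R, U, Q

S, E, R, U, Q = metrics(8)
print(S)  # (8+3)/4 = 2.75
```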

Standard Performance Measures

MIPS and Mflops
Depend on the instruction set and program used

Dhrystone results
Measure of integer performance

Whetstone results
Measure of floating-point performance

TPS and KLIPS ratings
Transaction performance and reasoning power

Parallel Processing Applications

Drug design
High-speed civil transport
Ocean modeling
Ozone depletion research
Air pollution
Digital anatomy

Application Models for Parallel Computers

Fixed-load model
Constant workload

Fixed-time model
Demands constant program execution time

Fixed-memory model
Limited by the memory bound


Algorithm Characteristics

Deterministic vs. nondeterministic


Computational granularity
Parallelism profile
Communication patterns and
synchronization requirements
Uniformity of operations
Memory requirement and data structures

Isoefficiency Concept

Relates the workload to the machine size n needed to maintain a fixed efficiency:

E = \frac{w(s)}{w(s) + h(s,n)}

where w(s) is the workload and h(s,n) is the overhead.

The smaller the power of n, the more scalable the system.

Isoefficiency Function

To maintain a constant E, w(s) should grow in proportion to h(s,n):

w(s) = \frac{E}{1-E} \, h(s,n)

C = E/(1-E) is constant for fixed E

f_E(n) = C \cdot h(s,n)
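A quick sketch of the relation: given a target efficiency E and a measured overhead value h(s,n), the workload that holds E fixed is C·h (the numbers below are illustrative only):

```python
# Workload needed to hold efficiency at E given overhead h(s,n): w = C * h.

def isoefficiency_workload(E, h):
    C = E / (1 - E)  # constant for fixed E
    return C * h

w = isoefficiency_workload(0.8, 100.0)
print(w)                 # 400.0
print(w / (w + 100.0))   # 0.8 — efficiency is indeed maintained
```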

Speedup Performance Laws

Amdahl's law
for fixed workload or fixed problem size

Gustafson's law
for scaled problems (problem size increases with increased machine size)

Speedup model
for scaled problems bounded by memory capacity

Amdahl's Law

As the # of processors increases, the fixed load is distributed to more processors
Minimal turnaround time is the primary goal
The speedup factor is upper-bounded by a sequential bottleneck
Two cases:
DOP < n
DOP ≥ n

Fixed Load Speedup Factor

Case 1: DOP ≥ n

t_i(n) = \frac{W_i}{i\Delta} \left\lceil \frac{i}{n} \right\rceil

Case 2: DOP < n

t_i(n) = t_i(\infty) = \frac{W_i}{i\Delta}

T(n) = \sum_{i=1}^{m} \frac{W_i}{i\Delta} \left\lceil \frac{i}{n} \right\rceil

S_n = \frac{T(1)}{T(n)} = \left( \sum_{i=1}^{m} W_i \right) \Big/ \left( \sum_{i=1}^{m} \frac{W_i}{i} \left\lceil \frac{i}{n} \right\rceil \right)
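A sketch of the fixed-load speedup with the ceiling term, using a hypothetical workload vector where W[i-1] is the work at DOP i:

```python
# Fixed-load speedup S_n = sum(W_i) / sum((W_i/i) * ceil(i/n)).
import math

def fixed_load_speedup(W, n):
    T1 = sum(W)
    Tn = sum((w / i) * math.ceil(i / n) for i, w in enumerate(W, 1))
    return T1 / Tn

W = [1.0, 0.0, 0.0, 3.0]         # work at DOP 1 and DOP 4
print(fixed_load_speedup(W, 1))  # 1.0 — single processor
print(fixed_load_speedup(W, 2))  # 4 / (1 + 1.5) = 1.6
print(fixed_load_speedup(W, 4))  # 4 / 1.75 ≈ 2.29 — reaches the asymptotic value
```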


Gustafson's Law

With Amdahl's Law, the workload cannot scale to match the available computing power as n increases
Gustafson's Law fixes the time, allowing the problem size to increase with higher n
Not saving time, but increasing accuracy


Fixed-time Speedup

As the machine size increases, we have an increased workload and a new profile
In general, W_i' > W_i for 2 \le i \le m and W_1' = W_1
Assume T(1) = T'(n)

Gustafson's Scaled Speedup

S_n' = \left( \sum_{i=1}^{m'} W_i' \right) \Big/ \left( \sum_{i=1}^{m'} \frac{W_i'}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n) \right) = \frac{W_1 + nW_n}{W_1 + W_n}

The final form holds for a profile with only sequential work W_1 and fully parallel work W_n, with negligible overhead Q(n).
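The two-mode form is a one-liner. A sketch with illustrative work values (10% sequential work):

```python
# Gustafson's scaled speedup S'_n = (W1 + n*Wn) / (W1 + Wn) for a profile
# with sequential work W1 and fully parallel work Wn, overhead ignored.

def gustafson_speedup(n, W1, Wn):
    return (W1 + n * Wn) / (W1 + Wn)

print(gustafson_speedup(10, 1.0, 9.0))   # 91/10 = 9.1
print(gustafson_speedup(100, 1.0, 9.0))  # 901/10 = 90.1 — grows nearly linearly in n
```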


Memory Bounded Speedup Model

Idea is to solve the largest problem, limited by memory space
Results in a scaled workload and higher accuracy
Each node can handle only a small subproblem for distributed memory
Using a large # of nodes collectively increases the memory capacity proportionally


Fixed-Memory Speedup

Let M be the memory requirement and W the computational workload: W = g(M)
Scaled workload: g^*(nM) = G(n)\,g(M) = G(n)W_n

S_n^* = \left( \sum_{i=1}^{m^*} W_i^* \right) \Big/ \left( \sum_{i=1}^{m^*} \frac{W_i^*}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n) \right) = \frac{W_1 + G(n)W_n}{W_1 + G(n)W_n/n}
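The two-mode form can be sketched with G(n) passed in as a function; the work values below are illustrative. Note how the choice of G recovers the other two laws:

```python
# Memory-bounded speedup S*_n = (W1 + G(n)*Wn) / (W1 + G(n)*Wn/n),
# where G supplies the workload growth factor as memory scales n times.

def memory_bounded_speedup(n, W1, Wn, G):
    g = G(n)
    return (W1 + g * Wn) / (W1 + g * Wn / n)

n, W1, Wn = 10, 1.0, 9.0
print(memory_bounded_speedup(n, W1, Wn, lambda n: n))  # 9.1 — matches Gustafson
print(memory_bounded_speedup(n, W1, Wn, lambda n: 1))  # 10/1.9 ≈ 5.26 — matches Amdahl
```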


Relating Speedup Models

G(n) reflects the increase in workload as memory increases n times
G(n) = 1 : fixed problem size (Amdahl)
G(n) = n : workload increases n times when memory is increased n times (Gustafson)
G(n) > n : workload increases faster than the memory requirement

Scalability Metrics

Machine size (n) : # of processors
Clock rate (f) : determines the basic machine cycle
Problem size (s) : amount of computational workload; directly proportional to T(s,1)
CPU time (T(s,n)) : actual CPU time for execution
I/O demand (d) : demand in moving the program, data, and results for a given run

Scalability Metrics

Memory capacity (m) : maximum # of memory words demanded
Communication overhead (h(s,n)) : amount of time for interprocessor communication, synchronization, etc.
Computer cost (c) : total cost of hardware and software resources required
Programming overhead (p) : development overhead associated with an application program

Speedup and Efficiency

The problem size is the independent parameter:

S(s,n) = \frac{T(s,1)}{T(s,n) + h(s,n)}

E(s,n) = \frac{S(s,n)}{n}

Scalable Systems

Ideally, if E(s,n) = 1 for all algorithms and any s and n, the system is scalable
Practically, consider the scalability of a machine relative to an ideal machine:

\Phi(s,n) = \frac{S(s,n)}{S_I(s,n)} = \frac{T_I(s,n)}{T(s,n)}
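The scalability ratio reduces to a comparison of run times. A minimal sketch with made-up timings:

```python
# Scalability of a real machine against an ideal (e.g. EREW-PRAM) machine:
# Phi(s,n) = T_I(s,n) / T(s,n); a value of 1.0 means ideally scalable.

def scalability(T_ideal, T_real):
    return T_ideal / T_real

print(scalability(2.0, 2.5))  # 0.8 — real machine takes 25% longer than ideal
```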
