Professional Documents
Culture Documents
Thread-Level
Parallelism
15-213:
Introduc0on
to
Computer
Systems
26th
Lecture,
Nov.
30,
2010
Instructors:
Randy
Bryant
and
Dave
OHallaron
Carnegie Mellon
Today
Carnegie Mellon
Mul:core
Processor
Core 0
Core n-1
Regs
Regs
L1
d-cache
L1
i-cache
L2 unified cache
L1
d-cache
L1
i-cache
L2 unified cache
L3 unified cache
(shared by all cores)
Main memory
Carnegie Mellon
Memory
Consistency
int
a
=
1;
int
b
=
100;
Thread1:
Wa:
a
=
2;
Rb:
print(b);
Thread2:
Wb:
b
=
200;
Ra:
print(a);
Thread
consistency
constraints
Wa
Rb
Wb
Ra
Sequen:al
consistency
Overall
eect
consistent
with
each
individual
thread
Otherwise,
arbitrary
interleaving
Carnegie Mellon
Thread
consistency
constraints
Wa
Rb
int
a
=
1;
int
b
=
100;
Thread1:
Wa:
a
=
2;
Rb:
print(b);
Wb
Thread2:
Wb:
b
=
200;
Ra:
print(a);
Rb
Wa
Wb
Ra
Wb
Impossible outputs
Wa
Ra
Wb
Ra 100, 2
Rb
Ra 200, 2
Ra
Wa
Rb
2,
200
Rb
1,
200
Ra
Rb 2, 200
Rb
Ra 200, 2
Carnegie Mellon
int
a
=
1;
int
b
=
100;
Thread1:
Wa:
a
=
2;
Rb:
print(b);
Thread1 Cache
a: 2
Thread2 Cache
a:1
b:100
Thread2:
Wb:
b
=
200;
Ra:
print(a);
b:200
print
1
print
100
Main Memory
a:1
b:100
Carnegie Mellon
Snoopy Caches
Thread1 Cache
E
int
a
=
1;
int
b
=
100;
Thread1:
Wa:
a
=
2;
Rb:
print(b);
Thread2:
Wb:
b
=
200;
Ra:
print(a);
Thread2 Cache
a: 2
E b:200
Main Memory
a:1
b:100
Carnegie Mellon
Snoopy Caches
Thread1 Cache
E
S
a: 2
Thread1:
Wa:
a
=
2;
Rb:
print(b);
a:2
print 2
E b:200
S
b:200
Main Memory
a:1
Thread2:
Wb:
b
=
200;
Ra:
print(a);
Thread2 Cache
S
S b:200
int
a
=
1;
int
b
=
100;
b:100
print 200
Carnegie Mellon
Op. Queue
Instruction
Cache
PC
Functional Units
Integer
Arith
Integer
Arith
FP
Arith
Load /
Store
Data Cache
Carnegie Mellon
Hyperthreading
Instruction Control
Reg A
Instruction
Decoder
Op. Queue A
Reg B
Op. Queue B
PC A
Instruction
Cache
PC B
Functional Units
Integer
Arith
Integer
Arith
FP
Arith
Load /
Store
Data Cache
10
Carnegie Mellon
Mul:core
Separate
instruc0on
logic
and
func0onal
units
Some
shared,
some
private
caches
Must
implement
cache
coherency
Hyperthreading
Combining
Shark
machines:
8
cores,
each
with
2-way
hyperthreading
Theore0cal
speedup
of
16X
Carnegie Mellon
Summa:on Example
12
Carnegie Mellon
13
Carnegie Mellon
14
Carnegie Mellon
15
Carnegie Mellon
Unsynchronized Performance
N
=
230
Best
speedup
=
2.86X
Gets
wrong
answer
when
>
1
thread!
16
Carnegie Mellon
Mutex
pthread_mutex_lock(&mutex);
global_sum += i;
pthread_mutex_unlock(&mutex);
17
Carnegie Mellon
Terrible
Performance
2.5
seconds
~10
minutes
18
Carnegie Mellon
Separate Accumula:on
19
Carnegie Mellon
20
Carnegie Mellon
21
Carnegie Mellon
22
Carnegie Mellon
False
Sharing
0
psum
Cache Block m
23
Carnegie Mellon
Carnegie Mellon
25
Carnegie Mellon
Carnegie Mellon
27
Carnegie Mellon
p2
L2
p2
R2
28
Carnegie Mellon
R
p3
L3
p3
R3
R
L
29
Carnegie Mellon
30
Carnegie Mellon
Parallel Quicksort
Degree
of
parallelism
Top-level
par00on:
none
Second-level
par00on:
2X
31
Carnegie Mellon
p2
R
p3
L2
p2
R2
L3
p3
R3
32
Carnegie Mellon
Carnegie Mellon
Carnegie Mellon
35
Carnegie Mellon
Carnegie Mellon
37
Carnegie Mellon
38
Carnegie Mellon
Carnegie Mellon
Carnegie Mellon
Implementa:on Subtle:es
Reaping
threads
Can
we
be
sure
the
program
wont
terminate
prematurely?
41
Carnegie Mellon
Amdahls Law
Overall
problem
T
Total
0me
required
p
Frac0on
of
total
that
can
be
sped
up
(0
p
1)
k
Speedup
factor
Resul:ng
Performance
Tk
=
pT/k
+
(1-p)T
Por0on
which
can
be
sped
up
runs
k
0mes
faster
Por0on
which
cannot
be
sped
up
stays
the
same
Maximum
possible
speedup
k
=
T
=
(1-p)T
42
Carnegie Mellon
Overall
problem
T
=
10
Total
0me
required
p
=
0.9
Frac0on
of
total
which
can
be
sped
up
k
=
9
Speedup
factor
Resul:ng
Performance
T9
=
0.9
*
10/9
+
0.1
*
10
=
1.0
+
1.0
=
2.0
Maximum
possible
speedup
43
Carnegie Mellon
Sequen:al
bofleneck
Top-level
par00on:
No
speedup
Second
level:
2X
speedup
kth
level:
2k-1X
speedup
Implica:ons
Good
performance
for
small-scale
parallelism
Would
need
to
parallelize
par00oning
step
to
get
large-scale
parallelism
Parallel
Sor0ng
by
Regular
Sampling
H.
Shi
&
J.
Schaeer,
J.
Parallel
&
Distributed
Compu0ng,
1992
44
Carnegie Mellon
Lessons Learned
45