CS194 Lecture
Moore's Law
8/29/2007
[Figure: transistors per chip and clock rate (MHz) versus year, 1970 to 2005, on log scales, with data points for the i4004, i8080, i8086, i80286, i80386, R2000, R3000, R10000, and Pentium.]
[Figure: log-scale plot versus year, 1970 to 2010, placing Intel processors (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P6) against reference levels labeled Hot Plate, Nuclear Reactor, Rocket Nozzle, and Sun's Surface. Source: Patrick Gelsinger, Intel.]
Performance = (2 Cores * F/2) = (Cores * F) * 1
Additional benefits
Small/simple cores give more predictable performance
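The relation on this slide, performance proportional to cores times clock frequency F, can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `relative_performance` and the sample frequencies are assumptions, and the model presumes the work divides perfectly across cores.

```python
def relative_performance(cores: int, frequency_ghz: float) -> float:
    """Idealized model from the slide: performance scales as cores * F.

    Assumes perfectly parallel work; the name and units are illustrative.
    """
    return cores * frequency_ghz


# Hypothetical baseline: one core at 2.0 GHz.
baseline = relative_performance(cores=1, frequency_ghz=2.0)

# Twice the cores at half the clock rate.
two_slow_cores = relative_performance(cores=2, frequency_ghz=1.0)

# Ratio of the two configurations under this model.
print(two_slow_cores / baseline)
```

Under this model the ratio is 1, which is the "(Cores * F) * 1" on the slide: doubling the cores while halving the frequency leaves performance unchanged.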
Performance Comparison
[Figure: processor performance on a log scale versus year, 1978 to 2006, annotated with three growth rates.]
VAX: 25%/year, 1978 to 1986
RISC + x86: 52%/year, 1986 to 2002
RISC + x86: ??%/year, 2002 to present (2x every 5 years?)
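To compare the annual growth rates on this chart with the "2x every 5 years?" question, it helps to convert between a yearly rate and a doubling time. The following is a minimal sketch using compound growth; the function names are hypothetical and not part of the lecture.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a compound annual growth rate (0.25 = 25%)."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_growth_for_doubling(years: float) -> float:
    """Compound annual growth rate that doubles performance in `years` years."""
    return 2 ** (1 / years) - 1


for label, rate in [("VAX, 25%/year", 0.25), ("RISC + x86, 52%/year", 0.52)]:
    print(f"{label}: doubles every {doubling_time_years(rate):.2f} years")

print(f"2x every 5 years: {annual_growth_for_doubling(5):.1%} per year")
```

By this calculation, 25%/year corresponds to doubling roughly every 3.1 years, 52%/year to doubling roughly every 1.7 years, and doubling every 5 years to about 15% per year.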
r = 0.3 mm
There is little or no hidden parallelism (ILP) to be found
Parallelism must be exposed to and managed by software
Multicore in Products
"We are dedicating all of our future product development to multicore designs. This is a sea change in computing."
Paul Otellini, President, Intel (2005)
AMD/05
Intel/06
IBM/04
Sun/07
Processors/chip
Threads/Processor
16
Threads/chip
128
Outline
Why powerful computers must be parallel processors
Including your laptop
Speedup(P) = Time(1)/Time(P)
           <= 1/(s + (1-s)/P)
           <= 1/s
where s is the fraction of the work that is sequential and P is the number of processors.
Even if the parallel part speeds up perfectly, performance is limited by the sequential part.
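The bound above can be tabulated to see how quickly it saturates. This is a small sketch, assuming s denotes the sequential fraction of the work and P the processor count; the function name `amdahl_speedup_bound` and the sample value s = 0.1 are illustrative choices, not from the lecture.

```python
def amdahl_speedup_bound(s: float, p: int) -> float:
    """Upper bound on Speedup(P): 1 / (s + (1 - s) / P).

    s: fraction of the work that is sequential (assumed 0 <= s <= 1).
    p: number of processors (assumed >= 1).
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (s + (1.0 - s) / p)


s = 0.1  # hypothetical: 10% of the work is sequential
for p in (1, 2, 8, 64, 1024):
    print(f"P = {p:5d}   speedup <= {amdahl_speedup_bound(s, p):.2f}")
print(f"limit as P grows: 1/s = {1 / s:.0f}")
```

With 10% sequential work, the bound never reaches 10 no matter how many processors are added, which is the 1/s limit in the last line of the inequality.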
Overhead of Parallelism
Given enough parallel work, overhead is the biggest barrier to getting the desired speedup.
Parallelism overheads include:
[Figure: memory hierarchy diagram with several processors (Proc), each backed by a Cache, L2 Cache, L3 Cache, and Memory, and with potential interconnects marked.]
Load Imbalance
Load imbalance is the time that some processors in the system are idle due to:
insufficient parallelism (during that phase)
unequal-size tasks
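Both causes of idle time can be seen in a toy scheduling model. The sketch below assumes a simple greedy rule (each task goes to the currently least-loaded processor) purely for illustration; the lecture does not prescribe a scheduler, and the function name and task times are hypothetical.

```python
def idle_time_per_processor(task_times, num_procs):
    """Assign tasks greedily and report (finish_time, idle time per processor).

    Each task is given, in order, to the processor with the least work so far.
    Idle time is how long a processor waits for the slowest one to finish.
    """
    loads = [0.0] * num_procs
    for t in task_times:
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += t
    finish = max(loads)
    return finish, [finish - load for load in loads]


# Balanced: eight equal tasks on four processors, nobody waits.
print(idle_time_per_processor([1.0] * 8, 4))

# Unequal-size tasks: one long task keeps three processors waiting.
print(idle_time_per_processor([5.0, 1.0, 1.0, 1.0], 4))

# Insufficient parallelism: two tasks cannot occupy four processors.
print(idle_time_per_processor([1.0, 1.0], 4))
```

In the second case the total work is the same as in the first (8 time units), yet the run takes 5 units instead of 2 because three processors sit idle for 4 units each.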
Course Organization
Course Mechanics
Expected background
All of 61 series
At least one upper-division software/systems course, preferably CS 162
Work in course
Homework with programming (~1/week for first 8 weeks)
Parallel hardware in CS, from Intel, at LBNL
Reading Materials
Optional text
Introduction to Parallel Computing, 2nd Edition. Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Addison-Wesley, 2003.