Lec11 Pipeline Introduction

CSCE430/830 Computer Architecture
Pipeline: Introduction
Lecturer: Prof. Hong Jiang
Courtesy of Prof. Yifeng Zhu, U of Maine
Fall, 2006
CSCE430/830
Portions of these slides are derived from: Dave Patterson, UCB
Pipeline
Pipelining Outline
Introduction
  Defining Pipelining
  Pipelining Instructions
Hazards
  Structural hazards
  Data Hazards
  Control Hazards
Performance
Controller implementation
What is Pipelining?
A way of speeding up execution of instructions
Key idea: overlap execution of multiple instructions
The Laundry Analogy

Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, fold, and put away
Washer takes 30 minutes
Dryer takes 30 minutes
"Folder" takes 30 minutes
"Stasher" takes 30 minutes to put clothes into drawers
If we do laundry sequentially...
[Figure: timeline from 6 PM to 2 AM. In task order, loads A, B, C, and D each take four back-to-back 30-minute steps, and each load starts only after the previous one finishes.]
Time Required: 8 hours for 4 loads
To Pipeline, We Overlap Tasks

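[Figure: timeline starting at 6 PM. Loads A, B, C, and D are overlapped: each load starts 30 minutes after the previous one, so all four finish within seven 30-minute slots.]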
Time Required: 3.5 Hours for 4 Loads
To Pipeline, We Overlap Tasks

[Figure: the pipelined laundry timeline again, starting at 6 PM: loads A, B, C, and D overlapped in seven 30-minute slots.]
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
The pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it reduce speedup
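The laundry numbers can be checked with a few lines of Python. This is a minimal sketch: the stage names and the 30-minute durations come from the analogy, while the function names are made up for illustration.

```python
# Stage durations from the laundry analogy (minutes).
STAGE_MINUTES = {"wash": 30, "dry": 30, "fold": 30, "stash": 30}

def sequential_minutes(loads, stage_minutes):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes.values())

def pipelined_minutes(loads, stage_minutes):
    """Loads overlap; the slowest stage sets the rate at which loads finish."""
    slowest = max(stage_minutes.values())
    # The first load fills the pipeline; after that, one load
    # completes every `slowest` minutes.
    return sum(stage_minutes.values()) + (loads - 1) * slowest

seq = sequential_minutes(4, STAGE_MINUTES)
pipe = pipelined_minutes(4, STAGE_MINUTES)
print(seq / 60, "hours sequential")   # 8.0
print(pipe / 60, "hours pipelined")   # 3.5
print(seq / pipe, "speedup")          # below 4 because of fill and drain time
```

With only 4 loads the speedup stays well below the 4-stage ideal; it approaches 4 as the number of loads grows.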
Pipelining a Digital System

1 nanosecond = 10^-9 second
1 picosecond = 10^-12 second
Key idea: break big computation up into pieces
[Figure: a single block of logic with a 1 ns delay]
Separate each piece with a pipeline register
[Figure: the same logic split into five 200 ps pieces, with a pipeline register after each piece]
Pipelining a Digital System

Why do this? Because it's faster for repeated computations
Non-pipelined: 1 operation finishes every 1 ns
[Figure: one block of logic with a 1 ns delay]
Pipelined: 1 operation finishes every 200 ps
[Figure: five 200 ps stages separated by pipeline registers]
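A small sketch of why this pays off only for repeated computations. The 1 ns and 200 ps figures are from the slide; the five-stage count is read off the figure, and the function name is hypothetical. Register overhead is ignored here.

```python
UNPIPELINED_PS = 1000   # 1 ns per operation, from the slide
STAGE_PS = 200          # delay of one pipeline stage, from the slide
STAGES = 5              # five 200 ps pieces

def finish_time_ps(n_ops, pipelined):
    """Time until the last of n_ops back-to-back operations completes."""
    if pipelined:
        # Fill the pipeline once, then one result per stage time.
        return (STAGES + n_ops - 1) * STAGE_PS
    return n_ops * UNPIPELINED_PS

for n in (1, 10, 1000):
    slow = finish_time_ps(n, pipelined=False)
    fast = finish_time_ps(n, pipelined=True)
    print(n, slow, fast, round(slow / fast, 2))
```

For a single operation the two designs take the same 1 ns; the gain appears only as the number of operations grows.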
Comments about pipelining

Pipelining increases throughput, but does not reduce latency
An answer is available every 200 ps, BUT a single computation still takes 1 ns
Limitations:
Computations must be divisible into stage size
Pipeline registers add overhead
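The effect of register overhead can be sketched as follows. The 1 ns of logic is from the slides; the 20 ps register overhead is a made-up value chosen only for illustration, and the sketch assumes the logic divides evenly into stages.

```python
LOGIC_PS = 1000             # total logic delay (1 ns, from the slides)
REGISTER_OVERHEAD_PS = 20   # hypothetical per-stage register overhead

def cycle_ps(stages, overhead_ps=REGISTER_OVERHEAD_PS):
    """Clock period: an even share of the logic plus the register overhead."""
    return LOGIC_PS / stages + overhead_ps

def latency_ps(stages, overhead_ps=REGISTER_OVERHEAD_PS):
    """Time for one computation to pass through every stage."""
    return stages * cycle_ps(stages, overhead_ps)

for n in (5, 10, 50):
    print(n, "stages:", cycle_ps(n), "ps per result,", latency_ps(n), "ps latency")
```

Under these assumptions, adding stages keeps shortening the cycle, but by less each time, while the latency of a single computation gets worse.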
Pipelining a Processor
Recall the 5 steps in instruction execution:

1. Instruction Fetch (IF)
2. Instruction Decode and Register Read (ID)
3. Execute operation or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
Review: Single-Cycle Processor

All 5 steps done in a single clock cycle
Dedicated hardware required for each step
Review - Single-Cycle Processor
What do we need to add to actually split the datapath into stages?
The Basic Pipeline For MIPS
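[Figure: pipeline diagram with time in clock cycles (Cycle 1 through Cycle 7) across the top and instructions in program order down the side. Four instructions each pass through Ifetch, Reg, ALU, DMem, Reg, with each instruction starting one cycle after the previous one.]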
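A diagram of this kind can be generated with a short script. This is a sketch under ideal assumptions (one instruction enters per cycle, no stalls); the stage abbreviations are the five steps listed earlier, and the instruction names i1 to i4 are placeholders.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(instructions):
    """Return (name, cells) rows; cells[c] is the stage occupied in cycle c."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage   # instruction i enters the pipeline in cycle i
        rows.append((name, cells))
    return rows

def render(rows):
    """Format the rows as a fixed-width text table."""
    n_cycles = len(rows[0][1])
    header = "instr".ljust(8) + "".join(f"C{c + 1}".ljust(5) for c in range(n_cycles))
    lines = [header]
    for name, cells in rows:
        lines.append(name.ljust(8) + "".join(cell.ljust(5) for cell in cells))
    return "\n".join(lines)

rows = pipeline_diagram(["i1", "i2", "i3", "i4"])
print(render(rows))
```

Reading down any column shows which stage each instruction occupies in that cycle.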
What do we need to add to actually split the datapath into stages?

Basic Pipelined Processor
Pipeline example: lw
[Figures: five datapath slides follow the lw instruction through each stage in turn: IF, ID, EX, MEM, WB]
Can you find a problem?

Basic Pipelined Processor (Corrected)
Single-Cycle vs. Pipelined Execution

Non-Pipelined
[Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) executed one after another on a time axis in ps. Each instruction passes through Instruction Fetch, REG RD, ALU, MEM, REG WR and takes 800 ps; a new instruction starts every 800 ps.]
Pipelined
[Figure: the same three lw instructions overlapped on a time axis in ps. Each of the five stages takes 200 ps; a new instruction starts every 200 ps.]
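The totals implied by the two timelines can be worked out directly. The 800 ps and 200 ps figures and the stage names are from the figure; the function names are hypothetical, and the pipelined case assumes no stalls.

```python
STAGES = ["Instruction Fetch", "REG RD", "ALU", "MEM", "REG WR"]
NONPIPELINED_INSTR_PS = 800   # one full instruction, from the figure
PIPELINED_STAGE_PS = 200      # one pipeline stage, from the figure

def nonpipelined_total_ps(n_instr):
    """Instructions run strictly one after another."""
    return n_instr * NONPIPELINED_INSTR_PS

def pipelined_total_ps(n_instr):
    """Fill the five stages once, then finish one instruction per stage time."""
    return (len(STAGES) + n_instr - 1) * PIPELINED_STAGE_PS

print(nonpipelined_total_ps(3))   # three lw instructions: 2400 ps
print(pipelined_total_ps(3))      # three lw instructions: 1400 ps
print(pipelined_total_ps(1))      # a single lw: 1000 ps, longer than 800 ps

n = 1_000_000
print(nonpipelined_total_ps(n) / pipelined_total_ps(n))   # approaches 800 / 200 = 4
```

Note that a single lw takes longer in the pipelined design (5 x 200 ps) than in the non-pipelined one (800 ps); the benefit is in throughput.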

Speedup
Consider the unpipelined processor introduced previously. Assume that it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20%, and 40%, respectively. Suppose that, due to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?
Average instruction execution time (unpipelined)
= 1 ns * ((40% + 20%) * 4 + 40% * 5)
= 4.4 ns
Average instruction execution time (pipelined)
= 1 ns + 0.2 ns overhead, with one instruction completing per clock in the ideal case
= 1.2 ns
Speedup from pipeline
= Average instruction time unpipelined / Average instruction time pipelined
= 4.4 ns / 1.2 ns = 3.7 (approximately)
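The same arithmetic as a short script, so the instruction mix or the overhead can be varied. All numbers are from the problem statement; the variable names are illustrative, and the pipelined case assumes one instruction completes per clock.

```python
CLOCK_NS = 1.0
OVERHEAD_NS = 0.2

# Operation class -> (relative frequency, cycles when unpipelined)
MIX = {
    "alu":    (0.40, 4),
    "branch": (0.20, 4),
    "memory": (0.40, 5),
}

# Weighted average of cycles per instruction, times the clock period.
avg_unpipelined_ns = CLOCK_NS * sum(freq * cycles for freq, cycles in MIX.values())

# Ideal pipeline: one instruction per (slower) clock.
avg_pipelined_ns = CLOCK_NS + OVERHEAD_NS

speedup = avg_unpipelined_ns / avg_pipelined_ns
print(round(avg_unpipelined_ns, 2), round(avg_pipelined_ns, 2), round(speedup, 1))
```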
Comments about Pipelining

The good news
Multiple instructions are being processed at the same time
This works because stages are isolated by registers
Best-case speedup of N, where N is the number of pipeline stages
The bad news
Instructions interfere with each other: hazards
Example: different instructions may need the same piece of hardware (e.g., memory) in the same clock cycle
Example: an instruction may require a result produced by an earlier instruction that is not yet complete
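The first example (two instructions needing memory in the same cycle) can be sketched for the five-stage pipeline. The assumptions here are not from the slides: a single memory shared by instruction fetch and data access, one instruction entering per cycle, and no stalls. The function name is hypothetical.

```python
IF_STAGE, MEM_STAGE = 0, 3   # positions of IF and MEM in IF, ID, EX, MEM, WB

def memory_conflicts(uses_data_memory):
    """Find cycles where a data access and an instruction fetch collide.

    uses_data_memory[i] is True when instruction i is a load or store.
    Returns (cycle, accessing_instr, fetching_instr) with 1-based cycles.
    """
    conflicts = []
    for i, uses in enumerate(uses_data_memory):
        # Instruction i is in MEM during the cycle in which
        # instruction i + 3 is being fetched.
        fetched = i + (MEM_STAGE - IF_STAGE)
        if uses and fetched < len(uses_data_memory):
            conflicts.append((i + MEM_STAGE + 1, i, fetched))
    return conflicts

# A load, three ALU operations, a store, one more ALU operation.
print(memory_conflicts([True, False, False, False, True, False]))
# [(4, 0, 3)]: in cycle 4 the load's MEM stage collides with the fetch of instruction 3
```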
Pipeline Hazards
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
Structural hazards: two different instructions use the same hardware in the same cycle
Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
Control hazards: pipelining of branches and other instructions that change the PC
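A data hazard of the kind described above can be spotted by scanning for a register that is read shortly after it is written. This is only a sketch: the Instr record, the sample program, and the window of 2 instructions are assumptions made for illustration, not details given in the lecture.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read

def raw_hazards(program, window=2):
    """Return (producer, consumer, register) index triples.

    `window` is how many following instructions may still see a stale
    value; the default of 2 is an assumed value for illustration.
    """
    hazards = []
    for j, later in enumerate(program):
        for i in range(max(0, j - window), j):
            earlier = program[i]
            if earlier.dest is not None and earlier.dest in later.srcs:
                hazards.append((i, j, earlier.dest))
    return hazards

program = [
    Instr("add $1, $2, $3", "$1", ("$2", "$3")),
    Instr("sub $4, $1, $5", "$4", ("$1", "$5")),
    Instr("lw  $6, 0($7)",  "$6", ("$7",)),
    Instr("or  $8, $1, $6", "$8", ("$1", "$6")),
]

for producer, consumer, reg in raw_hazards(program):
    print(f"{program[consumer].text!r} needs {reg} from {program[producer].text!r}")
```

Here sub depends on add through $1 and or depends on lw through $6; the or also reads $1, but the add that wrote it is more than `window` instructions earlier.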
Summary - Pipelining Overview

Pipelining increases throughput (but does not reduce latency)
Hazards limit performance
Structural hazards
Control hazards
Data hazards
