Pipelining: Advanced Computer Architecture

Pipelining
Advanced Computer Architecture
Pipelining Techniques
Linear Pipeline Processors
  Asynchronous and Synchronous Models
  Clocking and Timing Control
  Speedup, Efficiency and Throughput
Non Linear Pipeline Processors

  Reservation and Latency Analysis
  Collision Free Scheduling
  Pipeline Schedule Optimization

A linear pipeline processor is constructed with k processing stages, S1 through Sk. These stages are linearly connected to perform a specific function.
The data stream flows from one end of the pipeline to the other: external inputs are fed into S1, intermediate results pass from Si to Si+1, and final results move out from Sk.
Linear pipelining is applied to:
  Instruction execution
  Arithmetic computation
  Memory access operations

Asynchronous Model
Data flow between adjacent stages is controlled by a handshaking protocol.
When a stage Si is ready to transmit, it sends a ready signal to stage Si+1. This is followed by the actual data transfer. After stage Si+1 receives the data, it returns an acknowledge signal to Si.
Source: Kai Hwang
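The ordering of the handshake (ready, then data, then acknowledge) can be sketched in a few lines of Python. This is only an illustrative software model, not a hardware description; the function names handshake_transfer and run_asynchronous and the event strings are assumptions introduced here.

```python
def handshake_transfer(sender, receiver, data, log):
    """Record one ready / data / acknowledge exchange between adjacent stages."""
    log.append(f"{sender} -> {receiver}: ready")        # sender announces it can transmit
    log.append(f"{sender} -> {receiver}: data {data}")  # the actual data transfer follows
    log.append(f"{receiver} -> {sender}: ack")          # receiver confirms receipt
    return data


def run_asynchronous(stage_names, item):
    """Pass one item through all stages, handshaking between each adjacent pair."""
    log = []
    for sender, receiver in zip(stage_names, stage_names[1:]):
        item = handshake_transfer(sender, receiver, item, log)
    return log


for event in run_asynchronous(["S1", "S2", "S3"], 42):
    print(event)
```

Each adjacent pair of stages completes its own three-step exchange, so no global clock is needed in this model.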
Synchronous Model
Clocked latches are used to interface between stages
Latches are flip-flops that isolate inputs from outputs. Upon arrival of a clock pulse, all latches transfer data to the next stage at the same time.
Pipeline stages are combinational circuits.
Source: Kai Hwang
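A minimal sketch of the synchronous model, assuming each stage is a pure function (standing in for a combinational circuit) and that None marks an empty latch. The name run_synchronous and the three example stage functions are hypothetical. On every simulated clock pulse all latches are updated together.

```python
def run_synchronous(stage_functions, inputs):
    """Simulate a clocked linear pipeline; return (results, busy clock cycles).

    Assumes no input value and no stage result is None, because None is
    used here to mark an empty latch.
    """
    pending = list(inputs)
    carried = [None] * (len(stage_functions) - 1)  # values waiting between stages
    results, cycles = [], 0
    while True:
        # Clock pulse: every latch takes its new value at the same time.
        latches = [pending.pop(0) if pending else None] + carried
        if all(value is None for value in latches):
            break  # pipeline has drained
        outputs = [None if v is None else f(v)
                   for f, v in zip(stage_functions, latches)]
        cycles += 1
        if outputs[-1] is not None:
            results.append(outputs[-1])  # result leaves the last stage
        carried = outputs[:-1]
    return results, cycles


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_synchronous(stages, [1, 2, 3, 4]))
```

With 3 stages and 4 inputs the simulation stays busy for 6 clock cycles, one result leaving the last stage per cycle once the pipeline is full.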
Reservation Table
A reservation table specifies the utilization pattern of successive stages in a synchronous pipeline. It is a space-time diagram depicting the precedence relationship in using the pipeline stages.
Source: Kai Hwang
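For a linear pipeline the reservation table is simply a diagonal: stage Si is used in cycle i and nowhere else. The sketch below builds and prints such a table; the function names and the X/. rendering are assumptions made for illustration.

```python
def linear_reservation_table(k):
    """Return a k x k table; entry [i][t] is True when stage i is used in cycle t."""
    return [[t == i for t in range(k)] for i in range(k)]


def render(table):
    """Render the table as text, one row per stage, X marking a used cycle."""
    lines = []
    for i, row in enumerate(table):
        cells = " ".join("X" if used else "." for used in row)
        lines.append(f"S{i + 1} {cells}")
    return "\n".join(lines)


print(render(linear_reservation_table(4)))
```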

Clocking and Timing Control

Clock cycle and throughput:
The clock cycle time t of a pipeline is
t = tm + d
where tm denotes the maximum stage delay and d denotes the latch delay.
The pipeline frequency f = 1/t is the inverse of the clock cycle time and represents the maximum throughput of the pipeline.
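As a small worked example, the sketch below computes the clock cycle time and pipeline frequency from a list of stage delays. The delay values (in nanoseconds) are made up for illustration, and the function names are assumptions.

```python
def clock_cycle_time(stage_delays, latch_delay):
    """t = tm + d, where tm is the largest stage delay and d the latch delay."""
    tm = max(stage_delays)
    return tm + latch_delay


def pipeline_frequency(cycle_time):
    """f = 1 / t."""
    return 1.0 / cycle_time


stage_delays_ns = [9, 7, 9, 8]   # hypothetical stage delays in nanoseconds
t_ns = clock_cycle_time(stage_delays_ns, latch_delay=1)
f_hz = pipeline_frequency(t_ns * 1e-9)
print(f"t = {t_ns} ns, f = {f_hz / 1e6:.0f} MHz")
```

The slowest stage alone sets the clock, which is why balanced stage delays matter.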
Clock skewing:
Ideally, clock pulses should arrive at all stages at the same time. Because of clock skew, the same clock pulse may arrive at different stages with a time offset of s.
Let tmax be the time delay of the longest logic path within a stage and tmin that of the shortest logic path within a stage. The clock cycle time is then bounded by
d + tmax + s <= t <= tm + tmin - s
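The bounds can be evaluated directly. In the sketch below all numeric values are hypothetical and the function name is an assumption. In the ideal case (s = 0, tmax = tm, tmin = d) both bounds collapse to tm + d, which agrees with the clock cycle formula.

```python
def clock_period_bounds(tm, d, tmax, tmin, s):
    """Return (lower, upper) with lower = d + tmax + s and upper = tm + tmin - s."""
    lower = d + tmax + s
    upper = tm + tmin - s
    return lower, upper


# Ideal case: no skew, tmax = tm, tmin = d.
print(clock_period_bounds(tm=9, d=1, tmax=9, tmin=1, s=0))

# With skew (hypothetical values, all in the same time unit).
print(clock_period_bounds(tm=9, d=1, tmax=8, tmin=3, s=1))
```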

Speedup
Case 1: Pipelined processor
Ideally, the number of clock cycles required by a k-stage pipeline to process n tasks is
Np = k + (n-1)
(k clock cycles for the first task and 1 clock cycle for each of the remaining n-1 tasks).
The total time required is
Tp = (k + (n-1))t
Case 2: Non-pipelined processor

An equivalent non-pipelined processor would take time
Tnp = nkt
Speedup Factor:
Sk = Tnp / Tp = nkt / ((k + (n-1))t) = nk / (k + (n-1))
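A short numeric sketch of the speedup of a k-stage pipeline over an equivalent non-pipelined processor, nk / (k + (n-1)). The function name and the chosen values of k and n are illustrative assumptions. The speedup is 1 for a single task and approaches k as n grows.

```python
def speedup(k, n):
    """Speedup of a k-stage pipeline over a non-pipelined processor for n tasks."""
    return (n * k) / (k + n - 1)


for n in (1, 4, 64, 1024):
    print(f"k = 4, n = {n}: speedup = {speedup(4, n):.3f}")
```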
Efficiency & Throughput

Efficiency is defined as the speedup divided by the number of stages:
Ek = Sk / k = n / (k + (n-1))
Throughput is defined as the number of tasks completed per unit time:
Hk = n / ((k + (n-1))t) = nf / (k + (n-1))
where f = 1/t is the pipeline frequency.
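The two formulas can be tabulated to see how efficiency tends to 1 and throughput tends to the pipeline frequency f as the number of tasks grows. The 100 MHz frequency and the function names below are assumptions chosen for illustration.

```python
def efficiency(k, n):
    """Ek = n / (k + (n - 1))."""
    return n / (k + n - 1)


def throughput(k, n, f):
    """Hk = n * f / (k + (n - 1)), in tasks per second when f is in Hz."""
    return n * f / (k + n - 1)


f_hz = 100e6   # hypothetical pipeline frequency
for n in (1, 16, 256, 4096):
    print(f"n = {n}: E = {efficiency(4, n):.3f}, "
          f"H = {throughput(4, n, f_hz) / 1e6:.1f} M tasks/s")
```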


A non-linear pipeline processor is a dynamic pipeline that can be reconfigured to perform different functions at different times. A dynamic pipeline allows feedforward and feedback connections in addition to the conventional streamline connections. The output of a non-linear pipeline does not necessarily come from the last stage.
Source: Kai Hwang

Reservation Tables
Each reservation table describes the evaluation of one function. The number of columns in a reservation table is the evaluation time of that function. A pipeline initiation happens when the input for a function is fed into the pipeline.
Note: A linear pipeline has only a single reservation table, whereas a non-linear pipeline may have several, one per function.
Source: Kai Hwang
Latency Analysis
The number of time units (clock cycles) between two initiations of a pipeline is called the latency. Any attempt by two or more initiations to use the same pipeline stage at the same time causes a collision. Latencies that cause collisions are called forbidden latencies.
Source: Kai Hwang
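Forbidden latencies can be read off a reservation table mechanically: any distance between two marks in the same row is forbidden, because a second initiation started that many cycles later would need the same stage in the same cycle. The sketch below assumes the table is given as a mapping from stage name to the cycles in which that stage is used. The example table TABLE_X is an assumption, chosen so that its forbidden latencies agree with the collision vector Cx = (1011010) quoted later in these notes.

```python
from itertools import combinations


def forbidden_latencies(table):
    """Return the set of latencies that would cause a collision.

    table maps a stage name to the list of cycles in which it is used.
    """
    forbidden = set()
    for cycles in table.values():
        for earlier, later in combinations(sorted(cycles), 2):
            forbidden.add(later - earlier)   # distance between two marks in a row
    return forbidden


# Hypothetical 3-stage, 8-column reservation table (cycles numbered from 0).
TABLE_X = {"S1": [0, 5, 7], "S2": [1, 3], "S3": [2, 4, 6]}
print(sorted(forbidden_latencies(TABLE_X)))
```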
Latency Analysis contd.

A latency sequence is a sequence of permissible (non-forbidden) latencies between successive task initiations. A latency cycle is a latency sequence that repeats the same subsequence indefinitely.
Source: Kai Hwang
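A latency cycle can be checked for collisions by expanding it into initiation times and confirming that no two initiations are separated by a forbidden latency. The sketch below is one way to do this. The forbidden set {2, 4, 5, 7} is taken from the example vector Cx = (1011010) given later in these notes, and the function names are assumptions.

```python
from itertools import cycle, islice


def initiation_times(latency_cycle, count):
    """Return the first `count` initiation times when the cycle repeats forever."""
    times = [0]
    for latency in islice(cycle(latency_cycle), count - 1):
        times.append(times[-1] + latency)
    return times


def is_collision_free(latency_cycle, forbidden):
    """True if no pair of initiations is separated by a forbidden latency."""
    # Every latency is at least 1, so this many initiations cover all gaps
    # up to the largest forbidden latency from every position in the cycle.
    count = len(latency_cycle) * (max(forbidden, default=0) + 2)
    times = initiation_times(latency_cycle, count)
    return all(later - earlier not in forbidden
               for i, earlier in enumerate(times)
               for later in times[i + 1:])


def average_latency(latency_cycle):
    """Sum of the latencies in the cycle divided by their number."""
    return sum(latency_cycle) / len(latency_cycle)


FORBIDDEN_X = {2, 4, 5, 7}
for candidate in [(1, 8), (3,), (1, 1), (2,)]:
    print(candidate, is_collision_free(candidate, FORBIDDEN_X),
          average_latency(candidate))
```

Note that (1, 1) is rejected even though latency 1 is permissible on its own: the first and third initiations end up 2 cycles apart.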

Collision Free Scheduling

Scheduling goal: to obtain the shortest average latency between initiations without causing collisions.
A systematic method for collision-free scheduling is built from the following concepts:
  Collision vectors
  State diagrams
  Simple cycles
  Greedy cycles
  Minimal average latency (MAL)
Collision Vector
The combined set of permissible and forbidden latencies can be displayed by a collision vector. It is an m-bit binary vector, where m <= n-1 is the maximum forbidden latency and n is the evaluation time (the number of columns in the reservation table):
C = (Cm Cm-1 ... C2 C1)
Ci = 1 if latency i causes a collision
Ci = 0 if latency i is permissible
Examples: Cx = (1011010) ; Cy = (1010)
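A minimal sketch that builds the collision vector string from a set of forbidden latencies, leftmost bit corresponding to the largest forbidden latency. The sets {2, 4, 5, 7} and {2, 4} are simply read back from the positions of the 1 bits in the example vectors Cx and Cy; the function name is an assumption.

```python
def collision_vector(forbidden):
    """Return the collision vector as a bit string, highest latency first."""
    m = max(forbidden)   # largest forbidden latency sets the vector length
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


print(collision_vector({2, 4, 5, 7}))   # forbidden latencies behind Cx
print(collision_vector({2, 4}))         # forbidden latencies behind Cy
```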
State Diagrams
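From the collision vector, one can construct a state diagram specifying the permissible state transitions among successive initiations. The collision vector itself is the initial state. The next state at time t+p, where p is a permissible latency, is obtained with the help of a right shift register: the current state is shifted right by p bits (zeros enter from the left) and the result is bitwise-ORed with the initial collision vector.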
Source: Kai Hwang
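The shift-and-OR rule can be turned into a small program that enumerates all reachable states. This is a sketch under the assumption that the collision vector is given as a bit string with the highest latency on the left; the function name and the dictionary layout are choices made here. The edge labelled m+1 stands for every latency greater than m, all of which lead back to the initial state.

```python
def build_state_diagram(collision_vector):
    """Map each reachable state to {latency: next_state}, states as bit strings."""
    m = len(collision_vector)
    initial = int(collision_vector, 2)   # bit p-1 of the integer holds C_p

    def bits(state):
        return format(state, f"0{m}b")

    diagram = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if bits(state) in diagram:
            continue   # already expanded
        edges = {}
        for p in range(1, m + 1):
            if (state >> (p - 1)) & 1 == 0:           # C_p = 0: latency p is permissible
                next_state = (state >> p) | initial   # shift right p bits, OR with initial
                edges[p] = bits(next_state)
                pending.append(next_state)
        edges[m + 1] = bits(initial)   # any latency greater than m
        diagram[bits(state)] = edges
    return diagram


for state, edges in build_state_diagram("1011010").items():
    print(state, edges)
```

For Cx = (1011010) this yields three states, with latencies 1, 3 and 6 permissible from the initial state.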
Cycles
There are many latency cycles that can be traced from the state diagram, e.g. (1,8), (1,8,6,8), (3), (6), (3,8).
Among these, only simple cycles are of interest.
A simple cycle is a latency cycle in which each state appears only once, e.g. (3), (6), (1,8).
Some of these simple cycles are greedy cycles.
A greedy cycle is one whose edges are all made with the minimum latencies from their respective starting states, e.g. (1,8), (3).
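Greedy cycles can be found by always taking the smallest permissible latency out of each state until a state repeats. The sketch below does this for every reachable state and then takes the smallest average latency among the greedy cycles as a candidate for the MAL. It relies on the result from Hwang's text that at least one greedy cycle attains the MAL. The function name and the rotation-based de-duplication are assumptions of this sketch.

```python
def greedy_cycles(collision_vector):
    """Return the set of greedy cycles, each as a tuple of latencies."""
    m = len(collision_vector)
    initial = int(collision_vector, 2)   # bit p-1 of the integer holds C_p

    def permissible(state):
        return [p for p in range(1, m + 1) if (state >> (p - 1)) & 1 == 0]

    def greedy_step(state):
        options = permissible(state)
        if options:
            p = options[0]                      # minimum permissible latency
            return p, (state >> p) | initial
        return m + 1, initial                   # only latencies > m remain

    # Collect every state reachable from the initial collision vector.
    reachable, pending = set(), [initial]
    while pending:
        state = pending.pop()
        if state not in reachable:
            reachable.add(state)
            pending.extend((state >> p) | initial for p in permissible(state))

    cycles = set()
    for start in reachable:
        first_seen, latencies, state = {}, [], start
        while state not in first_seen:
            first_seen[state] = len(latencies)
            p, state = greedy_step(state)
            latencies.append(p)
        loop = tuple(latencies[first_seen[state]:])   # the repeating part only
        # Store one canonical rotation so (1, 8) and (8, 1) count once.
        cycles.add(min(loop[i:] + loop[:i] for i in range(len(loop))))
    return cycles


found = greedy_cycles("1011010")
print(sorted(found))
print(min(sum(c) / len(c) for c in found))
```

For Cx = (1011010) the greedy cycles found are (1,8) and (3), matching the examples above, and the smaller average latency is 3.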
