
Pipelining

Advanced Computer Architecture

Pipelining Techniques
Linear Pipeline Processors
  Asynchronous and Synchronous Models
  Clocking and Timing Control
  Speedup, Efficiency and Throughput

Non Linear Pipeline Processors
  Reservation and Latency Analysis
  Collision Free Scheduling
  Pipeline Schedule Optimization

Linear Pipeline Processors


A linear pipeline processor is constructed with k processing stages, S1, ..., Sk.
These stages are linearly connected to perform a specific function.
The data stream flows from one end of the pipeline to the other: external inputs are fed into S1, intermediate results pass from Si to Si+1, and final results move out from Sk.

Linear pipelining is applied to:
  Instruction execution
  Arithmetic computation
  Memory access operations

Asynchronous Model
Data flow controlled by handshaking protocol
When a stage Si is ready to transmit, it sends a ready signal to stage Si+1.
This is followed by the actual data transfer.
After stage Si+1 receives the data, it returns an acknowledge signal to Si.

Source: Kai Hwang
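
The ready / data / acknowledge sequence of the asynchronous model can be sketched as an event trace. This is only an illustration: the function name `handshake_trace` and the tuple layout (sender, receiver, signal) are assumptions made for this example, and a real asynchronous pipeline implements the protocol with hardware signals, not software.

```python
def handshake_trace(k, item):
    """Trace one item moving through stages S1..Sk using ready/ack handshakes.

    Each event is a tuple (sender, receiver, signal). Hypothetical helper,
    written only to illustrate the order of signals between Si and Si+1.
    """
    events = []
    for i in range(1, k):
        sender, receiver = f"S{i}", f"S{i + 1}"
        events.append((sender, receiver, "ready"))          # Si announces it can transmit
        events.append((sender, receiver, f"data:{item}"))   # actual data transfer
        events.append((receiver, sender, "ack"))            # Si+1 confirms receipt
    return events


for event in handshake_trace(3, "x"):
    print(event)
```

Each stage boundary contributes three events, so a k-stage pipeline produces 3(k-1) events per item in this sketch.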

Synchronous Model
Clocked latches are used to interface between stages
The latches are built from flip-flops, which isolate inputs from outputs. Upon arrival of a clock pulse, all latches transfer their data to the next stage at the same time.

Pipeline stages are combinational circuits.

Source: Kai Hwang

Reservation Table
It specifies the utilization pattern of successive stages in a synchronous pipeline.
It is a space-time graph depicting the precedence relationship in using the pipeline stages.

Source: Kai Hwang
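
For a linear pipeline, a task uses stage Si in the i-th clock cycle, so the reservation table is a simple diagonal. The sketch below prints such a table; the function name `linear_reservation_table` and the 'X' / '.' text layout are assumptions chosen for this illustration.

```python
def linear_reservation_table(k):
    """Return one row per stage; 'X' marks the cycle in which that stage is used."""
    return ["".join("X" if cycle == stage else "." for cycle in range(k))
            for stage in range(k)]


# Rows are stages S1..S4, columns are clock cycles 1..4.
for stage, row in enumerate(linear_reservation_table(4), start=1):
    print(f"S{stage} {row}")
```

Each row and each column contains exactly one mark, which is why a new task can enter a linear pipeline on every clock cycle without collisions.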

Clocking and Timing Control


Clock cycle and throughput:
The clock cycle time t of a pipeline is given by
t = tm + d
where tm denotes the maximum stage delay and d denotes the latch delay.
The pipeline frequency f = 1/t is the reciprocal of the clock period and is referred to as the (maximum) throughput of the pipeline.
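
As a small numeric illustration of t = tm + d, the sketch below computes the clock period and pipeline frequency from a list of stage delays. The function name `clock_period` and the delay values are made up for this example.

```python
def clock_period(stage_delays, latch_delay):
    """Return (t, f): clock period t = tm + d and pipeline frequency f = 1/t."""
    tm = max(stage_delays)   # the slowest stage dictates the clock
    t = tm + latch_delay
    return t, 1.0 / t


# Hypothetical delays in nanoseconds: four stages and a 1 ns latch.
t, f = clock_period([10.0, 8.0, 10.0, 10.0], 1.0)
print(t, f)   # f is in initiations per nanosecond
```

The faster 8 ns stage does not help: the period is set by the 10 ns stages plus the latch delay.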

Clock skewing:
Ideally, clock pulses should arrive at all stages at the same time, but due to clock skewing the same clock pulse may arrive at different stages with an offset of s.
Further, let tmax be the time delay of the longest logic path in a stage and tmin be that of the shortest logic path in a stage. Then
d + tmax + s <= t <= tm + tmin - s
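
The skew bounds can be checked numerically. The sketch below is an illustration under assumed values; the function names `clock_period_bounds` and `period_is_safe` are not from the source.

```python
def clock_period_bounds(tm, d, tmax, tmin, s):
    """Return (lower, upper) bounds on the clock period t under clock skew s."""
    lower = d + tmax + s
    upper = tm + tmin - s
    return lower, upper


def period_is_safe(t, tm, d, tmax, tmin, s):
    """True if t satisfies d + tmax + s <= t <= tm + tmin - s."""
    lower, upper = clock_period_bounds(tm, d, tmax, tmin, s)
    return lower <= t <= upper


# Hypothetical delays in nanoseconds.
print(clock_period_bounds(10.0, 1.0, 10.0, 3.0, 0.5))
print(period_is_safe(12.0, 10.0, 1.0, 10.0, 3.0, 0.5))
```

With no skew (s = 0), tmax = tm and tmin = d, both bounds collapse to tm + d, which is the ideal clock period.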

Speedup
Case 1: Pipelined processor
Ideally, the number of clock cycles required by a k-stage pipeline to process n tasks is
Np = k + (n-1)
(k clock cycles for the first task and 1 clock cycle for each of the remaining n-1 tasks).
The total time required is
Tp = (k + (n-1)) t

Case 2: Non-pipelined processor


A non-pipelined processor takes k clock cycles for each of the n tasks, so the total time is Tnp = nkt

Speedup Factor:
Sk = Tnp / Tp = nkt / ((k + (n-1)) t) = nk / (k + (n-1))
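
A quick numeric check of the final expression nk / (k + (n-1)) shows how the speedup approaches k as the number of tasks grows. The function name `speedup` and the sample values are assumptions for this sketch.

```python
def speedup(k, n):
    """Speedup of a k-stage pipeline over a non-pipelined processor for n tasks."""
    return (n * k) / (k + (n - 1))


# Hypothetical 4-stage pipeline with an increasing number of tasks.
for n in (1, 4, 64, 10_000):
    print(n, round(speedup(4, n), 3))
```

For a single task there is no gain (speedup 1), and for large n the value gets close to, but never reaches, the stage count k.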

Efficiency & Throughput


Efficiency: defined as the speedup divided by the number of stages:
Ek = Sk / k = n / (k + (n-1))

Throughput: defined as the number of tasks completed per unit time:
Hk = n / ((k + (n-1)) t) = nf / (k + (n-1)), where f = 1/t
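
The two formulas can be evaluated side by side. This is a sketch with made-up numbers; the function names `efficiency` and `throughput` and the 10 ns clock period are assumptions.

```python
def efficiency(k, n):
    """Ek = n / (k + (n-1)): fraction of stage-time slots doing useful work."""
    return n / (k + (n - 1))


def throughput(k, n, t):
    """Hk = n / ((k + (n-1)) * t): tasks completed per unit time."""
    return n / ((k + (n - 1)) * t)


k, n, t = 4, 64, 10e-9   # hypothetical: 4 stages, 64 tasks, 10 ns clock period
print(efficiency(k, n))
print(throughput(k, n, t))   # tasks per second
```

Because Hk = Ek * f, the throughput approaches the pipeline frequency f = 1/t as the efficiency approaches 1.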

Non Linear Pipeline Processors


A non-linear pipeline is a dynamic pipeline that can be reconfigured to perform different functions at different times.
A dynamic pipeline allows feedback and feedforward connections in addition to the conventional streamline connections.
The output of a non-linear pipeline is not necessarily taken from the last stage.

Source: Kai Hwang

Reservation Tables
Each reservation table corresponds to the evaluation of one function.
The number of columns in a reservation table is the evaluation time of that function.
A pipeline initiation happens when the input for a function is fed into the pipeline.
Note: a linear pipeline has only a single reservation table.

Source: Kai Hwang

Latency Analysis
The number of time units (clock cycles) between two initiations of a pipeline is called the latency.
Any attempt by two or more initiations to use the same pipeline stage at the same time causes a collision.
Latencies that cause collisions are called forbidden latencies.

Source: Kai Hwang
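
Forbidden latencies can be read off a reservation table as the distances between any two marks in the same row. The sketch below uses a hypothetical three-stage, eight-column table written as strings; it is an assumption for illustration, chosen so that its forbidden latencies agree with the example collision vector Cx = (1011010) given later in these notes.

```python
from itertools import combinations

# Hypothetical reservation table: rows are stages S1..S3, columns are
# clock cycles, 'X' marks a cycle in which the stage is in use.
TABLE = [
    "X....X.X",  # S1
    ".X.X....",  # S2
    "..X.X.X.",  # S3
]


def forbidden_latencies(table):
    """Return the set of latencies that would make two initiations collide."""
    forbidden = set()
    for row in table:
        used = [cycle for cycle, mark in enumerate(row) if mark == "X"]
        # Two marks in one row that are p cycles apart forbid latency p.
        for earlier, later in combinations(used, 2):
            forbidden.add(later - earlier)
    return forbidden


print(sorted(forbidden_latencies(TABLE)))
```

For this table the forbidden latencies are 2, 4, 5 and 7; every other latency is permissible.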

Latency Analysis contd.


A sequence of permissible (non-forbidden) latencies between successive task initiations is called a latency sequence.
A latency cycle is a latency sequence that repeats the same subsequence of latencies indefinitely.

Source: Kai Hwang
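
Whether a latency cycle is collision free can be checked by simulating initiations against a reservation table. The sketch below is a finite simulation under assumptions: the table is the same hypothetical three-stage example used for illustration in these notes, and the name `is_collision_free` and the `repetitions` parameter are invented here.

```python
# Hypothetical reservation table: rows are stages, 'X' marks a busy cycle.
TABLE = [
    "X....X.X",  # S1
    ".X.X....",  # S2
    "..X.X.X.",  # S3
]


def is_collision_free(table, cycle, repetitions=4):
    """Simulate initiations spaced by the latency cycle and look for collisions.

    Only a finite number of repetitions is simulated, so this is a check
    for short cycles on small tables, not a proof.
    """
    reserved = set()          # (stage, absolute time) slots already in use
    start = 0
    initiations = repetitions * len(cycle) + 1
    for i in range(initiations):
        for stage, row in enumerate(table):
            for offset, mark in enumerate(row):
                if mark != "X":
                    continue
                slot = (stage, start + offset)
                if slot in reserved:
                    return False   # two initiations need the same stage at once
                reserved.add(slot)
        start += cycle[i % len(cycle)]
    return True


print(is_collision_free(TABLE, (3,)))     # constant latency 3
print(is_collision_free(TABLE, (1, 8)))   # alternating latencies 1 and 8
print(is_collision_free(TABLE, (1, 1)))   # collides: initiations 1 and 3 are 2 apart
```

The last call shows that every pair of initiations matters, not only adjacent ones: two latencies of 1 add up to the forbidden latency 2.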

Collision Free Scheduling


Scheduling goal: to obtain the shortest average latency between initiations without causing collisions.
A systematic method for collision-free scheduling is built on the following concepts:
  Collision vectors
  State diagrams
  Simple cycles
  Greedy cycles
  Minimal average latency (MAL)

Collision Vector
The combined set of permissible and forbidden latencies can be displayed by a collision vector.
It is an m-bit binary vector, where m <= n-1 is the largest forbidden latency and n is the evaluation time (the number of columns in the reservation table):
C = (Cm Cm-1 ... C2 C1)
Ci = 1 if latency i causes a collision
Ci = 0 if latency i is permissible

Examples: Cx = (1011010) ; Cy = (1010)
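
The two example vectors can be reproduced from their sets of forbidden latencies. In this sketch the function name `collision_vector` is an assumption; the forbidden sets {2, 4, 5, 7} and {2, 4} are simply the positions of the 1 bits in Cx and Cy.

```python
def collision_vector(forbidden):
    """Build the bit string (Cm ... C2 C1) from a set of forbidden latencies."""
    m = max(forbidden)   # the vector is as long as the largest forbidden latency
    return "".join("1" if latency in forbidden else "0"
                   for latency in range(m, 0, -1))


print(collision_vector({2, 4, 5, 7}))   # Cx
print(collision_vector({2, 4}))         # Cy
```

The leftmost bit is always 1, because it corresponds to the largest forbidden latency.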

State Diagrams
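From the collision vector, one can construct a state diagram specifying the permissible state transitions among successive initiations.
The collision vector itself is the initial state at time t.
The next state, at time t+p where p is a permissible latency, is obtained with the help of a shift register: the current state is shifted right by p bits (filling with 0s on the left) and the result is bitwise ORed with the initial collision vector.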

Source: Kai Hwang
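
The shift-and-OR construction can be sketched in a few lines. This is an illustration under assumptions: states are written as bit strings (Cm ... C1), the function name `build_state_diagram` is invented here, and the single edge labelled m+1 stands for every latency larger than m, all of which lead back to the initial state.

```python
def build_state_diagram(collision_vector):
    """Map each reachable state to {latency: next_state}."""
    m = len(collision_vector)
    initial = int(collision_vector, 2)
    diagram = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        key = format(state, f"0{m}b")
        if key in diagram:
            continue
        edges = {}
        for p in range(1, m + 1):
            if not (state >> (p - 1)) & 1:          # bit Cp is 0: latency p is permissible
                nxt = (state >> p) | initial        # shift right by p, OR with initial vector
                edges[p] = format(nxt, f"0{m}b")
                pending.append(nxt)
        edges[m + 1] = collision_vector             # any latency > m returns to the initial state
        diagram[key] = edges
    return diagram


for state, edges in build_state_diagram("1011010").items():
    print(state, edges)
```

For the example vector Cx = (1011010) this yields three states: 1011010, 1111111 and 1011011.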

Cycles
Many latency cycles can be traced from the state diagram, e.g. (1,8), (1,8,6,8), (3), (6), (3,8).

Among these, only simple cycles are of interest.
A simple cycle is a latency cycle in which each state appears only once, e.g. (3), (6), (1,8).

Some of these simple cycles are greedy cycles.
A greedy cycle is one whose edges are all made with the minimum latencies from their respective starting states, e.g. (1,8), (3).
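
A greedy cycle can be traced by always taking the smallest permissible latency out of each state. The sketch below does this for the example vector Cx = (1011010); the helper names are assumptions, states are bit strings (Cm ... C1), and latency m+1 stands for any latency larger than m.

```python
def min_latency(state, m):
    """Smallest permissible latency from a state (m+1 if bits C1..Cm are all 1)."""
    for p in range(1, m + 1):
        if not (state >> (p - 1)) & 1:
            return p
    return m + 1


def greedy_cycle(collision_vector, start=None):
    """Follow minimum-latency edges from `start` until a state repeats."""
    m = len(collision_vector)
    initial = int(collision_vector, 2)
    state = initial if start is None else int(start, 2)
    first_visit = {}      # state -> position in the path where it first appeared
    latencies = []
    while state not in first_visit:
        first_visit[state] = len(latencies)
        p = min_latency(state, m)
        latencies.append(p)
        state = initial if p > m else (state >> p) | initial
    return tuple(latencies[first_visit[state]:])   # keep only the repeating part


def average_latency(cycle):
    return sum(cycle) / len(cycle)


for start in ("1011010", "1011011"):
    cycle = greedy_cycle("1011010", start)
    print(start, cycle, average_latency(cycle))
```

Starting from the initial state gives the greedy cycle (1,8) with average latency 4.5, while starting from state 1011011 gives (3) with average latency 3. In this example the smaller of the two, 3, is the minimal average latency.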
