
More on Pipelining

Advanced Computer Architecture

Pipeline Design
Instruction Pipeline Design
  Instruction Execution Phases
  Mechanisms for Instruction Pipelining
  Dynamic Instruction Scheduling
  Branch Handling Techniques

Arithmetic Pipeline Design
  Computer Arithmetic Principles
  Static Arithmetic Pipelines
  Multifunctional Arithmetic Pipelines

Typical Instruction Pipeline


A typical instruction execution consists of the following sequence of operations:
  Instruction Fetch (F)
  Decode (D)
  Operand Fetch or Issue (I)
  Execute, several stages (E)
  Write Back (W)

Source: Kai Hwang

Instruction Execution Phases


Each operation (F, D, I, E, W) may require one clock cycle or more
Ideally, these operations are overlapped
Example (assumptions):
  Load and store instructions take four cycles
  Add and multiply instructions take three cycles

Shaded regions indicate idle cycles due to dependencies

Source: Kai Hwang

Mechanisms for Instruction Pipelining


Goal: achieve maximum parallelism in the pipeline by smoothing the instruction flow and minimizing idle cycles
Mechanisms:
  Prefetch Buffers
  Multiple Functional Units
  Internal Data Forwarding
  Hazard Avoidance

Prefetch Buffers
Used to match the instruction fetch rate to the pipeline consumption rate
In a single memory access, a block of consecutive instructions is fetched into a prefetch buffer
Three types of prefetch buffers:
  Sequential buffers, used to store sequential instructions
  Target buffers, used to store branch target instructions
  Loop buffer, used to store the instructions of a loop

Source: Kai Hwang

Multiple Functional Units


At times, a specific pipeline stage becomes the bottleneck
It is identified by a large number of checkmarks in one row of the reservation table
To resolve dependencies, reservation stations (RS) are used
Each RS is uniquely identified by a tag, which is monitored by a tag unit (register tagging)
Reservation stations help in conflict resolution and also serve as buffers

Source: Kai Hwang

Internal Data Forwarding


Goal: replace memory access operations with register transfer operations
Types:
  Store-load forwarding
  Load-load forwarding
  Store-store forwarding

Source: Kai Hwang
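
The three forwarding types can be sketched as a rewrite over two adjacent instructions. This is a minimal illustration, not the scheme from the source figure: the tuple encoding, the `forward_pair` name, and the restriction to adjacent instructions on the same symbolic address are all assumptions made for this sketch.

```python
# Toy instruction encoding (an assumption made for this sketch):
#   ("STORE", address, source_register)
#   ("LOAD",  dest_register, address)
#   ("MOVE",  dest_register, source_register)

def forward_pair(first, second):
    """Rewrite two adjacent instructions and return the replacement list."""
    if first[0] == "STORE" and second[0] == "LOAD" and first[1] == second[2]:
        # Store-load forwarding: the stored value is still in a register,
        # so the load becomes a register-to-register move.
        return [first, ("MOVE", second[1], first[2])]
    if first[0] == "LOAD" and second[0] == "LOAD" and first[2] == second[2]:
        # Load-load forwarding: copy from the register loaded first.
        return [first, ("MOVE", second[1], first[1])]
    if first[0] == "STORE" and second[0] == "STORE" and first[1] == second[1]:
        # Store-store forwarding: the first store is overwritten, so drop it.
        return [second]
    return [first, second]


print(forward_pair(("STORE", "M", "R1"), ("LOAD", "R2", "M")))
print(forward_pair(("LOAD", "R1", "M"), ("LOAD", "R2", "M")))
print(forward_pair(("STORE", "M", "R1"), ("STORE", "M", "R2")))
```

In each case one memory access is removed or replaced by a register transfer, which is the stated goal of internal data forwarding.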

Hazard Avoidance
Reads and writes of shared variables by different instructions in the pipeline may produce different results if the instructions are executed out of order
Types:
  Read after Write (RAW) hazard
  Write after Write (WAW) hazard
  Write after Read (WAR) hazard

Source: Kai Hwang
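
The three hazard types can be checked by intersecting the read and write sets of two instructions taken in program order. The sketch below assumes each instruction is described by a hypothetical `(reads, writes)` pair of register-name sets; the function name and register names are illustrative only.

```python
def classify_hazards(first, second):
    """Return the hazard types between two instructions in program order.

    Each instruction is a (reads, writes) pair of sets of register names.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    hazards = set()
    if writes1 & reads2:
        hazards.add("RAW")  # second reads what first writes
    if writes1 & writes2:
        hazards.add("WAW")  # both write the same location
    if reads1 & writes2:
        hazards.add("WAR")  # second overwrites what first reads
    return hazards


# I1: R1 <- R2 + R3, followed by I2: R4 <- R1 * R5
i1 = ({"R2", "R3"}, {"R1"})
i2 = ({"R1", "R5"}, {"R4"})
print(classify_hazards(i1, i2))
```

An empty result means the two instructions can be reordered without changing the outcome, at least with respect to the registers listed.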

Instruction Scheduling
Aim: to schedule instructions through an instruction pipeline
Types of instruction scheduling:
Static Scheduling
  Supported by an optimizing compiler

Dynamic Scheduling
  Achieved by Tomasulo's register-tagging scheme
  or by using a scoreboarding scheme

Static Scheduling
Data dependencies in a sequence of instructions create interlocked relationships
Interlocking can be resolved by the compiler by increasing the separation between interlocked instructions
Example: two independent load instructions can be moved ahead so that the spacing between them and the dependent multiply instruction is increased
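
The effect of moving the loads ahead can be estimated with a small timing model. This is a sketch under stated assumptions: a single-issue, in-order pipeline in which an instruction issues only once its source registers are ready. The five-instruction program is hypothetical, the load, add and multiply latencies follow the cycle counts assumed earlier in these slides, and the one-cycle move latency is an assumption.

```python
# Latencies in cycles; "move" is an assumed value.
LATENCY = {"load": 4, "add": 3, "mul": 3, "move": 1}

def total_cycles(program, latency):
    """Cycle at which the last result is ready, for in-order single issue."""
    ready = {}   # register -> cycle at which its value becomes available
    cycle = 0    # earliest cycle at which the next instruction may issue
    for op, dst, srcs in program:
        # Stall until every source operand is available.
        issue = max([cycle] + [ready.get(r, 0) for r in srcs])
        ready[dst] = issue + latency[op]
        cycle = issue + 1
    return max(ready.values())

# Hypothetical program: the multiply is interlocked with the two loads.
ORIGINAL = [
    ("add", "R0", ["R0", "R1"]),
    ("move", "R1", ["R5"]),
    ("load", "R2", []),
    ("load", "R3", []),
    ("mul", "R2", ["R2", "R3"]),
]
# Same instructions with the two independent loads moved ahead.
SCHEDULED = [ORIGINAL[2], ORIGINAL[3], ORIGINAL[0], ORIGINAL[1], ORIGINAL[4]]

print(total_cycles(ORIGINAL, LATENCY), total_cycles(SCHEDULED, LATENCY))
```

The model tracks only read-after-write stalls, so it is meant to show why the extra spacing helps rather than to predict timing on real hardware.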

Tomasulo's Algorithm
Hardware-dependent scheme
Data operands are held in reservation stations (RS) until their dependencies are resolved
Register tagging is used to allocate and deallocate registers
All working registers are tagged

Source: Kai Hwang

Scoreboarding
Multiple functional units appear in multiple execution pipelines
Parallel units allow instructions to execute out of order with respect to the original program sequence
The processor has instruction buffers; instructions are issued regardless of the availability of their operands
A centralized control unit called the scoreboard keeps track of unavailable operands for the instructions stored in the buffers

Source: Kai Hwang

Branch Handling Techniques


Pipeline performance is limited by the presence of branch instructions in a program
Various branch strategies are applied to minimize performance degradation
To evaluate a branch strategy, two approaches can be followed:
  Trace data approach
  Analytical approach

Branching Illustrated
Ib: branch instruction
Once the branch is resolved as taken, all instructions fetched after Ib are flushed from the pipeline
Execution then continues with the instructions at the branch target

Source: Kai Hwang

Effect of Branching
Nomenclature:
Branch Taken: the action of fetching non-sequential (remote) instructions after a branch instruction
Branch Target: the (remote) instruction to be executed after the branch is taken
Delay Slot (b): the number of pipeline cycles consumed between branch taken and branch target
In general, 0 <= b <= k-1, where k is the number of pipeline stages

Effect of Branching
When a branch is taken, all instructions fetched after the branch instruction become useless and the pipeline is flushed, losing a number of cycles
Let Ib be the branch instruction; a taken branch causes all instructions from Ib+1 through Ib+k-1 to be drained from the pipeline
Let p be the probability that an instruction is a branch and q the probability that a branch is taken; the penalty, in terms of time, is
  T_penalty = p q n b t
where n: number of instructions; b: number of pipeline cycles consumed; t: cycle time
Effective execution time becomes
  T_eff = k t + (n-1) t + p q n b t
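
The timing expressions can be evaluated directly. The sketch below takes the effective time to be the pipeline time k t + (n-1) t plus the penalty p q n b t defined on this slide; the function names and the sample parameter values are assumptions chosen for illustration.

```python
def branch_penalty(n, p, q, b, t):
    """Time lost to taken branches: p * q * n * b * t."""
    return p * q * n * b * t

def effective_time(n, k, t, p, q, b):
    """Pipeline time for n instructions on k stages, plus the branch penalty."""
    return k * t + (n - 1) * t + branch_penalty(n, p, q, b, t)

# Sample values (assumed): 100 instructions, 5 stages, unit cycle time,
# 20% branches of which 60% are taken, and the worst case b = k - 1.
ideal = effective_time(n=100, k=5, t=1.0, p=0.0, q=0.0, b=4)
actual = effective_time(n=100, k=5, t=1.0, p=0.2, q=0.6, b=4)
print(ideal, actual)
```

With these values the penalty term adds roughly 48 cycles to the 104 cycles of the branch-free case, which shows how quickly taken branches erode pipeline throughput.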

Branch Prediction
A branch can be predicted based on
Static Branch Strategy
  The probability of a branch being taken for a particular branch type is used to predict the branch
  The probability may be obtained by collecting the frequency of taken branches and branch types across a large number of program traces

Dynamic Branch Strategy
  Uses limited recent branch history to predict whether or not the branch will be taken the next time it occurs

Branch Prediction Internals


Branch prediction buffer
Used to store the branch history information in order to make branch prediction

State transition diagram used in dynamic branch prediction

Source: Kai Hwang
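
One common way to use limited branch history is a 2-bit saturating counter per branch. The sketch below assumes that scheme for illustration; it is not necessarily the state transition diagram shown in the source figure, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes, predictor):
    """Run the predictor over a branch outcome history and count the hits."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# A loop branch taken nine times and then not taken, executed twice.
history = ([True] * 9 + [False]) * 2
print(count_correct(history, TwoBitPredictor(state=3)), "of", len(history))
```

Because two consecutive mispredictions are needed to flip the prediction, the single loop exit costs one miss per pass instead of two.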

Delayed Branches
Branch penalty can be reduced by using delayed branches
The central idea is to delay the execution of the branch instruction to accommodate independent* instructions
Delaying by d cycles allows a few useful instructions, independent* of the branch instruction, to be executed
* Execution of these instructions should be independent of the outcome of the branch instruction

Computer Arithmetic Principles


Arithmetic is performed with finite precision because memory words and registers have a fixed size
Finite precision implies that a result exceeding the limit is either truncated or rounded off
Types of arithmetic operations:
Fixed-point operations
  Represented internally mostly using 2's complement

Floating-point operations
  Represented internally mostly using the IEEE 754 standard
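
Both effects of finite precision can be observed in a few lines. This is a small sketch: the 8-bit word size and the helper names are assumptions, and the floating-point part uses the standard `struct` module to round a value to IEEE 754 single precision.

```python
import struct

def to_twos_complement(value, bits=8):
    """Keep the low `bits` bits and reinterpret them as a signed integer."""
    mask = (1 << bits) - 1
    value &= mask
    # If the sign bit is set, the pattern represents a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value

def to_single_precision(x):
    """Round a Python float (double precision) to IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(to_twos_complement(127 + 1))          # overflow wraps in 8 bits
print(to_single_precision(0.1) == 0.1)      # rounding loses low-order bits
```

The fixed-point result wraps around because the ninth bit is discarded, while the floating-point value is rounded to the nearest representable number.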

Static Arithmetic Pipelines


Most arithmetic pipelines perform a fixed function
Because it performs a single fixed function, such a pipeline is also called a unifunctional pipeline
The ALU performs fixed-point operations using an integer unit
Floating-point operations are performed by a separate unit (coprocessor)

Arithmetic Pipeline Design


All arithmetic operations can be performed using basic add and shift operations
Arithmetic and logical shifts can be performed with shift registers
Addition can be done using a carry-propagate adder (CPA) or a carry-save adder (CSA)

Source: Kai Hwang

Multiply Pipeline Design


CSAs and CPAs are used at different stages to design a pipeline for fixed-point multiplication
Example: multiplication of two 8-bit numbers, producing a 16-bit result
  S1: generates eight partial products
  S2: two levels of CSAs take eight numbers and produce four
  S3: two CSAs convert four numbers into two numbers
  S4: one CPA adds the two numbers to produce the final result
Source: Kai Hwang
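
The four stages can be modelled in software to check that the reduction produces the right product. This is a behavioural sketch only: the exact wiring of the CSAs within stages S2 and S3 is an assumption, the function names are hypothetical, and Python integers stand in for the hardware bit vectors.

```python
def carry_save_add(x, y, z):
    """Reduce three numbers to a (sum, carry) pair without carry propagation."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (y & z) | (x & z)) << 1
    return partial_sum, carry

def multiply_8bit(a, b):
    """Multiply two unsigned 8-bit numbers with a CSA tree and a final CPA."""
    assert 0 <= a < 256 and 0 <= b < 256
    # S1: generate eight shifted partial products.
    p = [(a << i) if (b >> i) & 1 else 0 for i in range(8)]
    # S2, level 1: two CSAs reduce eight numbers to six.
    s1, c1 = carry_save_add(p[0], p[1], p[2])
    s2, c2 = carry_save_add(p[3], p[4], p[5])
    # S2, level 2: two CSAs reduce six numbers to four.
    s3, c3 = carry_save_add(s1, c1, s2)
    s4, c4 = carry_save_add(c2, p[6], p[7])
    # S3: two CSAs reduce four numbers to two.
    s5, c5 = carry_save_add(s3, c3, s4)
    s6, c6 = carry_save_add(s5, c5, c4)
    # S4: one carry-propagate addition yields the 16-bit product.
    return (s6 + c6) & 0xFFFF


print(multiply_8bit(13, 11), multiply_8bit(255, 255))
```

Each CSA preserves the total of its three inputs, so carries are propagated only once, in the final CPA, which is why the CSA stages can be kept short.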

Multifunctional Arithmetic Pipeline


A multifunctional arithmetic pipeline can perform more than one function
Types of multifunctional pipelines:
Static pipeline
  Performs a single function at a given time, and another function at some other time

Dynamic pipeline
  Performs multiple functions at the same time
  Care needs to be taken in sharing the pipeline

Static Multifunctional Pipeline


Example: Advanced Scientific Computer
Key features:
  Four pipelined arithmetic units
  A large number of working registers in the processor, which controls the operations of the memory buffer units and arithmetic units
  The instruction processing unit (IPU) handles fetching and decoding of instructions

Source: Kai Hwang

Pipeline Interconnections
Example: Advanced Scientific Computer
The arithmetic pipeline has eight stages
It is an example of a static multifunctional pipeline
By changing the interconnections between stages, different functions (fixed-point and floating-point) can be performed

Source: Kai Hwang
