Pipe Lining The CPU

Pipelining the CPU
From COE1502
Pipelining the CPU Jump to: navigation, search
Contents
[hide]

1 Pipelined CPU 2 MIPS32 3 Pipeline States / Phases of Computation 4 Creating the CPU 5 Testing your CPU
[edit] Pipelined CPU

The fundamental performance equation for any CPU (ignoring I/O) is:
Program execution time = (number of instructions executed) * (average clock cycles per instruction) * (clock period time)
A pipelined CPU attempts to improve performance by minimizing the average number of clock cycles per instruction. To reach this goal, the system will be designed so that each of the phases of computation that make up an instruction will be run in parallel. Thus, a single clock cycle (phase) results in multiple instructions completing a phase of execution at the same time. Under Ideal circumstances, one new instruction will start and one will finish in each clock cycle. In reality, an instruction won't always be completed on each clock cycle, but still, the performance benefits of this scheme are obvious.
[edit] MIPS32
Our target platform for this and all remaining projects is the MIPS32 instruction set architecture. If you are not familiar with this or any other RISC-type instruction set, it is recommended that you read the MIPS references provided on the [References] page. Additionally, a complete description of all instructions that are required for this implementation are provided below:
Click Here to See The Instruction Set

Our CPU design itself will be based on the figure below, which is taken from the
Hennessy and Patterson text. If you do not understand pipelining or how a pipelined CPU works, you should review this chapter.
Figure 1: Figure 6.30 from Hennesy and Patterson, piplined CPU block diagram
Figure 2: CPU Top Level Structure
[edit] Pipeline States / Phases of Computation

In essence, this style of CPU can be broken into five distinct phases of computation: 1. Fetch - The next instruction is fetched from memory and stored in the instruction register and the PC is incremented by 4 2. Decode - Instruction operands are read from the register file and latched into buffer registers 3. Execute - Arithmetic, logical, shift, address calculation and branch tests are done. ALU results are latched into the ALU register or a branch/jump is taken (PC is changed). If there is a branch-and-link or a jump-and-link (and it is taken), the PC is stored in register 31 4. Memory - Data from a Load instruction is written into the register file or a register is written to memory in a Store Instruction 5. Write Back - Results are written back into the register file for R-type instructions or memory is accessed for load/store instructions In the pipelined CPU, after each phase, the result is stored in a buffer register and passed to the next phase. When implementing a pipelined CPU, this level of organization should be maintained in design. As a design suggestion, investigate each instruction in the ISA and determine what actions need to be taken during each phase of computation. This analysis will form the basis for the rest of your design.
[edit] Creating the CPU
You have already designed an ALU and will find most of the other major components that you will need in the COE1502 library. These include the register file (regfile32x32) and individual register components (reg32) used for the instruction register and the buffer registers. You will need to implement some miscellaneous logic such as multiplexers, sign extension circuits, etc. You may use Moduleware components or build them yourself as needed. To begin this unit, define a new library called CPU, and create a new component symbol within that library. You may have noticed that the pinout of your CPU is almost the same as the memory components in Figure 1. This is because that schematic shows an entire MIPS based computer system which only requires a clock input. In our design, you are implementing a processor that accesses an external memory, hence the memory access input and output pins on the symbol. One of the ways that such a design could be implemented follows: 1. A new instruction word is loaded to the instruction register. It is decoded by the control unit 2. The control unit generates control signals that are routed to the first pipeline stage 3. At each cycle the control signals are copied from the current stage to the next stage 4. At each stage certain control signals drive certain units, such as muxes, thus directing the data flow and computation There will be situations that must be taken care of, such as data and control hazards such as an attempt to read a register right after its value is changed by the previous instruction, which has not finished yet (RAW hazard). You will have to solve the following hazards by the following methods: 1. RAW data hazard by data feedback (decide whether a register value should be taken from the register file or as a result from a pipeline stage ahead) 2. Control hazard by stalling CPU (do not process new instructions until the hazard is resolved) Please refer the textbook for more information on pipelined CPU design. This is a large project so, it is important that you do this unit in small steps, testing parts of the design as you implement it. When it makes the design clearer and easier to test, utilize hierarchy. One suggested method of testing is to thoroughly check each new instruction as you add it. By checking throughout your design you will catch errors before too much complexity makes it difficult. Additionally, as you complete the design, you will build up a series of tests, each of which test a specific instruction. When the processor is complete, you will also have a suite of tests that you can run on your CPU,
helping to prevent a new instruction from breaking existing ones.
[edit] Testing your CPU

We have included a component in COELIB that you can use to test the functionality of your CPU. The instr_gen component can be used to sequentially feed your CPU a sequence of instructions. Create a testbench that includes your CPU and instr_gen. Since this is not an actual memory, operations that access main memory (fetch, branch, jump, load and store) will not change the actual contents of the generator. However, you can verify the functionality of your CPU by visually inspecting the output signals (MemWrite, MemRead, MemoryAddress and MemoryDataOut) on the simulation waveforms. The instruction generator is initialized once it has been reset and MemWait is never asserted, so your CPU will never wait for the memory to respond. In the next unit, we will connect your CPU to the System Bus and this behavior will be tested then.
Figure 3: Pipelined CPU Testbench It is a good idea use a set of test instructions such that you first initialize the register operands and after the execution has completed, see the final result on the output bus. For example, in order to test the addi, sub and sw instructions, you could use the following program:
addi addi sub sw $7, $0, 17 $11, $0, -3 $11, $7, $11 $11, 0($7) # # # GPR[ 7 ] 17 GPR[11] -3 GPR[11] 20 memory[17] 20
Note that in order to execute all of these instructions, you must run the CPU enough cycles to flush the pipeline and execute all of the instructions in your program. To do this, add No Op instructions to pad the length. In order to use the program above, you will have to initialize instr_gen by assigning values to the contents of the array of instructions (instruction_array) within the simulator. Notice below the inclusion of /pipelined_cpu_tb/instruction_memory when forcing values into the array, as the instruction generator resides in within the hierarchy of the entire testbench.
force force X"20070011" force X"200BFFFD" force X"00EB5822" force X"ACEB0000" force X"00000000" force X"00000000" force X"00000000" force X"00000000" force run force run . . . -freeze clock 1 0, 0 {50 ns} -r 100 /pipelined_cpu_tb/instruction_memory/instruction_array(0) /pipelined_cpu_tb/instruction_memory/instruction_array(1) /pipelined_cpu_tb/instruction_memory/instruction_array(2) /pipelined_cpu_tb/instruction_memory/instruction_array(3) /pipelined_cpu_tb/instruction_memory/instruction_array(4) /pipelined_cpu_tb/instruction_memory/instruction_array(5) /pipelined_cpu_tb/instruction_memory/instruction_array(6) /pipelined_cpu_tb/instruction_memory/instruction_array(7) -freeze /pipelined_cpu_tb/reset 1 0 -freeze /pipelined_cpu_tb/reset 0 0
If everything is working , then the four instructions would have been fetched (MemRead asserted) at addresses 0x00000000, 0x00000004, 0x00000008 and 0x0000000C. You can verify that the addi, sub and sw instructions were executed properly by observing the output bus during the execution of the fourth instruction. If the value of 0x00000014 was stored (MemWrite asserted) at address 0x00000011, then all of the instructions executed correctly. To save time (and ensure accuracy), you can use the SPIM simulator to decode instructions and verify the results of test programs. Note that the above instructions only fill in the instruction memory. If required, you may also need to fill in values for the data memory as well. Also, the name of your memory component may be different from the name listed above.
Pipelining the CPU

From COE1502
Pipelining the CPU Jump to: navigation, search
Contents
[hide]

1 Pipelined CPU 2 MIPS32 3 Pipeline States / Phases of Computation 4 Creating the CPU 5 Testing your CPU
[edit] Pipelined CPU

The fundamental performance equation for any CPU (ignoring I/O) is:
Program execution time = (number of instructions executed) * (average clock cycles per instruction) * (clock period time)
A pipelined CPU attempts to improve performance by minimizing the average number of clock cycles per instruction. To reach this goal, the system will be designed so that each of the phases of computation that make up an instruction will be run in parallel. Thus, a single clock cycle (phase) results in multiple instructions completing a phase of execution at the same time. Under Ideal circumstances, one new instruction will start and one will finish in each clock cycle. In reality, an instruction won't always be completed on each clock cycle, but still, the performance benefits of this scheme are obvious.
[edit] MIPS32
Our target platform for this and all remaining projects is the MIPS32 instruction set architecture. If you are not familiar with this or any other RISC-type instruction set, it is recommended that you read the MIPS references provided on the [References] page. Additionally, a complete description of all instructions that are required for this implementation are provided below:
Click Here to See The Instruction Set

Our CPU design itself will be based on the figure below, which is taken from the
Hennessy and Patterson text. If you do not understand pipelining or how a pipelined CPU works, you should review this chapter.
Figure 1: Figure 6.30 from Hennesy and Patterson, piplined CPU block diagram
Figure 2: CPU Top Level Structure
[edit] Pipeline States / Phases of Computation

In essence, this style of CPU can be broken into five distinct phases of computation: 1. Fetch - The next instruction is fetched from memory and stored in the instruction register and the PC is incremented by 4 2. Decode - Instruction operands are read from the register file and latched into buffer registers 3. Execute - Arithmetic, logical, shift, address calculation and branch tests are done. ALU results are latched into the ALU register or a branch/jump is taken (PC is changed). If there is a branch-and-link or a jump-and-link (and it is taken), the PC is stored in register 31 4. Memory - Data from a Load instruction is written into the register file or a register is written to memory in a Store Instruction 5. Write Back - Results are written back into the register file for R-type instructions or memory is accessed for load/store instructions In the pipelined CPU, after each phase, the result is stored in a buffer register and passed to the next phase. When implementing a pipelined CPU, this level of organization should be maintained in design. As a design suggestion, investigate each instruction in the ISA and determine what actions need to be taken during each phase of computation. This analysis will form the basis for the rest of your design.
[edit] Creating the CPU
You have already designed an ALU and will find most of the other major components that you will need in the COE1502 library. These include the register file (regfile32x32) and individual register components (reg32) used for the instruction register and the buffer registers. You will need to implement some miscellaneous logic such as multiplexers, sign extension circuits, etc. You may use Moduleware components or build them yourself as needed. To begin this unit, define a new library called CPU, and create a new component symbol within that library. You may have noticed that the pinout of your CPU is almost the same as the memory components in Figure 1. This is because that schematic shows an entire MIPS based computer system which only requires a clock input. In our design, you are implementing a processor that accesses an external memory, hence the memory access input and output pins on the symbol. One of the ways that such a design could be implemented follows: 1. A new instruction word is loaded to the instruction register. It is decoded by the control unit 2. The control unit generates control signals that are routed to the first pipeline stage 3. At each cycle the control signals are copied from the current stage to the next stage 4. At each stage certain control signals drive certain units, such as muxes, thus directing the data flow and computation There will be situations that must be taken care of, such as data and control hazards such as an attempt to read a register right after its value is changed by the previous instruction, which has not finished yet (RAW hazard). You will have to solve the following hazards by the following methods: 1. RAW data hazard by data feedback (decide whether a register value should be taken from the register file or as a result from a pipeline stage ahead) 2. Control hazard by stalling CPU (do not process new instructions until the hazard is resolved) Please refer the textbook for more information on pipelined CPU design. This is a large project so, it is important that you do this unit in small steps, testing parts of the design as you implement it. When it makes the design clearer and easier to test, utilize hierarchy. One suggested method of testing is to thoroughly check each new instruction as you add it. By checking throughout your design you will catch errors before too much complexity makes it difficult. Additionally, as you complete the design, you will build up a series of tests, each of which test a specific instruction. When the processor is complete, you will also have a suite of tests that you can run on your CPU,
helping to prevent a new instruction from breaking existing ones.
[edit] Testing your CPU

We have included a component in COELIB that you can use to test the functionality of your CPU. The instr_gen component can be used to sequentially feed your CPU a sequence of instructions. Create a testbench that includes your CPU and instr_gen. Since this is not an actual memory, operations that access main memory (fetch, branch, jump, load and store) will not change the actual contents of the generator. However, you can verify the functionality of your CPU by visually inspecting the output signals (MemWrite, MemRead, MemoryAddress and MemoryDataOut) on the simulation waveforms. The instruction generator is initialized once it has been reset and MemWait is never asserted, so your CPU will never wait for the memory to respond. In the next unit, we will connect your CPU to the System Bus and this behavior will be tested then.
Figure 3: Pipelined CPU Testbench It is a good idea use a set of test instructions such that you first initialize the register operands and after the execution has completed, see the final result on the output bus. For example, in order to test the addi, sub and sw instructions, you could use the following program:
addi addi sub sw $7, $0, 17 $11, $0, -3 $11, $7, $11 $11, 0($7) # # # GPR[ 7 ] 17 GPR[11] -3 GPR[11] 20 memory[17] 20
Note that in order to execute all of these instructions, you must run the CPU enough cycles to flush the pipeline and execute all of the instructions in your program. To do this, add No Op instructions to pad the length. In order to use the program above, you will have to initialize instr_gen by assigning values to the contents of the array of instructions (instruction_array) within the simulator. Notice below the inclusion of /pipelined_cpu_tb/instruction_memory when forcing values into the array, as the instruction generator resides in within the hierarchy of the entire testbench.
force force X"20070011" force X"200BFFFD" force X"00EB5822" force X"ACEB0000" force X"00000000" force X"00000000" force X"00000000" force X"00000000" force run force run . . . -freeze clock 1 0, 0 {50 ns} -r 100 /pipelined_cpu_tb/instruction_memory/instruction_array(0) /pipelined_cpu_tb/instruction_memory/instruction_array(1) /pipelined_cpu_tb/instruction_memory/instruction_array(2) /pipelined_cpu_tb/instruction_memory/instruction_array(3) /pipelined_cpu_tb/instruction_memory/instruction_array(4) /pipelined_cpu_tb/instruction_memory/instruction_array(5) /pipelined_cpu_tb/instruction_memory/instruction_array(6) /pipelined_cpu_tb/instruction_memory/instruction_array(7) -freeze /pipelined_cpu_tb/reset 1 0 -freeze /pipelined_cpu_tb/reset 0 0
If everything is working , then the four instructions would have been fetched (MemRead asserted) at addresses 0x00000000, 0x00000004, 0x00000008 and 0x0000000C. You can verify that the addi, sub and sw instructions were executed properly by observing the output bus during the execution of the fourth instruction. If the value of 0x00000014 was stored (MemWrite asserted) at address 0x00000011, then all of the instructions executed correctly. To save time (and ensure accuracy), you can use the SPIM simulator to decode instructions and verify the results of test programs. Note that the above instructions only fill in the instruction memory. If required, you may also need to fill in values for the data memory as well. Also, the name of your memory component may be different from the name listed above.
Instruction Set Detail
From COE1502
Instruction Set Detail Jump to: navigation, search
Contents
[hide]
1 MIPS Instruction Types o 1.1 R-Type Instruction Format o 1.2 I-Type Instruction Format o 1.3 J-Type Instruction Format (Jump) 2 Instruction Set Detail o 2.1 Add Word (ADD) o 2.2 Add Unsigned Word (ADDU) o 2.3 Add Immediate Word (ADDI) o 2.4 Add Immediate Unsigned Word (ADDIU) o 2.5 Subtract Word (SUB) o 2.6 Subtract Unsigned Word (SUBU) o 2.7 And (AND) o 2.8 And Immediate (ANDI) o 2.9 Or (OR) o 2.10 Or Immediate (ORI) o 2.11 Exclusive OR (XOR) o 2.12 Exclusive OR Immediate (XORI) o 2.13 Not Or (NOR) o 2.14 Shift Word Left Logical (SLL) o 2.15 Shift Word Right Logical (SRL) o 2.16 Shift Word Right Arithmetic (SRA) o 2.17 Shift Word Left Logical Variable (SLLV) o 2.18 Shift Word Right Logical Variable (SRLV) o 2.19 Shift Word Right Arithmetic Variable (SRAV) o 2.20 Set on Less Than (SLT) o 2.21 Set on Less Than Unsigned (SLTU) o 2.22 Set on Less Than Immediate (SLTI) o 2.23 Set on Less Than Immediate Unsigned(SLTIU) o 2.24 Load Byte (LB) o 2.25 Load Byte Unsigned (LBU) o 2.26 Load Halfword (LH) o 2.27 Load Halfword Unsigned (LHU) o 2.28 Load Word (LW) o 2.29 Load Upper Immediate (LUI) o 2.30 Store Byte (SB) o 2.31 Store HalfWord (SH) o 2.32 Store Word (SW)
o o o o o o o o o o o o
2.33 Branch on Equal (BEQ) 2.34 Branch on Not Equal (BNE) 2.35 Branch on Less Than Zero (BLTZ) 2.36 Branch on Greater Than or Equal to Zero (BGEZ) 2.37 Branch on Less Than Zero and Link (BLTZAL) 2.38 Branch on Greater Than or Equal to Zero and Link (BGEZAL) 2.39 Branch on Less Than or Equal to Zero (BLEZ) 2.40 Branch on Greater Than Zero (BGTZ) 2.41 Jump (J) 2.42 Jump and Link (JAL) 2.43 Jump Register (JR) 2.44 Jump and Link Register (JALR)
[edit] MIPS Instruction Types

All instructions in the MIPS Architecture you are implementing are 32 bits in length. There are three different instruction formats: R-Type instructions, I-Type instructions, and J-Type instructions. R-Type instructions, or Register instructions are used for register based ALU operations. The two operands and the destination of the result are specified by locations in the general purpose register (GPR) file. I-Type instructions, or Immediate instructions, can be either Load/Store operations, Branch operations, or Immediate ALU operations. In these instructions, register file locations are specified as well as a 16-bit immediate value which may be used as an operand or an address. J-Type instructions, or Jump instructions, devote all of the non-opcode space to a 26-bit jump destination field. The rs and rt register addresses, which are present in both R- and I-type instructions, specify the two addresses which the register file is to read. In R-Type instructions the destination (write) register for the register file is specifies by rd and in I-Type instructions the destination register is specified by rt (the second read from the register file is ignored).
[edit] R-Type Instruction Format
opcode: Instruction encoding constant (6 bits) rs: Operand located in general purpose register file (5 bits) rt: Operand located in general purpose register file (5 bits)
rd: Destination location of result in general purpose register file (5 bits) shamt: Shift amount (5 bits) funct: Function code (6 bits)
[edit] I-Type Instruction Format
opcode: Instruction encoding constant (6 bits) rs: Operand located in general purpose register file (5 bits) rt: Destination location of result in general purpose register file (5 bits) immediate: Immediate operand value (16 bits)
[edit] J-Type Instruction Format (Jump)
opcode: Instruction encoding constant (6 bits) instr_index: Destination address for jump operation (26 bits)
[edit] Instruction Set Detail

In this course we will be creating a processor which implements a subset of the MIPS 32bit architecture. Due to limitations in space and time, we will not be attempting to implement any floating point operations, integer multiply and divide operations, and
certain operations dealing with the multiply/divide hardware and coprocessors. The core operations which remain can be grouped as follows:
Arithmetic Operations: add, addi, addiu, addu, sub, subu Logic Operations: and, andi, nor, or, ori, xor, xori Shift Operations: sll, sra, srl, sllv, srav, srlv Comparison Instructions: slt, sltu, slti, sltiu Load/Store Instructions: lui, lb, lbu, lh, lhu, lw, sb, sh,
sw
Branch Instructions: beq, bne, bgez, bgezal, bgtz, blez, bltzal, bltz Jump Instructions: j, jal, jr, jalr
The above operations are those that are physically implemented in the hardware of our CPU. Most MIPS assemblers, including the one which accompanies the simulator SPIM, also accept pseudo-operations. A pseudo-operation is a single assembly language instruction which is translated by the assembler into a short sequence of actual operations. Pseudo-operations are not assigned their own opcodes. An example of a pseudo-operation would be the instruction abs rd, rs, which puts the absolute value of general purpose register rs into general purpose register rd. This could be translated into the following sequence of real operations:
slt ra, rs, $0 bne ra, $0, NEG add rd, rs, $0 j DONE sub rd, $0, rs
NEG: DONE:
Below can be found a listing of the MIPS Assembly Language instruction formats that we will be implementing in our COE1502 Microprocessor . Each listing contains the the encoding of the instruction into a 32-bit binary value, the instructionsyntax for use in with any MIPS assembler, such as SPIM, followed by a short description of what the instruction does and a register transfer level (RTL) description of the instruction. All of the instruction encodings, formats and descriptions below were copied from the official MIPS32 instruction set programmers manual. Please refer to this document if you need any further clarification on the instructions you are implementing. You will notice that some of the instructions have notes on how to handle certain special cases (e.g. Overflow exceptions). Ignore the handling of these special cases for now, as we will address them in the next unit.
[edit] Add Word (ADD)
Format: ADD rd, rs, rt Purpose: To add 32-bit integers. If an overflow occurs, then trap. Description: GPR[rd] GPR[rs] + GPR[rt] The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32bit result. If the addition results in 32-bit 2s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs. If the addition does not overflow, the 32-bit result is placed into GPR rd.
[edit] Add Unsigned Word (ADDU)
Format: ADDU rd, rs, rt Purpose: To add 32-bit integers. Description: GPR[rd] GPR[rs] + GPR[rt] The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rd.
[edit] Add Immediate Word (ADDI)
Format: ADDI rt, rs, immediate Purpose: To add a constant to a 32-bit integer. If overflow occurs, then trap.
Description: GPR[rt] GPR[rs] + immediate The 16-bit signed immediate is added to the 32-bit value in GPR rs to produce a 32-bit result. If the addition results in 32-bit 2s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs. If the addition does not overflow, the 32-bit result is placed into GPR rt.
[edit] Add Immediate Unsigned Word (ADDIU)
Format: ADDIU rt, rs, immediate Purpose: To add a constant to a 32-bit integer. Description: GPR[rt] GPR[rs] + immediate The 16-bit unsigned immediate is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rt.
[edit] Subtract Word (SUB)
ctions are used for register based ALU operations. The two operands and the destination of the result are specified by locations in the general purpose register (GPR) file. I-Type instructions, or Immediate instructions, can be either Load/Store operations, Branch operations, or Immediate ALU operations. In these instructions, register file locations are specified as well as a 16-bit immediate value which may be used as an operand or an address. J-Type instructions, or Jump instructions, devote all of the non-opcode space to a 26-bit jump destination field. The rs and rt register addresses, which are present in both R- and I-type instructions, specify the two addresses which the register file is to read. In R-Type instructions the destination (write) register for the register file is specifies by rd and in I-Type instructions the destination register is specified by rt (the second read from the register file is ignored).
[edit] R-Type Instruction Format
opcode: Instruction encoding constant (6 bits) rs: Operand located in general purpose register file (5 bits) rt: Operand located in general purpose register file (5 bits) rd: Destination location of result in general purpose register file (5 bits) shamt: Shift amount (5 bits) funct: Function code (6 bits)
[edit] I-Type Instruction Format
opcode: Instruction encoding constant (6 bits) rs: Operand located in general purpose register file (5 bits) rt: Destination location of result in general purpose register file (5 bits) immediate: Immediate operand value (16 bits)
[edit] J-Type Instruction Format (Jump)
opcode: Instruction encoding constant (6 bits)
instr_index: Destination address for jump operation (26 bits)
[edit] Instruction Set Detail

In this course we will be creating a processor which implements a subset of the MIPS 32bit architecture. Due to limitations in space and time, we will not be attempting to implement any floating point operations, integer multiply and divide operations, and certain operations dealing with the multiply/divide hardware and coprocessors. The core operations which remain can be grouped as follows:
Arithmetic Operations: add, addi, addiu, addu, sub, subu Logic Operations: and, andi, nor, or, ori, xor, xori Shift Operations: sll, sra, srl, sllv, srav, srlv Comparison Instructions: slt, sltu, slti, sltiu Load/Store Instructions: lui, lb, lbu, lh, lhu, lw, sb, sh,
sw
Branch Instructions: beq, bne, bgez, bgezal, bgtz, blez, bltzal, bltz Jump Instructions: j, jal, jr, jalr
The above operations are those that are physically implemented in the hardware of our CPU. Most MIPS assemblers, including the one which accompanies the simulator SPIM, also accept pseudo-operations. A pseudo-operation is a single assembly language instruction which is translated by the assembler into a short sequence of actual operations. Pseudo-operations are not assigned their own opcodes. An example of a pseudo-operation would be the instruction abs rd, rs, which puts the absolute value of general purpose register rs into general purpose register rd. This could be translated into the following sequence of real operations:
slt ra, rs, $0 bne ra, $0, NEG add rd, rs, $0 j DONE sub rd, $0, rs
NEG: DONE:
Below can be found a listing of the MIPS Assembly Language instruction formats that we will be implementing in our COE1502 Microprocessor . Each listing contains the the encoding of the instruction into a 32-bit binary value, the instructionsyntax for use in with any MIPS assembler, such as SPIM, followed by a short description of what the instruction does and a register transfer level (RTL) description of the instruction. All of the instruction encodings, formats and descriptions below were copied from the official MIPS32 instruction set programmers manual. Please refer to this document if you need any further clarification on the instructions you are implementing. You will notice that some of the instructions have notes on how to handle certain special cases (e.g. Overflow
exceptions). Ignore the handling of these special cases for now, as we will address them in the next unit.
[edit] Add Word (ADD)
Format: ADD rd, rs, rt Purpose: To add 32-bit integers. If an overflow occurs, then trap. Description: GPR[rd] GPR[rs] + GPR[rt] The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32bit result. If the addition results in 32-bit 2s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs. If the addition does not overflow, the 32-bit result is placed into GPR rd.
[edit] Add Unsigned Word (ADDU)
Format: ADDU rd, rs, rt Purpose: To add 32-bit integers. Description: GPR[rd] GPR[rs] + GPR[rt] The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rd.
[edit] Add Immediate Word (ADDI)
Format: ADDI rt, rs, immediate Purpose: To add a constant to a 32-bit integer. If overflow occurs, then trap. Description: GPR[rt] GPR[rs] + immediate The 16-bit signed immediate is added to the 32-bit value in GPR rs to produce a 32-bit result. If the addition results in 32-bit 2s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs. If the addition does not overflow, the 32-bit result is placed into GPR rt.
[edit] Add Immediate Unsigned Word (ADDIU)
Format: ADDIU rt, rs, immediate Purpose: To add a constant to a 32-bit integer. Description: GPR[rt] GPR[rs] + immediate The 16-bit unsigned immediate is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rt.
[edit] Subtract Word (SUB)
Format: SUB rd, rs, rt Purpose: To subtract 32-bit integers. If overflow occurs, then trap. Description: GPR[rd] GPR[rs] - GPR[rt]
The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs to produce a 32-bit result. If the subtraction results in 32-bit 2s complement arithmetic overflow, then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 32-bit result is placed into GPR rd.
[edit] Subtract Unsigned Word (SUBU)
Format: SUBU rd, rs, rt Purpose: To subtract 32-bit integers. Description: GPR[rd] GPR[rs] - GPR[rt] The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs and the 32bit arithmetic result is and placed into GPR rd.
[edit] And (AND)
Format: AND rd, rs, rt Purpose: To do a bitwise logical AND Description: GPR[rd] GPR[rs] AND GPR[rt] The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical AND operation. The result is placed into GPR rd.
[edit] And Immediate (ANDI)
Format: ANDI rt, rs, immediate
Purpose: To do a bitwise logical AND with a constant. Description: GPR[rt] GPR[rs] AND immediate The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical AND operation. The result is placed into GPR rt.
[edit] Or (OR)
Format: OR rd, rs, rt Purpose: To do a bitwise logical OR. Description: GPR[rd] GPR[rs] OR GPR[rt] The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical OR operation. The result is placed into GPR rd.
[edit] Or Immediate (ORI)
Format: ORI rt, rs, immediate Purpose: To do a bitwise logical OR with a constant. Description: GPR[rt] GPR[rs] OR immediate The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical OR operation. The result is placed into GPR rt.
[edit] Exclusive OR (XOR)
Format: XOR rd, rs, rt Purpose: To do a bitwise logical Exclusive OR. Description: GPR[rd] GPR[rs] XOR GPR[rt] Combine the contents of GPR rs and GPR rt in a bitwise logical Exclusive OR operation and place the result into GPR rd.
[edit] Exclusive OR Immediate (XORI)
Format: XORI rt, rs, immediate Purpose: To do a bitwise logical Exclusive OR with a constant. Description: GPR[rt] GPR[rs] XOR immediate Combine the contents of GPR rs and the 16-bit zero-extended immediate in a bitwise logical Exclusive OR operation and place the result into GPR rt.
[edit] Not Or (NOR)
Format: NOR rd, rs, rt Purpose: To do a bitwise logical NOT OR Description: GPR[rd] GPR[rs] NOR GPR[rt] The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical NOR operation. The result is placed into GPR rd.
[edit] Shift Word Left Logical (SLL)
Format: SLL rd, rt, sa Purpose: To left-shift a word by a fixed number of bits. Description: GPR[rd] GPR[rt] << sa The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeros into the emptied bits; the word result is placed in GPR rd. The bit-shift amount is specified by sa.
[edit] Shift Word Right Logical (SRL)
Format: SRL rd, rt, sa Purpose: To execute a logical right-shift of a word by a fixed number of bits. Description: GPR[rd] GPR[rt] >> sa (logical) The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the emptied bits; the word result is placed in GPR rd. The bit-shift amount is specified by sa.
[edit] Shift Word Right Arithmetic (SRA)
Format: SRA rd, rt, sa Purpose: To execute an arithmetic right-shift of a word by a fixed number of bits. Description: GPR[rd] GPR[rt] >> sa (arithmetic)
The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign-bit (bit 31) in the emptied bits; the word result is placed in GPR rd. The bit-shift amount is specified by sa.
[edit] Shift Word Left Logical Variable (SLLV)
Format: SLLV rd, rt, rs Purpose: To left-shift a word by a variable number of bits. Description: GPR[rd] GPR[rt] << GPR[rs] The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeros into the emptied bits; the result word is placed in GPR rd. The bit-shift amount is specified by the low-order 5 bits of GPR rs.
[edit] Shift Word Right Logical Variable (SRLV)
Format: SRLV rd, rt, rs Purpose: To execute a logical right-shift of a word by a variable number of bits. Description: GPR[rd] GPR[rt] >> GPR[rs] (logical) The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the emptied bits; the word result is placed in GPR rd. The bit-shift amount is specified by the low-order 5 bits of GPR rs.
[edit] Shift Word Right Arithmetic Variable (SRAV)
Format: SRAV rd, rt, rs Purpose: To execute an arithmetic right-shift of a word by a variable number of bits. Description: GPR[rd] GPR[rt] >> GPR[rs] (arithmetic) The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign-bit (bit 31) in the emptied bits; the word result is placed in GPR rd. The bit-shift amount is specified by the low-order 5 bits of GPR rs.
[edit] Set on Less Than (SLT)
Format: SLT rd, rs, rt Purpose: To record the result of a less-than comparison. Description: GPR[rd] (GPR[rs] < GPR[rt]) Compare the contents of GPR rs and GPR rt as signed integers and record the Boolean result of the comparison in GPR rd. If GPR rs is less than GPR rt, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
[edit] Set on Less Than Unsigned (SLTU)
Format: SLTU rd, rs, rt Purpose: To record the result of an unsigned less-than comparison. Description: GPR[rd] (GPR[rs] < GPR[rt]) Compare the contents of GPR rs and GPR rt as unsigned integers and record the Boolean result of the comparison in GPR rd. If GPR rs is less than GPR rt, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
[edit] Set on Less Than Immediate (SLTI)
Format: SLTI rt, rs, immediate Purpose: To record the result of a less-than comparison with a constant. Description: GPR[rt] (GPR[rs] < immediate) Compare the contents of GPR rs and the 16-bit signed immediate as signed integers and record the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
[edit] Set on Less Than Immediate Unsigned(SLTIU)
Format: SLTIU rt, rs, immediate Purpose: To record the result of an unsigned less-than comparison with a constant. Description: GPR[rt] (GPR[rs] < immediate) Compare the contents of GPR rs and the zero-extended 16-bit immediate as unsigned integers and record the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.
[edit] Load Byte (LB)
Format: LB rt, offset(base)
Purpose: To load a byte from memory as a signed value. Description: GPR[rt] memory[GPR[base] + offset] The contents of the 8-bit byte at the memory location specified by the effective address are fetched, sign-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
[edit] Load Byte Unsigned (LBU)
Format: LBU rt, offset(base) Purpose: To load a byte from memory as an unsigned value. Description: GPR[rt] memory[GPR[base] + offset] The contents of the 8-bit byte at the memory location specified by the effective address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
[edit] Load Halfword (LH)
Format: LH rt, offset(base) Purpose: To load a halfword from memory as a signed value. Description: GPR[rt] memory[GPR[base] + offset] The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, sign-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
[edit] Load Halfword Unsigned (LHU)
Format: LHU rt, offset(base) Purpose: To load a halfword from memory as an unsigned value. Description: GPR[rt] memory[GPR[base] + offset] The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
[edit] Load Word (LW)
Format: LW rt, offset(base) Purpose: To load a word from memory as a signed value. Description: GPR[rt] memory[GPR[base] + offset] The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, sign-extended to the GPR register length if necessary, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address. The effective address must be naturally-aligned. If either of the 2 leastsignificant bits of the address is non-zero, an Address Error exception occurs.
[edit] Load Upper Immediate (LUI)
Format: LUI rt, immediate Purpose: To load a constant into the upper half of a word.
Description: GPR[rt] immediate || 0x0000 The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of low-order zeros. The 32-bit result is placed into GPR rt.
[edit] Store Byte (SB)
Format: SB rt, offset(base) Purpose: To store a byte to memory. Description: memory[GPR[base] + offset] GPR[rt] The least-significant 8-bit byte of GPR rt is stored in memory at the location specified by the effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.
[edit] Store HalfWord (SH)
Format: SH rt, offset(base) Purpose: To store a halfword to memory. Description: memory[GPR[base] + offset] GPR[rt] The least-significant 16-bit halfword of register rt is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address. The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
[edit] Store Word (SW)
Format: SW rt, offset(base) Purpose: To store a word to memory. Description: memory[GPR[base] + offset] GPR[rt] The 32-bit word of GPR rt is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address. The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
[edit] Branch on Equal (BEQ)
Format: BEQ rs, rt, offset Purpose: To compare GPRs then do a PC-relative conditional branch. Description: if GPR[rs] = GPR[rt] then PC PC + branch_offset An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs and GPR rt are equal, branch to the effective target address after the instruction in the delay slot is executed.
[edit] Branch on Not Equal (BNE)
Format: BNE rs, rt, offset Purpose: To compare GPRs then do a PC-relative conditional branch.
Description: if GPR[rs] GPR[rt] then PC PC + branch_offset An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after the instruction in the delay slot is executed.
[edit] Branch on Less Than Zero (BLTZ)
Format: BLTZ rs, offset Purpose: To test a GPR then do a PC-relative conditional branch. Description: if GPR[rs] < 0 then PC PC + branch_offset An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed.
[edit] Branch on Greater Than or Equal to Zero (BGEZ)
Format: BGEZ rs, offset Purpose: To test a GPR then do a PC-relative conditional branch. Description: if GPR[rs] 0 then PC PC + branch_offset An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed.
[edit] Branch on Less Than Zero and Link (BLTZAL)
Format: BLTZAL rs, offset Purpose: To test a GPR then do a PC-relative conditional procedure call. Description: if GPR[rs] < 0 then GPR[31]PC + 8, PC PC + branch_offset Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution continues after a procedure call. An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed.
[edit] Branch on Greater Than or Equal to Zero and Link (BGEZAL)
Format: BGEZAL rs, offset Purpose: To test a GPR then do a PC-relative conditional procedure call. Description: if GPR[rs] 0 then GPR[31]PC + 8, PC PC + branch_offset Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution continues after a procedure call. An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed.
[edit] Branch on Less Than or Equal to Zero (BLEZ)
Format: BLEZ rs, offset Purpose: To test a GPR then do a PC-relative conditional branch. Description: if GPR[rs] 0 then PC PC + branch_offset An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to the effective target address after the instruction in the delay slot is executed.
[edit] Branch on Greater Than Zero (BGTZ)
Format: BGTZ rs, offset Purpose: To test a GPR then do a PC-relative conditional branch. Description: if GPR[rs] > 0 then PC PC + branch_offset An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the effective target address after the instruction in the delay slot is executed.
[edit] Jump (J)
Format: J target
Purpose: To branch within the current 256 MB-aligned region. Description: PC PC31..28 || instr_index || 00 This is a PC-region branch (not PC-relative); the effective target address is in the current 256 MB-aligned region. The low 28 bits of the target address is the instr_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the branch itself). Jump to the effective target address. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.
[edit] Jump and Link (JAL)
Format: JAL target (rd = 31 implied) Purpose: To execute a procedure call within the current 256 MB-aligned region. Description: GPR[rd] PC + 8, PC PC31..28 || instr_index || 00 Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, at which location execution continues after a procedure call. This is a PC-region branch (not PC-relative); the effective target address is in the current 256 MB-aligned region. The low 28 bits of the target address is the instr_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the branch itself). Jump to the effective target address. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.
[edit] Jump Register (JR)
Format: JR rs Purpose: To execute a branch to an instruction address in a register. Description: PC GPR[rs]
Jump to the effective target address in GPR rs. Execute the instruction following the jump, in the branch delay slot, before jumping.
[edit] Jump and Link Register (JALR)
Format: JALR rs (rd = 31 implied) Purpose: To execute a procedure call to an instruction address in a register. Description: GPR[31] PC + 8, PC GPR[rs] Place the return address link in GPR rd. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.

Pipe Lining The CPU

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Pipe Lining The CPU

Uploaded by

Copyright:

Available Formats

Pipelining the CPU

[edit] Pipelined CPU

Click Here to See The Instruction Set

Figure 2: CPU Top Level Structure

[edit] Pipeline States / Phases of Computation

[edit] Creating the CPU

helping to prevent a new instruction from breaking existing ones.

[edit] Testing your CPU