Abstract

Power consumption and its optimization have become major issues in IC design. In this paper, we present an implementation of a power-efficient Microprocessor without Interlocked Pipeline Stages (MIPS) processor in VHSIC Hardware Description Language (VHDL). We have implemented a modified MIPS architecture that achieves significant power reduction by suppressing unwanted clock transitions in various blocks, using techniques such as clock gating and stall power reduction. We carried out VHDL coding and synthesis, analysed module functionality and performance metrics such as area, power dissipation and propagation delay, performed placement and routing, and generated the schematic and layout. Finally, the design was verified for DRC and LVS errors.

Keywords: MIPS, RISC, parallel pipelined, VHDL, clock gating.

1. Introduction

MIPS is a general-purpose five-stage pipelined microarchitecture based on RISC design principles and implemented on a single VLSI chip. The architecture of MIPS refers to the instruction set, registers, adders, layout etc., while the hardware implementation refers to the manner in which different processors use the architecture to build their own model. The architecture remains the same for all MIPS-based processors, while the implementations may differ. In this paper, we explore the design and implementation of the processor based on the parallel pipelining method.

Pipelining is an implementation technique used to improve both CPI (cycles per instruction) and overall system performance. It allows a processor to work on different steps of several instructions at the same time, so more instructions can be executed in a shorter period of time. In a pipelined MIPS processor, a module does not wait for the previous instruction to finish all of its steps before it starts working on the next instruction.
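
The throughput benefit of pipelining can be illustrated with a short calculation. The sketch below is not part of our design; it assumes an ideal pipeline in which every stage takes exactly one clock cycle and no hazards or stalls occur, and the function names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles needed when each instruction passes through every stage
    before the next instruction is allowed to start."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles needed in an ideal pipeline (assumption: no stalls).

    The first instruction takes n_stages cycles to pass through the
    pipeline; each later instruction completes one cycle after the
    instruction in front of it.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def ideal_speedup(n_instructions: int, n_stages: int = 5) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return cycles_unpipelined(n_instructions, n_stages) / cycles_pipelined(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, cycles_unpipelined(n), cycles_pipelined(n), ideal_speedup(n))
```

For long instruction streams the ideal speedup approaches the number of stages; the hazards discussed next reduce it in practice.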
Pipelining the MIPS processor introduces events called hazards, which prevent the next instruction in the instruction stream from being executed during its designated clock cycle. There are three types of hazards: structural, data and control.
a) Structural hazards arise when the flow of instructions requires more hardware resources than are available on the platform [1].
b) Data hazards arise when there is a data dependency between the current instruction and a previous instruction in the pipeline.
c) Control hazards arise when the flow of the program changes, for example through a branch or jump instruction that modifies the PC.

Manan Parikh, Mayuresh Dawoo, Pallavi Manjunath, Prashant Awasthi, Fahad Usmani
Department of Electrical and Computer Engineering, University of Florida
International Journal of Research in Engineering and Applied Sciences (IJREAS), Vol. 02, Issue 01, Jan 2014, ISSN: 2249-9210

2. Architecture and Design

The MIPS instruction set consists of three types of instructions: Register, Immediate and Jump. The 32-bit MIPS formats for these instructions are as follows (field widths in bits are given in parentheses):
R-type: Opcode(6) rs(5) rt(5) rd(5) sa(5) Function(6)
I-type: Opcode(6) rs(5) rt(5) Immediate(16)
J-type: Opcode(6) Target(26)

The five-stage pipelined architecture is explained in detail by Hennessy and Patterson [1]. We have adopted this architecture for our MIPS implementation. Fig. 1 shows the 5-stage pipelined MIPS architecture described in that book.

Fig. 1 - 5-stage pipelined MIPS architecture

Stages of MIPS Pipeline

a) Fetch stage: In this stage, the program counter is used to access the instruction memory and fetch the next instruction to be executed. The fetched instruction is then passed on to the control and decode unit, which makes the decisions needed for the correct execution of that instruction.
b) Decode stage: During this stage, the instruction is decoded and the required operands are retrieved.
c) Execute stage: In this stage, for R-type instructions, the ALU operations are performed according to the ALU operation control signals; for load and store instructions, the effective address is calculated.
d) Data memory stage: If the instruction being executed is a load or a store, the data memory is accessed during this stage. Load instructions read from the data memory and store instructions write to it, using the effective address calculated in the execute stage.
e) Write-back stage: During this stage, the result from the execute stage or from the memory access stage is written into the register file. The source of the value depends on the type of instruction: for arithmetic instructions the value is taken directly from the execute stage, whereas for load instructions it is taken from the memory access stage.

Our design incorporates this architecture, with the following major building blocks:
1) Memory and Register Blocks
2) Datapath
3) Control Logic
4) Data Forwarding Unit
5) Hazard Detection Unit
6) Power Reduction Unit

Instruction memory: The instruction memory stores the instructions to be executed by the processor. It holds 256 words of 32 bits each; it takes an 8-bit address from the program counter as input and gives a 32-bit instruction word as output.

Data memory: The data memory is accessed by load and store instructions and holds 256 locations of 8 bits each. There is also a register file in the decode stage which contains thirty-two 8-bit general-purpose registers.

Datapath: The datapath consists of a 5-stage pipelined structure, the five stages being fetch, decode, execute, memory access and write-back. Pipeline registers are placed between adjacent stages and carry the results of each stage to the following stages.
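
The three instruction formats listed at the start of this section can be made concrete with a small field decoder. This is only a sketch in Python, not our VHDL decode stage; the function name and the dictionary keys are hypothetical, and the example words are built from arbitrary field values, since the opcode assignments of our processor are not listed here.

```python
def decode(word: int) -> dict:
    """Split a 32-bit instruction word into the fields of all three formats.

    Which fields are meaningful depends on the opcode: R-type uses
    rs/rt/rd/sa/funct, I-type uses rs/rt/immediate, J-type uses target.
    """
    if not 0 <= word < 2**32:
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26 (6 bits)
        "rs": (word >> 21) & 0x1F,       # bits 25..21 (5 bits)
        "rt": (word >> 16) & 0x1F,       # bits 20..16 (5 bits)
        "rd": (word >> 11) & 0x1F,       # bits 15..11 (5 bits, R-type)
        "sa": (word >> 6) & 0x1F,        # bits 10..6  (5 bits, R-type)
        "funct": word & 0x3F,            # bits 5..0   (6 bits, R-type)
        "immediate": word & 0xFFFF,      # bits 15..0  (16 bits, I-type)
        "target": word & 0x3FFFFFF,      # bits 25..0  (26 bits, J-type)
    }


if __name__ == "__main__":
    # Hypothetical R-type word: opcode 0, rs=1, rt=2, rd=3, sa=0, funct=0x20.
    r_word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
    print(decode(r_word))
```

Extracting every field unconditionally mirrors what a hardware decoder typically does: all field slices are available in parallel and the control logic selects the relevant ones from the opcode.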

Control unit: The control unit generates the signals that coordinate all components of the processor. It controls the flow of data across the pipeline and also generates the signals used for hazard detection.

Data forwarding unit: Data hazards arise from the dependence of one instruction on an earlier one in the pipeline. To avoid these hazards, data needs to be forwarded, which is done by the forwarding unit. Forwarding is implemented by feeding the result of an instruction back to an earlier stage of the pipeline as soon as that result is available.

Fig. 2 - Illustration of data forwarding in pipelined MIPS

Hazard detection unit: This unit detects conditions under which data forwarding is not possible and stalls the pipeline for one or two clock cycles so that instructions are executed in the correct sequence.

Fig. 3 - Pipeline stalls due to data hazards

3. Power Reduction Techniques in our Implementation

The need for low-power design is motivated by several factors, such as the emergence of portable systems, thermal considerations, reliability issues and environmental concerns. Low power consumption helps to reduce heat dissipation, lengthen battery life, and increase device reliability. With increasing demand for low-power battery-driven electronic systems, power-efficient design is presently an active research area. In battery-powered applications, where speed is less of a concern, pipelined processors are often used. The pipeline stages of MIPS for different types of instructions are shown in Fig. 4.

Fig. 4 - Pipelined Representation of Instructions

Instruction type                            Pipeline stages
Store instruction                           IF  ID  EX   MEM  NOP
Load instruction                            IF  ID  EX   MEM  WB
R-type and arithmetic I-type instruction    IF  ID  EX   NOP  WB
Branch instruction                          IF  ID  EX   WB   MEM
Jump instruction                            IF  ID  NOP  NOP  NOP

We have implemented two main power reduction techniques in our MIPS design, as explained below.

3.1 Stall Power Reduction

Fig. 4 shows that arithmetic instructions do not use the memory access stage, and that store instructions do not require the write-back stage, while load instructions go through all pipeline stages. Transitions during an unused stage cause extra power consumption. These unwanted transitions can be reduced by bypassing the unused pipeline stage.

For an arithmetic instruction the memory access stage is not used, so the data obtained from the execute stage is forwarded directly to the write-back stage. During this time, the EX/MEM pipeline registers are held at zero, ensuring that no transitions take place. The clock is disabled for the memory stage unless the instruction is a store, and the power dissipation is thereby reduced.

This bypassing of data can continue until a load instruction is encountered. A load instruction uses all five stages of the pipeline, so a resource conflict would arise, as shown by the crossed arrows in Fig. 5. Data therefore has to pass through the regular pipelined structure until a store, branch or jump instruction is encountered, after which the reconfigured pipeline can be brought in again.

Fig. 5 - Pipeline Showing Scope for Reduction of One Stage

Fig. 6 shows the reconfigured pipeline structure. Here the three arithmetic instructions following the store instruction use the reconfigured pipeline structure. A load instruction then arrives, and the pipeline reverts to the original structure. The modified structure is used again after the next store, branch or jump instruction.

Fig. 6 - Reconfigured Pipeline with Bypassing

3.2 Clock Gating

Another power optimization technique used in our design is clock gating, in which power reduction is achieved by disabling the pipeline stages that cause unnecessary switching activity.
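
The effect of gating can be illustrated with a small behavioural model that counts how many clock edges actually reach a module. This is a sketch in Python rather than our VHDL code; it assumes one rising edge per cycle and gating by a reset signal and a stall signal, as used in our implementation (Section 4). The function name and the example traces are hypothetical.

```python
from typing import Sequence, Tuple


def count_clock_edges(
    reset: Sequence[bool], stall: Sequence[bool]
) -> Tuple[int, int, int]:
    """Count rising clock edges seen by a module over a run of cycles.

    reset[i] and stall[i] are the control signal values in cycle i.
    Returns (free_running, reset_gated, reset_and_stall_gated).
    """
    if len(reset) != len(stall):
        raise ValueError("reset and stall traces must cover the same cycles")
    # Ungated clock: one edge in every cycle.
    free_running = len(reset)
    # First gating stage: the clock is blocked while reset is high.
    reset_gated = sum(1 for r in reset if not r)
    # Second gating stage: the already gated clock is also blocked on stall.
    reset_and_stall_gated = sum(
        1 for r, s in zip(reset, stall) if not (r or s)
    )
    return free_running, reset_gated, reset_and_stall_gated


if __name__ == "__main__":
    # Hypothetical 10-cycle trace: 2 reset cycles, later a 2-cycle stall.
    reset = [True, True] + [False] * 8
    stall = [False] * 4 + [True, True] + [False] * 4
    print(count_clock_edges(reset, stall))
```

Every edge removed from the count is an edge on which the registers of the module do not toggle, which is the source of the power saving.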

Switching activity is one of the factors that determine dynamic power dissipation, which is given by

$P = 0.5 \cdot C \cdot V_{dd}^{2} \cdot \alpha_{sw} \cdot f_{clk}$

where $C$ is the switched capacitance, $V_{dd}$ is the supply voltage, $\alpha_{sw}$ is the switching activity and $f_{clk}$ is the clock frequency. Decreasing the switching activity $\alpha_{sw}$ therefore reduces dynamic power consumption, so reducing unwanted transitions lowers the dynamic power consumption of the pipelined processor.

4. Implementation

We have implemented an 8-bit MIPS processor in VHDL using behavioural description. Each stage is an individual module connected to the other modules it requires. Data is passed between stages through a top module which interconnects all the blocks; together they behave as the MIPS processor. The control unit supervises the functioning of all modules, handles data, branch and structural hazards, and controls the disabling of blocks. The instructions implemented in our processor are:
1. R-type: add, sub, addi, slt, srl
2. I-type: lw, sw
3. Branch: beq, jump

Stall power reduction is implemented as explained in the previous section. We have implemented clock gating in two stages. First, the clock is gated with reset in the top module to avoid unnecessary clock transitions while reset is high. Second, the gated clock from the first stage is gated in every module with the stall signal, so that when the pipeline is stalled due to structural or data hazards the clock does not toggle, which disables the module and saves power.

5. Conclusion and Results

In this work, we have designed an 8-bit MIPS processor in VHDL with hazard detection and optimized it for power using clock gating and stall power reduction. With the clock gated by the stall signal in all modules, power is reduced by 49.39% relative to the baseline code with no optimizations. With two stages of clock gating, the reduction relative to the baseline code is 62.52%. The design was synthesized in Cadence, and DRC and LVS were verified.
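
As a rough numerical illustration of the dynamic power relation from Section 3.2 (power equals 0.5 times capacitance, times supply voltage squared, times switching activity, times clock frequency), the sketch below evaluates it for two switching activities. All numerical values are hypothetical and chosen only for illustration; they are not measurements from our design, and the percentages reported above come from synthesis, not from this calculation.

```python
def dynamic_power(
    c_farads: float, vdd_volts: float, switching_activity: float, f_clk_hz: float
) -> float:
    """Dynamic power in watts: 0.5 * C * Vdd^2 * activity * f_clk."""
    return 0.5 * c_farads * vdd_volts**2 * switching_activity * f_clk_hz


def relative_reduction(baseline: float, optimized: float) -> float:
    """Fraction by which `optimized` is lower than `baseline`."""
    return (baseline - optimized) / baseline


if __name__ == "__main__":
    # Hypothetical operating point: 1 pF, 1.8 V, 100 MHz.
    baseline = dynamic_power(1e-12, 1.8, 0.50, 100e6)
    gated = dynamic_power(1e-12, 1.8, 0.25, 100e6)
    print(baseline, gated, relative_reduction(baseline, gated))
```

Because the relation is linear in the switching activity, halving the activity halves the dynamic power when capacitance, supply voltage and clock frequency are unchanged.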

The comparisons of power and area for the baseline and the power-optimized MIPS processor are shown below.

Fig. 7 - Power for unoptimized and optimized MIPS processor at different time periods

Fig. 8 - Power dissipation in the optimized code for different frequencies

Fig. 9 - Area comparison

Fig. 10 - Final layout of optimized MIPS processor with IO pads

References

[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Fifth Edition (The Morgan Kaufmann Series in Computer Architecture and Design).
[2] Gautham P, Parthasarathy R, Karthi Balasubramanian, Amrita Vishwavidyapeetham, Low power pipelined MIPS processor design.
[3] Jingpeng Lv, Xianzong Xie, Kyung Jin Park, Byong Wu, Low power MIPS processor design.
[4] Mamun Bin Ibne Reaz, Md. Shabiul Islam, Mohd. S. Sulaiman, A single clock cycle MIPS RISC processor design using VHDL.
[5] Anjana R and Krunal Gandhi, VHDL Implementation of a MIPS RISC processor, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 2, Issue 8, August 2012.