Multi Layer AHB Protocol

Abstract
The multilayer advanced high-performance bus (ML-AHB) busmatrix employs slave-side arbitration. Slave-side arbitration differs from master-side arbitration in its request and grant signalling: in slave-side arbitration, the master merely starts a burst transaction and waits for the slave response before proceeding to the next transfer. The unit of arbitration can therefore be a transaction or a transfer. However, the ML-AHB busmatrix of ARM offers only transfer-based fixed-priority and round-robin arbitration schemes. This project designs a flexible arbiter for the ML-AHB busmatrix, based on the self-motivated (SM) arbitration scheme, that supports three priority policies (fixed priority, round robin, and dynamic priority) and three data multiplexing modes (transfer, transaction, and desired transfer length). In total, there are nine possible arbitration schemes. The SM arbiter selects one of the nine schemes based on the priority-level notifications and the desired transfer length from the masters, so that arbitration leads to the maximum performance. The design is simulated using ModelSim and synthesized using Xilinx tools.
CHAPTER 1 INTRODUCTION
1.1 Introduction
The Advanced Microcontroller Bus Architecture (AMBA) was introduced by ARM Ltd in 1996 and is widely used as the on-chip bus in system-on-a-chip (SoC) designs. AMBA is a registered trademark of ARM Ltd. The first AMBA buses were the Advanced System Bus (ASB) and the Advanced Peripheral Bus (APB). In its second version, AMBA 2, ARM added the AMBA High-performance Bus (AHB), which is a single clock-edge protocol.
Figure 1.1 ARM SoC Block Diagram
In 2003, ARM introduced the third generation, AMBA 3, including AXI for even higher-performance interconnects and the Advanced Trace Bus (ATB) as part of the CoreSight on-chip debug and trace solution. These protocols are today the de facto standard for 32-bit embedded processors because they are well documented and can be used without royalties. Some manufacturers use AMBA buses for non-ARM designs. As an example, Infineon uses an AMBA bus for the ADM5120 SoC, which is based on the MIPS architecture. The important aspect of an SoC is not only which components or blocks it houses, but also how they are interconnected. AMBA is a solution for the blocks to interface with each other. Since its inception, the scope of AMBA has gone far beyond microcontroller devices, and it is now widely used on a range of ASIC and SoC parts, including the applications processors used in modern portable mobile devices such as smartphones.
1.2 Advanced High-performance Bus (AHB) Protocol
The Advanced Microcontroller Bus Architecture (AMBA) specification defines an on-chip communications standard for designing high-performance embedded microcontrollers. Three distinct buses are defined within the AMBA specification:
- The Advanced High-performance Bus (AHB)
- The Advanced System Bus (ASB)
- The Advanced Peripheral Bus (APB)
A test methodology is included with the AMBA specification which provides an infrastructure for modular test and diagnostic access.
1.2.1 Advanced High-performance Bus (AHB)
The AMBA AHB is for high-performance, high clock frequency system modules. The AHB acts as the high-performance system backbone bus. AHB supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral macro cell functions. AHB is also specified to ensure ease of use in an efficient design flow using synthesis and automated test techniques.
1.2.2 Advanced System Bus (ASB)
The AMBA ASB is for high-performance system modules. AMBA ASB is an alternative system bus suitable for use where the high-performance features of AHB are not required. ASB also supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral macro cell functions.
1.2.3 Advanced Peripheral Bus (APB)
The AMBA APB is for low-power peripherals. AMBA APB is optimized for minimal power consumption and reduced interface complexity to support peripheral functions. APB can be used in conjunction with either version of the system bus.
1.3 Objectives of the AMBA Specification
The AMBA specification has been derived to satisfy four key requirements:
- To facilitate the right-first-time development of embedded microcontroller products with one or more CPUs or signal processors.
- To be technology-independent and ensure that highly reusable peripheral and system macro cells can be migrated across a diverse range of IC processes and be appropriate for full-custom, standard cell and gate array technologies.
- To encourage modular system design to improve processor independence, providing a development road-map for advanced cached CPU cores and the development of peripheral libraries.
- To minimize the silicon infrastructure required to support efficient on-chip and off-chip communication for both operation and manufacturing test.
1.4 A Typical AMBA-based Microcontroller
An AMBA-based microcontroller typically consists of a high-performance system backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory bandwidth, on which the CPU, on-chip memory and other Direct Memory Access (DMA) devices reside. This bus provides a high-bandwidth interface between the elements that are involved in the majority of transfers. Also located on the high-performance bus is a bridge to the lower bandwidth APB, where most of the peripheral devices in the system are located (see Figure 1.2).
Figure 1.2 A typical AMBA AHB-based System
AMBA APB provides the basic peripheral macro cell communications infrastructure as a secondary bus from the higher bandwidth pipelined main system bus. Such peripherals typically:
- Have interfaces which are memory-mapped registers
- Have no high-bandwidth interfaces
- Are accessed under programmed control
The external memory interface is application-specific and may only have a narrow data path, but may also support a test access mode which allows the internal AMBA AHB, ASB and APB modules to be tested in isolation with system-independent test sets.
1.5 Multi-layer AHB

The on-chip bus plays a key role in system-on-a-chip (SoC) design by enabling the efficient integration of heterogeneous system components such as CPUs, DSPs, application-specific cores, memories, and custom logic. As design complexity has grown, SoC designs have come to require a system bus with high bandwidth to perform multiple operations in parallel. To solve the bandwidth problem, several types of high-performance on-chip buses have been proposed, such as the multilayer AHB (ML-AHB) busmatrix from ARM, the PLB crossbar switch from IBM, and CONMAX from Silicore. Among them, the ML-AHB busmatrix has been widely used in many SoC designs. This is because of the simplicity of the AMBA bus of ARM, which attracts many IP designers, and because the AMBA bus architecture suits low-power embedded systems.
The ML-AHB busmatrix is an interconnection scheme based on the AMBA AHB protocol, which enables parallel access paths between multiple masters and slaves in a system. This is achieved by using a more complex interconnection matrix and gives the benefit of both increased overall bus bandwidth and a more flexible system structure. In particular, the ML-AHB busmatrix uses slave-side arbitration. Slave-side arbitration differs from master-side arbitration in its request and grant signalling: in slave-side arbitration, the master merely starts a burst transaction and waits for the slave response before proceeding to the next transfer. Therefore, the unit of arbitration can be a transaction or a transfer. A transaction-based arbiter multiplexes the data transfer on burst-transaction boundaries, whereas a transfer-based arbiter can switch after every single transfer. However, the ML-AHB busmatrix of ARM presents only transfer-based arbitration schemes, i.e., transfer-based fixed-priority and round-robin arbitration schemes.
This limitation on the arbitration scheme may degrade system performance, because the most suitable arbitration scheme usually depends on the application requirements, and recent applications are becoming more complex and diverse. By implementing an efficient arbitration scheme, the system performance can be tuned to better suit applications. For high-performance on-chip buses, several arbitration schemes have been studied, such as table-lookup-based crossbar arbitration, two-level time-division multiplexing (TDM) scheduling, the token-ring mechanism, the dynamic bus distribution algorithm, and LOTTERYBUS. However, these approaches employ master-side arbitration. Therefore, they can only control the priority policy, and they face some limitations when handling transfer-based arbitration, since master-side arbitration uses a centralized arbiter. In contrast, slave-side arbitration can handle both transfer-based and transaction-based arbitration schemes.
In this paper, we propose a flexible arbiter based on the self-motivated (SM) arbitration scheme for the ML-AHB busmatrix. Our SM arbitration scheme has the following advantages: 1) it can adjust the processed data unit; 2) it can change the priority policy at runtime; and 3) it is easy to tune the arbitration scheme according to the characteristics of the target application. Hence, our arbiter is able to handle not only the transfer-based fixed-priority, round-robin, and dynamic-priority arbitration schemes but also the transaction-based fixed-priority, round-robin, and dynamic-priority arbitration schemes. Furthermore, our arbiter provides the desired-transfer-length-based fixed-priority, round-robin, and dynamic-priority arbitration schemes. In addition, the proposed SM arbiter selects one of the nine possible arbitration schemes based on the priority-level notifications and the desired transfer length from the masters, so that the arbitration leads to the maximum performance.
Multi-layer AHB is an interconnection scheme, based on the AHB protocol, which enables parallel access paths between multiple masters and slaves in a system. This is achieved by using a more complex interconnection matrix.
Key advantages are:
- You can develop multi-master systems with an increased available bus bandwidth.
- You can construct complex multi-master systems that have a flexible architecture. This removes the requirement to fix design decisions about the allocation of system resources to particular masters at the hardware design stage.
- You can use standard AHB master and slave modules without requiring modification.
Each AHB layer can be very simple because it only has one master, so no arbitration or master-to-slave muxing is required. These layers can use the AHB-Lite protocol, meaning that they do not have to support request and grant, or retry and split transactions. The arbitration effectively becomes point arbitration at each peripheral and is only necessary when more than one master wants to access the same slave simultaneously. The only hardware you have to add to the standard AHB transport infrastructure is the multiplexor block to connect the multiple masters to the peripherals. Because the multi-layer architecture is based on the existing AHB protocol, you can reuse previously designed masters and slaves without modification. Figure 1.3 shows a block diagram of the basic multi-layer concept.
Figure 1.3 Basic multi-layer concept
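The point arbitration at each slave, and the nine schemes formed by combining three priority policies with three multiplexing units, can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the names `Request`, `pick_master` and `hold_length` are hypothetical, lower master index is assumed to mean higher fixed priority, and the logic by which the SM arbiter chooses among the nine schemes is not modelled because the text does not specify it.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative names only; none of these come from the ARM busmatrix RTL.
POLICIES = ("fixed", "round_robin", "dynamic")
UNITS = ("transfer", "transaction", "desired_length")

# Three priority policies times three multiplexing units: nine schemes.
SCHEMES = list(product(POLICIES, UNITS))


@dataclass
class Request:
    master: int        # master (layer) index; assumed: lower index = higher fixed priority
    priority: int = 0  # priority level notified by the master (dynamic policy only)


def pick_master(requests, policy, last_winner=None, num_masters=4):
    """Choose one master among those requesting the same slave."""
    if not requests:
        return None
    if policy == "fixed":
        return min(r.master for r in requests)
    if policy == "round_robin":
        # Start searching just after the previous winner.
        start = 0 if last_winner is None else (last_winner + 1) % num_masters
        requesting = {r.master for r in requests}
        for offset in range(num_masters):
            candidate = (start + offset) % num_masters
            if candidate in requesting:
                return candidate
        return None
    if policy == "dynamic":
        # Highest notified level wins; ties fall back to the lowest index.
        return max(requests, key=lambda r: (r.priority, -r.master)).master
    raise ValueError(f"unknown policy: {policy}")


def hold_length(unit, burst_length, desired_length):
    """Number of transfers the winner keeps the slave before re-arbitration."""
    if unit == "transfer":
        return 1
    if unit == "transaction":
        return burst_length
    if unit == "desired_length":
        return desired_length
    raise ValueError(f"unknown unit: {unit}")


requests = [Request(2, 1), Request(0, 0), Request(3, 5)]
print(pick_master(requests, "fixed"))                       # 0
print(pick_master(requests, "round_robin", last_winner=0))  # 2
print(pick_master(requests, "dynamic"))                     # 3
```

Separating the policy (who wins) from the unit (how long the winner holds the slave) mirrors the way the text counts the schemes: any policy can be paired with any unit.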
1.6 APPLICATIONS
AMBA-AHB can be used in a range of applications and is technology independent. ARM controllers are designed according to the AMBA specifications. Present technology demands high performance and speed, both of which AMBA-AHB provides. AMBA-AHB is also designed to minimize the silicon infrastructure needed to support on-chip and off-chip communication. Embedded projects built around ARM processors or microcontrollers commonly use AMBA-AHB as the common bus throughout the design.
CHAPTER 2 Literature survey

2.1 Introduction to Advanced High-performance Bus (AHB)
The AHB (Advanced High-performance Bus) is a high-performance bus in the AMBA (Advanced Microcontroller Bus Architecture) family. AHB can be used in high clock frequency system modules, where it acts as the high-performance system backbone bus. AHB supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral macro cell functions. AHB is also specified to ensure ease of use in an efficient design flow using automated test techniques. AHB is technology-independent and ensures that highly reusable peripheral and system macro cells can be migrated across a diverse range of IC processes and be appropriate for full-custom, standard cell and gate array technologies.
2.2 Features
The AMBA Advanced High-performance Bus (AHB) supports the following features:
- High performance
- Burst transfers
- Split transactions
- Single edge clock operation
- SEQ, NONSEQ, BUSY, and IDLE transfer types
- Programmable number of idle cycles
- Large data bus widths: 32, 64, 128 and 256 bits wide
- Address decoding with configurable memory map
2.3 Merits
AHB is one of the most commonly used bus protocols, and from the designer's point of view it has the following advantages:
- AHB offers a fairly low cost (in area), low power (based on I/O) bus with a moderate amount of complexity, and it can achieve higher frequencies than comparable buses because the protocol separates the address and data phases.
- AHB can combine the higher frequency with separate data buses, which can be defined to be 128 bits wide and above, to achieve the bandwidth required for high-performance bus applications.
- AHB can access other protocols through a suitable bridge. Hence it supports bridged configurations for data transfer.
- AHB allows slaves with significant latency to respond to a read with an HRESP of SPLIT. The slave will then request the bus on behalf of the master when the read data is available. This enables better bus utilization.
- AHB offers burst capability by defining bursts of specified length, and it supports both incrementing and wrapping bursts. Although AHB requires that an address phase be provided for each beat of data, the slave can still use the burst information to make the proper request on the other side. This helps to mask the latency of the slave.
- AHB is defined with a choice of several bus widths, from 8-bit to 1024-bit. The most common implementation has been 32-bit, but higher bandwidth requirements may be satisfied by using 64-bit or 128-bit buses.
- AHB uses the HRESP signals driven by the slaves to indicate when an error has occurred.
- AHB also offers a large selection of verification IP from several different suppliers. The solutions offered support several different languages and run in a choice of environments.
- Access to the target device is controlled through a MUX, thereby admitting bus access to one bus master at a time.
- AHB masters, slaves and arbiters support early burst termination. Bursts can be terminated early either as a result of the arbiter removing the HGRANT to a master part way through a burst or after a slave returns a non-OKAY response to any beat of a burst. Note, however, that a master cannot decide to terminate a defined-length burst unless prompted to do so by the arbiter or by slave responses.
- Any slave which does not use SPLIT responses can be connected directly to an AHB master. If the slave does use SPLIT responses then a simplified version of the arbiter is also required.
The strengths listed above explain the wide use of this protocol.
2.4 Demerits
Even though the AHB protocol is commonly used in designs, it has some tolerable demerits, which are listed below:
- AHB cannot achieve full data bus utilization and bandwidth if some slaves have a relatively high latency.
- AHB defines transfer sizes of 1, 2, 4, 8, and 16 bytes. Because byte enables are not defined, there are cases where multiple transfers must be made inside a single quadword.
- AHB defines timing parameters for many of the relationships between signals on the bus. However, these are not associated with requirements relative to a clock cycle. Therefore, SoC developers must integrate AHB cores and run chip-level static timing analysis to judge how compatible AHB masters and slaves are with one another.
- Power-based SoCs cover a wide range of applications, and there is a correspondingly wide range of address map requirements. Having the address decodes for all AHB slaves reside within the interconnect means having to support the most complex split address ranges, even for the simplest of slaves.
These weaknesses can be tolerated in view of the protocol's advantages.
2.5 Block Diagram
The block diagram of the Advanced High-performance Bus protocol is shown in Figure 2.1. The block diagram comprises four components:
- Arbiter
- Master
- Slave
- Decoder
2.5.1 Arbiter
The arbitration mechanism is used to ensure that only one master has access to the bus at any one time. The arbiter performs this function by observing a number of different requests to use the bus and deciding which is currently the highest priority master requesting the bus.
Figure 2.1 AMBA AHB block diagram
2.5.2 Master
A bus master is able to initiate read and write operations by providing address and control information. Only one bus master can use the bus at a time. An AHB bus master has the most complex bus interface in an AMBA system. Typically an AMBA system designer would use predesigned bus masters and therefore would not need to be concerned with the detail of the bus master interface. No provision is made within the AHB specification for a bus master to cancel a transfer once it has commenced.
2.5.3 Slave
After a master has started a transfer, the slave then determines how the transfer should progress. Whenever a slave is accessed it must provide a response which indicates the status of the transfer. The HREADY signal is used to extend the transfer, and this works in combination with the response signal HRESP, which provides the status of the transfer. The slave can complete the transfer in a number of ways. It can:
- Complete the transfer immediately
- Signal an error to indicate that the transfer has failed
- Delay the completion of the transfer, but allow the master and slave to back off the bus, leaving it available for other transfers
2.5.4 Decoder
The AHB decoder is used to decode the address of each transfer and provide a select signal for the slave that is involved in the transfer. A central address decoder is used to provide a select signal HSELx for each slave on the bus. The select signal is a combinatorial decode of the high-order address signals. A slave must only sample the address and control signals and HSELx when HREADY is HIGH, indicating that the current transfer is completing.
2.6 Working of AHB
The AMBA AHB bus protocol is designed with a central multiplexor interconnection scheme. Using this scheme, all bus masters drive out the address and control signals indicating the transfer they wish to perform, and the arbiter determines which master has its address and control signals routed to all of the slaves. Before this happens, the master that needs to perform the operation asserts its request signal to the arbiter, and the arbiter returns the grant signal to that master. Similarly, a decoder is used to select the slave which has to be active during the operation, based on the address given by the master. A central decoder is also required to control the read data and response signal multiplexor, which selects the appropriate signals from the slave that is involved in the transfer. Together these allow read and write operations to proceed smoothly. The working of the AMBA AHB protocol is thus explained with the help of the block diagram shown in Figure 2.1.
2.6 Specification
The following points should be considered when reading the AMBA specification:
- Technology independence
- Electrical characteristics
- Timing specification
2.6.1 Technology Independence
AMBA is a technology-independent on-chip protocol. The specification only details the bus protocol at the clock cycle level.
2.6.2 Electrical Characteristics
No information regarding the electrical characteristics is supplied within the AMBA specification as this will be entirely dependent on the manufacturing process technology that is selected for the design.
2.6.3 Timing Specification
The AMBA protocol defines the behavior of various signals at the cycle level. The exact timing requirements will depend on the process technology used and the frequency of operation. Because the exact timing requirements are not defined by the AMBA protocol, the system integrator is given maximum flexibility in allocating the signal timing budget amongst the various modules on the bus.
2.7 AMBA Signals
All AMBA signals are named such that the first letter of the name indicates which bus the signal is associated with. A lower case n in the signal name indicates that the signal is active LOW; otherwise signal names are always all upper case. Test signals have a prefix T regardless of the bus type.
2.7.1 AHB Signal Prefixes
H indicates an AHB signal. For example, HREADY is the signal used to indicate that the data portion of an AHB transfer can complete. It is active HIGH.
2.7.2 AMBA AHB Signal List
All signals are prefixed with the letter H, ensuring that the AHB signals are differentiated from other similarly named signals in a system design. The signals involved in designing the AMBA AHB are listed in Table 2.1, which also gives the specification of each signal.
Table 2.1 AMBA AHB signal specification
S.No. | Name   | Width | Driver       | Function
1     | HCLK   | 1     | Clock source | This clock times all bus transfers at the rising edge of HCLK.
2     | HADDR  | 32    | Master       | The system address bus of width 32-bit.
3     | HTRANS | 2     | Master       | Indicates the type of the current transfer.
4     | HWRITE | 1     | Master       | When HIGH this signal indicates a write transfer and when LOW a read transfer.
5     | HSIZE  | -     | Master       | Indicates the size of the transfer.
6     | HBURST | 3     | Master       | Indicates if the transfer forms part of a burst.
7     | HWDATA | -     | Master       | The write data bus is used to transfer data from the master to the bus slaves during write operations.
8     | HSELx  | -     | Decoder      | Each AHB slave has its own slave select signal and this signal indicates that the current transfer is intended for the selected slave.
9     | HRDATA | -     | Slave        | The read data bus is used to transfer data from bus slaves to the bus master during read operations.
10    | HREADY | -     | Slave        | When HIGH the HREADY signal indicates that a transfer has finished on the bus. This signal may be driven LOW to extend a transfer.
11    | HRESP  | 2     | Slave        | The transfer response provides additional information on the status of a transfer.
The table also lists the function of each signal and the source that drives it. The bus operates synchronously, so signals change with respect to the rising edge of the clock.
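As a small illustration of how the control signals in Table 2.1 might be represented in a behavioural model, the sketch below encodes the HTRANS transfer types and HRESP responses as enumerations. The binary encodings and the 3-bit HSIZE coding are assumed from the AMBA 2 AHB specification rather than taken from the table; the helper names `transfer_bytes` and `transfer_completes` are hypothetical.

```python
from enum import IntEnum


class HTrans(IntEnum):
    """2-bit HTRANS transfer type (encodings assumed from the AMBA 2 AHB spec)."""
    IDLE = 0b00
    BUSY = 0b01
    NONSEQ = 0b10
    SEQ = 0b11


class HResp(IntEnum):
    """2-bit HRESP slave response (encodings assumed from the AMBA 2 AHB spec)."""
    OKAY = 0b00
    ERROR = 0b01
    RETRY = 0b10
    SPLIT = 0b11


def transfer_bytes(hsize):
    """Bytes moved per transfer, assuming a 3-bit HSIZE code where 0 means 1 byte."""
    if not 0 <= hsize <= 7:
        raise ValueError("HSIZE is assumed to be a 3-bit field")
    return 1 << hsize


def transfer_completes(hready, hresp):
    """A transfer finishes successfully when HREADY is HIGH with an OKAY response."""
    return bool(hready) and hresp == HResp.OKAY


print(HTrans.NONSEQ.name, int(HTrans.NONSEQ))  # NONSEQ 2
print(transfer_bytes(2))                       # 4
```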
2.8 Overview of AMBA AHB Operation
Before an AMBA AHB transfer can commence the bus master must be granted access to the bus. This process is started by the master asserting a request signal to the arbiter. Then the arbiter indicates when the master will be granted use of the bus.
A granted bus master starts an AMBA AHB transfer by driving the address and control signals. These signals provide information on the address, direction and width of the transfer, as well as an indication if the transfer forms part of a burst. Two different forms of burst transfers are allowed:
- Incrementing bursts, which do not wrap at address boundaries
- Wrapping bursts, which wrap at particular address boundaries
A write data bus is used to move data from the master to a slave, while a read data bus is used to move data from a slave to the master. Every transfer consists of:
- An address and control cycle
- One or more cycles for the data
The address cannot be extended and therefore all slaves must sample the address during this time. The data, however, can be extended using the HREADY signal. When LOW this signal causes wait states to be inserted into the transfer and allows extra time for the slave to provide or sample data. During a transfer the slave shows the status using the response signals, HRESP[1:0]. The OKAY response is used to indicate that the transfer is progressing normally, and when HREADY goes HIGH this shows the transfer has completed successfully.
2.8.1 Address Decoding
A central address decoder is used to provide a select signal, HSELx, for each slave on the bus. The select signal is a combinatorial decode of the high-order address signals, and simple address decoding schemes are encouraged to avoid complex decode logic and to ensure high-speed operation.
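The difference between the incrementing and wrapping bursts introduced in Section 2.8 is easiest to see from the address sequence each one produces. The following is a minimal sketch, assuming that a wrapping burst wraps at a boundary equal to the number of beats multiplied by the transfer size in bytes (the rule used by the AMBA AHB specification); the function name `burst_addresses` is hypothetical.

```python
def burst_addresses(start, beats, size_bytes, wrap=False):
    """Return the address of each beat in an AHB-style burst.

    start:      address of the first beat (assumed aligned to size_bytes)
    beats:      number of transfers in the burst
    size_bytes: bytes per transfer
    wrap:       True for a wrapping burst, False for an incrementing burst
    """
    if wrap:
        # Assumed wrap boundary: beats * size_bytes.
        boundary = beats * size_bytes
        base = (start // boundary) * boundary  # aligned start of the wrap block
        return [base + (start - base + i * size_bytes) % boundary
                for i in range(beats)]
    return [start + i * size_bytes for i in range(beats)]


# Four-beat burst of 32-bit words starting at 0x38.
print([hex(a) for a in burst_addresses(0x38, 4, 4)])
# ['0x38', '0x3c', '0x40', '0x44']
print([hex(a) for a in burst_addresses(0x38, 4, 4, wrap=True)])
# ['0x38', '0x3c', '0x30', '0x34']
```

In the wrapping case the burst stays inside one 16-byte block, which is why such bursts are convenient for cache line fills.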
Figure 2.2 Decoder and Slave select signals
A slave must only sample the address and control signals and HSELx when HREADY is HIGH, indicating that the current transfer is completing. Under certain circumstances it is possible that HSELx will be asserted when HREADY is LOW, but the selected slave will have changed by the time the current transfer completes. In the case where a system design does not contain a completely filled memory map, an additional default slave should be implemented to provide a response when any of the nonexistent address locations are accessed. Typically the default slave functionality will be implemented as part of the central address decoder.
2.8.2 Slave Transfer Responses
After a master has started a transfer, the slave then determines how the transfer should progress. No provision is made within the AHB specification for a bus master to cancel a transfer once it has commenced.
Whenever a slave is accessed it must provide a response which indicates the status of the transfer. The HREADY signal is used to extend the transfer, and this works in combination with the response signals, HRESP[1:0], which provide the status of the transfer. In the simplest case, the slave completes the transfer immediately.
2.8.3 AHB Decoder
The decoder in an AMBA system is used to perform a centralized address decoding function, which improves the portability of peripherals by making them independent of the system memory map.
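The centralized decoding described above, including the default slave for unmapped addresses, can be sketched as a lookup on the high-order address bits. The memory map below is purely hypothetical (three slaves, each owning a 256 MB region selected by the top four bits of a 32-bit HADDR); a real system would define its own map.

```python
# Hypothetical memory map keyed by HADDR[31:28]; not taken from the source.
SLAVE_BY_TOP_NIBBLE = {
    0x0: "slave0",
    0x1: "slave1",
    0x4: "slave2",
}


def decode(haddr):
    """Return a one-hot dict of HSELx flags for a 32-bit address.

    Addresses that fall outside the map select the default slave, so exactly
    one select line is asserted for every address.
    """
    region = (haddr >> 28) & 0xF  # combinatorial decode of high-order bits
    selected = SLAVE_BY_TOP_NIBBLE.get(region, "default")
    names = list(SLAVE_BY_TOP_NIBBLE.values()) + ["default"]
    return {name: name == selected for name in names}


print(decode(0x1000_0040))  # slave1 selected
print(decode(0x8000_0000))  # unmapped region, default slave selected
```

Because slaves only see their own select line, moving a slave to a different region changes the table but not the slave, which is the portability benefit the text refers to.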
Figure 2.3 AHB Decoder Interface Diagram
2.8.4 Arbitration
The arbitration mechanism is used to ensure that only one master has access to the bus at any one time. The arbiter performs this function by observing a number of different requests to use the bus and deciding which is currently the highest priority master requesting the bus. The arbiter also receives requests from slaves that wish to complete SPLIT transfers. Any slaves which are not capable of performing SPLIT transfers do not need to be aware of the arbitration process, except that they need to observe the fact that a burst of transfers may not complete if the ownership of the bus is changed.
Interface Diagram
Figure 2.4 AHB Arbiter Interface Diagram
The role of the arbiter in an AMBA system is to control which master has access to the bus. Every bus master has a REQUEST/GRANT interface to the arbiter, and the arbiter uses a prioritization scheme to decide which bus master is currently the highest priority master requesting the bus. The detail of the priority scheme is not specified and is defined for each application. It is acceptable for the arbiter to use other signals, either AMBA or non-AMBA, to influence the priority scheme that is in use.
Signal Description
A brief description of each of the arbitration signals is given below.
HBUSREQx
The bus request signal is used by a bus master to request access to the bus. Each bus master has its own HBUSREQx signal to the arbiter and there can be up to 16 separate bus masters in any system.
HGRANTx
The grant signal is generated by the arbiter and indicates that the appropriate master is currently the highest priority master requesting the bus, taking into account locked transfers and SPLIT transfers. A master gains ownership of the address bus when HGRANTx is HIGH and HREADY is HIGH at the rising edge of HCLK.
HMASTER
The arbiter indicates which master is currently granted the bus using the HMASTER[3:0] signals, and this can be used to control the central address and control multiplexer.
2.8.5 Requesting Bus Access
A bus master uses the HBUSREQx signal to request access to the bus and may request the bus during any cycle. The arbiter will sample the request on the rising edge of the clock and then use an internal priority algorithm to decide which master will be the next to gain access to the bus. Normally the arbiter will only grant a different bus master when a burst is completing. However, if required, the arbiter can terminate a burst early to allow a higher priority master access to the bus. When a master is granted the bus and is performing a fixed-length burst, it is not necessary to continue to request the bus in order to complete the burst. The arbiter observes the progress of the burst and uses the HBURST[2:0] signals to determine how many transfers are required by the master. If the master wishes to perform a second burst after the one that is currently in progress, then it should re-assert the request signal during the burst. If a master loses access to the bus in the middle of a burst, then it must re-assert the HBUSREQx request line to regain access to the bus.
For undefined-length bursts the master should continue to assert the request until it has started the last transfer. The arbiter cannot predict when to change the arbitration at the end of an undefined-length burst. It is possible that a master can be granted the bus when it is not requesting it. This may occur when no masters are requesting the bus and the arbiter grants access to a default master. Therefore, it is important that if a master does not require access to the bus it drives the transfer type HTRANS to indicate an IDLE transfer.
2.8.6 Granting Bus Access
The arbiter indicates which bus master is currently the highest priority master requesting the bus by asserting the appropriate HGRANTx signal. When the current transfer completes, as indicated by HREADY HIGH, that master becomes the granted master and the arbiter changes the HMASTER[3:0] signals to indicate the bus master number. The arbiter changes the HGRANTx signals when the penultimate (one before last) address has been sampled. The new HGRANTx information will then be sampled at the same point as the last address of the burst is sampled.
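The request and grant handover described in Sections 2.8.5 and 2.8.6 can be sketched as a cycle-level model. This is a deliberately simplified sketch, not the arbiter designed in this project: it assumes a fixed-priority scheme in which the lowest-numbered requesting master wins, falls back to a default master when nobody requests, and ignores bursts, locked transfers and SPLIT responses. The class name `SimpleArbiter` is hypothetical.

```python
class SimpleArbiter:
    """Cycle-level sketch of HGRANTx / HMASTER handover (fixed priority assumed)."""

    def __init__(self, num_masters, default_master=0):
        self.num_masters = num_masters
        self.default_master = default_master
        self.hgrant = default_master   # master whose HGRANTx is currently asserted
        self.hmaster = default_master  # master that currently owns the address bus

    def clock(self, hbusreq, hready):
        """Advance one rising HCLK edge and return the new HMASTER value.

        hbusreq: list of bools, one HBUSREQx per master
        hready:  True if the current transfer completes in this cycle
        """
        # Ownership changes only when the granted master sees HREADY HIGH.
        if hready:
            self.hmaster = self.hgrant
        # Sample the requests; lowest index wins, else grant the default master.
        requesting = [m for m, req in enumerate(hbusreq) if req]
        self.hgrant = requesting[0] if requesting else self.default_master
        return self.hmaster


arb = SimpleArbiter(num_masters=3)
print(arb.clock([False, True, True], hready=True))    # 0: default master still owns the bus
print(arb.clock([False, True, True], hready=False))   # 0: wait state, no handover
print(arb.clock([False, False, True], hready=True))   # 1: master 1 takes ownership
print(arb.clock([False, False, False], hready=True))  # 2: master 2 takes ownership
```

The one-cycle lag between a request being sampled and the master appearing on HMASTER reflects the rule that ownership is taken only when HGRANTx and HREADY are both HIGH at a clock edge.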
Figure 2.5 Bus Master Grant Signals Because a central multiplexer is used, each master can drive out the address of the transfer it wishes to perform immediately and it does not need to wait until it is granted the bus. The HGRANTx signal is only used by the master to determine when it owns the bus and hence when it should consider that the address has been sampled by the appropriate slave. A delayed version of the HMASTER bus is used to control the write data multiplexer. 2.8.7 Default Bus Master Every system must include a default bus master which is granted the bus if all other masters are unable to use the bus. When granted, the default bus master must only perform IDLE transfers. If no masters are requesting the bus then the arbiter may either grant the default master or alternatively it may grant the master that would benefit the most from having low access latency to the bus.
Granting the default master access to the bus also provides a useful mechanism for ensuring that no new transfers are started on the bus and is a useful step to perform prior to entering a low-power mode of operation.
2.8.8 AHB Data Bus Width
One way to improve bus bandwidth without increasing the frequency of operation is to make the data path of the on-chip bus wider. Both the increased layers of metal and the use of large on-chip memory blocks (such as embedded DRAM) are driving factors that encourage the use of wider on-chip buses. Specifying a fixed bus width would mean that in many cases the width of the bus is not optimal for the application. Therefore, an approach has been adopted that allows flexibility in the width of the bus but still ensures that modules are highly portable between designs. The protocol allows the AHB data bus to be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide. However, it is recommended that a minimum bus width of 32 bits be used, and it is expected that a maximum of 256 bits will be adequate for almost all applications. For both read and write transfers, the receiving module must select the data from the correct byte lane on the bus. Replication of data across all byte lanes is not required.
2.8.9 AHB Bus Master
An AHB bus master has the most complex bus interface in an AMBA system. Typically, an AMBA system designer would use predesigned bus masters and therefore would not need to be concerned with the detail of the bus master interface. The arbiter signals and the transfer signals of the bus master are shown in the AHB bus master interface diagram in Figure 2.6.
Figure 2.6 AHB Bus Master Interface Diagram
2.8.10 AHB Bus Slave
An AHB bus slave responds to transfers initiated by bus masters within the system. The slave uses an HSELx select signal from the decoder to determine when it should respond to a bus transfer. All other signals required for the transfer, such as the address and control information, are generated by the bus master.
Figure 2.7 AHB Bus Slave Interface
All the signals involved in the slave and the decoder are shown in the AHB bus slave interface diagram of Figure 2.7.
2.9 Multi-Layered Advanced High-performance Bus protocol (ML-AHB)
In the simplest implementation of a multi-layer system, each master has its own AHB layer and is connected to the slave devices by an interconnect matrix, as shown in Figure 2.8.
Figure 2.8 Multi-layer interconnect topology
Within the interconnect matrix:
1. Every layer has a Decode stage that determines which slave is required for a transfer.
2. A mux routes the transfer from the appropriate layer to the required slave.
If two layers require access to the same slave at the same time, the arbitration within the interconnect matrix must determine which layer has the highest priority. The layer that is not given access is waited using HREADY until it is given access to the required slave. When a layer is waited, an Input Stage is used to store a copy of the pipelined address and control information until access to the shared slave is given. Each slave port has its own arbitration, and a number of different schemes can be used. For example:
a) Input layers can be serviced in a round-robin manner, changing every transfer or every burst.
b) The arbitration can use a fixed-priority scheme where certain high-priority layers are always given access in preference to lower-priority layers.
The number of input/output ports on the interconnect matrix is completely flexible and can be adapted to suit the system requirements.
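The decode, route, and wait behaviour described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the address map, the function names decode and route, and the rule that a lower layer number wins a conflict are all hypothetical and are not taken from the ARM interconnect matrix.

```python
from typing import Dict, List, Tuple

# Hypothetical address map (an assumption): slave index -> (base, limit).
ADDRESS_MAP: Dict[int, Tuple[int, int]] = {
    0: (0x0000_0000, 0x0FFF_FFFF),
    1: (0x1000_0000, 0x1FFF_FFFF),
}


def decode(addr: int) -> int:
    """Decode stage: return the slave that owns this address."""
    for slave, (base, limit) in ADDRESS_MAP.items():
        if base <= addr <= limit:
            return slave
    raise ValueError(f"address {addr:#x} is not mapped to any slave")


def route(requests: Dict[int, int]) -> Tuple[Dict[int, int], List[int]]:
    """Route one cycle of requests (layer -> address).

    Returns (slave -> granted layer, list of waited layers). Conflicts are
    resolved with a fixed priority where the lower layer number wins; this
    priority order is an assumption made for the sketch.
    """
    granted: Dict[int, int] = {}
    waited: List[int] = []
    for layer in sorted(requests):
        slave = decode(requests[layer])
        if slave in granted:
            # Slave port already taken this cycle: the layer is waited and
            # its input stage would hold the address and control information.
            waited.append(layer)
        else:
            granted[slave] = layer
    return granted, waited


granted, waited = route({0: 0x0000_0100, 1: 0x0000_0200, 2: 0x1000_0000})
print(granted, waited)  # {0: 0, 1: 2} [1]
```

Layers 0 and 1 both target slave 0, so layer 1 is waited, while layer 2 reaches slave 1 in parallel. That parallel access is the point of the multi-layer arrangement.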
2.10 Bus configurations
As the number of masters and slaves in a system increases, the complexity of the interconnect matrix can become significant. You can use the following techniques to optimize the system architecture. Figure 2.9 shows how you can make slaves local, or private, to a particular layer. This reduces the complexity of the interconnect matrix. Use this technique when it is acceptable that a slave can only be accessed by masters on the same layer.
Figure 2.9 Local slaves
Figure 2.10 shows how you can make multiple slaves appear as a single slave to the interconnect matrix. This is useful for combining a number of low-bandwidth slaves. You can also use it where a set of slaves is normally accessed by just one master, such as a DMA controller, and the interconnect matrix is used only to give access to other masters under special circumstances, such as debugging the system.
Figure 2.10 multiple slaves on one slave port
Figure 2.11 multiple masters on one layer
In the simplest implementation of a multi-layer system, each master has its own layer. Figure 2.11 shows how you can also build a system where multiple masters share a layer. This is well suited to combining masters that have low-bandwidth requirements or masters, such as the Test Interface Controller (TIC), that have particular characteristics.
Each layer can be a complete AHB subsystem, with the interconnect matrix being used to enable communication between the two subsystems. The example in Figure 2.12 shows only a single slave is shared. Typically this is an on-chip memory slave that is used as a buffer area between the two subsystems.
Figure 2.12 Separate AHB subsystems
2.12 Multi-port slaves
In a multi-layer AHB system, certain slaves, such as an SDRAM controller, are able to operate more efficiently by processing transfers from different layers in parallel. Figure 2.13 shows how you can do this by designing the slave with multiple AHB slave ports.
Figure 2.13 Multi-port slaves
2.9 Summary
The literature survey covers the merits and demerits of AHB and identifies its signal flow diagram. The signals shown in the signal flow diagram are specified, and their operation is explained with the help of the block diagram. An overview of AMBA AHB operation is given, covering all the components involved in the AHB. The ML-AHB bus configurations and the multi-layer AHB bus matrix interconnection scheme are also studied.
CHAPTER 3 Design of Self-Motivated Arbitration Scheme for the Multilayer AHB Bus matrix
The ML-AHB bus matrix of ARM consists of the input stage, decoder, and output stage, including an arbiter. Fig. 3.1 shows the overall structure of the ML-AHB bus matrix of ARM.
Fig. 3.1. Overall structure of the ML-AHB bus matrix of ARM.
The input stage is responsible for holding the address and control information when a transfer to a slave cannot commence immediately. The decoder determines which slave a transfer is destined for. The output stage selects which of the various master input ports is routed to the slave. Each output stage has an arbiter. The arbiter determines which input stage has to perform a transfer to the slave and decides which one currently has the highest priority. The ML-AHB bus matrix employs slave-side arbitration, in which the arbiters are located in front of each slave port, as shown in Fig. 3.1; the master simply starts a transaction and waits for the slave response to proceed to the next transfer. Therefore, the unit of arbitration can be a transaction or a transfer. However, the ML-AHB bus matrix of ARM furnishes only transfer-based arbitration schemes, specifically transfer-based fixed-priority and round-robin arbitration schemes. The transfer-based fixed-priority (round-robin) arbiter multiplexes the data transfer based on a single transfer in a fixed-priority (round-robin) fashion.
SM ARBITRATION SCHEME FOR THE ML-AHB BUSMATRIX
An assumption is made that the masters can change their priority level and can issue the desired transfer length to the arbiters in order to implement an SM arbitration scheme. This assumption should be valid because the system developer generally knows the features of the target applications. For example, some masters in embedded systems are required to complete their jobs within given timing constraints so that the system-level timing constraints are satisfied. The computation time of each master is predictable, but it is not easy to foresee the data transfer time since the on-chip bus is usually shared by several masters. Previous works addressed this issue by minimizing the latencies of several latency-critical masters, but a side effect of these methods is that they can increase the latencies of other masters; hence, they may violate the given timing constraints. Unlike existing works, our scheme can keep the latency close to its given constraint by adjusting the priority level and transfer length of the masters. Fig. 3.2 shows an example. In this example, the service latencies (latency-limit times) of M1, M2, and M3 are 4, 8, and 2 cycles (T14, T8, and T10), respectively. The requests of the three masters are all initiated at T0, and M3 is the most latency-sensitive master.
Fig. 3.2(a) shows an arbitration scheme that does not use latency constraints for arbitration. Because the masters are selected in ascending order, M2 and M3 violate their latency constraints; only M1 meets its constraint. Fig. 3.2(b) shows the scheduling of a typical latency-minimizing arbiter. It minimizes the latency of the most latency-sensitive master, namely, M3, causing M2 to violate its constraint. Although neither of these two arbitration schemes can meet the latency constraints for all three masters, in the SM arbitration shown in Fig. 3.2(c), all masters use the bus with no violations by configuring the priority levels (transfer lengths) of M1, M2, and M3 as the lowest, highest, and intermediate priorities (4, 8, and 2), respectively.
Fig. 3.2. Arbitration scheme examples in an embedded system. (a) Arbitration scheme with no consideration of the latency constraint. (b) Arbitration scheme minimizing latency. (c) SM arbitration scheme.
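The three cases of Fig. 3.2 can be checked with a few lines of arithmetic. The sketch below uses the service latencies and latency limits stated in the example. It assumes that masters are served back to back from T0 without preemption or arbitration overhead, and that a master meets its constraint when it finishes at or before its limit. The function name violations is hypothetical. For case (b), only "M3 first" comes from the text; the order of the remaining two masters is an assumption.

```python
from typing import Dict, List

# Values from the example: bus cycles needed and latency limit per master.
SERVICE: Dict[str, int] = {"M1": 4, "M2": 8, "M3": 2}
LIMIT: Dict[str, int] = {"M1": 14, "M2": 8, "M3": 10}


def violations(order: List[str]) -> List[str]:
    """Serve the masters in the given order, starting at T0, and return
    those that finish after their latency limit."""
    t = 0
    late: List[str] = []
    for master in order:
        t += SERVICE[master]  # the master holds the bus for its whole service time
        if t > LIMIT[master]:
            late.append(master)
    return late


print(violations(["M1", "M2", "M3"]))  # (a) ascending order     -> ['M2', 'M3']
print(violations(["M3", "M1", "M2"]))  # (b) M3 served first     -> ['M2']
print(violations(["M2", "M3", "M1"]))  # (c) SM priority order   -> []
```

In case (c), M2, M3, and M1 finish at T8, T10, and T14, which are exactly their limits, so no constraint is violated.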
A 32-b address bus of the masters is used to inform the arbiters of the priority level and the desired transfer length of the masters. Fig. 3.3 shows the decoding information for our address bus.
Fig. 3.3. Decoding information of the 32-b address bus.
In Fig. 3.3, S_Number indicates the target slave number, P_Level means the priority level of a master, T_Length denotes the desired transfer length of a master, and Offset_Add specifies the internal address of the target slave. S_Number and P_Level each consist of 3 b because the maximum number of master-slave sets is 8 x 8. T_Length is composed of 4 b because the maximum burst length is 16. Although 7 b of the 32-b address bus are used for P_Level and T_Length to notify the arbiters of the priority level and the desired transfer length of a master, we consider the remaining bits adequate to express the internal address of a slave because the range of Offset_Add is from 0 to 2^22 - 1. Under the aforementioned assumption, the priority level and transfer length can then be changed by the SM demand of each master.
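The field widths above (3 + 3 + 4 + 22 = 32 b) can be illustrated with a short packing and unpacking sketch. The bit positions are an assumption, since Fig. 3.3 is not reproduced here: the fields are placed from the most significant bit downward in the order S_Number, P_Level, T_Length, Offset_Add. The names AddrFields, encode_addr, and decode_addr are hypothetical.

```python
from typing import NamedTuple


class AddrFields(NamedTuple):
    s_number: int    # 3 b, target slave number
    p_level: int     # 3 b, priority level of the master
    t_length: int    # 4 b, desired transfer length field (raw value)
    offset_add: int  # 22 b, internal address of the target slave


def encode_addr(s_number: int, p_level: int, t_length: int, offset_add: int) -> int:
    """Pack the four fields into a 32-bit address (assumed bit layout)."""
    if not (0 <= s_number < 8 and 0 <= p_level < 8):
        raise ValueError("S_Number and P_Level are 3-bit fields")
    if not 0 <= t_length < 16:
        raise ValueError("T_Length is a 4-bit field")
    if not 0 <= offset_add < (1 << 22):
        raise ValueError("Offset_Add is a 22-bit field")
    return (s_number << 29) | (p_level << 26) | (t_length << 22) | offset_add


def decode_addr(haddr: int) -> AddrFields:
    """Unpack a 32-bit address into its four fields (assumed bit layout)."""
    if not 0 <= haddr < (1 << 32):
        raise ValueError("address must fit in 32 bits")
    return AddrFields(
        s_number=(haddr >> 29) & 0x7,
        p_level=(haddr >> 26) & 0x7,
        t_length=(haddr >> 22) & 0xF,
        offset_add=haddr & 0x3FFFFF,
    )


addr = encode_addr(s_number=5, p_level=3, t_length=7, offset_add=0x1234)
print(hex(addr), decode_addr(addr))
```

How the raw 4-b T_Length value maps to a burst length of up to 16 (for example, value plus one) is not stated in the text, so the sketch leaves the field uninterpreted.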
Figure 3.4. Internal structure of our arbiter.
Fig. 3.4 shows the internal structure of our arbiter based upon the SM arbitration scheme. In Fig. 3.4, the NoPort signal means that none of the masters must be selected and that the address and control signals to the shared slave must be driven to an inactive state, while Master No. indicates the currently selected master number generated by the controller for the SM arbitration scheme. In general, our arbiter consists of an RR block, a P block, two multiplexers, a counter, a controller, and two flip-flops. MUX_1 and MUX_2 are used to select the arbitration scheme and the desired transfer length of a master, respectively. The counter tracks the transfer length, and the two flip-flops are inserted so that the arbitration logic does not lengthen the critical path. The RR block (P block) performs the round-robin (priority-based) arbitration scheme. The controller compares the priority levels of the requesting masters. If the masters have equal priorities, the controller selects the round-robin arbitration scheme (RR block); in other cases, it chooses the priority arbitration scheme (P block). The controller also makes the final decision on the master for the next transfer based on the transfer length of the selected master. The control process consists of the following three steps.
1) If HMASTLOCK is asserted, the same master remains selected.
2) If HMASTLOCK is not asserted and no master is currently selected, the following hold.
  a) If no master is requesting access, the NoPort signal is asserted.
  b) Otherwise, a new master for the next transfer is initially selected. If the masters have equal priorities, the round-robin arbitration scheme is selected; otherwise, the priority arbitration scheme is chosen. In addition, the counter is updated based on the transfer length of the selected master.
3) If none of the previous statements applies, the following hold.
  a) If the counter has expired, the following hold.
    i) If there are no requesting masters, the NoPort signal is updated based on the HSEL signal of the currently selected master. If the HSEL signal is 1, the same master remains selected, and the NoPort signal is deasserted. Otherwise, the NoPort signal is asserted.
    ii) Otherwise, a master for the next transfer is selected based on the priority levels of the requesting masters. Also, the counter is updated.
  b) If the counter has not expired and the HSEL signal of the current master is 1, the same master remains selected, and the counter is decreased.
  c) If the currently selected master completes a transaction before the counter has expired, the following hold.
    i) If there are no requesting masters, the NoPort signal is asserted.
    ii) Otherwise, a master for the next transfer is chosen based on the priority levels of the requesting masters, and the counter is updated.
The SM arbitration scheme is achieved through iteration of the aforementioned steps.
Combining the priority level and the desired transfer length of the masters allows our arbiter to handle the transfer-based fixed-priority, round-robin, and dynamic-priority arbitration schemes (abbreviated as the FT, RT, and DT arbitration schemes, respectively), as well as the transaction-based fixed-priority, round-robin, and dynamic-priority arbitration schemes (abbreviated as the FR, RR, and DR arbitration schemes, respectively). Moreover, our arbiter can also deal with the desired-transfer-length-based fixed-priority, round-robin, and dynamic-priority arbitration schemes (abbreviated as the FL, RL, and DL arbitration schemes, respectively). The transfer-based (transaction-based) arbiter switches the data transfer based upon a single transfer (burst transaction), and the desired-transfer-length-based arbiter multiplexes the data transfer based on the transfer length assigned by the masters.
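The three-step control process can be sketched as a cycle-by-cycle state update. This is a behavioural illustration and not the authors' RTL. The names State, select, and step are hypothetical, and several details are assumptions:
- requests holds only masters other than the currently selected one, as (P_Level, T_Length) pairs.
- A larger P_Level means higher priority.
- The counter is loaded with T_Length - 1.
- A transaction that completes early (step 3c) is observed as HSEL of the current master going to 0.
- Round-robin selection picks the first requester above the last granted master.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Requests = Dict[int, Tuple[int, int]]  # master -> (p_level, t_length)


@dataclass(frozen=True)
class State:
    current: Optional[int] = None  # currently selected master, if any
    counter: int = 0               # transfers left for the current master
    no_port: bool = True           # NoPort signal
    last: int = -1                 # last granted master (round-robin pointer)


def select(state: State, requests: Requests) -> State:
    """Pick the next master and reload the counter."""
    levels = {p for p, _ in requests.values()}
    if len(levels) == 1:
        # Equal priorities: round-robin from the last granted master.
        above = [m for m in sorted(requests) if m > state.last]
        chosen = above[0] if above else min(requests)
    else:
        # Different priorities: highest level wins, lowest index breaks ties.
        top = max(levels)
        chosen = min(m for m, (p, _) in requests.items() if p == top)
    t_length = requests[chosen][1]
    return State(chosen, max(t_length - 1, 0), False, chosen)


def step(state: State, hmastlock: bool, requests: Requests,
         hsel_current: bool) -> State:
    """One iteration of the control process."""
    idle = State(None, 0, True, state.last)
    if hmastlock:                                  # step 1
        return state
    if state.current is None:                      # step 2
        return select(state, requests) if requests else idle
    if state.counter == 0:                         # step 3a
        if requests:                               # 3a-ii
            return select(state, requests)
        if hsel_current:                           # 3a-i, HSEL = 1
            return State(state.current, 0, False, state.last)
        return idle                                # 3a-i, HSEL = 0
    if hsel_current:                               # step 3b
        return State(state.current, state.counter - 1, False, state.last)
    return select(state, requests) if requests else idle  # step 3c


s = step(State(), False, {0: (1, 2), 1: (3, 4)}, False)
print(s)  # master 1 is selected with three transfers left on the counter
```

With T_Length set to 1 for every master, the sketch switches on every transfer; larger T_Length values keep a master selected for that many transfers, which is how the transfer-based and desired-transfer-length-based modes differ in this model.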
Fig. 3.5 shows the internal process of an RR block. Initially, we create the up- and down-mask vectors (Up_Mask and Dn_Mask) based on the number of the currently selected master, as shown in Fig. 3.5.
Fig. 3.5. Internal process of the RR block.
We then generate the up- and down-masked vectors through a bitwise AND operation between the mask vectors and the requested master vector. After generating the up- and down-masked vectors, we examine whether each masked vector is zero. If the up-masked vector is zero, the down-masked vector is passed as the input parameter of the round-robin function; if it is not zero, the up-masked vector is passed instead. A master for the next transfer is chosen by the round-robin function, and the current master is updated after 1 clock cycle. The RR block operates by repeating the arbitration procedure shown in Fig. 3.5.
Fig. 3.6 shows the internal procedure of the P block. First, we create the highest priority vector (V) through the round-robin function of Fig. 3.6. After generating the highest priority vector (V), the priority-level vectors and the highest priority vector (V) are passed as the input parameters of the priority function. The master with the highest priority is chosen by the priority function, and the current master is updated after 1 clock cycle.
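The mask-based selection of the RR block can be sketched with integer bit vectors, where bit i of the request vector is set when master i is requesting. The exact mask definitions are an assumption because Fig. 3.5 is not reproduced here: Up_Mask covers the masters above the current one, Dn_Mask covers the current master and those below it, and the round-robin function picks the lowest set bit of the vector it receives. The function name rr_select is hypothetical.

```python
from typing import Optional


def rr_select(request_vec: int, current: int, n_masters: int = 8) -> Optional[int]:
    """Return the next master index, or None if nobody is requesting."""
    all_ones = (1 << n_masters) - 1
    dn_mask = (1 << (current + 1)) - 1      # bits 0..current
    up_mask = all_ones & ~dn_mask           # bits current+1..n_masters-1

    up_masked = request_vec & up_mask       # requesters above the current master
    dn_masked = request_vec & dn_mask       # requesters at or below it

    # Use the up-masked vector unless it is zero, as described in the text.
    chosen_vec = up_masked if up_masked != 0 else dn_masked
    if chosen_vec == 0:
        return None
    # Round-robin function (assumed): index of the lowest set bit.
    return (chosen_vec & -chosen_vec).bit_length() - 1


print(rr_select(0b0101, current=0, n_masters=4))  # 2
print(rr_select(0b0101, current=2, n_masters=4))  # 0
```

Preferring the up-masked vector makes the selection wrap around: requesters above the current master are served first, and only when there are none does the search restart from master 0.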
Fig. 3.6. Internal procedure of the P block.
FUTURE SCOPE
At present, many other on-chip bus architectures exist alongside the AMBA ML-AHB, but its high performance, speed, and reliability are steadily widening its use. The AMBA AHB architecture is currently confined to ARM, but owing to the advantages of the self-motivated (SM) ML-AHB, many other companies are moving towards implementing this architecture. At present, the SM ML-AHB transfers up to 1024 bits per clock period, and this rate can be increased further as technology improves.
The configuration of the SM arbitration scheme that gives the maximum performance needs to be found automatically at run time. In the future, the SM ML-AHB arbitration scheme may also be applicable to AMBA AXI.
CONCLUSION
In this paper, we proposed a flexible arbiter based on the SM arbitration scheme for the ML-AHB bus matrix. This arbiter supports three priority policies (fixed priority, round-robin, and dynamic priority) and three approaches to data multiplexing (transfer, transaction, and desired transfer length); in other words, there are nine possible arbitration schemes. In addition, the proposed SM arbiter selects one of the nine possible arbitration schemes based on the priority-level notifications and the desired transfer length from the masters so that the arbitration leads to the maximum performance. The proposed SM arbitration scheme occupies more area than the other arbitration schemes for the ML-AHB, but it gives better performance because it selects the input stage and output stage in a self-motivated manner. We therefore expect that it would be better to apply our SM arbitration scheme to an application-specific system because it is easy to tune the arbitration scheme according to the features of the target system.
