Dr. Y. Narasimha Murthy, Ph.D.
yayavaram@yahoo.com
applications. The ARM is supported by a toolkit which includes an instruction set emulator for
hardware modelling and software testing and benchmarking, an assembler, C and C++ compilers,
a linker and a symbolic debugger.
The 16-bit CISC microprocessors that were available in 1983 were slower than standard memory
parts. They also had instructions that took many clock cycles to complete (in some cases, many
hundreds of clock cycles), giving them very long interrupt latencies. As a result of these
frustrations with the commercial microprocessor offerings, the design of a proprietary
microprocessor was considered, and the ARM chip was designed.
ARM7TDMI-S Processor: The ARM7TDMI-S processor is a member of the ARM family of general-purpose 32-bit microprocessors. The ARM family offers high performance for very low power consumption and gate count. The ARM7TDMI-S processor has a Von Neumann architecture, with a single 32-bit data bus carrying both instructions and data. Only load, store, and swap instructions can access data from memory. The ARM7TDMI-S processor uses a three-stage pipeline to increase the speed of the flow of instructions to the processor. This enables several operations to take place simultaneously, and allows the processing and memory systems to operate continuously. In the three-stage pipeline each instruction passes through three stages: fetch, decode and execute.
The three-stage pipelined architecture of the ARM7 processor is shown in the above figure.
resources are limited. The Jazelle mode is used in ARM9 processors to execute 8-bit Java bytecode.
ARCHITECTURE OF ARM PROCESSORS:
The ARM7 processor is based on the Von Neumann model, with a single bus for both data and instructions (the ARM9 uses the Harvard model). Although a single shared bus reduces performance, this is largely overcome by the pipeline. ARM uses the Advanced Microcontroller Bus Architecture (AMBA). AMBA includes two buses: a system bus, either the Advanced High-performance Bus (AHB) or the Advanced System Bus (ASB), and the Advanced Peripheral Bus (APB).
The ARM processor consists of:
Arithmetic Logic Unit (32-bit)
One Booth multiplier (32-bit)
One barrel shifter
One control unit
Register file of 37 registers, each 32 bits wide.
In addition, the ARM contains a 32-bit program status register; special registers such as the instruction register, the memory data read and write registers and the memory address register; a priority encoder, which is used by the load multiple and store multiple instructions to indicate which registers in the register file are to be loaded or stored; and multiplexers.
ARM Registers: ARM has a total of 37 registers, of which 31 are 32-bit general-purpose registers and six are status registers. Not all of these registers are visible at once: the processor state and operating mode decide which registers are available to the programmer. At any time, only 16 of the 31 general-purpose registers are available to the user. The remaining 15 registers are used to speed up exception processing. There are two kinds of program status register: the CPSR and the SPSR (the current and saved program status registers, respectively).
In ARM state the registers r0 to r13 are orthogonal: any instruction that you can apply to r0 you can equally well apply to any of the other registers.
The main bank of 16 registers is used by all unprivileged code. These are the User mode registers. User mode differs from all other modes because it is unprivileged. In addition to this register bank, there is also one 32-bit Current Program Status Register (CPSR).
Of these 16 registers, r13 acts as the stack pointer, r14 as the link register and r15 as the program counter.
Register r13 is the stack pointer (sp), and it is used to store the address of the top of the stack. R13 is used implicitly by the PUSH and POP instructions.
Register 14 is the Link Register (LR). This register holds the address of the next instruction after
a Branch and Link (BL or BLX) instruction, which is the instruction used to make a subroutine
call. It is also used for return address information on entry to exception modes. At all other times,
R14 can be used as a general-purpose register.
Register 15 is the Program Counter (PC). It can be used in most instructions as a pointer to the
instruction which is two instructions after the instruction being executed.
The remaining 13 registers have no special hardware purpose.
CPSR: The ARM core uses the CPSR register to monitor and control internal operations. The CPSR is a dedicated 32-bit register and resides in the register file. The CPSR is divided into four fields, each 8 bits wide: flags, status, extension, and control. The extension and status fields are reserved for future use. The control field contains the processor mode, state, and interrupt mask bits. The flags field contains the condition flags. The 32-bit CPSR register is shown below.
Processor Modes: There are seven processor modes: six privileged modes (abort, fast interrupt request, interrupt request, supervisor, system, and undefined) and one non-privileged mode called user mode.
The processor enters abort mode when there is a failed attempt to access memory. Fast interrupt
request and interrupt request modes correspond to the two interrupt levels available on the ARM
processor. Supervisor mode is the mode that the processor is in after reset and is generally the
mode that an operating system kernel operates in. System mode is a special version of user mode
that allows full read-write access to the CPSR. Undefined mode is used when the processor
encounters an instruction that is undefined or not supported by the implementation. User mode is
used for programs and applications.
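The CPSR fields and the seven modes fit together as follows. This minimal Python sketch decodes a CPSR value into its flags, interrupt masks, state bit and mode. It assumes the ARMv4T bit positions (N, Z, C, V in bits 31 to 28; I in bit 7; F in bit 6; T in bit 5; mode in bits 4 to 0) and the standard mode encodings, which the text above does not list; the function name decode_cpsr is hypothetical.

```python
# Assumed ARMv4T CPSR bit positions.
N_BIT, Z_BIT, C_BIT, V_BIT = 31, 30, 29, 28  # flags field
I_BIT, F_BIT, T_BIT = 7, 6, 5                # control field
MODE_MASK = 0x1F                             # mode bits [4:0]

# Standard encodings of the seven processor modes.
MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "supervisor",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}


def decode_cpsr(cpsr: int) -> dict:
    """Split a 32-bit CPSR value into condition flags, masks, state and mode."""

    def bit(n: int) -> int:
        return (cpsr >> n) & 1

    mode_bits = cpsr & MODE_MASK
    if mode_bits not in MODES:
        raise ValueError(f"reserved mode encoding {mode_bits:#07b}")
    return {
        "N": bit(N_BIT), "Z": bit(Z_BIT), "C": bit(C_BIT), "V": bit(V_BIT),
        "I": bit(I_BIT),  # 1 = IRQ disabled
        "F": bit(F_BIT),  # 1 = FIQ disabled
        "T": bit(T_BIT),  # 1 = Thumb state, 0 = ARM state
        "mode": MODES[mode_bits],
        "privileged": MODES[mode_bits] != "user",
    }


if __name__ == "__main__":
    # Z and C set, IRQ and FIQ masked, ARM state, supervisor mode
    print(decode_cpsr(0x600000D3))
```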
Banked Registers: Out of the 37 registers, 20 registers are hidden from a program at different times. These registers are called banked registers and are identified by the shading in the diagram. They are available only when the processor is in a particular mode; for example, abort mode has banked registers r13_abt, r14_abt and spsr_abt. Banked registers of a particular mode are denoted by an underline character post-fixed to the mode mnemonic, or _mode.
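The figure of 20 banked registers can be checked by listing them. This sketch assumes the usual ARMv4T banking: FIQ mode banks r8 to r14 plus its own SPSR; IRQ, supervisor, abort and undefined modes each bank r13, r14 and an SPSR; user and system modes share the unbanked registers. The helper banked_registers is hypothetical.

```python
def banked_registers() -> dict:
    """List the registers each privileged mode replaces with its own copy."""
    banked = {"fiq": [f"r{n}_fiq" for n in range(8, 15)] + ["spsr_fiq"]}
    for mode in ("irq", "svc", "abt", "und"):
        banked[mode] = [f"r13_{mode}", f"r14_{mode}", f"spsr_{mode}"]
    return banked


if __name__ == "__main__":
    hidden = sum(len(regs) for regs in banked_registers().values())
    visible = 16 + 1  # r0-r15 and the CPSR, shared by user and system modes
    print(hidden, visible + hidden)  # 20 banked, 37 in total
```

FIQ mode banks the most registers (eight), which is one reason a fast interrupt handler can often avoid saving r8 to r12.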
The T bit in the CPSR selects the processor state. When T = 1, the processor is in Thumb state; when T = 0, it is in ARM state and executes ARM instructions. To change state, the core executes a specialized branch instruction. There are two interrupt request levels available on the ARM processor core: interrupt request (IRQ) and fast interrupt request (FIQ).
V, C, Z and N are the condition flags.
V (oVerflow): Set if the result causes a signed overflow.
C (Carry): Set if the result causes an unsigned carry.
Z (Zero): Set when the result of an operation is zero; frequently used to indicate equality.
N (Negative): Set when bit 31 of the result is a binary 1.
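How the four flags follow from an addition can be sketched in Python. The helper adds is hypothetical and models an ADDS-style 32-bit add; it assumes C is the unsigned carry out of bit 31 and V is set when both operands have the same sign but the result's sign differs.

```python
MASK32 = 0xFFFFFFFF


def adds(a: int, b: int) -> tuple:
    """Add two 32-bit values and return (result, N/Z/C/V flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": result >> 31,        # bit 31 of the result
        "Z": int(result == 0),    # result is zero
        "C": int(full > MASK32),  # unsigned carry out of bit 31
        # signed overflow: the operands share a sign that the result lacks
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1,
    }
    return result, flags


if __name__ == "__main__":
    print(adds(0x7FFFFFFF, 1))  # largest positive value + 1: N and V set
    print(adds(0xFFFFFFFF, 1))  # -1 + 1: Z and C set
```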
PIPELINE: Pipelining is the mechanism used by the RISC processor to execute instructions at an increased speed. The pipeline speeds up execution by fetching the next instruction while other instructions are being decoded and executed. During the execution of an instruction, the processor first fetches the instruction, i.e. loads it from memory; then decodes it, i.e. identifies the instruction to be executed; and finally executes the instruction and writes the result back to a register.
The ARM7 processor has a three-stage pipeline, namely Fetch, Decode and Execute, while the ARM9 has a five-stage pipeline. The three-stage pipeline is explained below.
To explain pipelining, consider three instructions: Compare (CMP), Subtract (SUB) and Add (ADD). The ARM7 processor fetches the first instruction, CMP, in the first cycle. During the second cycle it decodes CMP and at the same time fetches SUB. During the third cycle it executes CMP while decoding SUB and fetching the third instruction, ADD. Overlapping the stages in this way improves the speed of operation; it is a form of parallel processing. This pipeline example is shown in the following diagram.
As the pipeline length increases, the amount of work done at each stage is reduced, which allows
the processor to attain a higher operating frequency. This in turn increases the performance. One
important feature of this pipeline is that executing a branch instruction, or branching by
directly modifying the PC, causes the ARM core to flush its pipeline.
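The CMP, SUB, ADD walk-through above can be reproduced with a small Python sketch of an idealized three-stage pipeline. It assumes every stage takes exactly one cycle and ignores stalls and branch flushes; pipeline_schedule is a hypothetical helper.

```python
STAGES = ("Fetch", "Decode", "Execute")


def pipeline_schedule(instructions: list) -> list:
    """Return (cycle, {stage: instruction}) rows for an ideal 3-stage pipeline."""
    depth = len(STAGES)
    rows = []
    # n instructions need n + depth - 1 cycles to drain the pipeline
    for cycle in range(1, len(instructions) + depth):
        occupancy = {}
        for stage_index, stage in enumerate(STAGES):
            # instruction i is in stage s during cycle i + s + 1
            i = cycle - 1 - stage_index
            if 0 <= i < len(instructions):
                occupancy[stage] = instructions[i]
        rows.append((cycle, occupancy))
    return rows


if __name__ == "__main__":
    for cycle, occupancy in pipeline_schedule(["CMP", "SUB", "ADD"]):
        print(cycle, occupancy)
```

In cycle 3 all three stages are busy at once, which is where the speed-up comes from; three instructions finish in five cycles instead of nine.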
Exceptions, Interrupts, and the Vector Table :
Exceptions are generated by internal and external sources to cause the ARM processor to handle
an event, such as an externally generated interrupt or an attempt to execute an Undefined
instruction. The processor state just before handling the exception is normally preserved so that
the original program can be resumed after the completion of the exception routine. More than
one exception can arise at the same time. ARM exceptions may be considered in three groups:
1. Exceptions generated as the direct effect of executing an instruction. Software interrupts, undefined instructions (including coprocessor instructions where the requested coprocessor is absent) and prefetch aborts (instructions that are invalid due to a memory fault occurring during fetch) come under this group.
2. Exceptions generated as a side-effect of an instruction. Data aborts (a memory fault during a load or store data access) are in this group.
3. Exceptions generated externally, unrelated to the instruction flow. Reset, IRQ and FIQ are in this group.
The ARM architecture supports seven types of exceptions:
i. Reset
ii. Undefined Instruction
iii. Software Interrupt (SWI)
iv. Prefetch Abort (instruction fetch memory fault)
v. Data Abort (data access memory fault)
vi. IRQ (normal interrupt)
vii. FIQ (fast interrupt request)
When an Exception occurs , the processor performs the following sequence of actions:
It changes to the operating mode corresponding to the particular exception.
It saves the address of the instruction following the exception entry instruction in r14 of the
new mode.
It saves the old value of the CPSR in the SPSR of the new mode.
It disables IRQs by setting bit 7 of the CPSR and, if the exception is a fast interrupt, disables
further fast interrupts by setting bit 6 of the CPSR.
It forces the PC to begin executing at the relevant vector address.
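The entry sequence above can be modelled in Python as a rough sketch. The CpuState class and enter_exception function are hypothetical; the mode encodings are the standard ARM values, the return address is passed in by the caller (its exact offset depends on the exception type and is not modelled), and clearing the T bit reflects the later remark that exceptions taken in Thumb state switch to ARM execution. Real hardware performs these steps together, so the sketch computes everything from the old CPSR.

```python
from dataclasses import dataclass, field

MODE_BITS = {"fiq": 0x11, "irq": 0x12, "svc": 0x13, "abt": 0x17, "und": 0x1B}
I_MASK, F_MASK, T_MASK = 1 << 7, 1 << 6, 1 << 5


@dataclass
class CpuState:
    cpsr: int
    pc: int
    lr: dict = field(default_factory=dict)    # banked r14 per mode
    spsr: dict = field(default_factory=dict)  # banked SPSR per mode


def enter_exception(cpu: CpuState, mode: str, vector: int, return_address: int) -> None:
    """Apply the exception entry steps to a simplified CPU state."""
    old_cpsr = cpu.cpsr
    cpu.lr[mode] = return_address                    # r14 of the new mode
    cpu.spsr[mode] = old_cpsr                        # old CPSR into the new SPSR
    new_cpsr = (old_cpsr & ~0x1F) | MODE_BITS[mode]  # switch mode
    new_cpsr |= I_MASK                               # disable IRQs (bit 7)
    if mode == "fiq":
        new_cpsr |= F_MASK                           # also disable FIQs (bit 6)
    new_cpsr &= ~T_MASK                              # handlers run in ARM state
    cpu.cpsr = new_cpsr & 0xFFFFFFFF
    cpu.pc = vector                                  # jump to the vector
```

For example, an IRQ taken in user mode (CPSR 0x10) leaves CPSR = 0x92, SPSR_irq = 0x10 and the PC at the IRQ vector.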
Exception / Interrupt     Name    Address       High Address
Reset                     RESET   0x00000000    0xFFFF0000
Undefined Instruction     UNDEF   0x00000004    0xFFFF0004
Software Interrupt        SWI     0x00000008    0xFFFF0008
Prefetch Abort            PABT    0x0000000C    0xFFFF000C
Data Abort                DABT    0x00000010    0xFFFF0010
Reserved                  --      0x00000014    0xFFFF0014
Interrupt Request         IRQ     0x00000018    0xFFFF0018
Fast Interrupt Request    FIQ     0x0000001C    0xFFFF001C
The exception vector table shown above gives the address to which the processor branches when each exception or interrupt occurs. Each vector table entry contains a form of branch instruction pointing to the start of a specific routine.
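The vector table lookup can be expressed as a small Python sketch. The offsets come from the table above; how a particular core selects the high-vector base is outside this sketch and is represented only by a boolean parameter. The helper vector_address is hypothetical.

```python
# Offsets of each exception vector from the table base (0x14 is reserved).
VECTOR_OFFSETS = {
    "reset": 0x00, "undef": 0x04, "swi": 0x08, "pabt": 0x0C,
    "dabt": 0x10, "irq": 0x18, "fiq": 0x1C,
}
LOW_BASE, HIGH_BASE = 0x00000000, 0xFFFF0000


def vector_address(exception: str, high_vectors: bool = False) -> int:
    """Return the address the PC is forced to for the given exception."""
    base = HIGH_BASE if high_vectors else LOW_BASE
    return base + VECTOR_OFFSETS[exception]
```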
Reset vector is the location of the first instruction executed by the processor when power is
applied. This instruction branches to the initialization code.
Undefined instruction vector is used when the processor cannot decode an instruction.
Software interrupt vector is called when you execute a SWI instruction. The SWI instruction is
frequently used as the mechanism to invoke an operating system routine.
Pre-fetch abort vector occurs when the processor attempts to fetch an instruction from an address
without the correct access permissions. The actual abort occurs in the decode stage.
Data abort vector is similar to a prefetch abort but is raised when an instruction attempts to
access data memory without the correct access permissions.
Interrupt request vector is used by external hardware to interrupt the normal execution flow of
the processor. It can only be raised if IRQs are not masked in the CPSR.
ARM Processor Families
There are various ARM processors available in the market for different applications. These are grouped into families based on the core: the ARM7, ARM9, ARM10, and ARM11 cores. The following table gives a brief comparison of their performance and available resources.
The ARM7 core has a Von Neumann-style architecture, where both data and instructions use the
same bus. The core has a three-stage pipeline and executes the architecture ARMv4T instruction
set. The ARM7TDMI was introduced in 1995 by ARM. It is currently a very popular core and is
used in many 32-bit embedded processors.
The ARM9 family was released in 1997. It has a five-stage pipeline architecture; hence, the ARM9 processor can run at higher clock frequencies than the ARM7 family. The extra stages improve the overall performance of the processor. The memory system has been redesigned to follow the Harvard architecture, with separate data and instruction buses. The first processor in the ARM9 family was the ARM920T, which includes a separate D + I cache and an MMU. This processor can be used by operating systems requiring virtual memory support. The ARM922T is a variation on the ARM920T but with half the D + I cache size.
The latest core in the ARM9 product line is the ARM926EJ-S synthesizable processor core,
announced in 2000. It is designed for use in small portable Java-enabled devices such as 3G
phones and personal digital assistants (PDAs).
The ARM10 was released in 1999. It extends the ARM9 pipeline to six stages. It also supports an optional vector floating-point (VFP) unit, which adds a seventh stage to the ARM10 pipeline. The VFP significantly increases floating-point performance and is compliant with the IEEE 754-1985 floating-point standard.
The ARM1136J-S is the ARM11 processor released in 2003; it is designed for high-performance and power-efficient applications. It is the first implementation to execute architecture ARMv6 instructions.
ARM Family   Year of Release   Architecture   Pipeline   Operational Frequency   Multiplier   MIPS/MHz
ARM7         1995              Von Neumann    3 stage    80 MHz                  8x32         0.97
ARM9         1997              Harvard        5 stage    150 MHz                 8x32         1.1
ARM10        1999              Harvard        6 stage    260 MHz                 16x32        1.3
ARM11        2003              Harvard        8 stage    335 MHz                 16x32        1.2
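A rough peak instruction rate follows from multiplying the last two numeric columns. This Python sketch assumes the last column is MIPS per MHz and treats the product as an idealized peak, not a measured benchmark; ARM_FAMILIES and peak_mips are hypothetical names.

```python
# Figures from the comparison table: (operating frequency in MHz, MIPS/MHz).
ARM_FAMILIES = {
    "ARM7": (80, 0.97),
    "ARM9": (150, 1.1),
    "ARM10": (260, 1.3),
    "ARM11": (335, 1.2),
}


def peak_mips(freq_mhz: float, mips_per_mhz: float) -> float:
    """Rough peak throughput: instructions per MHz times clock rate in MHz."""
    return freq_mhz * mips_per_mhz


if __name__ == "__main__":
    for family, (freq, ratio) in ARM_FAMILIES.items():
        print(f"{family}: about {peak_mips(freq, ratio):.1f} MIPS")
```

Under this reading the ARM7 reaches roughly 77.6 MIPS and the ARM11 roughly 402 MIPS, so most of the gain comes from the higher clock rates that the longer pipelines allow.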
i. Move Instructions: The move instruction copies R into a destination register Rd, where R is a register or an immediate value. This instruction is useful for setting initial values and transferring data between registers.
Example1 :
PRE
r5 = 5
r7 = 8
MOV r7, r5 ;
POST
r5 = 5
r7 = 5
The MOV instruction takes the contents of register r5 and copies them into register r7.
Example 2:
SUBS r1, r1, #1 ; The SUBS instruction is useful for decrementing loop counters. In this example we subtract the immediate value one from the value one stored in register r1. The result, zero, is written to register r1, and because of the S suffix the Z flag in the CPSR is set.
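The loop-counter use of SUBS can be sketched in Python. The helpers subs and count_loop_iterations are hypothetical; the sketch assumes ARM's convention that C is set when a subtraction needs no borrow, and models a SUBS/BNE countdown loop.

```python
MASK32 = 0xFFFFFFFF


def subs(a: int, b: int) -> tuple:
    """Subtract b from a (32-bit) and return (result, N/Z/C/V flags)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(a >= b),  # ARM sets C when the subtraction needs no borrow
        "V": (((a ^ b) & (a ^ result)) >> 31) & 1,  # signed overflow
    }
    return result, flags


def count_loop_iterations(start: int) -> int:
    """Run a SUBS/BNE-style countdown and count loop bodies (assumes start >= 1)."""
    r1, iterations = start, 0
    while True:
        iterations += 1          # loop body
        r1, flags = subs(r1, 1)  # SUBS r1, r1, #1
        if flags["Z"]:           # BNE falls through once r1 reaches zero
            return iterations
```

With r1 = 1, subs(1, 1) returns 0 with Z and C set, matching the example above.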
Logical Instructions: These instructions perform bitwise logical operations on the two source registers.
BIC r0, r1, r2 ; BIC carries out a logical bit clear. Register r2 contains a binary pattern where every binary 1 in r2 clears the corresponding bit location in register r1, and the result is written to r0. This instruction is particularly useful when clearing status bits and is frequently used to change interrupt masks in the CPSR.
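BIC computes Rn AND NOT Rm. This one-function Python sketch (the names bic and IRQ_DISABLE are hypothetical) shows the semantics and the interrupt-mask use mentioned above.

```python
MASK32 = 0xFFFFFFFF
IRQ_DISABLE = 1 << 7  # I bit of the CPSR


def bic(rn: int, rm: int) -> int:
    """BIC: clear in rn every bit that is set in rm (rn AND NOT rm)."""
    return rn & ~rm & MASK32
```

For example, bic(0xD3, IRQ_DISABLE) gives 0x53: the I bit is cleared, so IRQs become enabled, while the mode bits are left untouched.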
Comparison Instructions: The comparison instructions are used to compare or test a register with a 32-bit value. They update only the condition flags in the CPSR; no destination register is written.
Branch Instructions: A branch instruction changes the normal flow of execution of a main program or is used to call a subroutine. This type of instruction allows programs to have subroutines, if-then-else structures, and loops. The change of execution flow forces the program counter (pc) to point to a new address.
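Example 1: In a forward branch, the instruction B forward transfers control to the label forward, which appears later in the program, so the instructions in between are skipped.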
Similarly, a backward branch:
backward
ADD r1, r2, #4
SUB r1, r2, #4
ADD r4, r6, r7
B backward ; branch back to the label backward
The branch with link (BL) instruction is similar to the B instruction but overwrites the link register lr with a return address. It performs a subroutine call.
BL subroutine ; branch to subroutine
CMP r1, #5 ; compare r1 with 5
MOVEQ r1, #0 ; if (r1==5) then r1 = 0
...
subroutine
... ; subroutine code
MOV pc, lr ; return by moving pc = lr
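The call and return above can be modelled in a few lines of Python. This sketch assumes ARM state (4-byte instructions), so BL stores the address of the following instruction in lr; the addresses and the helper names bl and return_from_subroutine are hypothetical.

```python
def bl(pc: int, target: int) -> tuple:
    """BL in ARM state: lr gets the address of the next instruction, pc the target."""
    lr = pc + 4        # ARM instructions are 4 bytes long
    return target, lr  # (new pc, new lr)


def return_from_subroutine(lr: int) -> int:
    """MOV pc, lr: resume at the saved return address."""
    return lr
```

A BL at 0x8000 to a subroutine at 0x9000 sets lr to 0x8004, so MOV pc, lr resumes at the CMP that follows the call.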
The Branch Exchange (BX) and Branch Exchange with Link (BLX) are the third type
of branch instruction. The BX instruction uses an absolute address stored in register
Rm. It is primarily used to branch to and from Thumb code. The T bit in the cpsr is
updated by the least significant bit of the branch register. Similarly the BLX
instruction updates the T bit of the cpsr with the least significant bit and additionally
sets the link register with the return address.
The details of the branch instructions are given in the table above.
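The way BX uses bit 0 of Rm can be sketched as follows. This is a simplified model (the name bx is hypothetical): it only separates bit 0 into the T bit and the remaining bits into the target address, and does not check alignment of ARM-state targets.

```python
MASK32 = 0xFFFFFFFF


def bx(rm: int) -> tuple:
    """BX Rm: bit 0 of Rm selects the state, the rest gives the branch target."""
    t_bit = rm & 1              # 1 = Thumb state, 0 = ARM state
    target = rm & ~1 & MASK32   # bit 0 is not part of the address
    return target, t_bit
```

So BX r0 with r0 = 0x8001 branches to 0x8000 in Thumb state, while r0 = 0x9000 branches to 0x9000 in ARM state.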
Ex1: STR r0, [r1] ; store the contents of register r0 to the memory address pointed to by register r1.
Ex2: LDR r0, [r1] ; = LDR r0, [r1, #0] ; load register r0 with the contents of the memory address pointed to by register r1.
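A word store and load through a register address can be sketched in Python with memory modelled as a bytearray. The sketch assumes a little-endian memory system and word-aligned addresses (ARM cores can also be configured big-endian); str_word and ldr_word are hypothetical names.

```python
def str_word(memory: bytearray, address: int, value: int) -> None:
    """STR: write a 32-bit value to memory, least significant byte first."""
    memory[address:address + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


def ldr_word(memory: bytearray, address: int) -> int:
    """LDR: read a 32-bit value back from memory."""
    return int.from_bytes(memory[address:address + 4], "little")
```

Storing 0x12345678 at address 4 and loading it back returns the same word, and the byte at address 4 holds 0x78.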
operations. The pop operation (removing data from a stack) uses a load multiple
instruction; similarly, the push operation (placing data onto the stack) uses a store multiple
instruction.
A stack is either ascending (A) or descending (D). Ascending stacks grow towards higher memory addresses; in contrast, descending stacks grow towards lower memory addresses.
When a full stack (F) is used, the stack pointer sp points to the last used or full location (i.e., sp points to the last item on the stack). In contrast, if an empty stack (E) is used, sp points to the first unused or empty location (i.e., it points after the last item on the stack).
Example 1: The STMFD instruction pushes registers onto the stack, updating sp.
STMFD sp!, {r1, r4} ; Store Multiple Full Descending stack
PRE
r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080014
POST
r1 = 0x00000002
r4 = 0x00000003
sp = 0x0008000c.
Example 2: The STMED instruction pushes the registers onto the stack but updates register sp to point to the next empty location, as shown in the diagram below.
PRE
r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080010
STMED sp!, {r1, r4} ; Store Multiple Empty Descending stack
POST
r1 = 0x00000002
r4 = 0x00000003
sp = 0x00080008
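The difference between the two descending stacks can be reproduced with a Python sketch that computes where each register lands. It assumes the register values are given in ascending register order and that the lowest-numbered register goes to the lowest address; push_descending is a hypothetical helper.

```python
def push_descending(sp: int, values: list, kind: str) -> tuple:
    """Model STMFD/STMED sp!, {...}: return (new sp, {address: value})."""
    n = len(values)
    if kind == "FD":    # full: sp points at the last item, so move down first
        start = sp - 4 * n
    elif kind == "ED":  # empty: sp points at a free slot, so store there first
        start = sp - 4 * n + 4
    else:
        raise ValueError("kind must be 'FD' or 'ED'")
    # The lowest-numbered register always goes to the lowest address.
    stored = {start + 4 * i: v for i, v in enumerate(values)}
    return sp - 4 * n, stored
```

With the PRE values of Examples 1 and 2, both calls return the POST stack pointers (0x0008000C and 0x00080008); in the empty-descending case the new sp is left pointing at the free word below the stored data.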
Swap Instruction:
The swap instruction is a special case of a load-store instruction. It swaps (exchanges) the contents of memory with the contents of a register. This instruction is an atomic operation: it reads and writes a location in the same bus operation, preventing any other instruction from reading or writing to that location until it completes. A swap cannot be interrupted by any other instruction or any other bus access, so the system holds the bus until the transaction is complete.
Ex 1: SWP : swap a word between memory and a register.
Ex 2: SWPB : swap a byte between memory and a register.
Ex 3: SWP r0, r1, [r2] ; The swap instruction loads a word from the memory address held in r2 into register r0 and overwrites that memory location with register r1.
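The data movement of SWP r0, r1, [r2] can be sketched in Python, with memory as a hypothetical dictionary from addresses to words. The sketch shows only the exchange; it does not model the atomic bus locking that makes the real instruction safe, and SWPB would perform the same exchange on a single byte.

```python
def swp(memory: dict, address: int, rm: int) -> int:
    """SWP Rd, Rm, [Rn]: return the old word (for Rd) and store Rm in memory."""
    old = memory[address]            # read the memory word
    memory[address] = rm & 0xFFFFFFFF  # write the register value in its place
    return old                       # this value ends up in Rd
```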
Software Interrupt Instruction: A software interrupt instruction (SWI) is used to generate a software interrupt exception, which can be used to call operating system routines. When the processor executes an SWI instruction, it sets the program counter pc to the offset 0x8 in the vector table. The instruction also forces the processor mode to SVC, which allows an operating system routine to be called in a privileged mode. Each SWI instruction has an associated SWI number, which is used to represent a particular function call or feature.
Ex: 0x00008000 SWI 0x123456 ; an SWI at address 0x00008000 with SWI number 0x123456
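In ARM state the SWI number is held in the low 24 bits of the instruction word, below a 4-bit condition code and the 0b1111 opcode. This Python sketch (encode_swi and swi_number are hypothetical names) builds and decodes such a word; an SWI handler commonly recovers the number by reading the instruction at lr - 4, which is 0x00008000 in the example above.

```python
AL = 0xE  # "always" condition code


def encode_swi(number: int, cond: int = AL) -> int:
    """Build an ARM-state SWI instruction: cond | 0b1111 | 24-bit SWI number."""
    return (cond << 28) | (0xF << 24) | (number & 0x00FFFFFF)


def swi_number(instruction: int) -> int:
    """Recover the SWI number from the low 24 bits of the instruction."""
    return instruction & 0x00FFFFFF


if __name__ == "__main__":
    insn = encode_swi(0x123456)
    print(hex(insn), hex(swi_number(insn)))
```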
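MRS r1, cpsr ; copy the CPSR into r1
BIC r1, r1, #0x80 ; clear bit 7 (the I bit)
MSR cpsr_c, r1 ; write the control field back to the CPSR
The MRS instruction first copies the CPSR into register r1. The BIC instruction clears bit 7 of r1. Register r1 is then copied back into the CPSR, which enables IRQ interrupts. The code leaves all the other settings in the CPSR intact and modifies only the I bit in the control field.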
Loading Constants: The ARM instruction set has no instruction that moves an arbitrary 32-bit constant into a register. Since ARM instructions are themselves 32 bits in size, they cannot specify a general 32-bit constant. To overcome this problem, two pseudo-instructions are provided: LDR Rd, =constant and ADR Rd, label. For example, LDR r0, =0xff00ffff is assembled as a PC-relative load from a literal pool entry, DCD 0xff00ffff.
Here the LDR instruction loads the 32-bit constant 0xff00ffff into register r0.
Example 3: The same constant can also be loaded into register r0 using the MVN instruction.
MVN r0, #0x00ff0000
After execution, r0 = 0xff00ffff.
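Why MVN works here, while a direct MOV of 0xff00ffff does not, can be checked in Python. The sketch assumes the ARM data-processing immediate format, an 8-bit value rotated right by an even number of bits; the helper names rol32, is_arm_immediate and mvn are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & MASK32 if r else x & MASK32


def is_arm_immediate(value: int) -> bool:
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    # Undo each possible even rotation and check whether 8 bits remain.
    return any(rol32(value, rot) <= 0xFF for rot in range(0, 32, 2))


def mvn(value: int) -> int:
    """MVN: move the bitwise NOT of the operand."""
    return ~value & MASK32
```

0x00ff0000 is encodable (0xFF rotated), but 0xff00ffff is not, so the assembler either uses MVN with the inverted value or falls back to a literal pool.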
Introduction to Thumb instruction set: Thumb encodes a subset of the 32-bit ARM instructions into a 16-bit instruction set space. Thumb has higher performance than ARM on a processor with a 16-bit data bus, but lower performance than ARM on a 32-bit data bus, so Thumb is best used for memory-constrained systems. Thumb has higher code density (the space taken up in memory by an executable program) than ARM. For memory-constrained embedded systems, for example mobile phones and PDAs, code density is very important. Cost pressures also limit memory size, width, and speed.
Thumb execution is flagged by the T bit (bit [5]) in the CPSR. A Thumb implementation of the same code takes up around 30% less memory than the equivalent ARM implementation. Even though the Thumb implementation uses more instructions, the overall memory footprint is reduced. Code density was the main driving force for the Thumb instruction set, which was also designed as a compiler target rather than for hand-written assembly code. The example below shows the difference between ARM and Thumb code.
From the above example it is clear that Thumb code is denser than ARM code.
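The trade-off of more instructions but fewer bytes is simple arithmetic, since ARM instructions are 4 bytes and Thumb instructions 2 bytes. The instruction counts in this Python sketch (10 ARM versus 14 Thumb) are hypothetical and are not taken from the example above; code_size and thumb_saving_percent are illustrative names.

```python
ARM_WIDTH, THUMB_WIDTH = 4, 2  # bytes per instruction


def code_size(count: int, width: int) -> int:
    """Memory taken by count instructions of the given width."""
    return count * width


def thumb_saving_percent(arm_count: int, thumb_count: int) -> float:
    """Percentage of memory saved by the Thumb version of a routine."""
    arm_bytes = code_size(arm_count, ARM_WIDTH)
    thumb_bytes = code_size(thumb_count, THUMB_WIDTH)
    return 100 * (arm_bytes - thumb_bytes) / arm_bytes


if __name__ == "__main__":
    # Hypothetical routine: 10 ARM instructions vs 14 Thumb instructions.
    print(thumb_saving_percent(10, 14))
```

Here 40 bytes of ARM code become 28 bytes of Thumb code, a 30% saving, even though the Thumb version needs 40% more instructions.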
Exceptions generated during Thumb execution switch to ARM execution before executing the exception handler. The state of the T bit is preserved in the SPSR, and the LR of the exception mode is set so that the normal return instruction performs correctly, regardless of whether the exception occurred during ARM or Thumb execution.
In Thumb state, not all registers can be accessed freely. Only the low registers r0 to r7 are fully accessible. The high registers r8 to r12 are accessible only with the MOV, ADD, or CMP instructions. CMP and all the data processing instructions that operate on low registers update the condition flags in the CPSR.
The list of registers and their accessibility in Thumb mode is shown in the following table.
S.No   Registers    Access
1      r0 - r7      Fully accessible
2      r8 - r12     Only accessible by MOV, ADD & CMP
3      r13 (SP)     Limited accessibility
4      r14 (LR)     Limited accessibility
5      r15 (PC)     Limited accessibility
6      CPSR         Only indirect access
7      SPSR         No access
From the above discussion, it is clear that there are no Thumb equivalents of the MSR and MRS instructions. To alter the CPSR or SPSR, one must switch into ARM state to use MSR and MRS. Similarly, there are no coprocessor instructions in Thumb state; you need to be in ARM state to access the coprocessor for configuring cache and memory management.
ARM-Thumb interworking is the method of linking ARM and Thumb code together for both
assembly and C/C++. It handles the transition between the two states. To call a Thumb routine
from an ARM routine, the core has to change state. This is done with the T bit of the CPSR. The
BX and BLX branch instructions cause a switch between ARM and Thumb state while branching
to a routine. The BX lr instruction returns from a routine, also with a state switch if necessary.
The data processing instructions manipulate data within registers. They include move
instructions, arithmetic instructions, shifts, logical instructions, comparison instructions, and
multiply instructions. The Thumb data processing instructions are a subset of the ARM data
processing instructions.
Examples:
MOV : move a 32-bit value into a register: Rd = Rn; Rd = Rm
MUL : multiply two 32-bit values: Rd = (Rm * Rd)[31:0]
MVN : move the logical NOT of a 32-bit value into a register: Rd = NOT(Rm)
NEG : negate a 32-bit value: Rd = 0 - Rm
ORR : logical bitwise OR of two 32-bit values: Rd = Rd OR Rm
ROR : rotate right a 32-bit value: Rd = Rd RIGHT_ROTATE Rs, C flag = Rd[Rs-1]
SBC : subtract with carry a 32-bit value: Rd = Rd - Rm - NOT(C flag)
SUB : subtract two 32-bit values: Rd = Rn - immediate; Rd = Rd - immediate; Rd = Rn - Rm; sp = sp - (immediate << 2)
TST : test bits of a 32-bit value: Rn AND Rm sets flags
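Three of the less obvious entries in the list above, NEG, SBC and ROR, can be sketched in Python to check the formulas. The function names mirror the mnemonics but are hypothetical helpers; ror handles only rotate amounts from 1 to 31 and ignores the special cases of Rs = 0 and Rs >= 32.

```python
MASK32 = 0xFFFFFFFF


def neg(rm: int) -> int:
    """NEG: Rd = 0 - Rm."""
    return (0 - rm) & MASK32


def sbc(rd: int, rm: int, c_flag: int) -> int:
    """SBC: Rd = Rd - Rm - NOT(C flag)."""
    return (rd - rm - (1 - c_flag)) & MASK32


def ror(rd: int, rs: int) -> tuple:
    """ROR: rotate Rd right by Rs; C takes bit Rs-1 of the original Rd."""
    if not 1 <= rs <= 31:
        raise ValueError("sketch handles rotate amounts 1..31 only")
    rd &= MASK32
    carry = (rd >> (rs - 1)) & 1  # last bit rotated out of the bottom
    result = ((rd >> rs) | (rd << (32 - rs))) & MASK32
    return result, carry
```

Because C means "no borrow" on ARM, sbc with C = 1 is a plain subtraction, and with C = 0 it subtracts one more.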
Note : Thumb deviates from the ARM style in that the barrel shift operations (ASR, LSL, LSR,
and ROR) are separate instructions.
The author is thankful to the following for their invaluable information.
References:
1. ARM System Developer's Guide: Designing and Optimizing System Software.
2. ARM Limited.