PROGRAMMABLE DSPs

9/21/2013
CHAPTER 3
PROGRAMMABLE DSPS
SYLLABUS
Introduction, Commercial Digital Signal-processing

De ices
Devices
Data Addressing Modes of TMS32OC54xx.
Memory Space of TMS32OC54xx Processors
Program Control.
9/21/2013
TMS320C54XX
Summary of the Architectural Features of three fixed-Points DSPsTexas Instruments, Motorola & Analog Devices
9/21/2013
Summary of the Architectural Features of three fixed-Points DSPsTexas Instruments, Motorola & Analog Devices
ARCHITECTURE OF
TMS320C54XX
9/21/2013
ARCHITECTURE OF TMS320C54XX
TMS320C54xx processors retains the basic Harvard

architecture of their predecessor, TMS320C25, but have
several additional features, which improve their
performance over it.
They have one program and three data memory spaces
with separate buses, which provide simultaneous accesses
to program instruction and two data operands and enables
writing of result at the same time.
Part of the memory is implemented on-chip and consists of
combinations of ROM, dual-access RAM, and single-access
RAM.
Transfers between the memory spaces are also possible.
The central processing unit (CPU) of TMS320C54xx

processors consists of a 40-bit
40 bit arithmetic logic unit
(ALU), two 40-bit accumulators, a barrel shifter, a
17x17 multiplier, a 40-bit adder, data address
generation logic (DAGEN) with its own arithmetic unit,
and program address generation logic (PAGEN).
These major functional units are supported by a number
of registers and logic in the architecture.
Several peripherals, such as a clock generator, a
hardware timer, a wait state generator, parallel I/O
ports, and serial I/O ports, are also provided on-chip.
These peripherals make it convenient to interface the
signal processors to the outside world.
9/21/2013
A powerful instruction set with a hardware-supported,

single instr ction repeat and block repeat operations
single-instruction
operations,
block memory move instructions, instructions that pack
two or three simultaneous reads, and arithmetic
instructions with parallel store and load make these
devices very efficient for running high-speed DSP
algorithms.
algorithms
ARCHITECTURE OF TMS320C54XXBUS STRUCTURE
Eight major 16-bit buses (four program/data buses and four address buses):
The p
program
g
bus (PB)
( ) carries the instruction code and immediate operands
p
from program memory.
Three data buses (CB, DB, and EB) interconnect to various elements, such as
the CPU, data address generation logic, program address generation logic,
on-chip peripherals, and data memory.
The CB and DB carry the operands that are read from data memory.
The EB carries the data to be written to memory.
F
Four
address
dd
buses
b
(PAB,
(PAB CAB,
CAB DAB
DAB, and
d EAB) carry the
h addresses
dd
needed
d d for
f
instruction execution (corresponding to PB, CB,DB & EB respectively).
The C54x DSP can generate up to two data-memory addresses per cycle
using the two auxiliary register arithmetic units (ARAU0 and ARAU1) in
DAGEN block.
This enables accessing two operands simultaneously.
9/21/2013
ARCHITECTURE OF TMS320C54XXBUS STRUCTURE
The PB can carry data operands stored in program space (for instance,
a coefficient table) to the multiplier and adder for multiply/accumulate
operations or to a destination in data space for data move instructions
(MVPD and READA). This capability, in conjunction with the feature of
dual-operand read, supports the execution of single-cycle, 3-operand
instructions such as the FIRS instruction.
The C54x DSP also has an on-chip
p bidirectional bus for accessing
g onchip peripherals. This bus is connected to DB and EB through the bus
exchanger in the CPU interface. Accesses that use this bus can require
two or more cycles for reads and writes, depending on the peripherals
structure.
ARCHITECTURE OF TMS320C54XX-CPU
The C54x CPU contains:

40-bit
40 bi
arithmetic
i h i llogic
i uniti (ALU)
Two 40-bit accumulators
Barrel shifter
17 17-bit multiplier
40-bit adder
Compare, select, and store unit (CSSU)
Data address generation unit (DAGEN)
Program address generation unit (PAGEN)
Exponent encoder (EXP)
9/21/2013
CPU
The C54x DSP performs 2s-complement arithmetic with a 40-bit

arithmetic
ith ti llogic
i unitit (ALU) and
d ttwo 40
40-bit
bit accumulators
l t
(accumulators A and B).
The ALU can also perform bit level Boolean operations. The
ALU uses these inputs:
ALU
16-bit immediate value

16-bit word from data memory
16-bit
16
bit value in the temporary register, T
Two 16-bit words from data memory
32-bit word from data memory
40-bit word from either accumulator
The ALU can also function as two 16-bit ALUs and perform two
16-bit operations simultaneously.
CPU
ALU
9/21/2013
CPU
The 40-bit ALU, implements a wide range of arithmetic

and logical functions, most of which execute in a single
clock cycle.
After an operation is performed in the ALU, the result is
usually transferred to a destination accumulator
(accumulator A or B).
Instructions that perform memory-to memory operations
(ADDM, ANDM, ORM, and XORM) are exceptions.
CPU
ALU
ALU
The Y input source to the ALU is any of three values:

The value in one of the accumulators (A or B)
A data-memory operand from data bus CB
The value in the T register
When a 16-bit data-memory operand is fed through data

bus CB or DB, the 40-bit ALU input is constructed in one of
two ways:
If b
bits 15 through
h
h 0 contain the
h d
data-memory operand,
d bits
b 39
through 16 are zero filled (SXM = 0) or sign-extended (SXM = 1).
If bits 31 through 16 contain the data-memory operand, bits 15
through 0 are zero filled, and bits 39 through 32 are either zero
filled (SXM = 0) or sign extended (SXM = 1).
9/21/2013
ALU
OVERFLOW HANDLING
The ALU saturation logic prevents a result from overflowing

by keeping the result at a maximum (or minimum) value.
value This
feature is useful for filter calculations.
The logic is enabled when the overflow mode bit (OVM) in
status register ST1 is set.
When a result overflows:
If OVM = 0,
0 the
h accumulators
l
are loaded
l d d with
i h the
h ALU resultl
without modification.
If OVM = 1, the accumulators are loaded with either the most
positive 32-bit value (00 7FFF FFFFh) or the most negative 32-bit
value (FF 8000 0000h), depending on the direction of the overflow.
ALU
OVERFLOW HANDLING
The
overflow flag (OVA/OVB) in status register ST0 is

set for the destination accumulator and remains set until
one of the following occurs:
A
reset is performed.
A conditional instruction (such as a branch, a return, a call, or
an execute) is executed on an overflow condition.
The overflow flag (OVA/OVB) is cleared.
cleared
9/21/2013
ALU
The ALU has an associated carry bit (C) that is affected by

most arithmetic ALU instructions, including
g rotate and shift
operations.
The carry bit supports efficient computation of extendedprecision arithmetic operations.
The carry bit is not affected by loading the accumulator,
performing logical operations, or executing other nonarithmetic or control instructions, so it can be used for overflow
management.
management
Two conditional operands, C and NC, enable branching,
calling, returning, and conditionally executing according to the
status (set or cleared) of the carry bit.
Also, the RSBX and SSBX instructions can be used to load the
carry bit. The carry bit is set on a hardware reset.
ALU
CARRY BIT
DUAL 16-BIT MODE
For arithmetic operations, the ALU can operate in a special

d al 16
dual
16-bit
bit arithmetic mode that performs two 16-bit
16 bit
operations (for instance, two additions or two subtractions)
in one cycle.
This mode can be selected by setting the C16 field of ST1.
This mode is especially useful for the Viterbi
add/compare/select
dd/
/ l t operation
ti
10
9/21/2013
CPU
ACCUMULATOR
The C54x devices have two 40-bit accumulators: accumulator A and

accumulator
l t BB.
Each accumulator is memory-mapped that can be pushed onto and
popped from the stack for context saves and restores by using PSHM
and POPM instructions and is partitioned into accumulator low word
(AL, BL), accumulator high word (AH, BH), and accumulator guard bits
(AG, BG) which can be stored and retrieved individually.
These registers can also be used by other instructions that use memorymapped registers (MMR) for page 0 addressing.
The only difference between accumulators A and B is that bits 3216
of A can be used as an input to the multiplier in the multiplier/adder
unit.
CPU
ACCUMULATOR
It stores the output from the ALU or the multiplier/adder block

and provide a second input to the ALU.
It can be configured as the destination registers. The guard bits
are used as a head margin for computations.
11
9/21/2013
CPU
TEMPORARY REGISTER (T)
The temporary (T) register has many uses. For example, it may
hold:
One of the multiplicands for multiply and multiply/ accumulate

instructions
A dynamic (execution-time programmable) shift count for instructions with
shift operation such as the ADD, LD, and SUB instructions
A dynamic bit address for the BITT instruction
Branch metrics used byy the DADST and DSADT instructions for ACS
operation of Viterbi decoding
In addition, the EXP instruction stores the exponent value computed into T
register, and then the NORM instruction uses the T register value to
normalize the number.
CPU
TRANSITION REGISTER (TRN)
The 16-bit transition (TRN) register holds the transition

decision for the path to new metrics to perform the
Viterbi algorithm. The CMPS (compare select max
and store) instruction updates the contents of TRN
register on the basis of the comparison between the
accumulator high word and the accumulator low word.
12
9/21/2013
CPU AUXILIARY REGISTERS (AR0AR7)

The eight 16-bit auxiliary registers (AR0AR7)
can b
be accessed
d by
b the
h CPU and
d modified
difi d b
by
the auxiliary register arithmetic units (ARAUs).
The primary function of the auxiliary registers
is to generate 16-bit addresses for data
p
space.
However, these registers can also act as
general-purpose registers or counters.
CPU-BLOCK-REPEAT REGISTERS (BRC, RSA, REA)

The 16-bit block-repeat counter (BRC) register
specifies
ifi the
h number
b off times
i
a block
bl k off code
d
is to repeat when a block repeat is performed.
The 16-bit block repeat start address (RSA)
register contains the starting address of the
block of p
program
g
memoryy to be repeated.
p
The 16-bit block-repeat end address (REA)
register contains the ending address of the
block of program memory to be repeated.
13
9/21/2013
CPU
Stack-Pointer Register (SP)
The 16
Th
16-bit
bit stack-pointer
t k i t register
i t (SP) contains
t i th
the address
dd
of the top of the system stack. The SP always points to the
last element pushed onto the stack. The stack is manipulated
by interrupts, traps, calls, returns, and the PSHD, PSHM,
POPD, and POPM instructions. Pushes and pops of the stack
predecrement and postincrement, respectively, the 16-bit
value in the stack pointer.
Circular-Buffer Size Register (BK)
The ARAUs use16-bit circular-buffer size register (BK) in

circular addressing to specify the data block size.
CPU
BARREL SHIFTER
The C54x DSP barrel shifter has a 40-bit input

connected to the accumulators or to data memory
(using CB or DB), and a 40-bit output connected to
the ALU or to data memory (using EB).
The barrel shifter can produce a left shift of 0 to 31
bits and a right shift of 0 to 16 bits on the input data.
The shift requirements are defined in the shift count
field of the instruction, the shift count field (ASM) of
status register ST1, or in temporary register T (when it
is designated as a shift count register).
14
9/21/2013
CPU
The barrel shifter and the exponent

p
encoder
normalize the values in an accumulator in a single
cycle.
The LSBs of the output are filled with 0s, and the
MSBs can be either zero filled or sign extended,
depending on the state of the sign-extension mode
bit (SXM) in ST1.
ST1
Additional shift capabilities enable the processor to
perform numerical scaling, bit extraction, extended
arithmetic, and overflow prevention operations.
CPU
BARREL SHIFTER
BARREL SHIFTER
The barrel shifter is used for scaling operations

such as:
Pre-scaling
an input data-memory operand or the

accumulator value before an ALU operation
Performing a logical or arithmetic shift of the
accumulator value
Normalizing the accumulator
Post-scaling the accumulator before storing the
accumulator value into data memory
15
9/21/2013
CPU
The 40-bit is connected as follows:

The input is connected to:
DB for a 16-bit data input operand
DB and CB for a 32-bit data input operand
Either one of the two 40-bit accumulators
The output is connected to:
One of the ALU inputs
The EB bus through the MSW/LSW write select unit
CPU
BARREL SHIFTER
BARREL SHIFTER
The SXM bit controls signed/unsigned extension of the data

operands;
d when
h th
the bit iis set,t sign
i extension
t i is
i performed.
f
d
Some instructions, such as ADDS, LDU, MAC, and SUBS
operate with unsigned memory operands and do not
perform sign extension, regardless of the SXM value.
The shift count determines how many bits to shift.
P i i shift
Positive
hif values
l
correspond
d to left
l f shifts,
hif whereas
h
negative values correspond to right shifts.
The shift count is specified as a 2s-complement value in
several ways, depending on the instruction type.
16
9/21/2013
CPU
BARREL SHIFTER
An immediate operand, the accumulator shift

mode (ASM) field of ST1
ST1, or T can be used to
define the shift count
A 4 or 5-bit immediate value specified in the
operand of an instruction represents a shift count
value in the 16 to 15 range.
ADD A,-4,B
, , ; Add accumulator A ((right-shifted
g
; 4 bits)) to
accumulator B ; (one word, one cycle).
SFTL A,+8 ; Shift (logical) accumulator A eight bits; left (one
word, one cycle)
CPU
BARREL SHIFTER
The ASM value represents

p
a shift count value in the 16
to 15 range and can be loaded by the LD instruction
(with an immediate operand or with a data-memory
operand). For example:
ADD
A, ASM, B ; Add accumulator A to accumulator B ; with

a shift specified by ASM
The six LSBs of T represent a shift count value in the

16 to 31 range. For example:
NORM
A ; Normalize accumulator A (T contains the

exponent value)
17
9/21/2013
Multiplier/Adder Unit
The multiplier/adder unit performs 17 X 17-bit 2s-complement

multiplication
lti li ti with
ith a 40
40-bit
bit addition
dditi in
i a single
i l instruction
i t ti cycle.
l Th
The
multiplier/adder block consists of several elements: a multiplier, an
adder, signed/unsigned input control logic, fractional control logic, a
zero detector, a rounder (2s complement), overflow/saturation logic,
and a 16-bit temporary storage register (T).
The multiplier has two inputs: one input is selected from T, a datamemory operand,
d or accumulator
l t A;
A the
th other
th is
i selected
l t d from
f
program memory, data memory, accumulator A, or an immediate
value.
18
9/21/2013
Multiplier/Adder Unit
The fast, on-chip multiplier allows the C54x DSP to

perform operations efficiently such as convolution,
convolution
correlation, and filtering.
In addition, the multiplier and ALU together execute
multiply/accumulate (MAC) computations and ALU
operations in parallel in a single instruction cycle.
Thi function
This
f ti is
i used
d in
i determining
d t
i i the
th Euclidian
E lidi di
distance
t
and in implementing symmetrical and LMS filters, which
are required for complex DSP algorithms.
Compare, Select, and Store Unit (CSSU)
The compare, select, and store unit (CSSU) performs

maximum comparisons between the accumulators
accumulator s high
and low word, allows both the test/control flag bit (TC)
in status register ST0 and the transition register (TRN) to
keep their transition histories, and selects the larger
word in the accumulator to store into data memory.
The CSSU also accelerates Viterbi-type butterfly
computations with optimized on-chip hardware.
19
9/21/2013
Internal Memory Organization
The C54x DSP memory is organized into three

individually selectable spaces: program, data, and I/O
space.
The C54x devices can contain random access memory
(RAM) and read-only memory (ROM).
Among the devices, the following types of RAM are
represented: dual-access RAM (DARAM), single-access
RAM (SARAM),
(SARAM) and
d two-way
t
shared
h d RAM.
RAM
The on-chip RAM for these processors is

organized in pages having 128 word locations
on each page.
20
9/21/2013

DARAM or SARAM can be shared within
subsystems
b
off a multiple
l i l CPU core device.
d i
Can
be configured as data memory or
program/data memory.
26 CPU registers
and peripheral registers
mapped into data space.
space
The on-chip ROM is part of the program memory space

and,
d in
i some cases, partt off the
th data
d t memory space.
Contains boot loader that is useful for booting to faster onchip or external RAM.
The DARAM is composed of several blocks.

Because each DARAM block can be accessed twice per
machine cycle, the CPU and peripherals, can read from and
write to a DARAM memory address in the same cycle.
The DARAM is always mapped in data space and is primarily
intended to store data values.
It can also be mapped into program space and used to store
program code.
21
9/21/2013
The SARAM is composed of several blocks.

Each block is accessible once per machine cycle for either a
read or a write.
The SARAM is always mapped in data space and is primarily
intended to store data values.
It can also be mapped into program space and used to store
program code.
d
The devices with multiple CPU cores include

two way shared
h d RAM during
d i eachh clock
l k cycle.
l
Allow
simultaneous program space access from 2 CPU
cores.
Each CPU can perform a single access with zero states
to any location in 2-way shared RAM during each clock
cycle.
cycle
Shared memory is program write protected or read
only by CPU. Only DMA controller can write to shared
memory.
22
9/21/2013
Memory-Mapped Registers
The C54x DSP also has 26 CPU registers plus

peripheral registers that are mapped in data-memory
space.
The data memory space contains memory-mapped
registers for the CPU and the on-chip peripherals.
These registers are located on data page 0,
simplifying
i lif i access to them.
h
The memory-mapped access provides a convenient
way to save and restore the registers for context
switches and to transfer information between the
accumulators and the other registers.
23
9/21/2013
CPU Status and Control Registers
The C54xE DSP has three status and control registers:
Status register 0 (ST0)

Status register 1 (ST1)
Processor mode status register (PMST)
ST0 and ST1 contain the status of various conditions and modes;
PMST contains memory-setup status and control information. It is
used
d to
t configure
fi
the
th processor.
Because these registers are memory-mapped, they can be stored
into and loaded from data memory; the status of the processor
can be saved and restored for subroutines and interrupt service
routines (ISRs).
24
9/21/2013
Status Registers (ST0 and ST1)
The individual bits of the ST0 and ST1 registers can be set or cleared with the SSBX
and RSBX instructions. For example, the sign-extension mode is set with SSBX 1,
SXM, or reset with RSBX 1, SXM.
The ARP, DP, and ASM bit fields can be loaded using the LD instruction with a shortimmediate operand.
The ASM and DP fields can be also loaded with data-memory values by using the
LD instruction.
Status Register 0 (ST0) Diagram
ARP:Auxiliaryregisterpointer.TC:Test/controlflag.
C:Carrybit.OVA:OverflowflagforaccumulatorA.
OVB:OverflowflagforaccumulatorB.DP:Datamemorypagepointer.
Status Register (ST0)
25
9/21/2013
26
9/21/2013
27
9/21/2013
28
9/21/2013
Processor Mode Status Register (PMST)
29
9/21/2013
SMUL- Saturation on multiplication.

p
When SMUL = 1, saturation of a
multiplication result occurs before performing the accumulation in a MAC or
MAS instruction.
The SMUL bit applies only when OVM = 1 and FRCT = 1.
The effect is that the result of 8000h X 8000h is saturated to 7FF FFFFh in
fractional mode, before performing subsequent addition/subtraction
required by a MAC or MAS instruction. In this mode, the MAC instruction is
equivalent
q
to MPY + ADD when OVM=1.
If the mode is not set and OVM = 1, the result of the multiplication is not
saturated before performing the addition/ subtraction, only the results of
the MAC and MAS instructions are saturated.
30
9/21/2013
ADDRESSING MODES: Data Addressing Mode
Provides various ways to access operands to execute instructions and

place results in the memory or the registers.
The C54xE DSP offers seven basic data addressing modes:
Immediate addressing uses the instruction to encode a fixed value.
Absolute addressing uses the instruction to encode a fixed address.
Accumulator addressing uses accumulator A to access a location in program memory as

data.
Direct addressing uses seven bits of the instruction to encode the lower seven bits of an
address The seven bits are used with the data page pointer (DP) or the stack pointer (SP)
address.
to determine the actual memory address.
Indirect addressing uses the auxiliary registers to access memory.
Memory-mapped register addressing uses the memory-mapped registers without

modifying either the current DP value or the current SP value.
Stack addressing manages adding and removing items from the system stack.
ADDRESSING MODES: Data Addressing Mode
During the execution of instructions using direct,

i di
indirect,
or memory-mapped
d register
i
addressing, the data-address generation logic
(DAGEN) computes the addresses of datamemory operands.
31
9/21/2013
ADDRESSING MODES: Program Memory

Addressing Mode
Program memory is usually addressed with the program counter (PC). With some
instructions, however, absolute addressing may be used to access data items that have
been stored in program memory.
The PC, which is used to fetch individual instructions, is loaded by the programaddress generation logic (PAGEN).
The program-address generation logic (PAGEN) generates the address used to access
instructions, coefficient tables, 16-bit immediate operands, or other information stored
in program memory, and puts this address on the PAB.
PAGEN consists
i off fi
five registers:
i
Program counter (PC)
Repeat counter (RC)
Block-repeat counter (BRC)
Block-repeat start address register (RSA)
Block-repeat end address register (REA)
ADDRESSING MODES: Program Memory

Addressing Mode
Typically, the PAGEN increments the PC as sequential

instructions are fetched.
However, the PAGEN may load the PC with a nonsequential value as a result of some instructions or other
operations.
Operations that cause a discontinuity include branches, calls,
returns, conditional operations, single-instruction repeats,
multiple-instruction repeats, reset, and interrupts.
For calls and interrupts, the current PC is saved onto the
stack, which is referenced by the stack pointer (SP). When
the called function or interrupt service routine is finished, the
PC value that was saved is restored from the stack via a
return instruction.
32
9/21/2013
Data Addressing Mode
In immediate addressing, the instruction syntax contains the specific

value of the operand.
Two types of values can be encoded in an instruction:
Immediate Addressing
Short immediate values can be 3, 5, 8, or 9 bits in length.

Long immediate values are always 16 bits in length.
Immediate values can be encoded in 1-word or 2-word instructions.

The 3-, 5-, 8-, or 9-bit values are encoded into 1-word instructions;
16-bit values are encoded into 2-word instructions.
The syntax for immediate addressing uses a number sign (#)
immediately preceding the value or symbol to indicate that it is an
immediate value. For example, to load accumulator A with the
value 80 in hexadecimal, you would write: LD #80h, A
Absolute Addressing
There are four types of absolute addressing:

Data
Data-memory
memory address (dmad) addressing: Data
Data-memory
memory address
(dmad) addressing uses a specific value to specify an address in data
space.
The syntax for dmad addressing uses a symbol or a number to specify
an address in data space. For example, to copy the value contained at
the address labeled SAMPLE in data space to the memory location in
data space pointed to by AR5, you would write:
MVKD SAMPLE, *AR5
In this example, the address referenced by SAMPLE is the dmad value.
33
9/21/2013
Program-memory address (pmad) addressing uses a

specific value to specify an address in program space.
The syntax for pmad addressing uses a symbol or a
number to specify an address in program space.
For example, to copy a word in the program memory
location labeled TABLE to a data-memory location
specified by AR7, you would write:
M
MVPD
TABLE,
A
*A
*AR7
In this example, the address referenced by TABLE is the
pmad value.
Absolute Addressing
Absolute Addressing
Port address (PA) addressing uses a specific value

to specify
specif an external
e ternal I/O port address.
address
The syntax for PA addressing uses a symbol or a
number to specify the port address.
For example, to copy a value from the I/O port at
port address FIFO to a data-memory location
pointed to by AR5, you would write:
PORTR FIFO, *AR5
In the example, FIFO refers to the port address.
34
9/21/2013
Absolute Addressing
*(lk) addressing is used with all instructions that support

g data-memory
y ((Smem)) operand.
p
the use of a single
It uses a specific value to specify an address in data space.
The syntax for *(lk) addressing uses a symbol or a number
to specify an address in data space.
For example, to load accumulator A with the value
contained in address BUFFER in data space, you would
write:
LD *(BUFFER),A
LD *(1000h),A
Absolute Addressing
Absolute addresses are always encoded with a

l
length
h off 16 bi
bits, so instructions
i
i
that
h encode
d
absolute addresses are always at least two
words in length.
35
9/21/2013
Accumulator addressing uses the value in the accumulator as an address.

This addressing
g mode is used to address program
p g
memoryy as data.
Two instructions allow you to use the accumulator as an address:
Accumulator Addressing
READA Smem
WRITA Smem
READA transfers a word from a program-memory location specified by

accumulator A to a data-memory location specified by the single datamemory (Smem) operand of the instruction.
WRITA transfers a word from a data-memory location specified by the
Smem operand of the instruction to a program-memory location specified by
accumulator A.
In repeat mode, an increment may be used to increment accumulator A.
READA *AR2 i.e. *A-*AR2
Direct Addressing
In direct addressing,
g the instruction contains the lower seven bits
of the data memory address (dma).
The 7-bit dma is an address offset that is combined with a base
address, with the data-page pointer (DP), or with the stack
pointer (SP) to form a 16-bit data-memory address.
Using this form
f
off addressing, you can access any off 128
locations in random order without changing the DP or the SP.
Either DP or SP can be combined with the dma offset to generate
the actual address.
36
9/21/2013
The compiler mode bit (CPL), located in status register ST1,

selects which method is used to generate the address:
Direct Addressing
When CPL = 0, the dma field is concatenated with the 9-bit DP field
to form the 16-bit data-memory address.
When CPL = 1, the dma field is added (positive offset) to SP to form
the 16-bit data-memory address.
The syntax for direct addressing uses a symbol or a number to

specify the offset value. For example, to add the contents of
the memory location SAMPLE to accumulator B, provided that
the correct base address is in DP (CPL = 0) or SP (CPL = 1),
you would write: ADD SAMPLE, B
The lower seven bits of the address of SAMPLE are stored in
the instruction word.
Direct Addressing
37
9/21/2013
Direct Addressing
Direct Addressing
The SP points to any address in memory. The dma points to the specific location
on the page, allowing you to access a contiguous 128-word (27 1) block
in memory from any base address.
SP can also add or remove items from the stack.
38
9/21/2013
Direct Addressing
Indirect Addressing
In indirect addressing, any location in the 64K-word data space can be

accessed using the 16-bit address contained in an auxiliary register.
The C54xE DSP has eight 16-bit auxiliary registers (AR0AR7). Indirect
addressing is used mainly when there is a need to step through
sequential locations in memory in fixed-size steps.
When memory is addressed with indirect addressing, the auxiliary
register and the address can be optionally modified by a decrement,
an increment, an offset, or an index.
S i l modes
Special
d offer
ff circular
i l and
d bit
bit-reversed
d addressing.
dd
i
A circular buffer size register (BK) is used with circular addressing. The
AR0 register is used for indexed and bit-reversed addressing modes in
addition to being used to point to memory as the other auxiliary
registers do.
39
9/21/2013
Indirect Addressing
Indirect Addressing
Two auxiliary register arithmetic units (ARAU0 and ARAU1) operate on the contents
of the auxiliary registers. The ARAUs perform unsigned, 16-bit auxiliary register
arithmetic operations. Some addresses can be obtained by pre-modifying the
auxiliary register.
The auxiliary registers can be:
Loaded with an immediate value using the STM instruction
Loaded via the data bus by writing to the memory-mapped auxiliary registers
Modified by the indirect addressing field of any instruction that supports indirect addressing
M difi d b
Modified
by th
the modify
dif auxiliary
ili
register
i t (MAR) iinstruction
t ti
Used as loop counters using the BANZ[D] instruction
Figure shows the ARAUs used to generate an address in the indirect addressing mode
using a single data-memory operand. As the figure shows, the main components used
for address generation in indirect addressing are the auxiliary register arithmetic
units (ARAU0 and ARAU1) and the auxiliary registers (AR0AR7).
40
9/21/2013
Indirect Addressing
Indirect Addressing
You can modify the addresses you use in instructions before or

after they are accessed,
accessed or you can leave them unchanged.
unchanged
You can modify them by incrementing or decrementing the
address by 1, adding a 16-bit offset (lk), or indexing with the
value in AR0.
These three types of action combined with taking the action
plus the ways
y of leaving
g the
either before or after the access,, p
address unchanged make a total of 16 addressing types,
each assigned to a value of MOD, the 4-bit modification field
in the encoding of an instruction using indirect addressing.
41
9/21/2013
42
9/21/2013
Problem
Assuming the current content of AR3 to be 200h, what
will be its contents after each of the following
TMS320C54xx addressing modes is used? Assume that
the contents of AR0 are 20h.
a. *AR3+0
b. *AR3-0
c. *AR3+
d. *AR3e. *AR3
f. *+AR3(40h)
g. *+AR3(-40h)
Solution
43
9/21/2013

Circular Addressing
Indirect Addressing
Many algorithms, such as convolution, correlation, and FIR

filters require the implementation of a circular buffer in
filters,
memory.
In these algorithms, a circular buffer is a sliding window
containing the most recent data. As new data comes in, the
buffer overwrites the oldest data.
The key to the implementation of a circular buffer is the
implementation of circular addressing.
The circular-buffer size register (BK) specifies the size of the
circular buffer.

Circular Addressing
Indirect Addressing
A circular buffer of size R must start on a N-bit boundary (that

is, the N LSBs of the base address of the circular buffer must
b 0),
be
0) where
h
N is the
h smallest
ll
integer that
h satisfies
f 2N > R.
The value R must be loaded into BK.
For example, a 31-word circular buffer must start at an address
whose five LSBs are 0 (that is, XXXX XXXX XXX0 00002), and
the value 30 must be loaded into BK.
BK
As a second example, a 48-word circular buffer must start at
an address whose six LSBs are 0 (that is, XXXX XXXX XX00
00002), and the value 47 must be loaded into BK.
44
9/21/2013

Circular Addressing
Indirect Addressing
The effective base address (EFB) of the circular buffer is determined

by zeroing the N LSBs of a user-selected auxiliary register (ARx).
The end of buffer address (EOB) of the circular buffer is determined
by replacing the N LSBs of ARx with the N LSBs of BK.
The index of the circular buffer is simply the N LSBs of ARx and the
step is the quantity being added to or subtracted from the auxiliary
register. Follow these three rules when you use circular addressing:
Place the first (lowest) address of the circular buffer on a 2N boundary where 2N is larger
than the circular buffer size.
Use a step less than or equal to the circular buffer size.
The first time the circular queue is addressed, the auxiliary register must point to an
element in the circular queue.

Circular Addressing
Indirect Addressing
The algorithm for circular addressing is as

f ll
follows:
45
9/21/2013

Circular Addressing
Indirect Addressing
Circular addressing can be used for single data-memory or dual

data-memory
data
memory operands. When BK is zero, the circular modifier
results in no circular address modification. This is especially useful
when a dual operand must perform an address modification
equivalent to ARx+0.
Figure A illustrates the relationships among BK, the auxiliary
register (ARx), the bottom of the circular buffer, the top of the
circular buffer, and the index into the circular buffer.
Figure B shows how the circular buffer is implemented and
illustrates the relationship between the generated values and the
elements in the circular buffer.

Circular Addressing
Indirect Addressing
46
9/21/2013

Circular Addressing
Indirect Addressing
Problem
Assume that the register AR3 with contents 1020h is selected as

the pointer for the circular buffer. Let BK = 40h to specify the
circular
l b
buffer
ff size as 40h.
40h Determine the
h start and
d the
h end
d
addresses for the buffer. What will be the contents of register
AR3 after the execution of the instruction LD*AR3 + 0%, A, if
the contents of register AR0 are 0025h?
Solution: BK=40 So N=6 (25=32, 26=64)
AR3 = 1020h means that currently it points to location 1020h.
Masking the lower 6 bits zeros gives the start address of the
buffer as 1000h. Replacing the same bits with the BK gives the
end address as 1040h.
The Instruction LD*AR3 + 0%, A modifies AR3 by adding AR0
to it and applying the circular modification. It yields AR3 =
circ(1020h+0025h) = circ(1045h) = 1045h - 40h = 1005h.
Thus the location 1005h is the one pointed to by AR3.
47
9/21/2013

Bit reversed Addressing
Indirect Addressing
Bit-Reversed Address Modifications (MOD = 4 or 7)

Bit-reversed addressing enhances execution speed and
program memory for FFT algorithms that use a variety of
radixes.
In this addressing mode, AR0 specifies one half of the size of
the FFT. The value contained in AR0 must be equal to 2N1,
where N is an integer, and the FFT size is 2N.
A auxiliary
An
ili
register
i
points
i to the
h physical
h i l location
l
i off a data
d
value. When you add AR0 to the auxiliary register using bitreversed addressing, the address is generated in a bitreversed fashion, with the carry bit propagating from left to
right, instead of the normal right to left.

Indirect Addressing
Dual-Operand Address Modifications
48
9/21/2013

Indirect Addressing

Indirect Addressing
49
9/21/2013

Indirect Addressing

Memory-Mapped Register Addressing
Memory-mapped register addressing is used to modify the

memory-mapped
d registers
i t without
ith t affecting
ff ti either
ith th
the currentt
data-page pointer (DP) value or the current stack-pointer (SP)
value. Because DP and SP do not need to be modified in this mode,
the overhead for writing to a register is minimal.
Memory-mapped register addressing works for both direct and
indirect addressing.
Addresses are generated by:
Forcing the nine most significant bits (MSBs) of data-memory address to

0, regardless of the current value of DP or SP when direct addressing is
used
Using the seven LSBs of the current auxiliary register value when indirect
addressing is used
50
9/21/2013

For example, if AR1 is used to point to a memorymapped

d register
i
in
i memory mapped
d register
i
addressing mode and it contains a value of FF25h,
then AR1 points to the timer period register (PRD),
since the seven LSBs of AR1 are 25h and the
address of the PRD is 0025h. After execution, the
value remaining in AR1 is 0025h.
0025h

51
9/21/2013
The system stack is used to automatically store the program counter during interrupts
and subroutines. It can also be used at your discretion to store additional items of
context or to pass data values
values. The stack is filled from the highest to the lowest
memory address. The processor uses a 16-bit memory mapped register, the stack
pointer (SP), to address the stack. SP always points to the last element stored onto
the stack.
Four instructions access the stack using the stack addressing mode:
Stack Addressing
PSHD pushes a data-memory value onto the stack.

PSHM pushes a memory-mapped register onto the stack.
POPD pops a data-memory value from the stack.
POPM pops a memory-mapped
d register
i
from
f
the
h stack.
k
A push pre-decrements and a pop post-increments the address in the SP.

The stack is used during interrupts and subroutines to save and restore the PC
contents.
The FRAME instruction also affects the stack. This instruction adds a short immediate
offset to the stack pointer.
Stack Addressing
52
9/21/2013
M
E
M
O
R
Y
S
P
A
C
E
MEMORY SPACE
53
9/21/2013
54
9/21/2013
Program Control
Program control unit contains the program counter (PC), PC related

H/W PAGEN, H/W stack, repeat counters, status registers
The PC is a 16-bit register that contains the internal or external
program memory address used when an instruction is fetched or when
a 16-bit-immediate operand or coefficient table in program memory is
accessed. To address program memory, the address in the PC is put
onto the PAB.
55

PROGRAMMABLE DSPs

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

PROGRAMMABLE DSPs

Uploaded by

Copyright:

Available Formats

9/21/2013

Introduction, Commercial Digital Signal-processing

TMS320C54xx processors retains the basic Harvard

The central processing unit (CPU) of TMS320C54xx

A powerful instruction set with a hardware-supported,

ARCHITECTURE OF TMS320C54XXBUS STRUCTURE

ARCHITECTURE OF TMS320C54XXBUS STRUCTURE

The C54x CPU contains:

The C54x DSP performs 2s-complement arithmetic with a 40-bit

16-bit immediate value

The 40-bit ALU, implements a wide range of arithmetic

The Y input source to the ALU is any of three values:

When a 16-bit data-memory operand is fed through data

The ALU saturation logic prevents a result from overflowing

overflow flag (OVA/OVB) in status register ST0 is

The ALU has an associated carry bit (C) that is affected by

DUAL 16-BIT MODE

For arithmetic operations, the ALU can operate in a special

The C54x devices have two 40-bit accumulators: accumulator A and

It stores the output from the ALU or the multiplier/adder block

TEMPORARY REGISTER (T)

One of the multiplicands for multiply and multiply/ accumulate

TRANSITION REGISTER (TRN)

The 16-bit transition (TRN) register holds the transition

CPU AUXILIARY REGISTERS (AR0AR7)

CPU-BLOCK-REPEAT REGISTERS (BRC, RSA, REA)

Stack-Pointer Register (SP)

Circular-Buffer Size Register (BK)

The ARAUs use16-bit circular-buffer size register (BK) in

The C54x DSP barrel shifter has a 40-bit input

The barrel shifter and the exponent

The barrel shifter is used for scaling operations

an input data-memory operand or the

The 40-bit is connected as follows:

The SXM bit controls signed/unsigned extension of the data

An immediate operand, the accumulator shift

The ASM value represents

A, ASM, B ; Add accumulator A to accumulator B ; with

The six LSBs of T represent a shift count value in the

A ; Normalize accumulator A (T contains the

The multiplier/adder unit performs 17 X 17-bit 2s-complement

The fast, on-chip multiplier allows the C54x DSP to

Compare, Select, and Store Unit (CSSU)

The compare, select, and store unit (CSSU) performs

Internal Memory Organization

The C54x DSP memory is organized into three

The on-chip RAM for these processors is

Internal Memory Organization

Internal Memory Organization

The on-chip ROM is part of the program memory space

The DARAM is composed of several blocks.

Internal Memory Organization

The SARAM is composed of several blocks.

Internal Memory Organization

The devices with multiple CPU cores include

simultaneous program space access from 2 CPU

The C54x DSP also has 26 CPU registers plus

CPU Status and Control Registers

The C54xE DSP has three status and control registers:

Status register 0 (ST0)

Status Registers (ST0 and ST1)

Status Register (ST0)

Status Register (ST0)

Status Register (ST1)