You are on page 1of 77

Number Theoretic

Transforms

G.A. Jullien

Extract from Number Theoretic Techniques in Digital Signal


Processing
Book Chapter
Advances in Electronics and Electron Physics, Academic Press
Inc., (Invited), vol. 80, Chapter 2, pp. 69-163, 1991.
Introduction

This chapter discusses the application of number theory to the intensive computations required in

many Digital Signal Processing (DSP) systems. Number theory was always looked upon as one of

the last vestiges of pure mathematical pursuit, where the idea of applying the theories to practical

purposes was irrelevant. Over the last few decades we have been discovering very practical

applications for number theory; among these are applications in coding theory, communications,

physics and digital information (Schroeder (1985)). Our specific interest in this work is the

application of number theory to the mechanics of arithmetic computations themselves. Modern day

computer technology is heavily dependent on the manipulation of numbers, in fact the automization

of arithmetic operations was the quest of all the early computer pioneers, from Pascal to Babbage to

Von-Neumann. The involvement of number theory in this endeavor, however, has been strangely

very limited. The use of number theory in helping to perform the basic arithmetic operations has not

been accepted by the commercial world. Perhaps the reason for this is the pervasion of the binary

number system in the mechanization of computer arithmetic. A large number of hardware designers

have little or no knowledge that alternative ways of computing arithmetic exist!

There is one field in which the high speed arithmetic manipulation of numbers is of paramount

importance. This is the field of Digital Signal Processing (DSP), and it is here that alternative

arithmetic hardware may find a niche. This is indeed the subject of this paper.

Why is it that some DSP systems are so computationaly intensive? Typically the DSP system we

are discussing will be involved with the processing of streams of data that are indeterminately long;

the requirement of the system is to produce processed output data at the same rate that the input

data is entering the processor. This is referred to as real-time processing. Often the arithmetic

manipulations are simple in form (cascades of additions and multiplications in a well defined

structure) but the numbers of operations that have to be computed every second can be enormous. It

Page 1
Number Theoretic Techniques in Digital Signal Processing

will be constructive at this point to consider an example to illustrate the processing power required

by a typical DSP system.

Let us consider a basic problem; the filtering of television pictures in real-time. We will assume that

a two-dimensional filter is required and is given by the following equation:

2 2
ym,n = Â! Â! wi,j . xm-i,n-j
i=-2 j=-2

Here wi ,j is the filter kernel, and represents the weights on a two-dimensional averaging procedure

around each output sample (ym,n). The averaging is based on a neighborhood of input samples {xi

,j }, that extends 5 samples in each dimension and centered on the output pixel position. The double

summation, in fact, represents two-dimensional convolution. Convolution is a basic operation to all

linear signal processing problems since it represents the mapping function of an input signal to an

output signal through a linear system. The real time constraint is important since, as we will see, an

extremely large computational requirement will be necessary in order to construct the filter.

In order to see the speed of operation required by our filter, we will perform some elementary

calculations based on typical requirements for the filter. We assume a 525 line scan with 500

picture elements (pixels) in each line. We also assume that each primary colour is to be filtered

independently with the 5 x 5 filter. With an interlaced system we need to filter 30 images every

second. Each output pixel (made up of three primary colour sub-pixels) requires 75

multiply/accumulate operations (M/A Ops) to implement the filter. There are 262,500 pixels in each

image and so 1.96875 x 107 M/A Ops are required for each image. Since we require to filter 30

images per second, the total number of M/A!Ops per second is 5.9 x 108. Assuming that an M/A

Op is considered a basic operation in the processor used to filter the image, then we need a machine

that delivers 590 Mega Operations per second (MOPS). In these terms we require a super

computer to deliver this performance!

Page 2
Number Theoretic Techniques in Digital Signal Processing

Clearly the implementation has to be somewhat different to a large main frame super computer, it

would certainly be difficult to fit one into a normal sized television set!

The filtering operation has the advantage of not requiring the full features of a general purpose

computer (the filtering operation does not have to be performed within the typical multi-user,

complex operating system environment of a main frame machine) and the computations are simply

repeated arithmetic operations in a predetermined structure. In fact, it is rather pointless to talk about

the computational requirements in normal general computer terms (MIPs, MOPs, MFLOPs, etc.),

but rather to talk in direct terms of the data flow requirements; that is, the throughput rate of the

filter in samples per second (in this example, a sample is a pixel). For our example filter we have to

filter pixels at a rate of 7.875!x!106 pixels per second. If we provide arithmetic hardware which

contains 75 Multiply/Accumulate circuits, then each circuit has to operate within 120nS. If we use

currently available integrated circuits that operate within 60nS, then we can build a circuit with only

38 integrated circuits for the basic arithmetic operations, and perform two operations, sequentially,

with one circuit. We have therefore reduced the original requirement of a supercomputer to a

system containing a relatively small number of 'chips'.

It is the purpose of this paper to discuss ways in which number theoretic techniques can be used to

perform DSP operations, such as this example filter, by reducing the amount of hardware involved

in the circuitry. Since we are going to be directly concerned with the implementation of DSP

arithmetic, it might be helpful to finish off our example by considering the arithmetic details of each

Multiply/Accumulate operation.

The representation of each primary colour pixel by 8-bits is more than adequate for very high

definition of the image and the quantization of each filter weight by 8-bits is also very generous.

This naturally leads to a 16-bit multiplier structure. Since there are an accumulation of 25 such

multiplications for each output pixel, an extra 5-bits are required for this. With 21-bits we can, in

fact, compute each output pixel with no error in the computation process. Since 21-bits is a

pessimistic upper bound for the computational dynamic range (only required if both the filter

Page 3
Number Theoretic Techniques in Digital Signal Processing

coefficients and the input pixels are at their maximum values - a very rare occurrence), we can easily

specify a 20-bit range for accurate computation.

This calculation leads us to an important observation that we do not need floating point arithmetic.

We can, in fact, perform error-free computation within the dynamic range of a typical floating point

standard mantissa length - no manipulation of the exponent being required at all! This example

represents a fine-grained computation (one in which successive parts of the computation are of a

limited dynamic range) and is typical of many DSP algorithms. The limitations we are able to

impose on the arithmetic calculations will be a great help in reducing the hardware required in the

very high-speed throughput systems to be discussed in this paper.

For the purposes of this paper we will restrict ourselves to the computation of linear filter and

transform algorithms which have, for the one-dimensional case, the inner product form:

N-1
yn = Âxm.wf(n,m)
m=0

In the case of linear filtering (convolution) f(n,m)!=!n-m; for the case of a linear transform with the

convolution property, wf(n,m) is a primitive Nth root of unity. For the DFT, wf(n,m)!=!e-i N ! n m !.

Although this limitation appears overly restrictive, the general form of equation (1) probably

accounts for the vast majority of digital signal processing functions implemented commercially, and

also leads us to a body of literature that uses number theory to indirectly compute the convolution

form of the inner product formulation. We will also restrict our investigation of the application of

number theoretic techniques to those that use finite field arithmetic directly in the computations.

This will not cover those algorithms whose structure is derived via number theoretic techniques, and

whose computation is performed over the usual complex field (e.g. Winograd Fourier Transforms);

there is a vast amount of literature already available on this separate subject, and the reader is

directed to the work by McClellan and Rader (1979) as a starting point.

Page 4
Number Theoretic Techniques in Digital Signal Processing

Our discussion in this review will concentrate on two areas: the use of finite ring/field operations

for computing ordinary integer computations (Residue Number System) for high speed signal

processing and the use of algorithms defined over finite rings/fields in which not only the arithmetic

is performed over such finite structures, but also the properties of the algorithm depend on the

properties of the finite field/ring over which the algorithm is defined. This is an important

differentiation. Our sole concern here is to examine arithmetic and general computations over finite

mathematical systems.

Our starting point should be at the recognized starting point for the modern field of digital signal

processing, the Fast Fourier Transform (FFT). The FFT is not a new transform, but really a name

for a set of algorithms that compute the Discrete Fourier Transform (DFT) with much reduced

arithmetic operations. In particular, multiplications, traditionally a hardware and/or time consuming

arithmetic operation, are reduced considerably. With the publishing of the first FFT algorithm

(Cooley and Tukey (1965)) digital signal processing began to emerge as a field in its own right.

Only a few years after that landmark paper, the first applications of finite field transforms to

computing convolution appeared. We will not attempt to retrace the history here, but will discuss

these two discoveries as initial steps on the road to the application of number theory to the digital

processing of signals. The discovery of efficient ways of computing the DFT allowed both the

computation of spectra, and the implementation of finite impulse response filters to be carried out

with much reduced computational time. The most interesting use of the FFT, in the initial years, was

in the latter application. The ability of the DFT to indirectly compute the inner product form for FIR

filters comes from its convolution property. The result is surprising because it seems, on the

surface, to involve more work in that three transforms have to be computed. It is also surprising that

a transform with a basis set involving trigonometrical functions can be used to efficiently compute

convolutions between, say, integer sequences, but it does indeed work.

In order to examine this remarkable property we will briefly discuss FIR filters and then introduce

the DFT.

Page 5
Number Theoretic Techniques in Digital Signal Processing

A. Finite Impulse Response Digital Filtering using DFTs

Finite impulse response (FIR) digital filters produce an output based on a weighted sum of present

and past inputs. Mathematically this is a weighted averaging scheme represented by:

N-1
yk = Â! hn . xk-n
i=0

where {hn} is the weighting sequence, and {xn} is the input sequence.

The weighting sequence, {hn} characterizes the filter, and it is the determination of the sequence that

corresponds to the design of the filter. The sequence is sometimes referred to as the impulse

response based on the equivalent characterization of analog filters. In the digital case the impulse

function is a perfectly realizable sequence {1, 0, 0, 0, .....}. If we use this impulse function as the x
sequence, the output of the filter will be the {hn} sequence. We can also define a frequency
response for the filter by using an input sequence that is a complex sinusoid, xn = eiΩn . Here, the

frequency is simply an angular increment, Ω, which locates the value of the input, at sample n, on

the unit circle at an angle of Ωn. As with continuous systems, we can invoke the use of a transform

to map convolution into multiplication. The transform we use is the z-transform (Jury (1964))

which is defined as:

N-1
X(z) = Â!xn!z-n
i=0

Using this definition, and the convolution property, we are able to define a transfer function for the

FIR filter as:

Y(z)
X(z) = H(z)

By letting z= eiΩ , we can find the magnitude and phase shift (polar coordinates) of H(eiΩ ), and

hence the frequency response (H(eiΩ ) vs Ω). We note that a simple inversion procedure exists; we

Page 6
Number Theoretic Techniques in Digital Signal Processing

simply examine the coefficients of H(z) and assign them to delayed samples of the impulse

response. H(z) can also be obtained by a polynomial division of Y(z) by X(z); for the FIR filter we

know that this division will always have finite length and so H(z) (for z an indeterminate complex

number) has only zeros. For filters which have an infinite length response (infinite impulse

response (IIR) filters), H(z) contains poles as well as zeros; i.e. we represent H(z) as a rational

function of polynomials. Closed solutions exist for determining the impulse response for IIR

filters based on the roots of H(z) (Jury (1964)). We will not consider IIR filters here since our

interest is in implementing a finite convolution process. IIR filters possess the property that they

are able to implement arbitrary frequency magnitude responses with much fewer multiplications

than equivalent FIR filters. They have unfortunate side-effects, however, of: possible instability due

to inexact multipliers; possibility of producing limit cycle oscillations; possibility of overflow

oscillations; difficulty in implementing linear phase responses. Although a great deal of effort has

been spent on studying such filters (see, for example, Rabiner and Gold (1975)), it is not clear that

the implementation efficiency suitably compensates for the problems. Perhaps the best solution is

to concentrate on efficient implementation strategies for FIR filters. This is the main subject of this

paper.

II. Discrete Fourier Transforms


D
The DFT is defined as a transformation that maps a sequence xn fi Xk where:

N-1 N-1
-j2πnk/N jW Nnk
Xk = ∑!!xn !e = ∑!!xn !e (1)
!!!n=0 !!!n=0


with WN = N

In terms of the Fourier transform, this may be regarded as an approximation to the Fourier integral,

but it turns out that it is a transform in its own right with exact properties. Two of the properties that

Page 7
Number Theoretic Techniques in Digital Signal Processing

are important to the subject of this paper are the inverse and convolution property. In fact we need

the first to prove the second.

A. Inverse and Convolution Property


D-1
One important property is that the transform is reversible, i.e. the mapping xn! fi !Xk exists, with

an inverse given by:

N-1
1 j-W Nnk
xn = N ∑!!Xk !e (2)
!!!k=0

The proof that the inverse exists yields the basis for the second important property, the convolution

property.

The easiest way to prove that (2) is the inverse of (1) is by direct substitution. We will make the
substitution using an appropriate dummy variable, m; the result should yield that xn=xm,"m=n.

N-1 È N-1 ˘
1 Í jW Nmk˙ -jW Nnk
xn = N ∑ Í ∑!!xm !e ˙ e
Î
!!!k=0 !!!m=0 ˚

interchanging the order of summation and gathering terms we find:

N-1 È N-1 ˘
1 Í jW Nk(m-n)˙
xn = N ∑ xm Í ∑!!e ˙
!!!m=0 Î !!!k=0 ˚

or

N-1 È N-1 ˘
1 Í ˙
xn = N! ∑ xm Í ∑!!WN(k[m-n])˙ (3)
!!!m=0 Î !!!k=0 ˚

The function, W(k[m-n]), displays rather an interesting property.

Page 8
Number Theoretic Techniques in Digital Signal Processing

j[m-n]WN
W(k[m-n]) is a function on the unit circle in the complex plane with N unique values: 1,!e
j2[m-n]WN j(N-1)[m-n]WN
,!e ,!....!,!e for k = 0, 1, 2, .... , N-1. For [m-n]=KN, where K is some integer,
N-1
all values of the function are 1 and so the sum, ∑ W N(k[m-n]), evaluates to N. For [m-n]≠KN
!!!k=0

we can invoke the fact that the function yields a finite geometric series with the sum:

N-1 jN[m-n]W
!!W N(k[m-n]) = 1!-!e
N
∑ j[m-n]WN = 0.
1!-!e
!!!k=0

Thus, the only non-zero values generated by (3) are for [m-n]=KN, and for these values we obtain
the result xn=xn+KN. If we restrict both the sample and transformation domain to be periodic, with

period N, then we have proved that the transform is invertible.

Based on this result for the existence of an exact inverse for the DFT, we can now proceed to

examine the convolution property.

We start by considering the result of multiplying two transform domain sequences together.
Assume that the sample domain sequences are xn and yn with transform domain sequences of Xk

and Yk. The result of multiplying the transform domain sequences is a new sequence Zk!=!Xk . Yk .

Using the DFT of the x sequence explicitly, we have:

N-1
Zk = ∑!!xn.!WN(kn) .Yk (4)
!!!n=0

Inverting equation (4) yields:

N-1 Ê N-1 ˆ
1 Á ˜
zm = N ∑ Á ! ∑!!xn.!WN(kn)!.Yk˜ . W N(-km) (5)
k=0 Ë !!!n=0 ¯

We now interchange the order of summation to find:

Page 9
Number Theoretic Techniques in Digital Signal Processing

N-1Ê N-1 ˆ
Á1 ˜
zm = ∑ xn Á N! ∑!!!Yk !.!WN(k[n-m])˜ (6)
n=0 Ë !!!k=0 ¯

The inner expression of equation (6), in parenthesis, is the sequence y[m-n]. Since {y} is periodic,
we can retain values from a single period by forming y[m⊕N(-n)]. Note that here we are using the

notation ⊕N to indicate addition modulo N. This notation will be expanded later.

From (6) we obtain the final result for the output sequence:

N-1
zm = ∑ xn y[m⊕N(-n)] (7)
!n=0

zm is therefore the convolution of the sequence {x} with the sequence {y}. It is the convolution of

two periodic sequences, and is often called cyclic, or periodic, convolution. As noted before, we only

need to know the samples of a single period in order to perform the calculation. If we require to

determine the periodic convolution of two sequences, then we can perform the calculation indirectly

by taking DFT's of each sequence, multiplying in the transform domain, and inverting the result. If

we require the convolution of two aperiodic sequences, it turns out that we can still use periodic

convolution, providing that we are prepared to perform several periodic convolutions with sub

samples of the sequences, and assemble the final sequence from partial outputs of the periodic

convolutions; these are the so-called overlap/save and overlap/add procedures. We are not able to

give full details of such procedures here, but there are many fine text books which detail the two

approaches (e.g. Rabiner and Gold (1975)).

The use of an indirect computation of convolution is advantageous when the amount of computation

involved is considerably less than that required using the direct convolution sum. An N point

periodic convolution requires N2 multiplications and N2-N binary (two valued) additions. Can we

reduce this by an indirect computation involving DFT's? The answer is yes, but only if the

transform can be computed efficiently. An algorithm that efficiently computes a DFT is called a fast

Page 10
Number Theoretic Techniques in Digital Signal Processing

algorithm, and the computation of a DFT with such an algorithm is called a Fast Fourier Transform

(FFT). Note that the FFT is not a separate transform, just a fast (or efficient) way of computing a

DFT.

For simplicity we will discuss the development of fixed radix decimation algorithms that use a

power of two as the radix. These were the first Fast algorithms to be developed, and still represent

the major implementation procedure for commercial hardware and software solutions.

III. Fast Fourier Transform (FFT) Algorithms

We will briefly examine the classical decimation in time algorithm to appreciate the magnitude of

savings that can occur in computing the DFT using a fast algorithm.

A. Decimation in Time FFT Algorithm

There are two basic decimation (sequence slicing) procedures: decimation over the sample domain

(decimation in time, DIT) or decimation over the transform domain (decimation in frequency, DIF).

We will consider the first procedure, with the assumption that N contains some power of 2 factors.

D
The transformation: xn fi Xk can be written:

N-1
Xk = ∑ xn W N(nk) (8)
!!!k=0

and we can decompose the computation by computing (8) for the even and odd samples, separately,

and then put the two results back together. This yields:

N N
2!-1 2!-1
Xk = ∑ x2n W N(nk) + WN(k) . ∑ x2n+1 W N(nk) (9)
!!!n=0 2 !!!n=0 2

Page 11
Number Theoretic Techniques in Digital Signal Processing

We have now reduced the computational requirement for an N-point DFT to that required for two
N
2 point DFTs and an extra N multiplications. If we are interested in reducing multiplications then
N2
we have reduced the requirement from N2 multiplications to 2 + N multiplications. Although this
N
may not represent a large reduction, we are able to repeat the process by decimating each of the 2
N
point DFTs to two 4 point DFTs. We can continue in this way until we run out of factors of N.

The greatest decimation occurs for N, a power of 2, in which the basic computational element

reduces to a 2-point DFT (which can be computed using real numbers). In this case the number of
N
multiplications reduces to 2 log2N (in fact some of these multiplications are trivial, but we will use
N
this figure as an upper bound). The calculation reduces to the calculation of 2 log2N two-sample

DFT's with a multiplication by a power of W. This multiplier is popularly referred to as a 'twiddle

factor'.

The basic computational element is shown in the signal flow graph in Fig 1 below, and is often

referred to as a 'butterfly' based on its structure.

+ r
A ∑ C = A + B. W N
+

+
r
B ∑ D = A - B. W N
-

r
WN

Figure 1 Butterfly calculation with twiddle factor multiplier

Page 12
Number Theoretic Techniques in Digital Signal Processing

x0
X0

x1
W 08 X4
-1
x 0
2 W X2
8
-1
x 0 2
3 W 8 W8 X6
-1 -1
x 0
4 W X1
8
-1
x 1
5 W 08 W8 X5
-1 -1
x6 0 2
W W8 X3
8
-1 -1
x7 0 2 3
W W W8 X7
8 8
-1 -1 -1

Figure 2 FFT Signal Flow Graph for 8-point DFT transform

Notice that there is only one complex multiplication and two complex additions per butterfly. An
example is shown in Figure 2 for an 8-point transform using 4log28!=!4 x 3!=!12 butterfly

calculations. Appropriate pairs of shaded nodes represent the addition/subtraction network of the

butterfly calculation.

Note that although there are 12 multiplications indicated, in fact only 5 of these are non-trivial (i.e.

not a multiplication by unity). Before we leave this brief introduction to fast algorithms it can be

observed that the output sequence has been scrambled in the process of reducing the number of

multiplications, and that the structure is not the same from stage to stage. Algorithms are available

that do not scramble the output data (or require the input data to be scrambled) and there are also

algorithms that maintain a constant structure, but not together (a sort of Heisenberg uncertainty

principle for FFT's !).

Page 13
Number Theoretic Techniques in Digital Signal Processing

It seems to be a law of nature that as one attempts to reduce the number of arithmetic operations (or

at least the most costly..multiplication) the structure suffers. This comes back to haunt us when we

examine VLSI implementations of such algorithms.

There are many works available on the myriad of fast algorithms that have been discovered since the

initial algorithm was published (for example Blahut (1985)). It has been our intent here simply to

point out that these algorithms exist and to briefly discuss one of the earliest.

B. Use of the DFT in indirect computation of convolution

Based on the fact that DFT's can be computed very efficiently, it becomes clear that the operation of

convolution can also be computed much more efficiently than directly implementing the convolution

sum. Assuming that we wish to compute circular convolution, then an N-point convolution can be

computed using 2 DFTs and an inverse DFT. Since an inverse DFT is the same complexity as a
forward DFT, the number of multiplications involved are 3N!log2!N!+!N. The reduction in
1
multiplications for an N-point circular convolution is N !!{ 1!+!3log2!N} . For the case of

filtering with a fixed set of weights, we can store the DFT of the weight sequence and obtain a
1
greater reduction of N !{ 1!+!2.!log2!N} multiplications. These savings are reduced if one is

interested in aperiodic convolution (the normal filtering operation) but for large filter kernels (large

N) the savings can be significant enough to warrant using this indirect approach.

We note one unfortunate fact, even if we are convolving only real sequences, it is still necessary to

use complex arithmetic. In fact our reduction calculations above are predicated on the need for

complex sequence convolution. There is another unfortunate effect that since the roots of unity are

obtained from transcendental functions, we are not able to compute convolutions without error; even

the convolution between sequences of integers, which in the direct form can be easily computed

without invoking any errors, will have to be computed using approximations to the roots, and will, in

general, invoke errors in the final convolution sequence.

Page 14
Number Theoretic Techniques in Digital Signal Processing

There are mathematical systems in which roots of unity can be generated and used in calculations

without any approximations. In fact, approximations are meaningless. The mathematical systems

are finite algebras, and we will devote the remainder of this chapter to discussing their use in the

computation of DSP algorithms.

It is important to establish some basic concepts and theories relating to the subject of finite algebra;

the following section is included for that purpose.

IV. Finite Algebras

For our specific purposes we will be using algebraic structures that emulate normal integer

computations. The entities to be discussed are groups, rings and fields which contain a finite

number of integers as their elements and allow arithmetic operations that are modifications of

normal integer arithmetic operations. This will not be an exhaustive review; the reader is referred to

standard texts on number theory (e.g. Dudley, 1969, and Vinogradov, 1954) as well as texts on

modern algebra (e.g. Burton, 1970, and Schilling and Piper, 1975).

As a starting point, we will formally define the notion of a ring and field.

A. Rings and Fields

Definition 1:

If R is a non empty set on which there are defined binary operations of addition, +, and
multiplication, •, such that the following postulates (I) - (VI) hold, we say that R is a ring,

(I) Commutative law of addition a+b = b+a

(II) Associative law (a+b)+c = a+(b+c)

(III) Distributive law a•(b+c) = a•b + a•c

Page 15
Number Theoretic Techniques in Digital Signal Processing

(IV) Existence of an element, denoted by the symbol 0, of R such that a+0 = a for every a
ŒR

(V) Existence of an additive inverse. For each a ΠR, there exists x ΠR such that

a+x!=!0

(VI) Closure, where it is understood that a, b, c are arbitrary elements of R in the above

postulates

A ring in which multiplication is a commutative operation is called a commutative ring. A ring with

identity is a ring in which there exists an identity element, 1, for the operation of multiplication,
a•1 = 1•a = a for all a Œ R.

The ring of integers is a well known example of a commutative ring with identity. In abstract

algebra, elements of a ring are not necessarily the integers, even not necessarily numbers; e.g. they
can be polynomials. Given a ring R with identity 1, an element a ΠR is said to be invertible, or to

be a unit, whenever a possesses an inverse with respect to multiplication. The multiplicative inverse,
a-1, is an element such that a-1•a = 1. The set of all invertible elements of a ring is a group with
respect to the operation of multiplication and is called a multiplicative group.

Definition 2:

If R is a ring and 0≠a Œ R, then a is called a divisor of zero if there exists some b≠0, b Œ R such

that a•b = 0.

Definition 3:

An integral domain is a commutative ring with identity which has no divisors of zero.

Definition 4:

A ring, F, is said to be a field provided that the set F-{0} is a commutative group under

multiplication.

Page 16
Number Theoretic Techniques in Digital Signal Processing

Viewed otherwise: a field is a commutative ring with identity in which each non-zero element

possesses an inverse under multiplication. It follows (McCoy, 1972) that every field is an integral

domain. The rings (fields) with a finite number of elements are called finite rings (fields). A ring of
integers modulo m, denoted here as Zm, is an example of a finite ring.

In every finite field with p elements, the nonzero elements form a multiplicative group. This
multiplicative group of order p-1 is cyclic; i.e. it contains an element, a, whose powers exhaust the

entire group. This element is called a generator, or a primitive root of unity, and the period or order
of a is p-1. The order of any element x in the multiplicative group is the least positive integer t such

that xt = 1, xs≠1, sŒ[1,t). The order t is a divisor of p-1 and x is called a primitive t-th root of

unity.

For every element x of the set F-{0} the mapping defined by: x=at , t Π{0, 1, ..., p-2} is the

isomorphism of the additive group of integers with addition modulo p-1 and the multiplicative

group of the field F.

The integer t is called the index of x relative to the base , denoted inda x.

1. The ring of residue classes

In order to describe the system, the notion of congruence should be introduced. The basic theorem,

known as the division algorithm will be established first.

Division Algorithm

For given integers a and b, b not zero, there exists two unique integers, q and r, such that a!=!b.q +
a
r, r Π[0, b). It is clear that q is the integer value of the quotient b . The quantity r is the least

positive (integer) remainder of the division of a by b and is designated as the residue of a modulo b,
or ab. We will say that b divides a (written ba) if there exists an integer k such that a = b.k.

Definition 5:

Page 17
Number Theoretic Techniques in Digital Signal Processing

Two integers c and d are said to be congruent modulo m, written

c ≡ d (mod m) if and only if m(c-d)

Since m0, c ≡ c (mod m) by definition

Another alternative definition of congruence can be stated as follows:

Two integers c and d are congruent modulo m if and only if they leave the same remainder when
divided by m, cm = dm .

Definition 6:

A set of integers containing exactly those integers which are congruent, modulo m, to a fixed integer

is called a residue class, modulo m.

The residue classes (mod m) form a commutative ring with identity with respect to the modulo m

addition and multiplication, traditionally known as the ring of integers modulo m or the residue ring,
and denoted Z m . The ring of residue classes (mod m) contains exactly m distinct elements. The

ring of residue classes (mod m) is a field if and only if m is a prime number, because multiplicative
inverses, denoted a-1 (mod m) or a-1m , exist for each nonzero a!Œ!Zm. Thus the nonzero classes

of Zm form a cyclic multiplicative group of order m-1, {1,!2, ..., m-1}, with multiplication modulo

m, isomorphic to the additive group {0,!1,!...,!m-2} with addition modulo m-1.

If m is composite, Zm is not a field. Multiplicative inverses do not exist for those nonzero elements

a Œ Zm for which the gcd (a, m) ≠ 1. The Euler's Totient Function, denoted f(m), is a widely used

number theoretic function and is defined as the number of positive integers less than m and
relatively prime to it. It follows that the number of invertible elements of Z m is equal to f(m). If m

has the prime power factorization m = m1e1 . m2e2 . .... mLeL then we can write:

È 1 ˘È 1 ˘ È 1 ˘
f(m) = m ÍÎ !1!-!!m1!˙˚ ÍÎ !1!-!!m2!˙˚ ...... ÍÎ !1!-!!mL!˙˚

Page 18
Number Theoretic Techniques in Digital Signal Processing

The Euler-Fermat theorem states that if c is an integer, m is a positive integer and

(c, m) = 1 , then cf(m) ≡ 1 (mod m).

The important consequence of this theorem is that there is an upper limit on the order of any
element a ΠZm. Specifically, the order t is a divisor of f(m).

2. Galois Fields

For any prime m and any positive integer n, there exists a finite field with mn elements. This

(essentially unique) field is commonly denoted by the symbol GF(mn) and is called a Galois field

in honor of the French mathematician Evariste Galois. Since any finite field with mn elements is a
simple algebraic extension of the field Zm, a brief review of the basic concepts of the extensions of

a given field will be presented.

Let F be a field. Then any field K containing F is an extension of F. If l is algebraic over F, i.e. if l

is a root of some irreducible polynomial f(x) ΠF[x] such that f(l)!=!0, then the extension field

arising from a field F by the adjunction of a root l is called a simple algebraic extension, denoted

F(l).

Each element of F(l) can be uniquely represented as a polynomial:

a0 + a1 l + .... + an-1 l n-1 , ai ΠF.

This unique representation closely resembles the representation of a vector in terms of the vectors

of a basis

1, l, ....... , ln-1.

The vector space concepts are sometimes applied to the extension fields, and F(l) is considered as a

vector space of dimension n over F.

Page 19
Number Theoretic Techniques in Digital Signal Processing

The field of complex numbers is an example of an extension of the field of real numbers; it is

generated by adjoining a root j = -1 of the irreducible polynomial x2 + 1.

If f(x) is an irreducible polynomial of degree n over Z m, m a prime, then the Galois Field with mn

elements, GF(mn) is usually defined as the quotient field Z m[x] / (f(x)), i.e., the field of residue

classes of polynomials of Z m[x] reduced modulo f(x). All fields containing mn elements are

isomorphic to each other. In particular, Z m[x]!/!(f(x)) is isomorphic to the simple algebraic

extension Zm(l), where l is a root of f(x) = 0.

Now that we have some basic background in finite algebraic structures, we can return to the

problem of indirectly computing convolution via transforms that exhibit the cyclic convolution

property. We will expand our definition of such transforms from the DFT to the NTT (Number

Theoretic Transform).

V. Number Theoretic Transforms

Number Theoretic Transforms (NTT's) were discovered independently by a number of researchers

(Nicholson 1971; Pollard 1971; Rader 1972) as a generalization of the standard DFT with respect

to the ring over which the transform is defined. The transform retains the same cyclic convolution

property as the DFT, but allows error free computation with the promise of much faster hardware

solutions.

The NTT has the same form as the DFT, with the exception that it is computed over a finite ring, or

field, rather than over the field of complex numbers.

N-1
Xk = Â ! p [ xn!ƒp!ank] (10)
n=0

where the ring, or field, is R(p) or GF(p) respectively. Note that a new notation has been introduced

here. In order to explicitly identify arithmetic operations over finite fields, or rings, we will use the

Page 20
Number Theoretic Techniques in Digital Signal Processing

symbols ⊕p, ƒp to represent addition and multiplication modulo p, respectively, and  ! p to

represent summation modulo p. Where it is clear that a single ring or field is being referred to, then

the subscript may be dropped (except on the summation). This notation will be used throughout the

rest of the paper, and will be extended where appropriate.

The NTT that has probably been studied more than any other is the Fermat Number Transform

(FNT). This, in large part, is due to the ability to use modified binary arithmetic to perform the

computations. It is not clear, given the present body of knowledge on VLSI implementations, that

this ability to use modified binary hardware is necessarily a desirable attribute of number theoretic

algorithms. It is the intent of this paper to present alternative forms of implementation of number

theoretic techniques that do not have to be predicated on the existence of standard computational

elements. This opens up the possibility of using a wider variety of number theoretic transforms. Let

us, however, start off by exploring number theoretic transforms based on Fermat numbers.

A. Fermat Number Transforms

Although the NTT allows convolution to be computed without error (unlike the DFT which

introduces errors due to the requirement for arithmetic with coefficients obtained from

transcendental functions), the use of such a finite field (or ring) transform is only appropriate when

the hardware has no greater complexity than normal computations over the reals. Satisfying this

requirement can be a problem when the field modulus is not chosen with the implementation in

mind. Among the earliest implementations considered (and still occupying a small, but current,

research interest) are Fermat Number Transforms (FNT's).

t
The FNT is initially defined over a Galois Field GF(Ft) where the modulus Ft = 22 +1 is the t'th

Fermat Number (FN), where the FN is prime (primes do not exist for all values of t). This

transform brings about three main implementation simplicities (Agarwal and Burrus, 1974):

Page 21
Number Theoretic Techniques in Digital Signal Processing

1 The arithmetic is very similar to ordinary binary arithmetic because the modulus is close to a

power of 2.

2 Simple forms can be found for the multiplicative group generator that reduces the

multiplications required to compute the transform to modified bit-pattern shifts.

3 A standard Fast algorithm, obtained from an appropriate FFT, can be used to compute the

transform.

The idea of being able to compute over a finite field but with only slightly modified binary hardware

is obviously appealing, albeit constraining.

The FNT pair is defined as:

È ˘ (11)
!!F t Î xn!ƒF t!ank˚
Xk = N-1
Â

n=0

xn = N-1 N-1 È ˘ (12)


!!F t Î Xk!ƒF t!a-nk˚
Â

k=0

There are some points that we should note:

t
1 Since NÙFt -1, it is clear that we can generate a maximally composite N= 22 and so use

one of the fast algorithms available for the DFT, to optimize a computational structure for the FNT.

2 If we restrict N to be even, N-1 will always exist.

3 The first four Fermat Numbers are prime, and 3 is a primitive root for all four of these

primes.

2 t
From 1 and 3 we find that 32 ≡ 1. If we wish to find a generator, a, for a multiplicative group of
b! 2t ! ! (2t-b)
order 2b (this will allow a transform of length 2b), then a 2 ≡!32 ≡!1; hence a ≡!32 . This

last expression allows us to determine the relationship between suitable values of a (for ease of

implementation) and possible transform lengths. The easiest value of a to choose is clearly a = 2.

Page 22
Number Theoretic Techniques in Digital Signal Processing

This will allow multiplication by powers of a to be replaced by shifts modulo Ft. Since Ft is close

to a power of 2, the hardware should be only moderately more complex than shifts modulo a power

of 2 (binary shifts).

It turns out that for a = 2, the order is 2t+1. This can be verified, using ordinary arithmetic, from:

(t+1) t t !
22 = (22 + 1) . (22 - 1) + 1 ≡!1

The largest Fermat prime is F4, for which the order is 32; the arithmetic is modified 17-bit binary.

If we examine F5 and F6, we find that they are composite numbers of the form Ft!=!K.2t+2!+!1.

We have to pause at this moment to reflect on the fact that Ft, t!Œ!{5,!6} is not a prime number.

Clearly we will not be computing over a field, so will the NTT still work? It turns out that the

conditions for the existence of the transform still exist when computing over a ring. The restriction

is that the transform has to exist over each field constructed from each of the prime factors of the
(i)
composite modulus. The transform length, N, therefore satisfies NÙF! t "i, where

Ft!=! ’F(i)! t (this assumes no powers of primes as factors) . For F5 and F6 we can show that
N!=!2t+2 is the maximum transform length supported by these Fermat numbers; as a necessary
condition, we can see that 2t+2Ù(Ft - 1). Although we are only computing over a ring of integers
Z F t, 2t+2Ù
/ K.2t+2+1, and so N-1 exists. The length supported by a = 2 is 2t+1, as before.

For all the Fermat Numbers considered here, it turns out that the transform length, N!=!2t+2, can be
achieved by using a value of a = 2 (Agarwal and Burrus (1974)) ; i.e. a2!=!2. The expression for

a = 2 is given by:

(t-2) (t-1)
a = 22 . (22 - 1)

which is clearly representable by a 2-bit (2 ones in a field of zeros) generator. For a fast DIT or
DIF algorithm, only the first, or last, stage will require odd powers of a, and so most of the

multiplications will only require single-bit multipliers. Given the use of a = 2 and the fifth FN,

Page 23
Number Theoretic Techniques in Digital Signal Processing

we have a dynamic range of just over 32-bits, with a transform length of N!=!128. It is clear that

there is an unfortunate link between dynamic range and transform length, assuming the requirement

for a fast algorithm.

It might be helpful to take an example of convolution over GF(F2) in order to reinforce the results

presented here. In order to illustrate the difference between performing convolution using an NTT
and an 8-point DFT, we will chose a = 32 = 9 in order that the NTT may also generate an 8-point

transform length.

The transform pair is:

7
Xk = Â!17 [ xn!ƒ17!9nk]
n=0

7
xn = 8-1 Â!17 [ Xk!ƒ17!9-nk]
n=0

We can concisely show the calculation using matrix notation (the modulo arithmetic is implied):

X=Tx (13)

where T is the transformation matrix, whose elements are given by:

Tkn = 9kn

Note that the corresponding DFT transformation matrix has elements given by:

Tkn = e-jπkn/4

We can immediately see the major difference in the computation of both transforms. Although the

DFT uses normal arithmetic, the calculations are over the complex field, with elements that are not

representable in a finite machine. The NTT, although requiring finite field (modulo) arithmetic,

requires calculations over an integer field using elements that are defined without error.

Page 24
Number Theoretic Techniques in Digital Signal Processing

As an example calculation, in order to compare the two approaches, we will cyclically convolve the

sequence {x} = {1, 2, 3, 4, 0, 0, 0, 0} with itself. This example is chosen to illustrate several points,

as can be seen if we compute the cyclic convolution directly:

7
yk = Â! xn . x[k⊕8(-n)] or {y} = {1, 4, 10, 20, 25, 24, 16, 0}
n=0

If we now compute the aperiodic (normal) convolution we find:

7
yk = Â!xn!.!xk-n or {y} = {1, 4, 10, 20, 25, 24, 16, 0}
n=0

We can observe that both the computed sequences are identical, which shows that it is possible to

generate normal aperiodic convolution from periodic convolution by padding the sequences to be

convolved with zeros. This forms the basis for the classical techniques of overlap-add and overlap-

save for using cyclic convolution to compute normal convolution (Gold and Rader (1969)). As a

second point, we also notice that some of the elements of the convolution output lie outside the

elements of the field over which the NTT is to be computed, GF(17). This will be discussed after

the example calculations.

We will generate the forward transform as the matrix multiplication of equation (13).

Ê 1016 ˆ Ê 1
1
1
9
1
13
1
15
1
16
1
8
1
4
1
2
ˆ Ê 12 ˆ
Á6 ˜ Á 1 13 16 4 1 13 16 4 ˜ Á3˜
Á 1115 ˜ = Á 1
1
15
16
4
1
9
16
16
1
2
16
13
1
8
16
˜ Q 17 Á 0 ˜
4

Á 13 ˜ Á 1 8 13 2 16 9 4 15 ˜ Á0˜
Ë 157 ¯ Ë 1
1
4
2
16
4
13
8
1
16
4
15
16
13
13
9
¯ Ë 00 ¯
The symbol, Q17 , indicates matrix multiplication over GF(17). Since we are convolving the

sequence with itself, we generate the transform domain point-wise multiplication (over GF(17)) as

below, and feed this result to the input of the inverse transform calculation.

Page 25
Number Theoretic Techniques in Digital Signal Processing

Ê 151 ˆ Ê 1016 ˆ Ê 1016 ˆ


Á2 ˜ Á6 ˜ Á6 ˜
Á 24 ˜ = Á 1115 ˜ ƒ17 Á 15 ˜
11

Á 16 ˜ Á 13 ˜ Á 13 ˜
Ë 154 ¯ Ë 157 ¯ Ë 157 ¯

Ê 158 ˆ Ê 1
1
1
2
1
4
1
8
1
16
1
15
1
13
1
9
ˆ Ê 151 ˆ
Á 12 ˜ Á 1 4 16 13 1 4 16 13 ˜ Á2 ˜
Á 137 ˜ = Á 1
1
8
16
13
1
2
16
16
1
9
16
4
1
15
16
˜ Q 17 Á 4 ˜
2

Á5 ˜ Á 1 15 4 9 16 2 13 8 ˜ Á 16 ˜
Ë 90 ¯ Ë 1
1
13
9
16
13
4
15
1
16
13
8
16
4
4
2
¯ Ë 154 ¯

The inverse transform requires a final multiplication by 8-1 = 15 to obtain the cyclic convolution

output.

Ê 14 ˆ Ê 158 ˆ
Á 10 ˜ Á 12 ˜
Á 38 ˜ = 8 -1 ƒ17 Á 13 ˜
7

Á7 ˜ Á5 ˜
Ë 160 ¯ Ë 90 ¯
We note that some of the elements of the output convolution do not equal the original calculation

using the direct convolution sum. They are, in fact, the result of finding the least positive residue,

modulo 17, of the correct output. In fact the results are not wrong in the accepted sense; they are

simply a mapping of the correct values onto GF(17). We will see later that such results can still be

Page 26
Number Theoretic Techniques in Digital Signal Processing

used, and in fact will provide a technique for removing some of the problems of the linkage of

dynamic range with transform length.

It will be constructive at this point to examine the use of the DFT in performing the same

convolution. Because we are performing the calculations over the complex field, we will treat the

'real' and 'imaginary' parts of the sequences and W elements as the elements of a 2-tuple; the

interaction between the 'real' and 'imaginary' parts will use the usual rules of complex arithmetic.

The forward transform is shown below:

10.00 0.00
-0.41 -7.24
-2.00 2.00
2.41 -1.24 =
-2.00 0.00
2.41 1.24
-2.00 -2.00
-0.41 7.24

1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00
1.00 0.00 0.71-0.71 0.00 -1.00 -0.71 -0.71 -1.00 0.00 -0.71 0.71 0.00 1.00 0.71 0.71 2.00 0.00
1.00 0.00 0.00-1.00 -1.00 0.00 0.00 1.00 1.00 0.00 0.00 -1.00 -1.00 0.00 0.00 1.00 3.00 0.00
1.00 0.00 -0.71-0.71 0.00 1.00 0.71 -0.71 -1.00 0.00 0.71 0.71 0.00 -1.00 -0.71 0.71 4.00 0.00
1.00 0.00 -1.00 0.00 1.00 0.00 -1.00 0.00 1.00 0.00 -1.00 0.00 1.00 0.00 -1.00 0.00 0.00 0.00
1.00 0.00 -0.71 0.71 0.00 -1.00 0.71 0.71 -1.00 0.00 0.71 -0.71 0.00 1.00 -0.71 -0.71 0.00 0.00
1.00 0.00 0.00 1.00 -1.00 0.00 0.00 -1.00 1.00 0.00 0.00 1.00 -1.00 0.00 0.00 -1.00 0.00 0.00
1.00 0.00 0.71 0.71 0.00 1.00 -0.71 0.71 -1.00 0.00 -0.71-0.71 0.00 -1.00 0.71 -0.71 0.00 0.00

The transform domain input to the inverse transform is the point-wise multiplication of the forward

transform with itself:

Page 27
Number Theoretic Techniques in Digital Signal Processing

100.00 0.00 10.00 0.00 10.00 0.00


-52.28 6.00 -0.41 -7.24 -0.41 -7.24
0.00 -8.00 -2.00 2.00 -2.00 2.00
4.28 -6.00 = 2.41 -1.24 2.41 -1.24
4.00 0.00 -2.00 0.00 -2.00 0.00
4.28 6.00 2.41 1.24 2.41 1.24
0.00 8.00 -2.00 -2.00 -2.00 -2.00
-52.28 -6.00 -0.41 7.24 -0.41 7.24

The inverse transform is computed below with a final division by 8.

8.00 0.00
32.00 0.00
80.00 0.00
160.00 0.00
=
200.00 0.00
192.00 0.00
128.00 0.00
0.00 0.00

1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 1.00 0.00 100.00 0.00
1.00 0.00 0.71 0.71 0.00 1.00 -0.71 0.71 -1.00 0.00 -0.71 -0.71 0.00 -1.00 0.71 -0.71 -52.28 6.00
1.00 0.00 0.00 1.00 -1.00 0.00 0.00 -1.00 1.00 0.00 0.00 1.00 -1.00 0.00 0.00 -1.00 0.00 -8.00
1.00 0.00 -0.71 0.71 0.00 -1.00 0.71 0.71 -1.00 0.00 0.71 -0.71 0.00 1.00 -0.71 -0.71 4.28 -6.00
1.00 0.00 -1.00 0.00 1.00 0.00 -1.00 0.00 1.00 0.00 -1.00 0.00 1.00 0.00 -1.00 0.00 4.00 0.00
1.00 0.00 -0.71 -0.71 0.00 1.00 0.71 -0.71 -1.00 0.00 0.71 0.71 0.00 -1.00 -0.71 0.71 4.28 6.00
1.00 0.00 0.00 -1.00 -1.00 0.00 0.00 1.00 1.00 0.00 0.00 -1.00 -1.00 0.00 0.00 1.00 0.00 8.00
1.00 0.00 0.71 -0.71 0.00 -1.00 -0.71 -0.71 -1.00 0.00 -0.71 0.71 0.00 1.00 0.71 0.71 -52.28 -6.00

Page 28
Number Theoretic Techniques in Digital Signal Processing

1.00 0.00 8.00 0.00


4.00 0.00 32.00 0.00
10.00 0.00 80.00 0.00
20.00 0.00 1 160.00 0.00
=
25.00 0.00 8 200.00 0.00
24.00 0.00 192.00 0.00
16.00 0.00 128.00 0.00
0.00 0.00 0.00 0.00

The calculations were performed at great precision, and so the effect of quantization noise is not

seen in this limited precision format. The quantization noise is present in both the inexact

representation of the W coefficients, and in the scaling procedures used to keep the data within the

dynamic range of the computational system. Even at large precision, zero elements become non-

zero, albeit very small, values. The fact that the input sequence is purely real, shows up in the

hermitian symmetry of the input to the inverse transform (even symmetry in real part about the

center sample (sample 4) and odd symmetry in the imaginary part). Although we have explicitly

shown all details of the calculations (not normally shown in most texts) an examination of the DFT

calculation compared to the GF(17) NTT calculation, demonstrates the simplicity and error free

behavior of computing indirect convolution over finite fields.

We are now in a position to summarize the features about each approach to computing indirect

convolution. This is presented in Table I below:

Table I Comparison of DFT and FNT integer sequence convolution

Comments

Feature DFT FNT

Transform Length No restriction Restricted by requirement

(i)
NÙF! t "i

Page 29
Number Theoretic Techniques in Digital Signal Processing

Dynamic Range No restriction Restricted by requirement

Ft
|Xk| < 2

Arithmetic Ordinary complex arithmetic Modulo integer arithmetic

Coefficients Transcendental roots of unity Integer elements with simple


structure

Precision Limited by wordlength of the Error free


computer system

Algorithmic Structure All 'fast' algorithms Transform length for 'fast'


algorithms restricted

There are probably more issues than those covered in Table I, but the essential points are there. We

see that the DFT offers no restrictions on dynamic range, transform length and type of fast

algorithm to be employed, but inherently introduces errors both in the inability to represent the

roots of unity with perfect precision, and the fact that in computing the transform there is

computational number growth that has to be checked by scaling. On the other hand the FNT has

some fairly severe algebraic constraints that limit the type of transform that can be computed, but

offers completely error free computation of convolution, with the ability to perform the calculations

with modified integer arithmetic and with simply structured coefficients.

How are we to set about removing, or at least relaxing, some of the transform length and dynamic

range constraints of the original FNT? We will start by considering algebraic 'tricks' that can be

played, followed by the use of multi-dimensional algorithms that can be used to dramatically

increase transform length with no increase in the dynamic range requirement. We will now expand

our horizons to examine general forms of NTT's.

Page 30
Number Theoretic Techniques in Digital Signal Processing

B. Indirect convolution using general NTT's

In order to look more carefully at the properties of the NTT, let us introduce it in a formal way:

The general form of the Number Theoretic Transform, over a ring, R, is given by:

T:

N-1
Xk = Â!p [ xn!ƒp!ank] ; k Œ {0, 1, 2, ..., ,N-1} (14)
n=0

where a is a primitive Nth root of unity in R. The inverse transform has the form:

T-1:

N-1
xn = Â!p [ Xk!ƒp!a-kn] ; n Œ {0, 1, 2, ..., ,N-1} (15)
n=0

with the restriction that N-1 ΠR.


-j N
Over the complex field, the DFT, with a = e , is the only transform that exists of this form, and

possesses the cyclic convolution property as we demonstrated earlier. The complex field can

support a DFT of any length, since N'th roots of unity exist for all N. In a finite algebraic system

(finite ring1 or field), the length of the transform depends on the choice of the modulus, p, and the
generator, a.

As part of this formal introduction of NTT's, we will consider the conditions for the existence of a

transform. Nicholson (1971) was the first to present these conditions for existence, in his treatise

on the algebraic theory of the 'generalized DFT'. His algebraic system for the generalization was a

1 Here the term 'ring' implies a commutative ring with identity

Page 31
Number Theoretic Techniques in Digital Signal Processing

Ÿ
commutative ring R with identity and without zero divisors. (i.e. an integral domain). The necessary

and sufficient conditions to generate an NTT of length N are:

Ÿ
1) That a is a primitive root of unity in R , i.e. aN = 1, ak ≠ 1, k = 1, 2, ...., N-1. If we denote
Ÿ Ÿ Ÿ
by R ƒ a multiplicative (cyclic) group of R , a is any generator of R ƒ of order N.

Ÿ
2) That the multiplicative inverse of N, N-1, belongs to R .

Ÿ
For the special case of a field, GF(p), the maximum number of elements in R ƒ, is p-1. More

general results for the existence of NTT's over finite commutative rings have been presented by
Ÿ Ÿ
Dubois and Venetsanopoulos (1978). Let R be a finite commutative ring with identity. Then R
Ÿ
decomposes uniquely as a direct sum of local rings1, {Ri}, which we will write as R @ ⊕∑ Ri
Ÿ
where @ indicates isomorphism. Under this decomposition, an element r ΠR has an L-tuple
Ÿ
representation (r1, .... , rL). R supports a generalized DFT of length N if and only if:

1) Each Ri contains a primitive Nth root of unity ai.


Ÿ
2) N-1 exists in R .

A primitive root of unity in Ri is any generator of a multiplicative group of order N. If Ii is the


maximal ideal of Ri, then Ri/Ii is a finite (Galois) field of mini elements (Burton (1970)). Thus for

any finite ring R, a necessary and sufficient condition can be derived, that R supports a length N
generalized DFT if and only if NÙgcd(mini - 1), " i!Œ!{1, 2, ...., L}. The notation aÙb means a

divides b; gcd is the greatest common divisor.

Practical considerations dictate a selection of ring/fields that support transforms whose parameters

lead to efficient implementation of modular arithmetic, either in hardware or software. Most of the

1A local ring is a commutative ring with identity which has a unique maximal ideal I (any element
a!œ!I is invertible in a local ring).

Page 32
Number Theoretic Techniques in Digital Signal Processing

reported work on Number Theoretic Transforms has supposed that the hardware will be

implemented using the binary number system. In the conventional binary arithmetic, residue

reduction is particularly easy when the modulus is of the form 2k ± 1. The choice 2k does not admit

useful transforms since N = 1. Also, in order to simplify multiplications in the binary number
system, the cyclic group generator, a, is chosen as a power of 2. When this constraint has to be

fulfilled, the transform length is usually not close to the maximum length attainable. In addition to

the above, the transform length is normally chosen to be highly composite, so that a 'fast' algorithm

exists for the implementation.

A considerable effort has been made to provide rings and fields which allow for adequate dynamic

range and satisfy the above constraints, so that the conflicts hidden in these constraints are

minimized. In particular, Rader (1972b), proposed transforms defined over the ring of integers

modulo Mersenne numbers, M!=!2p!-!1, p prime. These transforms are referred to as Mersenne

Number Transforms (MNT). In the ring of integers, modulo a Mersenne number, 2 is a pth root of

unity and -2 is a 2pth root of unity. Thus the transform can be computed without general

multiplication, although the use of 'fast' algorithms is precluded because the order of the transform

is not a power of 2 and not even highly composite. And, as we have already discussed, both Rader

(1972a) and Agarwal and Burrus (1974) proposed to compute the NTT with a modulus of the form
of the t'th Fermat number, Ft = 2b + 1, b!=!2t. Agarwal and Burrus (1974) show that an FNT with

a!=!2 allows a transform length N = 2t+1 and an FNT with a = 22t+2 (22t-1 - 1) (known as 2 ,

since a2 ≡ 2 (Mod Ft) ) allows N = 2t+2. Thus an FFT type algorithm can be used. The hardware

implementation of a 64-point FNT is described by McClellan (1976).

The main disadvantage of both the MNT and the FNT is the rigid relationship between the dynamic
range and attainable transform lengths. For example, with a 32-bit word machine using F5 = 232 +

1, N = 64 for a = 2 and N = 128 for a = 2 . There is also a limited choice of possible

wordlengths. This point is especially significant when a FNT is used and may result in a mismatch

Page 33
Number Theoretic Techniques in Digital Signal Processing

of wordlength and dynamic range required for the particular convolution, because of the large

spacing between Fermat numbers.

As a modification to the original FNT and MNT, Agarwal and Burrus (1974) introduced a modified

FNT by using ring moduli that had the same implementation properties as a Fermat number;

namely 2b+1, b!≠!2t. Such moduli have small factors and so yield small transform lengths. The

transform length can be artificially increased by using multi-dimensional transform techniques, as

we will see later. Probably the most interesting idea along the lines of modifying the ring modulus,

while retaining implementation efficiency was proposed by Nussbaumer (1976b), as pseudo-

Fermat and pseudo-Mersenne number transforms. A pseudo-MNT is defined with a ring modulus
(2p!-1)
of MM = q , p composite and q some factor of 2p!-1, and a pseudo-FNT is defined with a
(2b+1)
ring modulus of MF!=! s , b!≠!2t and s some factor of 2b+1. Because 2p -1 and 2b+1,

defined as above are not prime, transforms will have a short length. Thus, if 2b+1 and 2p -1 contain

small factors, these can be divided out in the pseudo-MNT, or pseudo-FNT, to yield longer

transform lengths. The arithmetic implementation of these transforms can still be performed using

modified binary arithmetic over the modulus 2p -1, or modulus 2b+1, for the pseudo-MNT or
pseudo-FNT, respectively, followed by a final reduction modulo MM or MF. The difficult

arithmetic operation (that is not performed easily with binary arithmetic) is therefore relegated to a

very small part of the overall computation budget.

In a complete search for algebraic properties that lead to useful transforms, and yet are still easily

implemented by modified binary arithmetic, the search broadened to include field, or ring, moduli

with a 3-bit form (3 'ones' in a field of 'zeros') as compared to the 2-bit structure of Fermat and

Mersenne numbers. These moduli are of the form 2n ± 2m ± 1, and have been investigated by

Pollard (1976) and Liu et al.(1976). Other rings have also been considered, such as the ring of

Eisenstein integers (Dubois and Venetsanopoulos (1978)). The driving force behind all of the

investigations reported, has been the use of modified binary arithmetic elements with multiplication

by elements of the transform matrix kept as simple as possible, whilst yielding highly composite

Page 34
Number Theoretic Techniques in Digital Signal Processing

transform lengths that are reasonably long. It is amazing that algebraic structures exist that allow all

of these requirements to be met! In the quest for more degrees of freedom in the algebraic

structures that had already been discovered, the use of extension fields, and rings, provided an

important tool.

C. NTT's over extension fields

Extension fields can be of any dimension; in order to see how such fields are built, we will discuss

the construction of a second degree extension field. The concept is a familiar one to those not

versed in any modern algebra or number theory. We simply have to consider the relationship

between the field of real numbers, R, and the field of complex numbers, C. The field, C, is in fact

built from R, by the adjoining of the solution of an irreducible polynomial over R. The polynomial
over R that has no solution in x2 + 1 = 0. We usually refer to the solution as i œ R,, and we

'build' the complex field as a polynomial field in i by the construction C = { S : +, *} where S!=!a
+ ib with a, b!Œ!R. The 'complex' operators + and * are defined by reducing the polynomial that

results from the ordinary algebraic application of addition and multiplication, + and * respectively,

by the irreducible polynomial, i2 !+!1!=!0. In the case of addition, the resulting polynomial is first

order and so no reduction is applied, i.e. addition operates point-wise. In the case of multiplication,

the resulting polynomial is of second order in i, and so it is reduced. This gives rise to the extra

arithmetic operations involved in complex multiplication compared to complex addition, i.e. four

'real' multiplications and two 'real' additions for complex multiplication, compared to only 2 'real'

additions for complex addition. The specific, well known, details are shown in table II below.

Table II Decomposition of complex arithmetic operations to real operations

Operation Resulting Polynomial Reduction Result

(a + i b) + (c + i d) i (b + d) + (a + c) i!(b+d)!!+!(a+c) (a + c) + i (b + d)
i2 !+!1

Page 35
Number Theoretic Techniques in Digital Signal Processing

(a + i b) * (c + i d) i2(b*d) + i(a*d+b*c) i2(b*d)+i(a*d+b*c) (a*c - b*d) + i (a*d


+ (a*c) + b*c)
!!!!+!(a*c)!!!!!!
i2 !+!1

Note that the addition operator in the polynomial elements of C is formal, in that it is never

implemented. We could just as easily write the element (a+ib) as (a,b). The same concept of

building fields can be applied to a Galois field, GF(p), and the degree of the extension field, d,

produces a field of pd elements, GF(pd). In order to build the field, we use an irreducible d'th order

polynomial over GF(p). It has been shown by Pollard (1971), that the cyclic convolution property
exists over such extension fields, and that the transform length NÙ(pd - 1). A polynomial of degree
d
d of the form f(x) = Âai!xi with ai!Œ!GF(p) and ad ≠ 0 is defined to be irreducible (Gaal (1971))
i=0

if it cannot be expressed as a product of two polynomials of positive degree over GF(p). The

quotient field GF(p) / (f(x)) is the required Galois field with pd elements. Up to isomorphism, the

field GF(pd) depends only on the degree of the polynomial and not on its particular form.

A special case is d=2, for which we generate a field GF(p2) which can also be written GF(p) [x],

where x2 - r is the irreducible polynomial with which the field is built. In order to reinforce the

ideas, we will go over this case with an example.

We will take p = 7; the base field is given by GF(7) = {0, 1, 2, 3, 4, 5, 6 ; ⊕7, ƒ7}. We will assume

an irreducible polynomial x2 + 1 = 0. In order to examine the assumption we create Table III as

followings.

Table III. (x2 + 1) over GF(7)

x 0 1 2 3 4 5 6

x2 + 1 1 2 5 10 17 26 37

(x2 + 1) Mod 7 1 2 5 3 3 4 2

Page 36
Number Theoretic Techniques in Digital Signal Processing

We note that none of the results of applying the polynomial, x2 + 1, are zero, and so the polynomial

is indeed irreducible over GF(7). As a contrast to this, we see that, in Table IV below, the same

polynomial defined over GF(5) is reducible.

Table IV. (x2 + 1) over GF(5)

x 0 1 2 3 4

x2 + 1 1 2 5 10 17

(x2 + 1) Mod 5 1 2 0 0 2

Formally we define -1 as a quadratic nonresidue over GF(7) and as a quadratic residue over

GF(5).

The determination of -1 as a quadratic residues, or nonresidue, holds special significance for the

computation of complex sequences over finite fields. The following theorem tells us which form of

primes support which type of residue or nonresidue. This is a classical theorem, but we will present

a proof using indices, (Baraniecka and Jullien (1980)) that is normally not used. We will use

indices later, and so an introduction here will be useful. A field GF(p) possesses a multiplicative
group GM(p-1) = {1, 2, .... , p: ƒp} = GF(p)-{0}, with p-1 non-zero elements. The entire group

can be generated by repeatedly multiplying a generator, a, by itself p-1 times. We therefore have a

unique mapping between the power of the generator and any element in the group. Thus a k ~ e
where e is an element in GM(p-1). We call k the index of e. Note that k!ΠGA(p-1) = {0, 1, .... ,

p-1: ƒp-1}, an additive group. Indices are the finite field equivalent of logarithms and can be used to

map multiplication into addition.

Theorem 1

-1 is a quadratic residue over GF(p) for all odd primes, p, of the form p = 4K+1, and -1 is a

quadratic nonresidue over GF(p) for all odd primes, p, of the form p = 4K+3.

Page 37
Number Theoretic Techniques in Digital Signal Processing

Proof

Clearly p=4K+1, or p=4K+3 defines the form of any odd number and therefore of any odd prime.
Let t be the index of -1 , and s be the index of -1. Then 2t!=!s(mod(p-1)). A solution will only

exist for s divisible by 2; 2-1 (mod(p-1)) does not exist since 2Ù(p-1). -1 is therefore a quadratic

residue for s even and a quadratic nonresidue for s odd. ( -1 )4 ≡ 1(mod p), and so 4s ≡ p-1

(mod p-1). This congruence has a solution only for p = 4K+1.


*

Although we can find other quadratic non-residues, with which to build second degree extension

fields, -1 carries special significance in that it allows the emulation of complex arithmetic over a

Galois Field. If we compute transforms over such fields, we can convolve sequences of complex

numbers; a useful property in many communication filtering problems. We see that all primes are

not created equal when it comes to emulating complex arithmetic.

The development of the extension field operations of multiplication and addition uses the same

principle as the development of complex field operations from base real field addition and

multiplication; this is shown in Table V (similar to Table II). In the case of a finite field, the

polynomial reduction uses arithmetic defined over the base field GF(p). We illustrate the operation

using a quadratic nonresidue for -1 over GF(p2).

Table V. Extension field arithmetic operations decomposed as base field operations for a

quadratic nonresidue of -1

Operation Result
Note in the table that
(a,b) ⊕p (c,d) ( [a ⊕p c] , [b ⊕p d] )
we have introduced
(a,b) ƒp (c,d) ({[a ƒp c] ⊕p [-b ƒp d]} , {[a ƒp d] ⊕p [b ƒp c]}) the notation, ⊕p and

ƒp, for extension

field operations of addition and multiplication, respectively; the base field modulus is used for the

subscript. The degree of the extension field is obtained from the number of elements within each n-

Page 38
Number Theoretic Techniques in Digital Signal Processing

tuple (in this case 2); we have explicitly shown each element of GF(p2) as a two-tuple rather than

using the formal operator, +. Note, also, that all implemented operations are base field operations.

The result in table III is a specific result for the quadratic nonresidue of -1. In general, for a

quadratic nonresidue of q, we can generate Table VI.

As an example of multiplication over GF(72), consider (2,4) ƒ7 (5,3). The result is:

(2,4) ƒ7 (5,3) = ({[2 ƒ7 5] ⊕7 [-4 ƒ7 3]} , {[2 ƒ7 3] ⊕7 [4 ƒ7 5]} = ({3 ⊕7 2} , {6 ⊕7 6})

=!(5!,!5)

Table VI Extension field arithmetic operations decomposed as base field operations for a

quadratic nonresidue of q

Operation Result

(a,b) ⊕p (c,d) ( [a ⊕p c] , [b ⊕p d] )

(a,b) ƒp (c,d) ({[a ƒp c] ⊕p [q ƒp bƒp d]} , {[a ƒp d] ⊕p [b ƒp c]})

There are perhaps two questions uppermost in our minds right now: why does such a structure

form a field, and why are such fields useful in implementing NTT's ?

To answer the first question we need to explore the existence of all the field axioms. We do not

offer a formal proof here except to point out that we need to prove; closure, the rules associated with

the operations of addition and multiplication, identities, and the existence of the inverse of every

element. The reason such fields are useful is two-fold. Firstly the fields have pd elements, compared
to the p elements of the base field, this means that transform lengths NÙ(pd - 1), as was pointed out

Page 39
Number Theoretic Techniques in Digital Signal Processing

earlier, and so the algebraic link between transform length and modulus of implementation has been

relaxed.

To answer the second question, in the case of second degree extension fields based on non-

quadratic residues of -1, the field provides a natural system for implementing convolution between

sequences of complex numbers. The form of a number theoretic transform over extension fields is

identical to that of a base field transform, with the exception that the elements are n-tuples, and that

the arithmetic follows the rules of polynomial addition and multiplication with reduction by the

irreducible polynomial. Details of complex NTT's can be found in Vanwormhoud (1978) and

Baraniecka (1980).

In order to reinforce the ideas presented above, let us take the previous example of convolution, but

this time we will compute the transform over GF(31); note that 31 is a Mersenne prime, and of form

4K+3 which supports the quadratic nonresidue of -1. First we need to determine the possible
transform lengths. The length, NÙ(312 - 1) or NÙ960 and clearly N = 8 is a valid transform

length. We can determine suitable generators for an 8-element multiplicative cyclic group by

performing a linear search. In general, the search for an N'th order group over a field GF(p2) can be

facilitated by noting that (Baraniecka (1980)):

N
i) We need to evaluate only N/2 powers of a, because a 2 ≡ -1 (mod p)

ii) We only need to find an element -


a whose order is KN where K is any positive integer,

since a = -
a K.

A suitable generator is a = (4, 4 ); this can be demonstrated by generating the series

{a,!a2,!.....!,a8}. The series is generated in Table VII, below, with the two-tuple of each element in

the series placed vertically.

Table VII. 8th Order multiplicative group over GF(312)

Page 40
Number Theoretic Techniques in Digital Signal Processing

a1 a2 a3 a4 a5 a6 a7 a8

4 0 27 30 27 0 4 1

4 1 4 0 27 30 27 0

We note that a8 = (1,0) and that a n ≠ (1,0), n<8, which satisfies the criterion for the multiplicative

group. The convolution is shown below.

We start with the forward transform, the elements are shown as 2-tuples; the symbol Q31 indicates
matrix multiplication over the extension field GF(312).

Ê 1024 270 ˆ Ê 11 0
0
1 0
4 4
1 0
0 1
1 0
27 4
1 0
30 0
1 0
27 27
1 0
0 30
1 0
4 27
ˆ Ê 12 00 ˆ
Á 29 29 ˜ Á1 0 0 1 30 0 0 30 1 0 0 1 30 0 0 30 ˜ Á3 0˜
Á 299 210 ˜ = Á1
1 0
0
27 4
30 0
0 30
1 0
4 4
30 0
30 0
1 0
4 27
30 0
0 1
1 0
27 27
30 0
˜ Q 31 Á 0 0 ˜
4 0

Á 9 10 ˜ Á1 0 27 27 0 1 4 27 30 0 4 4 0 30 27 4 ˜ Á0 0˜
Ë 2924 24 ¯ Ë 11 0
0
0 30
4 27
30 0
0 30
0 1
27 27
1 0
30 0
0 30
27 4
30 0
0 1
0 1
4 4
¯ Ë 00 00 ¯
The input to the inverse calculation is the transform domain point-wise multiplication shown below.

Ê 72 0
ˆ Ê 1024 0
ˆ Ê 1024 0
ˆ
Á 0 25
8
˜ Á 29 27
29
˜ Á 29 27
29
˜
Á 12 6 ˜ Á 9 21 ˜ Á 9 21 ˜
ƒ 31 Á 29
Á 4 0 ˜ =
Á 29 0 ˜ 0 ˜
Á 120 25
23
˜ Á 299 10
2
˜ Á 299 10
2
˜
Ë 2 6 ¯ Ë 24 4 ¯ Ë 24 4 ¯
The inverse transform is computed as below.

Page 41
Number Theoretic Techniques in Digital Signal Processing

Ê 81 00 ˆ Ê 11 0
0
1 0
4 27
1 0
0 30
1 0
27 27
1 0
30 0
1 0
27 4
1 0
0 1
1 0
4 4
ˆ Ê 72 250 ˆ
Á 18 0 ˜ Á1 0 0 30 30 0 0 1 1 0 0 30 30 0 0 1 ˜ Á0 8 ˜
Á 145 00 ˜ = Á1
1 0
0
27 27
30 0
0 1
1 0
4 27
30 0
30 0
1 0
4 4
30 0
0 30
1 0
27 4
30 0
˜ Q 31 Á 4 0 ˜
12 6

Á 6 0˜ Á1 0 27 4 0 30 4 4 30 0 4 27 0 1 27 27 ˜ Á 12 25 ˜
Ë 40 00 ¯ Ë 11 0
0
0 1
4 4
30 0
0 1
0 30
27 4
1 0
30 0
0 1
27 27
30 0
0 30
0 30
4 27
¯ Ë 02 236 ¯
with the final reduction by the multiplicative inverse of 8: 8-1 = 4.

Ê 14 00 ˆ Ê 81 00 ˆ
Á 10 0 ˜ Á 18 0 ˜
Á 2025 00 ˜ = 8 -1 ƒ31 Á 14 0 ˜
5 0

Á 24 0 ˜ Á 6 0˜
Ë 160 00 ¯ Ë 40 00 ¯
We can note similarities between the calculation over GF(312) and the DFT calculation. The

transform matrices for both the forward and reverse transform have the same symmetry; if numbers

close to 31 are replaced by their negative equivalent, the similarity is easy to see. We notice the

same Hermitian symmetry in the forward transform domain as the DFT example, and, because the

modulus is larger than the largest convolution output sample, the convolution is determined exactly

rather than with least positive residues for some of the samples, as was the case with the calculation

over GF(17). Note also that convolution over GF(31) only yields a maximum transform length of

30 against the maximum transform length of 960 for GF(312). We note that the largest power of 2

transform length for GF(31) is 2 whereas the largest power of 2 transform length for GF(312) is

64. This gives a dramatic example of the usefulness of extension fields for relaxing the relationship

between dynamic range and transform length. We can find, quite easily, the maximum power of two

transform length for an NTT defined over GF(p2), p of form 4K+3. We simply have to express the

prime in a different way. This is presented in the following theorem (Baraniecka (1980)).

Page 42
Number Theoretic Techniques in Digital Signal Processing

Theorem 2

Given a prime p = 4K+3 = q.2r -1 where r is any positive integer and (q, 2) = 1; an NTT defined

over GF(p2) can have a transform length N = 2B where the maximum value of B is r+1.

Proof

The order of the cyclic subgroup used to define the NTT has to divide p2-1. Therefore NÙ(q222r -

q2r+1). Since q is odd then NÙy2r+1 where y is odd. Therefore we can always find an N = 2B,

where the maximum value of B = r+1.


*

We can try this out in the previous example. p = 31 = 1 . 25 - 1; r = 5 and so the maximum

transform length is N = 26 = 64, which checks with our calculation from p2-1 = 312 -1 = 960.

Since one of the reasons for using extension fields is the ability to handle larger transform lengths,

particularly transforms of maximum length N = 2B, then the above theorem is important from an

implementation perspective.

We can always convolve 2 'real' sequences with a third 'real' sequence using the complex extension

field transform, by creating a complex sequence consisting of elements comprised of a sample from
each of the sequences; thus if the 2 sequences are {xn} and {yn} then the sequence to be convolved

is the complex sequence {xn, yn}. This works since the sequence being convolved with has the

form {xn, 0}, which operates on the two parts of the complex sequence independently. This also

works for DFT indirect convolution. The 2 juxtaposed sequences do not have to be from separate

sources, they can be from the same source at different sampling times. We can illustrate this below

using the same NTT as in the previous example. We will assume that all three sequences are the

example sequence {1, 2, 3, 4, 0, 0, 0, 0}.

The forward transform is computed as below:

Page 43
Number Theoretic Techniques in Digital Signal Processing

Ê 1028 1020 ˆ Ê 11 0
0
1 0
4 4
1 0
0 1
1 0
27 4
1 0
30 0
1 0
27 27
1 0
0 30
1 0
4 27
ˆ Ê 12 12 ˆ
Á 0 27 ˜ Á1 0 0 1 30 0 0 30 1 0 0 1 30 0 0 30 ˜ Á3 3˜
Á 1929 3029 ˜ = Á
1
1
0
0
27 4
30 0
0 30
1 0
4 4
30 0
30 0
1 0
4 27
30 0
0 1
1 0
27 27
30 0
˜ Q 31 Á 0 0 ˜
4 4

Á 30 19 ˜ Á1 0 27 27 0 1 4 27 30 0 4 4 0 30 27 4 ˜ Á0 0˜
Ë 2720 280 ¯ Ë 11 0
0
0 30
4 27
30 0
0 30
0 1
27 27
1 0
30 0
0 30
27 4
30 0
0 1
0 1
4 4
¯ Ë 00 00 ¯
The input to the inverse calculation is the transform domain multiplication shown below. Note that

one of the vectors is taken from the previous convolution example transform domain output
representing the transform domain output of the sequence {(xn,!0)}.

Ê 78 7
ˆ Ê 1028 10
ˆ Ê 1024 0
ˆ
Á 23 27
8
˜ Á 0 20
27
˜ Á 29 27
29
˜
Á 6 18 ˜ Á 19 30 ˜ Á 9 21 ˜
ƒ 31 Á 29
Á 4 4 ˜ =
Á 29 29 ˜ 0 ˜
Á 188 6
˜ Á 3027 19
˜ Á 299 10
˜
Ë 27 23
8
¯ Ë 20 0
28
¯ Ë 24 2
4 ¯
The inverse transform is computed as below.

Ê 81 81 ˆ Ê 11 0
0
1 0
4 27
1 0
0 30
1 0
27 27
1 0
30 0
1 0
27 4
1 0
0 1
1 0
4 4
ˆ Ê 78 277 ˆ
Á 18 18 ˜ Á1 0 0 30 30 0 0 1 1 0 0 30 30 0 0 1 ˜ Á 23 8 ˜
Á 145 145 ˜ = Á1
1 0
0
27 27
30 0
0 1
1 0
4 27
30 0
30 0
1 0
4 4
30 0
0 30
1 0
27 4
30 0
˜ Q 31 Á 4 4 ˜
6 18

Á6 6 ˜ Á1 0 27 4 0 30 4 4 30 0 4 27 0 1 27 27 ˜ Á 18 6 ˜
Ë 40 40 ¯ Ë 11 0
0
0 1
4 4
30 0
0 1
0 30
27 4
1 0
30 0
0 1
27 27
30 0
0 30
0 30
4 27
¯ Ë 278 238 ¯

Page 44
Number Theoretic Techniques in Digital Signal Processing

Ê 14 14 ˆ Ê 81 81 ˆ
Á 10 10 ˜ Á 18 18 ˜
Á 2025 2025 ˜ = 8 -1 ƒ31 Á 14 14 ˜
5 5

Á 24 24 ˜ Á6 6 ˜
Ë 160 160 ¯ Ë 40 40 ¯

Note that both the elements in the 2-tuple output sequence produce the correct convolved sequence.

In general the juxtaposed sequence will be from different sequences or from the same sequence

sampled at different times.

One possible problem is the fact that the generator is no longer of a simple form; in fact we have

two elements to manipulate in the generator. Using the rules of the extension field arithmetic, we

have increased the additive complexity by 2 and the multiplicative complexity by more than 4 (4

multiplications and 2 additions).

Let us examine the form of the generator for both types of prime field moduli. Although primes of

the form p = 4K+3 allow complex arithmetic rules for second degree extension fields, GF(p2), we

cannot find a generator for a multiplicative group of order N = 2B (where B is maximum) that has a

simple form (i.e. only one element). This has been shown by Baraniecka (1980), which we will

express in the following theorem.

Theorem 3

The generator, a, of the multiplicative group GF(p2) - {0}, of order N = 2B, has to have the general

polynomial form, a = (g, b) with g, b ≠ 0 for B = r+1, where p!=!q.2r!-1, q odd.

Proof

Assume b = 0, then a has maximum multiplicative order p-1 < 2p+1; hence b!≠!0. Let a = (g, b) =

(c + d d) q have order 2r+1, where x2 - d = 0 is the irreducible polynomial used to build the

Page 45
Number Theoretic Techniques in Digital Signal Processing

extension field. Then (c + d d) q2r = -1, since aError! ))!=!-1. The above can be written as: (c +
p-1
d d) . (c + d d) q2r-1 = -1. If we employ the binomial theorem and the property d 2 = -1

(Vinogradov (1954)), hence (! d)


p !=!-

Error!
*

We now examine primes of the form p = 4K + 1. We can represent such primes in another way:

p!= s.2t + 1, where s is odd. The largest multiplicative group, with a power-of-2 order, has an

order N = 2t+1. Baraniecka (1980) has shown, for such primes, that the generator can always be

simplified to (0,a), a single non-zero element form. The following theorem generates this result:

Theorem 4

Let p = s.2t + 1, (s, 2) = 1, be an odd prime number. Then:

i) If g is a generator for the multiplicative group GF(p) - {0}, then x2!-!g, is an irreducible

polynomial in GF(p) [x].

ii) If g is as in i), then g has multiplicative order s.2t+1 in GF(p2), where GF(p2)!=!{a!+!b g
: a, b ΠGF(p)}.

iii) We can find a generator r , of a cyclic subgroup of order 2t+1 in GF(p2), where r!=!grs
with (r, 2) = 1 and x2!-!r an irreducible polynomial in GF(p) [x].

Proof:

i) Suppose x2!-!g is reducible. Then there exists d ΠGF(p2) such that d2 = g. From Fermat's
È p-1˘ p-1
Î ˚
theorem (Dudley (1969)) dp-1 = 1 = d2 2 = g 2 = -1. This is not the case for an odd prime,

hence x2!-!g is irreducible.

Page 46
Number Theoretic Techniques in Digital Signal Processing

t+1 t
ii) ( g )s.2 = gs.2 = 1. We now have to prove that s.2t+1 is the smallest order of g .
Suppose ( g )v = 1 for some 1 ≤ v < s.2t+1, then if 2Ùv, so that v!=!2u, 1!≤ u < s.2t, we have ( g

)2u = gu = 1 which is impossible. If

Error!

t+1
iii) From ii), ( g )s.2 = 1, therefore ( g )s has order 2t+1. Since s is odd, ( g )s !œ!GF(p)

and so there exists an irreducible polynomial, x2!-!gs. We also know (Dudley (1969)) that for some
r, such that (r, 2t+1) = 1, ( g )rs has order 2t+1. It is clear (from the prime factor decomposition

of 2t+1) that we only need to show (r, 2) = 1. We now set r = grs. Since rs is odd, x2!-!r is an

irreducible polynomial in GF(p) [x].

The theorem is important because it not only tells us that a simple generator form exists, but also

how to find it! This is not the case for the other form of primes. We have, of course, only gained a

doubling of the transform length from that offered by the base field, but we should not lose any of

the efficiency inherent in using extension fields for p!=!4K!+!3, since a 'complex' multiplication by

the elements of the transform matrix only requires 2 'real' multiplications (plus a 'real' addition). We
must note here that in producing the first element of the 2-tuple (when the power of a has a zero

2nd element) a multiplication by a constant, r, is required in the calculation. It is generally assumed

that multiplication by constants is free, and although this seems a strange statement to make, we will

demonstrate this fact in the section on hardware implementation.

The rules of arithmetic used in GF(p2), p = 4K + 3, are shown in Table VIII below, where j!=! r .

Table VIII. Arithmetic rules for GF(p2), p = 4K + 3

Operation Resulting Polynomial Reduction Result

(a + j b) + (c + j) d) j (b + d) + (a + c) j!(b!+!d)!!+!(a!+!c) (a + c) + j(b + d)
j2 !+!r

Page 47
Number Theoretic Techniques in Digital Signal Processing

(a + j b) * (c + j d) j2(b*d) + j (a*d+ j2(b*d) + j(a*d + (a*c + r* b*d)

b*c) + (a*c) b*c) + j (a*d + b*c)

!!!!+!(a*c)!!!!!!
j2 !+!r

We see the constant r in the multiplication result for the first element of the resulting 2-tuple.

Let us take an example to see how the transform over this new type of field modulus works; we can

find a suitable prime from tabulated data such as Abramowitz and Stegun (1968). We can find the

tables under the section on combinational analysis, in which a table of least positive and negative

primitive roots, along with the factorization of p-1, for p a prime, is given. Our task is to find a

prime, p, such that the factorization of p-1 has a power-of-2 as one of the factors. As an example we

will take p = 29 for which p-1 = 22.7. This means that a transform of order 8 is available over the

extension field. Using the least positive primitive root from the tables, g = 2, and so 2 will

generate a multiplicative sub-group of order 23.7 = 56. The generator of the multiplicative group of
order 8 is given by a = r = ( 2 )7r where (r, 2) = 1. Mapping to indices, we are searching for an
element r subject to the constraint ind2r = 7r. The only indices which satisfy this constraint are

7!x!1!=!7 and 7!x!3 = 21. We can compute the associated values of r from r = 2ind2r . We therefore
obtain r = 12 and r = 17 as the only two values of r. These yield four values of a: a = 12 = 8 2

= (0, 8) = 21 2 =!(0,!21) and a = 17 = 9 2 = (0, 9) = 20 2 = (0, 20). The elements of the

multiplicative sub-group generated by each of these generators are shown in Table IX below. The

elements are arranged vertically with the first element of the 2-tuple above the second.

Table IX. Multiplicative sub-groups over GF(292)

0 12 0 28 0 17 0 1
8 0 9 0 21 0 20 0

0 17 0 28 0 12 0 1

Page 48
Number Theoretic Techniques in Digital Signal Processing

9 0 8 0 20 0 21 0

0 17 0 28 0 12 0 1
20 0 21 0 9 0 8 0

0 12 0 28 0 17 0 1
21 0 20 0 8 0 9 0

We note from the table that the lower two sub-groups are the inverse of the upper two sub-groups
in that s!x!t = (1, 0) , where s,t are elements of the respective groups. In forming a transform we

can use one of the sub-groups to generate the transform matrix for the forward transform, and the

other to generate the matrix for the inverse transform. This is demonstrated below using the same

convolution example as used earlier, with the first and third sub-groups used to generate the

forward and inverse transforms, respectively.

The forward transform is computed as below:

Ê 1025 102 ˆ Ê 11 0
0
1 0
0 8
1 0
12 0
1 0
0 9
1 0
28 0
1 0
0 21
1 0
17 0
1 0
0 20
ˆ Ê 12 12 ˆ
Á3 3 ˜ Á1 0 12 0 28 0 17 0 1 0 12 0 28 0 17 0 ˜ Á3 3˜
Á 277 1527 ˜ = Á
1
1
0
0
0 9
28 0
17 0
1 0
0 8
28 0
28 0
1 0
0 20
28 0
12 0
1 0
0 21
28 0
˜ Q 29 Á 40 40 ˜
Á 20 14 ˜ Á1 0 0 21 12 0 0 20 28 0 0 8 17 0 0 9 ˜ Á0 0˜
Ë 2210 222 ¯ Ë 11 0
0
17 0
0 20
28 0
17 0
12 0
0 21
1 0
28 0
17 0
0 9
28 0
12 0
12 0
0 8
¯ Ë 00 00 ¯
The input to the inverse calculation is the transform domain multiplication shown below. The

rightmost vector was generated from the forward transform of the sequence:

{1,0 2,0 3,0 4,0 0,0 0,0 0,0 0,0 }.

Page 49
Number Theoretic Techniques in Digital Signal Processing

Ê 132 13
ˆ Ê 1025 10
ˆ Ê 108 0
ˆ
Á 9 11
9
˜ Á 3 2
3
˜ Á 3 23
0
˜
Á 8 28 ˜ Á 7 15 ˜ Á ˜
ƒ 29 Á 23 21
Á 4 4 ˜ =
Á 27 27 ˜ 27 0 ˜
Á 209 0
˜ Á 2022 14
˜ Á 228 6
˜
Ë 1 20
10
¯ Ë 10 22
2
¯ Ë 23 0
8
¯
The inverse transform is computed as below.

Ê 83 83 ˆ Ê 11 0
0
1 0
0 20
1 0
17 0
1 0
0 21
1 0
28 0
1 0
0 9
1 0
12 0
1 0
0 8
ˆ Ê 132 1311 ˆ
Á 22 22 ˜ Á1 0 17 0 28 0 12 0 1 0 17 0 28 0 12 0 ˜ Á9 9 ˜
Á 1526 1526 ˜ = Á
1
1
0
0
0 21
28 0
12 0
1 0
0 20
28 0
28 0
1 0
0 8
28 0
17 0
1 0
0 9
28 0
˜ Q 29 Á 84 284 ˜
Á 18 18 ˜ Á1 0 0 9 17 0 0 8 28 0 0 20 12 0 0 21 ˜ Á9 0 ˜
Ë 120 120 ¯ Ë 11 0
0
12 0
0 8
28 0
12 0
17 0
0 9
1 0
28 0
12 0
0 21
28 0
17 0
17 0
0 20
¯ Ë 201 2010 ¯
With the final multiplication:

Ê 14 14 ˆ Ê 83 83 ˆ
Á 10 10 ˜ Á 22 22 ˜
Á 2025 2025 ˜ = 8 -1 ƒ29 Á 15 15
26 26
˜
Á 24 24 ˜ Á 18 18 ˜
Ë 160 160 ¯ Ë 120 120 ¯
We can see that each element of the transform matrix, for both the forward and reverse transform, is

of the simple form, but the zero generator element is exchanged between the first and second place

in the 2-tuple for adjacent matrix elements. It appears as though we may need to multiplex the

arithmetic hardware between the adjacent inner product partial product calculations in the transform

matrix/vector multiplication for both the forward and reverse transforms. An interesting result

Page 50
Number Theoretic Techniques in Digital Signal Processing

occurs if we consider implementing the transform using a fast algorithm of the type discussed in

section III, A.

Consider the decimation-in-time algorithm, as shown in the 8-point transform flow graph of Fig 2.
Only in the final stage do we require multiplications by odd powers of a, the other stages only use

even powers of a and for these we only require a real multiplication. Thus the hardware only

changes at the final stage.

We will illustrate the operation of the fast algorithm on the previous convolution example using the

8-point DIT fast algorithm to compute the transforms. The forward transform uses the 'twiddle

factors' shown in Table Xa below, and the inverse transform uses the 'twiddle factors' in Table Xb;

the 2-tuple is arranged vertically.

Table Xa Table Xb

n 1 2 3 n 1 2 3
n 0 12 0 n 0 17 0
W8 W8
8 0 9 20 0 21

The Fast NTT (FNTT) is displayed below in columns of 2-tuples, within vertical lines, with bold

face italics indicating the result after multiplication. If no twiddle factor is present in the signal flow

graph of Fig 2, a multiplier of unity is used for consistency, and the italics are normal face. Since

this is a numerical equivalent of the flow graph of Fig 2, the operation proceeds from left to right,

rather than the reverse for the matrix multiplication notation. The forward transform is shown below

with the standard input sequence {(1,1), (2,2), (3,3), (4,4), (0,0), (0,0), (0,0), (0,0)}.

1 1 1 1 1 1 1 1 4 4 4 4 10 10 10 10
2 2 2 2 2 2 2 2 6 6 6 6 27 27 25 2
3 3 3 3 3 3 3 3 27 27 27 27 3 3 3 3

Page 51
Number Theoretic Techniques in Digital Signal Processing

4 4 4 4 4 4 4 4 27 27 5 5 22 22 7 15
0 0 0 0 1 1 1 1 8 8 8 8 25 2 27 27
0 0 0 0 2 2 2 2 21 21 17 23 20 14 20 14
0 0 0 0 3 3 7 7 23 23 23 23 7 15 22 22
0 0 0 0 4 4 19 19 12 12 13 21 10 2 10 2

Input Multiply 1st Stage Multiply 2nd Stage Multiply 3rd Stage Shuffle
Butterfly Butterfly Butterfly

The output of the transform is multiplied with the transform of the convolving sequence: {1,0 2,0

3,0 4,0 0,0 0,0 0,0 0,0}.

10 10 10 0 13 13
25 2 8 23 2 11
3 3 3 0 9 9
7 15 23 21 8 28
27 27 x 27 0 fi 4 4
20 14 8 6 9 0
22 22 22 0 20 20
10 2 23 8 1 10

The result of the multiplication is now used as the input to the inverse transform:

13 13 13 13 17 17 17 17 17 17 17 17 8 8 8 8

2 11 2 11 11 11 11 11 20 20 20 20 26 26 3 3

9 9 9 9 0 0 0 0 17 17 17 17 22 22 22 22

8 28 8 28 9 9 9 9 2 2 5 5 12 12 15 15

4 4 4 4 9 9 9 9 25 25 25 25 3 3 26 26

9 0 9 0 22 11 22 11 25 27 7 7 18 18 18 18

Page 52
Number Theoretic Techniques in Digital Signal Processing

20 20 20 20 18 18 16 16 22 22 22 22 15 15 12 12

1 10 1 10 7 18 3 16 19 24 22 22 0 0 0 0

Input Multiply 1st Stage Multiply 2nd Stage Multiply 3rd Stage Shuffle
Butterfly Butterfly Butterfly

and the result is finally multiplied by 8-1 = 11.

8 8 1 1

3 3 4 4

22 22 10 10

15 15 x 8-1 20 20

26 26 fi 25 25

18 18 24 24

12 12 16 16

0 0 0 0

D. Quadratic Residue Rings

While we are on the subject of implementation simplicities, it will be useful to examine some work

by Nussbaumer (1976a) on Fermat Number Transforms. Very often it is required to be able to

convolve complex sequences, rather than the real sequences that have been the subject of our

discussion so far. We have already seen that extension fields, GF(p2), where p!=!4K+3, can

support finite field equivalents of arithmetic operations over the complex numbers. The only

Page 53
Number Theoretic Techniques in Digital Signal Processing

problem is that we have to perform these operations over the base field (the same way that complex

arithmetic is computed over the reals) and this involves some unfortunate complexities with regard

to the implementation of multiplication. Addition is component-wise, but multiplication involves a

'cross-coupling' of the parts of the 2-tuple defining the field elements, including the requirement for

four base field multiplications and 2 base field additions, as was discussed earlier in the definition

of second degree extension fields.

Nussbaumer discovered a simplification in this requirement based on what would initially appear to

be a wrong way of doing things. He considered the case of Fermat primes, where the prime is of the

form p = 4K + 1. This prime will not support a 'complex' extension field, since -1 is a quadratic
residue; we can, however, form a ring of 2-tuples, CR(Ft), where the ring operators take the form of

complex addition and complex multiplication on the elements of the 2-tuples, but all base operations
are carried out modulo Ft. What Nussbaumer did was to effectively define an extension ring (he did

not call it that) which is isomorphic to this complex ring, but where the operations of addition and
multiplication are both component-wise. We will define this Quadratic Residue ring as QR(Ft) = {

S : ⊕, ƒ }, where the set S consists of elements that are 2-tuples, a = (a°, a*), a!Œ S, and the

operations, ⊕, and ƒ are both component-wise. We can map the results of this ring to CR(Ft) in

the following way:

Let a = a r + ia i; a r,!aiΠGF(Ft), a ΠCR(Ft), then we define the mapping


t
a°!=!ar!⊕F t!(j!ƒF t !a i) and a*!=!ar ⊕F t (-j ƒF t a i), where j!=! -1 = 2 2 . Clearly

a°,!a*!Œ!GF(Ft). The inverse mapping can be found to be a r!= 2-1ƒ(a°!⊕ a*) and a i!=!2-1ƒj-
1ƒ(a°!⊕ (-a*)) (we have dropped the subscript, F , for
t clarity). Let b!=!(b°,!b*), map to b = b r +

ibi, apply point-wise addition and multiplication to a and b, and confirm the mapping to CR(Ft):

Addition

D!(a°⊕b° , a*⊕b*)!
c = a!⊕!b!=

Page 54
Number Theoretic Techniques in Digital Signal Processing

c° = a°⊕b° = ar!⊕!(jƒai)!⊕!br!⊕!(jƒbi)!= !ar!⊕!br! ⊕!!jƒ(ai!⊕ bi)

c* = a*⊕b* = ar!⊕!(-jƒai)!⊕!br!⊕!(-jƒbi)!= !ar!⊕!br! ⊕!!-jƒ(ai!⊕ bi)

Now apply the inverse mapping to the result over CR(Ft), g = gr + ig i:

gr = 2-1ƒ(c°!⊕ c*) = 2-1ƒ(ar!⊕!br! ⊕!!ar!⊕!br) = ar!⊕!br

gi = 2-1ƒj-1ƒ(c°!⊕ c*) = 2-1ƒj-1ƒ[ jƒ(ai!⊕!bi!⊕!ai!⊕!bi)] = ai!⊕!bi.

This verifies the mapping for addition.

Multiplication

D!(a°ƒb° , a*ƒb*)!
c = a!ƒ!b!=

c° = a°ƒb° = [ a r!⊕!(jƒai)] !ƒ![ b r!⊕!(jƒbi)] !=

![ (arƒbr)!! ⊕!-(aiƒbi)] ⊕ j[ (arƒbi)!! ⊕!(aiƒbr)]

c* = a*ƒb* = [ a r!⊕!(-jƒai)] !ƒ![ b r!⊕!(-jƒbi)] != !

![ (arƒbr)!! ⊕!-(aiƒbi)] ⊕ -j[ (arƒbi)!! ⊕!(aiƒbr)]

Now apply the inverse mapping to the result over CR(Ft), g = gr + ig i:

gr = 2-1ƒ(c°!⊕ c*) = 2-1 ƒ 2 ƒ[ (arƒbr)!!⊕!-(aiƒbi)] = [ (arƒbr)!! ⊕!-(aiƒbi)]

gi = 2-1ƒj-1ƒ(c°!⊕ c*) = 2-1ƒj-1ƒ j ƒ [ (arƒbi)!!⊕!(aiƒbr)] =[ (arƒbi)!!⊕!(aiƒbr)]

This verifies the mapping for multiplication.

For notational convenience we will refer to the a° term as the normal element and the a* term as the

conjugate element.

Page 55
Number Theoretic Techniques in Digital Signal Processing

It might be useful to reinforce the idea behind this quadratic residue mapping with a simple scalar

example, using GF(17). The parameters to be used are j!=!22!=!4; j-1 = 13; 2-1!=!9.

Let a = (13, 5), b = (9, 11). Computing over the ring, CR(17), yields:

a!⊕!b!=!(13⊕9,!5⊕11) = (5, 16) and a!ƒ!b!=![(13ƒ9)⊕ -(5ƒ11),!(13ƒ11)⊕(5ƒ9)] = (11,!1).

Now let us compare performing this calculation using the QR mapping:

QR(17)
a = (13, 5) fi a = (13 ⊕ (4ƒ5), 13 ⊕ -(4ƒ5)) = (16, 10)

QR(17)
b = (9, 11) fi b = (9 ⊕ (4ƒ11), 9 ⊕ -(4ƒ11)) = (2, 16)

Now we compute the addition and multiplication over QR(17), component-wise:

a!⊕!b = (16⊕2,!10⊕16) = (1, 9); a!ƒ!b = (16ƒ2,!10ƒ16) = (15, 7)

Finally we map back to CR(17):

CR(17)
(1, 9) fi [9ƒ(1⊕9),!9ƒ13ƒ(1⊕ -9)] = (5, 16)

CR(17)
(15, 7) fi [9ƒ(15⊕7),!9ƒ13ƒ(15⊕ -7)] = (11, 1)

and we see that we obtain the same result as the direct complex calculation over CR(17).

The method seems involved, in that there are 3 distinct steps involved. Note, however, that if the

middle step (the component-wise calculation) contains many multiplications and additions, that the

overhead associated with steps 1 and 3 can be relatively small. We therefore concentrate on the

features of step 2, bearing this assumption in mind. There are two main features to the calculation
over QR(Ft):

1) The multiplication operation requires 2 base field multiplications compared to 4 base field
multiplications and 2 base field additions for the calculation over CR(Ft).

Page 56
Number Theoretic Techniques in Digital Signal Processing

2) The calculations can be carried out without any interaction between the normal and

conjugate channels.

The first feature has advantages in minimizing implementation hardware, the second feature

provides advantages in testing and fault tolerance of a complex sequence processor.

A more comprehensive example is probably in order at this point. We will consider the complex

convolution of two 4-point sequences over GF(17). The sequences and their convolution sum are
shown in equation (16) below, where the symbol, O
* , represents the cyclic convolution operator:

Ê 15 4
ˆ Ê1 3
ˆ Ê1 1
ˆ
Á 15 8 ˜ Á2 2 ˜ Á1 1 ˜
= O
* Á0 0 ˜
(16)
Á2 8 ˜ Á3 1 ˜
Ë2 ¯ Ë0 ¯ Ë0 0
¯
4 0

We will now consider the computation of this convolution using the FNTT defined over GF(17).

We can compute this by transforming each real and imaginary sequence of both inputs separately.

We then multiply these transformed sequences, using the rules of complex arithmetic, and invert the

real and imaginary sequences of this result separately. The inverted two sequences will be the real

and imaginary components of the convolution sum (Nussbaumer (1976a)).

The transform of the real component of the first input sequence is shown in equation (17); Q17
represents matrix multiplication over GF(17).

Ê6ˆ Ê1 1 1 1
ˆ Ê1ˆ
Á7˜ Á1 13 16 4 ˜ Á2˜
Á2˜ Á1 Q17
16 ˜ Á3˜
= (17)
16 1
Ë6¯ Ë1 4 16 13
¯ Ë0¯

Page 57
Number Theoretic Techniques in Digital Signal Processing

The transformed sequences and their multiplication, using the rules of complex arithmetic, are

shown in equation (18).

Ê0 7
ˆ Ê6 ˆ6
Ê2 2
ˆ
Á 12 14 ˜ Á7 11 ˜ Á 14 14 ˜

Á0 ƒ17
0 ˜ Á2 2 ˜ Á0 0 ˜
= (18)
Ë 14 12
¯ Ë6 10
¯ Ë5 5
¯
We now invert the real and imaginary parts of the result as separate sequences. The inverse

transform for the real component is shown in equation (19).

Ê9ˆ Ê1 1 1 1
ˆ Ê0ˆ
Á9˜ Á1 4 16 13 ˜ Á 12 ˜
Á8˜ Á1 Q17 Á 0 ˜
16 ˜
= (19)
16 1
Ë8¯ Ë1 13 16 4
¯ Ë 14 ¯
The final sequence is the combination of the inverse of the real and imaginary components of the

transform domain multiplication (shown in equation (20) with the final multiplication by 4-1 =!13).

Ê 15 4
ˆ Ê9 16
ˆ
Á 15 8 ˜ Á9 15 ˜

Á2 4-1 ƒ17
8 ˜ Á8 15 ˜
= (20)
Ë2 4
¯ Ë8 16
¯
The final result agrees with the original convolution sum in equation (16).

Now we have to compute the convolution using the mapping to QR(17). First we perform the

mapping, as shown below:

Page 58
Number Theoretic Techniques in Digital Signal Processing

Ê1 3
ˆ Ê 13 6
ˆ
Á2 2 ˜ QR(17) Á 10 11 ˜

Á3 1 ˜
fi
Á7 16 ˜
Ë0 0
¯ Ë0 0
¯

Ê1 1
ˆ Ê5 14
ˆ
Á1 1 ˜ QR(17) Á 5 14 ˜
Á0 0 ˜
fi
Á0 0 ˜
Ë0 0
¯ Ë0 0
¯
Now we compute the convolution for the normal and conjugate sequences separately. The

computation for the normal sequence is shown below:

Ê 13 ˆ Ê 13 ˆ Ê5ˆ Ê 10 ˆ
Á 10 ˜ FNTT(17) Á 0 ˜ Á 5 ˜ FNTT(17) Á 2 ˜
Á 7 ˜ fi Á 10 ˜ Á 0 ˜ fi Á 0 ˜
Ë0¯ Ë 12 ¯ Ë0¯ Ë8¯

Ê 11 ˆ Ê 13 ˆ Ê 10 ˆ Ê 11 ˆ Ê 14 ˆ
Á0˜ Á0˜ Á2˜ Á0˜ IFNTT(17)
Á 13 ˜
Á0˜ =
Á 10 ˜ ƒ17 Á 0 ˜ Á0˜ fi
Á0˜
Ë 11 ¯ Ë 12 ¯ Ë8¯ Ë 11 ¯ Ë1¯

The computation for the conjugate sequence is shown below:

Page 59
Number Theoretic Techniques in Digital Signal Processing

Ê6ˆ Ê 16 ˆ Ê 14 ˆ Ê 11 ˆ
Á 11 ˜ FNTT(17) Á 14 ˜ Á 14 ˜ FNTT(17) Á 9 ˜
Á 16 ˜ fi Á 11 ˜ Á 0 ˜ fi Á 0 ˜
Ë0¯ Ë0¯ Ë0¯ Ë2¯

Ê6ˆ Ê 16 ˆ Ê 11 ˆ Ê6ˆ Ê 16 ˆ
Á7˜ Á 14 ˜ Á9˜ Á7˜ IFNTT(17)
Á0˜
Á0˜ =
Á 11 ˜ ƒ17 Á 0 ˜ Á0˜ fi
Á4˜
Ë0¯ Ë0¯ Ë2¯ Ë0¯ Ë3¯
Now all it remains for us to do is to compute the inverse mapping from the quadratic residue ring:

Ê 14 16
ˆ Ê 15 4
ˆ
Á 13 0 ˜ !!CR(17) Á 15 8 ˜

Á0 4 ˜ fi
Á2 8 ˜
Ë1 3
¯ Ë2 4
¯

The result matches that obtained directly and using the complex ring transform method. Note the

complete independence of the normal and conjugate calculations once the mapping to the quadratic

ring is performed. We are able to compute the forward transforms, transform domain multiplication

and inverse transform without invoking any interaction between normal and conjugate channels.

Although the quadratic residue ring has been built from Galois Fields based on Fermat numbers,

the only requirement that such a ring exist is that the base field have a prime modulus of the form

p!= 4K+1, since we have already discovered that such a field contains a quadratic residue for -1.

The ramifications of computations over quadratic rings goes beyond the simplification of

convolution via Fermat NTT's, and it seems that Nussbaumer did not realize the importance of this

finding at the time. We will return to quadratic rings later.

Page 60
Number Theoretic Techniques in Digital Signal Processing

E. Multi-Dimensional Mapping

Another technique used to remove the tight coupling of transform length with the algebraic

properties of the number of elements in the field, is that of multi-dimensional mapping. This was

first pointed out by Rader (1972b) and expanded upon by Agarwal and Burrus (1974). The basic

concept is that a one-dimensional convolution can always be implemented by rewriting the one-

dimensional input sequence as a multi-dimensional sequence. The convolution can then be

indirectly computed via multi-dimensional transforms which, in turn, can be computed as a series of

short one-dimensional transforms. The final step is a mapping back to the one-dimensional output

sequence. As an example for a two-dimensional mapping of an original one-dimensional sequence,

consider the cyclic convolution of equation (21):

N-1
yn = ∑ xq h[n⊕ (-q)]
N
(21)
!!!q=0

We assume that we can write N = L.M where L and M are integers. A change of variables can now

be made:

n!=!l!+!mLÔ̧
˝ k, l = 0, 1, ...., L-1; p, m = 0, 1, ......, M-1
q!=!k!+!pLÔ˛

and the convolution now becomes:

L-1 M-1
yl+mL = ∑∑ xk+pL h[l⊕mL⊕(-k)⊕(-pL)] (22)
!!!k=0!!!p=0

where we have dropped the subscript on the modulo N addition operator. Let us now define two-

dimensional arrays for y, x and h. We will keep the same notation as used by Agarwal and Burrus

(1974). Thus:

^ ^ ^
X l, m = xl+mL ; H l, m = hl+mL ; Y l, m = yl+mL (23)

Page 61
Number Theoretic Techniques in Digital Signal Processing

and the convolution can be written as equation (24):

L-1 M-1
^ ^ ^
Y l, m = ∑∑ H l⊕(-k), m⊕(-p) X k, p (24)
!!!k=0!!!p=0

This is a two-dimensional cyclic convolution, and we can compute it indirectly using two-

dimensional NTT's.

Two-dimensional NTT's can be calculated using one-dimensional NTT's along the rows and then

along the columns of the intermediate results. Clearly two-dimensional convolution is a sort of

overlay of column-wise, followed by row-wise (or vice-versa) one-dimensional cyclic convolution.

If we examine the decomposition of the original one-dimensional sequence, we find that increasing

values of the m-index (row index) defines a sampling of the original signal by a reduction factor of

L and thus preserves the cyclic nature of the sequence (this new sequence has period M rather than

N). Increasing values of the l-index (column index) are contiguous samples of only a segment of

the original sequence. Thus, although cyclic convolution will work for the rows it will not work for

the columns, since this sequence is not a periodic sub-sampling of the original signal. We must

therefore compute aperiodic convolution along the columns, and this means invoking one of the two

techniques (overlap-add or overlap-save (Gold and Rader (1969), chapter by T. Stockham, Jr.),

available for computing aperiodic convolution from cyclic convolution. Another way of looking at

the problem, is to consider that although the indices of the 2-D sequences are computed over finite

rings, R(L) and R(M), the formation of these rings from the original index sequence was over

R(N).

The overlap-save technique involves appending at least (L-1) zero samples to the original column
^ ^
sequences of the X array; the H array is augmented by the periodic extension of the original {h}

sequence, as indicated in the index mapping of equation (23). The final result will have L correct

values and L-1 incorrect values per column (Agarwal and Burrus (1974)). Normally, in order to
^
compute fast convolution, we will require to append L zeros to the X columns (rather than L-1

Page 62
Number Theoretic Techniques in Digital Signal Processing

zeros), requiring a total 2-dimensional array of 2L x M. Two of the rows of the final result will be

found to be dependent (except for a cyclic shift) because of this redundancy of one extra row added

to the 2-D arrays.

The 2-D NTT is defined in equation (), where the Nth order generator (a) has been replaced by an

Mth order generator (aL) and an 2Lth order generator (aM/2).

^^ 2L-1M-1
^
X l, m = ∑ ∑ X k, p . a(Mkl+Lpm) (25)
!!!k=0!!!p=0

Consider taking 1-D transforms along the columns (p index) of the input, xk, p.

M-1
^ ^
X [k]m = ∑ X k, p . aLpm
!!!p=0

The index [k] corresponds to the column of the 1-D transform result. We can now compute the 2-

D NTT by taking 1-D transforms along the rows (k index) of this modified intermediate result, as

shown in equation (26).

^^ 2L-1
^ (26)
X l, m = ∑ X [k]m . aMkl
!!!k=0

^^ ^^
By computing 2-D transforms for X l,m and X l,m, we can form the 2-D cyclic convolution by

multiplication in the transform domain, followed by inverse transformation.

2L-1 M-1 ^ ^^
^ ^
Y k,p = (2N) ∑ ∑ X l, m . H l, m . a-(Mlk+Lmp)
-1 (27)
!!!l=0!!!m=0

It now remains to unscramble the resulting 2-D sequence back into a 1-D sequence; this result is

the required 1-D cyclic convolution. The entire process appears quite involved, but it allows the use

Page 63
Number Theoretic Techniques in Digital Signal Processing

of small length transforms to implement much longer length convolutions. As before, we will take

an example to illustrate the procedure, and as before we will consider the cyclic convolution of the

sequence {1,!2,!3,!4,!0,!0,!0,!0} with itself.

In order to demonstrate the ability of this technique to allow longer length convolutions than

possible with a direct procedure, we will consider the convolution over GF(29). We have already

determined that we can only achieve a direct one-dimensional length 8 transform using a second

degree extension field, GF(292). We will decompose the length N = 8 = 4 x 2, if we let L = 2 and

M = 4, then we can compute the convolution using 4 x 4 arrays, with a series of length 4 one-
dimensional convolutions. The parameters for the example are a L = a M/2 = a 2 = 12 (a = 12

does not exist in GF(29)), and (2N-1) = 16-1 = 20. The sequence {1,!2,!3,!4,!0,!0,!0,!0} is mapped
^ ^
into the H and X array as below:

0 3 1 0 3 1 0 0
Ê ˆ Ê ˆ
^
Á 0 4 2 0 ˜ ^
Á 4 2 0 0 ˜
H =
Á 0 0 3 1 ˜ X =
Á 0 0 0 0 ˜
Ë 0 0 4 2
¯ Ë 0 0 0 0
¯
^ ^
Note the cyclic extension of the H array and the 2 rows of zeros appended to the X array. The

next step is to compute the two-dimensional NTT of each array. We first show the intermediate step

after taking 1-dimensional NTT's of each column in the two arrays:

0 7 10 3 7 3 0 0
Ê ˆ Ê ˆ
Ï ^ [k] ¸
Á 0 22 3 4 ˜ Ï ^ [k] ¸
Á 22 25 0 0 ˜
ÌH
Ó m˝˛ =
Á 0 28 27 28 ˜ ÌX
Ó m˝˛ =
Á 28 28 0 0 ˜
Ë 0 13 22 23
¯ Ë 13 6 0 0
¯
Followed by 1-dimensional NTT's of the rows:

Page 64
Number Theoretic Techniques in Digital Signal Processing

20 9 0 0 10 14 4 0
Ê ˆ Ê ˆ
^^ Á 0 10 6 13 ˜ ^^ Á 18 3 26 12 ˜
H =
Á 25 2 0 2 ˜ X =
Á 27 16 0 11 ˜
Ë 0 3 15 11
¯ Ë 19 27 7 28
¯
^^ ^
The next step is to form the product, H ƒ29 X^ , and invert the result. This is shown below:

26 10 0 0
Ê ˆ
^^ ^^ Á 0 1 11 11 ˜
H ƒ29 X =
Á ˜
8 3 0 22
Ë 0 23 18 18
¯
and the inverse transform is computed by 1-D transforms on the columns and rows with a final

multiplication by 16-1. The final inverse transform with multiplication is shown below:

6 11 4 28
Ê ˆ
^
Á 0 6 1 7 ˜
Y = 16-1 ƒ29 Á ˜
16 15 23 24
Ë 6 1 7 0
¯
4 17 22 9
Ê ˆ
Table XI Primes (10 bits or less) and Á 0 4 20 24 ˜
transform lengths =
Á 1 10 25 16 ˜
p
Ë ¯
2q 4 20 24 0
97 32
We now unravel the bottom two rows of the result
193 64
257 256 array to find the output sequence: {1,!4, 10, 20, 25,

449 64 24, 16, 0}.


577 64
641 128 F. Extension of dynamic range
673 32
769 256

Page 65
Number Theoretic Techniques in Digital Signal Processing

An inverse problem to that discussed above is the extension of dynamic range for a given transform

length. This seems trivial, in that we can probably find a large transform length for a large dynamic

range, and simply reduce the transform length by using powers of the multiplicative group

generator. That, however, is not the exercise we wish to perform here. Suppose that we find small

(10-bits or less) primes that have desirable properties for the construction of fields in which

reasonable power-of-two transform lengths are possible, and our goal is to produce such a length
transform. For example, primes, p, which have the property 2qÙp-1, will allow transform lengths of

2q. Table XI shows some small primes and their associated transform length, 2q. The transform

lengths shown are quite 'respectable' in terms of usefulness in DSP, but the dynamic range afforded

by the size of the prime is probably inadequate. Can we 'stick' several of these transforms together

to form a larger dynamic range. The answer is a resounding yes, and we will show, in the following

sections, that this approach has much larger ramifications than one would initially imagine.

G. Binary Implementations

The interesting fact is that, beside from the flurry of academic and general theoretical interest, all of

these efforts have lead to very little in the way of special purpose hardware being built to implement

these transforms. It appears that most of the implementations are on general purpose computers

(software implementations) with dubious usefulness. The only hardware approach, using a single

base field, has been discussed by McClellan (1976) using a Fermat Number Transform. A special
coding scheme was used to implement the modulo Ft computation. An interesting modification to

binary arithmetic was also proposed by Leibowitz (1976). Both of these approaches have important

ramifications for the efficient binary implementation of Fermat Number Transform operations. It is

appropriate to briefly discuss these techniques here, for later contrast with the non-binary

implementations to be discussed later.

1. McClellan Approach

Page 66
Number Theoretic Techniques in Digital Signal Processing

The representation of Fermat numbers in the binary representation is only 50% efficient, since only

one of the field1 elements requires the most significant bit to be one. In the obvious mapping, this

element is the largest in the field. The coding scheme introduced by McClellan maps this isolated

case to zero. The full mapping is described below.

The representation of an element, B, in GF(Ft) is by t+1 bits {bt, bt-1, ...., b0} with the following

mapping:

1 If bt = 1, then B = 0.

t
2 If bt = 0, then B = Âst-i!2t-i where sj
i=1
Ï1 if!bj !=!1
= ÌÓ
-1 if!bj !=!0

Using the example given by McClellan for F4:

10000 represents zero

01010 represents 23 - 22 + 2 - 1 = 5

00011 represents -23 - 22 + 2 + 1 = -9 (=8)

Over GF(Ft) this representation is valid for all elements. The representation of zero as a special case

(bt = 1) allows simple hardware for arithmetic with zero as one of the elements. General addition

involves an ordinary binary addition followed by an adjustment based on the state of the output

carry. This is the same complexity as 1's complement binary arithmetic. Multiplication by powers

of 2 (required in the formation of the forward or inverse FNT) is a simple cyclic shift with the

addition of a logical inversion as bits are fed back into the least significant position.

1We will assume that the Fermat Number chosen is prime

Page 67
Number Theoretic Techniques in Digital Signal Processing

We see that ordinary binary arithmetic elements can be used, with slight modifications to the

circuitry. General multiplication, of course, retains the same complexity that it has for binary

arithmetic. We note that all general arithmetic operations are performed with only t bits rather than

the t+1 bits of the full representation. McClellan's hardware only computed the transform and not

the transform domain multiplication.

2. Leibowitz Approach

Leibowitz introduced a modification that reduced the complexity of the arithmetic hardware. This

diminished-1 representation involves a simple translation of 1-bit from the normal binary

representation. This contrasts to McClellan's use of symmetrical weightings {-1,1}. The

diminished-1 representation simply adds one to the binary representation, thus {0, 1, 2, .... , 2t}
becomes {1, 2, 3, ... , 0}; note that this mapping places 0 as the only element that requires bt = 1,

and so corresponds with McClellan's mapping of that element. The other elements are, in general,

mapped to different positions than McClellan's mapping. Leibowitz demonstrates that simpler

implementations are possible, compared to McClellan's mapping, and also that general

multiplication can be carried out as a simple modification of a binary multiplication.

We see that general multiplication is more complicated than binary multiplication (where the

majority of computational complexity is centered) and that these representations only work for

fields based on Fermat primes.

We will now introduce an alternative to the single modulus NTT that has been used to great effect

in the implementation of convolution hardware, and which opens the door to some very interesting

VLSI implementations; this alternative is the Residue Number System (RNS). Rather than finding

special large moduli that allow modified binary arithmetic hardware to be used, we will restrict the

modulus to be small enough that look-up tables can be used to implement the arithmetic (these

look-up tables are basically unminimized truth tables). We are no longer restricted as to the form of

Page 68
Number Theoretic Techniques in Digital Signal Processing

the modulus, just the size for practical hardware solutions. We can remove the modulus size

problem by several parallel computations over a direct sum ring. We will start by digressing,

somewhat, into the theory of the RNS; this knowledge will then be applied to the implementation of

flexible NTTs.

Page 69
Number Theoretic Techniques in Digital Signal Processing

VI. References

Abramowitz, M. and Stegun, I.A. Eds. (1968). "Handbook of Mathematical Functions with

Formulas, Graphs and Mathematical Tables." National Bureau of Standards, Applied

Mathematics Series, 55, 7th printing.

Agarwal R.C. and Burrus, C. S. (1974). "Fast convolution using Fermat number transforms with

applications to digital filtering." IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-

22, no. 2, April, pp. 87-97.

Baraniecka, A.Z. (1980). "Digital Filtering using Number Theoretic Techniques". Ph.!D.

dissertation, University of Windsor, Windsor, Ontario, Canada.

Baraniecka, A.Z. and Jullien, G.A. (1980). "Residue Number System!Implementations of Number

Theoretic Transforms in Complex Residue!Rings." IEEE Trans. on Acoustics, Speech, and

Signal Processing,!Vol. ASSP-28, No.3, June, pp.285-291.

Baugh, R.A. and Day, E.C. (1961). "Electronic sign evaluation for residue number systems." Tech.

Rep. No. TR-60-597-32, RCA, Camden, HJ, and Burlington, MA, Jan.

Bayoumi, M.A., Jullien, G.A., Miller, W.C. (1987). "A VLSI Implementation of Residue Adders",

IEEE Trans. on Circuits and Systems, Vol. CAS-34, No.3.

Blahut, R.E. (1985). "Fast Algorithms for Digital Signal Processing." Addison-Wesley Publishing

Co., Inc., Reading, MA. .

Burton, D.M. (1970). "A First Course in Rings and Ideals." Addison-Wesley Publishing Co., Inc.,

Page 70
Number Theoretic Techniques in Digital Signal Processing

Carhoun, D.O., Johnson, B.L., Redinbo, G.R. (1983). “A Synthesis Algorithm for Recursive Finite

Field FIR Digital Filters.” Proc. of the 1983 Int. Symp. on Circuits and Systems, Vol. 2, pp.

689-693.

Cheney, P.W. (1961). "A Digital Correlator Based on the Residue Number System." IRE

Transactions on Electronic Computers, vol. EC-11, March, pp. 63-70.

Cooley, J.W. and Tukey, J.W. (1965). “An Algorithm for the Machine Computation of Complex

Fourier Transforms.”, Mathematics of Computation, April, pp. 297-301.

Dillinger, T. E. (1988). “VLSI Engineering.” Prentice-Hall, ISBN 0-13-942731-7 025, New

Jersey.

Dubois, E. and Venetsanopoulos, A.N. (1978). "The Discrete Fourier Transform over Finite Rings

with Application to Fast Convolution." IEEE Transactions on Computers, Vol. C-17, No. 7,

July, pp. 586-593.

Dudley, U. (1969). "Elementary Number Theory." W.H. Freeman and Co.

Gold, B, and Rader, C.M. (1969). "Digital Processing of Signals." McGraw-Hill Book Company,

New York.

Garner, H.L. (1959). "The residue number system." IRE Trans. Electronic Computers, vol. EC-8,

June, pp. 140-147.

Hatamian, M., Cash, G.L. (1986). "High Speed Signal Processing, Pipelining, and Signal

Processing," Proceedings of the Int. Conf. on Acoustics, Speech, and Signal Processing,

April, pp.1173-1176.

Jenkins W.K, and Leon, B.J. (1977). “The Use of Residue Number Systems in the Design of

Finite Impulse Response Digital Filters”, IEEE Trans. on Circuits and Systems, Vol. CAS-

24, April, pp. 191-201.

Page 71
Number Theoretic Techniques in Digital Signal Processing

Jenkins W.K, Paul, D.F., Davidson, E.S. (1985). “A Custom Designed Integrated Circuit for the

Realization of Residue Number Digital Filters.” Proc. 1985 Int. Conf. on ASSP, Tampa, Fl.,

March, pp. 220-223.

Jenkins, W.K. and Altman, E.J. (1988). "Self-checking properties of residue number error checkers

based on mixed radix conversion." IEEE Transactions on Circuits and Systems, Vol. CAS-

35, No. 2, February, pp. 159-167.

Jenkins, W.K. and Krogmeier, J.V. (1987a). "The design of dual-mode complex signal processors

based on quadratic modular number codes." IEEE Transactions on Circuits and Systems,

CAS vol. 34, no. 4, April, pp 354-364.

Jenkins, W.K. and Lao, S.F. (1987b). "The design of an RNS digital filter module using the IBM

MVISA design system." Proceedings of the International Symposium on Circuits and

Systems, Philadelphia, PA, May, pp 122 - 125.

Jenkins, W.K. (1975). "Composite number theoretic transforms for digital filtering." Proc. 9th

Asilomar Conference on Cir., Sys. and Comp., Nov., pp. 458-462.

Jullien, G.A. ( 1978). "Residue Number Scaling and Other Operations Using ROM Arrays." IEEE

Trans on Computers, Vol. C-27, No.4, April, pp. 325-336.

Jullien, G.A. ( 1980). "Implementation of Multiplication, Modulo a Prime Number, with

Applications to Number Theoretic Transforms." IEEE Trans on Computers, Vol. C-29,

No.10, October, pp. 899-905.

Jullien, G.A., Krishnan, R. and Miller, W. C. (1987). "Complex digital signal processing over finite

fields." IEEE Transactions on Circuits and Systems, CAS vol. 34, no. 4, April, pp 365-337.

Page 72
Number Theoretic Techniques in Digital Signal Processing

Jullien, G.A., Bird, P.D., Carr, J.T., Taheri, M., Miller, W.C. (1989). "An Efficient Bit-Level

Systolic Cell Design for Finite Ring Digital Signal Processing Applications.” Journal of

VLSI Signal Processing, Vol I-3, (In Print)

Jury, E.I (1964). “Theory and Application of the z-Transform Method.” John Wiley and Sons,

Inc., New York.

Kung, H.T. and Leiserson, C.E. (1978). "Systolic Arrays (for VLSI)," Sparse Matrix Proc. 1978,

Academic Press, Orlando, Fla., pp.256-282.

Leibowitz, L. M. (1976). "A Simplified Binary Arithmetic for the Fermat Number Transform."

IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, no. 5, October, pp. 356-359.

Liu, K.Y., Reed, I.S. and Truong, T.K. (1976). "Fast Number Theoretic Transforms for Digital

Filtering." Electronic Letters, Vol. 12, November, pp. 644-646.

McCanny, J.V., McWhirter, J.G. (1982). "Implementation of Signal Processing Functions using 1-

bit Systolic Arrays." Electronic Letters, Vol. 18, No. 6, March, pp. 241-243.

McClellan, J.H. (1976). "Hardware Realization of a Fermat Number Transform." IEEE

Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-24, June, pp. 216-225.

McClellan, J.H. and Rader, C.M. (1979). “Number Theory in Digital Signal Processing.”

Prentice-Hall, Inc., Englewood Cliffs, NJ.

McCoy, N. (1972). "Fundamentals of Abstract Algebra." Allyn and Bacon Inc.

Nagpal, H.K., Jullien, G.A., Miller, W.C. ( 1983). "Processor Architectures for Two-Dimensional

Convolvers Using a Single Multiplexed Computational Element with Finite Field Arithmetic."

IEEE Trans on Computers, Vol. C-32, No.11, November, pp. 989-1000.

Page 73
Number Theoretic Techniques in Digital Signal Processing

Nicholson, P.J. (1971). "Algebraic Theory of Finite Fourier Transforms." J. Comput., Sys. Sci.,

Vol. 5, pp. 524-547.

Nussbaumer, H. (1976a). "Digital Filtering using Complex Mersenne Transforms." IBM J.

Research and Development, Vol. 20, May, pp. 282-284.

Nussbaumer, H. (1976b). "Complex Convolution Via Fermat Number Transforms." IBM J.

Research and Development, Vol. 20, September, pp. 498-504.

Peled, A. and Liu, B. (1974). "A new hardware realization of digital filters." IEEE Trans. on

Acoustics, Speech, and Signal Processing, vol. ASSP-22, Dec., pp 456-462.

Pollard, J.M. (1971). "The Fast Fourier Transform in a Finite Field.” Math. Comp., Vol. 25, April,

pp. 365-374.

Pollard, J.M. (1976). "Implementation of Number Theoretic Transforms.” Electronics Letters, Vol.

12, July, pp.378-379.

Rader, C.M. (1972a). "The Number Theoretic DFT and Exact Discrete Convolution.” IEEE Arden

House Workshop on Digital Signal Processing, Jan. (Verbal Report)

Rader, C.M. (1972b). "Discrete Convolutions via Mersenne Transforms.” IEEE Transactions on

Computers, Vol. C-21, December, pp. 1269-1273.

Rabiner, L.R. and Gold, B. (1975). "Theory and Application of Digital Signal Processing.”

Prentice-Hall, Englewood Cliffs, N.J.

Schilling, O. F. G., and Piper, W. S. (1975). "Basic Abstract Algebra.” Allyn & Bacon, Boston.

Schroeder, M.R. (1985). "Number Theory in Science and Communication.” Second Edition,

Springer Series in Information Sciences, Springer-Verlag, New York.

Page 74
Number Theoretic Techniques in Digital Signal Processing

Senoy, A.P., and Kumaresan, R. (1987). " Residue to Binary Conversion for RNS Arithmetic

Using only Modular Look-Up Tables.” Submitted for publication to IEEE trans. Circuits and

Systems.

Soderstrand, M.A.. (1977). “A High-Speed Low-Cost Recursive Digital Filter Using Residue

Number Arithmetic”, Proc. IEEE, Vol. 65, No. 7, July, pp. 1065-1067.

Soderstrand, M.A., Jenkins, W.K., Jullien, G.A. and Taylor, F.J. (eds) (1986). "Residue Number

System Arithmetic: Modern Applications in Digital Signal Processing.” IEEE Press, New

York, NY.

Soderstrand, M.A., Vernia, and Chang, J-H. (1983). "An Improved Residue Number System

Digital-to-Analog Converter." IEEE Trans. on Circuits and Systems, Vol. CAS-30,

December 1983, pp 903-907.

Soderstrand, M.A., Chang, B. (1986). “Design of a High Performance FIR Digital Filter on a

CMOS Semi-Custom VLSI Chip.” Proc. 1986 ISMM Int. Conf. on Mini and Micro

Computers, Beverly Hills, CA., Feb.

Svoboda, A. and Valach, M. (1955). "Operational circuits." Stroje Na Zpracovani Informaci, vol. 3,

Nakl. CSAV, Praha.

Svoboda, A. (1957). "Rational numerical system of residual classes." Stroje Na Zpracovani

Informaci, vol. 5, pp. 9-37, Prague.

Svoboda, A. (1958). "The numerical system of residual classes in mathematical machines." Proc.

Congreso Internacional De Automatica, Madrid, Spain, October. (Also Information

Processing (Proc. of UNESCO Conference, June 1959), pp. 419-422, 1960.)

Szabo, N.S. and Tanaka, R. I. (1967). "Residue Arithmetic and Its Applications to Computer

Technology.” McGraw-Hill Publishing Co.

Page 75
Number Theoretic Techniques in Digital Signal Processing

Taheri, M., Jullien, G.A. and Miller, W.C. (1988). "High Speed Signal Processing Using Systolic

Arrays Over Finite Rings." IEEE Transactions on Selected Areas in Communications, VLSI

in Communications III, Vol. 6, No. 3, April, pp. 504-512.

Tanaka, R.I. (1962). "Modular arithmetic techniques." Tech. Rep. 2-38-62-1A, ASTDR, Lockheed

Missiles and Space Co., Nov.

Taylor, F.J. and Huang, C.H. (1982). "An Autoscale Residue Multiplier." IEEE Trans Computer.

vol. C-31, no. 4, April, pp. 321-325.

Taylor, F.J. and Ramnarayanan, A.S. (1981). "An Efficient Residue-to-Decimal Converter." IEEE

Trans. Circuits and Systems, vol. CAS 28, no. 12, December, pp. 1164-1169.

Vanwormhoud, M.C. (1978). "Structural Properties of Complex Residue Rings Applied to Number

Theoretic Fourier Transforms.”. IEEE Trans. Acoust., Speech and Signal Processing, Vol.

ASSP-26, Feb., pp. 99-104.

Vinogradov, I.M. (1954). "Elements of Number Theory.” New York, Dover Publishing Co.

Page 76

You might also like