
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 6, JUNE 2005

Design-for-Testability and Fault-Tolerant Techniques for FFT Processors
Shyue-Kung Lu, Jen-Sheng Shih, and Shih-Chang Huang

Abstract: In this paper, we first propose a novel design-for-testability approach based on M-testability conditions for module-level systolic fast Fourier transform (FFT) arrays. Our M-testability conditions guarantee 100% single-module-fault testability with a minimum number of test patterns. Based on this testable design, fault-tolerant approaches at the bit level and the multiply-subtract-add (MSA) module level are proposed, respectively. If the reconfiguration is performed at the bit level, then the FFT$_{\mathrm{BIT}}$ network is constructed. Two types of reconfiguration schemes (Type-I FFT$_{\mathrm{MSA}}$ and Type-II FFT$_{\mathrm{MSA}}$) are proposed at the MSA module level. Since both the design for testability (DFT) and the design for yield (DFY) issues are considered at the same time for all these proposed approaches, the resulting architectures are simpler than those of previous works. The reliability of the FFT system increases significantly. The hardware overhead is low: about 12% for the FFT$_{\mathrm{BIT}}$ network and $1/(2N)$ for the Type-II FFT$_{\mathrm{MSA}}$ network of an $N$-point FFT. An experimental chip is also implemented to verify our approaches. Reliabilities and hardware overhead are also evaluated and compared with previous works.

Index Terms: Butterfly network, C-testable, design for testability (DFT), fast Fourier transform (FFT), fault tolerant, logic testing.

I. INTRODUCTION

FAST FOURIER transform (FFT) algorithms are among the most important digital signal processing (DSP) algorithms. They provide a means to greatly speed up discrete Fourier transform computations. The performance gained from FFTs has made the realization of many sophisticated signal processing algorithms economically feasible. Due to the rapid advance in semiconductor fabrication technology, a large number of processing elements can be integrated on a single chip. It will therefore soon be possible to construct FFT systems from special-purpose VLSI chips. A straightforward implementation of the $N$-point FFT uses two-point butterflies; it consists of $\log_2 N$ stages, and each stage contains $N/2$ two-point butterflies. However, integrating a large number of processors on a single chip increases the logic-per-pin ratio, which drastically reduces the controllability and observability of the logic on the chip. Consequently, testing such highly complex and dense circuits becomes very difficult and expensive.
Manuscript received September 10, 2003; revised November 26, 2004.
S.-K. Lu and S.-C. Huang are with the Very Large Scale Integration/Computer-Aided Design Laboratory, Department of Electronic Engineering, Fu-Jen Catholic University, Taipei 24205, Taiwan, R.O.C. (e-mail: sklu@ee.fju.edu.tw).
J.-S. Shih was with the Department of Electronic Engineering, Fu-Jen Catholic University, Taipei 24205, Taiwan, R.O.C. He is now with Pixelworks, U.S., Taipei 24205, Taiwan, R.O.C. (e-mail: jshih@pixelworks.com).
Digital Object Identifier 10.1109/TVLSI.2005.844306

Several testable structures and fault-tolerant designs have been proposed to improve the testability and fabrication yield of FFT processors, e.g., triple modular redundancy (TMR) with voting [1], hybrid redundancy [2], recomputing with shifted operands (RESO) [3], and triple time redundancy [4]. Other concurrent error detection (CED) and testable schemes for FFT networks can be found in [5]-[9], [14], and [16]-[22]. In [8], Choi and Malek proposed a scheme called recomputing by alternate path for concurrent error detection and fault diagnosis of FFT networks. Once an error is detected, a faulty butterfly can be located at the cost of additional cycles. In [6], Jou and Abraham presented an algorithm-based fault-tolerant scheme for FFT networks. They show that 100% fault coverage and no loss of throughput could be achieved theoretically. Lombardi and Muzio [9] presented a new approach for CED and fault location in homogeneous VLSI/WSI architectures for computing complex FFT. Tao et al. [5] also proposed an algorithm-based CED scheme for FFT processors, which maintains the low hardware overhead and high throughput of Jou and Abraham's scheme while increasing the fault coverage significantly.
It is well known that the general logic testing problem is NP-complete. For certain iterative logic arrays (ILAs), however, the fault detection problem is solvable in polynomial time [10]. In this paper, we show that the FFT processor can be viewed as an ILA. Our work is grounded in the theory established in a series of papers [11]-[15]. In these papers, testability conditions are proposed for C-testable [11], [12] and M-testable [13]-[15] mesh-connected arrays, hexagonally connected arrays, sequential arrays, and bilateral arrays. A C-testable array is an array testable with a constant number of test patterns independent of the size of the array. An M-testable array is also an array testable with a constant number of test patterns; in addition, this constant number is a minimum value. Therefore, M-testable techniques are always superior to C-testable techniques.
In this paper, a design-for-testability approach is applied to the module-level systolic array for computing FFT. Our M-testability conditions guarantee 100% single-module-fault testability with a minimum number of test patterns. Based on this testable design, fault-tolerant approaches at the bit level and the MSA module level are proposed, respectively. If the reconfiguration is performed at the bit level, then the FFT$_{\mathrm{BIT}}$ network is constructed. Two types of reconfiguration schemes (Type-I FFT$_{\mathrm{MSA}}$ and Type-II FFT$_{\mathrm{MSA}}$) are proposed at the MSA module level. Since both the DFT and DFY issues are considered at the same time for all these proposed approaches, the resulting architectures are simpler than those of previous works. The reliability of the FFT system increases significantly. The hardware overhead is low: about 12% for the FFT$_{\mathrm{BIT}}$ network and $1/(2N)$ for the Type-II FFT$_{\mathrm{MSA}}$ network. An experimental chip is implemented. Reliabilities and hardware overhead are also evaluated and compared with previous works.

Fig. 1. Eight-point FFT butterfly network.

Fig. 2. Butterfly consists of four MSA modules.

II. FFT
The discrete Fourier transform is defined by the following equation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where $W_N = e^{-j2\pi/N}$. A straightforward implementation of an $N$-point FFT in hardware is by a $\log_2 N$-stage butterfly network. Each stage contains $N/2$ two-point butterfly modules. In this paper, such a circuit is called an FFT network, which is assumed pipelined. An eight-point FFT network is shown in Fig. 1. Each butterfly module (i.e., the two-point FFT module) in Fig. 1 performs the following computations:

$$x' = x + W y, \qquad y' = x - W y \tag{2}$$

where $W$ is a representative twiddle factor. All the quantities in these equations are complex-valued. For implementation purposes, it is necessary to use a functionally equivalent butterfly that employs only real quantities and real operations. Let us express $x$, $y$, and $W$ in complex form as follows:

$$x = x_r + j x_i, \qquad y = y_r + j y_i, \qquad W = w_r + j w_i \tag{3}$$

where $j$ is the square root of $-1$. Combining these equations, we can recast $x'$ and $y'$ as

$$x' = (x_r + w_r y_r - w_i y_i) + j\,(x_i + w_r y_i + w_i y_r) \tag{4}$$

$$y' = (x_r - w_r y_r + w_i y_i) + j\,(x_i - w_r y_i - w_i y_r) \tag{5}$$

The butterfly module can be constructed with four identical multiply-subtract-add (MSA) modules, as shown in Fig. 2.
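As a rough illustration of how four identical multiply-subtract-add modules can realize one complex butterfly with real arithmetic only, the following Python sketch may help. The function names `msa` and `butterfly`, the two-accumulator interface of `msa`, and the wiring between the four calls are assumptions made for illustration; the actual wiring is the one in Fig. 2, which is not reproduced here.

```python
def msa(sub_in, add_in, p, q):
    """Hypothetical MSA module: one multiply, then one subtract and one add.

    Returns (sub_in - p*q, add_in + p*q).
    """
    prod = p * q
    return sub_in - prod, add_in + prod


def butterfly(xr, xi, yr, yi, wr, wi):
    """Two-point butterfly x' = x + W*y, y' = x - W*y from four MSA calls.

    The wiring below is one plausible arrangement, not the paper's Fig. 2.
    """
    # Real-part group: two MSA modules.
    r_minus, r_plus = msa(xr, xr, wr, yr)        # xr -/+ wr*yr
    xpr, ypr = msa(r_plus, r_minus, wi, yi)      # Re(x'), Re(y')
    # Imaginary-part group: two MSA modules.
    i_minus, i_plus = msa(xi, xi, wr, yi)        # xi -/+ wr*yi
    ypi, xpi = msa(i_minus, i_plus, wi, yr)      # Im(y'), Im(x')
    return (xpr, xpi), (ypr, ypi)


if __name__ == "__main__":
    x, y, w = complex(3, -2), complex(5, 7), complex(2, -1)
    xp, yp = butterfly(3, -2, 5, 7, 2, -1)
    # Compare against direct complex arithmetic.
    print(complex(*xp) == x + w * y, complex(*yp) == x - w * y)
```

Each call to `msa` uses exactly one multiplication, one subtraction, and one addition, which is why the same module can serve both the real-part and the imaginary-part group.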
III. REVIEW OF M-TESTABILITY CONDITIONS
Definition: A cell is a combinational machine $(\Sigma, \Delta, f)$, where $f: \Sigma \to \Delta$ is the cell function and $\Sigma = \Delta = \{0, 1\}^w$ for $w \in \mathbb{N}$. A cell can be a bit-level cell such as the adder cell. Moreover, it can also represent a word-level cell such as a two-point butterfly module as shown in Fig. 1. An ILA is an array of cells. We use the terms array and ILA interchangeably.
Definition: A complete or exhaustive input sequence for a cell is an input sequence consisting of all possible input combinations for the cell.
Definition: A complete output sequence is defined analogously. A minimal complete sequence is a shortest such sequence (which has a length of $2^w$, where $w$ denotes the word length of a cell).
Definition: A C-testable array is an array testable with a constant number of test patterns independent of the size of the array. An M-testable array is also an array testable with a constant number of test patterns; in addition, this constant number is a minimum value (equal to $2^w$). Therefore, M-testable techniques are always superior to C-testable techniques.
We assume that the cell's behavior is invariant over time, even if it is faulty. A faulty cell's function may deviate from the correct one in any manner, as long as it remains combinational. That is, we are testing for permanent combinational faults only [12]. We now turn to the FFT arrays. A straightforward implementation of an $N$-point FFT network uses two-point butterflies; it consists of $\log_2 N$ stages, and each stage contains $N/2$ two-input butterflies. Let the inputs of a butterfly module be assigned the values $a$ and $b$, respectively, and let $c$ and $d$ be the values of their corresponding outputs (see Fig. 1). Then $c = a + Wb$ and $d = a - Wb$. The bijectivity of the module function is verified in our previous work [11].
Theorem 1: An $N$-point FFT butterfly network, where $N$ is a power of two, can be made M-testable by swapping the outputs of the lower left cells of each four-point module [11]. In other words, the number of test patterns for the FFT network is equal to the number of patterns for a single two-point butterfly module. The tessellation of test patterns is shown in Fig. 3, which labels the cells with a minimal complete sequence and the output sequences derived from it. In this figure, modules with a cross mark in them denote that their outputs must be swapped.

Fig. 3. Tessellation of test patterns.

Fig. 4. Four-point FFT network.

Fig. 5. Testable FFT butterfly network.

Fig. 6. Butterfly consists of four TMSA modules.

Proof: The case of a single two-point butterfly is trivial. For a four-point FFT module as shown in Fig. 4, we apply the same minimal complete sequence to both first-stage cells. Since the cell function is a bijection, the output sequences of both cells are also minimal complete. We swap the outputs of the lower left cell; then each of the two second-stage cells receives one output of the upper left cell paired with the opposite output of the lower left cell. Since the first-stage output sequence is minimal complete, so is the sequence obtained from it by swapping. The resulting outputs of the two second-stage cells are therefore also minimal complete. Using this tessellation, each cell receives a minimal complete input sequence. Thanks to the bijectivity of the cell function, any fault is propagated to some observable primary outputs concurrently.
The four-point FFT module is therefore M-testable once a mechanism is implemented on the lower left cell to swap its outputs during the test mode of the network. In general, an $N$-point FFT network can be shown to be M-testable by induction [11].
According to this theorem, the outputs of the lower left module of each four-point butterfly network should be swapped. Therefore, we modify the original MSA module into a testable MSA module (TMSA), which is described in the following.

IV. TESTABLE DESIGN AT THE MODULE LEVEL
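The swapping argument of Theorem 1 can be checked on a toy model. The sketch below is only an illustration under stated assumptions: the cell is a two-point butterfly over the integers modulo a small odd number `M` (a stand-in for the real cell's value range), `W` is a hypothetical twiddle value that is invertible modulo `M`, and the four-point wiring is the usual one in which each second-stage cell takes one output from each first-stage cell. None of these names or values come from the paper.

```python
from itertools import product

M = 5  # hypothetical odd modulus standing in for the cell's value range
W = 2  # hypothetical twiddle value, invertible modulo M


def cell(a, b, w=W):
    """Toy two-point butterfly over Z_M: (a + w*b, a - w*b) mod M."""
    return (a + w * b) % M, (a - w * b) % M


def is_complete(pairs):
    """True if the pairs cover every (a, b) combination exactly once."""
    return sorted(pairs) == sorted(product(range(M), repeat=2))


def second_stage_inputs(swap):
    """Feed one minimal complete sequence to both first-stage cells.

    Returns the input sequences seen by the two second-stage cells.
    """
    sigma = list(product(range(M), repeat=2))  # minimal complete sequence
    top, bottom = [], []
    for a, b in sigma:
        c1, d1 = cell(a, b)  # upper left cell
        c2, d2 = cell(a, b)  # lower left cell, same input sequence
        if swap:
            c2, d2 = d2, c2  # swap the lower left cell's outputs
        top.append((c1, c2))
        bottom.append((d1, d2))
    return top, bottom


if __name__ == "__main__":
    for swap in (False, True):
        top, bottom = second_stage_inputs(swap)
        print(swap, is_complete(top), is_complete(bottom))
    # Swapping the outputs is the same as negating the twiddle factor.
    print(all(cell(a, b, -W) == cell(a, b)[::-1]
              for a, b in product(range(M), repeat=2)))
```

Without the swap, each second-stage cell only ever sees pairs of equal values, so it is not exhaustively tested; with the swap, both second-stage cells receive every input combination exactly once. The last check also shows why the swap is cheap to implement: reversing the two outputs is equivalent to changing the sign of the twiddle factor.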

According to Theorem 1, a testable design of the FFT butterfly networks is shown in Fig. 5. In this figure, the lower left module of each four-point butterfly network is constructed with four TMSA modules. The two-point butterflies designated as MSA (TMSA) are constructed with four MSA (TMSA) modules, respectively.
Since the function of a two-point FFT module is bijective, this leaves us the job of designing the TMSA modules in order to make the whole array M-testable. The swapping mechanism can be implemented with negligible cost, since its property is inherent in the computation of the FFT modules. In Section II, we showed that a butterfly module could be constructed with four identical MSA modules (see Fig. 2). Our goal now is to swap the outputs $x'$ and $y'$ of the specified modules in test mode. Let $x''$ and $y''$ be the outputs of a module after swapping; then the module performs the following function:

$$x'' = y' = x - W y, \qquad y'' = x' = x + W y \tag{6}$$

From (4) and (5) we have

$$x'' = (x_r - w_r y_r + w_i y_i) + j\,(x_i - w_r y_i - w_i y_r) \tag{7}$$

$$y'' = (x_r + w_r y_r - w_i y_i) + j\,(x_i + w_r y_i + w_i y_r) \tag{8}$$

Comparing (4) and (5) with (7) and (8), respectively, we see that swapping the outputs is tantamount to changing the sign of $W$, or to replacing the adders by subtractors and vice versa. This
can be implemented by using four TMSA modules as shown in Fig. 6. When the processor operates in normal mode, each two-point FFT module is configured as four MSA modules, and their outputs are not swapped. In test mode, each of the specified FFT modules is configured as four MAS (multiply-add-subtract) modules, and their outputs are swapped. A control circuit must be designed for the TMSA modules to switch between these two configurations. The MSA module in the form of an ILA is shown in Fig. 7, where the word length is 3. In this figure, three types of cells are used: the multiplier cell (MC), the adder cell (AC), and the subtractor cell (SC). The cell structures are shown in Fig. 8.
The TMSA module is similar to that in Fig. 7. The only difference is that the adder cells and subtractor cells are replaced with subtractor/adder (SA) cells. The cell structure of the SA cell is shown in Fig. 9. In this figure, the XOR gate is controlled by the subtractor/adder selection signal. Depending on the value of this signal, the cell performs either the subtractor function or the adder function.

Fig. 7. MSA module in the form of an ILA.

Fig. 8. Three cell types used in the original MSA module. (a) The multiplier cell. (b) The subtractor cell. (c) The adder cell.

Fig. 9. Adder/subtractor cell.

Fig. 10. Fault-tolerant/testable FFT butterfly network.

Fig. 11. FMSA module in the form of an ILA.

V. FAULT TOLERANCE AT THE BIT LEVEL

This section deals with the off-line reconfiguration architecture for FFT arrays at the bit level. In our fault-tolerant design, a redundant column is included and placed between the multiplier cells and the subtractor cells of each TMSA and MSA module. We call the modified fault-tolerant TMSA and MSA modules the FTMSA (fault-tolerant TMSA) and the FMSA (fault-tolerant MSA) modules, respectively. The fault-tolerant/testable structure of FFT butterfly networks is shown in Fig. 10. This type of fault-tolerant FFT network is referred to as the FFT$_{\mathrm{BIT}}$ network. In this figure, the lower left two-point butterfly of each four-point butterfly network is constructed with FTMSA modules. The two-point butterflies designated as FMSA (FTMSA) are constructed with FMSA (FTMSA) modules, respectively.
The FMSA module in the form of an ILA is shown in Fig. 11, where the word length is 3. Each column in the array is labeled using the notation col$_i$, where $i$ is the column index. Our reconfiguration algorithm proceeds as follows.
1) If one of the multiplier columns is faulty, the faulty column is replaced with its neighboring multiplier column, which is in turn replaced with the next one. This process continues until the last multiplier column is replaced with the redundant column. In this case, the functions of the subtractor column and the adder column remain unchanged.
2) If the subtractor column is faulty, then the redundant column is used to replace the faulty column. The functions of all other columns remain unchanged.
3) When the adder column is faulty, the subtractor column and the redundant column are used to replace the adder column and the subtractor column, respectively. In other words, the adder column is replaced with the subtractor column, and the subtractor column is replaced with the redundant column. The functions of all other columns remain unchanged.

Fig. 12. (a) Multiplier cell (MC). (b) The multiplier/subtractor cell (MS). (c) The adder/subtractor cell (SA). (d) The adder cell (AC).

Fig. 13. FTMSA module in the form of an ILA.
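The three reconfiguration cases above can be sketched as a mapping from logical roles to physical columns. This is a minimal sketch under assumptions that are not taken from the paper: columns are numbered from 0, the multiplier columns come first, followed by the redundant column, the subtractor column, and the adder column, and the function name `reconfigure` and the role labels are hypothetical.

```python
def reconfigure(num_mult, faulty=None):
    """Return {role: physical column} after bypassing one faulty column.

    Assumed physical order: multiplier columns 0..num_mult-1, then the
    redundant column, then the subtractor column, then the adder column.
    """
    red, sub, add = num_mult, num_mult + 1, num_mult + 2
    roles = {f"mult{j}": j for j in range(num_mult)}
    roles["sub"], roles["add"] = sub, add
    if faulty is None or faulty == red:
        return roles  # nothing to repair, or only the spare is faulty
    if faulty < num_mult:
        # Case 1: shift multiplier roles toward the redundant column.
        for j in range(faulty, num_mult):
            roles[f"mult{j}"] = j + 1
    elif faulty == sub:
        # Case 2: the redundant column takes over the subtractor role.
        roles["sub"] = red
    elif faulty == add:
        # Case 3: subtractor column becomes the adder, spare the subtractor.
        roles["add"] = sub
        roles["sub"] = red
    return roles


if __name__ == "__main__":
    for bad in (None, 1, 4, 5):
        print(bad, reconfigure(3, bad))
```

In every repaired configuration the faulty column carries no role, which is the property the bypass multiplexers are meant to provide.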
To implement the FMSA module, four types of cells are used in the fault-tolerant design: 1) the multiplier cell (MC); 2) the multiplier-subtractor cell (MS); 3) the subtractor-adder cell (SA); and 4) the adder cell (AC). All these cells should have bypass capability and have almost the same complexity. The detailed implementations of these cell types are shown in Fig. 12. For the multiplier cell and the adder cell, the only difference from their corresponding original cells is the inclusion of a multiplexer, which controls the source of the output signal. The multiplexer is controlled by a bypass (BP) signal indexed by the column, with separate signals for the adder cells and the multiplier cells. That is, when the bypass signal is inactive, the adder (multiplier) cells perform their normal function. When it is active, they act merely as bypass registers.
The design of the redundant multiplier-subtractor cell is more complicated. In Fig. 12(b), there are two multiplexers included in the cell. The first is controlled by the BP signal of the redundant column and has the same function as the multiplexers in the MC and AC cells. The second is controlled by the multiplier/subtractor selection signal. Depending on the value of this signal, the cell performs either the multiplier function or the subtractor function. If this column is faulty, it can be bypassed by controlling the BP signal. The subtractor/adder cell is shown in Fig. 12(c). To implement its function, a CMOS XNOR gate, which can be implemented with four transistors, is included in the design. By controlling its selection signal, we can switch the function of the cell between the normal phase and the reconfiguration phase. If this column is faulty, it can be bypassed by controlling the BP signal. Furthermore, the FTMSA module in the form of an ILA is shown in Fig. 13, where the word length is 3. The only difference between the FTMSA module and the FMSA module is that the last column of the FTMSA module consists of subtractor/adder cells. In test mode, the FTMSA module must be configured as a MAS module, whereas the FMSA module must be configured as an MSA module. For the bit-level design, the FTMSA modules shown in Fig. 10 can also perform the swap operation. Therefore, Theorem 1 can be applied to this fault-tolerant design directly. We can conclude that this fault-tolerant architecture is also M-testable.
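To make the role of the selection gate and the bypass multiplexer concrete, here is a small behavioral sketch of a subtractor/adder column built from one-bit cells. It is not the circuit of Fig. 12(c): the signal polarity (`select_sub = 1` means subtract), the use of an XOR rather than an XNOR, the ripple-carry organization, and the names `sa_cell` and `sa_column` are all assumptions chosen for illustration.

```python
def sa_cell(a, b, carry, select_sub, bypass=False):
    """One-bit subtractor/adder cell with a bypass multiplexer.

    The XOR conditionally inverts b; with bypass set, the cell simply
    forwards its a input and leaves the carry untouched.
    """
    if bypass:
        return a, carry
    b_eff = b ^ select_sub
    total = a ^ b_eff ^ carry
    carry_out = (a & b_eff) | (carry & (a ^ b_eff))
    return total, carry_out


def sa_column(a, b, width, select_sub, bypass=False):
    """Chain `width` cells; returns (a + b) or (a - b) modulo 2**width."""
    carry = select_sub  # two's-complement subtraction adds 1 at the LSB
    result = 0
    for i in range(width):
        bit_a, bit_b = (a >> i) & 1, (b >> i) & 1
        s, carry = sa_cell(bit_a, bit_b, carry, select_sub, bypass)
        result |= s << i
    return result


if __name__ == "__main__":
    print(sa_column(5, 3, 4, select_sub=0))               # add
    print(sa_column(5, 3, 4, select_sub=1))               # subtract
    print(sa_column(9, 6, 4, select_sub=0, bypass=True))  # bypassed
```

A single control bit is enough to switch every cell of the column between the two functions, which is consistent with the claim that the swapping mechanism costs very little hardware.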
For the bit-level fault-tolerant design, a single column is used as the basic replacement element. Therefore, diagnosis algorithms must be used to locate a faulty column. Since each cell in the FTMSA module contains a bypass multiplexer, it can be used to isolate a single stage first. If a single stage is isolated and the faulty behavior can no longer be observed at the primary outputs, we can conclude that the isolated stage is faulty. It is evident that the complexity of isolating a faulty stage grows with the number of stages. Similarly, after a faulty stage is located, a single column within a butterfly module can be isolated to locate the faulty column. The diagnosis complexity likewise grows with the number of columns that must be isolated.
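The isolate-and-observe procedure described above can be sketched as a linear search over bypassable units. The model is deliberately abstract and hypothetical: each stage (or column) is a Python function, `golden` is a fault-free reference, exactly one unit is assumed faulty, and bypassing a unit means skipping it. The names `run` and `locate_faulty` are not from the paper.

```python
def run(units, value, bypassed=None):
    """Pass a value through a chain of units, skipping the bypassed one."""
    for index, unit in enumerate(units):
        if index != bypassed:
            value = unit(value)
    return value


def locate_faulty(actual, golden, tests):
    """Return the index of the unit whose bypass hides the faulty behavior.

    Tries one unit at a time, so the cost is linear in the number of units.
    Returns None if no single bypass makes the outputs agree.
    """
    for index in range(len(actual)):
        if all(run(actual, t, index) == run(golden, t, index) for t in tests):
            return index
    return None


if __name__ == "__main__":
    golden = [lambda v: v + 1, lambda v: 2 * v, lambda v: v - 3]
    actual = [golden[0], lambda v: 2 * v + 1, golden[2]]  # unit 1 is faulty
    print(locate_faulty(actual, golden, tests=range(8)))
```

The same routine can be applied twice, first with stages as units and then with the columns of the located stage as units, mirroring the two-step diagnosis in the text.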
VI. FAULT TOLERANCE AT THE MSA MODULE LEVEL
This section deals with the off-line reconfiguration architecture for FFT arrays at the MSA module level (FFT$_{\mathrm{MSA}}$). Consider the four MSA modules shown in Fig. 2. These four MSA modules can be divided into two groups, each containing two MSA modules. The two groups compute the real part and the imaginary part of the outputs, respectively. In our reconfigurable architecture, an extra MSA module is included and placed at the top of the two groups, so there are a total of five MSA modules in a butterfly. Extra local interconnections are added in the design as shown in Fig. 14.

Fig. 14. Fault-tolerant structure of a butterfly module.

Fig. 15. Reconfiguration of MSA modules.

If a faulty module is identified, then the reconfiguration mechanism must be activated to replace the faulty module. According to our reconfiguration algorithm, the faulty module is replaced by its neighboring module, which is in turn replaced by the next module, and so on. Finally, the redundant module is used. Fig. 15 shows an example with one faulty MSA module. Two of the original MSA modules constitute the first group, and the other two constitute the second group. The reconfiguration mechanism is simply a multiplexer placed in each MSA module to select appropriate inputs. This type of FFT$_{\mathrm{MSA}}$ is referred to as the Type-I FFT$_{\mathrm{MSA}}$ network, as opposed to the Type-II FFT$_{\mathrm{MSA}}$ network discussed next. This scheme has a low hardware overhead compared with the original circuit: approximately 25% (an extra MSA module out of four).
Instead of using a redundant MSA module in each butterfly, the Type-II FFT$_{\mathrm{MSA}}$ network adds a redundant MSA module at each stage. This redundant MSA module, which is not limited to replacement within the same butterfly, can be used to replace any faulty MSA module at the same stage. Therefore, the hardware overhead is approximately $1/(2N)$. It is clear that the Type-II FFT$_{\mathrm{MSA}}$ network has more efficient resource utilization than the Type-I FFT$_{\mathrm{MSA}}$ network.
In order to test the MSA, FMSA, and FTMSA modules completely, we cannot assume that there is no fault in the XOR circuit (DFT) or in the multiplexers (DFY). Fortunately, we can apply the all-0 and all-1 patterns to the bit-level cells in normal mode. These two patterns are chosen because the outputs of the basic cell are the same as the inputs when they are applied. These two patterns can detect all the stuck-at faults of the XOR gate and the multiplexers during normal operation mode. Therefore, only a small, fixed number of patterns is required to achieve 100% fault coverage for the MSA module. For this module-level design, since four MSA modules can be constructed within each modified butterfly module, Theorem 1 can also be applied directly. Therefore, we can see that the fault-tolerant design is also M-testable.

VII. RELIABILITY AND HARDWARE OVERHEAD ANALYSIS
We first consider the reliability of a fault-tolerant MSA module in the FFT$_{\mathrm{BIT}}$ network. The word length of the processor is denoted as $w$. Assume that each cell becomes faulty randomly and independently, with a constant failure rate $\lambda$. Then, the reliability of a single cell is $e^{-\lambda t}$. The reliability of a column follows from the reliabilities of its cells, and the reliability of the module can then be expressed as follows:
(9)
The reliability of a butterfly module with bit-level fault-tolerant design is
(10)
Similarly, the system reliability of the FFT$_{\mathrm{BIT}}$ network can be expressed as
(11)
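The structure of this bit-level analysis can be illustrated with a small numerical sketch. It does not reproduce the paper's equations (9)-(11); instead it assumes a simple model in which a module survives if at most one of its columns (including the spare) has failed, a butterfly needs all four of its modules, and the network needs all $(N/2)\log_2 N$ butterflies. The parameters `cells_per_column` and `columns` and both function names are hypothetical, and the bypass multiplexers are treated as fault free.

```python
import math


def one_spare(r_unit, n_units):
    """Probability that at most one of n_units identical units has failed."""
    all_good = r_unit ** n_units
    one_bad = n_units * r_unit ** (n_units - 1) * (1.0 - r_unit)
    return all_good + one_bad


def bit_level_reliability(lam, t, cells_per_column, columns, n_points):
    """System reliability under the assumed single-spare-column model.

    `columns` counts every column of a module, including the spare.
    """
    r_cell = math.exp(-lam * t)              # constant failure rate
    r_column = r_cell ** cells_per_column    # every cell in a column must work
    r_module = one_spare(r_column, columns)  # tolerate one faulty column
    r_butterfly = r_module ** 4              # four modules per butterfly
    stages = n_points.bit_length() - 1       # log2(N) for a power of two
    return r_butterfly ** ((n_points // 2) * stages)


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, bit_level_reliability(0.0005, 1.0, 16, 19, n))
```

Under this model the reliability falls as the number of points grows, because the exponent $(N/2)\log_2 N$ grows while the per-butterfly reliability stays fixed.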
The hardware overhead of the FFT$_{\mathrm{BIT}}$ network is evaluated in the following. We define $\mathrm{TC}_{\mathrm{MSA}}$, $\mathrm{TC}_{\mathrm{FMSA}}$, and $\mathrm{TC}_{\mathrm{FTMSA}}$ as the transistor counts of a single MSA, FMSA, and FTMSA module, respectively. A further transistor count is defined as the number of extra transistors in the FFT$_{\mathrm{BIT}}$ network and is expressed in terms of these counts as
(12)
The hardware overhead ratio (HO) is defined in terms of the same transistor counts as
(13)

The numbers of multiplier cells, multiplier-subtractor cells, subtractor-adder cells, adder cells, and corner cells in a single FMSA module, and the numbers of multiplier cells, multiplier-subtractor cells, subtractor-adder cells, and corner cells in a single FTMSA module, are each denoted by their own symbol. The word length is denoted as $w$. The cell counts can be read from Figs. 11 and 13. The transistor counts of an MSA, FMSA, and FTMSA module are then given by
(14)
respectively. Substituting (14) into (12) and (13), we can find the hardware overhead ratio HO of the FFT$_{\mathrm{BIT}}$ network for given values of $N$ and $w$.

Fig. 16. Chip layout.

Now we turn our analysis to the module-level design. The reliability of an MSA module at the module level can be expressed as follows:
(15)
The reliability of a two-point butterfly module in the Type-I FFT$_{\mathrm{MSA}}$ network is then given by
(16)
For the network to work correctly, all two-point butterflies must work correctly. Therefore, the reliability of the Type-I FFT$_{\mathrm{MSA}}$ network can be expressed as
(17)
Similarly, the reliability of obtaining an operational butterfly stage in the Type-II FFT$_{\mathrm{MSA}}$ network can be expressed as
(18)
For the FFT processor to work correctly, all the stages must operate correctly. That is, the system reliability for the Type-II FFT$_{\mathrm{MSA}}$ network can be expressed as
(19)
Since the number of MSA modules at each stage is $2N$, the hardware overhead for the Type-II FFT$_{\mathrm{MSA}}$ network is approximately $1/(2N)$ (one redundant MSA is added in each stage). Note that the extra routing area is neglected here, since the modules are connected locally and the routing occupies less than 3% of the area of an MSA module. Furthermore, the switches of the MSA-level design are assumed to be fault free in our analysis. In fact, the added routing areas may affect the reliability of the system. To take this effect into consideration, we can increase the failure rate of a cell in proportion to the overall hardware overhead ratio. That is, the reliability of a cell becomes $e^{-(1+\mathrm{HO})\lambda t}$. The hardware overhead ratio can be found in the previous discussions. We can substitute the values of $N$ and $w$ into these equations to obtain the actual hardware overhead. The results can then be compared.
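The module-level comparison can also be explored numerically. The sketch below is an illustration under assumptions, not the paper's equations (15)-(19): a Type-I butterfly is modeled as five MSA modules of which at most one may fail, a Type-II stage as $2N + 1$ modules of which at most one may fail, all switches are treated as fault free, and the area penalty of the redundancy is ignored. All function names are hypothetical.

```python
def one_spare(r_unit, n_units):
    """Probability that at most one of n_units identical units has failed."""
    return r_unit ** n_units + n_units * r_unit ** (n_units - 1) * (1.0 - r_unit)


def stages(n_points):
    """log2(N) for an N-point network, N a power of two."""
    return n_points.bit_length() - 1


def nonredundant(r_msa, n_points):
    """All 2N MSA modules of every stage must work."""
    return r_msa ** (2 * n_points * stages(n_points))


def type1(r_msa, n_points):
    """One spare MSA per butterfly: five modules, at most one faulty."""
    butterflies = (n_points // 2) * stages(n_points)
    return one_spare(r_msa, 5) ** butterflies


def type2(r_msa, n_points):
    """One spare MSA per stage: 2N + 1 modules, at most one faulty."""
    return one_spare(r_msa, 2 * n_points + 1) ** stages(n_points)


def type2_overhead(n_points):
    """One extra MSA module for the 2N modules of a stage."""
    return 1.0 / (2 * n_points)


if __name__ == "__main__":
    r, n = 0.999, 16
    print(nonredundant(r, n), type2(r, n), type1(r, n), type2_overhead(n))
```

With equal module reliability, this idealized model ranks Type-I above Type-II above the nonredundant design, while Type-II needs far less extra hardware. Scaling the failure rate by the overhead ratio, as described in the text, would reduce the advantage of the schemes with more redundancy.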

VIII. EXPERIMENTAL RESULTS AND COMPARISONS
To verify the bit-level design, a VLSI chip for the FTMSA module is designed using Cadence full-custom design tools. The technology used is TSMC 0.18-$\mu$m 1P6M. The transistor count is 40 464, and the chip size is 3.79 mm$^2$. The whole chip layout is shown in Fig. 16. The area overhead is about 12%. This overhead is lower than the analysis results shown above, because several routing layers can be used for the layout. If more layout layers are used, then the overhead may be further reduced.
The reliabilities of the bit-level designs with different numbers of computation points are shown in Fig. 17, where $\lambda = 0.0005$ and $w = 16$. The curve marked "nonred" denotes a non-fault-tolerant design. From this figure, we can see that as the number of computation points increases, the reliability decreases significantly. However, even when the number of computation points is 256, the reliability is still higher than that of the non-fault-tolerant design. The reliabilities of the bit-level designs with different word lengths are shown in Fig. 18. The number of computation points is assumed to be 16. From this figure, we can see that a shorter word length results in a greater reliability improvement.
The reliabilities of the module-level design (a redundant MSA module is added in each two-point butterfly) for different numbers of computation points and word lengths are shown in Figs. 19 and 20, respectively. From Fig. 20 we can see that if the word length is greater than 32, the reliability is even lower than that of the non-fault-tolerant design. This is because a redundant MSA module with a larger word length occupies a larger area. Therefore, the probability of getting a faulty module increases significantly. Comparing Figs. 18 and 20, we can see that the bit-level designs have higher reliabilities than the module-level designs.
The comparison of our approaches (FFT$_{\mathrm{BIT}}$, Type-I FFT$_{\mathrm{MSA}}$, and Type-II FFT$_{\mathrm{MSA}}$) with previous works [18], [22], [24], [25] is shown in Table I. The approach in [22] uses BIST circuitry in each eight-point FFT network for a given word length. The test pattern generator (TPG) used is a pseudorandom pattern generator, which cannot guarantee 100% fault coverage with a test length of 4096. In [24], the test approach is derived from algorithm flow graphs (AFGs), which allows detection and location of all single faults. Moreover, interconnect faults can be covered with additional operations for an $N$-point FFT network. In [25], a C-testable approach based on component-level faults was proposed. The number of test patterns it requires to test the whole FFT network is expressed in terms of the number of test patterns required for testing a component.

Fig. 17. Reliabilities of the bit-level designs with different computation points ($\lambda = 0.0005$; $w = 16$).

Fig. 18. Reliabilities of the bit-level designs with different word lengths ($\lambda = 0.0003$; $N = 16$).

Fig. 19. Reliabilities of the module-level designs with different computation points ($\lambda = 0.0001$; $w = 16$).

Fig. 20. Reliabilities of the module-level designs with different word lengths ($\lambda = 0.0005$; $N = 16$).

Our proposed approaches improve on our previous work [18] and are superior to the other schemes, as can be seen from the table. The DFT technique used in [18] aims at the bit level. Since the basic bit-level cells (adder cells, multiplier cells, and subtractor cells) do not inherently possess the property of bijection, two multiplexers must be added to make them bijective. Therefore, significant hardware and delay overhead is required (5.66%). Moreover, the fault-tolerant design proposed in [18] used a spare row. In this approach, the faulty row is replaced by the row above it, which is in turn replaced by the next row above, and so on. Since a basic cell contains three vertical inputs, it requires three multiplexers for each cell to bypass itself when it is faulty. Therefore, the overhead for the DFT/DFY design is almost 40%. In contrast, the column-based reconfiguration used in FFT$_{\mathrm{BIT}}$ is simpler than that in [18], since each cell contains only one horizontal input. From Table I, we can see that the three proposed fault-tolerant approaches are all superior to [18] in terms of hardware overhead. Although the number of test patterns is greater than that in [18], the test patterns can be generated by a binary counter, which can provide at-speed testing. Therefore, the testable design proposed in this paper is also suitable for practical applications.

TABLE I
FAULT-TOLERANT AND TEST FEATURE COMPARISON WITH PREVIOUS SCHEMES

Since some modifications are made to the basic cells, a performance penalty is inevitable. We used the HSPICE circuit simulation tool to estimate the dynamic performance of Fig. 8(b) (the subtractor cell) and Fig. 12(c) (the adder/subtractor cell with DFT and DFY circuits incorporated). We design all nMOS and pMOS transistors with a (W/L) ratio of (0.5 $\mu$m/0.18 $\mu$m), which is the minimum transistor size allowed in the 0.18-$\mu$m process technology. The difference between the propagation delay times of the two circuits is 0.0248 ns, which is about 10% higher than the original propagation delay time. However, in order to increase the chip's testability, reliability, and yield, this performance penalty is inevitable. If the timing specifications are tight, circuit design techniques can still be used to alleviate this problem. For example, we can increase the (W/L) ratios of all transistors in the critical path of the cell. Of course, this comes at the cost of increased chip area.
IX. CONCLUSION
In this paper, M-testability conditions and a design-for-testability technique are applied to the testable design of FFT butterfly networks. Our M-testability conditions guarantee 100% single-module-fault testability with a constant number of test patterns, which results in a design-for-testability approach requiring negligible hardware overhead. The number of test patterns needed for M-testing the FFT processors at the module level is equal to that for a single module. If the word length is greater than 16, M-testing the FFT processor at the bit level is more appropriate. Although the fault model adopted here is the single module fault, our M-testability condition can be applied to lower level fault models, such as delay faults and sequential faults. Moreover, built-in self-test structures can easily be designed and applied to the module-level array, which can be tested at the system clock rate. Based on this testable design, fault-tolerant approaches at the bit level and the MSA module level are proposed, respectively. If the reconfiguration is performed at the bit level, then the FFT$_{\mathrm{BIT}}$ network is constructed. Two types of reconfiguration schemes (Type-I FFT$_{\mathrm{MSA}}$ and Type-II FFT$_{\mathrm{MSA}}$) are proposed at the MSA module level. The resulting architectures are simpler than those of previous works. The reliability of the FFT system increases significantly. The hardware overhead is low: about 12% for the FFT$_{\mathrm{BIT}}$ network and $1/(2N)$ for the Type-II FFT$_{\mathrm{MSA}}$ network. An experimental chip is also implemented to verify our approaches. Reliabilities and hardware overhead are also evaluated and compared with previous works.
REFERENCES
[1] B. W. Johnson, Design and Analysis of Fault Tolerant Digital Systems.
Reading, MA: Addison-Wesley, 1989.
[2] P. K. Lara, Fault Tolerant and Fault Testable Hardware Design. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[3] S. Laha and J. H. Patel, Error correction in arithmetic operations using
time redundancy, in Proc. 13th Annu. Int. Symp. Fault-Tolerant Computing, Jun. 1983, pp. 298305.
[4] E. E. Swartzlander, Jr. et al., Sign/logarithm arithmetic for FFT implementation, IEEE Trans. Comput., vol. C-32, no. 6, pp. 526534, Jun.
1983.
[5] D. L. Tao, C. R. P. Hartmann, and Y. S. Chen, A novel concurrent error
detection scheme for FFT networks, in Proc. Int. Symp. Fault-Tolerant
Comput., Jun. 1990, pp. 114121.
[6] J. Y. Jou and J. A. Abraham, Fault-tolerant FFT networks, IEEE Trans.
Comput., vol. 37, no. 5, pp. 548561, May 1988.
[7] M. Tsunoyama and S. Naito, A fault-tolerant FFT processor, in Int.
Symp. Fault-Tolerant Comput., Jun. 1991, pp. 128135.
[8] Y. H. Choi and M. Malek, A fault-tolerant FFT processor, IEEE Trans.
Comput., vol. 37, no. 5, pp. 617621, May 1988.
[9] F. Lombardi and J. Muzio, Concurrent error detection in reconfigurable
WSI structures for FFT computation, in Proc. Int. Conf. Wafer Scale
Integration, 1991, pp. 4653.
[10] H. Fujiwara and S. Toida, The complexity of fault detection problems
for combinational logic circuits, IEEE Trans. Comput., vol. C-31, no.
6, pp. 555560, Jun. 1982.
[11] S. K. Lu, C. W. Wu, and S.-Y. Kuo, Enhancing testability of VLSI arrays for fast Fourier transform, Proc. Inst. Elect. Eng., E, vol. 140, no.
3, pp. 161166, May 1993.
[12] C. W. Wu and P. R. Cappello, Easily testable iterative logic arrays,
IEEE Trans. Comput., vol. 31, no. 5, pp. 640652, May 1990.
[13] W. H. Kautz, Testing for faults in combinational cellular logic arrays,
in Proc. 8th Annu. Symp. Switching, Automata Theory, 1967, pp.
161174.
[14] P. R. Menon and A. D. Friedman, Fault detection in iterative arrays,
IEEE Trans. Comput., vol. C-20, pp. 524535, May 1971.
[15] A. D. Friedman, Easily testable iterative systems, IEEE Trans.
Comput., vol. C-22, pp. 10611064, Dec. 1973.
[16] T. H. Chen and L. G. Chen, Concurrent error-detectable butterfly chip
for real-time FFT processing through time redundancy, IEEE J. SolidState Circuits, vol. 28, no. 5, pp. 537547, May 1993.
[17] V. K. Jain, H. A. Nienhaus, D. L. Landis, S. Al-Arian, and C. E. Alvarez,
Wafer scale architecture for an FFT processor, in Proc. Int. Symp. Circuits Systems, 1989, pp. 453456.
[18] J. F. Li, S. K. Lu, S. Y. Huang, and C. W. Wu, Easily testable and fault
tolerant FFT butterfly networks, IEEE Trans. Circuits Syst. II, Analog
Digit. Signal Process., vol. 47, no. 9, pp. 919929, Sep. 2000.
[19] V. K. Jain, H. Hikawa, and E. E. Swartzlander, Defect tolerance and
yield for a wafer scale FFT processor system, in Proc. Int. Conf. Wafer
Scale Integration, 1991, pp. 5460.
[20] V. Piuri and E. E. Swartzlander, Time-shared modular redundancy for
fault-tolerant FFT processors, in Proc. Int. Symp. Defect Fault Tolerance in VLSI Systems, 1999, pp. 265273.
[21] L. Breveglieri and V. Piuri, A fast pipelined FFT unit, in Proc. Int.
Conf. Application Specific Array Processors, 1994, pp. 143151.

LU et al.: DESIGN-FOR-TESTABILITY AND FAULT-TOLERANT TECHNIQUES

[22] K. Yamashita, A. Kanasugi, S. Hijiya, G. Goto, N. Matsumura, and T.


Shirato, A wafer-scale 170 000-gate FFT processor with built-in test
circuits, IEEE J. Solid-State Circuits, vol. 23, no. 2, pp. 336342, Apr.
1988.
[23] V. K. Jain, S. A. Al-Arian, D. L. Landis, and H. A. Nienhaus, Fully
parallel and testable WSI architecture for an FFT processor, Int. J.
Comput.-Aided VLSI Des., vol. 3, pp. 113131, 1991.
[24] A. Antola and M. G. Sami, Testing and diagnosis of FFT arrays, J.
VLSI Signal Process., vol. 3, pp. 225236, 1991.
[25] C. Feng, J. C. Muzio, and F. Lombardi, On the testability of the array
structures for FFT computation, J. Electron. Testing: Theory Applicat.,
vol. 4, pp. 215224, Aug. 1993.

Shyue-Kung Lu received the Ph.D. degree in electrical engineering from the National Taiwan University, Taipei, in 1995.
From 1995 to 1998, he was an Associate Professor in the Department of Electrical Engineering,
Lunghwa Junior College of Technology and Commerce. Since 1998, he has been with the Department
of Electronics Engineering, Fu Jen Catholic University, Taipei, where he is a Professor. His research
interests include the areas of VLSI testing and
fault-tolerant computing.

Jen-Sheng Shih was born in Taiwan, R.O.C., in
1973. He received the B.S. degree from the National
Taipei University, Taipei, Taiwan, and the M.S.
degree from the Fu-Jen Catholic University, Taipei,
in 1998 and 2000, respectively, both in electronic engineering.
He was with Acer Laboratories Inc. (Ali) from
2000 to 2003, where he was engaged in MPEG IC design. Since 2003, he has been with Pixelworks, U.S., Taipei.
His research interests include FPGA testing,
MPEG design, and LCD TV systems.

Shih-Chang Huang received the B.S. degree in
electronic engineering from Fu-Jen Catholic University, Taipei, Taiwan, R.O.C., in 2002. He is currently
pursuing the M.S. degree in electrical engineering at
Fu-Jen Catholic University.
