
17th Asian Test Symposium

Not All Xs are Bad for Scan Compression


Anshuman Chandra and Rohit Kapur
Synopsys, Inc., 700 E. Middlefield Rd., Mountain View, CA, 94043, USA
{anshuman, rkapur}@synopsys.com

Abstract
Scan compression technology combines the expected responses from multiple scan chains so that they can be observed at fewer scan outputs. As a result, unknowns (Xs) in the test response interfere with the good values that could be observed. Prior to this paper, Xs in the test response were treated as bad for compression, and solutions either removed, bypassed, or blocked the Xs from interfering with the other responses. In this paper we show that some Xs can be added to improve the quality of results of test compression. The improved observability due to simultaneous clocking of interacting clock domains is played against the reduced observability caused by the Xs that race conditions introduce in the response. We show that when the inter-clock-domain Xs are added but limited, the gains achieved by adding the Xs far exceed the losses from bringing the Xs together with other observed values in scan compression.

1. Introduction

The need for test data compression has been driven not only by the increasing transistor density on the chips but also by the increasing use of test sets for multiple fault models [1]. In a span of a few years, a number of technologies that went under the name of scan compression were developed to add logic between the scan inputs/outputs and the internal scan chains to reduce test data volume. These efforts leveraged the fact that a large portion of ATPG-generated tests had logic don't cares. These technologies relied on decoupling the scan inputs/outputs from the internal chains, such that a larger number of internal chains can be driven from a smaller interface.

Figure 1 shows the impact of decoupling the scan terminals from the internal scan chain. This figure also shows the relationship of chain length to test data volume and test application time. A reduction in chain length linearly reduces the test data volume and test application time. A ratio of 3x more internal chains than the scan interface translates to 3x shorter chains and a corresponding 3x reduction in test data volume and test application time. This is the fundamental mechanism behind the numerous research papers and the commercially available scan compression technologies of today.

[Figure 1 diagram: a scan design (scan in to scan out) beside a compressed design whose internal chains are 3x shorter (÷3), with X masking logic ahead of scan out.]
Test Data Volume = Patterns x Scan interface x chain length
Test Application Time = Patterns x chain length
Figure 1: Decoupling of the scan-interface from the internal scan chains to allow for reduction in test data volume and test application time.

Scan compression technology came about as a solution for the increased test data volume and test application time seen in scan testing. When no unknowns (Xs) exist in the test response, scan and compression technology behave quite similarly. However, when an X is observed in a scan design, the X has no negative impact on the scan solution. In scan compression, the values from multiple scan cells are combined to create fewer values, and an X in the response masks off any values it is combined with. Depending upon the amount of compression being targeted, the Xs could mask a large number of good responses that the test patterns relied upon for fault detection. Thus Xs in the response have a significant negative impact on the QoR of compression. All solutions in scan compression today treat these Xs as negative and proactively manage the Xs such that they do not appear in the test response. In the next section we provide an overview of the existing solutions to Xs in the response. In the following section (Section 3) we present a new kind of X that has never been considered in past research in compression. These Xs have a good and a bad aspect to them. On the good side, they appear as a result of increased observability due to aggressive clocking, which translates to fewer patterns. On the bad side, in compression they interfere with the observability of other values, increasing the number of test patterns. In Section 4 we show how managing the number of Xs caused by aggressive clocking can result in improved test data volume and test application time in scan compression. We finally present our conclusions.

1081-7735/08 $25.00 © 2008 IEEE
DOI 10.1109/ATS.2008.37
2. Xs in the Response – Prior Art

Logic added to interface the internal scan chains to the scan outputs is referred to as the compressor, as it takes many values from the flip-flops and funnels the data to a much smaller set of terminals. Un-initialized memory elements, bus contention or other timing-related issues cause Xs to be captured in the response. Logic-Xs (unknowns) in the test response have a negative impact on the observability of good responses that are coming together in the compressor.

The Xs generated during response capture can be proactively blocked from reaching the scan cell by identifying the X-sources and then removing them, or by inserting additional DFT logic (test points) to fix the X-sources [2]. Another way to block the Xs from reaching the scan cells is by careful test pattern generation, where the don't-care bits in the scan-in vector are set to controlling values to block the Xs from reaching the scan cell [3].

Efficient compactors have been proposed that provide good compression without loss in coverage due to error masking and X-masking. Error masking occurs when multiple errors cancel each other due to the compactor architecture, and X-masking happens when the Xs in the response prevent the error from propagating to the compactor output. The space compactors are combinational circuits, typically built of XOR gates, that compact the test response coming out of N scan chains in one shift-out cycle, {0,1,X}^N, into a compacted test response observable at Q outputs, {0,1,X}^Q, where N > Q and X denotes the unknown value. The space compactors are defined by parity check matrices of linear block codes as proposed in [3]. The ability of space compactors to tolerate unknown values in the test responses was addressed by the X-compact technique [4]. The analysis presented in [5] provides an estimation of the compaction ratio of the space compactors to be able to tolerate a certain number of unknown values.

The convolutional compaction schemes presented in [6][7][8] are a class of finite state compactors that combine time and space compaction techniques. The convolutional compactors convert test responses shifting out of N scan chains in a finite number of s shift-out cycles into a compacted test response observable at Q outputs, where N > Q. Designed specifically to meet certain requirements of manufacturing test [6], the convolutional compactors are able to achieve much higher compaction ratios N/Q with Q scan-out channels. The ability of the convolutional compactors to tolerate unknown values was studied in [7].

By far the most popular technique to handle Xs captured by the scan cells is to insert a masking circuit between the scan chain output and the compactor input (see Figure 1). An implementation of masking logic for a compressor that has redundancy in the XORs of the compressor has been discussed in [13][15], where the masking logic ensures that within any group of scan chains that are observed, the logic-Xs in the response captured in any of the scan cells do not interfere with the observability of the scan cells in other scan chains. While there can be many variations of chain configurations that can be masked, the masking logic tends to have the following properties:

1. A configuration where no chains are blocked.
2. Many configurations that block a sub-set of chains from entering the compressor.

The mask control signals could come from primary inputs, from the decompressor of the architecture, or from an independent scan chain that is not connected to the decompressor [9][10][11][12][13]. Furthermore, masking implementations in scan compression schemes can vary from per-pattern masking to per-shift masking, with intermediate possibilities [13]. Compression schemes that use MISRs need additional blocking to guarantee that 100% of the unknowns are kept from reaching the MISR.

As one can see, there is a lot of effort and DFT put in place in scan compression to avoid logic-Xs. Logic-Xs in the response can impact the fault coverage of the scan compression scheme, which is typically not something that can be compromised. When fault coverage is lost due to a logic-X interfering with the good values in the compressor, the coverage would need to be recovered in scan mode, which takes away from the efficiency achieved by the scan compression scheme. For example, take the case where 10X compression was targeted and the final implementation requires 10% of the patterns to be applied through the traditional scan mode as a result of logic-Xs. The final test application time is T = 0.9 x (t/10) + 0.1t = 0.19t, which represents only about 5X compression (1/0.19 ≈ 5.3). Masking ensures that unknowns in the test response do not cause this situation to occur.

However, masking itself does not come without issues. Use of masking logic reduces the observability in the design, making the test pattern count increase for the same fault coverage. Now consider the case where twice the patterns are created because masking reduced the observability of the design to avoid unknowns. For a 10X scan compression scheme, the final reduction in test application time would be 10/2 = 5X over that of scan. Hence bad QoR for the scan compression scheme.

For designs that are not X-clean, the number of Xs generated could be large, or the distribution of these Xs could be non-optimal with respect to the masking selection logic. When some scan cells always capture an X, instead of invoking masking it is better to architect the scan chains to separate the static-X scan cells such that the Xs do not enter the compressor at all [14].
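The two 10X examples in this section reduce to simple arithmetic on relative test application time. The sketch below reproduces them; the function names are hypothetical, and the model assumes, as the text's T = 0.9 x (t/10) + 0.1t does, that every pattern costs the same time within a given mode.

```python
def compression_with_scan_topup(target, scan_fraction):
    """Effective compression when a fraction of the patterns must be
    re-applied in plain scan mode and the rest run at the target ratio."""
    relative_time = (1.0 - scan_fraction) / target + scan_fraction
    return 1.0 / relative_time


def compression_with_pattern_inflation(target, inflation):
    """Effective compression when masking multiplies the pattern count."""
    return target / inflation


# 10X target, 10% of patterns applied in scan mode: T = 0.19t.
print(round(compression_with_scan_topup(10, 0.1), 2))

# 10X target, twice the patterns because masking reduced observability.
print(compression_with_pattern_inflation(10, 2))
```

Both cases end up near 5X, which is why neither scan-mode top-up nor heavy masking is attractive on its own.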

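The X-masking effect in an XOR space compactor, and the way a mask restores observability, can be illustrated with three-valued logic. This is a minimal sketch and not the compressor of [13][15]: the chain-to-output grouping is hypothetical, and a masked chain is assumed to be gated to a constant 0 before the XOR tree.

```python
X = "X"  # the unknown value in {0, 1, X}


def xor3(a, b):
    """Three-valued XOR: any unknown input makes the output unknown."""
    if a == X or b == X:
        return X
    return a ^ b


def compact(chain_values, output_groups, blocked=()):
    """Compact one shift cycle of chain outputs into one value per scan output.

    chain_values  : dict mapping chain name to 0, 1 or X
    output_groups : list of chain-name lists, one list per scan output
    blocked       : chains gated off by the mask (assumed to contribute 0)
    """
    outputs = []
    for group in output_groups:
        acc = 0
        for chain in group:
            if chain in blocked:
                continue  # masked chain does not reach the XOR tree
            acc = xor3(acc, chain_values[chain])
        outputs.append(acc)
    return outputs


values = {"ch1": 1, "ch2": X, "ch3": 0, "ch4": 1}
groups = [["ch1", "ch2"], ["ch3", "ch4"]]  # hypothetical grouping

print(compact(values, groups))                   # X in ch2 hides ch1
print(compact(values, groups, blocked={"ch2"}))  # masking ch2 exposes ch1
```

Without a mask the first output is X, so the value in ch1 is lost; blocking ch2 makes ch1 observable again at the cost of not observing ch2 in that shift.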
3. Introducing Xs for Better Compression

Traditional ATPG operates in a zero-delay environment where no timing information is available. In such cases strict DRC rules are followed to ensure that the captured values predicted by the ATPG tool match the values seen when timing is taken into account. One of the rules ATPG follows is that only one clock can be turned on at any given time. When the flip-flops of two clock domains do not interact with each other, their clocks can be pulsed simultaneously without causing problems to the zero-delay environment assumed in ATPG. Figure 2 (a) shows a situation where the FFs from one clock domain are not connected to FFs from another clock domain. Since not pulsing a clock during capture makes a set of FFs unobservable for a test, ATPG is designed to simultaneously pulse all the clocks it is allowed to, for maximum observability.

If ATPG turns on the clocks of two interacting clock domains, then all paths between the two clock domains end up as logic-Xs in the flip-flops, as the captured values cannot be predicted without exact timing information [16]. Figure 2 (b) shows the condition where there are some paths from FFs of one clock domain to the other. In this case a zero-delay ATPG cannot predict the response of the paths between the clock domains. If the number of flip-flops observing good values is greater than the number of Xs, such that pulsing both clocks together gives more observability than pulsing one clock at a time, test sets can be compacted further. However, in scan compression the Xs have a negative impact: the Xs mask good values in the compressor, or the observability of the scan chains is reduced due to the use of masking. If the increased observability from the simultaneous clocking of interacting domains exceeds the negative impact of the Xs, then benefits can be achieved in test data volume and test application time in scan compression.

[Figure 2 diagram: (a) Independent clock domains (Clk 1, Clk 2) where no FFs from one domain are connected to the other domain. (b) Clock domains (Clk 1, Clk 2) when FFs from one domain are connected to the other domain; the connected FFs are marked X.]
Figure 2: Connectivity of clock domains which determines if clocks can be turned on simultaneously.

A typical design would have multiple clock domains. Turning on all the clocks simultaneously would mean that a large number of Xs get created in the flip-flops on the receiving end of the inter-domain paths. By intelligently selecting the clocks that are turned on simultaneously, the number of Xs can be limited to stay within some budget. This budget is usually dependent on the compressor design and the number of scan-out ports.

Let us consider a hypothetical example to understand the effect of unknowns created during ATPG due to flops of multiple clocks being clocked simultaneously. Figure 3 shows a set of flip-flops in 4 scan chains, ch1, ch2, ch3 and ch4. Three clock domains exist and are marked with Clk1, Clk2 and Clk3. The example shows a number of unknowns in the captured response.

1. X (bold face): Represents a static-X, which is a scan cell that always captures an X.
2. X (plain text): Represents a dynamic-X that does not come from the simultaneous clocking issue being discussed in this paper. A dynamic-X represents a scan cell that sometimes captures an unknown value and sometimes a good response.
3. X (italics): Represents a scan cell that captures a dynamic-X due to inter-clock-domain issues when clocks are simultaneously pulsed.

Ds represent faults being detected. In this example D1 and D2 represent unique faults being detected. These faults are not observed in another scan cell in the entire test set. All flip-flops with no arrows going to them represent flip-flops that are in independent clock domains. It should be noted that not all flip-flops have inter-clock-domain paths affecting the captured value.

In a scan environment, where each chain has a dedicated scan-out port, the contents of the flip-flops containing an X can be dropped from observation on the automatic test equipment (ATE). In a scan architecture the unique faults, D1 and D2, shown in Figure 3, can be observed at so1 and so4 with the Xs being scanned out at so2 and so3. Since the Xs do not interfere with the detection of the unique faults, clocking the three clocks together allows all faults to be detected by a single pattern. By comparison, clocking each clock separately, so that flip-flops in other clock domains are not disturbed during capture, results in three separate patterns and thus leads to pattern inflation.

The issue of inter-clock-domain Xs becomes more complex with scan compression. Consider the same scan chains with a compressor where the four chains are now observed at only two scan output ports, so1 and so2 (see Figure 4). If all the chains are simultaneously observed

with mask value Mall = 111, chain sets {ch1, ch2, ch3} and {ch1, ch3, ch4} are observed at so1 and so2, respectively. However, with mask values {M1 = 001, M2 = 010}, chains {{ch1, ch3}, {ch2, ch4}} are observed at so1 and so2, respectively. For shift positions 1 through 4, since all the Xs are static Xs, the Ds have to be observed using the masks M1 and M2. Similarly, for shift positions 11 and 13, due to scan chains ch2 and ch3 being observed at multiple outputs (compressor redundancy) for mask Mall, all Ds can be observed without compromising chain observability.

[Figure 3 diagram: scan chains ch1 to ch4, clocked by Clk1, Clk2 and Clk3 around combinational logic, each observed at its own scan output so1 to so4; scan cells are marked D, D1, D2 or X.]
Figure 3: Scan cells capturing Xs in scan architecture with multiple clocks being clocked simultaneously.

However, for shift positions 5 through 10, if all the clocks are simultaneously pulsed for this architecture, all the Ds including D1 and D2 are masked due to Xs in chains ch2 and ch3. This is because no mask value is available to block the Xs in chains ch2 and ch3 simultaneously. To observe as many Ds as possible for shift positions 5 through 10, we can choose to pulse two out of the three clocks. For example, if we do not pulse clock Clk3, we are left with fully specified scan-in data in the flops clocked by Clk3 in chain ch3. Due to the specified bit in shift position 9 of chain ch3, D2 in chain ch4 can then be observed at so2. On the other hand, D1 in shift position 7 of chain ch1 still cannot be observed, as the Xs in ch2 and ch3 mask it at both the scan-out ports. Therefore, with scan compression there is a compromise between how many inter-clock-domain Xs can be introduced in the chains and the total number of patterns. In the hypothetical case discussed above, all the faults can be observed in two patterns rather than three by clocking {Clk1, Clk3} and {Clk1, Clk2} in pairs and managing the total number of inter-clock-domain Xs introduced in the chains.

Another byproduct of pulsing fewer clocks to reduce inter-clock-domain Xs is reduced use of masking. As we will show in the results section, using the mask less frequently increases the observability significantly and results in lower pattern inflation. This can also be observed in the example shown in Figure 4. The D in shift position 12 in chain ch1 can be observed at so2 without using the mask if Clk3 is not pulsed, as opposed to pulsing all the clocks simultaneously.

[Figure 4 diagram: the same chains ch1 to ch4 over shift positions 1 to 15, observed through a compressor at so1 and so2 with mask values 111, 001 and 010; regions are labelled "Mask required", "Inter clock domain Xs" and "Compressor redundancy".]
Figure 4: Scan cells capturing Xs in scan compression architecture with multiple clocks being clocked separately.

4. Experimental Results

For our experiments, we used 17 large industrial designs, with TetraMAX [18] to generate the patterns and DFT Compiler [17] to insert the scan chains and the compressor. To determine a baseline for comparing how much compression is obtained, we used the following methodology. For each compression mode inserted in the design, a corresponding base scan mode was also inserted, and patterns were generated for both compression and scan mode with different ATPG settings. For all the designs, no fault coverage drop was observed.

4.1 Impact on Compression

We first study how the pattern inflation is affected when the total number of inter-clock-domain Xs introduced during ATPG is changed. By default, ATPG allows 20% of the total scan cells or 1000 scan cells in the design, whichever is smaller, to be Xed out due to inter-clock-domain issues. As discussed above, the number of Xs that can be handled by the compressor is directly dependent on the number of scan chains NSC, the number of outputs Q and the number of outputs each chain is connected to, i.e., the chain fanout f. Let us consider an example where NSC = 900, Q = 15 and f = 3. Each output then observes a set of (900/15)*3 = 180 chains. If we were to divide the 900 chains into unique groups of 180 chains each, there can be only 5 such groups. This implies that for a bad distribution of Xs, a mere 5 Xs are enough to completely block observability at all the outputs for a single shift.

Similarly, if Q = 12, only 4 unique groups of 225 chains each can be built, and therefore only 4 Xs in a single shift are enough to completely block observability at all the outputs. Hence, if the number of Xs could be reduced during ATPG so as to reduce the probability of having five or more Xs in a single shift for Q = 15, significant gains in

observability can be obtained. For example, if the number of Xs could be reduced to three Xs per shift, then even in the worst-case distribution 360 chains can be observed. If those 360 chains happen to be where Ds are being observed, all chains can be observed without needing to invoke X masking. This implies that ATPG can leverage the compaction advantages by clocking fewer clocks together and by introducing fewer Xs, so as not to adversely affect the compressor observability.

Figure 5 shows the results of the compression obtained for the default settings. Since we did not change the ATPG to dynamically monitor the inter-clock-domain Xs, we statistically determined that if we set the maximum limit for the number of cells to be Xed out to SCX = 75, we obtained significantly better results. If we look at Figure 5, we observe that there are two kinds of circuits: the ones on the left-hand side that have fewer inter-clock-domain Xs and the ones on the right with a large number of inter-clock-domain Xs. Clearly, all the circuits on the right-hand side show significant gains when the inter-clock-domain Xs are managed to stay within a limit that trades off the increased observability due to multiple clocks against the negative impact of the Xs. In fact, for circuits 13 and 17, 80% and 84% improvement was obtained, respectively.

We also generated patterns with SCX = 0 and compared them with SCX = 75; the results are shown in Figure 6. The data in Figure 6 shows that generating patterns by allowing a small number of clocks to pulse together is more efficient than pulsing each clock separately.

4.2 Impact of mask usage

To study the compressor performance as the number of dynamic Xs is reduced, we studied the mask usage for circuit 13 with tool default settings and with SCX = 75. As shown in Figure 7, the maximum number of times X masking can be used is equal to the scan chain length = 320. Figure 7 also shows the significant reduction in the use of X masking from the default to the SCX = 75 setting. In fact, mask usage is almost 2X less for SCX = 75, and that translates into the 80% better compression achieved.

5. Conclusions

We have studied the Xs generated during ATPG due to inter-clock-domain issues and have shown that not all Xs are bad for compression. We have shown that by allowing some inter-clock-domain Xs during ATPG, scan compression QoR can be significantly improved. Our results show that the gains obtained in compression by clocking multiple clocks together with fewer cells allowed to be disturbed are significantly higher than with either a very large number of cells or zero cells allowed to be disturbed. It is observed that the increased observability obtained from clocking multiple clocks simultaneously outweighs the observability lost due to the dynamic Xs generated during ATPG with SCX = 75. Finally, the higher compression obtained with the proposed approach translates into lower test application time and lower test cost.

6. References

[1] E. J. McCluskey, D. Burek, B. Koenemann, S. Mitra, J. Patel, J. Rajski, J. Waicukauski, "Test data compression," IEEE Design & Test of Computers, vol. 20, pp. 76–87, March-April 2003.
[2] H. Tang et al., "On efficient X-handling using a selective compaction scheme to achieve high test response compaction ratios," in Proc. Int. Conf. on VLSI Design, pp. 59–64, 2005.
[3] C. Wang, S. M. Reddy, I. Pomeranz, J. Rajski and J. Tyszer, "On compacting test response data containing unknown values," in Proc. Int. Conf. on Computer Aided Design, pp. 855–862, 2003.
[4] S. Mitra, K. S. Kim, "X-Compact: An efficient response compaction technique for test cost reduction," in Proc. Int. Test Conf., pp. 311–320, 2002.
[5] P. Wohl and L. Huisman, "Analysis and design of optimal combinational compactors," in Proc. VLSI Test Symp., pp. 101–112, 2003.
[6] J. Rajski, J. Tyszer, C. Wang and S. M. Reddy, "Convolutional compaction of test responses," in Proc. Int. Test Conf., pp. 745–754, 2003.
[7] J. Rajski and J. Tyszer, "Synthesis of X-tolerant convolutional compactors," in Proc. VLSI Test Symp., pp. 114–119, 2005.
[8] Y. Han, Y. Xu, H. Li, X. Li and A. Chandra, "Test resource partitioning based on efficient response compaction for test time and tester channel reduction," in Proc. Asian Test Symp., pp. 440–445, 2003.
[9] I. Pomeranz, S. Kundu and S. M. Reddy, "On output response compression in the presence of unknown output values," in Proc. Design Auto. Conf., pp. 255–258, 2002.
[10] V. Chickermane, B. Foutz and B. Keller, "Channel masking synthesis for efficient on-chip test compression," in Proc. Int. Test Conf., pp. 452–461, 2004.
[11] M. Naruse, I. Pomeranz, S. M. Reddy and S. Kundu, "On-chip compression of output responses with unknown values using LFSR reseeding," Proc. Int. Test Conf., pp. 1060–1068, 2003.
[12] P. Wohl, J. A. Waicukauski, S. Patel and M. B. Amin, "X-tolerant compression and application of scan-ATPG patterns in a BIST architecture," in Proc. Int. Test Conf., pp. 727–736, 2003.
[13] A. Chandra and R. Kapur, "Interval based X-masking for scan compression architectures," Proc. IEEE Int. Symp. on Quality Electronic Design, pp. 821–826, 2008.
[14] A. Chandra, Y. Kanzawa, R. Kapur and T. W. Williams, "Adapting scan compression to designs," Proc. IEEE VLSI Design and Test, pp. 309–318, 2008.
[15] P. Wohl, J. Waicukauski, and S. Ramnath, "Fully X-tolerant combinational scan compression," Proc. Int. Test Conf., pp. 1–10, 2007.
[16] X. Lin, S. M. Reddy, and I. Pomeranz, "Test pattern reduction by simultaneously pulsing interacting clocks," Proc. IEEE VLSI Design and Test, pp. 301–308, 2008.
[17] DFT Compiler, Synopsys DFT synthesis solution, http://www.synopsys.com/products/test/dft_compiler_ds.pdf
[18] TetraMAX®, Synopsys ATPG solution, http://www.synopsys.com/products/test/tetramax_ds.pdf
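The worst-case observability argument of Section 4.1 (NSC = 900, f = 3, Q = 15 or 12) can be reproduced with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, and it assumes NSC divides evenly by Q and that each X in the worst case lands in a different group of chains.

```python
def chains_per_output(num_chains, num_outputs, fanout):
    """Number of chains observed at each output: (NSC / Q) * f."""
    return (num_chains // num_outputs) * fanout


def xs_to_block_all_outputs(num_chains, num_outputs, fanout):
    """Number of unique groups, i.e. Xs per shift that can block every output."""
    return num_chains // chains_per_output(num_chains, num_outputs, fanout)


def worst_case_observable_chains(num_chains, num_outputs, fanout, xs_per_shift):
    """Chains still observable if each X wipes out a different group."""
    group = chains_per_output(num_chains, num_outputs, fanout)
    return max(0, num_chains - xs_per_shift * group)


print(chains_per_output(900, 15, 3), xs_to_block_all_outputs(900, 15, 3))
print(chains_per_output(900, 12, 3), xs_to_block_all_outputs(900, 12, 3))
print(worst_case_observable_chains(900, 15, 3, xs_per_shift=3))
```

With these inputs the sketch gives 180 chains per output and 5 blocking Xs for Q = 15, 225 and 4 for Q = 12, and 360 observable chains when only three Xs occur in a shift, matching the figures quoted in the text.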

[Figure 5 plot: compression (X), axis 0 to 60, for circuits 1 to 17, with series SCX < 1000 (tool default) and SCX = 75; improvement labels shown on the plot include 6%, 11%, 21%, 34%, 36%, 45%, 80% and 84%.]
Figure 5: Compression obtained using default tool setting and with number of disturbed scan cells SCX = 75.
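The two settings compared in Figure 5 are caps on how many scan cells may capture inter-clock-domain Xs. The sketch below shows the default cap described in Section 4.1 and one way a candidate set of clocks could be checked against a cap. It is an illustration under stated assumptions, not the ATPG tool's algorithm: the function names, the clock-pair table and its counts are all hypothetical, and a cell fed from two other domains would be counted twice.

```python
from itertools import combinations


def default_x_budget(total_scan_cells):
    """Default cap from the text: 20% of the scan cells or 1000, whichever is smaller."""
    return min(total_scan_cells * 20 // 100, 1000)


def xs_for_clock_group(clocks, cross_domain_cells):
    """Cells that capture an X when all clocks in the group pulse together.

    cross_domain_cells maps (launch_clock, capture_clock) to the number of
    capture cells fed from the launch domain (hypothetical design data).
    """
    clocks = set(clocks)
    return sum(
        count
        for (launch, capture), count in cross_domain_cells.items()
        if launch != capture and launch in clocks and capture in clocks
    )


def groups_within_budget(all_clocks, cross_domain_cells, budget):
    """Enumerate clock groups, largest first, whose X count fits the budget."""
    fitting = []
    for size in range(len(all_clocks), 0, -1):
        for group in combinations(all_clocks, size):
            if xs_for_clock_group(group, cross_domain_cells) <= budget:
                fitting.append(group)
    return fitting


cross = {("Clk1", "Clk2"): 40, ("Clk2", "Clk3"): 200, ("Clk1", "Clk3"): 30}
print(default_x_budget(3000), default_x_budget(100000))
print(groups_within_budget(["Clk1", "Clk2", "Clk3"], cross, budget=75))
```

In this made-up example, pulsing all three clocks together would exceed a cap of 75, while the pairs {Clk1, Clk2} and {Clk1, Clk3} fit. That mirrors the kind of pairing used in the Figure 3 and Figure 4 example, although the paper sets SCX as a fixed limit rather than searching over groups this way.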

[Figure 6 plot: improvement (%), axis -5 to 11, for circuits 1 to 17.]
Figure 6: Compression improvements of SCX = 75 over SCX = 0, computed as (SC75 – SC0)*100 / SC75.

[Figure 7 plot: number of shifts in which the mask was used (axis ticks 0 to 300) against pattern number (axis ticks 1 to 265), with series Max, SCX < 1000 and SCX = 75.]
Figure 7: Number of shift cycles where the mask was used with SCX < 1000 vs SCX = 75 for circuit 13.
