
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 25, NO. 4, AUGUST 1990, p. 1005

Correspondence

CMOS Tapered Buffer


N. C. LI, MEMBER, IEEE, GENE L. HAVILAND, MEMBER, IEEE,
AND A. A. TUSZYNSKI, SENIOR MEMBER, IEEE

Abstract - Jaeger's buffer comprises a string of tapered inverters. Each inverter is modeled by a capacitor and a conductor. We split the capacitor into inherent and load components ($C_x$ and $C_y$), and show that the value of the optimal taper depends on the $C_x/C_y$ ratio: the best taper exceeds Jaeger's 2.72 slope, but only moderately.

Fig. 1. Circuit configuration explored by Jaeger.

I. BACKGROUND

The need for buffers at chip-crossing boundaries of MOS IC's has been highlighted by Weste and Eshraghian [1] as well as Mead and Conway [2]. The wherewithal for the design of such buffers has been scrutinized by Lin and Linholm [3], Jaeger [4], Veendrick [5], Hedenstierna and Jeppson [6], Nemes [7], and Kanuma [8]. Several topics, which bear upon approximations employed in buffer design, have been discussed by Greenbaum [9], as well as Arnout and De Man [10]. Improvements attainable by recourse to BiCMOS have been examined by Rosseel and Dutton [11], as well as De Los Santos and Hoefflinger [12].

A severe mismatch between off-chip loads and on-chip logic devices prevails in high-density CMOS circuits. In the interest of speed and power considerations, MOS transistors are laid out to minimal geometries and $W/L$ ratios close to 1. With gate oxides of about 250 Å, the on-chip capacitance of logic devices amounts to several tens of femtofarads against an off-chip load capacitance of 50 pF or more. Thus, a speed degradation factor of three orders of magnitude would result if the loads were connected directly to logic-level transistors. Naturally then, guided by past practice, one inserts a tapered buffer between the logic devices and the load.

II. DESIGN OF THE TAPERED BUFFER

We begin with the Jaeger version of the Lin-Linholm approach, and then proceed to the split-capacitor modification developed by us. In Jaeger's model, each stage of the buffer is represented by one conductor and one capacitor. We use one conductor but two capacitors. The thrust of our discussion is directed at the optimization of the dynamic response of the buffer.

Jaeger's buffer and its model are shown in Figs. 1 and 2, respectively. There are $n$ stages, numbered 0 to $n-1$. The logic-level capacitance is $C_0$, the logic-level conductance is $g$, and the logic-level time constant $\tau_0 = C_0/g$. The taper is $\beta$, i.e., the $W/L$ ratio of stage #$(k+1)$ is $\beta$ times larger than that of stage #$k$:

$$(W/L)_{k+1} = \beta\,(W/L)_k. \tag{1}$$

Fig. 2. Model implied in Jaeger's paper (stages #0, #1, #2, ..., #n-1).

The conductance, capacitance, and time constant of stage #$k$ are

$$g_k = \beta^k g, \qquad C_k = \beta^{k+1} C_0, \qquad \tau_k = C_k/g_k = \beta\tau_0. \tag{2}$$

The overall time constant of the buffer ($\tau_b$) is assumed to be equal to the sum of the time constants of the individual stages:

$$\tau_b = \sum_{k=0}^{n-1} \tau_k = n\beta\tau_0. \tag{3}$$

The load capacitance at the output stage ($C_L$) is

$$C_L = \beta^n C_0. \tag{4}$$

The number of stages of the buffer can, therefore, be written as

$$n = \frac{\ln(C_L/C_0)}{\ln\beta}. \tag{5}$$
Manuscript received August 23, 1989; revised March 7, 1990.
N. C. Li is with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115.
G. L. Haviland is with the Solid-State Division, Naval Ocean Systems Center, San Diego, CA.
A. A. Tuszynski is with the Department of Electrical and Computer Engineering, San Diego State University, San Diego, CA 92182.
IEEE Log Number 9036486.
0018-9200/90/0800-1005$01.00 © 1990 IEEE

Substitution of (5) into (3) yields

$$\tau_b = \frac{\beta}{\ln\beta}\,\ln(C_L/C_0)\,\tau_0 \tag{6}$$

which leads to

$$\beta(\text{optimum}) = e = 2.72. \tag{7}$$

Thus one arrives at an overall delay of

$$\tau_b = e\,[\ln(C_L/C_0)]\,\tau_0 \tag{8}$$

and a buffer insertion penalty factor

$$B_f = e\,\ln(C_L/C_0). \tag{9}$$
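As a numerical check on (7)-(9), the following minimal sketch evaluates the normalized delay $(\beta/\ln\beta)\,\ln(C_L/C_0)$, which follows from (3) and (4), over a grid of tapers. The capacitance ratio of 1000 is an assumed round figure for a femtofarad-to-picofarad transition, and the function names are illustrative only.

```python
import math


def jaeger_delay_factor(beta: float, cap_ratio: float) -> float:
    """Normalized delay tau_b / tau_0 = (beta / ln beta) * ln(C_L / C_0)."""
    return beta / math.log(beta) * math.log(cap_ratio)


def jaeger_stage_count(beta: float, cap_ratio: float) -> float:
    """Number of stages n = ln(C_L / C_0) / ln(beta), before rounding."""
    return math.log(cap_ratio) / math.log(beta)


# Assumption: C_L / C_0 = 1000 (e.g. 50 fF of logic driving 50 pF off chip).
cap_ratio = 1000.0

# Scan tapers from 2.00 to 4.00 in steps of 0.01 and keep the fastest one.
tapers = [2.0 + 0.01 * i for i in range(201)]
best_beta = min(tapers, key=lambda b: jaeger_delay_factor(b, cap_ratio))

# Penalty factor B_f of (9), evaluated at the analytic optimum beta = e.
penalty = jaeger_delay_factor(math.e, cap_ratio)
stages = jaeger_stage_count(math.e, cap_ratio)

print(f"best taper on grid: {best_beta:.2f}")
print(f"penalty factor B_f: {penalty:.1f}")
print(f"ideal stage count : {stages:.2f}")
```

Under these assumptions the grid minimum sits next to $e$, and the penalty factor comes out just under twenty for a ratio of 1000.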


Transition from femtofarad logic to picofarad loads incurs the still surprisingly high penalty factor of almost twenty.

III. SPLIT-CAPACITOR SOLUTION

We adopt the equivalent circuit and the summation of time constants used by Jaeger, but we split the capacitor into two parts: an inherent output capacitance $C_x$ and an incidental load capacitance $C_y$ (Fig. 3). The logic-level value of $C_x + C_y$ is $C_0$. The load capacitance of the last stage is $C_L$. To be included in $C_x$ is $C_s$, an equivalent short-circuit current capacitance, whose maximum value is

$$\ldots \tag{10}$$

where $I_p$ is the peak short-circuit current of the inverter, while $\tau_r$ and $\tau_f$ stand for rise and fall times, respectively. See [5] for background to (10).

The new definitions read as follows. The logic-level time constant is

$$\tau_0 = (C_x + C_y)/g$$

and the time constant of stage #$k$ is

$$\tau_k = \frac{\beta^k C_x + \beta^{k+1} C_y}{\beta^k g} \tag{11}$$

$$= \frac{C_x + C_y + (\beta - 1)C_y}{g} \tag{12}$$

$$= [1 + (\beta - 1)\alpha]\,\tau_0 \tag{13}$$

where

$$\alpha = \frac{C_y}{C_x + C_y}. \tag{14}$$

The total delay through the buffer is

$$\tau_b = n\tau_k \tag{15}$$

where

$$n = \frac{\ln(C_L/C_y)}{\ln\beta}. \tag{16}$$

Substituting now (13) and (16) into (15), we get

$$\tau_b = \frac{1 + (\beta - 1)\alpha}{\ln\beta}\,\tau_0\,\ln(C_L/C_y). \tag{17}$$

Finally, differentiating (17) with respect to $\beta$, and invoking (14), we arrive at

$$\beta\,[\ln(\beta) - 1] = \frac{C_x}{C_y}. \tag{18}$$

TABLE I
TAPER AS A FUNCTION OF $C_x/C_y$

$C_x/C_y$   0     0.1   0.2   0.3   0.4   0.5   0.6   0.8   1.0   3.0
$\beta$     2.72  2.82  2.91  3.00  3.09  3.19  3.27  3.43  3.59  4.97

Optimum $\beta$ is now seen to depend on the relative magnitudes of $C_x$ and $C_y$ (Table I and Fig. 4). As was to be expected, if $C_x$ is negligibly small compared to $C_y$, then the optimum slope reduces to $e = 2.72$, in correspondence to Jaeger's solution. Conversely, if $C_x$ is much larger than $C_y$, then $\beta$ may exceed 2.72 by a considerable margin (Table II). Typically, $\beta$ is moderately larger than 2.72.

IV. THE FAN-OUT DECISION

In general layout work, of special interest are nodes with low to moderate fan-out. Faced with a fan-out of $k$, do we or don't we use a buffer? Obviously enough, where this question arises, reference is made to a single-stage buffer, scaled as shown in Fig. 5(b). That buffer is to be compared with the straight inverter in Fig. 5(a).

Retaining the technique of linear addition of time constants, we write the overall delay in Fig. 5(b) as

$$\tau_b = \frac{C_x + \beta C_y}{g} + \frac{\beta C_x + kC_y}{\beta g}. \tag{19}$$

Scrutinizing (19) for best $\beta$, one arrives at

$$\beta = \sqrt{k} \tag{20}$$

in confirmation of the uniform taper approach. The total delay is

$$\tau_b(\min) = 2\,(C_x + \sqrt{k}\,C_y)/g \tag{21}$$

which is to be compared with the "no buffer" delay

$$\tau_b' = (C_x + kC_y)/g. \tag{22}$$

The former is smaller than the latter when

$$2\,(C_x + \sqrt{k}\,C_y) < C_x + kC_y \tag{23}$$

that is when

$$k - 2\sqrt{k} > \frac{C_x}{C_y}. \tag{24}$$

If $C_x$ is very small compared to $C_y$, then the critical value of the

fan-out is

$$k_{cr}(0) = 4. \tag{25}$$

Otherwise

$$k_{cr} = 2 + 2\sqrt{1 + C_x/C_y} + C_x/C_y. \tag{26}$$

Equation (26) and Table III reveal that the answer to the buffer question depends on both $C_L/C_y$ and $C_x/C_y$. As a rule, a buffer should be used only when $C_L/C_y$ is larger than four.

Fig. 4. Taper versus $C_x/C_y$: (a) $\beta$ = 2.72, (b) $\beta$ = 3.10, (c) $\beta$ = 3.59, and (d) $\beta$ = 4.32.

TABLE II
EQUATION (15) AND SPICE SIMULATION RESULTS

number of stages $n$    4     5     6     7     8
taper $\beta$           7.45  4.99  3.82  3.15  2.73
$B_f$ per (15)          13.3  12.2  12.1  12.6  13.0
$B_f$ per SPICE*        13.8  12.6  11.9  12.2  12.4

*For $C$ = 38.9 fF, $C_L$ = 50 pF, and MOSIS 1.2-μm CMOS SPICE parameters.

TABLE III
SLOPE VERSUS $C_x/C_y$

$C_x/C_y$   0    0.25  0.50  0.75  1.0   2.0   3.0
$k_{cr}$    4    4.49  4.95  5.39  5.83  6.46  9.0
$\beta$     2    2.1   2.2   2.3   2.4   2.5   3.0
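The fan-out decision can be sketched numerically. The snippet below is an illustration, not the authors' code: it evaluates the critical fan-out of (26) and compares the buffered delay of (21) with the direct delay of (22). The function names and the normalization $g = 1$ are assumptions made for the example.

```python
import math


def critical_fanout(cx_over_cy: float) -> float:
    """Critical fan-out k_cr of (26) for a given ratio C_x / C_y."""
    r = cx_over_cy
    return 2.0 + 2.0 * math.sqrt(1.0 + r) + r


def buffered_delay(k: float, cx: float, cy: float, g: float = 1.0) -> float:
    """Minimum delay (21) with one buffer stage tapered by sqrt(k)."""
    return 2.0 * (cx + math.sqrt(k) * cy) / g


def direct_delay(k: float, cx: float, cy: float, g: float = 1.0) -> float:
    """'No buffer' delay (22) of a single inverter driving k loads."""
    return (cx + k * cy) / g


def buffer_helps(k: float, cx: float, cy: float) -> bool:
    """True when inserting the buffer shortens the delay, i.e. (23) holds."""
    return buffered_delay(k, cx, cy) < direct_delay(k, cx, cy)


if __name__ == "__main__":
    for ratio in (0.0, 0.25, 0.5, 0.75, 1.0, 3.0):
        k_cr = critical_fanout(ratio)
        print(f"Cx/Cy = {ratio:4.2f}  k_cr = {k_cr:5.2f}  beta = {math.sqrt(k_cr):4.2f}")
```

With $C_x = C_y$, for instance, a fan-out of 9 favors the buffer while a fan-out of 4 does not, in line with the critical value of about 5.83 listed in Table III.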
Fig. 5. The fan-out decision: (a) direct hookup and (b) buffered connection ($C_L = kC_y$).

V. CONCLUSION

The split-capacitor model leads to the conclusion that the taper is a function of $C_x/C_y$ and, therefore, a matter of technology, i.e., it depends on feature size, gate-oxide thickness, junction capacitances, etc. For any particular load capacitance, there exists a best taper and a corresponding best number of stages, but the law relating the delay penalty to the taper of the buffer is not very strong. At on-chip distribution points, buffers are justified only where fan-out exceeds a factor of 4. However, the chip-crossing penalty of MOS implementations is severe even in the best case; therein lies a strong argument in favor of BiCMOS I/O.
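The dependence of the best taper on $C_x/C_y$ can be reproduced from (18), $\beta[\ln(\beta) - 1] = C_x/C_y$, which has no closed-form solution. The following is a minimal sketch that solves it by bisection; the solver, its name, and its tolerance are assumptions chosen for illustration rather than anything prescribed by the paper.

```python
import math


def optimum_taper(cx_over_cy: float, tol: float = 1e-10) -> float:
    """Solve beta * (ln(beta) - 1) = C_x / C_y for beta >= e by bisection."""
    if cx_over_cy < 0:
        raise ValueError("C_x / C_y must be non-negative")

    def residual(beta: float) -> float:
        return beta * (math.log(beta) - 1.0) - cx_over_cy

    # The residual is increasing for beta > 1 and is <= 0 at beta = e,
    # so the root lies at or above e; grow the upper bound until it brackets.
    lo, hi = math.e, math.e + 1.0
    while residual(hi) < 0.0:
        hi *= 2.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for ratio in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 3.0):
        print(f"Cx/Cy = {ratio:3.1f}  beta = {optimum_taper(ratio):.2f}")
```

The values printed this way should track Table I closely, with small differences attributable to rounding.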

REFERENCES

[1] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective. Reading, MA: Addison-Wesley, 1985.
[2] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 1980.
[3] H. C. Lin and L. W. Linholm, "An optimized output stage for MOS integrated circuits," IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
[4] R. C. Jaeger, "Comments on 'An optimized output stage for MOS integrated circuits'," IEEE J. Solid-State Circuits, vol. SC-10, no. 3, pp. 185-186, June 1975.
[5] H. J. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," IEEE J. Solid-State Circuits, vol. SC-19, no. 4, pp. 468-474, Aug. 1984.
[6] N. Hedenstierna and K. O. Jeppson, "CMOS circuit speed and buffer optimization," IEEE Trans. Computer-Aided Design, vol. CAD-6, no. 2, pp. 276-281, Mar. 1987.
[7] M. Nemes, "Driving large capacitances in MOS LSI systems," IEEE J. Solid-State Circuits, vol. SC-19, no. 1, pp. 159-161, Feb. 1984.
[8] A. Kanuma, "CMOS circuit optimization," Solid-State Electron., vol. 26, pp. 47-58, 1983.
[9] J. R. Greenbaum, "Digital-IC models for computer-aided design," Electronics, pp. 121-125, Dec. 6, 1973; also pp. 107-112, Dec. 20, 1973.
[10] G. Arnout and H. J. De Man, "The use of threshold functions and Boolean-controlled network elements for macromodeling of LSI circuits," IEEE J. Solid-State Circuits, vol. SC-13, no. 3, pp. 326-332, June 1978.
[11] G. P. Rosseel and R. W. Dutton, "Influence of device parameters on the switching speed of BiCMOS buffers," IEEE J. Solid-State Circuits, vol. 24, no. 1, pp. 90-99, Feb. 1989.
[12] H. L. De Los Santos and B. Hoefflinger, "Optimization and scaling of CMOS-bipolar drivers for VLSI interconnects," IEEE Trans. Electron Devices, vol. ED-33, no. 11, pp. 1722-1730, Nov. 1986.


A Novel CMOS Implementation of Double-Edge-Triggered Flip-Flops

SHIH-LIEN LU AND MILOS ERCEGOVAC, MEMBER, IEEE

Manuscript received November 14, 1989; revised December 18, 1989.
S.-L. Lu is with the Department of Computer Science, University of California, Los Angeles, CA 90024 and with MOSIS, Marina del Rey, CA 90292-6695.
M. Ercegovac is with the Department of Computer Science, University of California, Los Angeles, CA 90024.
IEEE Log Number 9036484.
0018-9200/90/0800-1008$01.00 © 1990 IEEE

Abstract - A CMOS implementation of a D-type double-edge-triggered flip-flop (DET-FF) is presented. A DET-FF changes its state at both the positive and the negative clock edge transitions. It has advantages with respect to both system speed and power dissipation. The design presented requires little overhead in circuit complexity. This CMOS D-type DET-FF is capable of operating at more than 50 MHz, which gives an equivalent system frequency of 100 MHz.

I. INTRODUCTION

Conventional single-edge-triggered flip-flops (SET-FF's) change states at the time when the clock signal goes from 0 to 1 or at the time when the clock goes from 1 to 0. The former are called positive-edge-triggered flip-flops (PET-FF's) or rising-edge-triggered flip-flops (RET-FF's) and the latter are called negative-edge-triggered flip-flops (NET-FF's) or trailing-edge-triggered flip-flops (TET-FF's). The advantage of edge triggering is that the setup time for data input is independent of the clock pulse width. This makes system design simpler. It is also less sensitive to noise. However, these flip-flops respond only once per clock pulse cycle. Energy and time are wasted. Unger proposed in [1] a class of flip-flops (FF's) that will respond to both the positive and the negative edge of the clock pulse. These double-edge-triggered flip-flops (DET-FF's) have two major advantages. First, power dissipation is reduced. With the conventional SET-FF's, one of the two clock transitions accomplishes nothing. However, this transition may cause changes in the output of some logic elements internal to the FF's. In addition, extra energy is wasted to charge or discharge the capacitive load of the global clock line in a system using SET-FF's. This is particularly true in CMOS where static power dissipation is small and the dynamic power dissipation is the main contributor of energy dissipation. Second, the speed of the system is accelerated. With both edges able to cause state transition, some redundant logic can be eliminated. Moreover, the clock period will be shortened because there is no need to wait for the clock signal to toggle up and down.

The main disadvantage of DET-FF's has been the substantial increase in the number of components required to build such FF's. In most cases, more than double the logic count is expected. This paper proposes a novel design in CMOS which will implement static DET-FF's with relatively little increase in components. It is based on the single-phase CMOS register proposed by Lu in [2]. An implementation of a D-type DET-FF uses only 26 MOS devices in comparison with a typical static CMOS D-type flip-flop which requires 16 MOS devices. Another disadvantage of DET-FF's is in the extra delays caused by the extra gates needed to implement it by parallel decomposition. The presented CMOS implementation introduces little delay. It satisfies the speed requirement of the modern digital system. This D-FF is clocked at 50 MHz. Simulation performed with parameters obtained from a MOS Implementation System (MOSIS) [3] 2-μm CMOS/bulk process endorses the proposed implementation.

II. CIRCUIT DESIGN OF A D-TYPE DET-FF

A D-type DET-FF consists of two cross-coupled latches with input gating devices and some simple pass-transistor logic. A circuit diagram is illustrated in Fig. 1. Its operation principle is similar to the one used by Mead and Wawrzynek [4]. The two cross-coupled latches are enabled/disabled by the clock signal. When the clock is low, latch 1 is disabled and latch 2 is enabled. With clock high, latch 1 is enabled and latch 2 disabled. A disabled latch 1 has both its output and the complement set to high ($V_{dd}$). A disabled latch 2 has both its output and the complement set to low (GND). During the rising edge of the clock signal, latch 1 is being enabled. Depending on the D input value, either transistor M7 or M8 is conducting just before M9 switches off. Either output Q1 or its complement will remain charged to high ($V_{dd}$) while the other is discharged to low (GND). The set value will stay unchanged throughout the half of the clock period while it is high. Similarly, on the trailing edge of the clock signal latch 2 is being enabled. According to the value of the D input, either Q2 or its complement will remain low (GND) while the other will be set to high. The output value remains stable for the duration of low clock signal. Thus, this DET-FF is a static flip-flop. It consumes no static power. Table I gives the logic required to obtain the final output value. We observed that when the clock is low, both Q1 and its complement are high. The final value is the value of Q2. When clock is high, both Q2 and its complement are low. The final value is the value of Q1. Pass-transistor logic, shown in Fig. 1(c), is used to implement the logic function.
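The latch behavior described for the D-type DET-FF can be mimicked with a small behavioral model. This is a purely illustrative sketch, not the transistor-level circuit: the class name, the `step` method, and the assumed power-up state (clock low, latch 2 holding 0) are hypothetical choices made for the example.

```python
class DetFlipFlop:
    """Behavioral sketch of a D-type double-edge-triggered flip-flop."""

    def __init__(self) -> None:
        self.clk = 0  # assumed initial clock level: low
        self.q1 = 1   # latch 1 is disabled while the clock is low: output high
        self.q2 = 0   # latch 2 is enabled while the clock is low; assumed to hold 0

    def step(self, clk: int, d: int) -> int:
        """Apply a clock level and a D value; return the flip-flop output."""
        if clk == 1 and self.clk == 0:
            # Rising edge: latch 1 is enabled and captures D,
            # latch 2 is disabled and its output goes low.
            self.q1 = d
            self.q2 = 0
        elif clk == 0 and self.clk == 1:
            # Trailing edge: latch 2 is enabled and captures D,
            # latch 1 is disabled and its output goes high.
            self.q2 = d
            self.q1 = 1
        self.clk = clk
        # Output selection: Q1 while the clock is high, Q2 while it is low.
        return self.q1 if clk else self.q2


if __name__ == "__main__":
    ff = DetFlipFlop()
    stimulus = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 0)]
    print([ff.step(clk, d) for clk, d in stimulus])
```

In this model the output changes only on clock transitions, on both the rising and the trailing edge, and ignores changes of D while the clock level is steady.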
