
ESE370

Fall 2014

University of Pennsylvania
Department of Electrical and System Engineering
Circuit-Level Modeling, Design, and Optimization for Digital Systems

Final

ESE370, Fall 2014

Thursday, December 18

Problem weightings shown.


Calculators allowed.
Closed book = No text or notes allowed.
Final answers here.
Additional workspace in exam book. Note where to find work in exam book if relevant.
Sign Code of Academic Integrity statement at back of exam book.

Name: Answers

Mean: 44, Standard Deviation: 15


Default technology:

F = 22nm High Performance Process (HP)
γ = 1
Vdd = 900mV
nominal Vthn = |Vthp| = 300mV
$C_0 = 2\times10^{-17}$ F (for W = 1 device)
velocity saturated operation

Device   Vgs range     Ids
NMOS     Vgs < Vthn    $(1\times10^{-6})\,W\,e^{(V_{gs}-V_{thn})/40\,\text{mV}}$
NMOS     Vgs > Vthn    $3\times10^{-5}\,W\,(V_{gs}-V_{thn})$
PMOS     Vgs > Vthp    $-(1\times10^{-6})\,W\,e^{-(V_{gs}-V_{thp})/40\,\text{mV}}$
PMOS     Vgs < Vthp    $3\times10^{-5}\,W\,(V_{gs}-V_{thp})$



1. Speed and Energy.


Consider using CMOS nand2 gates sized to equalize rise and fall time, but otherwise
minimum sized.
(a) Assume the critical path in the design (including flip-flop setup time and clock-to-q delay) can be modeled as a series chain of 10 of these gates, each loaded
by 4 equivalent gates. What is the maximum frequency of operation possible? [5
pts]
There were many interpretations of "gates sized to equalize rise and fall time, but
otherwise minimum sized." We generally accepted them all but nonetheless believe
there is a single, unambiguous answer.
Equalizing rise and fall, the series devices (pull down) are twice as large as the
parallel (pull up) devices, so Wn = 2, Wp = 1.

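[Figure: nand2 gate driving Out, with parallel pull-up PMOS devices of W = 1 and series pull-down NMOS devices of W = 2.]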

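$$R_0 = \frac{V_{dd}}{I(V_{gs}=V_{dd},\,W=1)} = \frac{0.9\,\text{V}}{3\times10^{-5}\times1\times(0.9-0.3)} = 50\,\text{K}\Omega \tag{1}$$
$$C_0 = 2\times10^{-17}\,\text{F} \tag{2}$$
$$\tau = R_0C_0 = 1\,\text{ps} \tag{3}$$
$$t_{nand2} = \frac{R_0}{2}\left(8 + 4(W_n+W_p)\right)C_0 + \frac{R_0}{2}\left(4 + 4(W_n+W_p)\right)C_0 = 18\tau \tag{4}$$
$$t_{cycle} = 10\,t_{nand2} = 180\tau = 180\,\text{ps} \tag{5}$$
$$F_{max} = \frac{1}{t_{cycle}} = 5.6\,\text{GHz} \tag{6}$$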

Max Frequency 5.6 GHz
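The arithmetic in equations (1) through (6) can be checked with a short Python sketch. It assumes the velocity-saturated current model from the default-technology table and takes the nand2 delay expression (4) exactly as written; the helper names `r0`, `t_nand2`, and `f_max` are hypothetical.

```python
# Sketch of Problem 1(a): nand2 chain delay and maximum clock frequency.
# Assumes the velocity-saturated model I = 3e-5 * W * (Vgs - Vth) and the
# solution's nand2 delay expression (4). Helper names are hypothetical.

C0 = 2e-17   # F, capacitance of a W = 1 device
VTH = 0.3    # V, nominal threshold


def r0(vdd: float, vth: float = VTH) -> float:
    """Effective resistance of a W = 1 device with Vgs = Vdd."""
    return vdd / (3e-5 * 1 * (vdd - vth))


def t_nand2(vdd: float, wn: float = 2, wp: float = 1, fanout: int = 4) -> float:
    """nand2 delay per equation (4): two R0/2 terms, each seeing the fanout load."""
    tau = r0(vdd) * C0
    load = fanout * (wn + wp)  # fanout input capacitance, in units of C0
    return 0.5 * tau * (8 + load) + 0.5 * tau * (4 + load)


def f_max(vdd: float, depth: int = 10) -> float:
    """Maximum clock frequency for a critical path of `depth` nand2 gates."""
    return 1.0 / (depth * t_nand2(vdd))


print(f"R0 = {r0(0.9):.0f} ohm, tnand2 = {t_nand2(0.9) * 1e12:.1f} ps, "
      f"Fmax = {f_max(0.9) / 1e9:.2f} GHz")
```

Evaluated at Vdd = 0.9V, this should match the 50KΩ, 18ps, and roughly 5.6GHz values derived above.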


(b) Assuming chip cooling allows a maximum dynamic power dissipation¹ of 1W,
when operating at the frequency from part (a), what is the maximum number of
gates that can switch during a clock cycle, on average? [5 pts]
In the worst case, each gate switches, with $C_{load} = 4\times3C_0 + 8C_0 = 20C_0$:
$$0.5\,N_{gate}\,C_{load}\,(V_{dd})^2\,F_{max} \le 1\,\text{W} \tag{7}$$

Max gate-evals/clock 1.1 M
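A minimal sketch of the power-budget bound in equation (7), solved for the gate count. It assumes the worst-case $C_{load} = 20C_0$, the 1W budget, and the 180ps cycle from part (a); `max_gates_per_cycle` is an illustrative name, not part of the original solution.

```python
# Sketch of Problem 1(b): gates that may switch per cycle under a 1 W budget.
# Assumes Cload = 4*3*C0 + 8*C0 = 20*C0, the 0.5*C*Vdd^2 energy per switching
# gate used in equation (7), and negligible leakage (footnote 1).

C0 = 2e-17          # F
C_LOAD = 20 * C0    # F, worst-case switched capacitance per gate
P_BUDGET = 1.0      # W, cooling limit


def max_gates_per_cycle(vdd: float, fmax: float) -> float:
    """Largest N satisfying 0.5 * N * Cload * Vdd^2 * Fmax <= P_BUDGET."""
    return P_BUDGET / (0.5 * C_LOAD * vdd ** 2 * fmax)


fmax_a = 1 / 180e-12  # Hz, cycle time from part (a)
print(f"{max_gates_per_cycle(0.9, fmax_a) / 1e6:.2f} M gate-evaluations per cycle")
```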


(c) Assuming we drop the operating voltage, Vdd , to 450mV, what is the impact on
maximum frequency of operation and maximum number of gates that can switch
during a clock cycle? [5 pts]

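$$R_0 = \frac{V_{dd}}{I(V_{gs}=V_{dd},\,W=1)} = \frac{0.45\,\text{V}}{3\times10^{-5}\times1\times(0.45-0.3)} = 100\,\text{K}\Omega \tag{8}$$
$$\tau = R_0C_0 = 2\,\text{ps} \tag{9}$$
$$t_{cycle} = 10\,t_{nand2} = 180\tau = 360\,\text{ps} \tag{10}$$
$$F_{max} = \frac{1}{t_{cycle}} = 2.8\,\text{GHz} \tag{11}$$
$$N_{gate} = \frac{1\,\text{W}}{0.5\,C_{load}\,(V_{dd})^2\,F_{max}} = 8.88\,\text{M} \tag{12}$$

Max Frequency 2.8 GHz
Max gate-evals/clock 8.9 M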
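The same model can be re-evaluated at a lower supply to reproduce equations (8) through (12). This sketch assumes, as the solution does, that only $R_0$ changes with Vdd (the 18τ nand2 delay, 10-gate path, $20C_0$ load, and 1W budget are held fixed); `scaled_design` is a hypothetical helper.

```python
# Sketch of Problem 1(c): repeat parts (a) and (b) at a different supply.
# Assumes the same current model, an 18*R0*C0 nand2 delay, a 10-gate path,
# Cload = 20*C0, and a 1 W dynamic-power budget.
from typing import Tuple

C0 = 2e-17
VTH = 0.3


def scaled_design(vdd: float) -> Tuple[float, float]:
    """Return (Fmax in Hz, max gates switching per cycle) at supply vdd."""
    r0 = vdd / (3e-5 * (vdd - VTH))
    fmax = 1 / (10 * 18 * r0 * C0)
    n_gate = 1.0 / (0.5 * 20 * C0 * vdd ** 2 * fmax)
    return fmax, n_gate


for vdd in (0.9, 0.45):
    fmax, n_gate = scaled_design(vdd)
    print(f"Vdd = {vdd} V: Fmax = {fmax / 1e9:.2f} GHz, N = {n_gate / 1e6:.2f} M")
```

Running it for both supplies should show Fmax halving while the gate count rises by about 8×.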
(d) What is the ratio of gate-evaluations/second that can be performed between the
two cases? [5 pts]
$$\frac{8.88\,\text{M}\times2.8\,\text{GHz}}{1.11\,\text{M}\times5.6\,\text{GHz}} = 4 \tag{13}$$

gate-evaluations(Vdd=450mV) / gate-evaluations(Vdd=900mV) = 4

¹ Assume leakage is negligible or budgeted separately.
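A sketch of the part (d) ratio. The first computation multiplies the rounded results of parts (b) and (c). The second is an observation derived here from equations (7) and (12), not stated in the original: at a fixed power budget, $N_{gate}F_{max} = 1\,\text{W}/(0.5\,C_{load}V_{dd}^2)$, so the ratio depends only on the two supply voltages.

```python
# Sketch of Problem 1(d): gate-evaluations per second at the two supplies.
# Throughput = (gates per cycle) * (cycles per second), using the rounded
# results of parts (b) and (c).

evals_900mv = 1.11e6 * 5.6e9   # gate-evaluations/s at Vdd = 900 mV
evals_450mv = 8.88e6 * 2.8e9   # gate-evaluations/s at Vdd = 450 mV
ratio = evals_450mv / evals_900mv

# At a fixed power budget, N * Fmax = P / (0.5 * Cload * Vdd^2), so the
# ratio reduces to the inverse ratio of the squared supplies.
analytic_ratio = (0.9 / 0.45) ** 2

print(f"numeric ratio = {ratio:.2f}, analytic ratio = {analytic_ratio:.2f}")
```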


(e) Assuming the output of one of these gates drives a single gate input through an
unbuffered wire with Rwire = 700KΩ/cm, Cwire = 1.7pF/cm, what is the maximum
distance the signal can travel in one clock cycle when operating at Vdd = 450mV
and at the maximum clock frequency identified (part c)? [5 pts]
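$$t_{cycle} = \frac{R_0}{2}\left(8C_0 + 4C_0 + 2C_{wire}L_{wire} + 2\times3C_0\right) + 0.5\,C_{wire}R_{wire}(L_{wire})^2 + R_{wire}L_{wire}\,3C_0 \tag{14}$$
$$= \frac{R_0}{2}\left(18C_0 + 2C_{wire}L_{wire}\right) + 0.5\,C_{wire}R_{wire}(L_{wire})^2 + R_{wire}L_{wire}\,3C_0 \tag{15}$$
$$360\,\text{ps} = 18\,\text{ps} + 1\times10^{5}\times1.7\times10^{-12}\,L_{wire} + 0.5\times7\times10^{5}\times1.7\times10^{-12}\,(L_{wire})^2 + 7\times10^{5}\times3\times2\times10^{-17}\,L_{wire} \tag{16, 17}$$
$$342\,\text{ps} = 1.7\times10^{-7}\,L_{wire} + 5.95\times10^{-7}\,(L_{wire})^2 + 42\times10^{-12}\,L_{wire} \tag{18}$$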

We can clearly drop the final term, since $42\times10^{-12} \ll 1.7\times10^{-7}$. Since $L_{wire}$
(in cm) must be much less than 1, the $(L_{wire})^2$ term will be much smaller than
the $L_{wire}$ term. So we start by solving:
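$$342\,\text{ps} \approx 1.7\times10^{-7}\,L_{wire} \tag{19}$$
$$L_{wire} \approx \frac{342\times10^{-12}}{1.7\times10^{-7}} \tag{20}$$
$$\approx \frac{3.42}{1.7}\times10^{-3}\,\text{cm} \tag{21}$$
$$\approx 20\,\mu\text{m} \tag{22}$$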

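Checking:
$$360\,\text{ps} = 18\,\text{ps} + 1\times10^{5}\times1.7\times10^{-12}\times\left(20\times10^{-4}\right) + 0.5\times7\times10^{5}\times1.7\times10^{-12}\times\left(20\times10^{-4}\right)^2 + 7\times10^{5}\times3\times2\times10^{-17}\times\left(20\times10^{-4}\right) \tag{23, 24}$$
$$= 18\times10^{-12} + 3.4\times10^{-10} + 2.38\times10^{-12} + 84\times10^{-15} \tag{25}$$
$$= 3.60464\times10^{-10} \approx 360\,\text{ps} \tag{26}$$

Max Distance 20 µm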
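To confirm that dropping the small terms in equation (18) is safe, the quadratic can be solved directly. This Python sketch uses the part (c) values ($R_0$ = 100KΩ, 360ps cycle) and measures L in cm; `wire_delay` is a hypothetical helper mirroring equation (15).

```python
# Sketch of Problem 1(e): longest unbuffered wire that fits in one 360 ps cycle.
# Solves equation (15) exactly for L (in cm) instead of dropping small terms.
# Assumes R0 = 100 kOhm at Vdd = 450 mV, as in part (c).
import math

R0 = 1e5           # ohm, driver resistance at Vdd = 450 mV
C0 = 2e-17         # F
RW = 7e5           # ohm/cm
CW = 1.7e-12       # F/cm
T_CYCLE = 360e-12  # s


def wire_delay(length_cm: float) -> float:
    """Equation (15): gate delay with wire load plus distributed wire delay."""
    return ((R0 / 2) * (18 * C0 + 2 * CW * length_cm)
            + 0.5 * RW * CW * length_cm ** 2
            + RW * length_cm * 3 * C0)


# Rearranged as a*L^2 + b*L + c = 0 and solved for the positive root.
a = 0.5 * RW * CW
b = R0 * CW + RW * 3 * C0
c = 9 * R0 * C0 - T_CYCLE
length = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

print(f"L = {length * 1e4:.1f} um, check: {wire_delay(length) * 1e12:.1f} ps")
```

The exact root should come out just under 20µm, consistent with the approximation in equations (19) through (22).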


2. Memory Segmentation.
Consider a memory bit column where we add an output line and place a sense amplifier
every B rows of memory with the output of the sense amplifiers multiplexed onto the
output line as shown (facing page).
The height of each memory row is 300nm. Rwire = 700KΩ/cm, Cwire = 5.0pF/cm.²
Assume every sense amplifier on the column consumes $10^{-15}$ J on every read operation.³
Assume a 6T SRAM cell with W = 1 transistors on the inverters and W = 2 transistors
for the access transistors. Reads start with bit-lines precharged to Vdd /2. For simplicity,
assume the bit-lines end up being charged all the way to the respective rails during a
read. Note that wire capacitance also contributes to the total bit-line capacitance.
Assume a 1024-row memory. For parts (a) and (b), compare a B = 128 segmented
case with a B = 1024 unsegmented case. All questions are about the memory column
(address energy is not included).
(a) What is the impact of the B = 128 segmentation on read energy? [10pts]
There are two bit lines, but only one output line. If we ignored the capacitance
of the access transistors, we would reduce the energy by a factor of two by driving
only the output line rather than the bit lines. We save a bit more because the output
line also does not carry the access-transistor capacitance. However, we must also pay
for the extra sense amps.
$$E_{read}(B) = 2B\left(300\,\text{nm}\,C_{wire} + 2C_0\right)\frac{(V_{dd})^2}{2} + \frac{1024}{B}\,10^{-15} + 300\,\text{nm}\,C_{wire}\,(1024-B)\,\frac{(V_{dd})^2}{2} \tag{27}$$

Although the bit lines are precharged to 0.5Vdd, we assume they switch all the
way to the rails, so the ΔV is still a full Vdd. Over the full cycle, we charge the lines
from 0.5Vdd to a rail, then back to 0.5Vdd.
Also note that only the bit lines within the segment with the activated word line
will swing. A segment that does not include an activated word line will have no
memory cells turned on and hence will not drive its bit lines away from 0.5Vdd. The
bit lines in these inactive segments stay there; precharge keeps them at 0.5Vdd if
leakage causes them to drift.

$$300\,\text{nm}\,C_{wire} = 300\,\text{nm}\times5\times10^{-12}\,\text{F/cm} \tag{28}$$
$$= 300\,\text{nm}\times\frac{5\times10^{-12}\,\text{F}}{10^{7}\,\text{nm}} \tag{29}$$
$$= 1500\times10^{-19}\,\text{F} \tag{30}$$
$$= 15\times10^{-17}\,\text{F} \tag{31}$$

² Total effective capacitance per cm, including wire-to-ground and wire-to-wire.
³ Only one could be selected as active, but that would complicate the problem further with both an active
and an inactive energy cost per sense amplifier.


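$$E_{read}(B) = 2B\left(15\times10^{-17}\,\text{F} + 4\times10^{-17}\,\text{F}\right)\frac{0.9^2}{2} + \frac{1024}{B}\,10^{-15} + 15\times10^{-17}\,\text{F}\,(1024-B)\,\frac{0.9^2}{2} \tag{32}$$
$$\frac{\text{Energy}(B=1024)}{\text{Energy}(B=128)} = \frac{E_{read}(B=1024)}{E_{read}(B=128)} \tag{33}$$
$$= \frac{2\times1024\left(15\times10^{-17}\,\text{F} + 4\times10^{-17}\,\text{F}\right)\frac{0.9^2}{2} + 10^{-15}}{2\times128\left(15\times10^{-17}\,\text{F} + 4\times10^{-17}\,\text{F}\right)\frac{0.9^2}{2} + 8\times10^{-15} + 15\times10^{-17}\,\text{F}\,(1024-128)\,\frac{0.9^2}{2}} \tag{34}$$
$$= \frac{2048(15+4) + \frac{100}{0.9^2/2}}{256(15+4) + \frac{800}{0.9^2/2} + 15\times896} \tag{35}$$
$$= \frac{2048\times19 + 247}{256\times19 + 1976 + 15\times896} \tag{36}$$

Energy(B=1024) / Energy(B=128) ≈ 1.9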
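A small Python model of equation (27) makes the segmentation trade-off easy to explore. It assumes the per-row bit-line load of $300\,\text{nm}\,C_{wire} + 2C_0$ used above, every sense amplifier firing on each read, and full-swing lines costing $V_{dd}^2/2$ per unit capacitance; `e_read` and the constant names are illustrative.

```python
# Sketch of Problem 2(a): read energy of one memory column, equation (27).
# Assumes 300 nm rows, Cwire = 5 pF/cm, 2*C0 of access-transistor load per row
# on each bit line, 1e-15 J per sense amplifier per read, and full-swing lines
# costing Vdd^2/2 per unit capacitance, as in the solution.

VDD = 0.9                 # V
C0 = 2e-17                # F
ROWS = 1024
C_ROW = 300e-7 * 5e-12    # F, 300 nm (3e-5 cm) of 5 pF/cm wire = 15e-17 F
E_SENSE = 1e-15           # J per sense amplifier per read


def e_read(b: int) -> float:
    """Read energy for segment size b (rows per sense amplifier)."""
    bitlines = 2 * b * (C_ROW + 2 * C0) * VDD ** 2 / 2  # BL and /BL in the active segment
    sense = (ROWS / b) * E_SENSE                         # every sense amp fires
    output = C_ROW * (ROWS - b) * VDD ** 2 / 2           # output line over the other segments
    return bitlines + sense + output


print(f"E(1024) / E(128) = {e_read(1024) / e_read(128):.2f}")
```

With these assumptions the ratio should come out near 1.93, matching the hand result of about 1.9.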
(b) If the low addresses (0-127) are in the segment closest to the output and 90%
of the accesses are to these low addresses (0-127), what is the impact of the
B = 128 segmentation on average read energy? [5pts]
Here, we do not need to pay for the long output line on the 90% of memory
accesses that fall in the segment closest to the output.
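$$\frac{\text{Energy}(B=1024)}{\text{Energy}(B=128,\ 90\%\ \text{low address})} = \frac{2048\times19 + 247}{256\times19 + 1976 + 0.1\times15\times896} \tag{37}$$

Energy(B=1024) / Energy(B=128, 90% low address) ≈ 4.8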
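For part (b), the same hypothetical `e_read` model can drop the output-line term for reads that land in the segment nearest the output. The sketch restates the part (a) assumptions so it runs on its own and weights the two cases 90% / 10% as in equation (37).

```python
# Sketch of Problem 2(b): average read energy when 90% of reads hit the
# segment nearest the output, which then pays no output-line energy.
# Same assumptions as the part (a) sketch, restated so this runs alone.

VDD = 0.9
C0 = 2e-17
ROWS = 1024
C_ROW = 300e-7 * 5e-12    # F per row of wire
E_SENSE = 1e-15           # J per sense amplifier per read


def e_read(b: int, uses_output_line: bool = True) -> float:
    bitlines = 2 * b * (C_ROW + 2 * C0) * VDD ** 2 / 2
    sense = (ROWS / b) * E_SENSE
    output = C_ROW * (ROWS - b) * VDD ** 2 / 2 if uses_output_line else 0.0
    return bitlines + sense + output


p_low = 0.9  # fraction of reads to addresses 0-127
e_avg_128 = p_low * e_read(128, uses_output_line=False) + (1 - p_low) * e_read(128)
print(f"E(1024) / E_avg(128) = {e_read(1024) / e_avg_128:.2f}")
```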


(c) For uniform random access to this 1024-row memory, what B minimizes worst-case
energy? [10pts]

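$$E_{read}(B) = 2B\left(15\times10^{-17}\,\text{F} + 4\times10^{-17}\,\text{F}\right)\frac{0.9^2}{2} + \frac{1024}{B}\,10^{-15} + 15\times10^{-17}\,\text{F}\,(1024-B)\,\frac{0.9^2}{2} \tag{38}$$

Differentiate $E_{read}$ with respect to $B$ and set equal to zero to find the minimum:

$$\frac{dE_{read}}{dB} = 2\left(15\times10^{-17} + 4\times10^{-17}\right)\frac{0.9^2}{2} - \frac{1024}{B^2}\,10^{-15} - 15\times10^{-17}\,\frac{0.9^2}{2} = 0 \tag{39}$$
$$0 = (2\times19 - 15)\,\frac{0.9^2}{2} - \frac{1024\times100}{B^2} \tag{40}$$
$$23\times\frac{0.9^2}{2} = \frac{1024\times100}{B^2} \tag{41}$$
$$B = \sqrt{\frac{1024\times100}{23\times\frac{0.9^2}{2}}} \approx 105 \tag{42}$$

B ≈ 105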
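The optimum from equation (42) can be cross-checked by evaluating the energy model at every integer B. This sketch assumes the same energy model as equation (38) and allows any integer B from 1 to 1024 (the problem does not restrict B to powers of two); the names are hypothetical.

```python
# Sketch of Problem 2(c): locate the B that minimizes read energy, both from
# the closed form of equation (42) and by brute force over B = 1..1024.
import math

VDD = 0.9
C0 = 2e-17
ROWS = 1024
C_ROW = 300e-7 * 5e-12    # F per row of wire
E_SENSE = 1e-15           # J per sense amplifier per read


def e_read(b: float) -> float:
    return (2 * b * (C_ROW + 2 * C0) * VDD ** 2 / 2
            + (ROWS / b) * E_SENSE
            + C_ROW * (ROWS - b) * VDD ** 2 / 2)


# dE/dB = 0  ->  (2*(C_ROW + 2*C0) - C_ROW) * Vdd^2/2 = ROWS * E_SENSE / B^2
b_closed = math.sqrt(ROWS * E_SENSE / ((2 * (C_ROW + 2 * C0) - C_ROW) * VDD ** 2 / 2))
b_search = min(range(1, ROWS + 1), key=e_read)
print(f"closed form B = {b_closed:.1f}, integer search B = {b_search}")
```

Both approaches should agree on B ≈ 105.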

[Figure: segmented memory bit column. Each segment of 6T SRAM cells on bit lines BL and /BL (word lines WL) has its own sense amplifier, which drives the shared Output Line when enabled by its Select Line.]


3. Crosstalk and Throughput.


Consider trying to send a large amount of data across 1mm of a chip. You have
budgeted 5µm width to route buffered wires. Wires can be placed at a minimum
pitch of 50nm (25nm wires, spaced Lw2w =25nm apart), so you could put a maximum
of 100 wires in this space. At this minimum pitch, Cwire2gnd = Cwire2wire . Assume
wire width and buffering is fixed. You only get to control wire spacing, Lw2w . While
buffered, assume the wires are not pipelined; for the worst-case switching conditions,
data must get from one end of the 1mm buffered wire to the other before the next bit
can be sent. [We are deliberately not giving you full details of the buffering (buffer
size, frequency) and wire technology (resistance and capacitance per unit length) as it
should be possible to reason through this without those details; nonetheless, you may
assume it is optimally buffered.]

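[Figure: cross-section of parallel wires, showing Cwire2wire between neighboring wires and Cwire2gnd from each wire to ground.]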

(a) What Lw2w maximizes the communication bandwidth? [assume you can specify
any integer number of nanometers] [20pts]
$$T_{wire} \propto R_{wire}\,C_{wire}\,(L_{wire})^2 \tag{43}$$

In the worst case, the capacitance is to ground and to the wires on the left and right of
a particular wire. Furthermore, a wire and its neighbor may switch in opposite
directions, so the coupling capacitance must be charged through twice the voltage swing.
$$C_{wire} = C_{wire2gnd} + 2\times2\,C_{wire2wire} \tag{44}$$

We are given:
$$C_{wire2wire}(L_{w2w} = 25\,\text{nm}) = C_{wire2gnd} \tag{45}$$
Since $C = \epsilon A/d$, the $C_{wire2wire}$ capacitance is inversely proportional to
$d = L_{w2w}$, giving us:
$$C_{wire2wire}(L_{w2w}) = C_{wire2gnd}\,\frac{25\,\text{nm}}{L_{w2w}} \tag{46}$$


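$$F = \frac{1}{T_{wire}} \tag{47}$$
$$N_{wires} = \frac{5000\,\text{nm}}{25\,\text{nm} + L_{w2w}} \tag{48}$$
$$BW = N_{wires}\,F = \frac{5000}{25 + L_{w2w}}\cdot\frac{1}{R_{wire}(L_{wire})^2\,C_{wire2gnd}\left(1 + 4\,\frac{25}{L_{w2w}}\right)} \tag{49}$$
$$BW = \frac{5000}{R_{wire}(L_{wire})^2\,C_{wire2gnd}}\cdot\frac{1}{(25 + L_{w2w})\left(1 + \frac{100}{L_{w2w}}\right)} \tag{50}$$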

Only the second factor is a function of $L_{w2w}$, so we want to maximize it, which we
do by minimizing its denominator:


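$$d_{term} = (25 + L_{w2w})\left(1 + \frac{100}{L_{w2w}}\right) \tag{51}$$
$$d_{term} = 25 + L_{w2w} + \frac{2500}{L_{w2w}} + 100 \tag{52}$$
$$d_{term} = 125 + L_{w2w} + \frac{2500}{L_{w2w}} \tag{53}$$
$$\frac{d\,d_{term}}{dL_{w2w}} = 1 - \frac{2500}{(L_{w2w})^2} = 0 \tag{54}$$

Lw2w = 50 nm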
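A brute-force check of part (a): since only the factor $1/((25 + L_{w2w})(1 + 100/L_{w2w}))$ in equation (50) depends on spacing, it is enough to scan integer spacings. The sketch treats the wire count as continuous, as the derivation does; `relative_bw` is a hypothetical name.

```python
# Sketch of Problem 3(a): relative bandwidth versus wire spacing Lw2w (nm).
# From equation (50), BW is proportional to 1 / ((25 + L) * (1 + 100 / L));
# the constant prefactor is dropped and the wire count is treated as continuous.


def relative_bw(spacing_nm: float) -> float:
    return 1.0 / ((25 + spacing_nm) * (1 + 100 / spacing_nm))


# Spacing must be at least the 25 nm minimum, and one 25 nm wire plus its
# spacing must fit within the 5000 nm budget.
candidates = range(25, 5000 - 25 + 1)
best = max(candidates, key=relative_bw)
print(f"best Lw2w = {best} nm")
```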
(b) How much better is this than the bandwidth at the minimum pitch? [5pts]



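$$\frac{\text{Bandwidth}(L_{w2w} = \text{part a})}{\text{Bandwidth}(L_{w2w} = 25\,\text{nm})} = \frac{\dfrac{1}{(25+50)\left(1+\frac{100}{50}\right)}}{\dfrac{1}{(25+25)\left(1+\frac{100}{25}\right)}} = \frac{50\times5}{75\times3} \approx 1.11 \tag{55}$$

Bandwidth(Lw2w = part a) / Bandwidth(Lw2w = 25nm) ≈ 1.11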
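The part (b) ratio can be computed exactly with Python's `fractions.Fraction`, which keeps the answer as a rational number; `relative_bw` again stands for the spacing-dependent factor of equation (50) and is a hypothetical name.

```python
# Sketch of Problem 3(b): exact bandwidth gain of 50 nm spacing over the
# 25 nm minimum pitch, using the proportionality in equation (55).
from fractions import Fraction


def relative_bw(spacing_nm: int) -> Fraction:
    return 1 / ((25 + spacing_nm) * (1 + Fraction(100, spacing_nm)))


gain = relative_bw(50) / relative_bw(25)
print(gain, f"~ {float(gain):.2f}")
```

The exact gain is 10/9, which rounds to the 1.11 above.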


4. Series Termination and Variation.


Consider using a CMOS inverter to drive a 50Ω transmission line from a 600mV supply⁴
to a single sink that is effectively an open circuit. Because of process variation, the
expected |Vth| range is 150mV to 450mV. P and N vary independently, but all P
transistors on the same die will see the same Vth (similarly for N).
(a) What transistor size, W1, should you use for proper termination at the nominal
|Vth| = 300mV? [5 pts]
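$$50\,\Omega = \frac{V_{dd}}{I} = \frac{0.6\,\text{V}}{3\times10^{-5}\times W1\times(0.6-0.3)} = \frac{2\times10^{5}}{3\,W1} \tag{56}$$
$$W1 = \frac{2\times10^{5}}{150} \approx 1333 \tag{57}$$

W1 ≈ 1333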
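A minimal sketch of equation (56) solved for W1, assuming the same velocity-saturated model with this problem's 600mV supply; `drive_resistance` is an illustrative helper.

```python
# Sketch of Problem 4(a): size the driver so its on-resistance equals Z0.
# Assumes the velocity-saturated model R = Vdd / (3e-5 * W * (Vdd - |Vth|)).
Z0 = 50.0    # ohm
VDD = 0.6    # V, this problem's supply


def drive_resistance(w: float, vth: float) -> float:
    return VDD / (3e-5 * w * (VDD - vth))


# Solve drive_resistance(w1, 0.3) == Z0 for w1.
w1 = VDD / (3e-5 * Z0 * (VDD - 0.3))
print(f"W1 = {w1:.0f}")
```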
(b) Considering variation at this transistor size (W 1), what is the range of possible
magnitudes for the reflections the source might produce? [10 pts]
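Low at Vth = 150mV:
$$\frac{V_{dd}}{I} = \frac{0.6\,\text{V}}{3\times10^{-5}\times1333\times(0.6-0.15)} \approx 33\,\Omega \tag{58}$$
High at Vth = 450mV:
$$\frac{V_{dd}}{I} = \frac{0.6\,\text{V}}{3\times10^{-5}\times1333\times(0.6-0.45)} \approx 100\,\Omega \tag{59}$$

Reflection coefficients:
R = 33Ω: $\frac{R-Z_0}{R+Z_0} = \frac{-17}{83} \approx -0.20$
R = 100Ω: $\frac{R-Z_0}{R+Z_0} = \frac{50}{150} \approx 0.33$
The forward pulse is also affected:
R = 33Ω: Forward = $0.6\times\frac{50}{83} \approx 0.36$
R = 100Ω: Forward = $0.6\times\frac{50}{150} = 0.20$
This gives reflections $-0.20\times0.36 = -0.072$ to $0.33\times0.20 = 0.066$.

Reflection Range (coefficient) -20% to 33%
Reflection Range (absolute) -0.072 to 0.066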
The original intent here was to identify the coefficients, but absolute voltages were
fine. In fact, the impact on the voltage divider makes this even more interesting.
We also accepted answers that did not take the impact of the initial voltage
division into account.
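The part (b) numbers can be reproduced by evaluating the driver resistance at both |Vth| extremes and applying the source reflection coefficient together with the initial voltage division. This sketch assumes an ideal open-circuit sink (reflection coefficient +1), as the problem states; the helper names are hypothetical.

```python
# Sketch of Problem 4(b): source reflection at the two |Vth| extremes.
# The open-circuit sink returns the forward wave unchanged; the source then
# reflects it with coefficient (R - Z0) / (R + Z0).
from typing import Tuple

Z0 = 50.0
VDD = 0.6
W1 = 2e5 / 150   # from part (a)


def drive_resistance(vth: float, w: float = W1) -> float:
    return VDD / (3e-5 * w * (VDD - vth))


def source_reflection(r: float) -> Tuple[float, float, float]:
    """Return (reflection coefficient, forward step in V, reflected step in V)."""
    gamma = (r - Z0) / (r + Z0)
    forward = VDD * Z0 / (r + Z0)  # voltage divider between driver and line
    return gamma, forward, gamma * forward


for vth in (0.15, 0.45):
    r = drive_resistance(vth)
    g, fwd, refl = source_reflection(r)
    print(f"|Vth| = {vth} V: R = {r:.0f} ohm, gamma = {g:+.2f}, "
          f"forward = {fwd:.2f} V, reflected = {refl:+.3f} V")
```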
⁴ Different from the default 900mV Vdd.




(c) Consider making the drive inverter tri-stateable and adding a second tri-stateable
inverter half the size of the first and a third one-quarter the size of the first (as
shown below). The control inputs (Ctl0...Ctl5) can be set to mitigate process
variation: you set them to try to minimize the magnitude of the source reflection
after the chip has been fabricated. Assuming the control inputs are properly set,
what is the new possible magnitude range for the reflections that may be produced
at the source? [10 pts]
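[Figure: input In drives three tri-stateable inverters of sizes W1, W1/2, and W1/4 in parallel onto the transmission line (Trans. Line) leading to the Receiver; control inputs Ctl0 through Ctl5 gate the pull-up and pull-down of each inverter.]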

The control logic in the figure was slightly wrong. The intent was to control which
transistors turn on. The logic is correct for the pull-down, but the pull-up needs to
combine the input with the control using an OR (or a NAND, as in the corrected
figure) rather than an AND, so that it can disable a drive transistor for any input.
With the defective pull-up, we cannot set the controls to disable the transistors, so
all the transistors end up in parallel, reducing the resistance by a factor of
$1 + \frac{1}{2} + \frac{1}{4} = \frac{7}{4}$. In the 150mV case, the 33Ω resistance becomes about 19Ω.
The 100Ω resistance actually gets better (57Ω), which was the original intent. It
is possible to match the pull-down properly with the controls, but it won't be the
worst case.
Reflection coefficients:
R = 19Ω: $\frac{R-Z_0}{R+Z_0} = \frac{-31}{69} \approx -0.45$
R = 57Ω: $\frac{R-Z_0}{R+Z_0} = \frac{7}{107} \approx 0.065$
The forward pulse is also affected:
R = 19Ω: Forward = $0.6\times\frac{50}{69} \approx 0.43$
R = 57Ω: Forward = $0.6\times\frac{50}{107} \approx 0.28$
This gives reflections $-0.45\times0.43 = -0.19$ to $0.065\times0.28 = 0.018$.

Reflection Range (coefficient) -45% to 6.5%
Reflection Range (absolute) -0.19 to 0.018
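For the defective control logic, the sketch below scales the drive strength by 7/4 (all three drivers on) and repeats the part (b) calculation. The helper `reflected` is hypothetical and assumes the same ideal open-circuit sink.

```python
# Sketch of the defective-pull-up case in Problem 4(c): all three drivers are
# always on, so drive strength is (1 + 1/2 + 1/4) = 7/4 of the W1 driver alone.
Z0 = 50.0       # ohm
VDD = 0.6       # V
W1 = 2e5 / 150  # from part (a)


def reflected(vth: float, strength: float) -> float:
    """Voltage reflected at the source after the open-circuit end returns the wave."""
    r = VDD / (3e-5 * W1 * strength * (VDD - vth))
    gamma = (r - Z0) / (r + Z0)
    forward = VDD * Z0 / (r + Z0)
    return gamma * forward


for vth in (0.15, 0.45):
    print(f"|Vth| = {vth} V: reflected = {reflected(vth, 7 / 4):+.3f} V")
```

With strength 1.0 the same helper reproduces the part (b) values, which makes the effect of the stuck-on drivers easy to compare.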



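Corrected:
[Figure (corrected): the same three tri-stateable drivers (W1, W1/2, W1/4) between In and the transmission line to the Receiver, controlled by Ctl0 through Ctl5, with the pull-up control logic fixed so that each driver can be fully disabled.]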

The idea here is that you can control the strength of the drive between $\frac{1}{4}R$ and
$\frac{7}{4}R$ in increments of $\frac{R}{4}$, allowing better matching of the line.
At 33Ω, we can bring the resistance up to 44Ω with the $\frac{3}{4}R$ setting.
At 100Ω, we get 57Ω using the $\frac{7}{4}R$ setting, as noted above.
However, we must now consider all the cases between 33Ω and 100Ω to understand
the new worst case. We will see that these endpoints are almost the worst cases.
As we drop from 100Ω with the $\frac{7}{4}$ setting, the effective resistance gets closer to
50Ω until 87.5Ω, and it stays above 44Ω until 77Ω. At 77Ω, a $\frac{6}{4}$ setting achieves
51Ω. Staying with the $\frac{6}{4}$ setting and reducing below 77Ω, the resistance stays above 44Ω
until 66Ω. At 66Ω, a $\frac{5}{4}$ setting achieves 52Ω. Staying with the $\frac{5}{4}$ setting as we
drop from 66Ω keeps the resistance above 44Ω until 55Ω. Between 55Ω and 44Ω, the $\frac{4}{4}$
setting leaves the resistance unchanged, and it is within the 44–57Ω range. Below 44Ω,
either we expand the range or we must switch to the $\frac{3}{4}$ setting. At 44Ω, the $\frac{3}{4}$ setting
gives us 59Ω; we can use the $\frac{4}{4}$ setting there, but any lower, we either extend the
range below 44Ω or may get this slightly larger high-resistance value. At 43Ω, the $\frac{3}{4}$
setting gives us 57Ω, and the resistance drops from there as we continue toward
33Ω. So we can claim a 43–57Ω range or a 44–59Ω range.
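Reflection coefficients:
R = 44Ω: $\frac{R-Z_0}{R+Z_0} = \frac{-6}{94} \approx -0.063$
R = 59Ω: $\frac{R-Z_0}{R+Z_0} = \frac{9}{109} \approx 0.083$
R = 43Ω: $\frac{R-Z_0}{R+Z_0} = \frac{-7}{93} \approx -0.075$
R = 57Ω: $\frac{R-Z_0}{R+Z_0} = \frac{7}{107} \approx 0.065$
The forward pulse is also affected:
R = 44Ω: Forward = $0.6\times\frac{50}{94} \approx 0.32$
R = 59Ω: Forward = $0.6\times\frac{50}{109} \approx 0.28$
R = 43Ω: Forward = $0.6\times\frac{50}{93} \approx 0.32$
R = 57Ω: Forward = $0.6\times\frac{50}{107} \approx 0.28$
Using 43Ω/57Ω, this gives reflections $-0.075\times0.32 = -0.024$ to $0.065\times0.28 = 0.018$.

Reflection Range (coefficient) -7.5% to 6.5%
Reflection Range (absolute) -0.024 to 0.018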
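The case analysis above can be approximated by brute force: for each |Vth| in the 150mV to 450mV range, choose the setting k/4 (k = 1 to 7) whose effective resistance is closest to Z0, and record the spread. The "closest to Z0" rule is an assumption made here for illustration; choosing the setting that minimizes |Γ| instead would shift the switch points slightly. With this rule the spread should land near the 43–57Ω range argued above.

```python
# Sketch of the corrected Problem 4(c) driver. For each die, pick the drive
# strength k/4 of W1 (k = 1..7, i.e. any on/off mix of W1, W1/2, W1/4) whose
# effective resistance is closest to Z0, then sweep |Vth| over 150-450 mV.
# The "closest to Z0" selection rule is an assumption for illustration.
Z0 = 50.0
VDD = 0.6
W1 = 2e5 / 150


def nominal_resistance(vth: float) -> float:
    """Resistance of the W1 driver alone at threshold vth."""
    return VDD / (3e-5 * W1 * (VDD - vth))


def best_setting(r_nominal: float) -> float:
    """Effective resistance of the setting k/4 that lands closest to Z0."""
    candidates = (4 * r_nominal / k for k in range(1, 8))
    return min(candidates, key=lambda r: abs(r - Z0))


def gamma(r: float) -> float:
    return (r - Z0) / (r + Z0)


vths = [0.15 + i * 1e-4 for i in range(3001)]
r_eff = [best_setting(nominal_resistance(v)) for v in vths]
r_lo, r_hi = min(r_eff), max(r_eff)
print(f"R range {r_lo:.1f} to {r_hi:.1f} ohm, "
      f"gamma {gamma(r_lo):+.3f} to {gamma(r_hi):+.3f}")
```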


prefix      scale
G  Giga     $10^{9}$
M  Mega     $10^{6}$
K  Kilo     $10^{3}$
c  centi    $10^{-2}$
m  milli    $10^{-3}$
µ  micro    $10^{-6}$
n  nano     $10^{-9}$
p  pico     $10^{-12}$
f  femto    $10^{-15}$