Fall 2014
University of Pennsylvania
Department of Electrical and Systems Engineering
Circuit-Level Modeling, Design, and Optimization for Digital Systems
Final
Thursday, December 18
Name: Answers
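Device   Vgs          Ids
NMOS     below Vthn   $(1\times10^{-6})\, W\, e^{(V_{gs}-V_{thn})/40\,\mathrm{mV}}$
NMOS     above Vthn   $3\times10^{-5}\, W\, (V_{gs}-V_{thn})$
PMOS     below Vthp   $(1\times10^{-6})\, W\, e^{(V_{gs}-V_{thp})/40\,\mathrm{mV}}$
PMOS     above Vthp   $3\times10^{-5}\, W\, (V_{gs}-V_{thp})$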
1(a)
1(b)
1(c)
1(d)
1(e)
2(a)
2(b)
2(c)
3(a)
3(b)
4(a)
4(b)
4(c)
Total
ESE370
Fall 2014
[Figure: gate schematic; surviving labels: 1, Out, 2, 2]
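$$R_0 = \frac{V_{dd}}{I(V_{gs}=V_{dd},\, W=1)} = \frac{0.9\,\mathrm{V}}{3\times10^{-5}\cdot 1\cdot(0.9-0.3)} = 50\,\mathrm{K\Omega} \tag{1}$$
$$C_0 = 2\times10^{-17}\,\mathrm{F} \tag{2}$$
$$\tau = R_0 C_0 = 1\,\mathrm{ps} \tag{3}$$
$$t_{nand2} = \frac{R_0}{2}\,(8 + 4(W_n+W_p))\,C_0 + \frac{R_0}{2}\,(4 + 4(W_n+W_p))\,C_0 = 18\tau \tag{4}$$
$$t_{cycle} = 10\, t_{nand2} = 180\tau = 180\,\mathrm{ps} \tag{5}$$
$$F_{max} = \frac{1}{t_{cycle}} = 5.6\,\mathrm{GHz} \tag{6}$$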
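As a numeric cross-check of equations (1) through (6), the following minimal Python sketch recomputes the unit resistance, time constant, gate delay, and maximum frequency. The function and constant names are illustrative only. The nominal threshold of 0.3 V and W_n + W_p = 3 are assumptions read off the worked numbers, not values stated explicitly in the surviving text.

```python
# Above-threshold device model from the cover sheet: Ids = 3e-5 * W * (Vgs - Vth).
K_LIN = 3e-5   # A/V per unit width
VTH = 0.3      # V, nominal threshold (assumed from the worked numbers)
C0 = 2e-17     # F, unit capacitance from equation (2)


def unit_resistance(vdd, vth=VTH):
    """R0 = Vdd / I(Vgs = Vdd, W = 1)."""
    return vdd / (K_LIN * 1 * (vdd - vth))


def nand2_timing(vdd, wn_plus_wp=3, gates_per_cycle=10):
    """Return (R0, tau, t_nand2, t_cycle, f_max) following equations (1)-(6)."""
    r0 = unit_resistance(vdd)
    tau = r0 * C0
    t_nand2 = ((r0 / 2) * (8 + 4 * wn_plus_wp) * C0
               + (r0 / 2) * (4 + 4 * wn_plus_wp) * C0)
    t_cycle = gates_per_cycle * t_nand2
    return r0, tau, t_nand2, t_cycle, 1.0 / t_cycle


r0, tau, t_nand2, t_cycle, f_max = nand2_timing(0.9)
print(f"R0 = {r0 / 1e3:.0f} KOhm, tau = {tau * 1e12:.1f} ps, "
      f"t_cycle = {t_cycle * 1e12:.0f} ps, Fmax = {f_max / 1e9:.2f} GHz")
```

Calling `nand2_timing(0.45)` reuses the same model for the lower supply voltage.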
(b) Assuming chip cooling allows a maximum dynamic power dissipation of 1 W,
when operating at the frequency from part (a), what is the maximum number of
gates that can switch during a clock cycle, on average? [5 pts]
In the worst case, each gate switches: $C_{load} = 4\cdot 3C_0 + 8C_0 = 20C_0$.
$$0.5\, N_{gate}\, C_{load}\, (V_{dd})^2\, F_{max} \le 1\,\mathrm{W} \tag{7}$$
Solving for $N_{gate}$ at $V_{dd} = 0.9$ V and $F_{max} = 5.6$ GHz gives $N_{gate} \approx 1.11$ M.
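The power bound in inequality (7) can be evaluated directly. This is a small sketch under the stated worst-case load of 20 C0; the function name and keyword arguments are hypothetical.

```python
def max_switching_gates(vdd, f_max, power_budget=1.0, c0=2e-17, load_units=20):
    """Largest N with 0.5 * N * Cload * Vdd^2 * Fmax <= power_budget."""
    c_load = load_units * c0
    return power_budget / (0.5 * c_load * vdd ** 2 * f_max)


# 0.9 V supply and the 180 ps cycle time from equation (5).
n_gate = max_switching_gates(vdd=0.9, f_max=1 / 180e-12)
print(f"N_gate ~ {n_gate / 1e6:.2f} M")
```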
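$$R_0 = \frac{V_{dd}}{I(V_{gs}=V_{dd},\, W=1)} = \frac{0.45\,\mathrm{V}}{3\times10^{-5}\cdot 1\cdot(0.45-0.3)} = 100\,\mathrm{K\Omega} \tag{8}$$
$$\tau = R_0 C_0 = 2\,\mathrm{ps} \tag{9}$$
$$t_{cycle} = 10\, t_{nand2} = 180\tau = 360\,\mathrm{ps} \tag{10}$$
$$F_{max} = \frac{1}{t_{cycle}} = 2.8\,\mathrm{GHz} \tag{11}$$
$$N_{gate} = \frac{1\,\mathrm{W}}{0.5\, C_{load}\, (V_{dd})^2\, F_{max}} = 8.88\,\mathrm{M} \tag{12}$$
Max Frequency: 2.8 GHz
Max gate-evals/clock: 8.9 M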
(d) What is the ratio of gate-evaluations/second that can be performed between the
two cases? [5 pts]
$$\frac{8.88\,\mathrm{M} \times 2.8\,\mathrm{GHz}}{1.11\,\mathrm{M} \times 5.6\,\mathrm{GHz}} = 4 \tag{13}$$
gate-evaluations(Vdd=450mV) / gate-evaluations(Vdd=900mV) ≈ 4
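One way to see why the ratio comes out to 4: at the power limit of inequality (7), the product N_gate × F_max equals the power budget divided by 0.5 × Cload × Vdd², so the frequency cancels and only the supply voltage matters. A short sketch (function name is illustrative):

```python
def gate_evals_per_second(vdd, power_budget=1.0, c0=2e-17, load_units=20):
    """N_gate * F_max at the power limit of inequality (7); F_max cancels."""
    return power_budget / (0.5 * load_units * c0 * vdd ** 2)


ratio = gate_evals_per_second(0.45) / gate_evals_per_second(0.9)
print(f"ratio = {ratio:.2f}")
```

The result is (0.9 / 0.45)² for any clock frequency, which agrees with equation (13).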
(e) Assuming the output of one of these gates drives a single gate input through an
unbuffered wire with Rwire = 700 KΩ/cm, Cwire = 1.7 pF/cm, what is the maximum
distance the signal can travel in one clock cycle when operating at Vdd = 450 mV
and at the maximum clock frequency identified (part c)? [5 pts]
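$$t_{cycle} = \frac{R_0}{2}\,(8C_0 + 4C_0 + 2C_{wire}L_{wire} + 2\cdot 3C_0) + 0.5\,C_{wire}R_{wire}(L_{wire})^2 + R_{wire}L_{wire}\,3C_0 \tag{14}$$
$$= \frac{R_0}{2}\,(18C_0 + 2C_{wire}L_{wire}) + 0.5\,C_{wire}R_{wire}(L_{wire})^2 + R_{wire}L_{wire}\,3C_0 \tag{15}$$
$$\begin{aligned}
360\,\mathrm{ps} = {} & 18\,\mathrm{ps} \\
& + 1\times10^{5}\cdot 1.7\times10^{-12}\, L_{wire} \\
& + 0.5\cdot 7\times10^{5}\cdot 1.7\times10^{-12}\,(L_{wire})^2 \\
& + 7\times10^{5}\cdot 3\cdot 2\times10^{-17}\, L_{wire}
\end{aligned} \tag{17}$$
$$342\,\mathrm{ps} = 1.7\times10^{-7}\, L_{wire} + 5.95\times10^{-7}\,(L_{wire})^2 + 42\times10^{-12}\, L_{wire} \tag{18}$$
We can clearly drop the final term since $42\times10^{-12} \ll 1.7\times10^{-7}$. Since Lwire
must be less than 1 cm (much less than 1 cm), the $(L_{wire})^2$ term will be much less than
the $L_{wire}$ term. So, we can start by solving:
$$342\,\mathrm{ps} \approx 1.7\times10^{-7}\, L_{wire} \tag{19}$$
$$L_{wire} \approx \frac{342\times10^{-12}}{1.7\times10^{-7}} \tag{20}$$
$$\approx \frac{3.42}{1.7}\times10^{-3}\,\mathrm{cm} \tag{21}$$
$$\approx 20\,\mu\mathrm{m} \tag{22}$$
Checking:
$$\begin{aligned}
360\,\mathrm{ps} \approx {} & 18\,\mathrm{ps} \\
& + 1\times10^{5}\cdot 1.7\times10^{-12}\cdot 20\times10^{-4} \\
& + 0.5\cdot 7\times10^{5}\cdot 1.7\times10^{-12}\cdot(20\times10^{-4})^2 \\
& + 7\times10^{5}\cdot 3\cdot 2\times10^{-17}\cdot 20\times10^{-4}
\end{aligned} \tag{23}$$
Max Distance: 20 μm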
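The approximation above drops the quadratic and the small linear term. As a sanity check, the full quadratic from equation (15) can be solved directly; this sketch assumes lengths in cm and uses illustrative constant names.

```python
import math

R0 = 100e3         # ohm, from equation (8)
C0 = 2e-17         # F
R_WIRE = 7e5       # ohm/cm
C_WIRE = 1.7e-12   # F/cm
T_CYCLE = 360e-12  # s, from equation (10)


def max_wire_length_cm():
    """Positive root of a*L^2 + b*L + c = 0 built from equation (15)."""
    a = 0.5 * R_WIRE * C_WIRE
    b = R0 * C_WIRE + R_WIRE * 3 * C0
    c = (R0 / 2) * 18 * C0 - T_CYCLE
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


length_cm = max_wire_length_cm()
print(f"L_wire ~ {length_cm * 1e4:.1f} um")
```

The exact root lands within a fraction of a micron of the 20 μm estimate, which supports dropping the two small terms.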
2. Memory Segmentation.
Consider a memory bit column where we add an output line and place a sense amplifier
every B rows of memory with the output of the sense amplifiers multiplexed onto the
output line as shown (facing page).
The height of each memory row is 300 nm. Rwire = 700 KΩ/cm, Cwire = 5.0 pF/cm [2].
Assume every sense amplifier on the column consumes $10^{-15}$ J on every read operation [3].
Assume a 6T SRAM cell with W = 1 transistors on the inverters and W = 2 transistors
for the access transistors. Reads start with bit-lines precharged to Vdd/2. For simplicity,
assume the bit-lines end up being charged all the way to the respective rails during a
read. Note that wire capacitance also contributes to the total bit-line wire capacitance.
Assume 1024 row memory. For parts (a) and (b), compare a B = 128 segmented
case with a B = 1024 unsegmented case. All questions are about the memory column
(address energy is not included).
(a) What is the impact of the B = 128 segmentation on read energy? [10pts]
There are two bit lines, but only one output line. If we ignore the capacitance
of the access transistors, we reduce the energy by a factor of two by using only
the output line and not the bit lines. We save a bit more because the output line
also does not have the access transistor capacitance. However, we must also pay
for the extra sense amps.
$$E_{read}(B) = 2B\,(300\,\mathrm{nm}\cdot C_{wire} + 2C_0)\,\frac{(V_{dd})^2}{2} + \frac{1024}{B}\,10^{-15} + 300\,\mathrm{nm}\cdot C_{wire}\,(1024 - B)\,\frac{(V_{dd})^2}{2} \tag{27}$$
While the bit lines are pre-charged to 0.5 Vdd, since we assume they switch all the
way to the rails, the ΔV is still a full Vdd. On the full cycle, we charge the lines
from 0.5 Vdd to a rail, then back to 0.5 Vdd.
Also note that only the bit lines within the segment with the activated word line
will swing. A segment that does not include an activated word line will have no
memories turned on and hence will not drive the bit lines away from 0.5 Vdd. The
bit lines in this non-active segment will stay there. Precharge will serve to keep
them at 0.5 Vdd if leakage causes them to drift.
[2] Total effective capacitance per cm including wire to ground and wire to wire.
[3] Only one could be selected as active, but that would complicate the problem further with both an active
and inactive energy cost per sense amplifier.
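$$E_{read}(B) = 2B\,(15\times10^{-17} + 4\times10^{-17})\,\mathrm{F}\cdot\frac{0.9^2}{2} + \frac{1024}{B}\,10^{-15} + 15\times10^{-17}\,\mathrm{F}\,(1024 - B)\,\frac{0.9^2}{2} \tag{32}$$
$$\frac{\mathrm{Energy}(B = 1024)}{\mathrm{Energy}(B = 128)} = \frac{E_{read}(B = 1024)}{E_{read}(B = 128)} \tag{33}$$
Dividing numerator and denominator by $10^{-17}\cdot\frac{0.9^2}{2}$ turns each sense amplifier's $10^{-15}$ J into about 247, so the 8 sense amplifiers at B = 128 contribute 1976:
$$= \frac{2048\cdot 19 + 247}{256\cdot 19 + 1976 + 15\cdot 896} \tag{36}$$
Energy(B=1024) / Energy(B=128) ≈ 1.9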
(b) If the low addresses (0-127) are in the segment closest to the output and 90%
of the accesses are to these low addresses (0-127), what is the impact of the
B = 128 segmentation on average read energy? [5pts]
Here, we do not need to pay for the long output line on the 90% of the memory
accesses close to the output.
$$\frac{\mathrm{Energy}(B=1024)}{\mathrm{Energy}(B=128,\ 90\%\ \text{low address})} = \frac{2048\cdot 19 + 247}{256\cdot 19 + 1976 + 0.1\cdot 15\cdot 896} \approx 4.8 \tag{37}$$
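Both energy ratios can be reproduced from the read-energy model above. This is a sketch only: the names are illustrative, and `out_line_fraction` is a hypothetical parameter standing for the fraction of reads that must drive the long output line (1.0 for part (a), 0.1 for part (b)).

```python
C0 = 2e-17                   # F, unit capacitance
C_ROW_WIRE = 300e-7 * 5e-12  # F: 300 nm expressed in cm, times 5.0 pF/cm
E_SENSE = 1e-15              # J per sense amplifier per read
ROWS = 1024
VDD = 0.9


def read_energy(b, out_line_fraction=1.0):
    """Column read energy for segment size b (bit lines + sense amps + output line)."""
    half_v2 = VDD ** 2 / 2
    bit_lines = 2 * b * (C_ROW_WIRE + 2 * C0) * half_v2
    sense_amps = (ROWS / b) * E_SENSE
    output_line = out_line_fraction * C_ROW_WIRE * (ROWS - b) * half_v2
    return bit_lines + sense_amps + output_line


ratio_a = read_energy(1024) / read_energy(128)
ratio_b = read_energy(1024) / read_energy(128, out_line_fraction=0.1)
print(f"part (a): {ratio_a:.2f}, part (b): {ratio_b:.2f}")
```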
(c) For uniform random access to this 1024 row memory, what B minimizes worst-case
energy? [10pts]
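Take the derivative of
$$E_{read}(B) = 2B\,(15\times10^{-17} + 4\times10^{-17})\,\mathrm{F}\cdot\frac{0.9^2}{2} + \frac{1024}{B}\,10^{-15} + 15\times10^{-17}\,\mathrm{F}\,(1024 - B)\,\frac{0.9^2}{2} \tag{38}$$
with respect to B and set equal to zero to find the minimum:
$$\frac{dE_{read}}{dB} = 2\,(15\times10^{-17} + 4\times10^{-17})\,\frac{0.9^2}{2} - \frac{1024}{B^2}\,10^{-15} - 15\times10^{-17}\,\frac{0.9^2}{2} = 0 \tag{39}$$
$$0 = (2\cdot 19 - 15)\,\frac{0.9^2}{2} - \frac{1024\cdot 100}{B^2} \tag{40}$$
$$23\cdot\frac{0.9^2}{2} = \frac{1024\cdot 100}{B^2} \tag{41}$$
$$B = \sqrt{\frac{1024\cdot 100}{23\cdot\frac{0.9^2}{2}}} \approx 105 \tag{42}$$
B ≈ 105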
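The closed-form result can be compared against a brute-force search over integer segment sizes. This sketch treats the sense-amplifier count 1024/B as a continuous quantity, as the derivation does; constant and function names are illustrative.

```python
import math

C0 = 2e-17          # F
C_ROW_WIRE = 15e-17 # F per row of wire (300 nm at 5.0 pF/cm)
E_SENSE = 1e-15     # J per sense amplifier per read
ROWS = 1024
VDD = 0.9


def worst_case_read_energy(b):
    """Equation (38): read from the segment farthest from the output."""
    half_v2 = VDD ** 2 / 2
    return (2 * b * (C_ROW_WIRE + 2 * C0) * half_v2
            + (ROWS / b) * E_SENSE
            + C_ROW_WIRE * (ROWS - b) * half_v2)


best_b = min(range(1, ROWS + 1), key=worst_case_read_energy)
closed_form = math.sqrt(
    ROWS * E_SENSE / ((2 * (C_ROW_WIRE + 2 * C0) - C_ROW_WIRE) * VDD ** 2 / 2)
)
print(best_b, round(closed_form, 2))
```

A practical design would probably round to a segment size that divides 1024 evenly, which the exam answer does not require.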
[Figure: segmented memory bit column. Each segment of 6T SRAM cells shares bit lines BL and /BL and is addressed by word lines (WL); a Sense Amp per segment is gated by a Select Line onto the shared Output Line.]
[Figure: parallel wires with wire-to-wire capacitance Cwire2wire and wire-to-ground capacitance Cwire2gnd.]
(a) What Lw2w maximizes the communication bandwidth? [assume you can specify
any integer number of nanometers] [20pts]
$$T_{wire} \approx R_{wire}\, C_{wire}\, (L_{wire})^2 \tag{43}$$
Worst-case, the capacitance is to ground and to the wires to the left and right of
a particular wire. Furthermore, a wire and its neighbor may switch in opposite
directions, demanding that it be charged to 2× the voltage swing.
$$C_{wire} = C_{wire2gnd} + 2\cdot 2\,C_{wire2wire} \tag{44}$$
We are given:
$$C_{wire2wire}(25\,\mathrm{nm}) = C_{wire2gnd} \tag{45}$$
Since $C = \frac{\epsilon A}{d}$, we know the Cwire2wire capacitance varies inversely with
d = Lw2w, giving us:
$$C_{wire2wire}(L_{w2w}) = C_{wire2gnd}\,\frac{25\,\mathrm{nm}}{L_{w2w}} \tag{46}$$
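$$F = \frac{1}{T_{wire}} \tag{47}$$
$$N_{wires} = \frac{5000\,\mathrm{nm}}{25\,\mathrm{nm} + L_{w2w}} \tag{48}$$
$$BW = N_{wires} \times F = \left(\frac{5000}{25 + L_{w2w}}\right) \frac{1}{R_{wire}\,(L_{wire})^2\, C_{wire2gnd}\left(1 + 4\,\frac{25}{L_{w2w}}\right)} \tag{49}$$
$$BW = \left(\frac{5000}{R_{wire}\,(L_{wire})^2\, C_{wire2gnd}}\right) \frac{1}{(25 + L_{w2w})\left(1 + \frac{100}{L_{w2w}}\right)} \tag{50}$$
Only the second term is a function of Lw2w, so we want to maximize it, which we
do by minimizing the denominator:
$$dterm = (25 + L_{w2w})\left(1 + \frac{100}{L_{w2w}}\right) \tag{51}$$
$$= 25 + L_{w2w} + \frac{2500}{L_{w2w}} + 100 \tag{52}$$
$$= 125 + L_{w2w} + \frac{2500}{L_{w2w}} \tag{53}$$
$$\frac{d\,dterm}{dL_{w2w}} = 1 - \frac{2500}{(L_{w2w})^2} \tag{54}$$
Setting the derivative to zero gives $(L_{w2w})^2 = 2500$:
Lw2w = 50 nm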
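Because the question allows any integer number of nanometers, the calculus result can be confirmed with an exhaustive search over the denominator term of equation (51). The function name and the 1 to 5000 nm search range are assumptions made for this sketch.

```python
def denominator_term(l_w2w):
    """dterm from equation (51): (25 + Lw2w) * (1 + 100 / Lw2w), Lw2w in nm."""
    return (25 + l_w2w) * (1 + 100 / l_w2w)


# Bandwidth is maximized where the denominator term is smallest.
best_spacing = min(range(1, 5001), key=denominator_term)
print(best_spacing, denominator_term(best_spacing))
```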
(b) How much better is this than the bandwidth at the minimum pitch? [5pts]
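$$\frac{\dfrac{1}{(25+50)\left(1+\frac{100}{50}\right)}}{\dfrac{1}{(25+25)\left(1+\frac{100}{25}\right)}} = \frac{50\cdot 5}{75\cdot 3} = \frac{10}{9} \tag{55}$$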
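The improvement factor can be kept exact with rational arithmetic. A brief sketch using Python's `fractions` module, with an illustrative function name:

```python
from fractions import Fraction


def relative_bandwidth(l_w2w):
    """Lw2w-dependent factor of equation (50), as an exact fraction."""
    return 1 / ((25 + l_w2w) * (1 + Fraction(100, l_w2w)))


improvement = relative_bandwidth(50) / relative_bandwidth(25)
print(improvement, float(improvement))
```

The optimum spacing therefore buys roughly an 11% bandwidth gain over minimum pitch.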
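$$\frac{V_{dd}}{I} = \frac{0.6\,\mathrm{V}}{3\times10^{-5}\cdot W1\cdot(0.6 - 0.3)} = \frac{2\times10^{5}}{3\cdot W1} \tag{56}$$
Matching this to the line impedance $Z_0 = 50\,\Omega$:
$$W1 = \frac{2\times10^{5}}{150} = 1333 \tag{57}$$
W1 ≈ 1333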
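(b) Considering variation at this transistor size (W1), what is the range of possible
magnitudes for the reflections the source might produce? [10 pts]
Low at Vth = 150 mV:
$$\frac{V_{dd}}{I} = \frac{0.6\,\mathrm{V}}{3\times10^{-5}\cdot 1333\cdot(0.6 - 0.15)} = 33\,\Omega \tag{58}$$
High at Vth = 450 mV:
$$\frac{V_{dd}}{I} = \frac{0.6\,\mathrm{V}}{3\times10^{-5}\cdot 1333\cdot(0.6 - 0.45)} = 100\,\Omega \tag{59}$$
Reflection coefficients:
R = 33 Ω: $\left|\frac{R - Z_0}{R + Z_0}\right| = \frac{17}{83} \approx 0.20$
R = 100 Ω: $\left|\frac{R - Z_0}{R + Z_0}\right| = \frac{50}{150} \approx 0.33$
The forward pulse is also affected:
R = 33 Ω: Forward $= 0.6\cdot\frac{50}{83} \approx 0.36$
R = 100 Ω: Forward $= 0.6\cdot\frac{50}{150} = 0.20$
This gives reflections 0.20 × 0.36 = 0.072 to 0.33 × 0.20 = 0.066.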
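The sizing and the two variation corners can be recomputed in a few lines. This is a sketch with illustrative names; Z0 = 50 Ω is an assumption inferred from the R + Z0 sums in the worked numbers, and the exact W1 = 1333.3 is used instead of the rounded 1333.

```python
K_LIN = 3e-5   # A/V per unit width, above-threshold device model
VDD = 0.6      # V
Z0 = 50.0      # ohm, line impedance (assumed from the worked numbers)


def drive_resistance(width, vth):
    """R = Vdd / I with Vgs = Vdd."""
    return VDD / (K_LIN * width * (VDD - vth))


def reflection(r):
    """Source reflection coefficient (R - Z0) / (R + Z0)."""
    return (r - Z0) / (r + Z0)


def forward_amplitude(r):
    """Voltage launched onto the line through the divider formed by R and Z0."""
    return VDD * Z0 / (r + Z0)


w1 = VDD / (K_LIN * (VDD - 0.3) * Z0)  # width giving R = Z0 at nominal Vth
for vth in (0.15, 0.45):
    r = drive_resistance(w1, vth)
    product = abs(reflection(r)) * forward_amplitude(r)
    print(f"Vth = {vth} V: R = {r:.1f} ohm, |gamma| = {abs(reflection(r)):.2f}, "
          f"forward = {forward_amplitude(r):.2f} V, product = {product:.3f}")
```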
(c) Consider making the drive inverter tri-stateable and adding a second tri-stateable
inverter half the size of the first and a third one-quarter the size of the first (as
shown below). The control inputs (Ctl0...Ctl5) can be set to mitigate process
variation: you set them to try to minimize the magnitude of the source reflection
after the chip has been fabricated. Assuming the control inputs are properly set,
what is the new possible magnitude range for the reflections that may be produced
at the source? [10 pts]
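[Figure: input In drives three parallel tri-stateable inverters, with pull-up and pull-down devices sized W1, W1/2, and W1/4 and gated by the control inputs, onto the transmission line (Trans. Line) leading to the Receiver.]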
Logic was slightly wrong here. The intent was to control which transistors came
on. The logic is correct for the pull down, but the pullup needs to combine the
input with the control using an OR (or a NAND as shown next page) rather than
an AND so that it can disable a drive transistor for any input.
With the defective pullup, we cannot set the controls to disable the transistors.
This will make all the transistors in parallel, reducing the resistance by a factor
of 1 + 1/2 + 1/4 = 7/4. In the 150 mV case, the 33 Ω resistance becomes about 19 Ω.
The 100 Ω resistance actually gets better (57 Ω), which was the original intent. It
is possible to match the pulldown properly, with the controls, but it won't be the
worst-case.
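Reflection coefficients:
R = 19 Ω: $\left|\frac{R - Z_0}{R + Z_0}\right| = \frac{31}{69} \approx 0.45$
R = 57 Ω: $\left|\frac{R - Z_0}{R + Z_0}\right| = \frac{7}{107} \approx 0.065$
The forward pulse is also affected:
R = 19 Ω: Forward $= 0.6\cdot\frac{50}{69} \approx 0.43$
R = 57 Ω: Forward $= 0.6\cdot\frac{50}{107} \approx 0.28$
This gives reflections 0.45 × 0.43 = 0.19 to 0.065 × 0.28 = 0.018.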
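A short sketch of the all-devices-in-parallel case, using unrounded resistances (100/3 Ω and 100 Ω divided by 7/4), so the last digits differ slightly from the rounded hand calculation. Names are illustrative and Z0 = 50 Ω is assumed as in the worked numbers.

```python
Z0 = 50.0    # ohm, line impedance (assumed from the worked numbers)
VDD = 0.6    # V
PARALLEL_STRENGTH = 1 + 1 / 2 + 1 / 4   # W1, W1/2 and W1/4 all on: 7/4


def reflected_magnitude(r):
    """Return (|gamma|, forward amplitude, their product) for source resistance r."""
    gamma = abs(r - Z0) / (r + Z0)
    forward = VDD * Z0 / (r + Z0)
    return gamma, forward, gamma * forward


low = reflected_magnitude((100 / 3) / PARALLEL_STRENGTH)   # Vth = 150 mV corner
high = reflected_magnitude(100.0 / PARALLEL_STRENGTH)      # Vth = 450 mV corner
print([round(x, 3) for x in low], [round(x, 3) for x in high])
```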
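Corrected:
[Figure: corrected driver schematic. Input In drives three parallel tri-stateable inverters, with pull-up and pull-down devices sized W1, W1/2, and W1/4 and gated by Ctl0...Ctl5, onto the transmission line (Trans. Line) leading to the Receiver.]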
The idea here is that you can control the strength of the drive between 1/4 and
7/4 of the W1 drive in increments of 1/4, allowing better matching of the line.
A k/4 setting scales the resistance by 4/k.
At 33 Ω, we can bring the resistance up to 44 Ω with the 3/4 setting.
At 100 Ω, we get 57 Ω using the 7/4 setting as noted above.
However, we must now consider all the cases between 33 Ω and 100 Ω to understand
the new worst-case. We will see that these are almost the worst-cases.
As we drop from 100 Ω with the 7/4 setting, the effective resistance gets closer to
50 Ω until 87.5 Ω, and it stays above 44 Ω until 77 Ω. At 77 Ω, a 6/4 setting achieves
51 Ω. Staying with 6/4 and reducing below 77 Ω, the resistance stays above 44 Ω
until 66 Ω. At 66 Ω, a 5/4 setting achieves 52 Ω. Staying with the 5/4 setting as we
drop from 66 Ω keeps the resistance above 44 Ω until 55 Ω. Between 55 Ω and 44 Ω,
the 4/4 setting leaves the resistance unchanged and it is within the 44-57 Ω range.
Below 44 Ω with the 4/4 setting, either we expand the range, or we must switch to
the 3/4 setting. At 44 Ω, the 3/4 setting gives us 59 Ω; we can use the 4/4 setting
there, but any lower, we either extend the range below 44 Ω or may get this slightly
larger high resistance value. At 43 Ω, the 3/4 setting gives us 57 Ω, and the
resistance drops from there as we continue toward 33 Ω. So, we can claim a
43-57 Ω range or a 44-59 Ω range.
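Reflection coefficients:
R = 44 Ω: $\left|\frac{R - Z_0}{R + Z_0}\right| = \frac{6}{94} \approx 0.063$
R = 59 Ω: $\left|\frac{R - Z_0}{R + Z_0}\right| = \frac{9}{109} \approx 0.083$
R = 43 Ω: $\left|\frac{R - Z_0}{R + Z_0}\right| = \frac{7}{93} \approx 0.075$
R = 57 Ω: $\left|\frac{R - Z_0}{R + Z_0}\right| = \frac{7}{107} \approx 0.065$
The forward pulse is also affected:
R = 44 Ω: Forward $= 0.6\cdot\frac{50}{94} \approx 0.32$
R = 59 Ω: Forward $= 0.6\cdot\frac{50}{109} \approx 0.28$
R = 43 Ω: Forward $= 0.6\cdot\frac{50}{93} \approx 0.32$
R = 57 Ω: Forward $= 0.6\cdot\frac{50}{107} \approx 0.28$
Using 43/57, this gives reflections 0.075 × 0.32 = 0.024 to 0.065 × 0.28 = 0.018.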
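The case-by-case walk through the settings can be automated. The sketch below sweeps the full-W1 resistance from 33.3 Ω to 100 Ω in 0.1 Ω steps and, for each value, picks the k/4 setting with the smallest reflection magnitude. This selection rule is an assumption made for the sketch (the text above reasons about keeping the resistance inside a target range instead), and all names are illustrative.

```python
Z0 = 50.0               # ohm, line impedance (assumed from the worked numbers)
SETTINGS = range(1, 8)  # k = 1..7 selects a drive strength of k/4 of W1


def best_setting(r_full):
    """Return (k, effective R, |gamma|) for the k/4 setting with the smallest |gamma|."""
    def gamma(k):
        r = r_full * 4 / k
        return abs(r - Z0) / (r + Z0)

    k = min(SETTINGS, key=gamma)
    return k, r_full * 4 / k, gamma(k)


results = [best_setting(i / 10) for i in range(333, 1001)]  # 33.3 .. 100.0 ohm
r_low = min(r for _, r, _ in results)
r_high = max(r for _, r, _ in results)
worst_gamma = max(g for _, _, g in results)
print(f"effective R from {r_low:.1f} to {r_high:.1f} ohm, "
      f"worst |gamma| = {worst_gamma:.3f}")
```

Under this rule the sweep lands close to the 43-57 Ω range claimed above, with the worst case near the hand-off between the 4/4 and 3/4 settings.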
prefix      scale
G  Giga     $10^{9}$
M  Mega     $10^{6}$
K  Kilo     $10^{3}$
c  centi    $10^{-2}$
m  milli    $10^{-3}$
μ  micro    $10^{-6}$
n  nano     $10^{-9}$
p  pico     $10^{-12}$
f  femto    $10^{-15}$