Ruixing Yang
15.01.2009
Outline
This presentation is based on chapter 14 of the reference book (M. Keating, et al., Low Power
Methodology Manual for System-on-Chip Design, Springer, 2007). All the
contents and figures used here are taken from that chapter.
Power Gating challenges
However:
I) Overhead
Silicon area taken by the sleep transistors.
Routing resources for permanent and virtual power networks.
Complex power-gating design and implementation processes.
II) Power integrity issues.
IR drop on the sleep transistors
Ground bounce caused by in-rush wake up current.
III) Wakeup latency.
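The IR drop item above can be illustrated with a first-order estimate. This is a minimal sketch, assuming each on-state sleep transistor behaves as a fixed resistor and that N identical switches share the block current equally. The function names and all numbers are hypothetical and are not taken from the book.

```python
def min_switch_count(block_current_ma, r_on_ohm, max_drop_mv):
    """Smallest number of parallel switches that keeps the IR drop in budget.

    Assumed model: N identical switches in parallel give R_eq = r_on_ohm / N,
    so drop = I * r_on_ohm / N. Integer inputs (mA, ohm, mV) keep the
    ceiling division exact.
    """
    if block_current_ma <= 0 or r_on_ohm <= 0 or max_drop_mv <= 0:
        raise ValueError("all inputs must be positive")
    single_switch_drop_mv = block_current_ma * r_on_ohm  # mA * ohm = mV
    return -(-single_switch_drop_mv // max_drop_mv)      # ceiling division


def ir_drop_mv(block_current_ma, r_on_ohm, n_switches):
    """IR drop across n_switches identical switches in parallel, in mV."""
    return block_current_ma * r_on_ohm / n_switches


# Hypothetical block: 500 mA active current, 200 ohm per switch, 50 mV budget.
n = min_switch_count(500, 200, 50)
print(n, "switches ->", ir_drop_mv(500, 200, n), "mV drop")
```

Tightening the drop budget raises the switch count in proportion, which is the area overhead listed under I).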
Ring vs. Grid Style
Coarse grain power gating can be implemented in either a ring or a grid style power network.
Ring based switching – place the switches externally to the power gated block effectively
encapsulating the block with a ring of switches.
Grid based switching – the sleep transistors are distributed throughout the power gated region.
1. For the design which implements retention cells, select grid style.
2. If no retention cells, check the area budget and the need for permanent
power supply in the power-down areas for always-on buffers.
3. For the design which has power-gated hard macros, or blocks without
retention logic, select hybrid style.
4. For grid-style, use wide straps in permanent power network to reduce
IR drop.
Header vs. Footer Switch
[Figures: 90nm high VT pMOS switch efficiency at normal body bias; 90nm high VT nMOS switch efficiency at normal body bias]
Header vs. Footer Switch – cont.
2. Area Efficiency Consideration and L/W Choice
The area efficiency depends on the size (L*W) and layout implementation of the sleep
transistors.
Optimal L is determined by the switch efficiency and can be obtained from the switch
efficiency curve.
The switch efficiency decreases with the increase of W in pMOS transistors, therefore a
small W is preferred.
The figure shows:
Ion increases linearly with W.
Ion/W becomes constant at a given L and Vbb -> the area efficiency is determined by
the layout implementation of the sleep transistors.
Header vs. Footer Switch – cont.
3. Body Bias Considerations
Applying reverse body bias on the sleep transistor can increase the switch
efficiency and reduce leakage significantly.
Cost for the reverse body bias in the header switch is significantly smaller than
in the footer switch.
Reason:
The N-well of the pMOS transistor is readily available for bias tapping in the
standard CMOS process. It can be tapped to its own body bias supply as long
as the N-well of the sleep transistor has enough space from the surrounding
standard cells’ N-wells.
The nMOS transistor does not have a well in the standard CMOS process. It is
necessary to create wells for nMOS sleep transistors to allow separate body
bias. -> higher chip fabrication cost and design complexity & more process
variations.
Conclusion: pMOS header is preferable in reverse body bias application.
Header vs. Footer Switch – cont.
4. System Level Design Consideration
In SoC designs, blocks usually communicate using active-high interface
protocols that reference common ground (VSS) as logic “0”. In a header switch
implementation, all signal nets in power-gated blocks settle at VSS, which is
convenient from a system design perspective.
A header switch avoids potential signal integrity issues and allows
a simple design of a pull-down transistor to isolate power-gated blocks and
clamp output signals at logic “0”.
5. Recommendations – Header vs. Footer
Area efficiency is main concern: nMOS, which produces higher switch efficiency
and smaller transistor size. W should be chosen as large as possible for a given
cell height.
System level design and IP integration: header.
Header is more commonly used than footer in power-gating design currently.
Choice of sleep transistor can be limited by the availability of the low-leakage
transistor in a given technology.
Minimum standby leakage is main concern: W should be chosen based on high
switch efficiency and hence low leakage.
W is obtained based on the investigation of area and leakage trade-off.
Rail vs. Strap VDD Supply
Sleep transistors get power supply from the permanent power network (VDD) and deliver it
to the virtual power network (VVDD). There are two ways to distribute VDD to the sleep
transistors: rail and strap VDD supply.
1. Parallel Rail VDD Distribution
A VDD rail is added to a cell row in parallel with VVDD rail. The sleep transistor gets its
permanent power supply by connecting to VDD rails.
Advantages:
Permanent power supply rail is reachable throughout the design.
No restriction on the placement of cells which require connections to permanent power
supply.
Disadvantages:
The implementation takes at least one trace of routing resources in every row in the VDD rail
layer.
Incurs layer conflict with conventional standard library cells which use the metal 1 layer for
cell internal routing.
Rail vs. Strap VDD Supply
2. Power Strap VDD Distribution
Advantages:
Allows the use of a normal standard cell library in a power-gating design.
Disadvantages:
Permanent power network no longer covers the design area.
- Place the cells which need permanent power supply (PPS) under the PPS network
(placement constraint)
- Power-routing the cells which need PPS (complicates the power-routing nets)
Rail vs. Strap VDD Supply
If no standard cell library that provides an extra VDD rail is available, select power strap VDD.
If impact on routing resources is the main concern, select power strap VDD.
If there are a significant number of retention registers in a design and power integrity in
power-routing is the main concern, select parallel distribution.
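The three recommendations above can be written as a small decision helper. This is only a sketch: the function name, the argument names, the order in which the rules are checked, and the "undecided" fallback are assumptions, since the slide does not say how to resolve conflicting concerns.

```python
def suggest_vdd_distribution(has_extra_vdd_rail_library,
                             routing_resources_critical,
                             many_retention_registers,
                             power_routing_integrity_critical):
    """Return "strap", "parallel rail" or "undecided".

    Rules are checked in the order they appear in the recommendations;
    that precedence is an assumption of this sketch.
    """
    if not has_extra_vdd_rail_library:
        return "strap"            # no library with an extra VDD rail
    if routing_resources_critical:
        return "strap"            # a rail costs a routing trace in every row
    if many_retention_registers and power_routing_integrity_critical:
        return "parallel rail"    # permanent supply reachable everywhere
    return "undecided"            # the recommendations give no default


# Hypothetical design: rail library available, many retention registers.
print(suggest_vdd_distribution(True, False, True, True))
```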
A Sleep Transistor Example
Double row 90nm header switch cell.
60 small pMOS transistors of 0.55um width.
6-row transistor array.
Normal body bias.
VSS is in the middle of the two rows
A pair of inverters that drive the sleep transistors is implemented in the cell for
area efficiency.
Wakeup Current and Latency Control Methods
In power gating design, thousands of sleep transistors waking up simultaneously -> a very
large current in charging the design to a full power-on state -> IR drop -> functional error /
short-term VDD collapse -> state in retention registers and memories corrupted.
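A rough feel for the size of this current comes from the charge that must be delivered to the virtual rail. This is a minimal sketch, assuming the power-gated block is lumped into a single capacitance charged from 0 V to VDD. The capacitance, voltage and times are hypothetical values, not figures from the book.

```python
def average_inrush_current(c_rail_farad, vdd_volt, charge_time_s):
    """Average current needed to charge the virtual rail from 0 V to VDD.

    The charge Q = C * V has to be delivered within charge_time_s, so
    I_avg = C * V / t. The peak current is at least as large as this average.
    """
    if charge_time_s <= 0:
        raise ValueError("charge_time_s must be positive")
    return c_rail_farad * vdd_volt / charge_time_s


# Hypothetical block: 10 nF of rail capacitance at VDD = 1.0 V.
for t in (10e-9, 100e-9, 1e-6):
    amps = average_inrush_current(10e-9, 1.0, t)
    print(f"wake in {t * 1e9:.0f} ns -> about {amps:.3f} A on average")
```

Under these assumptions, shortening the wakeup time by a factor of ten raises the average current by the same factor, which is why wakeup latency and in-rush current have to be traded against each other.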
Possible solution: control in-rush current by splitting the chip power supply into many rows
and turning the power on row by row. Disadvantage: crowbar currents -> IR drop. Not
practical in industrial power gating design.
1. Single Daisy Chain Sleep Transistor Distribution
Turn on the sleep transistors gradually by configuring the sleep transistors in a daisy chain
style.
Advantages: simple design. Disadvantages: the short delay of the buffers in the chain
usually turns on the sleep transistors too quickly -> larger than acceptable in-rush current
during wakeup.
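The effect of the buffer delay on the in-rush current can be explored with a small numerical model. This is a sketch under stated assumptions: switch k closes at time k * t_buf, each closed switch is a fixed resistor from VDD to the virtual rail, and the rail is a single capacitor. The model, the function name and every parameter value are hypothetical.

```python
import math


def simulate_daisy_chain(n_switches, t_buf, r_on, c_rail, vdd=1.0):
    """Return (peak_current, final_voltage) for a single daisy chain wakeup.

    Assumed model: switch k closes at k * t_buf and then acts as a resistor
    r_on between VDD and the virtual rail, which is one lumped capacitor.
    """
    tau_all_on = r_on * c_rail / n_switches         # RC constant with all switches on
    dt = min(t_buf, tau_all_on) / 20.0              # step that resolves both time scales
    t_end = n_switches * t_buf + 10.0 * tau_all_on  # chain delay plus settling time
    v = 0.0
    peak = 0.0
    for i in range(int(t_end / dt) + 1):
        t = i * dt
        n_on = min(n_switches, int(t / t_buf) + 1)  # switches closed so far
        g = n_on / r_on                             # total conductance to VDD
        peak = max(peak, (vdd - v) * g)             # current flowing into the rail
        # Exact RC update over one step with the conductance held constant.
        v = vdd - (vdd - v) * math.exp(-dt * g / c_rail)
    return peak, v


# Hypothetical chain: 100 switches of 100 ohm each, 10 nF rail capacitance.
for t_buf in (0.1e-9, 10e-9):
    peak, v_final = simulate_daisy_chain(100, t_buf, 100.0, 10e-9)
    print(f"buffer delay {t_buf * 1e9:.1f} ns: peak {peak:.3f} A, final {v_final:.3f} V")
```

In this model the short buffer delay closes most of the switches while the rail is still far below VDD, so the peak current is several times higher than with the slow chain.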
2. Dual Daisy Chain Sleep Transistor Distribution
Use weak transistors to trickle charge the design to prevent large in-rush current.
When the design is trickle charged close to VDD, large transistors of the optimal drive
strength are turned on.
Wakeup Current and Latency Control Methods – cont.
The transistors are split into two chains: a weak transistor chain and main transistor chain.
Size of the weak trickle transistors is defined by the user-defined in-rush current limit and
maximum permissible turn-on delay time.
Size of the sleep transistors in the main chain is optimized by the methods described for the
performance and leakage goals.
Trickle sleep transistors are to control wakeup rush current and reduce wakeup latency.
The main chain transistor design is based on meeting IR drop target and reducing sleep
transistor area.
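The two constraints on the weak chain (in-rush current limit and maximum turn-on delay) bound its total on-resistance from both sides. This is a minimal sketch, assuming the weak switches together act as one fixed resistor charging a lumped rail capacitance from 0 V. That is a conservative simplification, since a real weak chain turns on gradually. The function name, the 95% charge target and the numbers are hypothetical.

```python
import math


def trickle_resistance_window(vdd, c_rail, inrush_limit, max_trickle_time,
                              target_fraction=0.95):
    """Return (r_min, r_max) for the total on-resistance of the weak chain.

    Assumed model: one resistor R charging the rail capacitance from 0 V.
    The peak current is vdd / R (at the start of charging), and the time to
    reach target_fraction * vdd is R * C * ln(1 / (1 - target_fraction)).
    """
    r_min = vdd / inrush_limit  # weaker than this keeps the peak under the limit
    r_max = max_trickle_time / (
        c_rail * math.log(1.0 / (1.0 - target_fraction))
    )                           # stronger than this meets the delay budget
    return r_min, r_max


# Hypothetical block: VDD 1.0 V, 10 nF rail, 100 mA limit, 1 us trickle budget.
r_min, r_max = trickle_resistance_window(1.0, 10e-9, 0.1, 1e-6)
print(f"R between {r_min:.1f} and {r_max:.1f} ohm, feasible: {r_min <= r_max}")
```

If r_min exceeds r_max, no single trickle strength satisfies both limits in this model, and either the current limit or the delay budget has to be relaxed.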
Wakeup Current and Latency Control Methods – cont.
3. Parallel Short Chain Distribution of the Main Sleep Transistor
Wakeup Latency = trickle charge time + turn on time of main chain
Reduce the main chain turn-on time to reduce wakeup latency.
Single daisy chain -> longest time to charge up & small peak charge current.
Parallel array -> smallest delay & largest peak current
Compromise: parallel short chains. The sleep transistors are connected as a number of short
daisy chains arranged in parallel. The short daisy chains are turned on simultaneously
when the main chain is turned on. -> The delay is shortened and the peak current is controlled.
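The latency side of this compromise follows directly from the formula wakeup latency = trickle charge time + turn-on time of the main chain. This is a sketch, assuming the main switches are split evenly over the chains and each buffer stage adds a fixed delay. The function names and the numbers are hypothetical, and times are integers in picoseconds to keep the arithmetic exact.

```python
def main_chain_turn_on_ps(n_switches, n_chains, t_buf_ps):
    """Time until the last main switch turns on, in ps.

    Assumes n_switches are split evenly over n_chains daisy chains that all
    start together, with one buffer delay t_buf_ps per switch in a chain.
    """
    if not 1 <= n_chains <= n_switches:
        raise ValueError("n_chains must be between 1 and n_switches")
    longest_chain = -(-n_switches // n_chains)  # ceiling division
    return longest_chain * t_buf_ps


def wakeup_latency_ps(trickle_ps, n_switches, n_chains, t_buf_ps):
    """Wakeup latency = trickle charge time + turn-on time of the main chain."""
    return trickle_ps + main_chain_turn_on_ps(n_switches, n_chains, t_buf_ps)


# Hypothetical design: 1000 main switches, 100 ps per buffer, 500 ns trickle.
for chains in (1, 10, 1000):  # single chain, short parallel chains, full array
    print(chains, "chain(s):", wakeup_latency_ps(500_000, 1000, chains, 100), "ps")
```

In this model n_chains switches close during each buffer delay, so a larger number of chains shortens the delay but steepens the current ramp, which matches the ordering of the three options above.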