
International Standards for Data Centre Electrical Design


Developed from The Uptime Institute & TIA-942 concepts

Benchmarking Data-Centre Quality




There has long been a need to measure the quality of a critical facility

The quality is usually expressed as Availability of the IT functionality of the facility in terms of number-of-nines

e.g. Three Nines = 99.9% Availability

Note that several engineered and human systems have to contribute to the whole facility and its IT functionality, including the IT hardware and software itself

At this top level, nines are usually applied over 5-10 years

e.g. 99.99% over 5 years = one failure event lasting ~4 hours

It should never be assumed to cover multiple failure events

It should never be assumed to span only one year
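A minimal Python sketch of the arithmetic behind these rules of thumb, converting an availability figure into the total downtime it permits over a chosen period. It assumes a 365-day (8,760-hour) year; the function name `downtime_hours` is illustrative only.

```python
HOURS_PER_YEAR = 8760  # 365-day year (assumption)

def downtime_hours(availability_pct: float, years: float = 1.0) -> float:
    """Total downtime (hours) permitted by an availability figure over `years`."""
    unavailability = 1.0 - availability_pct / 100.0
    return unavailability * HOURS_PER_YEAR * years

# 99.99% over 5 years: about 4.4 hours in total, i.e. one event of ~4 hours
print(f"99.99% over 5 years: {downtime_hours(99.99, years=5):.2f} h")

# 99.9% over one month (1/12 of a year): about 44 minutes
print(f"99.9% per month: {downtime_hours(99.9, years=1 / 12) * 60:.0f} min")
```

The nines figure alone cannot distinguish one four-hour outage from many short ones, which is why it is tied to a single failure event here.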

How good is 99.9%?




44 minutes of unsafe drinking water per month

3 crash-landings per week at Heathrow

3,000 letters lost by The Post Office, every hour

2,000 surgical mistakes in the NHS, every week

9,000 incorrect banking debits per hour

32,000 missed heartbeats, per person, per year



Not all in one go, please.

UK numbers

Availability Nines: A measure of quality?

MTBF       MDT         Availability
10 years   1 hour      99.99885%
1 month    30 seconds  99.99885%
1 day      1 second    99.99884%

Four-Nines = OK? But do you really want a failure every day?

In reality it's worse. Assuming the system recovery time is 6 hours:

MTBF       MDT         Availability
10 years   6h + 1h     99.992%
1 month    6h + 30s    99.17%
1 day      6h + 1s     74.99%
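A minimal sketch reproducing both tables, assuming Availability = 1 - MDT/T, where T is the interval between failure events. It takes a month as 8,760/12 = 730 hours (an assumption); the slide's figures appear to be truncated rather than rounded, so the last digit may differ.

```python
HOURS_PER_YEAR = 8760
MONTH_H = HOURS_PER_YEAR / 12  # 730 h, an assumed average month
RECOVERY_H = 6.0               # system recovery time after each event

def availability_pct(interval_h: float, mdt_h: float) -> float:
    """Availability (%) when one failure of duration mdt_h occurs every interval_h hours."""
    return 100.0 * (1.0 - mdt_h / interval_h)

cases = [
    ("10 years", 10 * HOURS_PER_YEAR, 1.0),  # 1 hour outage
    ("1 month", MONTH_H, 30 / 3600),         # 30 second outage
    ("1 day", 24.0, 1 / 3600),               # 1 second outage
]
for name, interval_h, mdt_h in cases:
    bare = availability_pct(interval_h, mdt_h)
    with_recovery = availability_pct(interval_h, mdt_h + RECOVERY_H)
    print(f"{name:>8}: {bare:.5f}%  with 6 h recovery: {with_recovery:.3f}%")
```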

20ms power events in 12 months?


How many computer crashes will you accept?

Availability   Nines   MDT        20ms failures
99.0%          2       87.6 hrs   15,768,000
99.9%          3       8.76 hrs   1,576,800
99.99%         4       53 min     157,680
99.999%        5       5.3 min    15,768
99.9999%       6       31.5 sec   1,577
99.99999%      7       3.15 sec   158
99.999999%     8       315 ms     15
99.9999999%    9       31.5 ms    2
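The 20ms column can be reproduced by dividing one year's permitted downtime by the length of a single 20 ms interruption. A minimal sketch, assuming a 365-day year; small rounding differences from the table above are possible.

```python
SECONDS_PER_YEAR = 8760 * 3600  # 365-day year
EVENT_S = 0.020                 # one 20 ms power interruption

def failures_20ms_per_year(nines: int) -> float:
    """Number of 20 ms events that fit inside one year's permitted downtime."""
    mdt_s = 10.0 ** -nines * SECONDS_PER_YEAR
    return mdt_s / EVENT_S

for nines in range(2, 10):
    mdt_s = 10.0 ** -nines * SECONDS_PER_YEAR
    print(f"{nines} nines: MDT {mdt_s:>12,.3f} s, "
          f"{failures_20ms_per_year(nines):>14,.1f} x 20 ms events")
```

Since each 20 ms event can crash an unprotected computer, even seven nines over a single year still permits over a hundred crashes.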

The Nines cannot be applied to power over a single year!


Better to use MTBF/MDT for one failure event

Site/IT functionality and Availability




Your mission critical hardware can only deliver its maximum potential if the whole facility works:

Connectivity

Power

Cooling

Fire detection, alarm and suppression

EPO

Maintenance and emergency intervention

Security, internal and external, physical and software attack

Human Error, Systems Training & Facility Management

External disasters: earthquake, hurricane, flood, fire ... air-crash

The Uptime Institute




The Uptime Institute [1] has, for more than 10 years, sponsored
research and practical studies into data centre design, operation and
resultant resilience and developed a Tier Classification to describe and
differentiate facilities from an availability standpoint

A White Paper [2] from the Institute (authors of which include the
originator of dual power supplies in IT equipment and the Tier system
itself) is the basis of this review of the facility and operational concepts

The Uptime Institute is a commercial organisation and the guidelines it created are not in the form of a technical standard. However, many of the principles and details have been incorporated in TIA-942 (see next slide)

www.uptimeinstitute.org

[1] The Uptime Institute, Building 100, 2904 Rodeo Park Drive East, Santa Fe, NM 87505, USA
[2] Turner, Seader & Brill, "Industry Standard Tier Classifications Define Site Infrastructure Performance", The Uptime Institute, Inc., 2001-2005

American ANSI/TIA Standard




In the absence of any more definitive standards: ANSI/TIA-942-2005

Telecommunications Infrastructure Standard for Data Centers

Telecommunications Industry Association

Standards and Technology Dept, 2500 Wilson Boulevard, Arlington, VA 22201, USA

www.tiaonline.org/standards/search_n_order.cfm

Follows the same Tier I-IV format and draws heavily on The Uptime
Institute publications but extends the detail, especially in connectivity

Entirely a USA-centric ANSI specification, so it can only be used as a guide in other territories (EN/BS etc)

Specifically for telecom-related data-centre environments <2700W/m2

Tier Classification Tier I to IV




The classification system takes into account that at least 16 major M&E systems contribute to the overall IT availability (such as fire alarms, EPO etc)

Tier IV represents 99.99% site availability (measured over five years) with the critical systems loaded to a maximum of 90%

Each and every sub-system has to meet this table:

Site Availability Vs System Availability




16 major sub-systems contribute to the TUI (The Uptime Institute) Tier Classification

To reach a Tier Classification requires all 16 to achieve ...

Interesting to note: 5xNines UPS = Tier III
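The sub-system table itself is missing from this copy, so the following is only a sketch under a stated assumption: the 16 sub-systems are treated as independent and in series (all must work for the site to work). It is not necessarily TUI's own method, but it shows why each sub-system needs better availability than the site target, and why a five-nines UPS alone falls short of a 99.99% site.

```python
N_SUBSYSTEMS = 16  # major sub-systems counted by the Tier Classification

def site_availability(subsystem_availability: float, n: int = N_SUBSYSTEMS) -> float:
    """Site availability if n independent sub-systems must all work (series model)."""
    return subsystem_availability ** n

def required_subsystem_availability(site_target: float, n: int = N_SUBSYSTEMS) -> float:
    """Per-sub-system availability needed for the site to reach site_target."""
    return site_target ** (1.0 / n)

# Each sub-system needed for a 99.99% (Tier IV) site under this model
print(f"{required_subsystem_availability(0.9999):.5%}")
# Site result if every sub-system only reaches five nines
print(f"{site_availability(0.99999):.4%}")
```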

Tier IV: the ultimate in resilience?




Fault Tolerant: A site that can sustain at least one unplanned worst-case infrastructure failure with no critical load impact

Concurrently Maintainable: A site that is able to perform planned maintenance activity without shutting down the critical load

Note that it is acceptable that the fault tolerance level will be reduced during maintenance or after the first fault

Tier IV Classification only applies to dual power supply loads where complete functionality is obtained with either power supply fed and where the two inputs, in normal operation, share the power demand, as defined by The Uptime Institute's own specification [1]

A technical and philosophical argument rages about Static Transfer Switches (STSs) for single-cord loads in Tier IV designs

Is that Tier III.5 or IV.5?

[1] "Fault Tolerant Power Compliance Specifications", v2.0, The Uptime Institute, see www.uptimeinstitute.org

Electrical Single Line Diagrams




The designer is under no compulsion to follow strictly the designs derived from the Tier Classifications. In many cases compromises will have to be made

The benchmarking function of the Tier system then provides a useful yardstick to measure a system against

In the rest of this presentation we only refer to the Electrical systems, just one of the 16+ engineered systems that are required to achieve a Classification rating

A particular facility's Tier rating is the lowest of all its system Tier Classifications

Tier IV power + Tier III all other + Tier II cooling = Tier II facility
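The "weakest system sets the rating" rule is simply a minimum over the system ratings. A minimal sketch; the system names and ratings in the dictionary are hypothetical examples.

```python
from enum import IntEnum

class Tier(IntEnum):
    I = 1
    II = 2
    III = 3
    IV = 4

def facility_tier(system_tiers: dict[str, Tier]) -> Tier:
    """A facility's Tier rating is the lowest Tier achieved by any of its systems."""
    return min(system_tiers.values())

# Hypothetical facility: Tier IV power cannot lift a Tier II cooling system
systems = {"power": Tier.IV, "cooling": Tier.II, "fire": Tier.III, "security": Tier.III}
print(facility_tier(systems).name)  # II
```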

Tier I: the majority of critical power systems

A basic single-bus critical power system suitable for single-corded IT loads
There is no specific redundancy called for, although it can be argued that the standby generator set is redundant for the grid supply
Although only N is specified, the designer should avoid multiple components in power-parallel configuration as it drastically reduces the potential Availability, i.e. N=1 is best
Maintenance generally involves supplying the load with non-UPS power and an annual load shut-down
Availability of power at load typically 99.95%*
*Over 5 years operation
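Why N=1 is best when there is no redundancy: if N paralleled modules must all work to carry the load, their failure rates add, so the group MTBF falls as 1/N. A minimal sketch assuming constant, independent failure rates; the 35,000h module figure is borrowed from a later slide purely for illustration.

```python
def non_redundant_mtbf(module_mtbf_h: float, n: int) -> float:
    """MTBF of n paralleled modules that must ALL work to carry the load.

    With constant, independent failure rates the rates add (n / MTBF),
    so the group MTBF is the module MTBF divided by n.
    """
    return module_mtbf_h / n

MODULE_MTBF_H = 35_000.0  # illustrative module MTBF
for n in range(1, 5):
    print(f"N={n}: group MTBF {non_redundant_mtbf(MODULE_MTBF_H, n):,.0f} h")
```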

Tier II: increasing levels of redundancy

A single-bus power system suitable for single-corded loads
Redundancy is called for in the standby generator installation to reduce the chance of failure-to-start, but not in the mains supply
N+1 is specified for the UPS so a high degree of maintenance can be concurrent
Load bank connections are mandatory
Availability at load typically 99.98%*
*Over 5 years operation

Tier II with dual-cord loads

A single-bus power system suitable for both single- and dual-corded loads
Redundancy is called for in the standby generator installation to reduce the chance of failure-to-start, but not in the mains supply
N+1 is specified for the UPS so a high degree of maintenance can be concurrent
Load bank connections are mandatory
Dual-corded loads (expected minority) should be fed by separate A+B PDUs, whilst only the single-corded loads should be fed via STSs (performing a maintenance function rather than Availability enhancement)
Note the option of a B UPS, practical when dual-cord loads are few
Availability at load typically 99.98%*
*Over 5 years operation

Tier III: more redundancy + segregation

A dual-bus power system suitable for both single- and dual-corded loads
Redundancy is called for in the mains supply and the standby generator sets. These must be compartmentalised for lower common-mode failure, fire etc
N+1 is specified for the UPS so a high degree of maintenance can be concurrent
Dual-corded loads should be fed by separate A+B PDUs, whilst only the single-corded loads should be fed via STSs (performing a maintenance function rather than Availability enhancement)
Note the ability of a rapid upgrade to a B UPS and Tier IV (but don't forget the other systems)
An important extra here is the Load Bus Synchronisation. When the STSs can have UPS power on one input and the generator supply on the other, it is essential (for the load) to have the two supplies within 30°
Availability of power at the load typically 99.99%

Segregation

Tier IV: the Uptime purists' configuration

Load isolation breaker and N+?




Being able to run the load via the bypass and test the UPS system as a parallel group is a very attractive and useful operational/maintenance feature

The load isolation breaker enables that function

Generally that means that between the PDU and the output bus of the UPS system there are at least two MCCBs or ACBs in series

Typical MTBF published at 250,000h (28.5y) with maintenance

Two in series = 125,000h MTBF

This negates the advantage of applying any reliability enhancement strategy using N+(more than 1)

Distribution limits the UPS Availability

(Diagram: utility/generator feed, input switchboard, maintenance bypass, output switchboard)

Typically 250,000h MTBF each
Two in series = 125,000h MTBF
N+2 (or higher) UPS does not improve things

Bus-voltage Availability depends upon these two switches
Single-bus maximum MTBF = 125,000h (14 years); with 8h MDT, A = 99.99%
Dual-bus maximum MTBF = 110,000 years, A = 8xNines
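A sketch of where these figures can come from. Series elements add their failure rates. For the dual bus it assumes the textbook approximation for two identical, independent, repairable units in parallel (MTBF of the pair ≈ MTBF² / (2·MDT), valid when MDT is much shorter than MTBF). That model is an assumption, not necessarily the author's calculation, but it lands close to the 110,000 years and eight nines quoted above.

```python
HOURS_PER_YEAR = 8760
MDT_H = 8.0  # mean down time per event, as on the slide

def series_mtbf(*mtbfs_h: float) -> float:
    """Series elements: failure rates add, so MTBF = 1 / sum(1 / MTBF_i)."""
    return 1.0 / sum(1.0 / m for m in mtbfs_h)

def availability(mtbf_h: float, mdt_h: float) -> float:
    return mtbf_h / (mtbf_h + mdt_h)

def dual_bus_mtbf(bus_mtbf_h: float, mdt_h: float) -> float:
    """Two identical repairable buses in parallel; the load fails only if both are down."""
    return bus_mtbf_h ** 2 / (2.0 * mdt_h)

single = series_mtbf(250_000, 250_000)  # two breakers in series
dual = dual_bus_mtbf(single, MDT_H)
u_dual = (MDT_H / single) ** 2          # both buses down at once (approximation)

print(f"single bus: {single:,.0f} h ({single / HOURS_PER_YEAR:.1f} y), "
      f"A = {availability(single, MDT_H):.4%}")
print(f"dual bus:   {dual / HOURS_PER_YEAR:,.0f} y, A = {1 - u_dual:.9%}")
```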

N+1 redundant UPS architecture: N?

1+1: 100% redundancy
600kVA load
2x 600kVA modules
R = 10*
Day One only
Highest UPS CapEx
High risk of partial load
High load step
1200kVA of batteries

2+1: 50% redundancy
3x 300kVA modules
R = 7
Day One to Two
Scope for load shrink
Medium risk of partial load
Medium load step
900kVA of batteries
25% space saving
Lower battery CapEx

3+1: 33% redundancy
4x 200kVA modules
R = 5
Day One to Three
High scope for load shrink
Low risk of partial load
Low load step
800kVA of batteries
33% space saving
etc

*Based on Reliability (R) of a single module = 1
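The module and battery figures follow from splitting the load across N modules and adding one spare. A minimal sketch, assuming the battery is sized to the installed module kVA (which is how the 1200/900/800kVA figures appear to be derived); the `NPlus1Sizing` type is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass
class NPlus1Sizing:
    n: int
    module_kva: float
    installed_kva: float   # also taken as the battery kVA (assumption)
    redundancy_pct: float  # one spare module relative to the N needed

def size_n_plus_1(load_kva: float, n: int) -> NPlus1Sizing:
    module_kva = load_kva / n
    return NPlus1Sizing(n, module_kva, module_kva * (n + 1), 100.0 / n)

LOAD_KVA = 600.0
baseline = size_n_plus_1(LOAD_KVA, 1).installed_kva  # 1+1 reference
for n in (1, 2, 3, 5):
    s = size_n_plus_1(LOAD_KVA, n)
    saving = 100.0 * (1.0 - s.installed_kva / baseline)
    print(f"{n}+1: {n + 1}x {s.module_kva:.0f}kVA, redundancy {s.redundancy_pct:.0f}%, "
          f"batteries {s.installed_kva:.0f}kVA, saving vs 1+1 {saving:.0f}%")
```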

Limitation of N in N+1 systems

As N grows the potential MTBF of the system decreases (see graph)

A 5+1 limit is sensible

Potential MTBF x 0.333r

Doesn't fall too far during N operation

With module of 35,000h and mains of 100h MTBF, A = 7xNines at bus, which equates to 5xNines at UPS output
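Why MTBF falls as N grows: an N+1 group fails when a second module fails while the first is still under repair. Under a simple model of identical, independent, repairable modules (an assumption; mains, bypass and common-mode failures are ignored), the group MTBF ≈ MTBF² / (N·(N+1)·MTTR). Relative to 1+1 this gives 0.333 for 2+1, which appears to match the "x 0.333" annotation that survives from the graph. The 8h repair time is hypothetical.

```python
def n_plus_1_mtbf(module_mtbf_h: float, module_mttr_h: float, n: int) -> float:
    """Approximate MTBF of an N+1 group of identical, independent, repairable modules.

    First failure occurs at rate (N+1)/MTBF; the group is lost if one of the
    remaining N fails within the repair time, probability ~ N*MTTR/MTBF.
    Valid when MTTR << MTBF.
    """
    return module_mtbf_h ** 2 / (n * (n + 1) * module_mttr_h)

MODULE_MTBF_H = 35_000.0  # module figure quoted on the slide
MODULE_MTTR_H = 8.0       # hypothetical repair time
reference = n_plus_1_mtbf(MODULE_MTBF_H, MODULE_MTTR_H, 1)
for n in range(1, 7):
    ratio = n_plus_1_mtbf(MODULE_MTBF_H, MODULE_MTTR_H, n) / reference
    print(f"{n}+1: {ratio:.3f} x the 1+1 potential MTBF")
```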

Tier Classification is more than just power




Truly achieving a Tier Classification means ticking the box in 16 sub-groups, and one of the most important is timely, skilled and proper maintenance capability on site

The level of cover and skills in site personnel is a major hurdle



Human error remains the largest contributor group to mission failure, most
often when responding to alarms in complex systems

24x7 staffing, factory-trained in every product on site, an effective BMS alarm response plan backed up by a 4-hour site response with parts and a service engineer to ensure a very high first-time-fix rate

For the power system the best (and only cost-effective?) solution is to use 24x7 remote monitoring with trained service personnel

Detect and respond before the site personnel

Diagnose alarm and set in motion the right engineer with the right parts

Any combination of MTBF and MTTR that meets the target = the answer

Tier I & II can wait for a service engineer

Tier III & IV can't
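The trade-off between MTBF and MTTR can be made explicit with A = MTBF / (MTBF + MTTR). A minimal sketch; the 125,000h MTBF is the single-bus figure from the distribution slide, while the 2h hands-on repair time and the two response scenarios are hypothetical.

```python
def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_h / (mtbf_h + mttr_h)

def required_mtbf(target: float, mttr_h: float) -> float:
    """MTBF needed to reach `target` availability for a given MTTR."""
    return target * mttr_h / (1.0 - target)

MTBF_H = 125_000.0  # single-bus figure
REPAIR_H = 2.0      # hypothetical hands-on repair time once the engineer arrives
for label, response_h in [("4 h response", 4.0), ("next-day response", 24.0)]:
    mttr = response_h + REPAIR_H
    print(f"{label:>18}: MTTR {mttr:.0f} h, A = {availability(MTBF_H, mttr):.5%}, "
          f"MTBF needed for 99.999% = {required_mtbf(0.99999, mttr):,.0f} h")
```

Cutting the response time is often a cheaper route to the same availability than raising equipment MTBF, which is the case for remote monitoring made above.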

Tier IV: The Uptime Institute, original version




Complete physical segregation of the two power supplies from the grid to the dual-corded load: a true dual-bus system

2x(N+1) in every system, maximum 90% load

Concurrent maintenance possible without load shut-down and without losing N+1 redundancy

Needs two grid sub-stations (they will be on the same MV-ring or diverse
MV-radials) and diverse cable routes into the site

Two mechanical load power switchboards in dual-bus

Note that many engineers question having N+1 on both A & B buses

ONLY dual-corded loads

No STSs, no common point of failure except the grid and the load

Simple to operate (idiot-proof), fault tolerant, hence reliable

With care in design, installation, operation and maintenance, 99.9999% power Availability is possible

Not all loads are dual-corded, <30%?




Not all loads are dual-corded

Power-transparent switching via STSs is a great maintenance tool

Feeding dual-corded loads via STSs reduces Availability to that of the STS itself and negates the principle of dual-bus segregation

Classic Tier IV but with STSs for single-corded loads

Essential to have Load Bus Synchronisation

Three PDUs in the data-room

A fed from UPS-A for one feed of the dual-cord loads

B fed from UPS-B for the second feed of the dual-cord loads

A/B with STS fed from UPS-A & B for single-cord loads

Tier III.5 or IV.5? That is the question!

Tier IV requires uninterruptible cooling




Even though the TIA-942 specification limits itself to 2700W/m2 and TUI Tier IV refers to 1560W/m2 as the limit across a large space, they call for uninterruptible cooling for Tier IV

The trend for ever-higher IT cabinet loads is well known, and single hot-spots as high as 20-30kW/m2 are no longer rare events, making uninterruptible cooling essential

E.g. when a 13kW loaded IT cabinet loses all cooling supply, the ambient temperature rises from 22°C to 35°C in under 20s (0.65K/s)

Interesting to note the specified rate-of-change-of-temperature limit in TIA-942 = 5K/hour (0.0014K/s)
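A quick check of the two rates, plus the temperature rise during a 10-15 second wait for generator power. This sketch assumes the 0.65K/s rise stays linear, which is a simplification.

```python
T_START_C, T_END_C = 22.0, 35.0   # slide example, 13kW cabinet
RISE_TIME_S = 20.0
TIA_LIMIT_K_PER_S = 5.0 / 3600.0  # TIA-942 limit of 5K/hour

observed_k_per_s = (T_END_C - T_START_C) / RISE_TIME_S
ratio = observed_k_per_s / TIA_LIMIT_K_PER_S
print(f"observed {observed_k_per_s:.2f} K/s vs limit {TIA_LIMIT_K_PER_S:.4f} K/s ({ratio:.0f}x)")

# Temperature rise while waiting for generator power (linear-rise assumption)
for wait_s in (10.0, 15.0):
    print(f"{wait_s:.0f} s without cooling: +{observed_k_per_s * wait_s:.1f} K")
```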

The only solution to high W/m2 = UPS?




Three steps to achieve continuous cooling


Keep the air moving: server fans are often sufficient; obtain generator power after 10-15 seconds and, preferably, have high floor-to-ceiling heights

Keep the fluids moving via UPS driven redundant pumping and,
wherever possible, apply Chilled-Water-Storage

If CWS is not practical then power the compressors and heat rejection plant with UPS, retaining 100% cooling capacity on a continuous basis

The power required for the cooling system is typically 40% of the kW IT load (10% pumps, 30% compressors)

Most engineers would prefer to keep the IT and mechanical loads separate, so separate UPS systems
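A minimal sizing sketch for the mechanical UPS using the 10%/30% split above. The 1,000kW IT load is hypothetical, and the heat-rejection plant is assumed to sit within the 30% compressor share.

```python
PUMPS_SHARE = 0.10        # of IT kW, per the slide
COMPRESSORS_SHARE = 0.30  # of IT kW; heat-rejection plant assumed included

def cooling_ups_kw(it_load_kw: float, with_chilled_water_storage: bool) -> float:
    """kW of mechanical load that needs UPS backing under each strategy."""
    pumps_kw = it_load_kw * PUMPS_SHARE
    if with_chilled_water_storage:
        return pumps_kw  # storage rides through; only the pumps need UPS
    return pumps_kw + it_load_kw * COMPRESSORS_SHARE

IT_LOAD_KW = 1_000.0  # hypothetical IT load
print(f"with CWS:    {cooling_ups_kw(IT_LOAD_KW, True):.0f} kW of mechanical UPS")
print(f"without CWS: {cooling_ups_kw(IT_LOAD_KW, False):.0f} kW of mechanical UPS")
```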

UPS driven cooling alternative solution




The mechanical cooling load is predominantly motors and variable speed drives, not requiring the high-fidelity voltage and frequency control normally provided by UPS

Generic computer-grade series-on-line UPS has energy efficiency of 93% to 94%

Optionally, Eco-Mode can be selected and the UPS system will operate at >98%, ready to switch back into series-on-line mode in <0.5ms

The 4-5% delta (with no degradation in power for the mechanical load) will save ~2% of the data-centre kWh and carbon emissions, at no additional capital expenditure

Eco-Mode = 100% CapEx payback in 2 years
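The saving comes from the difference in UPS losses (input power minus output power) at the two efficiencies. A sketch assuming a hypothetical, constant 400kW mechanical load (40% of a 1MW IT load) running all year, with 93.5% taken as the midpoint of the on-line range.

```python
HOURS_PER_YEAR = 8760

def ups_loss_kw(load_kw: float, efficiency: float) -> float:
    """Input power minus output power for a UPS delivering load_kw."""
    return load_kw / efficiency - load_kw

MECH_LOAD_KW = 400.0                        # hypothetical constant mechanical load
online = ups_loss_kw(MECH_LOAD_KW, 0.935)   # series-on-line mode
eco = ups_loss_kw(MECH_LOAD_KW, 0.98)       # Eco-Mode
saved_kwh = (online - eco) * HOURS_PER_YEAR
print(f"losses: on-line {online:.1f} kW, Eco-Mode {eco:.1f} kW; "
      f"saving {saved_kwh:,.0f} kWh/year")
```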

System Load Vs Bus-A and Bus-B Load




Total load will probably peak at 80% capacity (Tier IV = 90%)

Typically 30% single-cord loads will be present

Worst-case balance of the single-cord load: 1/3 on one system, 2/3 on the other

Typical bus loads of a fully loaded system are then 36% and 44% of rated capacity (for 99.95% per year)
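The 36%/44% figures follow from sharing the dual-corded load equally between the buses and splitting the single-corded (STS-fed) load 1/3 to 2/3. A minimal sketch; `bus_loads` is a hypothetical helper and all loads are fractions of one bus's rated capacity.

```python
def bus_loads(total_load: float, single_cord_share: float, split_a: float) -> tuple[float, float]:
    """Bus A and Bus B loads as fractions of rated capacity."""
    dual = total_load * (1 - single_cord_share)  # shared equally by A and B
    single = total_load * single_cord_share      # fed via STSs, split unevenly
    bus_a = dual / 2 + single * split_a
    bus_b = dual / 2 + single * (1 - split_a)
    return bus_a, bus_b

a, b = bus_loads(0.80, 0.30, 1 / 3)
print(f"Bus A {a:.0%}, Bus B {b:.0%}")  # Bus A 36%, Bus B 44%
```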

N+1 topology: The higher the N, the higher the module load

Partial load efficiency becomes crucial

25-30% load efficiency point is critical in Tier IV


Above example: At 25% load = 8% efficiency delta
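Why higher N raises module load: each of the N+1 modules is rated at 1/N of the bus, so at a given bus load each module runs at bus load × N/(N+1). A minimal sketch using the 36% bus load from the previous slide.

```python
def module_load(bus_load: float, n: int) -> float:
    """Per-module load (fraction of module rating) when an N+1 group carries bus_load."""
    return bus_load * n / (n + 1)

BUS_LOAD = 0.36  # lighter bus from the worst-case A/B balance
for n in (1, 2, 3, 5):
    print(f"{n}+1 at {BUS_LOAD:.0%} bus load: each module at {module_load(BUS_LOAD, n):.0%}")
```

Even at 5+1 the modules only reach about 30%, so the 25-30% efficiency point dominates Tier IV running costs.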

New energy storage developments Vs Tiers?




Flywheels, as a battery substitute, always reduce power efficiency



Autonomy 5-15 seconds of flywheel Vs 10-15 minutes of battery

Smaller footprint although <2% of stored energy

Higher capital cost typically 3-8 times that of an equivalent power battery

UPS system is 100% dependent upon the diesel-engine starting reliability

N+1 generators will need special treatment on paralleling-time

Low speed flywheels (steel rotor, bearing load relief via magnetics)

Standby losses x20 that of battery float power (+10kW higher losses per MVA)

Medium speed (steel rotor, bearing load relief via magnetics)

Routine bearing changes largely offset battery replacement costs

Standby losses x15 that of battery float power (+8kW higher losses per MVA)

High speed (steel or composite rotor, active magnetic bearings)

Standby losses x2 that of battery float power (+1kW higher losses per MVA)

Complex, hence lower potential reliability (not predictability) than a battery

Low power module ratings make high-power data-centre application uneconomic
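The per-MVA loss differences above turn into annual energy once multiplied by the hours in a year. A minimal sketch, assuming the flywheel sits in standby continuously all year.

```python
HOURS_PER_YEAR = 8760

# Extra standby losses per MVA versus battery float power, as quoted above
EXTRA_LOSS_KW_PER_MVA = {"low speed": 10.0, "medium speed": 8.0, "high speed": 1.0}

def extra_kwh_per_year(kind: str, mva: float = 1.0) -> float:
    """Additional annual energy lost by a flywheel of `kind` (continuous standby assumed)."""
    return EXTRA_LOSS_KW_PER_MVA[kind] * mva * HOURS_PER_YEAR

for kind in EXTRA_LOSS_KW_PER_MVA:
    print(f"{kind:>12}: {extra_kwh_per_year(kind):>8,.0f} kWh/year per MVA")
```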

Other contenders in the green debate?




Compressed Air Storage


Takes up 200% more floor area than an equivalent VRLA battery

High CapEx, US$1m/MW, x10 the cost of equivalent VRLA

Higher maintenance costs than VRLA

High standby losses: 35MWh/year higher than battery float power

Hydrogen Fuel Cells

Are they a replacement for the generator rather than the battery?

Typically -48V output, needs an energy-bridge to cover starting time

Green? 50% thermal efficiency, but what is the source of the fuel?

High CapEx, US$2m/MW, x10 the cost of a diesel genset

Low power ratings for data-centre (but well proven at 10kW)

Embryonic technology for UPS systems, either H-gas or Methanol-Water

Secure Power, Always


Ian F Bitterlin
PhD BSc(Hons) DipDesInn MCIBSE MIET

International Sales Director

Contact details
Tel: +44 (0) 7717 467 579
E-mail: ian.bitterlin@chloridepower.com
Web: www.chloridepower.com

Unique to TIA-942: in the detail

Tier IV has to have impedance-based battery monitoring systems

TIA-942 says that when a system (A or B) is shut down for routine maintenance, the maintenance bypass should be energised by a UPS supply

So as not to rely on the dual-corded loads operating with one feed dead?

TIA-942, page 123, right-hand column: UPS Maintenance Bypass Arrangement

A third UPS (C) system? Space hungry, 0.05% utilisation and a poor return on investment

Chloride solution (red-line on diagram)


Cross-feed the output of each UPS system to the maintenance bypass of the alternate system
Manual control, padlocked and interlocked isolators, break-before-make, no hot-transfer, no point of common coupling in an auto-mode, sync-check blocking relays across breakers = safe
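One hypothetical way to express the listed safeguards as a single permissive check before an operator closes the cross-feed isolator. The field names and the exact set of conditions are illustrative assumptions, not Chloride's or TIA-942's interlock scheme; a real scheme is implemented in hardware interlocks and relays.

```python
from dataclasses import dataclass

@dataclass
class CrossFeedState:
    padlock_removed: bool        # authorised operator has released the padlock
    interlock_key_present: bool  # key released by the interlocked isolator
    own_bypass_open: bool        # existing bypass feed already open (break-before-make)
    sync_check_ok: bool          # sync-check relay is not blocking the close

def may_close_cross_feed(s: CrossFeedState) -> bool:
    """Manual close is permitted only when every interlock condition is satisfied."""
    return (s.padlock_removed and s.interlock_key_present
            and s.own_bypass_open and s.sync_check_ok)

# Hypothetical usage: the close is blocked while the existing bypass feed is still closed
print(may_close_cross_feed(CrossFeedState(True, True, False, True)))
```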

Tier IV + STSs + bypass detail from TIA-942
