WHITE PAPER
Recovery methodologies
We need to ask ourselves, “How successful have most disaster recovery efforts been?”
The answer is not particularly easy to obtain, since there have been very few major
disasters, and most IT managers have never actually experienced one. When a business does
attempt a recovery, it often discovers that it can’t be done in the required timeframe.
This occurs because the backup capabilities can’t support the recovery objectives needed
by the business.
Many companies prepare for disaster recoveries by performing disaster recovery tests,
which for the most part are minimally successful. One of the major issues that CNT sees
repeatedly during these tests is a lack of prioritization of recovery activities. Instead,
companies attempt to recover all of the infrastructure and data as quickly as possible.
What is needed is a recovery strategy methodology that meets the needs of the business
and allows the business functions to be recovered in a pre-determined order. In this way,
a company will be better equipped to develop and maintain the right solution at the
right cost. Using such a methodology instead of the all-or-nothing approach will benefit
the company by providing the proper level of protection for each of the business units
instead of being excessive or insufficient.
The issue that slows down most disaster recoveries is attempting to restore everything at
once. The sensible thing is to first restore only what is absolutely necessary, less critical
functions second, and non-vital business functions last. Recovering 50 percent of the
infrastructure is certainly easier and quicker than recovering the entire enterprise. By
reducing the number of tapes, servers, and disk requirements, a seemingly impossible
task can turn into an achievable one.
CNT believes that each enterprise should try to categorize its business functions into
several classes of recovery. We recommend using the following general approach:
• Class 0: no reason to recover during a disaster recovery
• Class 1: non-vital business functions
• Class 2: business functions that are vital to the company, but are not the most important
• Class 3: critical, “must have” business functions
• Class 4: continuously available, absolutely cannot go offline for any reason
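The classes above, together with the earlier advice to restore the most necessary functions first, can be sketched in a few lines of Python. This is only an illustration: the enum member names, the `recovery_order` helper, and the example business functions are assumptions made for the sketch and do not come from CNT.

```python
from enum import IntEnum


class RecoveryClass(IntEnum):
    """Recovery classes from the list above; a higher value is more critical."""
    NOT_RECOVERED = 0           # Class 0: not recovered during a disaster recovery
    NON_VITAL = 1               # Class 1: non-vital business functions
    VITAL = 2                   # Class 2: vital, but not the most important
    CRITICAL = 3                # Class 3: critical, "must have" functions
    CONTINUOUSLY_AVAILABLE = 4  # Class 4: cannot go offline for any reason


def recovery_order(functions):
    """Return function names in recovery order, most critical class first.

    `functions` maps a business function name to its RecoveryClass.
    Class 0 entries are left out because they are not recovered.
    Functions in the same class keep their original order (sorted() is stable).
    """
    to_recover = [(name, cls) for name, cls in functions.items()
                  if cls != RecoveryClass.NOT_RECOVERED]
    ranked = sorted(to_recover, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]


# Hypothetical business functions, for illustration only.
example = {
    "test lab": RecoveryClass.NOT_RECOVERED,
    "intranet wiki": RecoveryClass.NON_VITAL,
    "order entry": RecoveryClass.CRITICAL,
    "payroll": RecoveryClass.VITAL,
}
print(recovery_order(example))
```

With the hypothetical input shown, order entry is recovered first, payroll second, and the intranet wiki last; the test lab is not recovered at all.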
Breaking down each component into recovery classes makes it easier to determine what
the company’s needs are, what the recovery capabilities currently are, and what needs to
be done to achieve the desired class of recovery should the current capabilities prove to
be inadequate. These requirements can vary from company to company as well as from
industry to industry.
Class 0 and Class 4 require the least amount of detail pertaining to continuance of business.
Class 4 simply must continue uninterrupted at all costs. This class is normally reserved
for the most critical of processes, such as financial market transactions, air traffic
control, critical health care systems, and infrastructure such as power and communication.
Class 0 environments are not recovered at all in the event of an outage. While these
systems may support the overall IT infrastructure of a company, they contain no critical
data and would be replaced rather than recovered if lost in a disaster. We feel it is
important to acknowledge these different recovery classes as many systems are designated with
these recovery objectives. The remainder of this paper will focus on Class 1, Class 2 and
Class 3. This is where business continuity is achieved for most business functions.
Business requirements
Figure 1 shows the high-level overview of the process needed to assess the recovery
requirements for each business function.
The initial step in this process involves assessing each of the business requirements and
classifying them by their relative importance to the company (see Figure 2).
This is traditionally done through a business impact analysis (BIA), which rates the
severity of the impact on the company should the business function become unavailable. The
impact to the company is perceived in terms of operating costs, infrastructure costs,
regulatory fines or sanctions, financial losses, or damage to the company’s reputation (loss
of market share, decreased customer satisfaction, etc.). While the impact on reputation
is intangible and certainly cannot be quantified, it is a real threat and can eventually
result in some financial reverses.
Business functions that are most important and critical will fall into Class 3. Functions
that are vital but can be recovered after critical functions are restored will be designated
as Class 2. Non-vital functions that can be performed via alternate methods for an
extended period of time will be designated Class 1.
Figure 1: high-level overview of the recovery assessment process (diagram labels: DEFINE,
DICTATE, TECHNIQUES, COST, REASSESS)
Classes of Recovery 3
Recovery objectives
After each business requirement is classified as to the relative impact to the business in
the event of an outage, the recovery objectives for each class must be quantified (see
Figure 3). The class of recovery mandated by the business requirements defines the
recovery objectives for each business function. For example, business functions that have
Class 3 business requirements can only have Class 3 recovery objectives.
These objectives also need to be quantified in terms of recovery time objectives (RTO)
and recovery point objectives (RPO). The RTO is the time it takes to restore the
business function to a functioning level. The RPO is the specific point in time the data needs
to be restored to in order to effect a successful recovery. Although the RTO and RPO in
each class of recovery will vary from company to company, some general guidelines are
suggested in Figure 4.
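To make the two objectives concrete, the following Python sketch checks a single outage against an RTO and an RPO. It assumes the common reading of the RPO as the maximum tolerable age of the restored data, and the `RecoveryObjective` type, the helper names, and all times shown are hypothetical values chosen for illustration rather than figures from this paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjective:
    rto: timedelta  # longest acceptable time to restore the business function
    rpo: timedelta  # oldest acceptable age of the restored data (assumed reading)


def meets_rpo(objective, last_good_copy, outage_start):
    """True if the newest recoverable copy of the data is recent enough."""
    return outage_start - last_good_copy <= objective.rpo


def meets_rto(objective, outage_start, restored_at):
    """True if the function was back at a functioning level in time."""
    return restored_at - outage_start <= objective.rto


# Hypothetical objective and timeline, for illustration only.
objective = RecoveryObjective(rto=timedelta(hours=4), rpo=timedelta(hours=1))
outage_start = datetime(2003, 8, 1, 12, 0)
last_good_copy = datetime(2003, 8, 1, 11, 30)  # 30 minutes before the outage
restored_at = datetime(2003, 8, 1, 17, 0)      # 5 hours after the outage

rpo_ok = meets_rpo(objective, last_good_copy, outage_start)
rto_ok = meets_rto(objective, outage_start, restored_at)
print(rpo_ok, rto_ok)
```

In this made-up timeline the data copy is recent enough for the RPO, but the five-hour restore misses the four-hour RTO, which is the kind of gap a recovery test is meant to expose.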
To keep things in perspective and make sure the actions are derived within a realistic
timeframe, remember that recovery comprises several factors:
• Time to restore hardware infrastructure:
    • Server(s)
    • Disk
    • Network components
    • Storage area network (SAN) components
    • Tape drives/libraries
• Time to restore operating system(s)
• Time to establish connectivity
• Time to restore application software and data
Each of these factors must be considered when planning for recovery. There will be
overlap and timing issues that vary from company to company, but breaking down the
recovery in a logical manner will result in realistic recovery objectives.
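One simple way to sanity-check a recovery time objective against the factors listed above is to add up an estimate for each step. The sketch below does this in Python under a deliberately pessimistic assumption that the steps run strictly one after another; since some overlap is expected in practice, the total is an upper bound. The step durations and the 12-hour RTO are hypothetical numbers for illustration only.

```python
from datetime import timedelta


def estimate_recovery_time(step_durations):
    """Sum the step durations, assuming the steps run one after another."""
    return sum(step_durations.values(), timedelta())


# Hypothetical estimates for one business function.
steps = {
    "restore hardware infrastructure": timedelta(hours=8),
    "restore operating system(s)": timedelta(hours=2),
    "establish connectivity": timedelta(hours=1),
    "restore application software and data": timedelta(hours=6),
}
rto = timedelta(hours=12)  # hypothetical recovery time objective

total = estimate_recovery_time(steps)
within_rto = total <= rto
print(f"estimated recovery time: {total}, within RTO: {within_rto}")
```

Here the sequential estimate comes to 17 hours against a 12-hour objective, so either steps must overlap, individual steps must be shortened, or the objective is not realistic for the current capabilities.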
Techniques
The techniques used for each class of recovery are dictated by the recovery objective
time frames. Figure 5 details these techniques according to their recovery class. These
techniques must take into account the nature of the outage and the extent of the
recovery effort. For instance, if a company does not wish to recover from a total site disaster
(e.g., fire, flood, earthquake) then the techniques involving any sort of replication of
processors or data at a remote site are not pertinent.
Class 3 recovery techniques are the most robust, and provide the shortest recovery time
and the fastest time to data. These include hot backup systems, remote mirrored disk
arrays, electronic tape vaulting, SANs, local and remote redundant networks, and use of
snapshot disk volumes.
Class 2 recovery techniques are less robust than Class 3 techniques, but still rely on
high-end technology.
Business functions that only require Class 1 recovery methods employ low-end
techniques such as no redundancy, no failover capability, manual backup processes, and static
database backups. Backups are usually performed on a server-by-server basis and the
responsibility for them lies with a single person. For the most part, if recovery is
required for a Class 1 function, it will be handled at that time with little or no planning.
Figure 4: typical recovery time and recovery point objectives (primary site; columns:
Class 3 (critical), Class 2 (vital), Class 1 (non-vital))
Often the backup process is insufficient to handle the volume of data required for recovery;
if this process does not complete in the allotted time, it is cancelled without hesitation.
Given the astronomical growth rate of data in most companies, the ability to manage this
data becomes a necessity and warrants strong storage management policies and
procedures, as well as the personnel to enforce them. Good storage management helps reduce
the amount of data being backed up by ensuring that only the data required to recover a
business function is backed up. This is a much more effective paradigm than the
all-or-nothing approach.
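The effect of backing up only the data a business function needs can be shown with simple arithmetic: backup duration is roughly data volume divided by sustained throughput. The Python sketch below assumes a constant throughput, which real backups rarely achieve, and every number in it is hypothetical.

```python
def backup_hours(data_gb, throughput_gb_per_hour):
    """Estimated backup duration in hours at a constant throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hour


def fits_window(data_gb, throughput_gb_per_hour, window_hours):
    """True if the backup is expected to finish inside the allotted window."""
    return backup_hours(data_gb, throughput_gb_per_hour) <= window_hours


# Hypothetical figures: an 8-hour backup window at 400 GB per hour.
WINDOW_HOURS = 8
THROUGHPUT = 400

everything_fits = fits_window(4000, THROUGHPUT, WINDOW_HOURS)  # all data: 10 hours
required_fits = fits_window(2400, THROUGHPUT, WINDOW_HOURS)    # required data: 6 hours
print(everything_fits, required_fits)
```

In this example the all-or-nothing backup overruns the window and would be cancelled, while a backup limited to the data required for recovery completes with time to spare.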
Note that network recovery needs to be included as part of the overall recovery
scenario. Both local and remote redundant networks require a level of planning and testing
well above that required by the lower classes of recovery. This will ensure that neither
performance nor connectivity is compromised.
Technology
The technology for each class of recovery is determined by the technique required to
support the desired recovery objective. Figure 6 details the technology according to
recovery class.
Figure 5: classifying techniques used for each class of recovery
primary site
• Class 3 (critical): automatic failover capabilities
• Class 2 (vital): manual failover capabilities
• Class 1 (non-vital): ad hoc backup and recovery solutions
• Online database backups
secondary site
• No single point of failure
Cost
The cost associated with each class of recovery is driven by the technology needed to
employ the appropriate recovery technique for that class (see Figure 7). In general, the
more critical the business function, the more expensive the recovery solution.
When the cost of recovery is deemed to be too high, the business requirements should
be reassessed to see whether or not any business functions could be reclassified to a
lower recovery class. All too often, this process fails because instead of reassessing the
business requirements, the technology is compromised without regard for the business
functions. This jeopardizes the integrity of the recovery class for the business functions in
that class. Those in charge of the business functions must decide what level of risk is
acceptable to the company, and the assumption of risk should never be made solely on
the basis of budget constraints.
Finding the optimum cost of protection is rarely an easy task. How much is enough? Most
companies merely look at the cost associated with the technology needed to protect the
business functions. A simple benefit analysis would help justify the cost of protection.
Figure 6: determining technology used for each class of recovery
local
• Class 4: fault tolerant hardware
• Class 3: server clustering
• Class 2: departmental storage solutions
• Class 1: JBOD
• Also listed: locally attached tape drives, WAN server clustering, duplicated hardware
  configs, manual tape movers, enterprise storage solutions, automated tape library,
  redundant networks, dedicated backup networks, networked storage
remote
• Application based replication
To do this, first determine the expected loss if there is an interruption to a business
function and no protection in place (Ln). Then calculate the expected loss due to an
interruption of a business function with protection in place (Lp). The purported benefit
of protection for a business function is represented as follows: Benefit = Ln - Lp.
This is only one method of trying to justify the cost associated with protection.
Quantifying the expected loss in the event of a business function interruption should
always be done and used as one factor in determining what the cost of protection
should be for that business function.
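The benefit calculation above is easy to express in code. The Python sketch below computes Benefit = Ln - Lp and, as an added assumption that the paper does not state, treats protection as justified when the benefit is at least as large as the cost of protection. The function names and dollar figures are hypothetical.

```python
def protection_benefit(loss_unprotected, loss_protected):
    """Benefit = Ln - Lp, the expected loss avoided by having protection."""
    return loss_unprotected - loss_protected


def is_justified(loss_unprotected, loss_protected, protection_cost):
    """Assumed decision rule: the avoided loss covers the cost of protection."""
    return protection_benefit(loss_unprotected, loss_protected) >= protection_cost


# Hypothetical expected losses for one business function, in dollars.
ln = 500_000   # expected loss with no protection in place (Ln)
lp = 50_000    # expected loss with protection in place (Lp)
cost = 200_000  # cost of the protection being considered

benefit = protection_benefit(ln, lp)
justified = is_justified(ln, lp, cost)
print(benefit, justified)
```

With these figures the protection avoids 450,000 in expected loss for a cost of 200,000. Because expected loss is only one factor, a result like this informs the decision rather than making it.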
Conclusion
Following the classes of recovery methodology outlined in this paper is beneficial to a
company for several reasons. First, it ensures that recovery methods selected for each
business function are driven by the needs of the business. Second, it allows a company to
easily assess its current recovery capabilities and develop a viable strategy to correct
deficiencies in its existing recovery scheme. Basing the classes of recovery on the
amount of risk that can be assumed by each business function ensures that the expense
required to preserve the business correlates directly to its importance to the company.
Further reading
Another paper in this series, “Evaluating Your Exposure,” details the process of cost
justifying business continuity. It begins by discussing factors to consider in establishing the cost
of disrupted service for a business function. It also describes a study known as a business
impact analysis. This paper is co-authored by CNT’s strategic partner in business
continuity, Strohl Systems, experts in business impact analysis and business continuity planning.
Figure 7: costs associated with each class of recovery
• Class 3 (critical): most expensive to implement; least loss incurred from disaster
• Class 2 (vital): moderate expense to implement; moderate loss incurred from disaster
• Class 1 (non-vital): least expensive to implement; highest loss incurred from disaster
After establishing the desired recovery class of your application environments, and
after justifying the costs by understanding your exposure, you are ready to talk
technology. Additional white papers in this series focus on current technologies necessary
to achieve specific recovery objectives. One, “Primary Site Recovery Techniques,”
focuses on business continuation technology within the primary data center. A second
paper, “Secondary Site Recovery Techniques,” focuses on continuation technology at an
alternate location. The technology solutions will be compared against others available
within a given recovery class. These comparisons will give consideration to costs to
implement and complexity to manage.

CNT has nearly two decades’ worth of experience assessing, designing, and
deploying IT solutions to support business continuity objectives. Our professional
consulting organization can help you effectively evaluate and plan your
optimal solution. From business continuity architecture assessments, design, and
integration, to remote network management and support, we help you streamline
the decision making process, accelerate technology deployment, and meet
your IT recovery objectives.
CNT is one of the world’s largest providers of comprehensive storage networking solutions.
For over 20 years, our experts have analyzed, designed, and built enterprise storage
networks. Visit www.cnt.com to learn about our solutions, products, partnerships, career
opportunities, and more.

© 2003 by Computer Network Technology Corporation (Nasdaq: CMNT). All rights reserved.
Any reproduction of these materials without the prior written consent of CNT is strictly
prohibited. CNT, the CNT logo, Channelink, and UltraNet are registered trademarks of
Computer Network Technology Corporation. All other trademarks identified herein are the
property of their respective owners. CNT is an equal opportunity employer. CNT corporate
headquarters’ QMS is registered to ISO 9001: 2000. Certificate #006765. PL563 | 0803

USA: 1-800-638-8324
Canada: 905-595-1500
UK: 44-1753-792400
France: 33-1-4130-1212
Australia: 61-2-9540-5486
Germany: 49-89-42 74 11-0
Switzerland: 41-1-73 35-733
Belgium: 32-2-737 76 42
Italy: 39-06-51 49 31
Brazil: 55-11-5509-1504
Japan: 813-5403-4858
Other locations: 1-763-268-6000