
Copyright © 2006 EMC Corporation. Do not Copy - All Rights Reserved.

Section 4 – Business Continuity

Introduction


Welcome to Section 4 of Storage Technology Foundations – Business Continuity.


Section Objectives
Upon completion of this section, you will be able to:
• Describe what business continuity is
• Describe the basic technologies that are enablers of data availability
• Describe basic disaster recovery techniques

The objectives for this section are shown here. Please take a moment to read them.


In This Section
This section contains the following modules:
• Business Continuity Overview
• Backup and Recovery
• Business Continuity Local Replication
• Business Continuity Remote Replication


Apply Your Knowledge


The following modules contain Apply Your Knowledge
information (Available in the Student Resource Guide):
• Business Continuity Overview
• Backup and Recovery
• Business Continuity Local Replication
• Business Continuity Remote Replication

Please note that certain modules of this section contain Apply Your Knowledge information that
is only available in the Student Resource Guide.


Business Continuity Overview


After completing this module, you will be able to:
• Define and differentiate between Business Continuity and Disaster Recovery
• Differentiate between Disaster Recovery and Disaster Restart
• Define terminology such as Recovery Point Objective and Recovery Time Objective
• Give a high-level description of Business Continuity Planning
• Identify Single Points of Failure and describe solutions to eliminate them

These are the objectives for this module. Please take a moment to review them.


What is Business Continuity?


• Business Continuity is the preparation for, response to, and recovery from an application outage that adversely affects business operations
• Business Continuity Solutions address systems unavailability, degraded application performance, or unacceptable recovery strategies

Since information is a primary asset for most businesses, business continuity is a major concern.
This is not just a concern for the Information Technology department; it affects the entire
business. Data storage was once viewed as a simple issue, but the requirements have
become more sophisticated. Businesses must now contend with information availability, storage,
and business continuation during adverse events, large or small, man-made or natural. Before we
can talk about business continuity and its solutions, we must first define the
terms. Business Continuity is the preparation for, response to, and recovery from an application
outage that adversely affects business operations. Business Continuity Solutions address
systems unavailability, degraded application performance, or unacceptable recovery strategies.


Why Business Continuity


Know the downtime costs (per hour, day, two days...)

Lost Productivity
• Number of employees impacted (x hours out * hourly rate)

Lost Revenue
• Direct loss
• Compensatory payments
• Lost future revenue
• Billing losses
• Investment losses

Damaged Reputation
• Customers
• Suppliers
• Financial markets
• Banks
• Business partners

Financial Performance
• Revenue recognition
• Cash flow
• Lost discounts (A/P)
• Payment guarantees
• Credit rating
• Stock price

Other Expenses
Temporary employees, equipment rental, overtime costs, extra shipping costs, travel expenses...

Many factors need to be considered when calculating the cost of downtime. A
formula for the cost of an outage should capture both the cost of lost employee
productivity and the cost of lost income from missed sales.
• Estimated average cost of 1 hour of downtime = (Employee costs per hour) * (Number
of employees affected by outage) + (Average income per hour).
• Employee costs per hour is the total salaries and benefits of all employees per week,
divided by the average number of working hours per week.
• Average income per hour is the total income of an institution per week, divided by the
average number of hours per week that the institution is open for business.
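The formula above can be sketched in a few lines of Python. This is only an illustration: all figures are hypothetical, and the names (`hourly_downtime_cost`, `headcount`, and so on) are not from the course material. The notes do not say whether "employee costs per hour" is a company-wide total or a per-employee average; the sketch assumes a per-employee average (weekly payroll divided by working hours and by headcount), so that multiplying by the number of affected employees is dimensionally consistent.

```python
def hourly_downtime_cost(cost_per_employee_hour, employees_affected, income_per_hour):
    """Estimated average cost of one hour of downtime, following the formula in the notes."""
    return cost_per_employee_hour * employees_affected + income_per_hour


# Hypothetical weekly figures for an example business (not from the course material).
weekly_payroll = 400_000.0       # salaries and benefits of all employees, per week
headcount = 200                  # total employees (assumed, to derive a per-employee rate)
working_hours_per_week = 40.0
weekly_income = 1_200_000.0
open_hours_per_week = 60.0

cost_per_employee_hour = weekly_payroll / working_hours_per_week / headcount
income_per_hour = weekly_income / open_hours_per_week

# An outage that idles 120 of the 200 employees.
cost = hourly_downtime_cost(cost_per_employee_hour, 120, income_per_hour)
print(f"Estimated cost of one hour of downtime: ${cost:,.0f}")
```

Multiplying the result by the expected outage duration gives a first estimate for the "per hour, day, two days" figures a business should know.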


Information Availability

% Uptime    % Downtime   Downtime per Year   Downtime per Week
98%         2%           7.3 days            3 hrs 22 min
99%         1%           3.65 days           1 hr 41 min
99.8%       0.2%         17 hrs 31 min       20 min 10 sec
99.9%       0.1%         8 hrs 45 min        10 min 5 sec
99.99%      0.01%        52.5 min            1 min
99.999%     0.001%       5.25 min            6 sec
99.9999%    0.0001%      31.5 sec            0.6 sec

Information Availability ensures that applications and business units have access to information
whenever it is needed. The primary components of information availability are:
• Protection from data loss
• Ensuring data access
• Appropriate data security
For some critical applications, the required online window has moved to 99.999% of the time.
Information availability depends upon robust, functional IT systems.
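The downtime figures in the table follow directly from the uptime percentage. The short sketch below reproduces the arithmetic, assuming a 365-day year and a 7-day week; the function name `downtime_hours` is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_WEEK = 7 * 24


def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime allowed in a period at the given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_hours


for pct in (98.0, 99.0, 99.9, 99.999):
    per_year = downtime_hours(pct, HOURS_PER_YEAR)
    per_week = downtime_hours(pct, HOURS_PER_WEEK)
    print(f"{pct}% uptime: {per_year:.2f} h/year, {per_week * 60:.2f} min/week")
```

For example, 99% uptime allows 87.6 hours per year, which is the 3.65 days shown in the table.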


Importance of Business Continuity and Planning


Millions of US Dollars per Hour in Lost Revenue

Industry                          $M per hour
Retail                            1.1
Insurance                         1.2
Information technology            1.3
Financial institutions            1.5
Manufacturing                     1.6
Call location                     1.6
Telecommunications                2.0
Credit card sales authorization   2.6
Energy                            2.8
Point of sale                     3.6
Retail brokerage                  6.5

Source: Meta Group, 2005

This chart shows how much money each industry loses for each hour of downtime. As you can
see, downtime is expensive.


Recovery Point Objective (RPO)

[Figure: timeline running from weeks to seconds on the Recovery Point side and from seconds to weeks on the Recovery Time side. Recovery Point technologies shown: Tape Backup, Periodic Replication, Asynchronous Replication, Synchronous Replication.]

Recovery Point Objective (RPO) is the point in time to which systems and data must be
recovered after an outage. This defines the amount of data loss a business can endure. Different
business units within an organization may have varying RPOs.
Elevated demand for increased application availability confirms the need to ensure business
continuity practices are consistent with business needs. Interruptions are classified as either
planned or unplanned. Failure to address these specific outage categories seriously compromises
a company’s ability to meet business goals. Planned downtime is expected and scheduled, but it
is still downtime that makes data unavailable. Causes of planned downtime include new
hardware installation, integration, or maintenance; software upgrades; backups; application and
data restore; data center disruptions from facility operations such as renovations; refreshing a
testing or development environment with production data; and porting the testing or
development environment over to the production environment.


Recovery Time Objective (RTO)

[Figure: timeline running from weeks to seconds on the Recovery Point side and from seconds to weeks on the Recovery Time side. Recovery Time technologies shown: Tape Restore, Manual Migration, Global Cluster.]

Recovery Time includes:
• Fault detection
• Recovering data
• Bringing apps back online

Recovery Time Objective (RTO) is the period of time within which systems, applications, or
functions must be recovered after an outage. This defines the amount of downtime that a
business can endure and still survive.
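One way to make RPO and RTO concrete is to check a candidate protection scheme against both objectives. The sketch below is a simplified model, not a method from the course: it assumes the worst-case data loss equals the interval between protected copies, and it takes recovery time to be the sum of the three components listed on the slide. The class and field names, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    copy_interval_min: float    # minutes between protected copies of the data
    fault_detection_min: float  # minutes to notice the outage
    data_recovery_min: float    # minutes to restore, or switch to, the copy
    app_restart_min: float      # minutes to bring applications back online

    def worst_case_data_loss_min(self) -> float:
        # Assumption: everything written since the last copy is lost.
        return self.copy_interval_min

    def recovery_time_min(self) -> float:
        return self.fault_detection_min + self.data_recovery_min + self.app_restart_min


def meets_objectives(plan: RecoveryPlan, rpo_min: float, rto_min: float) -> bool:
    """True when the plan satisfies both the RPO and the RTO (in minutes)."""
    return (plan.worst_case_data_loss_min() <= rpo_min
            and plan.recovery_time_min() <= rto_min)


# Hypothetical plans: a nightly tape backup versus replication every 5 minutes.
nightly_tape = RecoveryPlan(24 * 60, 30, 8 * 60, 60)
replicated = RecoveryPlan(5, 5, 10, 15)

for name, plan in (("nightly tape", nightly_tape), ("replicated", replicated)):
    ok = meets_objectives(plan, rpo_min=60, rto_min=240)
    print(name, plan.worst_case_data_loss_min(), plan.recovery_time_min(), ok)
```

With an RPO of one hour and an RTO of four hours, the nightly tape plan fails on both counts while the replicated plan passes, which is why tighter objectives push a business toward replication.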


Disaster Recovery versus Disaster Restart


• Most business critical applications have some level of data interdependencies
• Disaster recovery
– Restoring previous copy of data and applying logs to that copy to bring it to a known point of consistency
– Generally implies the use of backup technology
– Data copied to tape and then shipped off-site
– Requires manual intervention during the restore and recovery processes

• Disaster restart
– Process of restarting mirrored consistent copies of data and applications
– Allows restart of all participating DBMS to a common point of consistency utilizing automated application of recovery logs during DBMS initialization
– The restart time is comparable to the length of time required for the application to restart after a power failure

There is a fundamental difference between Disaster Recovery and Disaster Restart.
Disaster recovery is the process of restoring a previous copy of the data and applying logs or
other necessary processes to that copy to bring it to a known point of consistency.
Disaster restart is the restarting of mirrored, dependent-write-consistent copies of data and
applications, using the automated application of DBMS recovery logs during DBMS initialization
to bring the data and application to a transactional point of consistency.
Disaster recovery generally implies the use of backup technology in which data is copied to tape
and then shipped off-site. When a disaster is declared, the remote site copies are restored
and logs are applied to bring the data to a point of consistency. Once all recoveries are
completed, the data is validated to ensure it is correct.


Disruptors of Data Availability

Disaster (<1% of Occurrences)
Natural or man made
Flood, fire, earthquake
Contaminated building

Unplanned Occurrences (13% of Occurrences)
Failure
Database corruption
Component failure
Human error

Planned Occurrences (87% of Occurrences)
Competing workloads
Backup, reporting
Data warehouse extracts
Application and data restore

Source: Gartner, Inc.

Elevated demand for increased application availability confirms the need to ensure business
continuity practices are consistent with business needs.
Interruptions are classified as either planned or unplanned. Failure to address these specific
outage categories seriously compromises a company’s ability to meet business goals.
Planned downtime is expected and scheduled, but it is still downtime that makes data
unavailable. Causes of planned downtime include:
• New hardware installation/integration/maintenance
• Software upgrades/patches
• Backups
• Application and data restore
• Data center disruptions from facility operations (renovations, construction, other)
• Refreshing a testing or development environment with production data
• Porting the testing/development environment over to the production environment


Causes of Downtime

[Figure: chart of the causes of downtime: Human Error, System Failure, Infrastructure Failure, Disaster.]

Today, the most critical component of an organization is information. Any disaster
will affect the availability of information that is critical to normal business operations.
In our definition of disaster, the organization’s primary systems, data, and applications are
damaged or destroyed. Not all unplanned disruptions constitute a disaster.


Business Continuity vs. Disaster Recovery


• Business Continuity has a broad focus on prevention:
– Predictive techniques to identify risks
– Procedures to maintain business functions

• Disaster Recovery focuses on the activities that occur after an adverse event to return the entity to ‘normal’ functioning.

Business Continuity is a holistic approach to planning for, preparing for, and recovering from an
adverse event. The focus is on prevention, identifying risks, and developing procedures to
ensure the continuity of business function. Disaster recovery planning should be included as
part of business continuity.
BC objectives include:
• Facilitate uninterrupted business support despite the occurrence of problems.
• Create plans that identify risks and mitigate them wherever possible.
• Provide a road map to recover from any event.
Disaster Recovery is more about specific cures to restore service and damaged assets after an
adverse event. In our context, Disaster Recovery is the coordinated process of restoring the
systems, data, and infrastructure required to support key ongoing business operations.


Business Continuity Planning (BCP)


Includes the following activities:
• Identifying the mission or critical business functions
• Collecting data on current business processes
• Assessing, prioritizing, mitigating, and managing risk
– Risk Analysis
– Business Impact Analysis (BIA)
• Designing and developing contingency plans and disaster recovery plan (DR Plan)
• Training, testing, and maintenance

Business Continuity Planning (BCP) is a risk management discipline. It involves the entire
business, not just IT. BCP proactively identifies vulnerabilities and risks and plans in advance
how to prepare for and respond to a business disruption. A business with strong BC practices in
place is better able to continue running the business through the disruption and to return to
“business as usual.”
BCP actually reduces the risk and costs of an adverse event because the process often uncovers
and mitigates potential problems.


Business Continuity Planning Lifecycle

[Figure: the Business Continuity Planning lifecycle drawn as a cycle of five stages: Objectives; Analysis; Design and Develop; Train, Test, and Document; Implement, Maintain, and Assess.]

The Business Continuity Planning process includes the following stages:

1. Objectives
• Determine business continuity requirements and objectives, including scope and budget
• Select the team (include all areas of the business and subject matter expertise, internal and external)
• Create the project plan
2. Perform analysis
• Collect information on data, business processes, infrastructure supports, dependencies, frequency of use
• Identify critical needs and assign recovery priorities
• Create a risk analysis (areas of exposure) and mitigation strategies wherever possible
• Create a Business Impact Analysis (BIA)
• Create a cost/benefit analysis: identify the cost (per hour/day, etc.) to the business when data is unavailable
• Evaluate options
3. Design and develop the BCP/strategies
• Evaluate options
• Define roles/responsibilities
• Develop contingency scenarios
• Develop emergency response procedures
• Detail recovery, resumption, and restore procedures
• Design data protection strategies and develop infrastructure
• Implement risk management/mitigation procedures
4. Train, test, and document
5. Implement, maintain, and assess


Business Impact Analysis (BIA)


# | Business Area Affected | Impact (1-5) | Probability (1-5) | Single Loss Expectancy | # Events p/y | Loss p/y | Est cost of mitigation | High Risk SPOF Item
1 | Entire Company | 5 | 1 | $279,056 | 0.25 | $69,517 | $5,800 | No redundant UPS for networking/phone equip
2 | Entire Company | 5 | 1 | $279,066 | 0.2 | $55,768 | $66,456 | Cisco net backbone switch not redundant
3 | Entire Company | 5 | 1 | $279,098 | 0.2 | $55,619 | $10,000 | Relocate net equip to a separate physical rack
4 | IT-All | 4 | 3 | $16,000 | 1.0 | $18,000 | $80,000 | Primary dev platforms don’t have failover
5 | Entire Company | 4 | 3 | $16,000 | 0.5 | $8,000 | $122,000 | Computer room does not have sufficient UPS capacity to run on single unit
6 | IT-Intranet/B2B | 2 | 1 | $400 | 1.0 | $1,800 | $5,000 | No failover for development webserver

This is an example of a Business Impact Analysis (BIA). The dollar values are arbitrary and are
used just for illustration. A BIA quantifies the impact that an outage will have on the business and
the potential costs associated with the interruption. It helps businesses channel their resources
based on probability of failure and associated costs.
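A common way to combine the BIA columns is to multiply the single loss expectancy by the expected number of events per year, giving an annualized loss that can be weighed against the cost of mitigation. The sketch below applies that standard calculation to three rows of the example. Note that the "Loss p/y" values on the slide are arbitrary and do not exactly equal this product. Ranking by annual loss minus mitigation cost is a deliberately crude assumption made here, not the course's method, and the `Risk` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    item: str
    single_loss: float       # single loss expectancy, in dollars
    events_per_year: float   # expected number of events per year
    mitigation_cost: float   # estimated cost of mitigation, in dollars

    @property
    def annual_loss(self) -> float:
        # Annualized loss = loss per event * expected events per year.
        return self.single_loss * self.events_per_year


# Inputs taken from rows 1, 4, and 6 of the example table.
risks = [
    Risk("No redundant UPS for networking/phone equip", 279_056, 0.25, 5_800),
    Risk("Primary dev platforms don't have failover", 16_000, 1.0, 80_000),
    Risk("No failover for development webserver", 400, 1.0, 5_000),
]

# Crude ranking: largest (annual loss - mitigation cost) first.
ranked = sorted(risks, key=lambda r: r.annual_loss - r.mitigation_cost, reverse=True)
for r in ranked:
    print(f"{r.item}: annual loss ${r.annual_loss:,.0f}, mitigation ${r.mitigation_cost:,.0f}")
```

Under these assumptions the missing redundant UPS ranks first: a relatively cheap fix that removes a large expected annual loss.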


Identifying Single Points of Failure

[Figure labels: Primary Node; User & Application Clients; IP.]

Consider the components in the picture and identify the Single Points of Failure.


HBA Failures
• Configure multiple HBAs, and use multi-pathing software
– Protects against HBA failure
– Can provide improved performance (vendor dependent)

[Figure labels: Host, HBA, Switch, Port, Storage.]

Configuring multiple HBAs and using multi-pathing software provides path redundancy. Upon
detection of a failed HBA, the software can re-drive the I/O through another available path.


Switch/Storage Array Port Failures


• Configure multiple switches
• Make the devices available via multiple storage array ports

[Figure labels: Host, HBA, Switch, Port, Storage.]

This configuration provides switch redundancy and protects against storage array port
failures.


Disk Failures
• Use some level of RAID

[Figure labels: Host, HBA, Switch, Port, Storage.]

As seen earlier, using some level of RAID, such as RAID-1 or RAID-5, will ensure continuous
operation in the event of a disk failure.


Host Failures
• Clustering protects against production host failures

[Figure labels: two Hosts, HBA, Switch, Port, Storage.]

Planning and configuring clusters is a complex task. At a high level:


• A cluster is two or more hosts with access to the same set of storage (array) devices.
• The simplest configuration is a two node (host) cluster.
• One of the nodes would be the production server while the other would be configured as a
standby. This configuration is described as Active/Passive.
• Participating nodes exchange “heart-beats” or “keep-alives” to inform each other about their
health.
• In the event of the primary node failure, cluster management software will shift the
production workload to the standby server.
• Implementation of the cluster failover process is vendor specific.
• A more complex configuration would be to have both nodes run production workload on
the same set of devices. Either cluster software or the application/database should then provide a
locking mechanism so that the nodes do not try to update the same areas on disk
simultaneously. This would be an Active/Active configuration.
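The Active/Passive behaviour described above can be sketched as a heartbeat timeout. This is a toy model only: real cluster software is vendor specific, and the class name, the node names, and the three-second timeout are all assumptions made for illustration. Times are passed in explicitly so the example is deterministic.

```python
class ActivePassiveCluster:
    """Toy two-node cluster: fail over when the active node's heartbeat goes stale."""

    def __init__(self, primary: str, standby: str, heartbeat_timeout_s: float):
        self.active = primary
        self.standby = standby
        self.timeout = heartbeat_timeout_s
        self.last_heartbeat = 0.0

    def heartbeat(self, now: float) -> None:
        # Called whenever a keep-alive arrives from the active node.
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        # If the heartbeat is overdue and a standby exists, shift the workload to it.
        if self.standby is not None and now - self.last_heartbeat > self.timeout:
            self.active, self.standby = self.standby, None
            self.last_heartbeat = now
        return self.active


cluster = ActivePassiveCluster("node-a", "node-b", heartbeat_timeout_s=3.0)
cluster.heartbeat(now=1.0)
print(cluster.check(now=2.0))   # heartbeat is recent, node-a stays active
print(cluster.check(now=5.5))   # no heartbeat for 4.5 s, workload moves to node-b
```

An Active/Active configuration would need, in addition, the locking mechanism mentioned in the last bullet, which this sketch does not attempt.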


Site/Storage Array Failures


• Remote replication helps protect against either entire site or storage array failures

[Figure labels: Host, HBA, Switch, Port, and two Storage arrays.]

Remote replication will be explored in-depth in a later module in this section. What is not shown
in the picture is host connectivity to the storage array in the remote site.


Resolving Single Points of Failure

[Figure: the earlier configuration with its single points of failure resolved. Labels: User & Application Clients, IP, Redundant Network, Primary Node, Failover Node, Clustering Software, Keep Alive, Redundant Paths, Redundant Disks (RAID 1/RAID 5), Redundant Site.]

This example combines the methods that we have discussed to resolve single points of failure. It
uses clustering, redundant paths and redundant disks, a redundant site, and a redundant network.
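Identifying single points of failure can be treated as a connectivity question: a component is a single point of failure if removing it alone cuts the host off from its storage. The sketch below checks this with a breadth-first search over a small, hypothetical topology. The component names are invented, and the two endpoints themselves are excluded because they are covered by clustering and remote replication rather than by path redundancy.

```python
from collections import deque


def reachable(links, src, dst, removed=None):
    """Breadth-first search from src to dst, ignoring links that touch `removed`."""
    adjacency = {}
    for a, b in links:
        if removed in (a, b):
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, src, dst):
    """Components whose individual removal disconnects src from dst."""
    components = {n for link in links for n in link} - {src, dst}
    return sorted(c for c in components if not reachable(links, src, dst, removed=c))


# Hypothetical topologies: one path, then a second independent path added.
single_path = [("host", "hba1"), ("hba1", "switch1"),
               ("switch1", "port1"), ("port1", "array")]
redundant = single_path + [("host", "hba2"), ("hba2", "switch2"),
                           ("switch2", "port2"), ("port2", "array")]

print(single_points_of_failure(single_path, "host", "array"))
print(single_points_of_failure(redundant, "host", "array"))
```

With a single path, every HBA, switch, and array port on it is reported; once a second independent path is added, the list is empty.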


Local Replication
• Data from the production devices is copied over to a set of target (replica) devices
• After some time, the replica devices will contain data identical to that on the production devices
• Copying can subsequently be halted. At this point-in-time, the replica devices can be used independently of the production devices
• The replicas can then be used for restore operations in the event of data corruption or other events
• Alternatively, the data from the replica devices can be copied to tape. This off-loads the burden of backup from the production devices

Local replication technologies offer fast and convenient methods for ensuring data availability.
The different technologies and the uses of replicas for BC/DR operations will be discussed in a
later module in this section. Typically local replication uses replica disk devices. This greatly
speeds up the restore process, thus minimizing the RTO. Frequent point-in-time replicas also
help in minimizing RPO.


Backup/Restore
• Backup to tape has been the predominant method for ensuring data availability and business continuity
• Low cost, high capacity disk drives are now being used for backup to disk. This considerably speeds up the backup and the restore process
• Frequency of backup will be dictated by defined RPO/RTO requirements as well as the rate of change of data

Far from being antiquated, periodic backup is still a widely used method for preserving copies of
data. In the event of data loss due to corruption or other events, data can be restored up to the
last backup. Evolving technologies now permit faster backups to disks. Magnetic tape drive
speeds and capacities are also continually being enhanced. The various backup paradigms and
the role of backup in BC/DR planning will be discussed in detail later in this section.


Module Summary
Key points covered in this module:
• Importance of Business Continuity
• Types of outages and their impact on businesses
• Business Continuity Planning and Disaster Recovery
• Definitions of RPO and RTO
• Difference between Disaster Recovery and Disaster Restart
• Identifying and eliminating Single Points of Failure

These are the key points covered in this module. Please take a moment to review them.


Apply Your Knowledge


After completing this case study, you will be able to:
• Describe EMC PowerPath
• Discuss the features and benefits of PowerPath in storage environments
• Explain how PowerPath achieves transparent recovery

At this point, we will apply what you learned in this lesson to some real world examples. In this
case, we will look at how EMC PowerPath improves business continuity in storage
environments.


What is EMC PowerPath?


• Host Based Software
• Resides between application and SCSI device driver
• Provides Intelligent I/O path management
• Transparent to the application
• Automatic detection and recovery from host-to-array path failures

[Figure: Open Systems Host. Labels on the SERVER side: Applications, DBMS, Management Utils, File System, Logical Volume Manager, PowerPath, SCSI Drivers, SCSI Controllers. Below them: Interconnect Topology and STORAGE.]

PowerPath is host-based software that resides between the application and the disk device
layers. Every I/O from the host to the array must pass through the PowerPath driver software.
This allows PowerPath to work in conjunction with the array and connectivity environment to
provide intelligent I/O path management. This includes path failover and dynamic load
balancing, while remaining transparent to any application I/O requests as it automatically
detects and recovers from host-to-array path failures.
PowerPath is supported on various hosts and operating systems such as Sun Solaris, IBM AIX,
HP-UX, Microsoft Windows, Linux, and Novell. Storage arrays from EMC, Hitachi, HP, and
IBM are supported. The OS levels and array models supported vary between PowerPath
software versions.


PowerPath Features
PowerPath Delivers:
• Multiple paths, for higher availability and performance
• Dynamic multipath load balancing
• Proactive path testing and automatic path recovery
• Automatic path failover
• Online path configuration and management
• High-availability cluster support

PowerPath maximizes application availability, optimizes performance, and automates online
storage management while reducing complexity and cost, all from one powerful data path
management solution. PowerPath supports the following features:
• Multiple path support - PowerPath supports multiple paths between a logical device and a
host. Multiple paths enable the host to access a logical device even if a specific path is
unavailable. Multiple paths also enable sharing of the I/O workload to a given logical
device.
• Dynamic load balancing - PowerPath is designed to use all paths at all times. PowerPath
distributes I/O requests to a logical device across all available paths, rather than requiring a
single path to bear the entire I/O burden.
• Proactive path testing and automatic path recovery - PowerPath uses a path test to ascertain
the viability of a path. After a path fails, PowerPath continues testing it periodically to
determine if it is fixed. If the path passes the test, PowerPath restores it to service and
resumes sending I/O to it.
• Automatic path failover - If a path fails, PowerPath redistributes I/O traffic from that path to
functioning paths.
• Online configuration and management - PowerPath management interfaces include a
command line interface and a GUI on Windows.
• High availability cluster support - PowerPath is particularly beneficial in cluster
environments, as it can prevent operational interruptions and costly downtime.
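The load balancing and path recovery features can be illustrated with a small model. This is not PowerPath's implementation or API. It is a generic sketch that assumes a "fewest outstanding I/Os" balancing policy and a caller-supplied path test; the `Path` class and both function names are hypothetical.

```python
class Path:
    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.pending = 0   # outstanding I/O requests on this path


def pick_path(paths):
    """Choose the live path with the fewest outstanding I/Os (assumed policy)."""
    live = [p for p in paths if p.alive]
    if not live:
        raise RuntimeError("no live path to the device")
    return min(live, key=lambda p: p.pending)


def retest_failed_paths(paths, probe):
    """Re-test each failed path and restore those that pass; returns restored names."""
    restored = []
    for p in paths:
        if not p.alive and probe(p):
            p.alive = True
            restored.append(p.name)
    return restored


paths = [Path("hba0"), Path("hba1"), Path("hba2")]
paths[0].pending, paths[1].pending, paths[2].pending = 4, 1, 2

first_choice = pick_path(paths).name      # least-loaded live path
paths[1].alive = False                    # simulate a path failure
second_choice = pick_path(paths).name     # the failed path is skipped
restored = retest_failed_paths(paths, probe=lambda p: True)  # path has been repaired
third_choice = pick_path(paths).name      # restored path is used again
print(first_choice, second_choice, restored, third_choice)
```

In a real multipathing driver the path test would issue an actual I/O down the path; here `probe` is simply a callback so the behaviour can be exercised without hardware.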


PowerPath Configuration
• All volumes are accessible through all paths
• Maximum 32 paths to a logical volume
• Interconnect support for
– SAN
– SCSI
– iSCSI

[Figure labels: SERVER with Host Application(s), PowerPath, SCSI Driver (SD), Host Bus Adapter (HBA); Interconnect Topology; STORAGE.]

Without PowerPath, if a host needed access to 40 devices, and there were four host bus adapters,
you would most likely configure it to present 10 unique devices to each host bus adapter. With
PowerPath, you would configure it so that all 40 devices can be “seen” by all four
host bus adapters. PowerPath supports up to 32 paths to a logical volume. The host can be
connected to the array using a number of interconnect topologies such as SAN, SCSI, or iSCSI.


The PowerPath Filter Driver

• Platform independent base driver
• Applications direct I/O to PowerPath
• PowerPath directs I/O to optimal path based on current workload and path availability
• When a path fails PowerPath chooses another path in the set

[Figure labels: SERVER with Host Application(s), PowerPath Filter Driver, SCSI Driver (SD), Host Bus Adapter (HBA); Interconnect Topology; STORAGE.]

The PowerPath filter driver is a platform independent driver that resides between the application
and HBA driver.
The driver identifies all paths that read and write to the same device and builds a routing table
called a volume path set for the device. A volume path set is created for each shared device in
the array.
PowerPath can use any path in the set to service an I/O request. If a path fails, PowerPath can
redirect an I/O request from that path to any other available path in the set. This redirection is
transparent to the application, which does not receive an error.
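The idea of a volume path set and transparent redirection can be sketched as follows. This is a generic illustration, not PowerPath code: the function names, the `(path, device)` tuples, and the `transport` callback that stands in for the HBA driver are all hypothetical.

```python
from collections import defaultdict


def build_volume_path_sets(paths):
    """Group (path_name, device_id) pairs into one path set per device."""
    sets = defaultdict(list)
    for path_name, device_id in paths:
        sets[device_id].append(path_name)
    return dict(sets)


def send_io(device_id, path_sets, failed_paths, transport):
    """Try each path in the device's path set until one succeeds."""
    for path in path_sets[device_id]:
        if path in failed_paths:
            continue
        try:
            return transport(path, device_id)
        except IOError:
            failed_paths.add(path)   # take the path offline and re-drive the I/O
    raise IOError(f"all paths to {device_id} failed")


# Two HBAs, each of which can reach both devices.
discovered = [("hba0", "lun5"), ("hba1", "lun5"), ("hba0", "lun6"), ("hba1", "lun6")]
path_sets = build_volume_path_sets(discovered)


def transport(path, device_id):
    # Stand-in for the HBA driver: hba0 is broken in this example.
    if path == "hba0":
        raise IOError("link down")
    return f"{device_id} via {path}"


failed = set()
result = send_io("lun5", path_sets, failed, transport)
print(result, failed)
```

The caller of `send_io` sees only a successful result; the error on the first path is absorbed and the I/O is re-driven, which is the sense in which the redirection is transparent to the application.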


Path Fault without PowerPath


• In most environments, a host will have multiple paths to the Storage System
• Volumes are spread across all available paths
• Each volume has a single path
• Host adapter and cable connections are single points of failure
• Work load not balanced among all paths

[Figure labels: SERVER with Host Application(s), SCSI Driver (SD), Host Bus Adapter (HBA); Interconnect Topology; STORAGE.]

Without PowerPath, the loss of a channel (as indicated in the diagram by a red dotted line)
means one or more applications may stop functioning. This can be caused by the loss of a Host
Bus Adapter, Storage Array Front-end connectivity, Switch port, or a failed cable. In a standard
non-PowerPath environment, these are all single points of failure. In this case, all I/O that was
heading down the path highlighted in red is now lost, resulting in an application failure and the
potential for data loss or corruption.


Path Fault with PowerPath


y If a host adapter, cable, or channel director/Storage Processor fails, the device driver returns a timeout to PowerPath
y PowerPath responds by taking the path offline and re-driving I/O through an alternate path
y Subsequent I/Os use surviving path(s)
y Application is unaware of failure
[Diagram: Host Application(s) on the SERVER above PowerPath, the SCSI Driver (SD) instances, and four Host Bus Adapters (HBAs), connected through the Interconnect Topology to Storage]


This example depicts how PowerPath failover works. When a failure occurs, PowerPath
transparently redirects the I/O down the most suitable alternate path. The PowerPath filter driver
looks at the volume path set for the device, considers current workload, load balancing, and
device priority settings, and chooses the best path to send the I/O down. In the example,
PowerPath has three remaining paths to redirect the failed I/O and to load balance.
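
The failover behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not PowerPath code: the names `Path`, `VolumePathSet`, `choose_path`, and `send_io`, and the "least outstanding I/O" selection rule, are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    name: str
    alive: bool = True
    outstanding_io: int = 0  # hypothetical measure of current workload


@dataclass
class VolumePathSet:
    """Hypothetical model of the set of paths that reach one device."""
    device: str
    paths: list = field(default_factory=list)

    def choose_path(self) -> Path:
        # Consider only paths that are still online.
        candidates = [p for p in self.paths if p.alive]
        if not candidates:
            raise IOError(f"no surviving path to {self.device}")
        # Assumed policy: pick the path with the least outstanding I/O.
        return min(candidates, key=lambda p: p.outstanding_io)

    def send_io(self, failing=frozenset()) -> str:
        """Send one I/O; if the chosen path times out, retry on another."""
        while True:
            path = self.choose_path()
            if path.name in failing:
                path.alive = False  # take the failed path offline
                continue            # re-drive the I/O on an alternate path
            path.outstanding_io += 1
            return path.name


path_set = VolumePathSet("lun0", [Path(f"hba{i}") for i in range(4)])
first = path_set.send_io()
# Simulate a fault on the path that would be chosen next.
second = path_set.send_io(failing={"hba1"})
survivors = [p.name for p in path_set.paths if p.alive]
print(first, second, survivors)
```

The caller of `send_io` never sees the failure, which mirrors the point that the application is unaware of the path fault; three paths remain in the set afterwards.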


Backup and Recovery


Upon completion of this module, you will be able to:
y Describe best practices for planning Backup and
Recovery.
y Describe the common media and types of data that are
part of a Backup and Recovery strategy.
y Describe the common Backup and Recovery topologies.
y Describe the Backup and Recovery Process.
y Describe Management considerations for Backup and
Recovery.


This module looks at Backup and Recovery, which are a major part of the planning for
Business Continuity.


Lesson: Planning for Backup and Recovery


Upon completion of this lesson, you will be able to:
y Define Backup and Recovery.
y Describe common reasons for a Backup and Recovery
plan.
y Describe the business considerations for Backup and
Recovery.
y Define RPO and RTO.
y Describe the data considerations for Backup and
Recovery
y Describe the planning for Backup and Recovery.


This lesson provides an overview of the business drivers for backup and recovery and introduces
some of the common terms used when developing a backup and recovery plan.


What is a Backup?
y Backup is an additional copy of data that can be used for
restore and recovery purposes.
y The Backup copy is used when the primary copy is lost
or corrupted.
y This Backup copy can be created as a:
– Simple copy (there can be one or more copies)
– Mirrored copy (the copy is always updated with whatever is written
to the primary copy.)


A Backup is a copy of the online data that resides on primary storage. The backup copy is
created and retained for the sole purpose of recovering deleted, broken, or corrupted data on the
primary disk.
The backup copy is usually retained over a period of time, depending on the type of data
and on the type of backup. There are three derivatives of backup: disaster recovery, archival,
and operational backup. We will review them in more detail later in this lesson.
The backed-up data may reside on media such as disk or tape, depending on the backup
derivative the customer is targeting. For example, backing up to disk may be more efficient than
tape in operational backup environments.


Backup and Recovery Strategies


Several choices are available to get the data to the backup
media such as:
y Copy the data.
y Mirror (or snapshot) then copy.
y Remote backup.
y Copy then duplicate or remote copy.


Several choices are available to get the data written to the backup media.
y You can simply copy the data from the primary storage to the secondary storage (disk or
tape), onsite. This is a simple strategy, easily implemented, but impacts the production
server where the data is located, since it will use the server’s resources. This may be
tolerated on some applications, but not high demand ones.
y To avoid an impact on the production application, and to perform serverless backups, you
can mirror (or snap) a production volume, mount the copy on a separate server, and then
copy it to the backup media (disk or tape). This option completely frees up the production
server, at the added infrastructure cost of the additional resources.
y Remote backup can be used to comply with offsite requirements. The data is copied from
primary storage directly to backup media located at another site. The backup media
can be a real library, a virtual library, or even a remote filesystem.
y You can copy to a first set of backup media, which is kept onsite for operational
restore requirements, and then duplicate it to another set of media for offsite purposes. To
simplify the procedure, you can replicate it to an offsite location, which removes any manual
procedures associated with moving the backup media to another site.


It’s All About Recovery!


y Businesses back up their data to enable its recovery in
case of potential loss.
y Businesses also back up their data to comply with
regulatory requirements.
y Types of backup derivatives:
– Disaster Recovery
– Archival
– Operational


There are three different Backup derivatives:


Disaster Recovery addresses the requirement to be able to restore all, or a large part of, an IT
infrastructure in the event of a major disaster.
Archival is a common requirement used to preserve transaction records, email, and other
business work products for regulatory compliance. The regulations could be internal,
governmental, or perhaps derived from specific industry requirements.
Operational is typically the collection of data for the eventual purpose of restoring, at some
point in the future, data that has become lost or corrupted.


Reasons for a Backup Plan


y Hardware Failures
y Human Factors
y Application Failures
y Security Breaches
y Disasters
y Regulatory and Business Requirements


Reasons for a backup plan include:


y Physical damage to a storage element (such as a disk) that can result in data loss.
y People make mistakes and unhappy employees or external hackers may breach security and
maliciously destroy data.
y Software failures can destroy or lose data and viruses can destroy data, impact data integrity,
and halt key operations.
y Physical security breaches can destroy equipment that contains data and applications.
y Natural disasters and other events such as earthquakes, lightning strikes, floods, tornados,
hurricanes, accidents, chemical spills, and power grid failures can cause not only the loss of
data but also the loss of an entire computer facility. Offsite data storage is often justified to
protect a business from these types of events.
y Government regulations may require certain data to be kept for extended timeframes.
Corporations may establish their own extended retention policies for intellectual property to
protect them against litigation. The regulations and business requirements that drive
archiving generally require data to be retained at an offsite location.


How does Backup Work?


y Client/Server Relationship
y Server
– Directs Operation
– Maintains the Backup Catalog

y Client
– Gathers Data for Backup (a backup client sends backup data to a
backup server or storage node).

y Storage Node


Backup products vary, but they do have some common characteristics. The basic architecture of
a backup system is client-server, with a backup server and some number of backup clients or
agents. The backup server directs the operations and owns the backup catalog (the information
about the backup). The catalog contains the table-of-contents for the data set. It also contains
information about the backup session itself.
The backup server depends on the backup client to gather the data to be backed up. The backup
client can be local or it can reside on another system, presumably to back up the data visible to
that system. The backup server receives backup metadata from the backup clients to perform its
activities.
There is another component called a storage node. The storage node is the entity responsible for
writing the data set to the backup device. Typically there is a storage node packaged with the
backup server and the backup device is attached directly to the backup server’s host platform.
Storage nodes play an important role in backup planning because they can be used to consolidate
backup servers.


How does Backup Work?


[Diagram: Clients, Servers, Backup Server & Storage Node, Metadata, Catalog, Data Set, Disk Storage, Tape Backup]

The following represents a typical Backup process:


y The Backup Server initiates the backup process (starts the backup application).
y The Backup Server sends a request to a server to “send me your data”.
y The server sends the data to the Backup Server and/or Storage Node.
y The Storage Node sends the data to the tape storage device and the Backup Server begins
building the catalog (metadata) of the backup session.
y When all of the data has been transferred from the server to the Backup Server, the Backup
Server writes the catalog to a disk file and closes the connection to the tape device.
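
The steps above can be sketched as a small Python model. It is a simplified illustration rather than the design of any particular backup product: the class names, the in-memory list standing in for a tape, and the catalog fields are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Writes the data set to the backup device (a list stands in for tape)."""
    tape: list = field(default_factory=list)

    def write(self, client: str, filename: str, data: bytes) -> int:
        self.tape.append((client, filename, data))
        return len(self.tape) - 1  # position of the record on the tape


@dataclass
class BackupClient:
    """Gathers the data to be backed up from its own system."""
    name: str
    files: dict

    def gather(self):
        yield from sorted(self.files.items())


@dataclass
class BackupServer:
    """Directs the operation and maintains the catalog (metadata)."""
    node: StorageNode
    catalog: list = field(default_factory=list)

    def run_backup(self, clients) -> None:
        for client in clients:  # the "send me your data" request
            for filename, data in client.gather():
                position = self.node.write(client.name, filename, data)
                # Record metadata about what was backed up and where.
                self.catalog.append({
                    "client": client.name,
                    "file": filename,
                    "size": len(data),
                    "tape_position": position,
                })


mail = BackupClient("mail", {"inbox.db": b"mail data"})
files = BackupClient("files", {"a.txt": b"alpha", "b.txt": b"beta"})
server = BackupServer(StorageNode())
server.run_backup([mail, files])
for entry in server.catalog:
    print(entry)
```

Keeping the catalog separate from the data is what lets a restore locate a file by name without scanning the backup media.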


Business Considerations
y Customer business needs determine:
– What are the restore requirements – RPO & RTO?
– Where and when will the restores occur?
– What are the most frequent restore requests?
– Which data needs to be backed up?
– How frequently should data be backed up?
¾ hourly, daily, weekly, monthly
– How long will it take to backup?
– How many copies to create?
– How long to retain backup copies?


Some important decisions that need consideration before implementing a Backup/Restore
solution are shown above. Some examples include:
y The Recovery Point Objective (RPO)
y The Recovery Time Objective (RTO)
y The media type to be used (disk or tape)
y Where and when the restore operations will occur – especially if an alternative host will be
used to receive the restore data.
y When to perform backups.
y The granularity of backups – Full, Incremental, or Cumulative.
y How long to keep the backup – for example, some backups need to be retained for 4 years,
others just for 1 month.
y Whether it is necessary to make copies of the backup.


Data Considerations: File Characteristics


y Location
y Size
y Number


Location:
y Many organizations have dozens of heterogeneous platforms that support a complex
application. Consider a data warehouse where data from many sources is fed into the
warehouse. When this scenario is viewed as “The Data Warehouse Application”, it easily
fits this model. Some of the issues are:
− How the backups for subsets of the data are synchronized
− How these applications are restored
Size:
y Backing up a large amount of data that consists of a few big files may have less system
overhead than backing up a large number of small files. If a file system contains millions of
small files, the very nature of searching the file system structures for changed files can take
hours, since the entire file structure is searched.
Number:
y A file system containing one million files with a ten-percent daily change rate will
potentially have to create 100,000 entries in the backup catalog. This brings up other issues
such as:
− How a massive file system search impacts the system
− Search time/Media impact
− Is there an impact on tape start/stop processing?
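
The catalog figure quoted above is simple arithmetic, shown here as a short Python check. The function name `catalog_entries_per_day` is hypothetical and exists only for this illustration.

```python
def catalog_entries_per_day(total_files: int, change_percent: int) -> int:
    """Files changed per day, each of which needs a backup catalog entry."""
    return total_files * change_percent // 100


entries = catalog_entries_per_day(1_000_000, 10)
print(entries)
```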


Data Considerations: Data Compression


Compressibility depends on the data type, for example:
y Application binaries – do not compress well.
y Text – compresses well.
y JPEG/ZIP files – are already compressed and expand if
compressed again.


Many backup devices, such as tape drives, have built-in hardware compression technologies. To
use these technologies effectively, it is important to understand the characteristics of the data.
Some data, such as application binaries, do not compress well. Text data can compress very
well, while other data, such as JPEG and ZIP files, are already compressed.
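
The effect of data type on compressibility can be observed with a short Python experiment. This sketch uses the `zlib` module from the standard library as a stand-in; it is not the algorithm used by tape drive hardware, and the generated word list is an assumed sample of text data.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
words = ["backup", "restore", "tape", "disk", "catalog", "server", "client", "data"]
text = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

once = zlib.compress(text)   # text data: expected to shrink considerably
twice = zlib.compress(once)  # already-compressed input: little or no gain

text_ratio = len(once) / len(text)
recompress_ratio = len(twice) / len(once)
print(f"text: {len(text)} -> {len(once)} bytes (ratio {text_ratio:.2f})")
print(f"recompressed: {len(once)} -> {len(twice)} bytes (ratio {recompress_ratio:.2f})")
```

A ratio near or above 1.0 on the second pass is the same behavior the slide describes for JPEG and ZIP files.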


Data Considerations: Retention Periods


y Operational
– Data sets on primary media (disk) up to the point where most restore
requests are satisfied, then moved to secondary storage (tape).

y Disaster Recovery
– Driven by the organization’s disaster recovery policy
¾ Portable media (tapes) sent to an offsite location / vault.
¾ Replicated over to an offsite location (disk).
¾ Backed up directly to the offsite location (disk, tape or emulated tape).

y Archiving
– Driven by the organization’s policy.
– Dictated by regulatory requirements.


As mentioned before, there are three types of backup models (Operational, Disaster Recovery,
and Archive). Each can be defined by its retention period. Retention Periods are the length of
time that a particular version of a dataset is available to be restored.
Retention periods are driven by the type of recovery the business is trying to achieve:
y For operational restore, data sets could be maintained on a primary disk-based backup storage
target for the period during which most restore requests are likely to occur, and then
moved to a secondary backup storage target, such as tape, for long term offsite storage.
y For disaster recovery, backups must be done and moved to an offsite location.
y For archiving, requirements usually will be driven by the organization’s policy and
regulatory conformance requirements. Tapes can be used for some applications, but for
others a more robust and reliable solution, such as disks, may be more appropriate.


Lesson: Summary
Topics in this lesson included:
y Backup and Recovery definitions and examples.
y Common reasons for Backup and Recovery.
y The business considerations for Backup and Recovery.
y Recovery Point Objectives and Recovery Time
Objectives.
y The data considerations for Backup and Recovery
y The planning for Backup and Recovery.


In this lesson we reviewed the business and data considerations when planning for Backup and
Recovery including:
What is a Backup and Recovery?
What is the Backup and Recovery process?
Business recovery needs
y RPO Recovery point objectives
y RTO Recovery time objectives
Data characteristics
y Files, compression, retention


Lesson: Backup and Recovery Methods


Upon completion of this lesson, you will be able to:
y Describe Hot and Cold Backups.
y Describe the levels of Backup Granularity.


We’ve discussed the importance of, and considerations for, a Backup Plan. This lesson provides
an overview of the different methods for creating a backup set.


Database Backup Methods


y Hot Backup: production is not interrupted.
y Cold Backup: production is interrupted.
y Backup Agents manage the backup of different data
types such as:
– Structured (such as databases)
– Semi-structured (such as email)
– Unstructured (file systems)


Databases can be backed up using two different methods:


y A Hot backup, which means that the application is still up and running, with users accessing
it, while backup is taking place.
y A Cold backup, which means that the application will be shut down for the backup to take
place.
Most backup applications offer various Backup Agents to do these kinds of operations. There
will be different agents for different types of data and applications.


Backup Granularity and Levels


[Diagram: timelines for Full Backup, Cumulative (Differential), and Incremental; legend: Full, Cumulative, Incremental]


The granularity and levels for backups depend on business needs, and, to some extent,
technological limitations. Some backup strategies define as many as ten levels of backup. IT
organizations use a combination of these to fulfill their requirements. Most use some
combination of Full, Cumulative, and Incremental backups.
A Full backup is a backup of all data on the target volumes, regardless of any changes made to
the data itself.
An Incremental backup contains the changes since the last backup of any type, whichever was
most recent.
A Cumulative backup, also known as a Differential backup, is a type of incremental that
contains all changes made since the last full backup.


Restoring an Incremental Backup


Day          Backup type    Files backed up
Monday       Full Backup    Files 1, 2, 3
Tuesday      Incremental    File 4
Wednesday    Incremental    File 3
Thursday     Incremental    File 5
Restored to Production: Files 1, 2, 3, 4, 5

y Key Features
– Files that have changed since the last full or incremental backup are
backed up.
– Fewest files to be backed up, therefore faster backup and less
storage space.
– Longer restore because the last full and all subsequent incremental backups
must be applied.


The following is an example of an incremental backup and restore:


A full backup of the business data is taken on Monday evening. Each day after that, an
incremental backup is taken. These incremental backups back up only files that are new or that
have changed since the last full or incremental backup.
On Tuesday, a new file is added, File 4. No other files have changed. Since File 4 is a new file
added after the previous backup on Monday evening, it will be backed up Tuesday evening.
On Wednesday, there are no new files added since Tuesday, but File 3 has changed. Since File
3 was changed after the previous evening backup (Tuesday), it will be backed up Wednesday
evening.
On Thursday, no files have changed but a new file has been added, File 5. Since File 5 was
added after the previous evening backup, it will be backed up Thursday evening.
On Friday morning, there is a data corruption, so the data must be restored from tape.
y The first step is to restore the full backup from Monday evening.
y Then, every incremental backup taken since that full backup must be applied in order, which,
in this example, means the Tuesday, Wednesday, and Thursday incremental backups.
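
The incremental example above can be replayed in Python. This is a minimal sketch that assumes files are only added or changed (deletions are ignored) and represents each file as a name mapped to a content version; the helper `changed_since` is hypothetical.

```python
def changed_since(current: dict, reference: dict) -> dict:
    """Return files that are new or whose content differs from the reference."""
    return {name: data for name, data in current.items()
            if reference.get(name) != data}


# Production state at the end of each day (file name -> content version).
monday = {"File 1": "v1", "File 2": "v1", "File 3": "v1"}
tuesday = {**monday, "File 4": "v1"}      # File 4 added
wednesday = {**tuesday, "File 3": "v2"}   # File 3 changed
thursday = {**wednesday, "File 5": "v1"}  # File 5 added

full = dict(monday)  # Monday evening full backup
incrementals = []
previous = monday
for day in (tuesday, wednesday, thursday):
    # Compare against the most recent backup of any type.
    incrementals.append(changed_since(day, previous))
    previous = day

# Friday restore: the full backup first, then every incremental in order.
restored = dict(full)
for incremental in incrementals:
    restored.update(incremental)

print([sorted(inc) for inc in incrementals])
print(restored == thursday)
```

Each incremental is small, but the restore has to touch all four backup sets.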


Restoring a Cumulative Backup


Day          Backup type    Files backed up
Monday       Full Backup    Files 1, 2, 3
Tuesday      Cumulative     File 4
Wednesday    Cumulative     Files 4, 5
Thursday     Cumulative     Files 4, 5, 6
Restored to Production: Files 1, 2, 3, 4, 5, 6

y Key Features
– More files to be backed up, therefore it takes more time to back up
and uses more storage space.
– Much faster restore because only the last full and the last cumulative
backup must be applied.


The following is an example of cumulative backup and restore:


A full backup of the data is taken on Monday evening. Each day after that, a cumulative backup
is taken. These cumulative backups back up ALL FILES that have changed since the LAST
FULL BACKUP.
On Tuesday, File 4 is added. Since File 4 is a new file that has been added since the last full
backup, it will be backed up Tuesday evening.
On Wednesday, File 5 is added. Now, since both File 4 and File 5 are files that have been added
or changed since the last full backup, both files will be backed up Wednesday evening.
On Thursday, File 6 is added. Again, File 4, File 5, and File 6 are files that have been added or
changed since the last full backup; all three files will be backed up Thursday evening.
On Friday morning, there is a corruption of the data, so the data must be restored from tape.
y The first step is to restore the full backup from Monday evening.
y Then, only the backup from Thursday evening is restored because it contains all the
new/changed files from Tuesday, Wednesday, and Thursday.
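
The cumulative example above can be replayed in Python in the same way. This is a minimal sketch that assumes files are only added or changed (deletions are ignored); the helper `changed_since` is hypothetical.

```python
def changed_since(current: dict, reference: dict) -> dict:
    """Return files that are new or whose content differs from the reference."""
    return {name: data for name, data in current.items()
            if reference.get(name) != data}


# Production state at the end of each day (file name -> content version).
monday = {"File 1": "v1", "File 2": "v1", "File 3": "v1"}
tuesday = {**monday, "File 4": "v1"}      # File 4 added
wednesday = {**tuesday, "File 5": "v1"}   # File 5 added
thursday = {**wednesday, "File 6": "v1"}  # File 6 added

full = dict(monday)  # Monday evening full backup
# Every cumulative backup is compared against the last FULL backup.
cumulatives = [changed_since(day, full) for day in (tuesday, wednesday, thursday)]

# Friday restore: the full backup plus only the most recent cumulative.
restored = {**full, **cumulatives[-1]}

print([sorted(cum) for cum in cumulatives])
print(restored == thursday)
```

The backup sets grow each day, but the restore needs only two of them.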


Lesson: Summary
Topics in this lesson included:
y Hot and Cold Backups.
y The levels of Backup Granularity.


This lesson provided an introduction to backup methods, including hot and cold backups, and
to the levels of backup granularity.


Lesson: Backup Architecture Topologies


Upon completion of this lesson, you will be able to:
y Describe DAS, LAN, SAN, Mixed topologies.
y Describe backup media considerations.


We have discussed the importance of the Backup plan and the different methods used when
creating a backup set. This lesson provides an overview of the different topologies and media
types that are used to support creating a backup set.


Backup Architecture Topologies


y There are 3 basic backup topologies:
– Direct Attached Based Backup
– LAN Based Backup
– SAN Based Backup

y These topologies can be integrated, forming a “mixed” topology


There are three basic topologies that are used in a backup environment: Direct Attached Based
Backup, LAN Based Backup, and SAN Based Backup.
There is also a fourth topology, called “Mixed”, which is formed when mixing two or more of
these topologies in a given situation.


Direct Attached Based Backups

[Diagram: LAN, Metadata, Data, Catalog, Backup Server, Storage Node, Backup Media]


Here, the backup data flows directly from the host to be backed up to the tape, without utilizing
the LAN. In this model, there is no centralized management and it is difficult to grow the
environment.
Direct Attached Based Backups are performed directly from the backup client’s disk to the
backup client’s tape devices. The key advantage of direct-attached backups is speed. The tape
devices can operate at the speed of the channels. Direct-attached backups optimize backup and
restore speed since the tape devices are close to the data source and dedicated to the host.
The disadvantages are that direct-attached backups impact host and application performance,
since backups consume host I/O bandwidth, memory, and CPU resources, and that they
potentially have distance restrictions if short-distance connections such as SCSI are used.


LAN Based Backups


[Diagram: Database Server, File Server, and Mail Server send Data and Metadata over the LAN to the Storage Node and Backup Server]



In this model, the backup data flows from the host to be backed up to the tape through the LAN.
There is centralized management, but there may be an issue with the LAN utilization since all
data goes through it.
As we have defined previously, Backup Metadata contains information about what has been
backed up, such as file names, time of backup, size, permissions, ownership, and most
importantly, tracking information for rapid location and restore. It also indicates where it has
been stored, for example, which tape. Data, the contents of files, databases, etc., is the primary
information source to be backed up. In a LAN Based Backup, the Backup Server is the central
control point for all backups. The metadata and backup policies reside in the Backup Server.
Storage Nodes control backup devices and are controlled by the Backup Server.
The advantages of LAN Based Backup include the following:
y LAN backups enable an organization to centralize backups and pool tape resources.
y The centralization and pooling can enable standardization of processes, tools, and backup
media. Centralization of tapes can also improve operational efficiency.
Disadvantages are:
y The backup process has an impact on production systems, the client network, and the
applications.
y It consumes CPU, I/O bandwidth, LAN bandwidth, and memory.
y In order to maintain finite backup points, applications might have to be halted and databases
shut down.


SAN Based Backups (LAN Free)


[Diagram: Mail Server / Storage Node, LAN, SAN, and Backup Server; Data moves over the SAN and Metadata over the LAN]

A SAN based backup, also known as LAN Free backup, is achieved when there is no backup
data movement over the LAN. In this case, all backup data travels through a SAN to the
destination backup device.
This type of backup still requires network connectivity from the Storage Node to the Backup
Server, since metadata always has to travel through the LAN.
LAN-free backups use Storage Area Networks (SANs) to move backup data rapidly and reliably.
The SAN is usually used in conjunction with backup software that supports tape device sharing.
A SAN-enabled backup infrastructure introduces these advantages to the backup process. It
provides Fibre Channel performance, reliability, and distance. It requires fewer processes and
reduced overhead. It does not use the LAN to move backup data and eliminates or reduces
dedicated backup servers. Finally, it improves backup and restore performance.
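
The difference in LAN load between the LAN based and SAN based topologies can be sketched as follows. The function name `lan_traffic_gb` and the sample sizes (500 GB of backup data, 1 GB of metadata) are assumptions chosen only for illustration.

```python
def lan_traffic_gb(topology: str, data_gb: float, metadata_gb: float) -> float:
    """Traffic that crosses the LAN during one backup, in GB."""
    if topology == "LAN":
        # LAN based backup: both data and metadata travel over the LAN.
        return data_gb + metadata_gb
    if topology == "SAN":
        # SAN based (LAN free) backup: data uses the SAN, metadata the LAN.
        return metadata_gb
    raise ValueError(f"unknown topology: {topology}")


lan_load = lan_traffic_gb("LAN", data_gb=500.0, metadata_gb=1.0)
san_load = lan_traffic_gb("SAN", data_gb=500.0, metadata_gb=1.0)
print(lan_load, san_load)
```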


SAN/LAN Mixed Based Backups


[Diagram: Database Server, Mail Server, Storage Node, and Backup Server connected by LAN and SAN; labels: Data, Metadata]

A SAN/LAN Mixed Based Backup environment is achieved by using two or more of the
topologies described in the previous slides. In this example, some servers are SAN based while
others are LAN based.


Backup Media
y Tape
– Traditional destination for backups
– Sequential access
– No protection

y Disk
– Random access
– Protected by the storage array (RAID, hot spare, etc)


There are two common types of Backup media, tape and disk.


Multiple Streams on Tape Media

[Diagram: data from Stream 1, Stream 2, and Stream 3 interleaved on a single tape]

y Multiple streams interleaved to achieve higher throughput on tape
– Keeps the tape streaming, for maximum write performance
– Helps prevent tape mechanical failure
– Greatly increases time to restore


Tape drive streaming is recommended by all vendors, in order to keep the drive busy. If you
do not keep the drive busy during the backup process (writing), performance will suffer.
Multiple streaming helps to improve performance drastically, but it generates one issue as well:
the backup data becomes interleaved, and thus the recovery times are increased.
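
The restore penalty of interleaving can be illustrated with a small Python model. It assumes a simple round-robin write order and treats the tape as a list that must be read sequentially; the function names `interleave` and `restore` are hypothetical, and real backup software may multiplex differently.

```python
from itertools import zip_longest


def interleave(streams: dict) -> list:
    """Write blocks from several streams to one tape in round-robin order."""
    tape = []
    for blocks in zip_longest(*streams.values()):
        for name, block in zip(streams, blocks):
            if block is not None:
                tape.append((name, block))
    return tape


def restore(tape: list, wanted: str):
    """Read the tape sequentially and keep only the wanted stream's blocks."""
    blocks, blocks_read = [], 0
    for name, block in tape:
        blocks_read += 1  # every block is passed over, wanted or not
        if name == wanted:
            blocks.append(block)
    return blocks, blocks_read


streams = {
    "Stream 1": ["1a", "1b", "1c"],
    "Stream 2": ["2a", "2b", "2c"],
    "Stream 3": ["3a", "3b", "3c"],
}
tape = interleave(streams)
blocks, blocks_read = restore(tape, "Stream 2")
print(tape[:3])
print(blocks, blocks_read)
```

In this model, recovering three blocks of one stream means reading past all nine blocks on the tape.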


Backup to Disk
y Backup to disk minimizes tape in backup environments
by using disk as the primary destination device
– Cost benefits
– No process changes needed
– Better service levels

y Backup to disk aligns backup strategy to RTO and RPO


Backup to disk replaces tape and its associated devices, as the primary target for backup, with
disk. Backup to disk systems offer major advantages over equivalent scale tape systems, in
terms of capital costs, operating costs, support costs, and quality of service. It can be
implemented fully on day 1 or over a phased approach.
While no process changes are needed, any number of enhancements to the process, and to the
services provided, are now possible. Backup to disk can be a great enabler. Instead of having tape
technology drive the business processes, the business goals drive the backup strategy.


Tape versus Disk – Restore Comparison

[Chart: Recovery Time in Minutes*]
Disk Backup / Restore: 24 Minutes
Tape Backup / Restore: 108 Minutes
*Total time from point of failure to return of service to e-mail users

Typical Scenario:
y 800 users, 75 MB mailbox
y 60 GB database

Source: EMC Engineering and EMC IT


This example shows a typical recovery scenario using tape and disk. As you can see, recovery
with disk provides much faster recovery than does recovery with tape.
Keep in mind that this example involves data recovery only. The time it takes to bring the
application online is a separate matter. Even so, you can see in this example that the restore
from disk (24 minutes) was roughly 4.5 times faster than the restore from tape (108 minutes).
What you don’t see is the mitigated risk of media failure, and the time saved in not having to
locate and load the correct tapes before being able to begin the recovery process.


Three Backup / Restore Solutions based on RTO


Solution         Restore time   Log playback   Total
BCV / Clone      2 Min.         17 Min.        19 Minutes
Backup on ATA    24 Min.        17 Min.        41 Minutes
Backup on tape   108 Min.       17 Min.        125 Minutes

Recovery Time in Minutes*
*Total time from point of failure to return of service to e-mail users

Typical Scenario:
y 800 users, 75 MB mailbox
y 60 GB DB – restore time
y 500 MB logs – log playback

y Time of last image dictates the log playback time
y Larger data sets extend the recovery time (ATA and tape)

© 2006 EMC Corporation. All rights reserved. Business Continuity - 65

The diagram shows typical recovery scenarios using different technical solutions. As you can
see, recovery from Business Continuance Volumes (BCVs) or clones is the quickest recovery
method.
It is important to note that using BCVs or clones on disk enables you to make copies of your
data more often. This improves RPO (the point in time to which you can recover). It also
improves RTO, because the log files will be smaller, which reduces the log playback time.
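
The totals on the slide are the sum of two phases: restoring the last image and replaying the logs. A small sketch of that sum, using the slide's figures (the dictionary layout and function name are illustrative, not part of any product):

```python
# Restore and log-playback times from the slide, in minutes.
solutions = {
    "BCV / Clone": {"restore": 2, "log_playback": 17},
    "Backup on ATA": {"restore": 24, "log_playback": 17},
    "Backup on tape": {"restore": 108, "log_playback": 17},
}

def total_recovery_minutes(times):
    """Recovery time = time to restore the image + time to replay the logs."""
    return times["restore"] + times["log_playback"]

totals = {name: total_recovery_minutes(t) for name, t in solutions.items()}
for name, minutes in totals.items():
    print(f"{name}: {minutes} minutes")
```

Because log playback is a fixed 17 minutes in all three cases here, the restore phase accounts for the whole difference between the solutions.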


Traditional Backup, Recovery and Archive Approach

y Production environment grows
– Requires constant tuning and data placement to maintain performance
– Need to add more tier-1 storage

y Backup environment grows
– Backup windows get longer and jobs do not complete
– Restores take longer
– Requires more tape drives and silos to keep up with service levels

y Archive environment grows
– Impacts flexibility to retrieve content when requested
– Requires more media, adding management cost
– No investment protection for long term retention requirements

[Diagram: Production, Backup Process, Archive Process]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 66

In a traditional approach for backup and archive, businesses take a backup of production.
Typically backup jobs use weekly full backups and nightly incremental backups. Based on
business requirements, they will then copy the backup jobs and eject the tapes to have them sent
offsite, where they will be stored for a specified amount of time.
The problem with this approach is simple - as the production environment grows, so does the
backup environment.


Differences Between Backup / Recovery & Archive


Backup / Recovery | Archive
A secondary copy of information | Primary copy of information
Used for recovery operations | Available for information retrieval
Improves availability by enabling application to be restored to a specific point in time | Adds operational efficiencies by moving fixed / unstructured content out of operational environment
Typically short-term (weeks or months) | Typically long-term (months, years, or decades)
Data typically overwritten on periodic basis (e.g., monthly) | Data typically maintained for analysis, value generation, or compliance
Not for regulatory compliance, though some are forced to use | Useful for compliance and should take into account information-retention policy
© 2006 EMC Corporation. All rights reserved. Business Continuity - 67

Backup/Recovery and Archiving support different business goals. This slide compares and
contrasts the most significant differences.


New Architecture for Backup, Recovery & Archive

[Diagram: Production in the center, with the Backup Process (3) on one side and the Archive Process (2) on the other; step 4 is marked on both sides]

1. Understand the environment
2. Actively archive valuable information to tiered storage
3. Back up active production information to disk
4. Retrieve from archive or recover from backup

© 2006 EMC Corporation. All rights reserved. Business Continuity - 68

The recovery process is much more important than the backup process. It is based on the
appropriate recovery-point objectives (RPOs) and recovery-time objectives (RTOs). The process
usually drives a decision to have a combination of technologies in place, from online Business
Continuance Volumes (BCVs), to backup to disk, to backup to tape for long-term, passive
RPOs.
Archive processes are determined not only by the required retention times, but also by retrieval-
time service levels and the availability requirements of the information in the archive.
For both processes, a combination of hardware and software is needed to deliver the appropriate
service level. The best way to discover the appropriate service level is to classify the data and
align the business applications with it.


Lesson: Summary
Topics in this lesson included:
y The DAS, LAN, SAN, and Mixed topologies.
y Backup media considerations.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 69

This lesson provided an overview of the different topologies and media types that support
creating a backup set.


Lesson: Managing the Backup Process


Upon completion of this lesson, you will be able to:
y Describe features and functions of common
Backup/Recovery applications.
y Describe the Backup/Recovery process management
considerations.
y Describe the importance of the information found in
Backup Reports and in the Backup Catalog.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 70

We have discussed the planning and operations of creating a Backup. This lesson provides an
overview of Management activities and applications that help manage the Backup and Recovery
process.


How a Typical Backup Application Works


y Backup clients are grouped and associated with a Backup
schedule that determines when and which backup type will
occur.
y Groups are associated with Pools, which determine which
backup media will be used.
y Each backup media has a unique label.
y Information about the backup is written to the Backup Catalog
during and after it completes. The Catalog shows:
– when the Backup was performed, and
– which media was used (label).
y Errors and other information are also written to a log.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 71

The process for using a Backup application includes the following:


y Backup clients are grouped and associated with a Backup schedule that determines when and
which backup type will occur.
y Groups are associated with Pools, which determine which backup media will be used. Each
backup media has a unique label.
y Information about the backup is written to the Backup Catalog during the backup and after it
completes. The Catalog shows when the Backup was performed and which media was used (label).
y Errors and other information are also written to a log.
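
The relationships described above can be sketched as a small data model. This is an illustration only: the class names, field names, and the choice to always write to the first medium in a pool are assumptions made for the example, not the design of any particular backup product.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Pool:
    name: str
    media_labels: list  # each backup medium carries a unique label

@dataclass
class Group:
    name: str
    clients: list
    schedule: dict      # weekday -> backup type, e.g. {"Sun": "full"}
    pool: Pool

@dataclass
class Catalog:
    entries: list = field(default_factory=list)

    def record(self, client, backup_type, label, when):
        self.entries.append(
            {"client": client, "type": backup_type, "label": label, "when": when}
        )

def run_group(group, weekday, catalog, log, now):
    """Back up every client in the group according to the group's schedule."""
    backup_type = group.schedule.get(weekday)
    if backup_type is None:
        log.append(f"{group.name}: no backup scheduled on {weekday}")
        return
    label = group.pool.media_labels[0]  # simplification: always use the first medium
    for client in group.clients:
        catalog.record(client, backup_type, label, now)
        log.append(f"{client}: {backup_type} backup written to {label}")

pool = Pool("Default", ["TAPE001"])
group = Group("mail-servers", ["mail01", "mail02"],
              {"Sun": "full", "Mon": "incremental"}, pool)
catalog, log = Catalog(), []
run_group(group, "Sun", catalog, log, datetime(2006, 1, 1, 22, 0))
print(catalog.entries[0]["label"], len(log))
```

The catalog answers "when was it backed up, and on which medium", while the log keeps the free-form messages, mirroring the split described in the notes.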


Backup Application User Interfaces


There are typically two types of user interfaces:
y Command Line Interface – CLI
y Graphical User Interfaces – GUI

© 2006 EMC Corporation. All rights reserved. Business Continuity - 72

There are typically two types of user interfaces. With a Command Line Interface (CLI), backup
administrators usually write scripts to automate common tasks, such as sending reports via email.
A Graphical User Interface (GUI) controls the backup and restore process across multiple backup
servers, multiple storage nodes, and multiple platforms/operating systems. It is a single,
easy-to-use interface that provides most (if not all) common administrative tasks.
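
As an illustration of the kind of script an administrator might write around a CLI, the sketch below wraps a plain-text report in an email message using Python's standard library. The report text, the addresses, and the function name are placeholders; a real script would take the report from the backup product's own command-line tools, which are not shown here.

```python
from email.message import EmailMessage

def build_report_email(report_text, sender, recipient):
    """Wrap a plain-text backup report in an email message ready to send."""
    msg = EmailMessage()
    msg["Subject"] = "Nightly backup report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(report_text)
    return msg

# Placeholder report text; in practice this would come from the backup CLI.
report = "completed=12 failed=1"
message = build_report_email(report, "backup@example.com", "admin@example.com")
print(message["Subject"])
# Sending would use smtplib.SMTP(...).send_message(message) against your mail server.
```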


Managing the Backup and Restore Process


y Running the B/R Application: Backup
– The backup administrator configures it to be started, most (if not all)
of the time, automatically
– Most backup products offer the ability for the backup client to initiate
its own backup (usually disabled)

y Running the B/R Application: Restore


– There is usually a separate GUI to manage the restore process
– Information is pulled from the backup catalog when the user is
selecting the files to be restored
– Once the selection is finished, the backup server starts reading from
the required backup media, and the files are sent to the backup
client

© 2006 EMC Corporation. All rights reserved. Business Continuity - 73

There are common tasks associated with managing a Backup or Restore activity using the B/R
Application. For backup, the backup administrator configures the application to start backups
automatically most (if not all) of the time. Most backup products also let the backup client
initiate its own backup, although this feature is usually disabled.
In restore, there is usually a separate GUI to manage the restore process. Information is pulled
from the backup catalog when the user is selecting the files to be restored. Once the selection is
finished, the backup server starts reading from the required backup media, and the files are sent
to the backup client.
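
The restore flow can be sketched as a catalog lookup followed by a list of media to read. The catalog layout and function name below are hypothetical, and the sketch restores only the latest version of each selected file.

```python
# Hypothetical catalog: one entry per backed-up file version.
catalog = [
    {"file": "/home/a/report.doc", "when": "2006-01-01", "label": "TAPE001"},
    {"file": "/home/a/report.doc", "when": "2006-01-02", "label": "TAPE002"},
    {"file": "/home/b/data.xls", "when": "2006-01-01", "label": "TAPE001"},
]

def media_for_restore(catalog, selected_files):
    """Return the media labels holding the latest backup of each selected file."""
    latest = {}
    for entry in catalog:
        if entry["file"] in selected_files:
            current = latest.get(entry["file"])
            # ISO dates compare correctly as strings.
            if current is None or entry["when"] > current["when"]:
                latest[entry["file"]] = entry
    return sorted({entry["label"] for entry in latest.values()})

needed = media_for_restore(catalog, {"/home/a/report.doc", "/home/b/data.xls"})
print(needed)
```

Without the catalog, the backup server would have no way to know which media to mount for this selection.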


Backup Reports
y Backup products also offer reporting features.
y These features rely on the backup catalog and log files.
y Reports are meant to be easy to read and provide
important information such as:
– Amount of data backed up
– Number of completed backups
– Number of incomplete backups (failed)
– Types of errors that may have occurred

y Additional reports may be available, depending on the


backup software product used.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 74

Backup products also offer reporting features. These features rely on the backup catalog and log
files. Reports are meant to be easy to read and provide important information such as amount of
data backed up, number of completed backups, number of incomplete backups (failed), and
types of errors that may have occurred. Additional reports may be available, depending on the
backup software product used.
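
A report of this kind is essentially a summary over job records. The sketch below assumes a simple list of job records (a made-up structure standing in for what a product would read from its catalog and logs) and computes the four figures named above.

```python
from collections import Counter

# Hypothetical job records, standing in for catalog and log contents.
jobs = [
    {"client": "mail01", "status": "completed", "bytes": 60_000_000_000, "error": None},
    {"client": "web01", "status": "completed", "bytes": 5_000_000_000, "error": None},
    {"client": "db01", "status": "failed", "bytes": 0, "error": "media write error"},
]

def summarize(jobs):
    """Collect the four figures a basic backup report lists."""
    return {
        "data_backed_up_gb": sum(j["bytes"] for j in jobs) / 1e9,
        "completed": sum(1 for j in jobs if j["status"] == "completed"),
        "failed": sum(1 for j in jobs if j["status"] == "failed"),
        "error_types": Counter(j["error"] for j in jobs if j["error"]),
    }

report = summarize(jobs)
print(report)
```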


Importance of the Backup Catalog


y As you can see, backup operations strongly rely on the
backup catalog
y If the catalog is lost, the backup software alone has no
means to determine where to find a specific file backed
up two months ago, for example
y It can be reconstructed, but this usually means that all of
the backup media (i.e. tapes) have to be read
y It’s a good practice to protect the catalog
– By replicating the file system where it resides to a remote location
– By backing it up

y Some backup products have built-in mechanisms to


protect their catalog (such as automatic backup)
© 2006 EMC Corporation. All rights reserved. Business Continuity - 75

As you can see, backup operations strongly rely on the backup catalog. If the catalog is lost, the
backup software alone has no means to determine where to find a specific file backed up in the
past. The catalog can be reconstructed, but this usually means that all of the backup media (i.e.,
tapes) have to be read. It’s a good practice to protect the catalog by replicating the file system
where it resides to a remote location, and by backing it up. Some backup products have built-in
mechanisms to protect their catalog (such as automatic backup).
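
To see why reconstruction is expensive, consider a sketch in which the only way to rebuild the index is to scan every medium. The media contents and function name are invented for the example.

```python
# Hypothetical media contents: each labeled medium lists the files written to it.
media = {
    "TAPE001": [("/home/a/report.doc", "2006-01-01")],
    "TAPE002": [("/home/a/report.doc", "2006-01-02"),
                ("/home/b/data.xls", "2006-01-02")],
}

def rebuild_catalog(media):
    """Rebuild a file -> [(date, label)] index by reading every medium in full."""
    rebuilt = {}
    media_read = 0
    for label, contents in media.items():
        media_read += 1  # every medium must be mounted and scanned
        for path, when in contents:
            rebuilt.setdefault(path, []).append((when, label))
    return rebuilt, media_read

rebuilt, media_read = rebuild_catalog(media)
print(media_read, rebuilt["/home/a/report.doc"])
```

The number of media read equals the size of the whole media set, regardless of how few files you actually need, which is the cost a protected catalog avoids.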


Lesson: Summary
Topics in this lesson included:
y The features and functions of common Backup/Recovery
applications.
y The Backup/Recovery process management
considerations.
y The importance of the information found in Backup
Reports and in the Backup Catalog.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 76

This lesson provided an overview of Backup and Recovery management activities and tools.


Module Summary
Key points covered in this module:
y The best practices for planning Backup and Recovery.
y The common media and types of data that are part of a
Backup and Recovery strategy.
y The common Backup and Recovery topologies.
y The Backup and Recovery Process.
y Management considerations for Backup and Recovery.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 77

These are the key points covered in this module. Please take a moment to review them.


Apply Your Knowledge…


Upon completion of this topic, you will be able to:

y Describe EMC’s product implementation of a Backup


and Recovery solution.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 78


EMC NetWorker
Tiered Protection and Recovery Management
Remove risk
– Faster and more consistent data backup

Improve reliability
– Keep recovery copies fresh and reduce process errors

Lower total cost of ownership
– Centralization and ease of use

[Diagram: protection tiers ordered from Low to High SERVICE-LEVEL REQUIREMENTS: Basic (tape backup and recovery), Backup to disk (disk-backup option), Advanced backup (snapshot management)]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 79

NetWorker’s installed base of more than 20,000 customers worldwide is a testament to the
product’s market leadership.
Data-growth rates are accelerating, and the spectrum of data and systems that live in
environments runs the gamut from key applications that are central to the business to other types
of information that may be less important.
What is interesting is that the industry has been somewhat stuck for several years at a one-size-
fits-all strategy to backup and recovery. We’re referring to a “basic” backup scenario, or
traditional tape backup.
Tape backup serves a noble purpose and is working very well for some companies—it’s been
EMC’s core business for some time, so EMC knows it well. But shifting market dynamics, as
well as more demanding business environments, have led to other important choices for
backup.
Today, traditional tape faces the challenge of meeting service-level requirements for protection
and availability of an ever-increasing quantity of enterprise data. This is why EMC has built into
NetWorker key options to meet the needs of a wide range of environments. This includes the
ability to use disk for backup, as well as to take advantage of advanced-backup capabilities that
connect backup with array-based snapshot and replication management. These provide you with
essentially the highest-possible performance levels for backup and recovery. As the value of
information changes over time, you may choose any one of these, or a combination thereof, to
meet your needs.

NetWorker Backup and Recovery


Solution Features
y Enterprise protection
– Critical applications
– Heterogeneous platforms and storage
– Scalable architecture
– 256-bit AES encryption and secure authentication
y Centralized management
– Graphical user interface
– Customizable reporting
– Wizard-driven configuration
y Performance
– Data multiplexing
– Advanced indexing
– Efficient media management

[Diagram: Basic Architecture. Heterogeneous clients, key applications, and NAS (NDMP) connect over the LAN to the backup server and Storage Node, which reach the tape library over the SAN.]
© 2006 EMC Corporation. All rights reserved. Business Continuity - 80

The first key focus is on providing complete coverage. Enterprise protection means the ability to
provide coverage for all the components in the environment. NetWorker provides data
protection for the widest heterogeneous support of operating systems, and is integrated with
leading databases and applications for complete data protection.
A single NetWorker server can be used to protect all clients and servers in the environment—or
secondary servers can be employed, which EMC calls Storage Nodes, as a conduit for additional
processing power or to protect large critical servers directly across a SAN without having to take
data back over the network. Such LAN-free backup is standard with NetWorker.
NetWorker can easily back up LAN, SAN, or WAN environments, with
coverage for key storage such as NAS. As a matter of fact, NetWorker’s NAS-protection
capabilities, leveraging the Network Data Management Protocol (NDMP), are unequaled.
The key here is that NetWorker can easily grow and scale as needed in the environment and
provide advanced functionality, including clustering technologies, open-file protection and
compatibility with tape hardware and the new class of virtual-tape and virtual-disk libraries.
While NetWorker encompasses all these pieces in the environment, EMC has made sure there is
a common set of management tools.
With NetWorker, EMC has focused on what it takes within environments both large and small
to get the best performance possible, in terms of both speed and reliability. This means the
inclusion of capabilities such as multiplexing to protect data as quickly as possible while making
use of the backup storage’s maximum bandwidth. It also means ensuring that the way in which
EMC indexes and manages the saving of data is designed to provide not only the best
performance but also stability and reliability.

Critical Application and Database Protection

Backup without Application Modules: Offline (Cold)
[Diagram: shut down application, back up application, restart application; labeled DOWNTIME]

Backup with NetWorker Application Modules
[Diagram: the application stays up while the NetWorker MODULE performs the SAVE; labeled 24x7 OPERATIONS]

Integration with application APIs for backup and recovery

© 2006 EMC Corporation. All rights reserved. Business Continuity - 81

Applications can be backed up either offline or online. NetWorker by itself can back up closed
applications as flat files. During an offline, or cold, backup, the application is shut down, backed
up and restarted after the backup is finished.
This is fine, but during the shutdown and backup period, the application will be unavailable.
This is not acceptable in today’s business environments. This is why EMC has worked to
integrate NetWorker with applications to provide online backup—specifically, with the use of
NetWorker in conjunction with NetWorker Modules.
During an online, or hot, backup, the application is open and is backed up while open. The
NetWorker Module extracts data for backup with an API; the application need not be shut down,
and remains open while the backup finishes.
NetWorker supports a wide range of applications for online backup with granular-level
recovery, including:
y Oracle
y Microsoft Exchange
y Microsoft SQL Server
y Lotus Notes
y Sybase
y Informix
y IBM DB2
y EMC Documentum

Media-Management Advantages
y Open Tape Format
– Datastream multiplexing
– Self-contained indexing
– Cross-platform format
¾ UNIX <-> Windows <-> Linux
– Minimize impact of tape corruption

y Dynamic drive sharing
– Cross-platform tape-drive sharing
– On-demand device usage
– Reduce hardware total cost of ownership

[Diagram: a NetWorker UNIX/Linux server and a NetWorker Windows server sharing tape drives]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 82

One key advantage of NetWorker is its media-management features.


The first feature is Open Tape Format. It is NetWorker’s way of recording data to tape,
specifically designed to provide several advantages:
y Data can be multiplexed, or interleaved, for performance. This essentially means data can be
accepted and written to the backup media as it comes in, regardless of what order it comes
in, so the tape drives can keep spinning. This enables you to back up faster, but also reduces
wear and tear on the tape hardware, which is more susceptible to error if it is continually
stopping and starting.
y Tapes created by NetWorker are self-describing, so if everything else is gone except for the
tape, you’ll be able to load it and understand what data is there to be restored.
y As the image on the right indicates, Open Tape Format allows you to move tape media
between systems and servers on unlike operating systems—with Open Tape Format, a tape
that began life on a UNIX-based system can easily be read on a Windows-based system.
This is key not just for disaster recovery, but for the entire environment, as you go through a
regular system lifecycle and adopt new platforms.
y Also, with Open Tape Format, NetWorker can skip bad spots on tape and continue data
access. When other solutions on the market encounter any error on tape, they are unable to
do anything further with the tape. Imagine if there is a bad spot 100 MB into a backup
tape…
y Finally, NetWorker can broker tape devices on a SAN to get the best use and performance
out of the hardware investment. So, instead of hard-assigning tape drives to a backup server
or Storage Node, you can dynamically allocate any drive on demand.
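
The multiplexing idea described in the notes for this slide can be illustrated with a short sketch. It interleaves chunks from several client streams and tags each chunk with its owner so that one client's data can be pulled back out. The round-robin order and the tuple layout are assumptions for illustration; the actual on-tape layout of Open Tape Format is not described in this material.

```python
from collections import deque

def multiplex(streams):
    """Interleave chunks from several client streams in round-robin order.

    Each chunk is tagged with its stream name so it can be separated on restore.
    """
    queues = {name: deque(chunks) for name, chunks in streams.items()}
    tape = []
    while any(queues.values()):
        for name, queue in queues.items():
            if queue:
                tape.append((name, queue.popleft()))
    return tape

def demultiplex(tape, name):
    """Recover one client's data from the interleaved tape image."""
    return [chunk for owner, chunk in tape if owner == name]

streams = {"clientA": ["a1", "a2", "a3"], "clientB": ["b1"], "clientC": ["c1", "c2"]}
tape = multiplex(streams)
print(tape)
```

Because the writer always has a chunk from some stream to accept, the drive is less likely to stop and restart while waiting on a single slow client.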

NetWorker DiskBackup Option

y High performance
– Simultaneous-access operations
– No penalty on restore versus tape

y Policy-based migration of data from disk to tape
– Automated staging and cloning
– Up to 50% faster
– Clone backup jobs as they complete
– Reduce wear and tear on tape drives and cartridges

y Superior capability
– Operational backup and recovery for all clients, including NAS with NDMP
– Direct file access for fast recovery

[Diagram: Backup-to-Disk Architecture. Heterogeneous clients, key applications, and NAS connect over the LAN to the backup server and Storage Node, which write over the SAN to a disk-backup target and a tape library.]
© 2006 EMC Corporation. All rights reserved. Business Continuity - 83

The focus here is the resolution of the top pain points around traditional tape-based backup.
Performance—NetWorker backup to disk allows for simultaneous-access operations to a
volume, both reads (restore, staging, cloning) and writes (backups). With NetWorker, as
opposed to with traditional tape-only backup, you don’t "pay a penalty on restore."
Also, cloning from disk to tape is up to 50% faster. Why? As soon as the Save Set (backup job)
is complete, the cloning process can begin without the Administrator having to wait for all the
backup jobs to complete. NetWorker can back up to disk and clone to tape at the same time.
You don’t have to spend 12–16 hours a day running clone operations (tape-to-tape copies)—in
fact, you might actually be able to eliminate the clone jobs. Some NetWorker customers have
seen cloning times reduced from 12–16 hours daily to three to four hours daily.
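
A simple timing model shows why starting each clone as soon as its Save Set completes shortens the overall window. The durations below are made up, and the model assumes one backup stream and one clone stream that each process jobs in order; it is an illustration of the overlap, not a reproduction of EMC's measurements.

```python
def sequential_finish(backup_hours, clone_hours):
    """Cloning starts only after every backup job has finished."""
    return sum(backup_hours) + sum(clone_hours)

def overlapped_finish(backup_hours, clone_hours):
    """Each save set is cloned as soon as its backup completes.

    Assumes one backup stream and one clone stream, each handling jobs in order.
    """
    backup_done = 0.0
    clone_done = 0.0
    for backup, clone in zip(backup_hours, clone_hours):
        backup_done += backup
        # A clone needs its save set finished and the clone stream free.
        clone_done = max(clone_done, backup_done) + clone
    return clone_done

backups = [1.0] * 8   # eight save sets, one hour each (made-up figures)
clones = [1.0] * 8
print(sequential_finish(backups, clones), overlapped_finish(backups, clones))
```

With these figures the overlapped schedule finishes in 9 hours instead of 16, and the saving approaches one half as the number of equal-sized jobs grows.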
Cloning from disk to tape also augments the disaster-recovery strategy for tape. As data grows,
more copies must be sent offsite. Because NetWorker backup to disk improves cloning
performance, you can now continue to meet the daily service-level agreements to get tapes
offsite to a vaulting provider.
Taking the idea of leveraging disk even further leads us into a discussion of NetWorker’s
advanced backup capability, which also leverages disk-based technologies.


Advanced Backup - Snapshots and CDP


y Integration of backup with snapshots, full-volume mirrors, and Continuous Data Protection (CDP)
y Instant restore
y Off-host backups
y Achieve stringent recovery-time objectives (RTOs), recovery-point objectives (RPOs)

It is expected that snapshot technology for data protection will surpass backup to tape as the
trend in data protection as organizations continue to focus on recovery times

[Diagram: a production server and its production information; snapshots taken at 11:00 a.m. and 5:00 p.m.; a backup of a snapshot ("Backup snap") taken by the backup server at 10:00 p.m.; Recover and Backup paths]
© 2006 EMC Corporation. All rights reserved. Business Continuity - 84

Disk-solution providers, like EMC, provide array-based abilities to perform snapshots and
replication. These “point-in-time” copies of data allow for instant recovery of disk and data
volumes. Many are likely familiar with array-based replication or snapshot capabilities.
NetWorker is engineered to take advantage of these capabilities by providing direct tie-ins with
EMC offerings such as CLARiiON with SnapView, or Symmetrix with TimeFinder/Snap. This
will enable you to begin to meet the most stringent recovery requirements.
In a study done in the spring of 2004, the Taneja Group identified that the market intends to rely
on snapshots for ensuring application-data availability and rapid recoveries. The figures
represent a scale of one to five, with one as the low point, five as the high point:
y Rapid application recovery (4.34)
y Ability to automate backup to tape (4.13)
y Instant backup (3.98)
y Roll back to point in time (3.88)
y Integration with backup strategy (3.87)
y Flexibility to leverage hardware (3.61)
y Multiple fulls throughout day (3.49)


NetWorker PowerSnap Module


y Policy-based management
– Administer snapshots in NetWorker
– Schedule, create, retain, and delete snapshots by policy

y Third-party integration
– Leverage third-party replication technology
¾ Array-based (Symmetrix DMX, CLARiiON CX, etc.)
¾ Software-based (RecoverPoint)

y Application recovery
– Integration with Application Modules to ensure consistent state
¾ Exchange / SQL / Oracle / SAP

[Diagram: Advanced Backup. Heterogeneous clients, key applications, and NAS connect over the LAN to the backup server and Storage Node, which reach a CLARiiON with SnapView and a tape library over the SAN.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 85

In addition to traditional backup-and-recovery application modules for disk and tape, the
snapshot management capability called NetWorker PowerSnap enables you to meet the
demanding service-level agreement requirements in both tape and disk environments by
seamlessly integrating snapshot technology and applications. NetWorker PowerSnap software
works with NetWorker Modules to enable snapshot backups of applications—with consistency.
PowerSnap performs snapshot management by policy—just like standard backup policies to
tape or disk. It uses these policies to determine how many snapshots to create, how long to retain
the snapshots, when to do backups to tape from specified snapshots…all based on business
needs that you define.
For example, snapshots might be taken every few hours, and the three most recent are retained.
You can easily leverage any of those snapshots to back up to tape in an off-host fashion—i.e.,
with no impact to the application servers.
PowerSnap manages the full life cycle of snapshots, including creation, scheduling, backups,
and expiration. This, along with its orchestration with applications, provides a comprehensive
solution for complete application-data protection to help you meet the most stringent of RTOs
and RPOs.
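
The retention part of such a policy ("keep the three most recent snapshots") can be sketched in a few lines. The function name, the three-hour interval, and the list-of-timestamps representation are assumptions for illustration and do not reflect how PowerSnap itself is configured.

```python
from datetime import datetime, timedelta

def take_snapshot(snapshots, now, retain):
    """Add a snapshot taken at `now`, then expire all but the `retain` most recent.

    `retain` must be at least 1. Returns the list of expired snapshot times.
    """
    snapshots.append(now)
    snapshots.sort()
    expired = snapshots[:-retain]
    del snapshots[:-retain]
    return expired

snapshots = []
start = datetime(2006, 1, 1, 8, 0)
for i in range(5):                       # a snapshot every 3 hours
    expired = take_snapshot(snapshots, start + timedelta(hours=3 * i), retain=3)

print([s.strftime("%H:%M") for s in snapshots])
```

Any of the retained snapshots could then serve as the source for an off-host backup to tape.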


NetWorker SnapImage Module

y Block-level backups
– Host-based snapshot
– Targeted at high-density file systems
– Single-file restore
– Sparse backups

y High performance
– Significant backup-and-restore performance impact: up to 10 times faster
– Drive tape at rated speeds
– Optional network-accelerated serverless backup with Cisco intelligent switch

[Diagram: Advanced Backup of a high-density file system with 10,000,000+ files and 1,000,000+ directories]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 86

If there are servers with lots of files and lots of directories—what we refer to as high-density file
systems—backup and recovery are particularly challenging. With so many files, traditional
backup struggles to keep up with backup windows.
NetWorker SnapImage enables block-level backup of these file systems while maintaining the
ability to restore a single file. SnapImage is intelligent enough to also support sparse backups.
y Sparse files contain data with portions of empty blocks, or “zeroes.”
y NetWorker backs up only the non-zero blocks, thereby reducing:
− Time for backup
− Amount of backup-media space consumed
y Sparse-file examples:
− Large database files with deleted data or unused database fields
− Files from image applications
y With the NetWorker SnapImage Module, backup and recovery performance for servers with
high-density file systems is significantly increased:
− The time required to back up 18.8 million 1 KB files in a 100 GB file system with a
block size of 4 KB can be reduced from 31 to seven hours.
− The time required to perform a Save Set restore of one million 4 KB files in a 5.36 GB
internal disk can be reduced from 72 to seven minutes.
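
The sparse-backup idea above, saving only the non-zero blocks, can be sketched as follows. The 4 KB block size and the dictionary used to hold the saved blocks are choices made for this example, not a description of SnapImage's internal format.

```python
BLOCK_SIZE = 4096  # 4 KB blocks, an illustrative choice

def sparse_backup(data, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that contain any non-zero byte."""
    saved = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if any(block):
            saved[offset // block_size] = block
    return saved

def sparse_restore(saved, total_size, block_size=BLOCK_SIZE):
    """Rebuild the original bytes, filling skipped blocks with zeroes."""
    out = bytearray(total_size)
    for index, block in saved.items():
        start = index * block_size
        out[start:start + len(block)] = block
    return bytes(out)

# A 10-block file in which only blocks 0 and 7 hold data.
data = bytearray(10 * BLOCK_SIZE)
data[0:5] = b"hello"
data[7 * BLOCK_SIZE:7 * BLOCK_SIZE + 5] = b"world"
data = bytes(data)

saved = sparse_backup(data)
print(sorted(saved), len(saved) * BLOCK_SIZE, "of", len(data), "bytes stored")
```

Both the backup time and the media space consumed scale with the number of non-zero blocks rather than with the file's nominal size.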


Solution Example: Major Telecom Company


Enterprise-Information Protection

Business Challenge:
y Complex application environment
y No backup window
y Recovery-time objective: Restore 24 TB in two hours

Solution:
y NetWorker PowerSnap with Symmetrix and TimeFinder/Snap
– Server-free backup
y NetWorker DiskBackup Option with CLARiiON with ATA disks
– Rapid primary-site protection
y NetWorker and SRDF/S
– Disaster recovery
– Offsite protection

Value Proposition
y Zero backup window for applications
y Eliminated data-loss risk
y Reduced management overhead

[Diagram: Production Site (application host, NetWorker Storage Node with PowerSnap, SAN, tape library, Symmetrix DMX, CLARiiON CX) linked by SRDF/S to a Disaster-Recovery Site (disaster-recovery host, Storage Node, SAN, tape library, Symmetrix DMX)]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 87

EMC has worked with a large telecommunications company to meet its most demanding IT
challenges:
y Complex application environment—Oracle, and lots of data
y No backup window
y Recovery-time objective: Restore 24 TB in two hours.
They chose to implement NetWorker, along with other key EMC offerings, to achieve a superior
level of protection and recovery management—and confidence in the ability to recover.
Solution:
y NetWorker PowerSnap with Symmetrix and TimeFinder/Snap
− Server-free backup and rapid recovery
y NetWorker DiskBackup with CLARiiON with ATA disks
− Rapid primary-site protection and recovery
y NetWorker and SRDF/S
− Disaster recovery, offsite protection
Here is what they have been able to achieve with the above:
y Zero backup time for their applications
y Zero data loss
y Significantly reduced management overhead
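
To put the recovery-time objective in perspective, the sketch below works out the sustained restore rate that 24 TB in two hours implies. Decimal units (1 TB = 1,000,000 MB) are an assumption; the source does not say which convention it uses.

```python
# Recovery-time objective from the slide: restore 24 TB in two hours.
data_tb = 24
window_hours = 2

# Decimal units are assumed (1 TB = 1,000,000 MB).
required_mb_per_s = data_tb * 1_000_000 / (window_hours * 3600)
print(f"Required sustained restore rate: {required_mb_per_s:,.0f} MB/s")
```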
Not all environments will be this complex or demanding, but NetWorker can meet any backup
and recovery requirements, and can easily be upgraded to meet more stringent requirements as

Local Replication
After completing this module you will be able to:
y Discuss replicas and the possible uses of replicas
y Explain consistency considerations when replicating file
systems and databases
y Discuss host and array based replication technologies
– Functionality
– Differences
– Considerations
– Selecting the appropriate technology

© 2006 EMC Corporation. All rights reserved. Business Continuity - 88

In this section, we will look at what replication is, technologies used for creating local replicas,
and things that need to be considered when creating replicas.


What is Replication?
y Replica - An exact copy (in all details)
y Replication - The process of reproducing data

REPLICATION

Original Replica

© 2006 EMC Corporation. All rights reserved. Business Continuity - 89

Local replication is a technique for ensuring Business Continuity by making exact copies of
data. With replication, data on the replica will be identical to the data on the original at the
point-in-time that the replica was created.
Examples:
• Copy a specific file
• Copy all the data used by a database application
• Copy all the data in a UNIX Volume Group (including underlying logical volumes, file
systems, etc.)
• Copy data on a storage array to a remote storage array


Possible Uses of Replicas


• Alternate source for backup
• Source for fast recovery
• Decision support
• Testing platform
• Migration

© 2006 EMC Corporation. All rights reserved. Business Continuity - 90

Replicas can be used to address a number of Business Continuity functions:


• Provide an alternate source for backup to alleviate the impact on production.
• Provide a source for fast recovery, which helps achieve a lower RPO and RTO.
• Decision Support activities such as reporting.
– For example, a company may have a requirement to generate periodic reports. Running
the reports off of the replicas greatly reduces the burden placed on the production
volumes. Typically reports would need to be generated once a day or once a week, etc.
• Developing and testing proposed changes to an application or an operating environment.
– For example, the application can be run on an alternate server using the replica volumes
and any proposed design changes can be tested.
• Data migration.
– Migration can be as simple as moving applications from one server to the next, or as
complicated as migrating entire data centers from one location to another.


Considerations
• What makes a replica good?
– Recoverability
  - Considerations for resuming operations with primary
– Consistency/re-startability
  - How is this achieved by various technologies

• Kinds of Replicas
– Point-in-Time (PIT) = finite RPO
– Continuous = zero RPO

• How does the choice of replication technology tie back
into RPO/RTO?

© 2006 EMC Corporation. All rights reserved. Business Continuity - 91

Key factors to consider with replicas:


• What makes a replica good:
– Recoverability from a failure on the production volumes. The replication technology
must allow for the restoration of data from the replicas to the production and then allow
production to resume with a minimal RPO and RTO.
– Consistency/re-startability is very important if data on the replicas will be accessed
directly or if the replicas will be used for restore operations.
• Replicas can either be Point-in-Time (PIT) or continuous:
– Point-in-Time (PIT) - the data on the replica is an identical image of the production at
some specific timestamp. For example, a replica of a file system is created at 4:00 PM on
Monday. This replica would then be referred to as the Monday 4:00 PM Point-in-Time copy.
Note: The RPO will be a finite value with any PIT. The RPO is the interval between the time the
PIT was created and the time a failure occurs on the production. If there is a failure
on the production at 8:00 PM and there is a 4:00 PM PIT available, the RPO would be 4 hours (8
– 4 = 4). To minimize RPO with PITs, take periodic PITs.
– Continuous replica - the data on the replica is synchronized with the production data at
all times. The objective with any continuous replication is to reduce the RPO to zero.
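The RPO arithmetic in the note above can be sketched in a few lines of Python. This is only an illustration: the function name `rpo_for_failure` and the calendar date are assumptions chosen to match the "Monday 4:00 PM" example, not part of any product.

```python
from datetime import datetime, timedelta


def rpo_for_failure(pit_created: datetime, failure: datetime) -> timedelta:
    """Return the data-loss window when production is restored from one PIT."""
    if failure < pit_created:
        raise ValueError("the PIT must exist before the failure occurs")
    return failure - pit_created


# Hypothetical date: 2 January 2006 was a Monday.
pit = datetime(2006, 1, 2, 16, 0)      # PIT created Monday 4:00 PM
failure = datetime(2006, 1, 2, 20, 0)  # production fails Monday 8:00 PM
rpo = rpo_for_failure(pit, failure)
print(rpo)  # 4:00:00
```

Taking PITs more often shortens the longest possible gap between a PIT and a failure, which is why periodic PITs reduce the RPO.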

Replication of File Systems


[Figure: Host I/O stack, from top to bottom: Apps; Operating System, DBMS, Mgmt Utilities; File System Buffer; Volume Management; Multi-pathing Software; Device Drivers; HBAs; Physical Volume]
© 2006 EMC Corporation. All rights reserved. Business Continuity - 92

Most OS file systems buffer data in the host before the data is written to the disk on which the
file system resides.
• For data consistency on the replica, the host buffers must be flushed prior to the creation of
the PIT. If the host buffers are not flushed, the data on the replica will not contain the
information that was buffered on the host.
• If the buffers were not flushed, some level of recovery will be necessary before the data on
the replica can be used.
Note: If the file system is unmounted prior to the creation of the PIT, no recovery is
needed when accessing data on the replica.
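As a rough illustration of flushing host buffers before a PIT is taken, the Python sketch below writes a record, flushes the application buffer, and asks the operating system to commit its cached pages with `os.fsync`. The helper names are hypothetical, and the "PIT" here is only a file copy standing in for a real replica. A file copy reads through the same host cache, so it would succeed even without the flush; a replica taken below the host (for example on the array) sees only what has actually reached the disk, which is why the flush matters there.

```python
import os
import shutil
import tempfile


def write_and_flush(path: str, data: bytes) -> None:
    """Write data, then push it through the application and OS buffers."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # empty the user-space (application) buffer
        os.fsync(f.fileno())  # ask the OS to write its cached pages to the device


def create_pit_copy(source: str, replica: str) -> None:
    """Stand-in for PIT creation (an assumption: a plain file copy)."""
    shutil.copyfile(source, replica)


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.dat")
replica = os.path.join(workdir, "replica.dat")

write_and_flush(source, b"committed record\n")
create_pit_copy(source, replica)

with open(replica, "rb") as f:
    replica_data = f.read()
```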


Replication of Database Applications


• A database application may be spread out over
numerous files, file systems, and devices, all of which
must be replicated
• Database replication can be offline or online

[Figure: database Data and Logs devices]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 93

Database replication can be offline or online:


• Offline – replication takes place when the database and the application are shut down.
• Online – replication takes place when the database and the application are running.


Database: Understanding Consistency


• Databases/Applications maintain integrity by following the
“Dependent Write I/O Principle”
– Dependent Write: A write I/O that will not be issued by an application
until a prior related write I/O has completed
  - A logical dependency, not a time dependency
– Inherent in all Database Management Systems (DBMS)
  - e.g. Page (data) write is a dependent write I/O based on a successful log
write
– Applications can also use this technology
– Necessary for protection against local outages
  - Power failures create a dependent write consistent image
  - A Restart transforms the dependent write consistent image to a transactionally
consistent one
    * i.e. Committed transactions will be recovered, in-flight transactions will be
discarded
© 2006 EMC Corporation. All rights reserved. Business Continuity - 94

All logging database management systems use the concept of dependent write I/Os to maintain
integrity. An image in which no write is present without the prior writes it depends on is said
to be dependent write consistent. Dependent write consistency is required for protection against
local power outages and the loss of local channel connectivity or storage devices. The logical
dependency between I/Os is built into database management systems, certain applications, and
operating systems.


Database Replication: Transactions

[Figure: a Database Application issues writes 1-4 to its Buffer; the buffered writes are then recorded on the Data and Log devices]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 95

Database applications require that, for a transaction to be deemed complete, a series of writes
must occur in a particular order (Dependent Write I/O). These writes are recorded on the
various devices/file systems.
In this example, steps 1-4 must complete for the transaction to be deemed complete.
• Step 4 is dependent on Step 3 and will occur only if Step 3 is complete
• Step 3 is dependent on Step 2 and will occur only if Step 2 is complete
• Step 2 is dependent on Step 1 and will occur only if Step 1 is complete
Steps 1-4 are written to the database’s buffer and then to the physical disks.
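The ordering rule can be sketched as a loop that issues each write only after the previous one has been acknowledged. This is a toy model, not how any particular DBMS is implemented; `run_dependent_writes` and the two write callbacks are hypothetical names.

```python
from typing import Callable, List


def run_dependent_writes(steps: List[str], write: Callable[[str], bool]) -> List[str]:
    """Issue writes in order; each one is issued only after the previous is acknowledged."""
    completed = []
    for step in steps:
        if not write(step):  # no acknowledgement: later dependent writes are never issued
            break
        completed.append(step)
    return completed


device: List[str] = []  # stands in for the physical disks


def write_ok(step: str) -> bool:
    device.append(step)
    return True


def write_fails_at_3(step: str) -> bool:
    if step == "step 3":
        return False  # simulate an outage before step 3 is recorded
    device.append(step)
    return True


steps = ["step 1", "step 2", "step 3", "step 4"]
all_done = run_dependent_writes(steps, write_ok)

device.clear()
partial = run_dependent_writes(steps, write_fails_at_3)
```

In the failing run only steps 1 and 2 reach the device, so step 4 can never be present without the steps it depends on.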


Database Replication: Consistency

[Figure: Source and Replica each hold writes 1-4 across their Data and Log devices; the replica is Consistent]

Note: In this example, the database is online.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 96

At the point in time when the replica is created, all the writes to the source devices must be
captured on the replica devices to ensure data consistency on the replica.
• In this example, steps 1-4 on the source devices must be captured on the replica devices for
the data on the replicas to be consistent.


Database Replication: Consistency

[Figure: the Source holds writes 1-4 across its Data and Log devices; the Replica holds only writes 3 and 4 and is Inconsistent]

Note: In this example, the database is online.

© 2006 EMC Corporation. All rights reserved. Business Continuity - 97

Creating a PIT for multiple devices happens quickly, but not instantaneously.
• Steps 1-4, which are dependent write I/Os, have occurred and have been recorded successfully
on the source devices.
• It is possible that steps 3 and 4 were copied to the replica devices, while steps 1 and 2 were
not copied.
• In this case, the data on the replica is inconsistent with the data on the source. If a restart
were to be performed on the replica devices, Step 4, which is available on the replica, might
indicate that a particular transaction is complete, but the data written in Steps 1 and 2 would
be missing from the replica, making the replica inconsistent.
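One way to express the consistency condition is that the replica must hold a prefix of the dependent-write sequence: a later write may never be present without the earlier writes it depends on. The check below is a minimal sketch under that assumption; the function name is hypothetical.

```python
from typing import Iterable, Sequence


def is_dependent_write_consistent(order: Sequence[int], on_replica: Iterable[int]) -> bool:
    """Return True if the replica holds a prefix of the dependent-write sequence."""
    present = set(on_replica)
    seen_gap = False
    for step in order:
        if step in present:
            if seen_gap:  # a later write exists without an earlier one
                return False
        else:
            seen_gap = True
    return True


order = [1, 2, 3, 4]
consistent = is_dependent_write_consistent(order, [1, 2, 3, 4])  # all writes captured
inconsistent = is_dependent_write_consistent(order, [3, 4])      # the case described above
crash_like = is_dependent_write_consistent(order, [1, 2])        # like a power failure
```

A replica holding only steps 1 and 2 passes the check because it resembles the image left by a power failure, which a database restart can recover from.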


Database Replication: Ensuring Consistency


• Off-line Replication
– If the database is offline or
shut down and then a replica is
created, the replica will be
consistent
– In many cases, creating an offline
replica may not be viable due to
the 24x7 nature of business

[Figure: Database Application (Offline); Source and Replica, each with Data and Log devices; the replica is Consistent]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 98

Database replication can be performed with the application offline (i.e., application is shut down,
no I/O activity) or online (i.e., while the application is up and running). If the application is
offline, the replica will be consistent because there is no activity. However, consistency is an
issue if the database application is replicated while it is up and running.


Database Replication: Ensuring Consistency


• Online Replication
– Some database applications allow
replication while the application is up
and running
– The production database would have to
be put in a state which would allow it to
be replicated while it is active
– Some level of recovery must be
performed on the replica to make the
replica consistent

[Figure: the Source holds writes 1-4 across its Data and Log devices; the Replica holds only writes 3 and 4 and is Inconsistent]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 99

In the situation shown, Steps 1-4 are dependent write I/Os. The replica is inconsistent because
Steps 1 & 2 never made it to the replica. To make the database consistent, some level of
recovery would have to be performed. In this example, this could be done by simply discarding
the transaction that was represented by Steps 1-4. Many databases are capable of performing
such recovery tasks.


Database Replication: Ensuring Consistency

[Figure: Source and Replica each hold writes 1-5; the replica is Consistent]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 100

An alternative way to ensure that an online replica is consistent is to:


• Hold I/O to all the devices at the same instant.
• Create the replica.
• Release the I/O.
Holding I/O is similar to a power failure and most databases have the ability to restart from a
power failure.
Note: Holding I/O to all the devices simultaneously ensures that the data on the replica is
identical to that on the source devices, but the database application will time out if I/O is
held for too long.
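The hold, create, release sequence can be modeled with a lock that every write must pass through. This is a toy sketch: the `DeviceGroup` class is hypothetical and in-memory lists stand in for devices, whereas a real array or host utility would freeze I/O below the application.

```python
import threading
from typing import Dict, List


class DeviceGroup:
    """Toy model of several devices that must be replicated at the same instant."""

    def __init__(self, names: List[str]) -> None:
        self._io_gate = threading.Lock()  # held while I/O is frozen
        self.devices: Dict[str, List[str]] = {name: [] for name in names}

    def write(self, device: str, record: str) -> None:
        with self._io_gate:  # writers wait here while a replica is being created
            self.devices[device].append(record)

    def create_consistent_replica(self) -> Dict[str, List[str]]:
        with self._io_gate:  # 1. hold I/O to all the devices
            replica = {name: list(data) for name, data in self.devices.items()}  # 2. create
        return replica       # 3. I/O is released when the lock is dropped


group = DeviceGroup(["data", "log"])
group.write("data", "write 1")
group.write("data", "write 2")
group.write("log", "write 3")
group.write("log", "write 4")
replica = group.create_consistent_replica()
group.write("data", "write 5")  # issued after the PIT, so not part of the replica
```

Because the copy is taken while the gate is held, the replica cannot contain write 4 without writes 1-3. The longer the copy takes, the longer writers are blocked, which corresponds to the timeout risk mentioned in the note.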


Tracking Changes After PIT Creation

At PIT             Later              Resynch
Source = Target    Source ≠ Target    Source = Target

© 2006 EMC Corporation. All rights reserved. Business Continuity - 101

Changes will occur on the production volume after the creation of a PIT; changes could also
occur on the target. Typically the target device will be re-synchronized with the source device at
some future time in order to obtain a more recent PIT.
Note: The replication technology employed should have a mechanism to keep track of changes.
This makes the re-synchronization process much faster. If the replication technology
does not track changes between the source and target, every resynchronization operation will
have to be a full operation.


Local Replication Technologies


• Host based
– Logical Volume Manager (LVM) based mirroring
– File System Snapshots

• Storage Array based
– Full volume mirroring
– Full volume: Copy on First Access
– Pointer based: Copy on First Write

© 2006 EMC Corporation. All rights reserved. Business Continuity - 102

Replication technologies can be classified by:


• Distance over which replication is performed - local or remote
• Where the replication is performed - host or array based
– Host based - all the replication is performed by using the CPU resources of the host
using software that is running on the host.
– Array based - all replication is performed on the storage array using CPU resources on
the array via the array’s operating environment.
Note: In the context of this discussion, local replication refers to replication that is performed
within a data center if it is host based and within a storage array if it is array based.


Logical Volume Manager: Review


• Host resident software responsible for
creating and controlling host level logical
storage
– Physical view of storage is converted to a
logical view by mapping. Logical data
blocks are mapped to physical data blocks.
– Logical layer resides between the physical
layer (physical devices and device drivers)
and the application layer (OS and
applications see logical view of storage).
• Usually offered as part of the operating
system or as third party host software
• LVM Components:
– Physical Volumes
– Volume Groups
– Logical Volumes

[Figure: the LVM sits between Logical Storage and Physical Storage]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 103

Logical Volume Managers (LVMs) introduce a logical layer between the operating system and
the physical storage. LVMs have the ability to define logical storage structures that can span
multiple physical devices. The logical storage structures appear contiguous to the operating
system and applications.
The fact that logical storage structures can span multiple physical devices provides flexibility
and additional functionality:
• Dynamic extension of file systems
• Host based mirroring
• Host based striping
The Logical Volume Manager provides a set of operating system commands, library
subroutines, and other tools that enable the creation and control of logical storage.


Volume Groups
• One or more Physical Volumes
form a Volume Group
• LVM manages Volume Groups
as a single entity
• Physical Volumes can be added
and removed from a Volume
Group as necessary
• Physical Volumes are typically
divided into contiguous equal-sized
disk blocks
• A host will always have at least
one volume group for the Operating
System
– Application and Operating
System data maintained in
separate volume groups

[Figure: Physical Volumes 1, 2 and 3, each divided into Physical Disk Blocks, grouped into a Volume Group]
© 2006 EMC Corporation. All rights reserved. Business Continuity - 104

A Volume Group is created by grouping together one or more Physical Volumes. Physical
Volumes:
• Can be added or removed from a Volume Group dynamically.
• Cannot be shared between Volume Groups; the entire Physical Volume becomes part of a
Volume Group.
Each Physical Volume is partitioned into equal-sized data blocks. The size of a Logical Volume
is based on a multiple of the equal-sized data block.
The Volume Group is handled as a single unit by the LVM.
• A Volume Group, as a whole, can be activated or deactivated.
• A Volume Group would typically contain related information. For example, each host would
have a Volume Group which holds all the OS data, while applications would be on separate
Volume Groups.
Logical Volumes are created within a given Volume Group. A Logical Volume can be thought
of as a virtual disk partition, while the Volume Group itself can be thought of as a disk. A
Volume Group can have a number of Logical Volumes.


Logical Volumes
[Figure: two Logical Volumes, made up of Logical Disk Blocks, mapped onto Physical Disk Blocks of Physical Volumes 1, 2 and 3 in a Volume Group]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 105

Logical Volumes (LV) form the basis of logical storage. They contain logically contiguous data
blocks (or logical partitions) within the volume group. Each logical partition is mapped to at
least one physical partition on a physical volume within the Volume Group. The OS treats an
LV like a physical device and accesses it via device special files (character or block). A Logical
Volume:
• Can only belong to one Volume Group. However, a Volume Group can have multiple LVs.
• Can span multiple physical volumes.
• Can be made up of physical disk blocks that are not physically contiguous.
• Appears as a series of contiguous data blocks to the OS.
• Can contain a file system or be used directly. Note: There is a one-to-one relationship
between LV and a File System.
Note: Under normal circumstances there is a one-to-one mapping between a logical partition and
a physical partition. A one-to-many mapping between a logical partition and physical partitions
leads to mirroring of Logical Volumes.
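The mapping between logical and physical partitions can be pictured as a small data structure. The sketch below is an illustration only: the `LogicalVolume` class, its method names, and the PV names are hypothetical and do not correspond to any specific LVM implementation.

```python
from typing import List, Tuple

PhysicalPartition = Tuple[str, int]  # (physical volume name, partition index)


class LogicalVolume:
    """Maps each logical partition to one or more physical partitions."""

    def __init__(self) -> None:
        self.mapping: List[List[PhysicalPartition]] = []

    def add_partition(self, *copies: PhysicalPartition) -> None:
        if not copies:
            raise ValueError("a logical partition needs at least one physical partition")
        self.mapping.append(list(copies))

    def is_mirrored(self) -> bool:
        # Mirrored: every logical partition maps to two or more physical partitions.
        return bool(self.mapping) and all(len(c) >= 2 for c in self.mapping)

    def physical_volumes(self) -> List[str]:
        return sorted({pv for copies in self.mapping for pv, _ in copies})


plain = LogicalVolume()
plain.add_partition(("PV1", 7))
plain.add_partition(("PV2", 3))  # spans physical volumes, not physically contiguous

mirrored = LogicalVolume()
mirrored.add_partition(("PV1", 0), ("PV2", 0))
mirrored.add_partition(("PV1", 1), ("PV2", 1))
```

The operating system sees each volume as a contiguous run of logical partitions (the list index), regardless of where the physical partitions actually live.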


Host Based Replication: Mirrored Logical Volumes

[Figure: a Host Logical Volume mirrored across Physical Volume 1 (PVID1) and Physical Volume 2 (PVID2), each with a VGDA]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 106

Logical Volumes may be mirrored to improve data availability. In mirrored logical volumes
every logical partition will map to 2 or more physical partitions on different physical volumes.
• Logical volume mirrors may be added and removed dynamically
• A mirror can be split and the data it contains used independently
The advantages of mirroring a Logical Volume are high availability and load balancing during
reads if the parallel policy is used. The cost of mirroring is the additional CPU cycles needed to
perform two writes for every write and the longer cycle time needed to complete the writes.


Host Based Replication: File System Snapshots


• Many LVM vendors will allow the creation of File System
Snapshots while a File System is mounted
• File System snapshots are typically easier to manage
than creating mirrored logical volumes and then splitting
them

© 2006 EMC Corporation. All rights reserved. Business Continuity - 107

Many Logical Volume Manager vendors will allow the creation of File System Snapshots while
a File System is mounted. File System snapshots are typically easier to manage than creating
mirrored logical volumes and then splitting them.


Host (LVM) Based Replicas: Disadvantages


• LVM based replicas add overhead on host CPUs
• If host devices are already Storage Array devices then
the added redundancy provided by LVM mirroring is
unnecessary
– The devices will have some RAID protection already
• Host based replicas can usually be presented back only to the
same server
• Keeping track of changes after the replica has been
created

© 2006 EMC Corporation. All rights reserved. Business Continuity - 108

Host based replicas can usually be presented back only to the same server:
• Using the replica from the same host for any BC operation will add an additional CPU
burden on the server
• Replica is useful for fast recovery if there is any logical corruption on the source at the File
System level
• Replica itself may become unavailable if there is a problem at the Volume Group level
• If the Server fails, then the replica and the source would be unavailable until the server is
brought online or another server is given access to the Volume group
• Presenting an LVM based local replica to a second host is usually not possible because the
replica will still be part of the volume group which is usually accessed by one host at any
given time
Keeping track of changes after the replica has been created:
• If changes are not tracked, all future resynchronization will be a full operation
• Some LVMs may offer incremental resynchronization


Storage Array Based Local Replication


• Replication performed by the Array Operating
Environment
• Replicas are on the same array

[Figure: an Array containing Source and Replica devices, with a Production Server and a Business Continuity Server]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 109

With storage array based local replication:


• Replication performed by the Array Operating Environment
− Array CPU resources are used for the replication operations
− Host CPU resources can be devoted to production operations instead of replication
operations
• Replicas are on the same array
− Can be accessed by an alternate host for any BC operations
• Typically array based replication is performed at an array device level.
− Need to map storage components used by an application back to the specific array
devices used – then replicate those devices on the array.
− A database could be laid out over multiple physical volumes. One
would have to replicate all of those devices for a PIT copy of the database.


Storage Array Based – Local Replication Example


• Typically Array based replication is done at an array device
level
– Need to map storage components used by an application/file system
back to the specific array devices used – then replicate those
devices on the array

[Figure: File System 1 on Logical Volume 1 in Volume Group 1, whose physical volumes c12t1d1 and c12t1d2 are Source Vol 1 and Source Vol 2 on Array 1, alongside Replica Vol 1 and Replica Vol 2]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 110

In this example, File System 1 has to be replicated.


• File System 1 is actually built on Logical Volume 1, which in turn is a part of Volume
Group 1 which is made up of two Physical Volumes c12t1d1 and c12t1d2.
• These physical volumes actually reside in Array 1 and are Source Vol1 and Source
Vol2.
• In order to replicate File System 1, one has to actually replicate the two Array Devices.
• Since 2 Array Volumes have to be replicated, we need two Array Volumes to act as the replica
volumes. In this example Replica Vol1 and Replica Vol2 will be used for the replication.
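The mapping exercise in this example can be sketched as a chain of lookups. The object names come from the slide; the dictionary layout and the function names are assumptions made for illustration, not the output of any real host or array tool.

```python
from typing import Dict, List

# Names are taken from the example above; the dictionary layout is an assumption.
file_system_to_lv = {"File System 1": "Logical Volume 1"}
lv_to_vg = {"Logical Volume 1": "Volume Group 1"}
vg_to_pvs = {"Volume Group 1": ["c12t1d1", "c12t1d2"]}
pv_to_array_device = {"c12t1d1": "Source Vol1", "c12t1d2": "Source Vol2"}


def array_devices_for(file_system: str) -> List[str]:
    """Walk file system -> logical volume -> volume group -> physical volumes -> array devices."""
    lv = file_system_to_lv[file_system]
    vg = lv_to_vg[lv]
    return [pv_to_array_device[pv] for pv in vg_to_pvs[vg]]


def pair_with_replicas(sources: List[str], replicas: List[str]) -> Dict[str, str]:
    """Assign one replica device to each source device."""
    if len(replicas) < len(sources):
        raise ValueError("one replica device is needed per source device")
    return dict(zip(sources, replicas))


sources = array_devices_for("File System 1")
pairs = pair_with_replicas(sources, ["Replica Vol1", "Replica Vol2"])
```

Every array device returned by the walk has to be replicated together; leaving one out would produce an incomplete copy of the volume group.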


Array Based Local Replication: Full Volume Mirror

[Figure: within the Array, the Target is Attached to the Source; the Source is Read/Write and the Target is Not Ready]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 111

Full volume mirroring is achieved by attaching the target device to the source device and then
copying all the data from the source to the target. The target is unavailable to its host while it is
attached to the source and the synchronization occurs.
• Target (Replica) device is attached to the Source device and the entire data from the source
device is copied over to the target device
• During this attachment and synchronization period the Target device is unavailable


Array Based Local Replication: Full Volume Mirror

[Figure: within the Array, the Target is Detached from the Source (PIT); both Source and Target are Read/Write]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 112

After the synchronization is complete the target can be detached from the source and be made
available for Business Continuity operations. The point-in-time (PIT) is determined by the time
of detachment or separation of the Source and Target. For example, if the detachment time is
4:00 PM, the PIT of the replica is 4:00 PM.


Array Based Local Replication: Full Volume Mirror


• For future re-synchronization to be incremental, most
vendors have the ability to track changes at some level of
granularity (e.g., 512 byte block, 32 KB, etc.)
– Tracking is typically done with some kind of bitmap

• Target device must be at least as large as the Source
device
– For full volume copies the minimum amount of storage required is
the same as the size of the source

© 2006 EMC Corporation. All rights reserved. Business Continuity - 113

For future re-synchronization to be incremental, most vendors have the ability to track changes
at some level of granularity, such as 512 byte block, 32 KB, etc. Tracking is typically done with
some kind of bitmap. The target device must be at least as large as the Source device. For full
volume copies the minimum amount of storage required is the same as the size of the source.


Copy on First Access (COFA)


• Target device is made accessible for BC tasks as soon
as the replication session is started
• Point-in-Time is determined by time of activation
• Can be used in Copy on First Access mode (deferred) or in
Full Copy mode
• Target device is at least as large as the Source device

© 2006 EMC Corporation. All rights reserved. Business Continuity - 114

Copy on First Access (COFA) provides an alternate method to create full volume copies. Unlike
Full Volume mirrors, the replica is immediately available when the session is started (no waiting
for full synchronization).
• The PIT is determined by the time of activation of the session. Just like the full volume
mirror technology this method requires the Target devices to be at least as large as the
source devices.
• A protection map is created for all the data on the Source device at some level of granularity
(e.g., 512 byte block, 32 KB, etc.). Then the data is copied from the source to the target in
the background based on the mode with which the replication session was invoked.


Copy on First Access Mode: Deferred Mode


[Figure: three cases that trigger a copy in deferred mode: Write to Source, Write to Target, and Read from Target; Source and Target are both Read/Write in each case]
© 2006 EMC Corporation. All rights reserved. Business Continuity - 115

In the Copy on First Access mode (or the deferred mode), data is copied from the source to the
target only when:
• A write is issued for the first time after the PIT to a specific address on the source
• A read or write is issued for the first time after the PIT to a specific address on the target.
Since data is only copied when required, if the replication session is terminated the target device
will only have data that was copied (not the entire contents of the source at the PIT). In this
scenario, the data on the Target cannot be used as it is incomplete.
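The deferred-mode rules can be modeled with a per-chunk flag that records whether a chunk has already been copied. The `CofaSession` class below is a hypothetical sketch in which list elements stand in for chunks; it is not a vendor implementation.

```python
from typing import List, Optional


class CofaSession:
    """Toy Copy on First Access session in deferred mode."""

    def __init__(self, source: List[str]) -> None:
        self.source = source
        self.target: List[Optional[str]] = [None] * len(source)
        self.copied = [False] * len(source)  # protection map, one flag per chunk

    def _copy_if_needed(self, i: int) -> None:
        if not self.copied[i]:
            self.target[i] = self.source[i]  # preserve the PIT image of this chunk
            self.copied[i] = True

    def write_source(self, i: int, data: str) -> None:
        self._copy_if_needed(i)  # first write to the source after the PIT
        self.source[i] = data

    def read_target(self, i: int) -> Optional[str]:
        self._copy_if_needed(i)  # first read of the target after the PIT
        return self.target[i]

    def write_target(self, i: int, data: str) -> None:
        self._copy_if_needed(i)  # first write to the target after the PIT
        self.target[i] = data


session = CofaSession(["a0", "b0", "c0", "d0"])
session.write_source(0, "a1")
first_read = session.read_target(1)
session.write_target(2, "c-test")
```

Chunk 3 is never touched, so the target still holds nothing for it. If the session ended at this point the target would be incomplete, which is the situation described above.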


Copy on First Access: Full Copy Mode


• On session start, the entire contents of the Source device
are copied to the Target device in the background
• Most vendor implementations provide the ability to track
changes:
– Made to the Source or Target
– Enables incremental re-synchronization

© 2006 EMC Corporation. All rights reserved. Business Continuity - 116

In Full Copy mode, the target is made available immediately and all the data from the source is
copied over to the target in the background.
• During this process, if a data block that has not yet been copied to the target is accessed, the
replication process will jump ahead and move the required data block first.
• When a full copy mode session is terminated (after full synchronization), the data on the
Target is still usable as it is a full copy of the original data.


Array: Pointer Based Copy on First Write


• Targets do not hold actual data, but hold pointers to
where the data is located
– Actual storage requirement for the replicas is usually a small fraction
of the size of the source volumes

• A replication session is set up between the Source and
Target devices and started
– When the session is set up, a protection map is created for all the data on the
Source device at some level of granularity (e.g., 512 byte block, 32
KB, etc.), based on the specific vendor's implementation
– Target devices are accessible immediately when the session is
started
– At the start of the session the Target device holds pointers to the
data on the Source device
© 2006 EMC Corporation. All rights reserved. Business Continuity - 117

Unlike full volume replicas, the target devices for pointer based replicas only hold pointers to
the location of the data but not the data itself. When the copy session is started the target device
holds pointers to the data on the source device. The primary advantage of pointer based copies is
the reduction in storage requirement for the replicas.


Pointer Based Copy on First Write Example

[Figure: a Target Virtual Device with pointers to the Source and to the Save Location]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 118

When a data block is first written to after the PIT, the original data block from the Source is
copied to the save location.
• Prior to a new write to the source or target device:
− Data is copied from the source to a “save” location
− The pointer for that specific address on the Target then points to the “save” location
− Writes to the Target result in writes to the “save” location and the updating of the
pointer to the “save” location
• If a write is issued to the source for the first time after the PIT, the original data block is
copied to the save location and the Target's pointer for that block is changed from the Source
to the save location.
• If a write is issued to the Target for the first time after the PIT, the original data is copied
from the Source to the save location, the pointer is updated, and then the new data is
written to the save location.
• Reads from the Target are serviced by the Source device or from the save location based
on where the pointer directs the read.
− Source – When data has not changed since PIT
− Save Location – When data has changed since PIT
Data on the replica is a combined view of unchanged data on the Source and the data in the save
location. Hence if the Source device becomes unavailable the replica will no longer have valid data.
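The pointer behavior can be sketched with a dictionary acting as the save location: a chunk that appears in the dictionary is read from the save location, and any other chunk is read from the Source. The `CofwSnapshot` class is a hypothetical toy model, not any array's actual implementation.

```python
from typing import Dict, List


class CofwSnapshot:
    """Toy pointer-based replica using Copy on First Write."""

    def __init__(self, source: List[str]) -> None:
        self.source = source
        self.save: Dict[int, str] = {}  # save location: only chunks touched since the PIT

    def write_source(self, i: int, data: str) -> None:
        if i not in self.save:
            self.save[i] = self.source[i]  # copy the original out before overwriting it
        self.source[i] = data

    def write_target(self, i: int, data: str) -> None:
        if i not in self.save:
            self.save[i] = self.source[i]  # copy the original first, as described above
        self.save[i] = data                # then the new data goes to the save location

    def read_target(self, i: int) -> str:
        # The "pointer": save location if the chunk changed since the PIT, else the Source.
        return self.save[i] if i in self.save else self.source[i]


snap = CofwSnapshot(["a0", "b0", "c0"])
snap.write_source(0, "a1")        # production keeps changing
snap.write_target(1, "b-test")    # the replica is modified by a BC host
view = [snap.read_target(i) for i in range(3)]
```

Only two of the three chunks occupy space in the save location, which illustrates why pointer based replicas usually need a fraction of the source capacity. Chunk 2 is still read from the Source, so the replica depends on the Source remaining available.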


Array Replicas: Tracking Changes


• Changes will/can occur to the Source/Target devices
after PIT has been created
• How and at what level of granularity should this be
tracked?
– Too expensive to track changes at a bit by bit level
  - Would require an equivalent amount of storage to keep track of which bit
changed for each of the source and the target
– Based on the vendor some level of granularity is chosen and a bit
map is created (one for Source and one for Target)
  - One could choose 32 KB as the granularity
  - For a 1 GB device changes would be tracked for 32768 32 KB chunks
  - If any change is made to any bit on one 32 KB chunk the whole chunk is
flagged as changed in the bit map
  - 1 GB device map would only take up 32768/8/1024 = 4 KB space
© 2006 EMC Corporation. All rights reserved. Business Continuity - 119

It is too expensive to track changes at a bit by bit level because it would require an equivalent
amount of storage to keep track of which bit changed for each the source and the target. Some
level of granularity is chosen and a bit map is created (one for the Source and one for the Target).
The level of granularity is vendor specific.
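
The arithmetic on the slide can be checked with a short calculation. The function name `bitmap_size` is hypothetical, and the 32 KB chunk size is just the example granularity used above, not a vendor value.

```python
KB = 1024
GB = 1024 ** 3


def bitmap_size(device_bytes, chunk_bytes):
    """Return (number of chunks, bitmap size in bytes) at one bit per chunk."""
    chunks = -(-device_bytes // chunk_bytes)   # ceiling division
    bitmap_bytes = -(-chunks // 8)             # 8 chunk flags per byte
    return chunks, bitmap_bytes


chunks, bitmap_bytes = bitmap_size(1 * GB, 32 * KB)
print(chunks)              # 32768
print(bitmap_bytes / KB)   # 4.0
```

One such bitmap is needed for the Source and one for the Target, so the pair costs 8 KB per 1 GB device at this granularity.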


Array Replicas: How Changes Are Determined


At PIT:      Source   0 0 0 0 0 0 0 0
             Target   0 0 0 0 0 0 0 0

After PIT:   Source   1 0 0 1 0 1 0 0
             Target   0 0 1 1 0 0 0 1

Resynch:              1 0 1 1 0 1 0 1

0 = unchanged, 1 = changed

Differential/incremental re-synchronization:
y The bitmaps for the source and target are all set to 0 at the PIT
y Any changes to the source or target after PIT are flagged by setting the appropriate flag to 1 in
the bit map
y When a re-synchronization is required the two bitmaps are compared and only those chunks
that have changed on either the source or the target are synchronized
y The benefit is that re-synchronization times are minimized.
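
Comparing the two bitmaps amounts to a logical OR of the flags. A minimal sketch using the values from the slide (the variable names are illustrative only):

```python
source_bitmap = [1, 0, 0, 1, 0, 1, 0, 0]   # chunks changed on the Source after PIT
target_bitmap = [0, 0, 1, 1, 0, 0, 0, 1]   # chunks changed on the Target after PIT

# A chunk must be re-synchronized if it changed on either side.
resync_bitmap = [s | t for s, t in zip(source_bitmap, target_bitmap)]
chunks_to_copy = [i for i, flag in enumerate(resync_bitmap) if flag]

print(resync_bitmap)    # [1, 0, 1, 1, 0, 1, 0, 1]
print(chunks_to_copy)   # [0, 2, 3, 5, 7]
```

Only five of the eight chunks are copied here; the remaining three are already identical on both devices.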


Array Replication: Multiple PITs


[Figure: a 24-hour timeline showing one Source device and four Target devices, each holding a
Point-In-Time copy, taken at 06:00 A.M., 12:00 P.M., 06:00 P.M. and 12:00 A.M.]

Most array based replication technologies will allow the Source devices to maintain replication
relationships with multiple Targets.
y This can also reduce RTO because the restore can be a differential restore.
y Each PIT could be used for a different BC activity and also as restore points.
In this example, a PIT is created every six hours from the same source. If any logical or physical
corruption occurs on the Source, the data can be recovered from the latest PIT and at worst the
RPO will be 6 hours.
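
The six-hour example can be worked through as follows. This is a sketch only: the date, the failure time, and the helper name `latest_pit_before` are made up for illustration.

```python
from datetime import datetime, timedelta


def latest_pit_before(pits, failure_time):
    """Return the most recent PIT taken at or before the failure, or None."""
    candidates = [p for p in pits if p <= failure_time]
    return max(candidates) if candidates else None


day = datetime(2006, 1, 1)
pits = [day + timedelta(hours=h) for h in (0, 6, 12, 18)]   # a PIT every six hours
failure = day + timedelta(hours=17, minutes=30)             # corruption at 5:30 P.M.

restore_point = latest_pit_before(pits, failure)
data_loss = failure - restore_point

print(restore_point.strftime("%H:%M"))   # 12:00
print(data_loss)                         # 5:30:00
```

Whatever the failure time, the data lost in this schedule never exceeds the six-hour interval between PITs.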


Array Replicas: Ensuring Consistency

[Figure: two Source/Replica pairs, each Source holding dependent writes 1 through 4.
Consistent: the Replica holds all of writes 1 to 4.
Inconsistent: the Replica holds writes 3 and 4 but not writes 1 and 2.]

Most array based replication technologies will allow the creation of Consistent replicas by
holding I/O to all devices simultaneously when the PIT is created.
y Typically applications are spread out over multiple devices
− Could be on the same array or multiple arrays
y Replication technology must ensure that the PIT for the whole application is consistent
− Need mechanism to ensure that updates do not occur while PIT is created
y Hold I/O to all devices simultaneously for an instant, create PIT and release I/O
− Cannot hold I/O for too long, application will timeout
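
The hold, create PIT, release sequence can be modelled roughly as below. This is a single-threaded toy, not how any array or host driver is implemented; the `Device` class, the `during_hold` hook and the device names are assumptions for the example.

```python
class Device:
    """Toy device whose writes can be held (queued) for an instant."""

    def __init__(self, name):
        self.name = name
        self.data = []
        self.held = False
        self.pending = []

    def write(self, value):
        if self.held:
            self.pending.append(value)   # held I/O is delayed, not lost
        else:
            self.data.append(value)

    def release(self):
        self.held = False
        self.data.extend(self.pending)
        self.pending.clear()


def consistent_pit(devices, during_hold=None):
    """Hold I/O on all devices, copy their state, then release I/O."""
    for d in devices:
        d.held = True
    try:
        if during_hold:
            during_hold()                # stands in for I/O arriving mid-operation
        return {d.name: list(d.data) for d in devices}
    finally:
        for d in devices:
            d.release()


log, db = Device("log"), Device("data")
log.write("txn1 begin")
db.write("txn1 row")


def arriving_io():
    log.write("txn2 begin")
    db.write("txn2 row")


pit = consistent_pit([log, db], during_hold=arriving_io)
print(pit)        # {'log': ['txn1 begin'], 'data': ['txn1 row']}
print(log.data)   # ['txn1 begin', 'txn2 begin']
```

The PIT contains transaction 1 on both devices and transaction 2 on neither, while production still receives the held writes once I/O is released.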


Mechanisms to Hold I/O


y Host based
y Array based
y What if the application straddles multiple hosts and
multiple arrays?


Mechanisms to hold I/O:


y Host based
− A host based application can be used to hold I/O to all the array devices that are to
be replicated when the PIT is created
− Typically achieved at the device driver level or above before the I/O reaches the HBAs
¾ Some vendors implement this at the multi-pathing software layer
y Array based
− I/Os can be held for all the array devices that are to be replicated by the Array Operating
Environment in the array itself when the PIT is created
What if the application straddles multiple hosts and multiple arrays?
y Federated Databases
y Some array vendors are able to ensure consistency in this situation


Array Replicas: Restore/Restart Considerations


y Production has a failure
– Logical Corruption
– Physical failure of production devices
– Failure of Production server

y Solution
– Restore data from replica to production
¾ The restore would typically be done in an incremental manner and the
Applications would be restarted even before the synchronization is
complete leading to very small RTO
-----OR------
– Start production on replica
¾ Resolve issues with production while continuing operations on replicas
¾ After issue resolution restore latest data on replica to production


Failures can occur in many different ways:


y There could be a logical corruption of the data on the production devices: the devices are
available but the data on them is corrupt. In this case, one would opt to restore the data to
production from the latest replica.
y Production devices may become unavailable due to physical failures (Production server
down, physical drive failure etc.). In this case, one could start the production on the latest
replica and then while the production is being done from the replicas fix the physical
problems on the Production side. Once the situation has been resolved, the latest information
from the replica devices can be restored back to the production volumes.


Array Replicas: Restore/Restart Considerations


y Before a Restore
– Stop all access to the Production devices and the Replica devices
– Identify Replica to be used for restore
¾ Based on RPO and Data Consistency
– Perform Restore

y Before starting production on Replica


– Stop all access to the Production devices and the Replica devices
– Identify Replica to be used for restart
¾ Based on RPO and Data Consistency
– Create a “Gold” copy of Replica
¾ As a precaution against further failures
– Start production on Replica

y RTO drives choice of replication technology



Based on the type of failure one has to choose to either perform a restore to the production
devices or to shift production operations to the replica devices. In either case the
recommendation would be to stop access to the production and replica devices, then identify the
replica that will be used for the restore or the restart operations.
The choice of replica depends on the consistency of the data on the replica and the desired RPO
(e.g., a business may create PIT replicas every 2 hours; if a failure occurs then at most only 2
hours of data would have been lost). If a replica has been written to after the creation of the
PIT (for application testing, for example) then this replica may not be a viable candidate for the
restore or restart.
Note: RTO is a key driver in the choice of replication technology. The ability to restore or
restart almost instantaneously after any failure is very important.
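
The replica selection rule can be sketched as a simple filter. The `Replica` record and the `choose_replica` helper are hypothetical, and the sketch takes the strict view that a replica written to after its PIT is never used, whereas the text only says it may not be viable.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    pit_hour: int             # hour of the day at which the PIT was created
    consistent: bool
    written_after_pit: bool   # e.g. used for application testing


def choose_replica(replicas, failure_hour):
    """Pick the newest consistent, unmodified replica taken before the failure."""
    viable = [
        r for r in replicas
        if r.consistent and not r.written_after_pit and r.pit_hour <= failure_hour
    ]
    return max(viable, key=lambda r: r.pit_hour, default=None)


replicas = [
    Replica("PIT 02:00", 2, consistent=True, written_after_pit=False),
    Replica("PIT 04:00", 4, consistent=True, written_after_pit=False),
    Replica("PIT 06:00", 6, consistent=True, written_after_pit=True),
]
chosen = choose_replica(replicas, failure_hour=7)
print(chosen.name)           # PIT 04:00
print(7 - chosen.pit_hour)   # 3
```

Because the 06:00 replica was modified by testing, the data loss in this example grows from one hour to three, which is one reason to keep test activity away from the replicas reserved for recovery.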


Array Replicas: Restore Considerations


y Full Volume Replicas
– Restores can be performed to either the original source device or to
any other device of like size
¾ Restores to the original source could be incremental in nature
¾ Restore to a new device would involve a full synchronization

y Pointer Based Replicas


– Restores can be performed to the original source or to any other
device of like size as long as the original source device is healthy
¾ Target only has pointers
™ Pointers to source for data that has not been written to after PIT
™ Pointers to the “save” location for data that was written after PIT
¾ Thus to perform a restore to an alternate volume the source must be
healthy to access data that has not yet been copied over to the target


With Full Volume replicas, all the data that was on the source device when the PIT was created
is available on the Replica (either with Full Volume Mirroring or with Full Volume Copies).
With Pointer Based Replicas and Full Volume Copies in deferred mode (COFA), access to all
the data on the Replica is dependent on the health (accessibility) of the original source volumes.
If the original source volume is inaccessible for any reason, pointer based or Full Volume Copy
on First Access replicas are of no use in either a restore or a restart scenario.


Array Replicas: Which Technology?


y Full Volume Replica
– Replica is a full physical copy of the source device
– Storage requirement is identical to the source device
– Restore does not require a healthy source device
– Activity on replica will have no performance impact on the source
device
– Good for full backup, decision support, development, testing and
restore to last PIT
– RPO depends on when the last PIT was created
– RTO is extremely small


Full Volume replicas have a number of advantages over pointer based (COFW) and COFA technologies.
y The replica has the entire contents of the original source device from the PIT and any
activity to the replica will have no performance impact on the source device (there is no
COFA or COFW penalty).
y Full Volume replicas can be used for any BC activity.
y The only disadvantage is that the storage requirements for the replica are at least equal to
that of the source devices.


Array Replicas: Which Technology? (continued)


y Pointer based - COFW
– Replica contains pointers to data
¾ Storage requirement is a fraction of the source device (lower cost)
– Restore requires a healthy source device
– Activity on replica will have some performance impact on source
¾ Any first write to the source or target will require data to be copied to the
save location and move pointer to save location
¾ Any read IO to data not in the save location will have to be serviced by
the source device
– Typically recommended if the changes to the source are less than
30%
– RPO depends on when the last PIT was created
– RTO is extremely small


The main benefit of Pointer based copies is the lower storage requirement for the replicas. This
technology is also very useful when the changes to the source are expected to be less than 30%
after the PIT has been created. Heavy activity on the Target devices may cause performance
impact on the source because any first write to the target will require data to be copied from the
source to the save location. Also, any reads of data that is not in the save area will have to be
serviced by the source device. The source device needs to be accessible for any restart or restore
operations from the Target.


Array Replicas: Which Technology? (continued)


y Full Volume – COFA Replicas
– Replica only has data that was accessed
– Restore requires a healthy source device
– Activity on replica will have some performance impact
¾ Any first access on target will require data to be copied to target before
the I/O to/from target can be satisfied
– Typically replicas created with COFA only are not as useful as
replicas created with the full copy mode. The recommendation would
be to use the full copy mode if the technology allows such an option


The COFA technology requires at least the same amount of storage as the source. The
disadvantages of the COFA penalty and the fact that the replica would be of no use if the source
volume were inaccessible make this technology less desirable. In general this technology should
only be recommended if a full copy mode is available. If a full copy mode is available then one
should always use the full copy mode and then the advantages are identical to those discussed for
Full Volume replicas.


Array Replicas: Full Volume vs. Pointer Based

                      Full Volume                  Pointer Based
Required Storage      100% of Source               Fraction of Source
Performance Impact    None                         Some
RTO                   Very small                   Very small
Restore               Source need not be healthy   Requires a healthy source device
Data change           No limits                    < 30%

This table summarizes the differences between Full Volume and Pointer Based replication
technologies.
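
As a rough illustration of how the table might drive a choice, the two deciding rows (Restore and Data change) can be turned into a small rule. The function name and its parameters are hypothetical, and a real decision would also weigh cost and the performance impact on the source.

```python
def suggest_replica_type(expected_change, must_survive_source_failure):
    """Suggest a replica type from the expected change rate (0.0 to 1.0)."""
    if must_survive_source_failure:
        return "full volume"      # pointer based restores need a healthy source
    if expected_change < 0.30:
        return "pointer based"    # small change rate, fraction of the storage
    return "full volume"


print(suggest_replica_type(0.10, must_survive_source_failure=False))   # pointer based
print(suggest_replica_type(0.10, must_survive_source_failure=True))    # full volume
print(suggest_replica_type(0.50, must_survive_source_failure=False))   # full volume
```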


Module Summary
Key points covered in this module:
y Replicas and the possible use of Replicas
y Consistency considerations when replicating File
Systems and Databases
y Host and Array based Replication Technologies
– Advantages/Disadvantages
– Differences
– Considerations
– Selecting the appropriate technology


These are the key points covered in this module. Please take a moment to review them.


Apply Your Knowledge…


Upon completion of this topic, you will be able to:
y List EMC’s Local Replication Solutions for the Symmetrix and CLARiiON arrays
y Describe EMC’s TimeFinder/Mirror Replication Solution
y Describe EMC’s SnapView - Snapshot Replication Solution


EMC – Local Replication Solutions


y EMC Symmetrix Arrays
– EMC TimeFinder/Mirror
¾ Full volume mirroring
– EMC TimeFinder/Clone
¾ Full volume replication
– EMC TimeFinder/SNAP
¾ Pointer based replication

y EMC CLARiiON Arrays


– EMC SnapView Clone
¾ Full volume replication
– EMC SnapView Snapshot
¾ Pointer based replication


All the local replication solutions that were discussed in this module are available on EMC Symmetrix
and CLARiiON arrays.
y EMC TimeFinder/Mirror and EMC TimeFinder/Clone are full volume replication solutions on the
Symmetrix arrays, while EMC TimeFinder/Snap is a pointer based replication solution on the
Symmetrix. EMC SnapView on the CLARiiON arrays allows full volume replication via SnapView
Clone and pointer based replication via SnapView Snapshot.
y EMC TimeFinder/Mirror: Highly available, ultra-performance mirror images of Symmetrix volumes
that can be non-disruptively split off and used as point-in-time copies for backups, restores, decision
support, or contingency uses.
y EMC TimeFinder/Clone: Highly functional, high-performance, full volume copies of Symmetrix
volumes that can be used as point-in-time copies for data warehouse refreshes, backups, online
restores, and volume migrations.
y EMC SnapView Clone: Highly functional, high-performance, full volume copies of CLARiiON
volumes that can be used as point-in-time copies for data warehouse refreshes, backups, online
restores, and volume migrations.
y EMC TimeFinder/Snap: High function, space-saving, pointer-based copies (logical images) of
Symmetrix volumes that can be used for fast and efficient disk-based restores.
y EMC SnapView Snapshot: High function, space-saving, pointer-based copies (logical images) of
CLARiiON volumes that can be used for fast and efficient disk-based restores.
We will discuss EMC TimeFinder/Mirror and EMC SnapView Snapshot in more detail in the next few
slides.


EMC TimeFinder/Mirror - Introduction


y Array based local replication technology for Full Volume Mirroring on EMC
Symmetrix Storage Arrays
– Create Full Volume Mirrors of an EMC Symmetrix device within an Array
y TimeFinder/Mirror uses special Symmetrix devices called Business
Continuance Volumes (BCV). BCVs:
– Are devices dedicated for Local Replication
– Can be dynamically, non-disruptively established with a Standard device. They
can be subsequently split instantly to create a PIT copy of data.
y The PIT copy of data can be used in a number of ways:
– Instant restore – Use BCVs as standby data for recovery
– Decision Support operations
– Backup – Reduce application downtime to a minimum (offline backup)
– Testing
y TimeFinder/Mirror is available in both Open Systems and Mainframe
environments

EMC TimeFinder/Mirror is an array based local replication technology for Full Volume
Mirroring on EMC Symmetrix Storage Arrays.
• TimeFinder/Mirror Business Continuance Volumes (BCV) are devices dedicated to local
replication.
• The BCVs are typically established with a standard Symmetrix device to create a Full
Volume Mirror.
• After the data has been synchronized the BCV can be “split” from its source device and be
used for any BC task. TimeFinder controls are available in both Open Systems and Mainframe
environments.


EMC TimeFinder/Mirror – Operations


y Establish
– Synchronize the Standard volume to the BCV volume
– BCV is set to a Not Ready state when established
¾ BCV cannot be independently addressed
– Re-synchronization is incremental
– BCVs cannot be established to other BCVs
– Establish operation is non-disruptive to the Standard device
– Operations to the Standard can proceed as normal during the establish

[Figure: STD and BCV devices joined by Establish and Incremental Establish operations.]

The TimeFinder Establish operation is the first step in creating a TimeFinder/Mirror replica. The
purpose of the establish operation is to Synchronize the contents from the Standard device to the
BCV. The first time a BCV is established with a standard device a full synchronization has to be
performed. Any future re-synchronization can be incremental in nature. The Symmetrix
microcode can keep track of changes made to either the Standard or the BCV.
• The Establish is a non-disruptive operation to the Standard device. I/O to Standard devices
can proceed during establish. Applications need not be quiesced during the establish
operation.
• The Establish operation will set a “Not Ready” status on the BCV device. Hence all I/O to
the BCV device must be stopped before the Establish operation is performed. Since BCVs
are dedicated replication devices a BCV cannot be established with another BCV.


EMC TimeFinder/Mirror – Operations …


y Split
– Time of Split is the Point-in-Time
– BCV is made accessible for BC Operations
– Consistency
¾ Consistent Split
– Changes tracked

[Figure: STD and BCV devices separated by a Split operation.]

The Point in Time of the replica is tied to the time when the Split operation is executed.
The Split operation separates the BCV from the Standard Symmetrix device and makes the BCV
device available for host access through its own device address. After the split operation
changes made to the Standard or BCV devices are tracked by the Symmetrix Microcode. EMC
TimeFinder/Mirror ensures Consistency of data on the BCV devices via the Consistent Split
option.


EMC TimeFinder/Mirror Consistent Split


EMC PowerPath
ƒ PowerPath is an EMC host based multi-pathing software
ƒ PowerPath holds I/O during TimeFinder/Mirror Split
- Read and write I/O

Enginuity Consistency Assist
ƒ Symmetrix Microcode holds I/O during TimeFinder/Mirror Split
- Write I/O (subsequent reads after first write)

[Figure: a Host with STD and BCV devices, shown once for each of the two methods.]

The TimeFinder/Mirror Consistent Split option ensures that the data on the BCVs is consistent
with the data on the Standard devices. Consistent Split holds I/O across a group of devices using
a single Consistent Split command, thus all the BCVs in the group are consistent point-in-time
copies. It is used to create a consistent point-in-time copy of an entire system, an entire
database, or any associated set of volumes.
The holding of I/Os can be done either by the EMC PowerPath multi-pathing software or by the
Symmetrix Microcode (Enginuity Consistency Assist). A PowerPath-based consistent split is
executed by the host doing the I/O; I/O is held at the host before the split.
An Enginuity Consistency Assist (ECA) based consistent split can be executed by the host doing
the I/O or by a control host in an environment where there are distributed and/or related
databases. I/O is held at the Symmetrix until the split operation is completed. Since I/O is held at
the Symmetrix, ECA can be used to perform consistent splits on BCV pairs across multiple,
heterogeneous hosts.


EMC TimeFinder/Mirror – Operations …


y Restore
– Synchronize contents of BCV volume to the Standard volume
– Restore can be full or incremental
– BCV is set to a Not Ready state
– I/Os to the Standard and BCVs should be stopped before the restore is initiated

y Query
– Provide current status of BCV/Standard volume pairs

[Figure: STD and BCV devices joined by an Incremental Restore operation.]

The purpose of the restore operation is to synchronize the data on the BCVs from a prior Point
in Time to the Standard devices. Restore is a recovery operation, hence all I/O’s to the Standard
device should be stopped and the device must be taken offline prior to a restore operation. The
restore will set the BCV device to a Not-Ready state, thus all I/O’s to the BCV devices must be
stopped and the devices must be offline before issuing the restore command.
Operations on the Standard volumes can resume as soon as the restore operation is initiated,
while the synchronization of the Standards from the BCV is still in progress.
The query operation is used to provide current status of Standard/BCV volume pairs.
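
The establish, split and restore operations can be summarized as a small state model. This is a teaching sketch only: the `BcvPair` class and its methods are hypothetical and do not correspond to the Symmetrix microcode or to any EMC command interface.

```python
class BcvPair:
    """Toy model of one Standard/BCV pair (hypothetical, not an EMC interface)."""

    def __init__(self, chunks):
        self.chunks = chunks        # number of tracked chunks on the device
        self.state = "never paired"
        self.bcv_ready = True       # BCV is host-addressable until established
        self.std_changed = set()    # chunks changed on the Standard since the split
        self.bcv_changed = set()    # chunks changed on the BCV since the split

    def _synchronize(self):
        # The first pairing copies everything; later ones copy only flagged chunks.
        if self.state == "never paired":
            copied = self.chunks
        else:
            copied = len(self.std_changed | self.bcv_changed)
        self.std_changed.clear()
        self.bcv_changed.clear()
        self.bcv_ready = False      # the BCV goes Not Ready
        self.state = "established"
        return copied

    def establish(self):
        """Standard -> BCV. Returns the number of chunks copied."""
        return self._synchronize()

    def restore(self):
        """BCV -> Standard. Only valid for a pair that has been split."""
        if self.state != "split":
            raise RuntimeError("restore needs a split BCV holding a PIT copy")
        return self._synchronize()

    def split(self):
        if self.state != "established":
            raise RuntimeError("only an established pair can be split")
        self.bcv_ready = True       # the moment of the split is the PIT
        self.state = "split"

    def write_standard(self, chunk):
        if self.state == "split":
            self.std_changed.add(chunk)

    def write_bcv(self, chunk):
        if not self.bcv_ready:
            raise IOError("BCV is Not Ready")
        self.bcv_changed.add(chunk)


pair = BcvPair(chunks=100)
full = pair.establish()          # first establish is a full synchronization
pair.split()
pair.write_standard(3)
pair.write_standard(7)
pair.write_bcv(7)
incremental = pair.restore()     # only chunks 3 and 7 differ
print(full, incremental)         # 100 2
```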


EMC TimeFinder/Mirror Multi-BCVs


y Standard device keeps track of changes to multiple BCVs one after the other
y Incremental establish or restore

[Figure: a Standard volume is established and split in turn with BCVs at 2:00 a.m., 4:00 a.m.
and 6:00 a.m.; the 4:00 a.m. BCV can then be used for either an incremental establish or an
incremental restore with the Standard volume.]

TimeFinder/Mirror allows a given Standard device to maintain incremental relationships with
multiple BCVs.
This means that different BCVs can be established and then split incrementally from a standard
volume at different times of the day. For example, a BCV that was split at 4:00 a.m. can be
re-established incrementally even though another BCV was established and split at 5:00 a.m. In
this way, a user can split and incrementally re-establish volumes throughout the day or night and
still keep re-establish times to a minimum.
Incremental information can be retained between a STD device and multiple BCV devices,
provided the BCV devices have not been paired with different STD devices.
The incremental relationship is maintained between each STD/BCV pairing by the Symmetrix
Microcode.


TimeFinder/Mirror Concurrent BCVs


y Two BCVs can be established concurrently with the same Standard device
y Establish BCVs simultaneously or one after the other
y BCVs can be split individually or simultaneously
y Simultaneous “Concurrent Restores” are not allowed

[Figure: one Standard device paired with BCV1 and BCV2.]

Concurrent BCVs is a TimeFinder/Mirror feature that allows two BCVs to be simultaneously
attached to a standard volume. The BCV pair can be split, providing customers with two copies
of their data. Each BCV can be mounted online and made available for processing.


EMC CLARiiON SnapView - Snapshots


y SnapView allows full copies and pointer-based copies
– Full copies – Clones (sometimes called BCVs)
– Pointer-based copies – Snapshots

y Because they are pointer-based, Snapshots


– Use less space than a full copy
– Require a ‘save area’ to be provisioned
– May impact the performance of the LUN they are associated with

y The ‘save area’ is called the ‘Reserved LUN Pool’


y The Reserved LUN Pool
– Consists of private LUNs (LUNs not visible to a host)
– Must be provisioned before Snapshots can be made


SnapView is software that runs on the CLARiiON Storage Processors, and is part of the
CLARiiON Replication Software suite of products, which includes SnapView, MirrorView and
SAN Copy.
SnapView can be used to make point in time (PIT) copies in 2 different ways: Clones, also
called BCVs or Business Continuance Volumes, are full copies, whereas Snapshots use a
pointer-based mechanism. Full copies are covered in the discussion of Symmetrix TimeFinder;
SnapView Snapshots will be covered here.
The generic pointer-based mechanism has been discussed in a previous section, so we’ll
concentrate on SnapView here.
Snapshots require a save area, called the Reserved LUN Pool. The ‘Reserved’ part of the name
implies that the LUNs are reserved for use by CLARiiON software, and can therefore not be
assigned to a host. LUNs which cannot be assigned to a host are known as private LUNs in the
CLARiiON environment.
To keep the number of pointers, and therefore the pointer map, at a reasonable size, SnapView
divides the LUN to be snapped, called a Source LUN, into areas of 64 kB in size. Each of these
areas is known as a chunk. Any change to data inside a chunk will cause that chunk to be written
to the Reserved LUN Pool, if it is being modified for the first time. The 64 kB copied from the
Source LUN must fit into a 64 kB area in the Reserved LUN, so Reserved LUNs are also
divided into chunks for tracking purposes.
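
The effect of the 64 kB chunk size can be shown with a little arithmetic. The helper names below are illustrative, and the offsets are arbitrary example values.

```python
CHUNK = 64 * 1024   # SnapView chunk size in bytes


def chunks_in_lun(lun_bytes):
    """Number of chunks needed to cover a LUN of the given size."""
    return -(-lun_bytes // CHUNK)   # ceiling division


def chunks_touched(offset, length):
    """Indices of the chunks covered by a write of `length` bytes at `offset`."""
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    return list(range(first, last + 1))


print(chunks_in_lun(1024 ** 3))       # 16384
print(chunks_touched(200_000, 512))   # [3]
print(chunks_touched(130_000, 5000))  # [1, 2]
```

A 512-byte write still causes a full 64 kB chunk to be copied on first write, and a write that straddles a chunk boundary causes two.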
The next 2 slides show more detail on the Reserved LUN Pool, and allocation of Reserved
LUNs to a Source LUN.


The Reserved LUN Pool

Reserved LUN
Pool

FLARE LUN 5 Private LUN 5

FLARE LUN 6 Private LUN 6

FLARE LUN 7 Private LUN 7

FLARE LUN 8 Private LUN 8


The CLARiiON storage system must be configured with a Reserved LUN Pool in order to use
SnapView Snapshot features. The Reserved LUN Pool consists of 2 parts: LUNs for use by
SPA and LUNs for use by SPB. Each of those parts is made up of one or more Reserved LUNs.
The LUNs used are bound in the normal manner. However, they are not placed in storage groups
and allocated to hosts, they are used internally by the storage system software. These are known
as private LUNs because they cannot be used, or seen, by attached hosts.
Like any LUN, a Reserved LUN will be owned by only one SP at any time and they may be
trespassed if the need should arise (i.e., if an SP should fail).
Just as each storage system model has a maximum number of LUNs it will support, each also
has a maximum number of LUNs which may be added to the Reserved LUN Pool.
The first step in SnapView configuration will usually be the assignment of LUNs to the
Reserved LUN Pool. Only then will SnapView Sessions be allowed to start. Remember that as
snapable LUNs are added to the storage system, the LUN Pool size will have to be reviewed.
Changes may be made online.
LUNs used in the Reserved LUN Pool are not host-visible, though they do count towards the
maximum number of LUNs allowed on a storage system.


Reserved LUN Allocation

Reserved LUN
Pool
Source LUNs
Snapshot 1a Session 1a
Private LUN 5
LUN 1 Snapshot 1b Session 1b

Private LUN 6

Private LUN 7

Private LUN 8

LUN 2 Snapshot 2a Session 2a


In this example, LUN 1 and LUN 2 have been changed to Source LUNs by the creation of one
or more Snapshots on each. Three Sessions will be started on those Source LUNs. Once a
Session starts, the SnapView mechanism tracks changes to the LUN and Reserved LUN Pool
space will be required. In this example, the following occurs:
y Session 1a is started on Snapshot 1a.
y Private LUN 5 in the Reserved LUN Pool is immediately allocated to Source LUN 1, and
changes made to that Source LUN are placed in Private LUN 5.
y A second Session, Session 1b, is started on Snapshot 1b, and changes to the Source LUN are
still saved in Private LUN 5.
y When PL 5 fills up, SnapView allocates the next available LUN, Private LUN 6, to Source
LUN 1, and the process continues.
y Sessions 1a and 1b are now storing information in PL 6.
y A Session is then started on Source LUN 2, and Private LUN 7 – a new LUN, since Source
LUNs cannot share a Private LUN - is allocated to it.
y Once that LUN fills, Private LUN 8 will be allocated.
y If all private LUNs have been allocated, and Session 1b causes Private LUN 6 to become
full, then Session 1b will be terminated by SnapView without warning. SnapView does
notify the user in the SP Event Log, and, if Event Monitor is active, in other ways, that the
Reserved LUN Pool is filling up. This notification allows ample time to correct the
condition. Notification takes place when the Reserved LUN Pool is 50% full, then again at
75%, and every 5% thereafter.
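
The notification schedule can be written out explicitly. This sketch assumes that "every 5% thereafter" means 80%, 85% and so on up to and including 100%; the function names are hypothetical and this is not SnapView or Event Monitor code.

```python
def notification_levels():
    """Pool-full percentages at which a notification is raised."""
    return [50, 75] + list(range(80, 101, 5))


def notifications_crossed(before_pct, after_pct):
    """Levels passed when pool usage grows from before_pct to after_pct."""
    return [lvl for lvl in notification_levels() if before_pct < lvl <= after_pct]


print(notification_levels())           # [50, 75, 80, 85, 90, 95, 100]
print(notifications_crossed(40, 82))   # [50, 75, 80]
```

The long gap between 50% and 75% followed by 5% steps means warnings become more frequent as the pool approaches the point where Sessions would be terminated.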

SnapView Terms
y Snapshot
– The ‘virtual LUN’ seen by a secondary host
– Made up of data on the Source LUN and data in the RLP
– Visible to the host (online) if associated with a Session

y Session
– The mechanism that tracks the changes
– Maintains the pointers and the map
– Represents the point in time

y Activate and deactivate a Snapshot


– Associate and disassociate a Session with a Snapshot

y Roll back
– Copy data from a (typically earlier) Session to the Source LUN


Let’s use an analogy to make the distinction easier to understand. We’ll compare this technology
to CD technology.
You can own a CD player, but have no CDs. Similarly, you can own CDs, but not have a player.
CDs are only useful if you can listen to them; also, you can only listen to one at a time on a
player, no matter how many CDs you own.
In the same way, a Session (the CD) is a point in time copy of data on a LUN. The exact time is
determined by the time at which the Session is started.
The Snapshot (the CD player in our analogy) allows us to view the Session data (listen to the
CD).
The sequence of slides that follows will demonstrate the COFW process and the rollback
process.


COFW and Reads from Snapshot

[Figure: a Source LUN of five chunks (0 to 4). The primary host has modified chunks 0 and 3, so
the original contents of chunks 3 and 0 have been copied to the Reserved LUN. The secondary host
reads through the Snapshot, and the SnapView Map in SP memory directs each read to the Source
LUN or to the Reserved LUN.]

To use SnapView Snapshots, a Reserved LUN Pool must be created. It must have enough space
available to hold all the original chunks on the Source LUN that are likely to change while
the Session is active.
This slide demonstrates the COFW process, invoked when a host changes a Source LUN chunk
for the first time. The original chunk is copied to the Reserved LUN Pool and pointers are
updated to indicate that the chunk is now present in the Reserved LUN Pool. The map in SP
memory is updated as well, as is the map on disk if the Session is persistent.
Once a chunk has been copied to the Reserved LUN Pool, further changes made to that chunk
on the Source LUN do not initiate any COFW operations for that
Session.
If the secondary host requests a read, SnapView first determines whether the required data is on
the Source LUN (i.e. has not been modified since the Session started), or in the Reserved LUN
Pool, and fetches it from the relevant location. Examples of both types of read are shown here.
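
The COFW and snapshot-read behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name SnapSession, its methods and the use of a dictionary as the Reserved LUN Pool are hypothetical illustrations, not SnapView's actual interfaces or data structures.

```python
class SnapSession:
    """Toy model of one Session over a Source LUN held as a list of chunks."""

    def __init__(self, source_lun):
        self.source = source_lun   # live Source LUN, written by the primary host
        self.reserved = {}         # chunk index -> original chunk (stands in for the pool)
        self.cofw_count = 0        # number of copy-on-first-write operations performed

    def primary_write(self, index, data):
        # Copy the original chunk only on the FIRST write after the Session started.
        if index not in self.reserved:
            self.reserved[index] = self.source[index]
            self.cofw_count += 1
        self.source[index] = data

    def snapshot_read(self, index):
        # Changed chunks are read from the pool, unchanged ones from the Source LUN.
        if index in self.reserved:
            return self.reserved[index]
        return self.source[index]


lun = ["0", "1", "2", "3", "4"]
session = SnapSession(lun)
session.primary_write(3, "3'")
session.primary_write(3, "3''")   # second write to chunk 3: no new COFW
session.primary_write(0, "0'")
point_in_time = [session.snapshot_read(i) for i in range(len(lun))]
```

In this sketch the secondary host still sees chunks 0 to 4 as they were when the Session started, while only two chunks were copied to the pool.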


Writes to Snapshot

[Figure: the Source LUN holds chunks 0', 1, 2, 3'' and 4. The Reserved LUN holds the original
chunks 3, 0 and 2, plus the copies 0* and 2* modified by the Secondary Host through the
Snapshot. The SnapView map is held in SP memory.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 146

SnapView Snapshots are writeable by the secondary host. This example shows two different write
operations: one where the chunk being updated has already been copied into the Reserved LUN
Pool, and the other where it has not yet been copied.
• In the first example, the secondary host writes to a chunk which has already been copied into
the Reserved LUN Pool. SnapView needs to keep an original copy in the Reserved LUN
Pool (to make recovery of the original point-in-time view possible), so it duplicates the
chunk. The secondary host then modifies its copy of the chunk. The maps and pointers are
updated to reflect the changes.
• In the second example, the chunk has not yet been modified by the primary host, so it is not
yet in the Reserved LUN Pool. SnapView copies the chunk from the Source LUN to the
Reserved LUN Pool and makes an additional copy. The copy visible to the secondary host is
then modified by the write. The maps and pointers are updated.


Rollback - Snapshot Active (preserve changes)

[Figure: rollback with the Snapshot active. Chunks 0*, 2* and 3 are copied from the Reserved
LUN over chunks 0', 2 and 3'' on the Source LUN, which then holds 0*, 1, 2*, 3 and 4. The
Reserved LUN still holds 3, 0*, 0, 2* and 2.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 147

SnapView rollback allows a Source LUN to be returned to its state at a previously defined point
in time. When performing the rollback, you can choose to preserve or discard any changes made
by the secondary host. In this first example, changes are preserved, meaning that the state of the
Source LUN at the end of the rollback process will be identical to the Snapshot as it appears
now.
All chunks that are in the Reserved LUN Pool are copied over the corresponding chunks on the
Source LUN. Before this process starts, the Source LUN must be taken offline, because the
rollback changes the data structure without the knowledge of the host operating system, which
needs to refresh its view of that structure. If this step is not performed, data corruption could
occur on the Source LUN.
Note: No changes are made to the Snapshot or to the Reserved LUN Pool when this process
takes place.


Rollback - Snapshot Deactivated (discard changes)

[Figure: rollback with the Snapshot deactivated. The original chunks 0 and 3 are copied from
the Reserved LUN over chunks 0' and 3'' on the Source LUN, which then holds 0, 1, 2, 3 and 4.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 148

In this example, all changes that have been made to the Snapshot by the secondary host are
discarded, and the Source LUN is returned to the state it was in when the session was started (the
original PIT view). To do this, the Snapshot needs to be deactivated. Deactivating the Snapshot
discards all changes made by the secondary host, and frees up areas of the Reserved LUN Pool
which were holding those changes. It also makes the Snapshot unavailable to the secondary host.
Once the deactivation has completed, the rollback process can be started. At this point, the
Source LUN needs to be taken offline. The Source LUN is then returned to its original state at
the time the session was started.
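
The difference between the two rollback variants can be illustrated with a short Python sketch. It is a simplified model only: the function rollback and its parameters are hypothetical names, and the chunk labels mirror those used on the slides.

```python
def rollback(source, originals, snapshot_writes, preserve_changes):
    """Return the Source LUN contents after a rollback.

    source          -- current chunks on the Source LUN
    originals       -- chunk index -> original chunk saved in the pool
    snapshot_writes -- chunk index -> chunk as modified by the secondary host
    """
    restored = list(source)
    # Every original chunk saved in the pool is copied back over the Source LUN.
    for index, chunk in originals.items():
        restored[index] = chunk
    if preserve_changes:
        # Snapshot still active: the secondary host's modified chunks take precedence.
        for index, chunk in snapshot_writes.items():
            restored[index] = chunk
    return restored


source_lun = ["0'", "1", "2", "3''", "4"]
originals = {3: "3", 0: "0", 2: "2"}     # original chunks held in the pool
snapshot_writes = {0: "0*", 2: "2*"}     # secondary host modifications

kept = rollback(source_lun, originals, snapshot_writes, preserve_changes=True)
discarded = rollback(source_lun, originals, snapshot_writes, preserve_changes=False)
```

With changes preserved the Source LUN ends up matching the Snapshot as the secondary host sees it; with changes discarded it matches the original point-in-time view.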


Remote Replication
After completing this module, you will be able to:
• Explain Remote Replication Concepts
– Synchronous/Asynchronous
– Connectivity Options

• Discuss Host and Array based Remote Replication Technologies
– Functionality
– Differences
– Considerations
– Selecting the appropriate technology

© 2006 EMC Corporation. All rights reserved. Business Continuity - 149

This module introduces the challenges and solutions for remote replication and describes two
possible implementations.


Remote Replication Concepts


y Replica is available at a remote facility
– Could be a few miles away or half way around the world
– Backup and Vaulting are not considered remote replication

y Synchronous Replication
– Replica is identical to source at all times – Zero RPO

y Asynchronous Replication
– Replica is behind the source by a finite margin – Small RPO

y Connectivity
– Network infrastructure over which data is transported from source
site to remote site

© 2006 EMC Corporation. All rights reserved. Business Continuity - 150

The Replication concepts/considerations that were discussed for Local Replication apply to
Remote Replication as well. We will explore the concepts that are unique to Remote replication.
Synchronous and Asynchronous replication concepts and considerations will be explained in
more detail in the next few slides.
Data has to be transferred from the source site to a remote site over some network. This can be
done over IP networks, over the SAN, using DWDM (Dense Wavelength Division Multiplexing) or
SONET (Synchronous Optical Network), etc. We will discuss the various options later in the
module.


Synchronous Replication
• A write has to be secured on the remote replica and the source before it is acknowledged
to the host

• Ensures that the source and remote replica have identical data at all times
– Write ordering is maintained at all times
▪ Replica receives writes in exactly the same order as the source

• Synchronous replication provides the lowest RPO and RTO
– Goal is zero RPO
– RTO is as small as the time it takes to start application on the remote site

[Figure: a Server and two Disks (source and remote replica), with numbered steps 1 to 4
showing the Data Write and Data Acknowledgement paths.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 151

Synchronous: data is committed at both the source site and the remote site before the write is
acknowledged to the host. Any write to the source must be transmitted to and acknowledged by
the remote site before a write complete is signaled to the host. Additional writes cannot occur
until each preceding write has been completed and acknowledged. This ensures that the data at
both sites is identical at all times.


Synchronous Replication
• Response Time Extension
– Application response time will be extended due to synchronous replication
▪ Data must be transmitted to remote site before write can be acknowledged
▪ Time to transmit will depend on distance and bandwidth

• Bandwidth
– To minimize impact on response time, sufficient bandwidth must be provided for at all times

• Rarely deployed beyond 200 km

[Figure: graph of Writes (MB/s) against Time, with the Max and Average write rates marked.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 152

• Application response times will be extended with any kind of Synchronous Replication. This
is because any write to the source must be transmitted to and acknowledged by the remote site
before write complete is signaled to the host. The response time depends on the distance
between sites, the available bandwidth and the network connectivity infrastructure.
• The longer the distance, the longer the response time. The speed of light is finite: every
200 km (125 miles) will add 1 ms to the response time.
• Insufficient bandwidth will also cause response time elongation. With Synchronous replication
one should have sufficient bandwidth all the time. The picture on the slide shows the amount
of data that has to be replicated as a function of time. To minimize the response time elongation
one must ensure that the Max bandwidth is provided by the network at all times. If we assume
that only the average bandwidth is provided for, then there will be times during the day (the
shaded section) when response times may be unduly elongated, causing applications to time
out.
• The distance over which Synchronous replication can be deployed depends on an
application’s ability to tolerate the extension in response time. It is rarely deployed for
distances greater than 200 km (125 miles).
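
The two effects described in these notes, distance and bandwidth, can be turned into a small worked example. The sketch below takes the 1 ms per 200 km figure from the notes as a rule of thumb; the function names and the sample write rates are hypothetical and chosen only for illustration.

```python
MS_PER_200_KM = 1.0  # rule of thumb quoted in the notes


def added_response_time_ms(distance_km, ms_per_200_km=MS_PER_200_KM):
    """Extra write response time attributed to distance alone."""
    return distance_km / 200.0 * ms_per_200_km


def intervals_over_link(write_rates_mb_s, link_mb_s):
    """Indexes of the sample intervals in which the write rate exceeds the link."""
    return [i for i, rate in enumerate(write_rates_mb_s) if rate > link_mb_s]


writes = [20, 35, 80, 60, 25]            # hypothetical MB/s samples over a day
average = sum(writes) / len(writes)      # 44.0 MB/s

delay_at_200_km = added_response_time_ms(200)
slow_intervals = intervals_over_link(writes, average)       # link sized for the average
ok_intervals = intervals_over_link(writes, max(writes))     # link sized for the peak
```

With the link sized for the average, two of the five intervals exceed it, which corresponds to the shaded section of the graph; sizing for the peak leaves none.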


Asynchronous Replication
• Write is acknowledged to host as soon as it is received by the source

• Data is buffered and sent to remote
– Some vendors maintain write ordering
– Other vendors do not maintain write ordering, but ensure that the replica will always be a
consistent re-startable image

• Finite RPO
– Replica will be behind the Source by a finite amount
– Typically configurable

[Figure: a Server and two Disks (source and remote replica), with numbered steps 1 to 4
showing the Data Write and Data Acknowledgement paths.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 153

Asynchronous: data is committed at the source site and the acknowledgement is sent to the
host. The data is buffered and then forwarded to the remote site as the network capabilities
permit. The data at the remote site will be behind the source by a finite RPO; typically the RPO
is a configurable value.
The primary benefit of Asynchronous replication is that there is no response time elongation.
Asynchronous replication is typically deployed over extended distances. The response time
benefit is offset by the finite RPO.


Asynchronous Replication
• Response Time unaffected
• Bandwidth
– Need sufficient bandwidth on average

• Buffers
– Need sufficient buffers

• Can be deployed over long distances

[Figure: graph of Writes (MB/s) against Time, with the Max and Average write rates marked.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 154

Extended distances can be achieved with Asynchronous replication because there is no impact
on the application response time. Data is buffered and then sent to the remote site. The available
bandwidth should be at least equal to the average write workload. Data will be buffered during
times when the bandwidth is not enough, thus sufficient buffers should be designed into the
solution.
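
As a rough illustration of the buffer sizing point, the following Python sketch estimates how much data accumulates when the write rate temporarily exceeds the link. The function name, the hourly sampling and the workload figures are assumptions made for the example, not vendor sizing guidance.

```python
def peak_backlog_mb(write_rates_mb_s, link_mb_s, interval_s=1.0):
    """Largest amount of buffered data (MB) while draining over a fixed-rate link."""
    backlog = 0.0
    peak = 0.0
    for rate in write_rates_mb_s:
        # Data arrives at `rate` and drains at the link rate; the backlog never goes negative.
        backlog = max(0.0, backlog + (rate - link_mb_s) * interval_s)
        peak = max(peak, backlog)
    return peak


writes = [20, 35, 80, 60, 25]            # hypothetical MB/s, one sample per hour
average = sum(writes) / len(writes)      # 44.0 MB/s

needed = peak_backlog_mb(writes, average, interval_s=3600)
none_needed = peak_backlog_mb(writes, max(writes), interval_s=3600)
```

In this example a link sized for the average write rate needs roughly 187 GB of buffer to ride out the two busy hours, whereas a link sized for the peak needs none.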


Remote Replication Technologies


• Host based
– Logical Volume Manager (LVM)
▪ Synchronous/Asynchronous
– Log Shipping

• Storage Array based
– Synchronous
– Asynchronous
– Disk Buffered - Consistent PITs
▪ Combination of Local and Remote Replication

© 2006 EMC Corporation. All rights reserved. Business Continuity - 155

In the context of our discussion Remote Replication refers to replication that is done between
data centers if it is host based and between Storage arrays if it is array based.
Host based implies that all the replication is done by using the CPU resources of the host using
software that is running on the host. Array based implies that all replication is done between
Storage Arrays and is handled by the Array Operating Environment.
We will discuss each of the technologies listed in turn.


LVM Based Remote Replication

• Duplicate Volume Groups at local and remote sites

• All writes to the source Volume Group are replicated to the remote Volume Group by the LVM
– Synchronous or Asynchronous

[Figure: a Volume Group at the Local Site and a Volume Group at the Remote Site, each with a
Log and Physical Volumes 1 to 3, connected by a Network.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 156

Some LVM vendors provide remote replication at the Volume Group level.
Duplicate Volume Groups need to exist at both the local and remote sites before replication
starts.
• This can be achieved in a number of ways: over IP, tape backup/restore, etc.
All writes to the source Volume Group are replicated to the remote Volume Group by the LVM.
• Typically the writes are queued in a log file and sent to the remote site in the order received
over a standard IP network
• Can be done synchronously or asynchronously
• Synchronous – Write must be received by remote before the write is acknowledged locally
to the host
• Asynchronous – Write is acknowledged immediately to the local host and queued and sent in
order


LVM Based Remote Replication


• In the event of a network failure
– Writes are queued in the log file
– When the issue is resolved the queued writes are sent over to the remote
– The maximum size of the log file determines the length of outage that can be withstood

• In the event of a failure at the source site, production operations can be transferred to
the remote site

© 2006 EMC Corporation. All rights reserved. Business Continuity - 157

Production work can continue at the source site if there is a network failure. The writes that need
to be replicated will be queued in the log file and sent over to the remote site when the network
issue is resolved. If the log file fills up before the network outage is resolved, a complete
resynchronization of the remote site would have to be performed. Thus the size of the log file
determines the length of network outage that can be tolerated.
In the event of a failure at the source site (e.g. server crash, site wide disaster), production
operations can be resumed at the remote site with the remote replica. The exact steps that need
to be performed to achieve this depend on the LVM that is in use.
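
The relationship between log file size and tolerable outage can be shown with simple arithmetic. This Python sketch assumes a steady write rate, which real workloads rarely have; the function names and the figures are hypothetical.

```python
def tolerable_outage_hours(log_capacity_mb, write_rate_mb_s):
    """Hours of network outage a log file can absorb at a steady write rate."""
    if write_rate_mb_s <= 0:
        raise ValueError("write rate must be positive")
    return log_capacity_mb / write_rate_mb_s / 3600.0


def needs_full_resync(outage_hours, log_capacity_mb, write_rate_mb_s):
    """True when the log would fill up before the network comes back."""
    return outage_hours > tolerable_outage_hours(log_capacity_mb, write_rate_mb_s)


hours = tolerable_outage_hours(log_capacity_mb=72_000, write_rate_mb_s=5)
short_outage = needs_full_resync(2, 72_000, 5)
long_outage = needs_full_resync(6, 72_000, 5)
```

Here a 72,000 MB log absorbing 5 MB/s of writes covers a four-hour outage; a six-hour outage would force a complete resynchronization.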


LVM Based Remote Replication


y Advantages
– Different storage arrays and RAID protection can be used at the
source and remote sites
– Standard IP network can be used for replication
– Response time issue can be eliminated with asynchronous mode,
with extended RPO

y Disadvantages
– Extended network outages require large log files
– CPU overhead on host
¾ For maintaining and shipping log files

© 2006 EMC Corporation. All rights reserved. Business Continuity - 158

A significant advantage of using LVM based remote replication is the fact that storage arrays
from different vendors can be used at the two sites. E.g. At the production site a high-end array
could be used while at the remote site a second tier array could be used. In a similar manner the
RAID protection at the two sites could be different as well.
Most of the LVM based remote replication technologies allow the use of standard IP networks
that are already in place, eliminating the need for a dedicated network. Asynchronous mode
supported by many LVMs eliminates the response time issue of synchronous mode while
extending the RPO.
Log files need to be configured appropriately to support extended network outages. Host based
replication technologies use host CPU cycles.


Host Based Log Shipping


• Offered by most DB Vendors

• Advantages
– Minimal CPU overhead
– Low bandwidth
– Standby Database consistent to last applied log

[Figure: Logs are shipped from the Original database over an IP Network and applied to the
Stand By database.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 159

Log Shipping is a host based replication technology for databases offered by most DB vendors.
• Initial State - All the relevant storage components that make up the database are replicated to
a standby server (done over IP or other means) while the database is shut down
• Database is started on the production server – As and when log switches occur the log file
that was closed is sent over IP to the standby server
• Database is started in standby mode on the standby server, as and when log files arrive they
are applied to the standby database
• Standby database is consistent up to the last log file that was applied
Advantages
• Minimal CPU overhead on production server
• Low bandwidth (IP) requirement
• Standby Database consistent to last applied log
– RPO can be reduced by controlling log switching
Disadvantages
• Need host based mechanism on production server to periodically ship logs
• Need host based mechanism on standby server to periodically apply logs and check for
consistency
• IP network outage could lead to standby database falling further behind
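
The log shipping cycle can be sketched as a small Python simulation. It models a database as a dictionary and a closed log as a numbered set of changes; the class StandbyDatabase and the queue of logs in transit are hypothetical constructs for illustration and do not correspond to any particular database product.

```python
from collections import deque


class StandbyDatabase:
    """Standby copy that applies closed log files strictly in sequence."""

    def __init__(self, baseline):
        self.rows = dict(baseline)      # initial copy of the production database
        self.last_applied_log = 0

    def apply(self, log_number, changes):
        # A missing log would leave the standby inconsistent, so refuse gaps.
        if log_number != self.last_applied_log + 1:
            raise ValueError("log files must be applied in order")
        self.rows.update(changes)
        self.last_applied_log = log_number


production = {"a": 1}
standby = StandbyDatabase(production)   # initial state, copied while the DB is shut down
in_transit = deque()

production["b"] = 2
in_transit.append((1, {"b": 2}))        # log switch: closed log 1 is shipped
production["a"] = 10
in_transit.append((2, {"a": 10}))       # log switch: closed log 2 is shipped
production["c"] = 3                     # still in the open log, not yet shipped

while in_transit:
    standby.apply(*in_transit.popleft())
```

The standby ends up consistent to log 2 and does not contain the change still sitting in the open log, which is the RPO exposure that more frequent log switching reduces.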


Array Based – Remote Replication


• Replication performed by the Array Operating Environment
– Host CPU resources can be devoted to production operations instead of replication operations
– Arrays communicate with each other via dedicated channels
▪ ESCON, Fibre Channel or Gigabit Ethernet

• Replicas are on different arrays
– Primarily used for DR purposes
– Can also be used for other BC operations

[Figure: a Production Server attached to the Source on the Production Array, connected over a
Network (Distance) to the Replica on the Remote Array, which is attached to the DR Server.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 160

Replication Process
y A Write is initiated by an application/server
y Received by the source array
y Source array transmits the write to the remote array via dedicated channels (ESCON, Fibre
Channel or Gigabit Ethernet) over a dedicated or shared network infrastructure
y Write received by the remote array
Only Writes are forwarded to the remote array
y Reads are from the source devices


Array Based – Synchronous Replication

[Figure: Source and Target arrays connected by Network links.]

1. Write is received by the source array from host/server
2. Write is transmitted by source array to the remote array
3. Remote array sends acknowledgement to the source array
4. Source array signals write complete to host/server

© 2006 EMC Corporation. All rights reserved. Business Continuity - 161

Synchronous Replication ensures that the replica and source have identical data at all times. The
source array issues the write complete to the host/server only when the write has been received
both at the remote array and the source array. Thus when the write complete is sent the Replica
and Source are identical.
The sequence of operations is:
y Write is received by the source array from host/server.
y Write is transmitted by source array to the remote array.
y Remote array sends acknowledgement to the source array.
y Source array signals write complete to host/server.


Array Based – Asynchronous Replication

[Figure: Source and Target arrays connected by Network links.]

1. Write is received by the source array from host/server
2. Source array signals write complete to host/server
3. Write is transmitted by source array to the remote array
4. Remote array sends acknowledgement to the source array

• No impact on response time
• Extended distances between arrays
• Lower bandwidth as compared to Synchronous
© 2006 EMC Corporation. All rights reserved. Business Continuity - 162

Applications do not suffer any response time elongation with Asynchronous replication, because
any write is acknowledged to the host as soon as the write is received by the source array. Thus
asynchronous replication can be used for extended distances. Bandwidth requirements for
Asynchronous will be lower than for Synchronous for the same workload. Vendors ensure data
consistency in different ways.
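
The only difference between the synchronous and asynchronous sequences is where the write complete falls in the order of events. The following Python sketch makes that explicit; the function name, the event strings and the timing values are assumptions for illustration only.

```python
def replicate_write(mode, link_round_trip_ms, local_commit_ms=1.0):
    """Return (ordered events, host-visible response time in ms) for one write."""
    events = ["source array receives write"]
    if mode == "sync":
        events += ["source transmits write to remote",
                   "remote acknowledges to source",
                   "source signals write complete to host"]
        # The host waits for the remote round trip as well as the local commit.
        response = local_commit_ms + link_round_trip_ms
    elif mode == "async":
        events += ["source signals write complete to host",
                   "source transmits write to remote",
                   "remote acknowledges to source"]
        # The host is released before the write crosses the link.
        response = local_commit_ms
    else:
        raise ValueError("mode must be 'sync' or 'async'")
    return events, response


sync_events, sync_ms = replicate_write("sync", link_round_trip_ms=4.0)
async_events, async_ms = replicate_write("async", link_round_trip_ms=4.0)
```

In this model the asynchronous response time is independent of the link, which is why asynchronous replication suits extended distances.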


Array Based – Asynchronous Replication


• Ensuring Consistency
– Maintain write ordering
▪ Some vendors attach a time stamp and sequence number to each of
the writes, then ship the writes to the remote array and apply the writes
to the remote devices in the exact order based on the time stamp and
sequence numbers
▪ Remote array applies the writes in the exact order they were received,
just like synchronous
– Dependent write consistency
▪ Some vendors buffer the writes in the cache of the source array for a
period of time (between 5 and 30 seconds)
▪ At the end of this time the current buffer is closed in a consistent manner
and the buffer is switched; new writes are received in the new buffer
▪ The closed buffer is then transmitted to the remote array
▪ Remote replica will contain a consistent, re-startable image of the
application

© 2006 EMC Corporation. All rights reserved. Business Continuity - 163

The data on the remote replicas will be behind the source by a finite amount in Asynchronous
replication, thus steps must be taken to ensure consistency. Some vendors achieve consistency
by maintaining write ordering, i.e. the remote array applies writes to the replica devices in the
exact order that they were received at the source. Other vendors leverage the dependent write
I/O logic that is built into most databases and applications.
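
The write-ordering approach can be illustrated with a minimal Python sketch in which the remote array holds back any write that arrives ahead of its turn. The class RemoteArray and its fields are hypothetical; real arrays use their own sequencing and transport mechanisms.

```python
import heapq


class RemoteArray:
    """Applies writes strictly in sequence order, holding back early arrivals."""

    def __init__(self):
        self.blocks = {}
        self.next_seq = 1
        self.pending = []     # min-heap of (sequence number, block, data)
        self.applied = []     # sequence numbers in the order they were applied

    def receive(self, seq, block, data):
        heapq.heappush(self.pending, (seq, block, data))
        # Drain every buffered write that is now next in line.
        while self.pending and self.pending[0][0] == self.next_seq:
            s, b, d = heapq.heappop(self.pending)
            self.blocks[b] = d
            self.applied.append(s)
            self.next_seq += 1


remote = RemoteArray()
# Writes 1 to 3 were issued in order at the source but arrive out of order.
remote.receive(2, "log", "commit")
held_back = dict(remote.blocks)       # nothing applied yet: write 1 is missing
remote.receive(1, "data", "row")
remote.receive(3, "data", "row2")
```

Because write 2 is not applied until write 1 has arrived, the replica never shows a dependent write without the write it depends on.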


Array based – Disk Buffered Consistent PITs


y Local and Remote replication technologies can be
combined to create consistent PIT copies of data on
remote arrays
y RPO usually in the order of hours
y Lower Bandwidth requirements
y Extended distance solution

© 2006 EMC Corporation. All rights reserved. Business Continuity - 164

Disk buffered consistent PITs is a combination of Local and Remote replication technologies.
The idea is to make a Local PIT replica and then create a Remote replica of the Local PIT. The
advantage of disk buffered PITs is lower bandwidth requirements and the ability to replicate
over extended distances. Disk buffered replication is typically used when the RPO requirements
are of the order of hours or so, thus a lower bandwidth network can be used to transfer data from
the Local PIT copy to the remote site. The data transfer may take a while, but the solution would
be designed to meet the RPO.
We will take a look at two disk buffered PIT solutions.


Extended Distance Consistent PIT


[Figure: at the SOURCE site, the Source and its Local Replica; Network Links carry the Local
Replica to the Remote Replica at the REMOTE site, which can have its own Local Replica.]

• Create a Consistent PIT Local Replica on Source Array
• Create a Remote Replica of this Local Replica
• Optionally create another replica of the Remote replica on the remote array if needed
• Repeat…as automation, link bandwidth, change rate permit

© 2006 EMC Corporation. All rights reserved. Business Continuity - 165

Disk buffered replication allows incremental resynchronization between a Local Replica and
the Remote Replica for which it acts as the source.
Benefits include:
y Reduction in communication link cost and improved resynchronization time for long-
distance replication implementations
y The ability to use the various replicas to provide disaster recovery testing, point-in-time
backups, decision support operations, third-party software testing, and application upgrade
testing or the testing of new applications.


Synchronous + Extended Distance Consistent PIT


[Figure: the Source at the SOURCE site is replicated synchronously over Network Links to a
Remote Replica at the BUNKER site; a Local Replica at the bunker is sent over Network Links to
a Remote Replica at the REMOTE site, which can have its own Local Replica.]

• Synchronous replication between the Source and Bunker Site
• Create consistent PIT Local Replica at bunker
• Create Remote Replica of bunker Local Replica
• Optionally create additional Local Replica at Target site from the Remote Replica if needed
• Repeat…as automation, link bandwidth, change rate permit
© 2006 EMC Corporation. All rights reserved. Business Continuity - 166

Synchronous + Extended Distance Buffered Replication benefits include:


y Bunker site provides a zero RPO DR Replica
y The ability to resynchronize only changed data between the intermediate Bunker site and the
final target site, reducing required network bandwidth
y Reduction in communication link cost and improved resynchronization time for long-
distance replication implementations
y The ability to use the replicas to provide disaster recovery testing, point-in-time backups,
decision support operations, third-party software testing, and application upgrade testing or
the testing of new applications.


Remote Replicas – Tracking Changes


• Remote replicas can be used for BC Operations
– Typically remote replication operations will be suspended when the remote replicas are
used for BC Operations

• During BC Operations changes will/could happen to both the source and remote replicas
– Most remote replication technologies have the ability to track changes made to the source
and remote replicas to allow for incremental re-synchronization
– Resuming remote replication operations will require re-synchronization between the source
and replica

© 2006 EMC Corporation. All rights reserved. Business Continuity - 167

Tracking changes to facilitate incremental re-synchronization between the source devices and
remote replicas is done via the use of bitmaps, in a manner very similar to that discussed in the
Local Replication lecture. Two bitmaps are created, one for the source and one for the replica.
Some vendors may keep the information of both bitmaps at both the source and remote
sites, while others may simply keep the source bitmap at the source site and the remote bitmap
at the remote site. When a re-synchronization (source to replica or replica to source) is required,
the source and replica bitmaps are compared and only data that was changed is
synchronized.
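
The bitmap comparison can be sketched in Python as a logical OR of the two bitmaps, followed by a copy of only the flagged tracks. The function names and the five-track example are hypothetical, and the sketch shows a source-to-replica resynchronization only.

```python
def tracks_to_resync(source_bitmap, replica_bitmap):
    """Indexes of tracks changed on either side since replication was suspended."""
    if len(source_bitmap) != len(replica_bitmap):
        raise ValueError("bitmaps must cover the same number of tracks")
    # A track must be re-copied if it changed at the source OR at the replica.
    return [i for i, (s, r) in enumerate(zip(source_bitmap, replica_bitmap)) if s or r]


def resync(source, replica, source_bitmap, replica_bitmap):
    """Copy only the changed tracks from source to replica; return how many were copied."""
    changed = tracks_to_resync(source_bitmap, replica_bitmap)
    for i in changed:
        replica[i] = source[i]
    return len(changed)


source = ["A", "B2", "C", "D", "E2"]     # tracks 1 and 4 changed at the source
replica = ["A", "B", "C9", "D", "E"]     # track 2 changed at the replica
copied = resync(source, replica, [0, 1, 0, 0, 1], [0, 0, 1, 0, 0])
```

Three of the five tracks are copied, after which the replica matches the source again, including the track that had been changed only on the replica side.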


Primary Site Failure – Operations at Remote Site


• Remote replicas are typically not available for use while the replication session is in progress
• In the event of a primary site failure the replicas have to be made accessible for use
• Create a local replica of the remote devices at the remote site
• Start operations at the Remote site
– No remote protection while primary site issues are resolved

• After issue resolution at Primary Site
– Stop activities at remote site
– Restore latest data from remote devices to source
– Resume operations at Primary (Source) Site
© 2006 EMC Corporation. All rights reserved. Business Continuity - 168

While remote replication is in progress the remote devices will typically not be available for use.
This is to ensure that no changes are made to the remote replicas. The purpose of the remote
replica is to provide a good starting point for any recovery operations.
Prior to any recovery efforts with the remote replicas, it is always a good idea to create a local
replica of the remote devices. The local replica can be used as a fallback if the recovery process
somehow corrupts the remote replicas.


Array Based – Which Technology?


• Synchronous
– Is a must if zero RPO is required
– Need sufficient bandwidth at all times
– Application response time elongation will prevent extended distance
solutions (rarely above 125 miles)

• Asynchronous
– Extended distance solutions with minimal RPO (order of minutes)
– No Response time elongation
– Generally requires lower Bandwidth than synchronous
– Must design with adequate cache/buffer or sidefile/logfile capacity

• Disk Buffered Consistent PITs
– Extended distance solution with RPO in the order of hours
– Generally lower bandwidth than synchronous or asynchronous

© 2006 EMC Corporation. All rights reserved. Business Continuity - 169

The choice of the appropriate array based remote replication depends on specific needs.
What are the RPO requirements? What is the distance between sites? What is the primary reason
for remote replication? etc.


Storage Array Based – Remote Replication


• Network Options
– Most vendors support ESCON or Fibre Channel adapters for remote
replication
▪ Can connect to any optical or IP networks with appropriate protocol
converters for extended distances
· DWDM
· SONET
· IP Networks
– Some vendors have native Gigabit Ethernet adapters, which allow
the array to be connected directly to IP Networks without the need
for protocol converters

© 2006 EMC Corporation. All rights reserved. Business Continuity - 170

A dedicated or a shared network must be in place for remote replication. Storage arrays will
have dedicated ESCON, Fibre Channel or Gigabit Ethernet adapters which are used for remote
replication. The network between the two arrays could be ESCON or Fibre Channel for the
entire distance. Such networks would typically be used for shorter distances. For extended
distances an optical or IP network must be used. Examples of optical networks are DWDM and
SONET (discussed later). To connect the ESCON or Fibre Channel adapters from the arrays to
these networks protocol converters may have to be used. Gigabit Ethernet adapters can be
connected directly to the IP network.


Dense Wavelength Division Multiplexing (DWDM)


• DWDM is a technology that puts data from different sources together on an optical fiber
with each signal carried on its own separate light wavelength (commonly referred to as a
lambda or λ).
• Up to 32 protected and 64 unprotected separate wavelengths of data can be multiplexed
into a light stream transmitted on a single optical fiber.

[Figure: ESCON, Fibre Channel and Gigabit Ethernet inputs pass through optical/electrical
conversion and are carried as separate optical channels (lambdas, λ) on one fiber.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 171

Dense Wavelength Division Multiplexing (DWDM) multiplexes wavelengths (often referred to
as lambdas or represented by the symbol λ) onto a single pair (transmit and receive paths) of
optical fibers.
A key benefit of DWDM is protocol transparency. Since DWDM is an optical transmission
technique, the same interface type can be used to transport any bit rate or protocol. It also allows
different bit rates and protocol data streams to be mixed on the same optical fiber. DWDM
alleviates the need for protocol conversion, associated complexity and the resulting transmission
latencies.


Synchronous Optical Network (SONET)


• SONET is a Time Division Multiplexing (TDM) technology where traffic from multiple
subscribers is multiplexed together and sent out onto the SONET ring as an optical signal
• Synchronous Digital Hierarchy (SDH) is similar to SONET but is the European standard
• SONET/SDH offers the ability to service multiple locations, reliability/availability,
automatic protection switching, and restoration

[Figure: a SONET ring with OC3 and OC48 links, and an SDH ring with STM-1 and STM-16 links.]

© 2006 EMC Corporation. All rights reserved. Business Continuity - 172

Synchronous Optical Network (SONET) is a standard for optical telecommunications transport
formulated by the Exchange Carriers Standards Association (ECSA) for the American National
Standards Institute (ANSI). The equivalent international standard is referred to as Synchronous
Digital Hierarchy and is defined by the European Telecommunications Standards Institute
(ETSI). Within Metropolitan Area Networks (MANs) today, SONET/SDH rings are used to
carry both voice and data traffic over fiber.


Rated Bandwidth
Link              Bandwidth (Mb/s)
ESCON             200
Fibre Channel     1024 or 2048
Gigabit Ethernet  1024
T1                1.5
T3                45
E1                2
E3                34
OC1               51.8
OC3/STM1          155.5
OC12/STM4         622.08
OC48/STM16        2488.0

The slide lists the rated bandwidth in Mb/s for standard WAN (T1, T3, E1, E3), SONET (OC1,
OC3, OC12, OC48) and SDH (STM1, STM4, STM16) Links. The rated bandwidth of ESCON,
Fibre Channel and Gigabit Ethernet is also listed.
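
To make the rated figures concrete, the following minimal sketch estimates how long a given amount of data would take to cross a link at its rated bandwidth. The function name and the efficiency parameter are illustrative assumptions, not part of the course material, and real links deliver less than their rated bandwidth because of protocol overhead and latency. Fibre Channel is left out of the dictionary because the slide lists two rates for it.

```python
# Rated bandwidths in Mb/s, copied from the table on this slide.
RATED_MBPS = {
    "ESCON": 200,
    "Gigabit Ethernet": 1024,
    "T1": 1.5,
    "T3": 45,
    "E1": 2,
    "E3": 34,
    "OC1": 51.8,
    "OC3/STM1": 155.5,
    "OC12/STM4": 622.08,
    "OC48/STM16": 2488.0,
}


def transfer_seconds(data_megabytes, link, efficiency=1.0):
    """Idealized time in seconds to move data_megabytes over a link.

    efficiency is a hypothetical fraction in (0, 1] of the rated bandwidth
    that is actually usable; 1.0 means no overhead at all.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    megabits = data_megabytes * 8  # 1 byte = 8 bits
    return megabits / (RATED_MBPS[link] * efficiency)
```

For example, transfer_seconds(100, "ESCON") gives the idealized time to replicate 100 MB of changed data over an ESCON link.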


Module Summary
Key points covered in this module:
• Remote Replication Concepts
– Synchronous/Asynchronous
– Connectivity Options

• Host and Array based Remote Replication Technologies
– Functionality
– Differences
– Considerations
– Selecting the appropriate technology

These are the key points covered in this module. Please take a moment to review them.


Apply Your Knowledge…


Upon completion of this topic, you will be able to:

• Enumerate EMC’s Remote Replication Solutions for the
Symmetrix and CLARiiON arrays
• Describe EMC’s SRDF/Synchronous Replication
Solution
• Describe EMC’s MirrorView/A Replication Solution


EMC – Remote Replication Solutions


• EMC Symmetrix Arrays
– EMC SRDF/Synchronous
– EMC SRDF/Asynchronous
– EMC SRDF/Automated Replication

• EMC CLARiiON Arrays
– EMC MirrorView/Synchronous
– EMC MirrorView/Asynchronous

All remote replication solutions that were discussed in this module are available on EMC
Symmetrix and CLARiiON Arrays.
The SRDF (Symmetrix Remote Data Facility) family of products provides Synchronous,
Asynchronous and Disk Buffered remote replication solutions on the EMC Symmetrix Arrays.
The MirrorView family of products provides Synchronous and Asynchronous remote replication
solutions on the EMC CLARiiON Arrays.
SRDF/Synchronous (SRDF/S): High-performance, host-independent, real-time synchronous
remote replication from one Symmetrix to one or more Symmetrix systems.
MirrorView/Synchronous (MirrorView/S): Host-independent, real-time synchronous remote
replication from one CLARiiON to one or more CLARiiON systems.
SRDF/Asynchronous (SRDF/A): High-performance extended distance asynchronous replication
for Symmetrix arrays using a Delta Set architecture for reduced bandwidth requirements and no
host performance impact. Ideal for Recovery Point Objectives on the order of minutes.
MirrorView/Asynchronous (MirrorView/A): Asynchronous remote replication on CLARiiON
arrays. Designed with low-bandwidth requirements, it delivers a cost-effective remote replication
solution ideal for Recovery Point Objectives (RPOs) of 30 minutes or greater.
SRDF/Automated Replication: Rapid business restart over any distance with no data exposure
through advanced single-hop and multi-hop configurations using combinations of
TimeFinder/Mirror and SRDF on Symmetrix Arrays.


EMC SRDF/Synchronous - Introduction


• Array based Synchronous Remote Replication
technology for EMC Symmetrix Storage Arrays
– Facility for maintaining real-time physically separate mirrors of
selected volumes

• SRDF/Synchronous uses special Symmetrix devices
– Source arrays have SRDF R1 devices
– Target arrays have SRDF R2 devices
– Data written to R1 devices is replicated to R2 devices

• SRDF uses dedicated channels to send data from source
to target array
– ESCON, Fibre Channel or Gigabit Ethernet are supported

• SRDF is available in both Open Systems and Mainframe
environments

EMC SRDF/Synchronous is an Array based Synchronous Remote Replication technology for
EMC Symmetrix Storage Arrays. SRDF R1 and R2 volumes are devices dedicated for Remote
Replication. R2 Volumes are on the Target Arrays while R1 Volumes are on the Source Arrays.
Data written to R1 volumes is replicated to R2 volumes.


SRDF Source and Target Volumes


• SRDF R1 and R2 Volumes can have any local RAID
Protection
– E.g. Volumes could have RAID-1 or RAID-5 protection

• SRDF R2 volumes are in a Read Only state when remote
replication is in effect
– Changes cannot be made to the R2 volumes

• SRDF R2 volumes are accessed under certain
circumstances
– Failover – Invoked when the primary volumes become unavailable
– Split – Invoked when the R2 volumes need to be concurrently
accessed for BC operations


SRDF/Synchronous
1. Write received by Symmetrix containing Source volume
2. Source Symmetrix sends write data to Target
3. Target Symmetrix sends acknowledgement to Source
4. Write complete sent to host

[Figure: Source Host attached to the Symmetrix containing Source (R1) volumes, and Target Host attached to the Symmetrix containing Target (R2) volumes. Each array shows Channel Directors (CD), Remote Link Directors (RLD), a Global Cache Director and Disk Directors (DD); numbers 1 to 4 mark the steps above.]

• Application does not receive I/O acknowledgement until data is received
and acknowledged by remote Symmetrix
• Write completion time is extended - No impact on Reads
• Most often used in campus solutions

SRDF/Synchronous is used primarily in SRDF campus environments. In this mode of operation,
Symmetrix maintains a real-time mirror image of the data between the SRDF pairs.
Data on the Source (R1) volumes and the Target (R2) volumes is always identical.
The sequence of operations is:
1. An I/O write is received from the host/server into the cache of the Source.
2. The I/O is transmitted to the cache of the Target.
3. A receipt acknowledgment is provided by the Target back to the cache of the Source.
4. An ending status is presented to the host/server.
The transmission of data to the target and the receipt of acknowledgement from the target are done
via specialized hardware on the array (depicted as Remote Link Director – RLD in the picture).
De-stage of data to disk in Source and Target Symmetrix is done on an “off-priority” basis.
If a link failure occurs before acknowledgement is received from the Target Symmetrix, then the
operation is retried down the remaining links in the RA-group. If all links fail, then the I/O is
acknowledged to the host and the track is flagged as invalid to the remote mirror.
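
The four-step write sequence and the link-failure handling described above can be sketched as a toy model. This is an illustration only, not EMC code: the class name, attribute names and the use of dictionaries for cache are assumptions made for the example, and de-staging to disk is not modeled.

```python
class SrdfSyncPair:
    """Toy model of one R1/R2 pair in synchronous mode (hypothetical names)."""

    def __init__(self, links_up):
        self.links_up = list(links_up)  # one bool per remote link in the group
        self.source_cache = {}          # track -> data held on the R1 side
        self.target_cache = {}          # track -> data held on the R2 side
        self.invalid_tracks = set()     # tracks flagged invalid to the remote mirror

    def _send_to_target(self, track, data):
        """Steps 2 and 3: try each link in turn; True means the Target acknowledged."""
        for up in self.links_up:
            if up:
                self.target_cache[track] = data
                return True
        return False

    def write(self, track, data):
        self.source_cache[track] = data            # step 1: write lands in Source cache
        if self._send_to_target(track, data):      # steps 2-3: transmit and acknowledge
            self.invalid_tracks.discard(track)
        else:
            self.invalid_tracks.add(track)         # all links failed: flag the track
        return "write complete"                    # step 4: ending status to the host
```

With at least one link up, a write reaches both caches before the host sees completion; with every link down, the host is still acknowledged but the track is recorded as owed to the remote mirror.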


SRDF Operations - Failover


• Purpose – Make Target Volumes Read Write
• Source Volume status is changed to Read Only
• SRDF Link is suspended

[Figure: Before failover, Source Volume RW and Target Volume RO; after failover, Source Volume RO and Target Volume RW]

Failover operations are performed if the SRDF R1 Volumes become unavailable and the
decision is made to start operations on the R2 Devices. Failover could also be performed when
DR processes are being tested or for any maintenance tasks that have to be performed at the
source site.

If failing over for a Maintenance operation: For a clean, consistent, coherent point in time
copy which can be used with minimal recovery on the target side, some or all of the following
steps may have to be taken on the source side:
• Stop all applications (databases and so on)
• Unmount the file systems
• Deactivate the Volume Group

A failover leads to a RO state on the source side. If a device suddenly becomes RO from a
RW state, the reaction of the host can be unpredictable if the device is in use. Hence the
suggestion to stop applications, unmount file systems and deactivate Volume Groups.


SRDF Operations - Failback


• Makes target volume Read Only, resumes link,
synchronizes R2 to R1, and write enables source volume

[Figure: Before failback, Source Volume RO and Target Volume RW; after failback (sync), Source Volume RW and Target Volume RO]

The main purpose of the Failback operation is to allow the resumption of operations at the
primary site on the source devices. Failback will typically be invoked after a failover has been
performed and production tasks are being performed at the Target site on the R2 devices.
Once operations can be resumed at the Primary site, the Failback operation can be invoked.
One must ensure that applications are properly quiesced and volume groups deactivated before
failback is invoked.

When failback is invoked the Target Volumes become Read Only, the source volumes become
Read Write and any changes that were made at the Target site while in the failed over state are
propagated back to the source site.
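
The state changes made by failover and failback can be summarized in a small model. This is a sketch under stated assumptions: the class and field names are invented for illustration and are not an EMC interface, and failback copies the whole target image back, where a real array would propagate only the changed tracks.

```python
from dataclasses import dataclass, field


@dataclass
class SrdfPair:
    """Toy R1/R2 pair; all names here are hypothetical."""
    source_state: str = "RW"
    target_state: str = "RO"
    link_active: bool = True
    source_data: dict = field(default_factory=dict)
    target_data: dict = field(default_factory=dict)

    def failover(self):
        # Target becomes Read Write, source becomes Read Only, link is suspended.
        self.source_state, self.target_state = "RO", "RW"
        self.link_active = False

    def failback(self):
        # Target becomes Read Only, the link resumes, changes made at the
        # target site are copied back, and the source is write enabled.
        self.target_state = "RO"
        self.link_active = True
        self.source_data = dict(self.target_data)
        self.source_state = "RW"
```

A typical sequence is failover, production work on the target, then failback once the primary site is usable again.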


SRDF Operations - Split


• Enables read and write operations on both source and
target volumes
• Suspends replication

[Figure: Before split, Source Volume RW and Target Volume RO; after split, Source Volume RW and Target Volume RW]

The SRDF Split operation is used to allow concurrent access to both the source and target
volumes. Target volumes are made Read Write and the SRDF replication between the source
and target is suspended.


SRDF Operations – Establish/Restore


• Establish - Resume SRDF operation retaining data from
source and overwriting any changed data on target
• Restore - SRDF operation retaining data on target and
overwriting any changed data on source

[Figure: Establish and Restore, each showing Source Volume RW and Target Volume RO]

During concurrent operations while in an SRDF Split state, changes could occur on both the source
and target volumes. Normal SRDF replication can be resumed by performing an establish or a
restore operation.
With either establish or restore, the status of the target volume goes to Read Only. Thus, prior to
establish or restore, all access to the target volumes must be stopped.

The Establish operation is used when changes to the Target volume should be discarded while
preserving changes that were made to the source volumes.
The Restore operation is used when changes to the Source volume should be discarded while
preserving changes that were made to the Target volumes. Prior to a restore operation all
access to the source and target volumes must be stopped. The Target volumes will go to read
only state, while the data on the source volumes will be overwritten with the data on the target
volumes.
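
The difference between establish and restore after a split comes down to which side's changes survive. The sketch below models a pair as a plain dictionary; the function names mirror the operation names, but the data layout and the new_pair helper are assumptions made for illustration, and whole images are copied where a real array would move only changed tracks.

```python
def new_pair(data):
    """Hypothetical helper: a synchronized pair holding the same data on both sides."""
    return {
        "source_state": "RW",
        "target_state": "RO",
        "link_active": True,
        "source_data": dict(data),
        "target_data": dict(data),
    }


def split(pair):
    """Suspend replication and make both sides Read Write."""
    pair["link_active"] = False
    pair["source_state"] = pair["target_state"] = "RW"


def establish(pair):
    """Resume replication, keeping source data and discarding target changes."""
    pair["target_state"] = "RO"
    pair["target_data"] = dict(pair["source_data"])
    pair["link_active"] = True


def restore(pair):
    """Resume replication, keeping target data and discarding source changes."""
    pair["target_state"] = "RO"
    pair["source_data"] = dict(pair["target_data"])
    pair["link_active"] = True
```

After a split in which both sides were modified, establish leaves both sides holding the source's version, while restore leaves both holding the target's version.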


EMC CLARiiON MirrorView/A Overview


• Optional storage system software for remote replication
on EMC CLARiiON arrays
– No host cycles used for data replication

• Provides a remote image for disaster recovery
– Remote image updated periodically - asynchronously
– Remote image cannot be accessed by hosts while replication is
active
– Snapshot of mirrored data can be host-accessible at remote site

• Mirror topology (connecting primary array to secondary
arrays)
– Direct connect and switched FC topology supported
– WAN connectivity supported using specialized hardware

MirrorView/A is optional software supported on CX-series EMC CLARiiON arrays.
The design goal of MirrorView/A is to allow speedy recovery from a disaster, but at lower cost
than synchronous solutions. It allows long distance connectivity in environments where some
data loss is acceptable. It accomplishes this goal by using an asynchronous interval-based
update mechanism. This means that changed data is accumulated at the local side of the link,
then sent to the remote side at regular, user-defined intervals. The data on the remote image is
always older than the data on the local image, by up to two update intervals. Though this will lead to
data loss in the event of a disaster, it is an acceptable trade-off for many customers.
Supported connection topologies include direct connect, SAN connect, and WAN connect
when appropriate Fibre Channel to IP conversion devices are used.
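
The bound of two update intervals stated above gives a simple rule of thumb for choosing an interval against a Recovery Point Objective. The following sketch only encodes that arithmetic; the function names are assumptions for illustration, and the actual data age in a deployment also depends on how long each transfer takes.

```python
def worst_case_data_age(interval_minutes):
    """Upper bound on remote image age, per the two-interval figure in the notes."""
    return 2 * interval_minutes


def meets_rpo(interval_minutes, rpo_minutes):
    """True if the worst-case data age stays within the Recovery Point Objective."""
    return worst_case_data_age(interval_minutes) <= rpo_minutes
```

For example, a 15 minute update interval gives a worst-case age of 30 minutes, so it fits a 30 minute RPO, while a 20 minute interval does not.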


MirrorView/A Terms
• Primary storage system
– Holds the local image for a given mirror

• Secondary storage system
– Holds the remote image for a given mirror

• Bidirectional mirroring
– A storage system can hold local and remote images

• Mirror Synchronization
– Process that copies data from local image to remote image

• MirrorView Fractured state
– Condition when a Secondary storage system is unreachable by the
Primary storage system

The terms ‘primary storage system’ and ‘secondary storage system’ are terms relative to each
mirror. Because MirrorView/A supports bidirectional mirroring, a storage system which hosts
local images for one or more mirrors may also host remote images for one or more other
mirrors.
The process of updating a remote image with data from the local image is called
synchronization. When mirrors are operating normally, they will either be in the synchronized
state, or be synchronizing. If a failure occurs, and the remote image cannot be updated –
perhaps because the link between the CLARiiONs has failed – then the mirror is in a fractured
state. Once the error condition is corrected, synchronization will restart automatically.


MirrorView/A Configuration
• MirrorView/A Setup
– MirrorView/A software must be loaded on both Primary and
Secondary storage systems
– Remote LUN must be exactly the same size as local LUN
– Secondary LUN does not need to be the same RAID type as
Primary
– Reserved LUN Pool space must be configured

• Management via Navisphere Manager and CLI

MirrorView/A software must be loaded on both CLARiiONs, regardless of whether or not the
customer wants to implement bi-directional mirroring.
The remote LUN must be the same size as the local LUN, though not necessarily the same
RAID type. This allows flexibility in DR environments, where the backup site need not match
the performance of the primary site.
Because MirrorView/A uses SnapView Snapshots as part of its internal operation, space must
be configured in the Reserved LUN Pool for data chunks copied as part of a COFW operation.
SnapView Snapshots, the Reserved LUN Pool, and COFW activity were discussed in an earlier
module.
MirrorView/A, like other CLARiiON software, is managed by using either Navisphere Manager
if a graphical interface is desired, or Navisphere CLI for command-line management.
Hosts cannot attach to a remote LUN while it is configured as a secondary (remote) mirror
image. If you promote the remote image to be the primary mirror image (in other words,
exchange the roles of the local and remote images), as will be done in a disaster recovery scenario,
or if you remove the secondary LUN from the mirror, and thereby turn it into an ordinary
CLARiiON LUN, then it may be accessed by a host.


MirrorView/A – Initial Synchronization

[Figure: Initial synchronization, showing the Host, Primary Image, Secondary Image, Tracking DeltaMap, Transfer DeltaMap, Snapshot, and Reserved LUN Pool (RLP) maps]

MirrorView/A makes use of bitmaps, called DeltaMaps because they track changes, to log
where data has changed, and needs to be copied to the remote image. As with SnapView
Snapshots, the MirrorView image is seen as consisting of 64 kB areas of data, called chunks or
extents.
This animated sequence shows the initial synchronization of a MirrorView/A mirror. The
Transfer DeltaMap has all its bits set, to indicate that all extents need to be copied across to the
secondary. At the time the synchronization starts, a SnapView Session is started on the primary,
and it will track all changes in a similar manner to that used by Incremental SAN Copy. At the
end of the initial synchronization, the secondary image is a copy of what the primary looked
like when the synchronization started. Any changes made to the primary since then are flagged
by the Tracking DeltaMap.
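
The two bitmaps described above can be sketched as follows. The 64 kB extent size comes from the notes; everything else (class name, method names, list-based bitmaps) is an assumption made for illustration and does not reflect the internal MirrorView/A implementation.

```python
EXTENT_BYTES = 64 * 1024  # extent (chunk) size stated in the notes


def extents_touched(offset, length):
    """Indexes of the 64 kB extents covered by a write of length bytes at offset."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // EXTENT_BYTES
    last = (offset + length - 1) // EXTENT_BYTES
    return range(first, last + 1)


class DeltaMaps:
    """Toy Transfer and Tracking DeltaMaps for one mirror (hypothetical names)."""

    def __init__(self, extent_count):
        self.transfer = [1] * extent_count  # initial sync: every extent must be copied
        self.tracking = [0] * extent_count  # changes made since the sync started

    def host_write(self, offset, length):
        """Flag every extent touched by a host write in the Tracking DeltaMap."""
        for i in extents_touched(offset, length):
            self.tracking[i] = 1

    def copy_next_extent(self):
        """Clear and return the next extent still to be transferred, or None."""
        for i, bit in enumerate(self.transfer):
            if bit:
                self.transfer[i] = 0
                return i
        return None
```

Once copy_next_extent returns None, the initial synchronization is complete and the Tracking DeltaMap records what the next update must send.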


MirrorView/A – Update

[Figure: Update cycle, showing the Host, Primary Image, Secondary Image, Tracking and Transfer DeltaMaps, Snapshot, and Reserved LUN Pool (RLP) maps]

An update cycle starts, either automatically at the prescribed time, or initiated by the user. Prior
to the start of data movement to the secondary, MirrorView/A starts a SnapView Session on the
secondary, to protect the original data if anything goes wrong during the update cycle.
After the update cycle completes successfully, the SnapView Session and Snapshot on the
secondary side are no longer needed, and are destroyed.


MirrorView/A – Promotion (Update Failure)

[Figure: Promotion after an update failure, showing the Host, the Primary Image, the Secondary Image being promoted to Primary, Transfer and Tracking DeltaMaps, Snapshot, and Reserved LUN Pool (RLP) maps]

Should the update cycle fail for any reason – here a primary storage system failure – and it
becomes necessary to promote the secondary, then the safety Session will be rolled back, and
the secondary image will be returned to the state it was in prior to the start of the update cycle.
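
The protective role of the safety Session during an update cycle can be illustrated with a small sketch. It is a simplification under stated assumptions: the function name and the fail_after test hook are invented for the example, and a full copy of the secondary stands in for the SnapView Session, which in practice preserves original data with copy on first write rather than copying everything up front.

```python
def run_update_cycle(secondary, changed_extents, fail_after=None):
    """Apply changed extents to secondary (a dict), protected by a safety copy.

    fail_after is a hypothetical test hook: the number of extents applied
    before a simulated failure. Returns True if the cycle completed.
    """
    safety_snapshot = dict(secondary)  # stands in for the safety Session
    for applied, (extent, data) in enumerate(changed_extents.items()):
        if fail_after is not None and applied >= fail_after:
            # Update failed part way through: roll the secondary back to
            # the state it was in before the cycle started.
            secondary.clear()
            secondary.update(safety_snapshot)
            return False
        secondary[extent] = data
    return True  # success: the safety copy is no longer needed
```

A completed cycle leaves the secondary with the new extents; an interrupted one leaves it exactly as it was, which is the image that would be promoted.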


Consistency Groups
• Group of secondary images treated as a unit
• Local LUNs must all be on the same CLARiiON
• Remote LUNs must all be on the same CLARiiON
• Operations happen on all LUNs at the same time
– Ensures a restartable image group

Consistency Groups allow all LUNs belonging to a given application, usually a database, to be
treated as a single entity, and managed as a whole. This helps to ensure that the remote images
are consistent, i.e. all made at the same point in time. As a result, the remote images are always
restartable copies of the local images, though they may contain data which is not as new as that
on the primary images.
It is a requirement that all the local images of a Consistency Group be on the same CLARiiON,
and that all the remote images for a Consistency Group be on the same remote CLARiiON. All
information related to the Consistency Group will be sent to the remote CLARiiON from the
local CLARiiON.
The operations which can be performed on a Consistency Group match those which may be
performed on a single mirror, and will affect all mirrors in the Consistency Group. If, for some
reason, an operation cannot be performed on one or more mirrors in the Consistency Group,
then that operation will fail, and the images will be unchanged.
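
The all-or-nothing behavior of Consistency Group operations can be sketched as a stage-then-commit loop. The function names, the dictionary representation of a mirror and the promote example are assumptions made for illustration, not the CLARiiON implementation.

```python
def apply_to_group(mirrors, operation):
    """Apply operation to every mirror dict, or leave all of them unchanged.

    operation takes a copy of one mirror's state and returns the new state,
    raising an exception if it cannot be performed on that mirror.
    """
    staged = []
    for mirror in mirrors:
        try:
            staged.append(operation(dict(mirror)))
        except Exception:
            return False  # one failure fails the operation for the whole group
    for mirror, new_state in zip(mirrors, staged):
        mirror.clear()
        mirror.update(new_state)
    return True


def promote(state):
    """Hypothetical operation: promote a secondary image if it is reachable."""
    if not state["reachable"]:
        raise RuntimeError("image not reachable")
    state["role"] = "primary"
    return state
```

If any one mirror in the group cannot be promoted, apply_to_group returns False and every image keeps its previous role.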


Section Summary
Key points covered in this section:
• Business continuity overview
• Basic technologies that are enablers of data availability
• Basic disaster recovery techniques

This completes Section 4 – Business Continuity.
Please take a moment to review the key points covered in this section.
