June 2010
Summary
Hitachi Data Systems, Microsoft, Brocade and Ciena are partnering to architect a robust business continuity
and disaster recovery solution using best-in-class technologies for implementing Microsoft Hyper-V Live
Migration over Distance.
A comprehensive business continuity and disaster recovery plan mandates the deployment of multiple data
centers located far enough apart to protect against regional power failures and disasters. Synchronous remote
data replication is the appropriate solution for organizations seeking the fastest possible data recovery,
minimal data loss and protection against database integrity problems. However, application performance is
affected by the distance and latency between the data centers, which might restrict the location of the data
centers.
Deploying best-in-class storage replication technologies with the Hitachi Universal Storage Platform family,
Hitachi Storage Cluster, Hyper-V Live Migration over Distance and data center interconnect products from
Brocade and Ciena creates a highly available and scalable solution for business continuity where synchronous
replication is a requirement. This document defines a tested reference architecture that supports Live
Migration over Distance.
Feedback
Hitachi Data Systems welcomes your feedback. Please share your thoughts by sending an email message to
SolutionLab@hds.com. Be sure to include the title of this white paper in your email message.
Table of Contents
Proactive Data Center Management
Solution Overview
    Hyper-V Live Migration over Distance Requirements
    Solution Components
Tested Deployment
    Storage Configuration
    Storage Area Network
    Wide Area Network
    Private Fiber Network
    Operating System
    Management Software
Deployment Considerations
    Storage Bandwidth
    Storage Replication Paths
    Storage Redundancy
    Storage System Processing Capacity
    Professional Services
Lab Validated Results
Conclusion
Appendix A: Bill of Materials
Appendix B: References
    Hitachi
    Brocade
    Ciena
Proactive Data Center Management
Deploying best-in-class storage replication technologies with the Hitachi Universal Storage Platform family, Hitachi Storage Cluster, Hyper-V Live Migration over Distance and data center interconnect products from Brocade and Ciena creates a highly available and scalable solution for business continuity where synchronous replication is a requirement. This document defines a tested reference architecture that supports Live Migration over Distance.
Planning and implementation of this solution requires professional services from Hitachi Data Systems Global
Solutions Services.
This white paper is written for storage and data center administrators charged with disaster recovery and
business continuity planning. It assumes the reader has general knowledge of Microsoft Failover Clustering,
local and wide area networking and storage area networks.
This solution provides the following benefits:

• Perceived zero data center downtime for maintenance. Perform virtually any data center maintenance task during normal working hours without affecting end users by simply moving the affected applications either within the data center or to a remote site.
• Workload balance. Dynamically and non-disruptively move workloads between data centers.
• Data center consolidation and migration. Move workloads between data centers.
• Ease of management. Relieve storage and data center administrators of the need to learn many different tools or products. Hitachi Storage Cluster and Hyper-V live migration use standard cluster management interfaces to execute all operations such as application failover and failback between sites. Live migration is performed using the Failover Cluster GUI, Virtual Machine Manager (VMM) or PowerShell scripting, as illustrated in the sketch that follows this list.
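As a minimal, hedged sketch of the PowerShell approach (the cluster name HVCLUSTER01, virtual machine name SQLVM1 and node name NODE-REMOTE-01 are hypothetical placeholders; the FailoverClusters module ships with the Failover Clustering feature in Windows Server 2008 R2):

    # Minimal sketch, assuming the Failover Clustering feature and its PowerShell
    # module are installed. All names below are hypothetical placeholders.
    Import-Module FailoverClusters

    # Show every clustered virtual machine role and the node that currently owns it.
    Get-ClusterGroup -Cluster HVCLUSTER01 | Select-Object Name, OwnerNode, State

    # Live migrate one virtual machine role to a node at the remote site.
    Move-ClusterVirtualMachineRole -Cluster HVCLUSTER01 -Name "SQLVM1" -Node "NODE-REMOTE-01"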
Solution Overview
A highly available, highly scalable Hyper-V failover cluster that supports Live Migration over Distance requires
highly available, highly scalable storage, storage fabric and networks. Because this solution uses Hitachi
Storage Cluster, Hitachi TrueCopy Synchronous software and Hyper-V Failover Clusters, virtual machines
can be migrated between storage systems across distance with minimal intervention.
This solution supports up to 16 Hyper-V host nodes in a multi-site failover cluster, with high availability
achieved at the local and remote sites with redundant physical paths enabled via multiple host bus adapters
(HBAs) from the servers. Proper zoning within the storage fabric and the use of multipathing software allows for
continued operation in the event of a hardware component failure. Redundant high-speed network and fabric
interconnects enable continued operation in the event of a hardware failure, ensuring high availability and
performance across geographically separated sites.
For this reference architecture, a distance of 200 kilometers was tested by using spools of optical fiber between the DWDMs at each site.
This reference architecture uses the Hitachi Universal Storage Platform VM as the storage platform, Hyper-V
Failover Clustering supporting live migration of VMs, and a Brocade network and fabric architecture to provide
connectivity across data centers. Note that although this reference architecture was tested on a Universal
Storage Platform VM, it can be deployed on a Universal Storage Platform V as well.
Figure 1 illustrates the reference architecture described by this white paper.
In this reference architecture, each site hosted eight Hyper-V servers connected to the Universal Storage
Platform VM via a Brocade DCX Backbone director at the local site and a DCX-4S director at the remote site.
To support the storage replication and rapid movement of VMs between the local and remote sites, Hitachi
Storage Cluster was implemented. A 1/10GbE Brocade TurboIron switch and a 10GbE NetIron XMR router at
each site, along with a Ciena 4200 DWDM, provided the network infrastructure to support both storage
replication traffic and Hyper-V Live Migration over Distance traffic.
To support the bandwidth and latency requirements of the storage replication and Live Migration over Distance traffic, both Fibre Channel over IP (FCIP) inter-switch links (ISLs) and native Fibre Channel ISLs were configured. ISLs connect the switches into a switched fabric. This was done for the following reasons:

• To provide redundancy in case of a link failure on either the FCIP ISLs or the Fibre Channel ISLs, each ISL was comprised of multiple physical links, so the failure of a single physical link does not affect the availability of that ISL. The FCIP ISL contained three physical 1Gb links and the Fibre Channel ISLs contained two 4Gb physical links for availability and performance.
• To validate this reference architecture for storage replication traffic using Fibre Channel over IP.
• To validate this reference architecture for Fibre Channel ISLs over distance.
Testing used SQL Server transaction workloads along with Iometer workloads to validate this solution in terms
of bandwidth and latency capabilities. This ensured that live migrations occurred in a timely fashion and with no
perceivable outage to end users.
A total of 3Gb of bandwidth was configured to support the FCIP links, and validation of throughput and latency was performed on these links. In the lab, Hitachi Data Systems was able to move, on average, 400MB/s of write traffic across the replication links with average response times less than 20ms. Brocade's hardware compression on the FCIP ISLs within the DCX directors accounted for the increased throughput.
A total of 8Gb of bandwidth was configured to support the Fibre Channel ISL links, and validation of throughput and latency was performed on these links. Testing showed that, on average, 480MB/s of write traffic moved across the replication links with average response times less than 20ms. Additional bandwidth was still available across the Fibre Channel ISL links, but increasing the link utilization requires additional resources within the Universal Storage Platform VM storage systems, which were not available in the test environment.
This solution validated that multiple parallel live migrations completed successfully across distance between nodes in the Hyper-V cluster. Because live migration is restricted to pairs of nodes in the cluster, this reference architecture's 16-node cluster design is limited to eight parallel live migrations. Hitachi Data Systems successfully conducted eight simultaneous migrations in its testing of this reference architecture.
For more information about validation testing, see the Lab Validated Results section.
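A hedged sketch of how several migrations might be queued in parallel from PowerShell follows; the cluster, virtual machine and node names are hypothetical placeholders, and the pairings simply illustrate the one-migration-per-node-pair restriction:

    # Hedged sketch: start several live migrations in parallel, one per node pair.
    # All names are hypothetical; assumes the FailoverClusters PowerShell module.
    $pairs = @{ "SQLVM1" = "NODE-REMOTE-01"; "SQLVM2" = "NODE-REMOTE-02" }
    $jobs = $pairs.GetEnumerator() | ForEach-Object {
        Start-Job -ScriptBlock {
            param($vm, $node)
            Import-Module FailoverClusters
            Move-ClusterVirtualMachineRole -Cluster HVCLUSTER01 -Name $vm -Node $node
        } -ArgumentList $_.Key, $_.Value
    }
    $jobs | Wait-Job | Receive-Job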
Hyper-V Live Migration over Distance Requirements
Live Migration over Distance has the following requirements:

• An IP network that can support the bandwidth requirements of the virtual machines that will be migrated. This requirement can vary based on the number of modified pages that might need to be moved across the IP network for a particular virtual machine.
• A Fibre Channel over IP or Fibre Channel network that can support the bandwidth and latency requirements for storage replication across distance.
• The source and destination Hyper-V hosts are required to have a private live migration IP network on the same subnet and broadcast domain (see the sketch after this list for one way to review the cluster networks).
• The IP subnet that is utilized by the virtual machines must be accessible from both the local and remote servers. When live migrating a virtual machine between the local and remote site, the virtual machine must retain its IP address so that TCP communication continues during and after the migration.
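The following minimal sketch, with a hypothetical cluster name, lists the cluster networks so that the dedicated live migration network, its subnet and its role can be confirmed:

    # Minimal sketch, assuming the FailoverClusters PowerShell module is available.
    # "HVCLUSTER01" is a hypothetical cluster name.
    Import-Module FailoverClusters

    # Review each cluster network, its subnet and its cluster role.
    Get-ClusterNetwork -Cluster HVCLUSTER01 |
        Select-Object Name, Address, AddressMask, Role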
The following variables affect live migration speed:

• The number of modified pages on the VM to be migrated; the larger the number of modified pages, the longer the VM remains in a migrating state
• Available bandwidth (network or Fibre Channel) between Hyper-V physical hosts and shared storage
Solution Components
The following sections describe the components that make up this solution.
This reference architecture uses Hitachi Dynamic Provisioning software to provision virtual machines. The
Universal Storage Platform VM with Hitachi Dynamic Provisioning software supports both internal and external
virtualized storage, simplifies storage administration and improves performance to help reduce overall power
and cooling costs.
Although this solution was tested on a Universal Storage Platform VM, it is also appropriate for use with
Universal Storage Platform V.
Figure 2 illustrates how multiple virtual machines and their associated applications can be made highly
available using Live Migration over Distance with Hitachi Storage Cluster.
Figure 2. Hitachi Storage Cluster for Hyper-V
Live migration produces significantly less downtime than quick migration for the VM being migrated. This
makes live migration the preferred method when users require uninterrupted access to the VM that is being
migrated. Because a live migration will complete in less time than the TCP timeout for the migrating VM, users
will not experience any outage even during steps 3 and 4 of the migration.
Because live migration moves virtual machines over the Ethernet network, the following networking features within Microsoft Windows Server 2008 R2 enhance Live Migration over Distance:

• The ability to specify, at the NIC level, which physical network live migration uses when moving a virtual machine's configuration and memory pages across the network. For this reference architecture, the 10GbE Brocade network hosts the live migration network traffic for both performance and throughput reasons.
• Support for Jumbo Frames, which allows larger payloads per network packet, improving overall throughput and reducing CPU utilization for large transfers. This reference architecture uses Broadcom NICs that support Jumbo Frames (see the host-level sketch after this list).
• Support for VM Chimney, which allows a virtual machine to offload its network processing load onto the NIC of the physical Hyper-V host computer. This improves CPU and overall network throughput performance and is fully supported by live migration. This reference architecture uses Broadcom NICs that support VM Chimney.
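As a hedged, host-level illustration only (the interface name is a placeholder, the TCP Chimney setting shown is the host-wide option rather than the per-VM VM Chimney setting, and Jumbo Frames also require the matching NIC driver property to be set), commands such as the following could be run from an elevated prompt on a Windows Server 2008 R2 host:

    # Hedged sketch of host-level tuning; verify against your NIC vendor's guidance.
    # Enable host TCP Chimney Offload (related to, but distinct from, per-VM VM Chimney).
    netsh int tcp set global chimney=enabled

    # Raise the IP MTU on the live migration interface; the NIC driver's Jumbo Frame
    # property must be set to a matching value as well. "Live Migration" is a placeholder name.
    netsh interface ipv4 set subinterface "Live Migration" mtu=9000 store=persistent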
For more information about Hyper-V Live Migration, see the Hyper-V Live Migration Overview and Architecture
white paper.
In addition, the Brocade adapters used in this solution support management of virtual connections, providing the ability to guarantee service levels, monitor I/O history and isolate traffic per virtual machine.
Tested Deployment
The following sections describe the tested deployment of Hyper-V Live Migration over Distance in the Hitachi
Data Systems laboratory.
Storage Configuration
Hitachi TrueCopy Synchronous software requires the use of a Hitachi Universal Storage Platform V or
Universal Storage Platform VM at the local site that contains the primary volumes and a Universal Storage
Platform V or Universal Storage Platform VM at the remote site for the secondary volumes. Testing conducted
to develop this reference architecture used a Universal Storage Platform VM.
The Universal Storage Platform VM storage system at the local site is known as the Main Control Unit (MCU)
and the Universal Storage Platform VM at the remote site is known as the Remote Control Unit (RCU). Remote
paths connect the two Universal Storage Platform VM storage systems over distance. The tested deployment
used two 4Gb Fibre Channel remote copy connections.
Table 1 lists the configuration specifications for the two Universal Storage Platform VMs deployed in this
reference architecture.
Table 1. Deployed Storage System Configuration
Component
Details
Storage system
Microcode level
60-06-21
RAID-5 (3+1)
Cache memory
128GB
Front-end ports
TrueCopy ports
Drive capacity
300GB
24
88
50GB
10
In addition, to provide additional throughput, availability and performance for the storage replication traffic,
multiple Fibre Channel over IP links were configured between the directors across distance. Fibre Channel
over IP trunking was implemented to combine multiple FCIP links into a high bandwidth FCIP trunk spanning
multiple physical ports to provide load balancing and network failure resiliency. For more information, see the
Wide Area Network section of this paper.
This solution uses two redundant paths from each Hyper-V host to the Universal Storage Platform VM. Each Hyper-V host had dual-path Brocade HBAs configured for high availability. Microsoft's MPIO software provided a round-robin load balancing algorithm that automatically selects a path by rotating through all available paths, thus balancing the load across all available paths and optimizing IOPS and response time.
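As a hedged sketch only (mpclaim.exe ships with the Windows MPIO feature; confirm the flags against the installed version and the multipathing guidance from Hitachi Data Systems), the round-robin policy can be reviewed and set from an elevated prompt:

    # Hedged sketch; mpclaim.exe is part of the Windows MPIO feature.
    # List the MPIO-managed disks and their current load balance policies.
    mpclaim.exe -s -d

    # Set the default load balance policy to Round Robin (policy index 2).
    mpclaim.exe -l -m 2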
Figure 3 illustrates the storage area network configuration for the stretched 16-node Hyper-V failover cluster
used for this reference architecture.
Figure 3. Live Migration over Distance SAN Configuration
Table 2 lists the Brocade director configuration and firmware levels for the HBAs and the directors deployed in
this solution.
Table 2. Deployed Brocade Configuration and Firmware Levels
Device
Number
of Fibre
Channel
Ports
Number
of GbE
Ports
Notes
22
Firmware 6.3.0.b
Brocade DCX-4S
22
Firmware 6.3.0.b
N/A
Wide Area Network
Figure 5 shows the network configuration implemented for both the IP network that supports cluster
management traffic and live migration traffic and the Fibre Channel over IP and Fibre Channel links that
support storage replication.
Figure 5. Storage Replication Network
Private Fiber Network
The F10-T Transponder module provides transponding and regeneration of various 10Gb signals. For this solution, the 10GbE connections from the NetIron XMRs connected directly into the F10-T.
The FC4-T Muxponder aggregates up to three 4Gb Fibre Channel ports across a 10Gb wavelength. The FC4-T is a Fibre Channel aggregation card that can be provisioned to handle FC400 clients. For this architecture, the Fibre Channel links from the Brocade DCXs were plugged directly into the FC4-T.
Operating System
Microsoft Windows Server 2008 R2 was deployed on 16 Hyper-V servers across geographically dispersed
datacenters, with eight servers deployed on the local site and eight servers on the remote site. Multiple
network connections were deployed to support the cluster management network and the live migration
network. Table 3 lists the deployed server hardware and software.
Role
Quantity
Operating System
Dell 2950
16
HP DL585
HP DL585
Management Software
This section describes the software deployed to support the Hyper-V Live Migration over Distance architecture.
Table 4 lists the software used in this reference architecture.
Table 4. Deployed Management Software
Software
Version
Release 2
7.0
7.0
Microsoft MPIO
006.0001.7600.6385
10.4.0
Deployment Considerations
The following sections describe key considerations for planning a deployment of this solution.
Storage Bandwidth
To maintain a continuous replica copy, the bandwidth available across the TrueCopy links must be greater than the average write workload that occurs during any Recovery Point Objective (RPO) interval. This means that if an organization wants to maintain an RPO of twenty minutes, the twenty-minute interval with the greatest write activity must be identified. From that peak, the bandwidth required to keep up with this traffic can be calculated, as in the sketch below.
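A minimal sketch of that calculation follows; the 360GB write volume is a purely hypothetical figure used only to show the arithmetic:

    # Hypothetical example: size the replication links for a 20-minute window.
    $peakWriteGB = 360                # GB written during the busiest 20-minute interval (hypothetical)
    $windowSeconds = 20 * 60
    $requiredMBps = ($peakWriteGB * 1024) / $windowSeconds
    "Required replication bandwidth: {0:N0} MB/sec" -f $requiredMBps   # about 307 MB/sec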
Storage Redundancy
It is important to identify redundancy requirements, and to ensure that multiple links are available over the
network to carry replication traffic in case of failure.
Professional Services
Planning and implementation of this solution requires professional services from Hitachi Data Systems Global Solutions Services. These services include the following:

• Bandwidth recommendation
• Hardware and software audits of the host and storage environments identified for inclusion in the replication environment
• Workload characteristics
• Strategic recommendations
• Implementation planning
• Testing
• Knowledge transfer (including hardware and software configuration, command scripts, host operations, and control mechanisms)
Lab Validated Results
The following tables summarize the Iometer and SQL Server workload results observed during lab validation.

Total IOPS | Read (MB/sec) | Write (MB/sec) | Response Time (ms) | VM Memory Size (GB) | Live Migration Time
18447.31 | 159.32 | 465.94 | 15.82 |  | 00:01:16
19772.26 | 156.43 | 469.28 | 15.02 |  | 00:02:05
20727.91 | 158.69 | 472.06 | 15.52 |  | 00:02:25
21647.11 | 164.69 | 484.11 | 16.78 | 10 | 00:02:47
Table 6 lists the test results. The live migration times increase as the I/O workload and the size of the memory allocated to the virtual machine increase. All response times were less than 20ms for writes.
Table 6. Fibre Channel Links for I/O Profile 75% Write, 25% Read (50% Random, 50% Sequential)
Total IOPS | Read (MB/sec) | Write (MB/sec) | Response Time (ms) | VM Memory Size (GB) | Live Migration Time
20755.56 | 156.26 | 468.77 | 15.84 |  | 00:01:17
21059.35 | 159.52 | 478.56 | 15.14 |  | 00:01:44
20627.91 | 160.77 | 482.30 | 15.50 |  | 00:02:47
22167.84 | 165.12 | 495.71 | 15.49 | 10 | 00:03:14
Table 7 lists the test results for migrating two virtual machines in parallel.
Table 7. Multiple Live Migrations for I/O Profile 75% Write, 25% Read (50% Random, 50% Sequential)
Virtual Machine | Total IOPS | Read (MB/sec) | Write (MB/sec) | Response Time (ms) | VM Memory Size (GB) | Live Migration Time
VM1 | 8077.11 | 62.95 | 188.83 | 9.91 |  | 00:01:25
VM2 | 6613.14 | 51.69 | 155.06 | 12.08 |  | 00:01:34
For testing application workloads, the SQLStress tool from Microsoft was used against an AdventureWorks database. The SQLStress tool is a load driver for SQL Server that can be used to execute large batches of queries or updates against one or more SQL Server databases for stress testing, performance testing or research purposes. The tool is controlled by a simple XML configuration file that contains the connection information as well as descriptions of the types of workload to run.
The SQLStress tool works by creating a workload consisting of one or more work items; each work item contains a T-SQL query. The tool then spawns a number of worker threads. Each thread executes a number of iterations; with each iteration, it chooses a work item and its associated query and executes it against the database. A hedged sketch of this worker pattern follows.
To ensure that the server memory on the SQL Server virtual machine was being heavily used, Transact-SQL stored procedures generated recursive read queries against objects within the SQL Server database. For this test,
200 worker threads were executed against the database. Table 8 lists the test results. All live migrations were
initiated using VMM and the SQL queries continued uninterrupted after the migration.
Table 8. SQL Server Live Migration Times
Transactions per Second | Read Response Time (ms) | VM Memory Size (GB) | Live Migration Time
2368 | 4.53 |  | 00:00:38
2401 | 4.22 |  | 00:00:56
2590 | 4.02 |  | 00:01:22
2644 | 3.86 |  | 00:01:46
2740 | 2.67 | 10 | 00:02:10
SQLStress generated update writes to the database. Table 9 lists the test results. All live migrations were
initiated using VMM and the SQL updates continued uninterrupted after the migration.
Table 9. SQL Server Live Migration Times (Update Workload)
Transactions per Second | Write Response Time (ms) | VM Memory Size (GB) | Live Migration Time
2192 | 20.12 |  | 00:00:39
2280 | 19.23 |  | 00:01:02
2230 | 19.01 |  | 00:01:21
2251 | 18.60 |  | 00:01:46
2343 | 18.02 | 10 | 00:02:06
Conclusion
This white paper describes a reference architecture that delivers an end-to-end high availability and business continuity solution using Hyper-V live migration, advanced storage replication technologies from Hitachi Data Systems, and network and storage extensions from Brocade and Ciena. This solution enables the creation of a highly flexible and easy-to-manage virtualized environment for critical applications.
Appendix A: Bill of Materials
Part Number | Description
7846540.P
DKC-F615I-16FS.P
DKC-F615I-450KS.P
DKC-F615I-B2.P
DKC-F615I-C16G.P
DKC-F615I-S4GQ.P
DKC-F615I-CX.P
DKC-F615I-DKA.P
Disk Adapter
DKC-F615I-LGAB.P
DKC-F615I-LGAB.P
DKC-F615I-UC0.P
DKC615I-5.P
DKC-F615I-PHUC.P
Part Number | Description
HD-DCX-0001
BR-DCX-0102
PORT BLADE,32P,0SFP,DCX,BR
XBR-000148
FRU,SFP,SWL,8G,8-PK, BR
HD-FX824-0001
XBR-000190
HD-DCX4S-0002
BR-DCX-0102
PORT BLADE,32P,0SFP,DCX,BR
XBR-000148
FRU,SFP,SWL,8G,8-PK, BR
HD-FX824-0001
XBR-000190
HD-825-0010
BR-DCFM-ENT
NI-XMR-4-AC
Part Number | Description
NI-XMR-MR
NI-XMR-1GX20-SFP
NI-XMR-10GX4
NI-X-ACPWR-A
NI-X-SF1
TI-24X-AC
24P 10GBE/1GBE,SFP+,TI,BR
RPS11
10G-SFPP-SR
E1MG-SX-OM
E1MG-TX
10G-XFP-SR
Part Number | Description
B-820-0007-001
B-820-0007-002
B-720-0020-004
B-720-0015-001
166-0011-900
500-4200-101
800-4200-KIT
S42-0001-71C
431-1133-001
B-720-0025-200
B-720-1086-300
B-720-0042-001
B-955-0003-007
B-720-0017-001
MODULE (4)
B-720-0016-001
B-730-0008-001
130-4901-900
Appendix B: References
The following sections provide links to more information about the products mentioned in this white paper.
Hitachi
Hitachi Storage Cluster for Microsoft Hyper-V Solution and Partnership Overview
Hitachi Storage Cluster for Microsoft Hyper-V: Optimizing Business Continuity and Disaster Recovery in
Microsoft Hyper-V Environments
Building a Scalable Microsoft Hyper-V Architecture on the Hitachi Universal Storage Platform Family
Optimizing High Availability and Disaster Recovery with Hitachi Storage Cluster in Microsoft Virtualized
Environments
Top 5 Business Reasons to use Hitachi Enterprise Storage in Virtualized Server Environments
Brocade
Brocade 415, 425, 815 and 825 Fibre Channel HBA Data Sheet
Ciena
Corporate Headquarters 750 Central Expressway, Santa Clara, California 95050-2627 USA
Contact Information: + 1 408 970 1000 www.hds.com / info@hds.com
Asia Pacific and Americas 750 Central Expressway, Santa Clara, California 95050-2627 USA
Contact Information: + 1 408 970 1000 www.hds.com / info@hds.com
Europe Headquarters Sefton Park, Stoke Poges, Buckinghamshire SL2 4HD United Kingdom
Contact Information: + 44 (0) 1753 618000 www.hds.com / info.uk@hds.com
Hitachi is a registered trademark of Hitachi, Ltd., in the United States and other countries. Hitachi Data Systems is a registered trademark and service mark of
Hitachi, Ltd., in the United States and other countries.
All other trademarks, service marks and company names mentioned in this document are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered
or to be offered by Hitachi Data Systems. This document describes some capabilities that are conditioned on a maintenance contract with Hitachi Data Systems
being in effect and that may be configuration dependent, and features that may not be currently available. Contact your local Hitachi Data Systems sales office for
information on feature and product availability.
© Hitachi Data Systems Corporation 2010. All Rights Reserved.
AS-049-00 June 2010