
Symon Perriman Program Manager Microsoft Corporation WSV316

Multi-Site Clustering
Benefits Deployment Replication Networking Faster Failover Quorum Best Practices

Benefits of a Multi-Site Cluster


Protects Against Loss of an Entire Datacenter
Power outage, fires, hurricanes, floods, earthquakes, terrorism

Automates Failover
Reduced downtime
Lower complexity of disaster recovery plan

Reduces Administrative Overhead


Automatically synchronize application and cluster changes
Easier to keep consistent than unclustered servers

What is the primary reason why disaster recovery solutions fail?

Dependence on People

Multi-Site Clustering Checklist


Organized multi-site cluster deployment guide: http://technet.microsoft.com/en-us/library/dd197546.aspx


Multi-Site Clustering Basics


2+ physically separate sites
1+ node at each site
Storage at each site with data replication
Application moves during a failover
Site A Site B

SAN

SAN

Redundancy Everywhere
2 or more computers (nodes)
2 NICs
3rd NIC for iSCSI

HBA
Fibre Channel (FC) Serial Attached-SCSI (SAS)

Multipath IO (MPIO)
Redundant Storage Interconnects
Replicated Storage
OS, Service or Application

HA Roles

Mix and Match Hardware


You Can Use Any Hardware Configuration if
Each component has a Windows Server 2008 / R2 logo
Servers, Storage, HBAs, MPIO, etc

It passes Validate

It's That Simple!
Connect your Windows Server 2008 / R2 logo'd hardware
Pass every test in Validate
It is now supported!

If you make a change, just run Validate again

Details: http://go.microsoft.com/fwlink/?LinkID=119949

FCCP
Failover Cluster Configuration Program
Windows Server 2008 / R2
Buy validated solutions
Validated by Microsoft Failover Cluster Configuration Program

Not required for Microsoft support, must be logo'd
More information: http://www.microsoft.com/windowsserver2008/en/us/clustering-program.aspx

Introduction to Multi-Site Clustering

Cluster Validation and Replication


Multi-Site clusters are not required to pass the Storage tests to be supported
Validation guide and policy:
http://go.microsoft.com/fwlink/?LinkID=119949


Why is Replication Needed?


Loss of a site won't cause complete data loss
Data must exist on other site after a failover
Different storage needs than local clusters
Multiple storage arrays, independent on each site

Nodes usually access local site's storage first


Site A Site B

Changes are made on Site A and replicated to Site B

Replica

Replication Solutions
Replication Levels
Hardware (block level) storage-based replication
Software (file system level) host-based replication
Application-based replication
Exchange Server 2007 CCR

Replication Types
Synchronous
Asynchronous

A data replication mechanism between sites is needed

Synchronous Replication
Host receives write complete response from the storage after the data is successfully written on both storage devices

[Diagram: Write Request -> Primary Storage -> Replication -> Secondary Storage -> Acknowledgement -> Write Complete]

Asynchronous Replication
Host receives write complete response from the storage after the data is successfully written to the primary storage device

[Diagram: Write Request -> Primary Storage -> Write Complete; Replication -> Secondary Storage]

Synchronous vs. Asynchronous


Synchronous
No data loss
Requires high bandwidth/low latency connection
Stretches over shorter distances
Write latencies impact application performance

Asynchronous
Potential data loss on hard failures
Enough bandwidth to keep up with data replication
Stretches over longer distances
No significant impact on application performance
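
The trade-off above can be sketched as a small model. This is only an illustration under stated assumptions: the function names and the millisecond figures are hypothetical, and the remote write time is folded into the replication round trip.

```python
def write_latency_ms(local_write_ms, replication_round_trip_ms, synchronous):
    """Time until the host sees 'write complete' (illustrative model only)."""
    if synchronous:
        # Acknowledged only after both storage devices hold the data.
        return local_write_ms + replication_round_trip_ms
    # Acknowledged as soon as the primary storage device holds the data.
    return local_write_ms


def unreplicated_writes(writes_acknowledged, writes_replicated):
    """Acknowledged writes that would be lost if the primary site failed now."""
    return max(0, writes_acknowledged - writes_replicated)


if __name__ == "__main__":
    # Hypothetical round-trip times for short, medium and long distances.
    for rtt in (2, 20, 80):
        sync = write_latency_ms(1, rtt, synchronous=True)
        asyn = write_latency_ms(1, rtt, synchronous=False)
        print(f"RTT {rtt} ms: synchronous={sync} ms, asynchronous={asyn} ms")
    # With asynchronous replication, any backlog is exposed to a hard failure.
    print("writes at risk:", unreplicated_writes(100, 97))
```

In this model the synchronous write latency grows with distance while the asynchronous one does not, which is why synchronous replication suits shorter distances and asynchronous replication accepts potential data loss instead.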

What About DFS-Replication?


Using DFS-R to replicate the cluster disk's data in a multi-site Failover Cluster is not supported
DFS-R performs replication on file close
Some file types stay open for a very long time
VHDs for Virtual Machines
Databases for SQL Server

Data could be lost during a failover if it had not yet replicated

Resource Dependencies
Establishes start order timing

Resource Group
Workload Resource (example File Server)

Group determines smallest unit of failover

Network Name Resource

IP Address Resources*

Disk Resource

Custom Resource
(manages replication)
depends on


Network Considerations
Cluster nodes can reside in different subnets (2008/R2)
No need to connect nodes with VLANs
[Diagram: Site A and Site B nodes on different subnets (10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1), connected by a public network and a separate network]

Stretching the Network


Longer distance means greater network latency
Too many missed health checks can cause false failover
Fully configurable in 2008/R2
Failover Clustering has NO DISTANCE & NO SUBNET LIMITATIONS
Check if your vendor's hardware / replication has limitations

SameSubnetDelay (default = 1 second)


Frequency heartbeats are sent

SameSubnetThreshold (default = 5 heartbeats)


Missed heartbeats before an interface is considered down

CrossSubnetDelay (default = 1 second)


Frequency heartbeats are sent to nodes on dissimilar subnets

CrossSubnetThreshold (default = 5 heartbeats)


Missed heartbeats before an interface is considered down to nodes on dissimilar subnets

Command Line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
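
How the delay and threshold settings interact can be sketched as follows. This is a simplified model, not the cluster service's actual logic: the function names are hypothetical, and it assumes an interface is declared down once the threshold number of consecutive heartbeats has been missed.

```python
def detection_time_seconds(delay_seconds, threshold):
    """Approximate time before an unreachable interface is considered down."""
    return delay_seconds * threshold


def interface_considered_down(heartbeats_received, threshold=5):
    """True once `threshold` consecutive heartbeats have been missed."""
    missed = 0
    for received in heartbeats_received:
        missed = 0 if received else missed + 1
        if missed >= threshold:
            return True
    return False


if __name__ == "__main__":
    # Defaults from the slides: 1 second delay, 5 missed heartbeats.
    print("default detection time:", detection_time_seconds(1, 5), "seconds")
    # A short WAN blip of 4 missed heartbeats does not trip the default threshold.
    blip = [True, False, False, False, False, True]
    print("down after blip:", interface_considered_down(blip))
    # Raising the threshold tolerates a longer outage over a high-latency link.
    print("down with threshold 10:", interface_considered_down([False] * 5, threshold=10))
```

Raising CrossSubnetThreshold or CrossSubnetDelay lengthens the window before a false failover, at the cost of slower detection of a real failure.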

Security Over the WAN


Improved Security
Prevent Clients from Connecting to Networks
Encrypt Intra-cluster Traffic
0 = clear text
1 = signed (default)
2 = encrypted

Enhanced Dependencies OR
Network Name resource stays up if either IP Address Resource A OR IP Address Resource B is up
Network Name Resource
OR

IP Address Resource A

IP Address Resource B
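
The OR dependency can be expressed as a simple boolean check. This is a sketch only; the function name and the "AND" comparison mode are illustrative assumptions, not cluster APIs.

```python
def network_name_can_stay_online(ip_a_online, ip_b_online, mode="OR"):
    """Evaluate a Network Name's dependency on two IP Address resources."""
    if mode == "OR":
        # Either IP Address resource being online is enough.
        return ip_a_online or ip_b_online
    # A plain AND dependency would need both, which cannot be satisfied when
    # each IP Address belongs to a different site's subnet.
    return ip_a_online and ip_b_online


if __name__ == "__main__":
    # Group is running on Site A: only IP Address Resource A is online.
    print("OR dependency: ", network_name_can_stay_online(True, False, "OR"))
    print("AND dependency:", network_name_can_stay_online(True, False, "AND"))
```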

Resource Dependencies
Workload Resource (example File Server)

Network Name Resource


OR

Disk Resource

Custom App
(replication)

IP Address Resources A

IP Address Resources B

Comes online on site A

Comes online on site B


DNS Updates
Nodes in dissimilar subnets
Failover changes resource's IP Address
Clients need that new IP Address from DNS to reconnect
DNS Server 1 DNS Replication DNS Server 2

Record Created

Record Updated Record Obtained Record Updated

10.10.10.111

20.20.20.222

20.20.20.222 FS = 10.10.10.111
Site A Site B

Network Name Properties


RegisterAllProvidersIP (default = 0 for FALSE)
Determines if all IP Addresses for a Network Name will be registered in DNS
TRUE (1): IP Addresses can be online or offline and will still be registered
Ensure application is set to try all IP Addresses, so clients can come online quicker

HostRecordTTL (default = 1200 seconds)


Controls how long the DNS record for a cluster network name lives on the client
Shorter TTL: DNS records for clients updated sooner
Exchange Server 2007 recommends a value of five minutes (300 seconds)
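
The effect of these two properties on clients can be sketched roughly as below. The function names are hypothetical, and the model ignores DNS server replication delay and any other caches between the client and the cluster; the IP Addresses are the ones used in the diagrams.

```python
def max_client_cache_minutes(host_record_ttl_seconds):
    """Longest a client may keep using a cached DNS record, in minutes."""
    return host_record_ttl_seconds / 60


def first_reachable(registered_ips, online_ips):
    """Try every registered IP Address in turn, as the application should
    when RegisterAllProvidersIP = 1 registers all of them."""
    for ip in registered_ips:
        if ip in online_ips:
            return ip
    return None


if __name__ == "__main__":
    print("default TTL :", max_client_cache_minutes(1200), "minutes")
    print("reduced TTL :", max_client_cache_minutes(300), "minutes")
    # After a cross-site failover only the Site B address is online.
    registered = ["10.10.10.111", "20.20.20.222"]
    print("client connects to:", first_reachable(registered, {"20.20.20.222"}))
```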

Local Failover First


Local failover first
No change in IP Address

Cross-site failover for disaster recovery


DNS Server 1 DNS Server 2

10.10.10.111

20.20.20.222

10.10.10.111 FS = 20.20.20.222
Site A Site B

Failover Order
Preferred Owners
Local failover first

Possible Owners Always Enforced


Resource will not start on non-possible owner

AntiAffinityClassNames
Groups with same AACN try to avoid moving to same node
http://msdn.microsoft.com/en-us/library/aa369651(VS.85).aspx
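
A rough sketch of how preferred and possible owners combine to give "local failover first" follows. It is a simplification, not the cluster service's placement algorithm: the function, the node names (A1, A2, B1, B2) and the fallback order are all assumptions, and AntiAffinityClassNames is left out.

```python
def pick_failover_target(current, preferred_owners, possible_owners, online_nodes):
    """Return the node a group should move to, or None if nothing qualifies."""
    def eligible(node):
        # Possible owners are always enforced; the node must also be up.
        return node != current and node in possible_owners and node in online_nodes

    # Walk the preferred owners list in order, so local nodes listed first win.
    for node in preferred_owners:
        if eligible(node):
            return node
    # Otherwise fall back to any remaining possible owner.
    for node in sorted(possible_owners):
        if eligible(node):
            return node
    return None


if __name__ == "__main__":
    preferred = ["A1", "A2", "B1", "B2"]   # Site A nodes listed first
    possible = {"A1", "A2", "B1", "B2"}
    print("A1 fails:", pick_failover_target("A1", preferred, possible, {"A2", "B1", "B2"}))
    print("Site A lost:", pick_failover_target("A1", preferred, possible, {"B1", "B2"}))
```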

Virtual LAN (VLAN)


Deploying a VLAN minimizes client reconnection times
Can be harder to configure

Required for SQL & live migration


DNS Server 1 DNS Server 2

10.10.10.111

10.10.10.111 VLAN

FS = 10.10.10.111
Site A Site B

Multi-Site Clustering Groups and Settings


Quorum Overview
Majority is greater than 50%
Possible Voters:
Nodes (1 each), Disk Witness (1 max), File Share Witness (1 max)

4 Quorum Types
Disk only (not recommended)
Node and Disk majority
Node majority
Node and File Share majority

Node and Disk Majority


Nodes get 1 vote each and Disk gets 1 vote
Loss of disk or node OK if majority is maintained

Do not use in multi-site clusters unless directed by vendor


Vote Vote Vote

?
Replicated Storage from vendor

Node Majority
Can I communicate with a majority of the nodes in the cluster? Yes, then stay up

5 Node Cluster: Majority = 3

Can I communicate with a majority of the nodes in the cluster? No, drop out of Cluster Membership

Site A

Site B

SAN

Cross site network connectivity broken!

SAN

Majority in Primary Site

Node Majority
We are down!

5 Node Cluster: Majority = 3

Can I communicate with a majority of the nodes in the cluster? No, drop out of Cluster Membership

Site A

Site B

SAN

Disaster at Site A

SAN

Majority in Primary Site
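
The two Node Majority scenarios above reduce to simple vote arithmetic. The sketch below assumes 3 nodes at Site A and 2 at Site B, consistent with "Majority in Primary Site"; the function names are illustrative.

```python
def majority(total_votes):
    """Smallest vote count that is strictly greater than 50%."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes, total_votes):
    """True if this partition holds a majority of all configured votes."""
    return reachable_votes >= majority(total_votes)


if __name__ == "__main__":
    total = 5  # assumed split: 3 nodes at Site A, 2 nodes at Site B
    print("majority needed:", majority(total))
    # Cross-site network connectivity broken: each site counts only its own nodes.
    print("Site A stays up:", has_quorum(3, total))
    print("Site B stays up:", has_quorum(2, total))
    # Disaster at the primary site: the 2 surviving nodes cannot form a majority.
    print("Site B alone after disaster:", has_quorum(2, total))
```

The last case is the one where the surviving site stays down until quorum is forced.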

Forcing Quorum
Always understand why quorum was lost
Used to bring cluster online without quorum
Cluster starts in a special forced state
Once majority achieved, no more forced state
Command line:
net start clussvc /forcequorum (or /fq)

PowerShell (R2):
Start-ClusterNode -FixQuorum (or -fq)

Multi-Site With File Share Witness


Site C
File Share Witness

Complete resiliency and automatic recovery from the loss of any 1 site

\\Foo\Cluster1

Site A

WAN

Site B

SAN
Replicated Storage from vendor

SAN

Multi-Site With File Share Witness


Site C
File Share Witness

Complete resiliency and automatic recovery from the loss of the File Share Witness
Site A WAN

\\Foo\Cluster1

Site B

SAN
Replicated Storage from vendor

SAN

FSW Considerations
Simple Windows File Server
Needs to be in the same forest
Running Windows Server 2003, 2008 or 2008 R2

Recommended to be at 3rd separate site
Single file server can serve as a witness for multiple clusters
Each cluster requires its own share
Can be clustered in a second cluster

FSW cannot be on a node in the same cluster
It is an additional voter for free (almost)
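
Why the witness belongs at a third site can be checked by enumerating single-site failures. This sketch assumes 2 nodes at each of Site A and Site B and that the surviving sites can still reach each other; the function and site labels are hypothetical.

```python
def survives_loss_of(votes_by_site, failed_site):
    """True if the remaining sites still hold more than 50% of all votes."""
    total = sum(votes_by_site.values())
    remaining = total - votes_by_site[failed_site]
    return 2 * remaining > total


def single_site_failures(votes_by_site):
    """Map each site to whether the cluster survives losing that site."""
    return {site: survives_loss_of(votes_by_site, site) for site in votes_by_site}


if __name__ == "__main__":
    with_fsw = {"Site A": 2, "Site B": 2, "Site C (FSW)": 1}
    without_fsw = {"Site A": 2, "Site B": 2}
    print("with FSW at Site C:", single_site_failures(with_fsw))
    print("without a witness: ", single_site_failures(without_fsw))
```

With the extra vote at Site C every single-site loss leaves a majority; with only the two 2-node sites, losing either one leaves exactly half the votes, which is not a majority.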

Quorum on a Multi-Site Cluster

Quorum Model Summary


No Majority: Disk Only
Not Recommended
Only use as directed by vendor

Node and Disk Majority


Only use as directed by vendor

Node Majority
Odd number of nodes

Node and File Share Majority


Best availability solution
Recommended for
Exchange Server 2007 CCR


Cluster your Branch Offices


Cluster several standalone File Servers from branch offices
Keep network traffic low
High-Availability for the files
Redundancy for the data

Site A

Site B

Clients primarily accessing applications in Site A

Clients primarily accessing applications in Site B

Multi-Site Across the Enterprise


More distributed cluster nodes & clusters give higher availability
Complete resiliency and automatic failover
Remember your quorum model
Loss of any single site should not bring down the cluster

File Share Witness


1 File Server hosts all File Share Witnesses for multiple clusters
Make it highly-available

Separate site
Not a node in that same cluster

Cluster 2, Branch 1

Cluster 2, Branch 2

Cluster 1, Site 1

Cluster 1, Site 2

Cluster 3, Many FSWs

Cluster 2, Main Office

Multi-Site Clustering Review


Site C File Share Witness

4, 6, 8 nodes + FSW = odd # votes
Local failover first (preferred owner)
Site failover second (possible owner)
AntiAffinityClassNames

Faster DNS Updates
Register all IPs for a Network Name
Shorten client's DNS record TTL
Ensure application tries all IPs
Site B

Site A

WAN
Encrypt WAN traffic for security
Adjust health checks for latency
Configure OR dependencies

SAN
Replicated Storage from vendor

SAN

Session Summary
Multi-Site Failover Clustering has many benefits
Variety of hardware options & configurations
Redundancy is needed everywhere
Understand your replication needs
Compare VLANs with multiple subnets
Plan your quorum model & nodes before deployment
Follow the checklist and best practices
http://technet.microsoft.com/en-us/library/dd197546.aspx

Are You Up For a Challenge?


Become a Cluster MVP!
