
Advanced Technical Skills

IBM Systems and Technology Group Technical Symposium, Auckland, New Zealand | August 14-17, 2013

AHY24

#include <std_disclaimer.h> These notes have been prepared by an Australian, so beware of unusual spelling and pronunciation.

PowerHA SystemMirror for AIX: New Features and Best Practice
Antony Red Steel - ATS
Advanced Technical Skills

2013 IBM Corporation


Contents

  Introduction to PowerHA Standard and Enterprise
  PowerHA maintenance and features
  PowerHA directions

PowerHA SystemMirror

Agenda
  PowerHA Standard and Enterprise Editions
  Cluster Aware AIX
  General changes
  What's new in PowerHA 7.1.1 and 7.1.2
  Walk through PowerHA configuration and demo of application

Standard Edition
  Centralised Management C-SPOC
  Cluster resource management
  Shared Storage management
  Cluster verification framework
  Integrated disk heartbeat
  SMIT management interfaces
  AIX event/error management
  Integrated heartbeat
  PowerHA DLPAR HA management
  Smart Assists

Enterprise Edition pending
  Multi Site HA Management
  PowerHA GLVM async mode
  IBM Metro Mirror support
  IBM Global Mirror support DS8700
  EMC SRDF sync/async
  Hitachi Truecopy
  Stretched or linked clusters
  DS8000 Hyper Swap


Introduction to PowerHA SystemMirror Standard

Introduction to PowerHA
  What is high availability
  Planning and designing high availability
  Features of PowerHA to keep your applications available

PowerHA SystemMirror 7.1.2
  AIX levels: AIX 7.1 TL2 SP1; AIX 6.1 TL8 SP1
  Standard Edition: 5765 H39
  Enterprise Edition: 5765 H40
  Fixpack 1: Feb 2013    GA: Nov 2012    EOS: N/A

PowerHA SystemMirror 7.1.1
  AIX levels: AIX 7.1 TL1 SP2; AIX 6.1 TL7 SP2
  Standard Edition: 5765 H23
  Enterprise Edition: N/A
  Fixpack 1: Feb 2012    GA: Dec 2011    EOS: N/A

PowerHA SystemMirror 7.1
  AIX levels: AIX 7.1 with RSCT 3.1.0.1; AIX 6.1 TL6 with RSCT 3.1.0.1
  Standard Edition: 5765 H23
  Enterprise Edition: N/A
  Fixpack 4: Sept 2011   GA: Sept 2010   EOS: Sept 2014

PowerHA SystemMirror 6.1
  AIX levels: AIX 7.1 with RSCT 3.1.0.0; AIX 6.1 TL2 with RSCT 2.5.4.0; AIX 5.3 TL9 with RSCT 2.4.12.0
  Standard Edition: 5765 H23
  Enterprise Edition: 5765 H24
  Fixpack 6: Aug 2011    GA: Oct 2009    EOS: Sept 2014

HACMP 5.5 went EOS 30/4/2012
GeoRM went EOS 30/9/2009

Introduction to High Availability

PowerHA SystemMirror for AIX Standard Edition
  Cluster management for the data centre
    Monitors, detects and reacts to events
    Establishes a heartbeat between the systems
    Enables automatic switch-over
  IBM shared storage clustering
    Can enable near-continuous application service
    Minimize impact of planned & unplanned outages
    Ease of use for HA operations
  Smart Assists application agents
    Out of the box deployment for SAP and other popular applications
  Mature Product
    22 Major releases (averaging one a year)
    Over 12,000 customers worldwide

PowerHA SystemMirror for AIX Enterprise Edition
  Cluster management for the Enterprise
    Multi-site cluster management
    Includes the Standard Edition function

Causes of downtime (Standish Group Research 2008-2010)
  Application errors
  Operating system errors
  Hardware failure
  Operator error


Introduction to High Availability

High availability is:
  Reducing downtime to close to zero (not fault tolerance)
  Solution may address planned or unplanned downtime
  Solution need not be fault tolerant but should be fault resistant
  Solution should eliminate single points of failure (SPOF)

PowerHA is not the answer if:
  You cannot afford any downtime
    Life critical systems need a fault tolerant solution
  The environment is not secure
    Many users with root access
  The environment is not stable
    Change management is not respected
    You do not have trained administrators
    Procedures are not well documented
    Environment is prone to user fiddle factor
  Applications cannot be controlled
    Scripts cannot be used to start/stop and recover applications


Eliminate single points of failure by:

Single point of failure   Eliminated by
Node                      Using multiple nodes
Power source              Using multiple circuits or uninterruptible power supplies
Network adapter           Using redundant network adapters and bonding (EtherChannel etc)
Network                   Using multiple networks to connect nodes / clients
TCP/IP subsystem          Using non-IP networks to connect nodes
Disk adapter              Using redundant disk adapter or multipath hardware
Disk                      Using multiple disks with mirroring or RAID
Application               Adding node for takeover; configuring application monitor
VIO server                Implementing dual VIO servers
Site                      Adding an additional site


Setting realistic expectations


What is considered an outage in your environment?
  Unexpected downtime
  Maintenance tasks

Availability          Downtime
90% (1-nine)          36.5 days/year
99% (2-nines)         3.65 days/year
99.9% (3-nines)       8.76 hours/year
99.99% (4-nines)      52 minutes/year
99.999% (5-nines)     5 minutes/year
99.9999% (6-nines)    31 seconds/year

What are the desired:
  RTO - Recovery Time Objective
  RPO - Recovery Point Objective
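
The downtime figures in the availability table follow from simple arithmetic. The sketch below reproduces them, assuming a 365-day year; the function name and constant are illustrative only and are not part of any PowerHA tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (90, 99, 99.9, 99.99, 99.999, 99.9999):
        hours = downtime_hours_per_year(target)
        print(f"{target}%: {hours:.4f} hours/year = {hours * 60:.2f} minutes/year")
```

Comparing the result with the agreed RTO shows quickly whether a single failover event already consumes the yearly budget.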


Building for availability


Infrastructure planning
  Power Redundancy; I/O Drawers; SCSI Backplane; SAN HBAs / Multipathing; Virtualized or Dedicated Deployments; Backup Strategies; Application Fallover Protection
LPM
  Live move of OS/Application between frames; Workload management; Energy management; Hardware management
Partition Suspend/Resume
  Resume where stopped; suspend low priority workloads; Firmware updates without stopping / restarting the application
CHARM
  Available on high end models (>= 770)
  Perform CHARM during low-use periods
  LPM critical partitions to other servers if possible
  Depending on the repair, IBM may recommend quiescing critical applications on running partitions
  Have current backups before beginning, and make sure all configuration redundancy requirements have been met
  Use PowerVM Suspend / Resume to reduce CPU and active memory

Introduction to High Availability

Planned
  Maintenance
  Upgrades
  Testing
  Development

Unplanned
  User Error
  Application Failure
  Component Failure
  Operating System Failure
  Environmental Disasters

Becoming a more important area PowerHA as an administration tool

LPM is an alternative for

But not for


(or software upgrades etc)

PowerHA will help to mask or eliminate



You cannot let sleeping clusters lie

Why touch the system? It has been working now for 2 years...
  Hardware may need to be upgraded (6 monthly f/w update; 1/year may not be concurrent)
  Replacement hardware may be at unrecognisable firmware levels
  Application may need to be upgraded, which may require new software levels or fixes
  OS and/or application out of support
  Business expands
PowerHA designed to manage/support upgrade process
  Rolling upgrades
  Snapshot conversions


High Availability options

One site HA
  PowerHA SystemMirror
  Dual servers, shared storage
  Site is the only single point of failure
Disaster Recovery
  Replication
    GLVM
    Storage / Database
  PowerHA SystemMirror Enterprise Ed.
    PowerHA managing application and storage replication
    GLVM
    SVC; Storwize; MetroMirror; GlobalMirror
    EMC SRDF / Hitachi TrueCopy/HUR

>> Planning and preparation


Cluster Aware AIX

IBM Cluster products (RSCT, PowerHA, VIOS...) use CAA
CAA is a toolset; it doesn't form a cluster (no concept of quorum or fencing nodes, but it provides tools to manage these)
All interfaces are monitored: lscluster -i
All nodes monitored: lscluster -m
Changes from 2010
  No consistent view of devices
  SolidDB no longer used
  No zones / sub-clusters
  Secure communication between nodes
Deadman switch (DMS)
  If a node is detected as isolated, DMS can generate an AHAFS event or crash the node
  clctrl -tune -o deadman_mode (clctrl -tune -L to list)


Cluster topology
[Diagram: NodeA, NodeB and NodeC attached to network1 and network2 through boot interfaces (labels shown: nA_n1_boot1, nA_n1_boot2, nA_n2_boot1, nA_n2_boot2, nB_n1_boot1), with disks hdiskn, hdisko, hdiskp and a repository disk]


Cluster topology
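[Diagram: NodeA, NodeB and NodeC on network1 and network2 with resources added. Labels shown: boot interfaces nA_n1_boot1, nA_n1_boot2, nA_n2_boot1, nA_n2_boot2, nB_n1_boot1; nA_al, nC_al; service labels rg1_n1_svc1, rg1_n2_svc1, rg2_n1_svc1, rg2_n2_svc1; resource groups RG1 and RG2; application monitors app_mon1 and app_mon2; rmt0; volume groups vg1 and vg2; disks hdiskn, hdisko, hdiskp; repository disk]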


Cluster topology
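[Diagram: NodeA, NodeB and NodeC on network1 and network2 with resource groups RG1 and RG2, each now annotated with Policies. Labels shown: boot interfaces nA_n1_boot1, nA_n1_boot2, nA_n2_boot1, nA_n2_boot2, nB_n1_boot1; nA_al, nC_al; service labels rg1_n1_svc1, rg1_n2_svc1, rg2_n1_svc1, rg2_n2_svc1; application monitors app_mon1 and app_mon2; rmt0; volume groups vg1 and vg2; disks hdiskn, hdisko, hdiskp; repository disk]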


Cluster behaviour

Resource Group Policies
  Startup
    Online on home node only
    Online on first available
    Online on all available
    Start up distribution
  Failover
    Failover to next node in the list
    Failover using Dynamic node priority (CPU, Paging space, Disk IO, Adaptive (user defined))
    Bring offline
  Fallback
    Fallback to higher priority node
    Never fallback

Resource group dependencies
IP distribution preferences
Inter site management policies
  Online on Both Sites
  Online on Either Site
  Prefer Primary Site
  Ignore
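
To make the "failover to next node in the list" and "fallback to higher priority node" policies concrete, here is a minimal sketch. It is a toy model, not PowerHA code: the function names are hypothetical, and it assumes the node list is ordered from highest to lowest priority and that the first available node other than the failed one is chosen.

```python
from typing import Optional, Sequence, Set


def next_failover_node(node_list: Sequence[str], failed_node: str,
                       available: Set[str]) -> Optional[str]:
    """Return the highest-priority available node other than the failed one.

    node_list is assumed to be ordered from highest to lowest priority.
    Returns None when no node can take over (resource group stays offline).
    """
    for node in node_list:
        if node != failed_node and node in available:
            return node
    return None


def should_fall_back(node_list: Sequence[str], current: str, rejoined: str,
                     never_fallback: bool) -> bool:
    """Decide whether the resource group moves back when a node rejoins."""
    if never_fallback:
        return False
    # A lower index means a higher priority in this model.
    return node_list.index(rejoined) < node_list.index(current)


if __name__ == "__main__":
    db_nodes = ["n1", "n3", "n2"]
    target = next_failover_node(db_nodes, "n1", {"n2", "n3"})
    print("DB fails over to:", target)
    print("Fall back when n1 returns:",
          should_fall_back(db_nodes, target, "n1", never_fallback=False))
```

Dynamic node priority would replace the fixed ordering with a runtime metric such as CPU or paging space; that variant is not modelled here.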


Resource Group dependencies

Online on same node dependency
  Resource groups come online on the same node
Parent / child dependency
  Child will come online after the parent is stable, and will go offline if the parent goes offline
  Can have up to 3 levels
Online on different node dependency
  High, intermediate and low
  High will force intermediate and low to move; intermediate will force low to move
  Same priority cannot come online on same node
  Same priority will not cause a movement

Example
  On same node dependency
    DB:   n1,n3,n2   High
    App:  n2,n3,n1   Intermediate
    Test: n3,n2,n1   Low
  Parent / Child
    DB: parent; App: child

[Diagram: three successive cluster states. (1) n1: DB, n2: App, n3: Test. (2) n2: App, n3: Test. (3) n2: App, n3: DB]


IP distribution preferences

Collocation
  All service labels will be on the same adapter.
Collocation with persistent
  All service labels will be on the same adapter as the persistent IP.
Collocation with Source
  All service labels will be on the same adapter and the customer can choose the source IP of the outgoing packets.
Anti-collocation
  All resources of this type will be allocated on the first adapter which is not already serving (or serving the least number of) addresses.
Anti-collocation with 1st Source
  Same as above with the service IP being the source address of all outgoing packets.
Anti-collocation with Persistent Labels
  Service labels will almost never be on the same adapter as the persistent IP; that is, service will occupy a different interface as long as one is available, but if no other is available then they will occupy the same interface.
Anti-collocation with Persistent Labels and Source
  Same as above with all outgoing packets having the service IP as the source address.
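
The difference between collocation and anti-collocation can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, adapters are plain strings, and anti-collocation is interpreted as "pick the adapter currently serving the fewest addresses, first adapter wins ties", following the description above.

```python
from typing import Dict, List, Sequence


def place_collocated(adapters: Sequence[str],
                     labels: Sequence[str]) -> Dict[str, List[str]]:
    """Collocation: put every service label on the same (first) adapter."""
    placement: Dict[str, List[str]] = {adapter: [] for adapter in adapters}
    placement[adapters[0]].extend(labels)
    return placement


def place_anti_collocated(adapters: Sequence[str],
                          labels: Sequence[str]) -> Dict[str, List[str]]:
    """Anti-collocation: each label goes to the least-loaded adapter."""
    placement: Dict[str, List[str]] = {adapter: [] for adapter in adapters}
    for label in labels:
        # min() returns the first adapter among those with the fewest labels.
        target = min(adapters, key=lambda adapter: len(placement[adapter]))
        placement[target].append(label)
    return placement


if __name__ == "__main__":
    nics = ["en0", "en1"]
    services = ["svc1", "svc2", "svc3"]
    print(place_collocated(nics, services))
    print(place_anti_collocated(nics, services))
```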


Are you using PowerHA features

Are you aware of / using
  Fast failure detection
  File collections
  Application monitoring
    Startup, long running or both
    Process or custom
  CSPOC
  Cluster Test tool

missing heartbeat

check

Remember that in the new versions of PowerHA, the developers used feedback from the field/PMRs to fix common problems


Cluster Aware AIX (cont)

Debugging
  snap caa
  Logging via syslog
  lscluster -s for stats
  lsattr -El cluster0
    Obtains node and repository disk UUID
  /usr/lib/cluster/clras lsrepos
    Lists valid cluster repository disks
  /usr/lib/cluster/clras sfwinfo -d hdisk2
    Displays storage framework UUID for disks
  /usr/lib/cluster/clras dumprepos
    Displays contents of cluster repository disk


Cluster Aware AIX

Kernel based
A set of services/tools embedded in AIX to help manage a cluster of AIX nodes and/or help run cluster software on AIX
  IBM cluster products (including RSCT, PowerHA, and the VIOS) will use and/or call CAA services/tools.
  CAA services can assist in the management and monitoring of an arbitrary set of nodes and/or running a third-party cluster.
CAA does not form a cluster by itself. It is a tool set.
  There is no notion of quorum. (If 20 nodes of a 21 node cluster are down, CAA still runs on the remaining node.)
  CAA does not eject nodes from a cluster. CAA provides tools to fence a node but never fences a node and will continue to run on a fenced node.
Requires a repository disk (protected at the storage level)
By default all interfaces monitored
snap caa to collect PD data


Cluster Aware AIX (cont)

All nodes are monitored
  Cluster Aware AIX tells you what nodes are in the cluster, plus information on those nodes, including state.
  A special gossip protocol is used over the multicast address to determine node information and implement scalable reliable multicast.
  No traditional heartbeat mechanism is employed.
  Gossip packets travel over all interfaces, including storage.
CAA monitors both communication interface states and points-of-contact between nodes on a node-by-node basis
  A point-of-contact indicates that a node has received a packet from the other node over the interface.
  A point-of-contact up state indicates that the packet flow continues between the nodes.
  A point-of-contact down state indicates that the packet flow does not continue between the nodes, even though the interface may be in an up state.
Note: The ability to monitor this particular condition is very important. An interface in the up state and a point-of-contact in a down state can occur because of hardware or other network issues between these particular nodes.


Cluster Aware AIX (cont)

Cluster disks
  CAA has information on all disks in the cluster, including their state (3rd party disks do not participate in the monitoring).
  SolidDB and cluster disk naming dropped in 2010
In 2011 added:
  Deadman switch for isolated nodes: tuneable and response options
  3rd party disk support added
  Synchronous changes allowed across the cluster
  Improved logging and RAS tools
In 2012 added:
  2 sites: linked or stretched clusters
  Stretched Cluster: single CAA cluster; single repository disk; requires multicast across 2 sites; cluster communication: networks, SAN, or disk
  Linked Cluster: linked CAA cluster; 2 separate repository disks (one local repository on each site, synchronized between sites); cluster communication: networks


PowerHA 6.1

[Diagram: software stack with RSCT (Resource Monitoring and Control, Resource Manager, Group Services, Topology Services) and AIX]

PowerHA 7.1

[Diagram: software stack with RSCT (Resource Monitoring and Control, Resource Manager, Group Services), CAA and AIX]

Cluster aware AIX Topology management


[Diagram: two groups of four hosts (Host 1 to Host 4), one of them labelled MULTICAST]

PowerHA 6.1
  Heartbeat Rings: detailed protocol
  Leader, Successor, Mayor etc
  Difficult to add/delete nodes
  Requires IP aliases management in the subnet

PowerHA 7.1
  Multicast based protocol
  Discover and use as many adapters as possible
  Use network and SAN as needed
  Adapt to the environment: delay, subnet etc
  Kernel based cluster message handling


Default Multi Channel Health Management


Minimal Setup
Multiple channels of communication
  Network
  SAN
  Central Repository

3 lines of (redundant) independent communications
  First line of defence: network heartbeats
  Second line of defence: SAN heartbeats
  Third line of defence: cluster repository

[Diagram: Host 1 and Host 2, each with Reliable Messaging, exchanging heartbeats over the network, the SAN and the cluster repository]



Configure SAN heart beating in virtual environment


General Changes

Disk Handling Changes
  ECMVG required
    Existing volume groups automatically converted
    No user action required, no override allowed
    Done by call to cl_makecm out of node_up
    C-SPOC creates all volume groups as ECM
  Either Fast Disk Takeover or Concurrent Access
    Active/Passive mode used for non-concurrent resource groups
    No SCSI-2 disk reserves set or broken
  Most disk differences now irrelevant
    Disk reserve handling code cl_disk_available retained for migration
    Fast path through code if ECM and no reserves


PowerHA 7.1.1

Key dates:
  Announce: October 12
  General Availability: December 16
Lifecycle information:
  http://www-01.ibm.com/software/support/lifecycle/index_h.html
Offerings:
  Standard Edition has base function plus Smart Assists
  New features added to Enterprise Edition 6.1 (only); no 7.1 EE
RSCT and AIX requisites
  AIX 6.1 TL 7 with bos.cluster.rte 6.1.7.2 (SP2) APAR IV09929
  OR AIX 7.1 TL 1 with bos.cluster.rte 7.1.1.2 (SP2) APAR IV09868
  RSCT 3.1.1.0 works with either version of AIX


PowerHA 7.1.1 (cont)

New features Standard edition
  Security features
    Encrypted Filesystem, Role Based Access Control, LDAP
  Smart Assists
    Expanded middleware support including SAP MaxDB HotStandby and WebSphere MQ Series
  IBM Systems Director plug-in
    Extends features available through Director
  Cluster Aware AIX
    New features
  Miscellaneous updates
    CSPOC enhancements, migration, synchronous application startup


PowerHA 7.1.1 (cont)

New features Enterprise edition (6.1)
  XIV replication support
  Global Mirror support enhancements
Enterprise Edition 6.1 requires SystemMirror 6.1 with Service Pack 7
  SP7 and new install images available from FixCentral
    http://www.ibm.com/support/fixcentral/aix/selectFixes
    Follow the links to select IV11782 (packaging APAR)
  New support included in existing genxd fileset (updates only)


PowerHA 7.1.1 Smart Assist support


Supported versions (where two values are given: SystemMirror 7.1.0, then SystemMirror 7.1.1)
  DB2 Enterprise Edition: 9.5, 9.7
  WAS: 6.1, 6.1
  WAS N/D: 6.1, 6.1
  HTTP Server: 6.1, 6.1
  TSM: 6.1, 6.2
  TDS: 5.2, 6.3
  Filenet: 4.5.1, 4.5.1
  Lotus Domino Server: 8.5.1
  Oracle Database: 11g r1
  Oracle Application Server: 10g r2
  SAP: SAP ERP netweaver 2004s; SAP SCM 7.0 with Netweaver 7.0 EHP1 for FVT; SAP SCM 7.0 with Netweaver 7.0 EHP1 for SVT
    - MaxDB: V7.6
    - Oracle: 10g r2, 10g r2
    - DB2: 9.7
  MQ Series: 7.0.1.5
  AIX print server: AIX 6.1
  AIX DHCP: AIX 6.1
  AIX DNS: AIX 6.1


PowerHA 7.1.1 Federated Security

All user, RBAC, encrypted FS credentials in a central store
  Can use existing LDAP or Windows server
Role based access (RBAC)
  Roles:
    ha_admin: Administrator
    ha_op:    Operator
    ha_mon:   Monitor
    ha_view:  Viewer
Support for Encrypted filesystems
  Shared filesystem or LDAP for keystore


PowerHA 7.1.2

Version 7.1.2 offers both a Standard and an Enterprise Edition
  The Enterprise Edition provides for Disaster Recovery solutions with both host based mirroring and storage based mirroring
IPv6 support is enabled with this version for v7 product
Simpler to deploy and easier to manage multi-site configurations with IBM Systems Director, intuitive interfaces, multi-site install wizard
Stretched Cluster: cluster wide AIX commands, kernel based event management, single repository, multicast communications
Linked Clustering: cluster wide AIX commands, kernel based event management, linked clusters with unicast communications & dual repositories
HyperSwap capability is introduced
  HyperSwap with DS8800 storage subsystems provides for continuous availability against storage failures
Cluster Split/Merge technology for managing split-site policy scenarios


PowerHA 7.1.2

Cross Site Mirroring using LVM mirror pools
Enhancements to the Director plugin to facilitate the use of these new features
Software Levels Required:
  OS: AIX 6.1 TL8 SP1
  OS: AIX 7.1 TL2 SP1
  PowerHA SystemMirror 7.1.2 SP1
Additional software requirements for Enterprise Edition and HyperSwap
  PowerHA SystemMirror 7.1.2 APAR IV27586


PowerHA 7.1.2

High Availability and disaster recovery across multiple sites
PowerHA SystemMirror for AIX Enterprise Edition
  Adds long distance failover for Disaster Recovery
  Low cost host based mirroring support
  Extensive support for storage array replication
  Short distance (Campus to 80-100km) deployment: Synchronous
  Long distance (>100km) deployment: Asynchronous

Replication Technology (Sync / Async)
  Storage Array Replication
    IBM DS8K Series Storage - PPRC
    SVC, Storwize, XIV
    EMC SRDF *
    Hitachi Universal Replicator, Truecopy *
    HP Continuous Access *
  Host Replication
    Geo LVM

[Diagram: Enterprise Edition across two sites (labels: New York, London, Site 1, Site 2), linked by Network for host mirroring and by Fiber for storage mirroring]

PowerHA 7.1.2
Multi Sites                 Stretched Cluster      Linked Cluster
Inter site communication    Multicast              Unicast
Repository disk             Shared                 Separate
Cluster Communication       Networks, SAN, Disk    Networks (SAN in future)

Fig 1: Multi Sites with Stretched Cluster
[Diagram labels: Site 1, Site 2, Repository Disk; Standard, Enterprise; Cross site LVM mirroring; HyperSwap; Multi site Concurrent RG with HyperSwap; Multi Site Definition; Site Service IP; Site Policies; Stretched Cluster]

Fig 2: Multi Sites with Linked Clusters
[Diagram labels: Site 1, Site 2, Links, Repository Disk 1, Repository Disk 2; Linked Clusters; HADR with Storage Replication Management; HyperSwap]


Tie breaker support

PowerHA 7.1.2 Tie Breaker

Support Separate Site Split and Merge policies
Split/Merge: Tie Breaker policy
  FC/iSCSI Tie Breaker
  SCSI 3 reservation disk
  Losing side is quiesced
  More suited for Linked Clusters

[Diagram: cluster with Site 1 and Site 2, each connected over SCSI or iSCSI to a shared disk tie breaker at Site 3]

Policy Setting (Split / Merge)   Comments
  Tie Breaker     Tie break holder side wins
  Majority Rule   >N/2 side wins. In case of N/2, the side that includes the node with the smallest node id wins
  Manual          Manual steps needed for recovery to continue
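
The Majority Rule policy can be expressed as a small decision function. The sketch below is only a model of the rule as stated on the slide for a two-way split; the function name is hypothetical and node ids are assumed to be integers.

```python
from typing import FrozenSet, Iterable


def majority_rule_winner(side_a: Iterable[int],
                         side_b: Iterable[int]) -> FrozenSet[int]:
    """Return the surviving side of a two-way cluster split.

    The side holding more than N/2 nodes wins. When both sides hold
    exactly N/2 nodes, the side containing the smallest node id wins.
    """
    a, b = frozenset(side_a), frozenset(side_b)
    if not a or not b:
        raise ValueError("both sides of the split must contain a node")
    if a & b:
        raise ValueError("a node cannot be on both sides of a split")
    if len(a) != len(b):
        return a if len(a) > len(b) else b
    # Exactly N/2 on each side: the smallest node id decides.
    return a if min(a) < min(b) else b


if __name__ == "__main__":
    print(sorted(majority_rule_winner({1, 2, 3}, {4, 5})))  # clear majority
    print(sorted(majority_rule_winner({3, 4}, {1, 2})))     # even split
```

With the Tie Breaker policy the decision would instead depend on which side obtains the reservation on the tie breaker disk, which this sketch does not model.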


HyperSwap Technology
Continuous Availability against Storage failures
  Substitutes storage secondary to take the place of failed primary device
  Non-disruptive - applications keep running
Key value add to HA/DR deployments

Customer Benefits
  Unplanned HyperSwap:
    Continuous Availability against storage failures
  Planned HyperSwap:
    Storage Maintenance without downtime
    Storage migration without downtime

[Diagram: Application on an HA/DR cluster with HyperSwap; Primary DS8K at Site 1 sync mirrored to Secondary DS8K at Site 2. Legend: Active Path, Passive Path]


HyperSwap Support by AIX-PowerHA


HyperSwap device configuration transparent to application
  Application can continue to use the device as before

[Diagram: before and after "Configure HyperSwap". In both views Application/LVM/Middleware uses /dev/hdiskX, with /dev/hdiskX on the Primary DS8K and /dev/hdiskY on the Secondary DS8K kept in SYNC; after configuration /dev/hdiskX and /dev/hdiskY form a HyperSwap pair]


HyperSwap Multi Site Deployments: Oracle RAC Example


PowerHA Cluster
  Compute node outages: Active-Active workload provides continuous availability
  Storage outages: HyperSwap provides continuous availability

Active-Passive Sites
  Active-Active workload within a site
  Active-Passive across sites
  Continuous availability for site storage outages

Fig 1: Active-Passive HyperSwap
[Diagram: Oracle RAC with Site 1 nodes N1-1 and N1-2 (Active) and Site 2 nodes N2-1 and N2-2 (Passive); storage S1 and S2, SYNC < 100 KM]

Active-Active Sites (Future)
  Active-Active workload across sites
  Continuous availability of site compute infrastructure and storage outages
  Oracle RAC long distance deployment

Fig 2: Active-Active HyperSwap
[Diagram: Oracle RAC with Site 1 nodes N1-1 and N1-2 (Active) and Site 2 nodes N2-1 and N2-2 (Active); storage S1 and S2, SYNC < 100 KM]


Pre-requisites

Additional AIX Fileset Requirements:
  bos.cluster.rte     CAA Fileset
  bos.cluster.solid   Solid DB (not required in 7.1.1)
  bos.ahafs           Autonomic Health Advisor Filesystem
  bos.clvm.enh        ECM VGs are required in 7.1 (not pre-req'd but required)

Configuration Files
  /etc/cluster/rhosts on the node where cluster will be created
  /etc/hosts: the hostname is the first alias for that IP address
Topology services daemon is no longer used
  CAA uses Scalable Reliable Multicast (SRM) for monitoring all network and storage interfaces using a single cluster-wide multicast IP address
  Can automatically define Multicast Address for you
  Range 224.0.0.0 - 239.255.255.255
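
If the multicast address is chosen manually rather than left to automatic selection, it is worth checking that it falls inside the range quoted above. A minimal sketch using Python's standard ipaddress module follows; the helper name is hypothetical and this is not a PowerHA or CAA utility.

```python
import ipaddress


def is_valid_cluster_multicast(address: str) -> bool:
    """Return True if address is an IPv4 multicast address.

    IPv4 multicast is 224.0.0.0/4, that is 224.0.0.0 - 239.255.255.255.
    """
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not a parseable IP address at all
    return ip.version == 4 and ip.is_multicast


if __name__ == "__main__":
    for candidate in ("228.1.1.1", "192.168.1.10", "240.0.0.1"):
        print(candidate, is_valid_cluster_multicast(candidate))
```

This only validates the address format; whether the switches actually forward multicast still has to be tested on the network itself.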


Implementation differences

New LAN Switch Settings
  IP Multicasting Enabled
    Address automatically selected during cluster configuration
    Set on Network Switches
  IGMP_snooping Enabled
    Will reduce the amount of Multicast Traffic on LAN switches
TME must be enabled on HBAs to leverage SAN heartbeating
  List of supported Adapters in the slide notes
  Additional steps for virtual HBAs (later slide)
Repository Disk requirement
  CAA Requirement (documented size has changed)
  This value can now be altered to 512MB or higher (max is 460GB)
  Larger disks will only result in wasted space
  VSCSI volumes are supported


Implementation differences (cont).

All network adapters will be discovered and used
  To exclude adapters, use /etc/cluster/ifrestrict:
    en4
    en5
IPAT via Aliasing Only
  No IPAT via Replacement
  No Heartbeating over Aliases
Network types supported
  mping to test
  Ether broadcast
  Infiniband (soon)
  Notice that FDDI, TMSSA, TMSCSI and others are gone
Removed Serial Network Types
  RS232 Serial network
  Disk heartbeat networks
  No Multi-node disk heartbeat


Zoning requirements for HBA heartbeating

[Diagram: eight HBA ports, each identified by its WWPN, grouped into an optional heartbeat zone and a shared storage zone that connects to the storage subsystem]

Zoning requirements for HBA heartbeating

[Diagram: HBA WWPNs of the cluster nodes grouped into an optional heartbeat zone, with an individual shared storage zone per connection to the storage subsystem]

Tools

- Cluster test tool
- Application availability analysis tool
- File collections
- Automatic cluster verification
- Automatic Error Notification (can also be customized)
- Auto-corrective/self-healing clusters
- Custom pager notification methods (including SMS)
- OEM volume and filesystem support (Veritas) and custom disk methods
- Non-disruptive startup (create cluster around existing environment)
- Cluster snapshots to save/restore clusters (XML format allows easy editing)

Cluster test tool

- Automated test plan
  - Important part of the install process
  - Still important as a regular procedure once in production
  - Many cluster administrators believe testing is too time consuming and costly
  - Lack of testing leads to failures
- Conducts a series of tests and then analyzes them
  - Will start all nodes, then perform node down with and without takeover on random nodes, followed by network and application down
  - There are some limitations
- Custom test procedure: user-defined plan
- Designed to test the configuration, not the operation of the cluster manager

Tests

- NODE_UP: start one or more nodes
- NODE_DOWN_FORCED: stop a node forced
- NODE_DOWN_GRACEFUL: stop one or more nodes
- NODE_DOWN_TAKEOVER: stop a node with takeover
- CLSTRMGR_KILL: catastrophic software failure
- NETWORK_DOWN_LOCAL: stop a network on a node
- NETWORK_UP_LOCAL: restart a network on a node
- SERVER_DOWN: stop an application server
- WAIT: pause testing
- RG_ONLINE, RG_OFFLINE, RG_MOVE, RG_MOVE_SITE: resource group online, offline, move and site move
- JOIN_LABEL, FAIL_LABEL: interface join and fail
- VG_DOWN: loss of VG

Tests (cont)

- NETWORK_UP/DOWN_LOCAL: local network up and down
- SITE_UP, SITE_DOWN_GRACEFUL, SITE_DOWN_TAKEOVER: site up and down, graceful or takeover
- SITE_ISOLATION, SITE_MERGE: site isolation and re-integration
- Non-IP networks now tested

Sample test tool log:

11/10/2005_07:20:24:
11/10/2005_07:20:24: | Validate NODE_UP
11/10/2005_07:20:24:
11/10/2005_07:20:24: Event node: ALL
11/10/2005_07:20:24: Configured nodes: ha1 ha2
11/10/2005_07:20:24: Event 2: NODE_DOWN_GRACEFUL: NODE_DOWN_GRACEFUL,node1,Stop cluster services gracefully on a node
11/10/2005_07:20:24:
11/10/2005_07:20:24: | Validate NODE_DOWN_GRACEFUL
11/10/2005_07:20:24:
11/10/2005_07:20:24: Event node: ha1
11/10/2005_07:20:24: Configured nodes: ha1 ha2
11/10/2005_07:20:24: Event 3: NODE_UP: NODE_UP,node1,Restart cluster services on the node that was stopped
11/10/2005_07:20:24:
11/10/2005_07:20:24: | Validate NODE_UP
11/10/2005_07:20:24:
11/10/2005_07:20:24: Event node: ha1
11/10/2005_07:20:24: Configured nodes: ha1 ha2
11/10/2005_07:20:24: Event 4: NODE_DOWN_TAKEOVER: NODE_DOWN_TAKEOVER,node2,Stop cluster services with takeover on a node
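As a rough illustration of a user-defined plan, the sketch below assembles plan lines in the `TEST,target,comment` layout that the log excerpt appears to echo. The test names come from the Tests slides. The three-field layout, the helper name `build_test_plan` and the node names are assumptions for illustration, so check the cluster test tool documentation before feeding such a file to the real tool.

```python
# Test names as listed on the "Tests" slides.
KNOWN_TESTS = {
    "NODE_UP", "NODE_DOWN_FORCED", "NODE_DOWN_GRACEFUL", "NODE_DOWN_TAKEOVER",
    "CLSTRMGR_KILL", "NETWORK_DOWN_LOCAL", "NETWORK_UP_LOCAL", "SERVER_DOWN",
    "WAIT", "RG_ONLINE", "RG_OFFLINE", "RG_MOVE", "RG_MOVE_SITE",
    "JOIN_LABEL", "FAIL_LABEL", "VG_DOWN", "SITE_UP", "SITE_DOWN_GRACEFUL",
    "SITE_DOWN_TAKEOVER", "SITE_ISOLATION", "SITE_MERGE",
}


def build_test_plan(steps):
    """Turn (test, target, comment) tuples into comma-separated plan lines.

    The three-field layout is an assumption inferred from the log excerpt.
    """
    lines = []
    for test, target, comment in steps:
        if test not in KNOWN_TESTS:
            raise ValueError(f"unknown test name: {test}")
        if "," in comment:
            raise ValueError("comment must not contain a comma")
        lines.append(f"{test},{target},{comment}")
    return lines


# Hypothetical plan mirroring the sequence seen in the log.
plan = build_test_plan([
    ("NODE_UP", "ALL", "Start cluster services on all nodes"),
    ("NODE_DOWN_GRACEFUL", "node1", "Stop cluster services gracefully on a node"),
    ("NODE_UP", "node1", "Restart cluster services on the node that was stopped"),
    ("NODE_DOWN_TAKEOVER", "node2", "Stop cluster services with takeover on a node"),
])
print("\n".join(plan))
```

Rejecting unknown test names up front catches typos before a long test run starts.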

Application availability analysis tool


Application Availability Analysis

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                            [Entry Fields]
* Select an Application                     [test_appl01] +
* Begin analysis on YEAR (1970-2038)        [2005] #
*   MONTH (01-12)                           [01] #
*   DAY (1-31)                              [01] #
* Begin analysis at HOUR (00-23)            [00] #
*   MINUTES (00-59)                         [00] #
*   SECONDS (00-59)                         [00] #
* End analysis on YEAR (1970-2038)          [2005] #
*   MONTH (01-12)                           [06] #
*   DAY (1-31)                              [30] #
* End analysis at HOUR (00-23)              [23] #
*   MINUTES (00-59)                         [59] #
*   SECONDS (00-59)                         [59] #

Analysis begins: Saturday, 01-Jan-2005, 00:00
Analysis ends: Thursday, 30-June-2005, 23:59
Application analyzed: test_appl01
Total time: 180 days, 23 hours, 59 minutes, 59 seconds
Uptime:
    Amount: 180 days, 22 hours, 58 minutes, 29 seconds
    Percentage: 99.97%
    Longest period: 98 days, 16 hours, 48 minutes, 3 seconds
Downtime:
    Amount: 0 days, 1 hours, 1 minutes, 30 seconds
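The figures in the report can be cross-checked with a few lines of arithmetic. The sketch below recomputes the uptime percentage from the durations shown above. The function name `availability_percent` is illustrative and not part of the tool. The result comes out at roughly 99.976%, so the 99.97% in the report is presumably truncated rather than rounded.

```python
from datetime import timedelta

# Durations copied from the sample report.
total = timedelta(days=180, hours=23, minutes=59, seconds=59)
uptime = timedelta(days=180, hours=22, minutes=58, seconds=29)
downtime = timedelta(hours=1, minutes=1, seconds=30)


def availability_percent(up, whole):
    """Uptime as a percentage of the whole analysis window."""
    return 100.0 * up.total_seconds() / whole.total_seconds()


pct = availability_percent(uptime, total)
print(f"uptime + downtime == total: {uptime + downtime == total}")
print(f"availability: {pct:.3f}%")
```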

Good log for initial PD

Upgrade considerations

- Non-disruptive upgrade functionality is NOT available to get to 7.x
  - Non-disruptive upgrade can be used to load patches, e.g. source 7.1.0.1 to target 7.1.0.4
- Migration to 7.x releases is different from prior releases
  - Migration is disruptive
  - Requires the use of the clmigcheck utility
  - Requires some reconfiguration of cluster topology
- If running older versions of HA you have a decision to make:
  - Migrate, or start at PowerHA version 7.1.0 or 7.1.1
- Migrating from 7.1.0 to 7.1.1 is disruptive (7.1.1 requires newer AIX levels which provide CAA enhancements)

Designing High Availability

- A spare should be available for every hardware and software component that is required to keep the application running
- No Single Point of Failure
  - Whilst a generally accepted principle, it is not always adhered to
  - Components are cut to reduce cost, and the effects of the failure of a single component are not always thought through
    - e.g. single adapter networks, no serial/failed serial network
- Components to cover
  - Nodes
  - Power feed
  - Storage
  - Networks
  - Adapters
  - Administrators (good documentation, 'clear' design)
  - Applications

PowerHA usability changes

- Mount Guard
  - A new JFS2 facility to help prevent accidental double mounts
  - LVM and CAA can help, but cannot ensure this
  - A second mount without an intervening unmount will be rejected
  - Mount state is maintained on the disks
  - Set by a chfs option; can be changed by chfs and logredo
  - Available in bos.filesystems 7.1.1 or 6.1.7
  - Available in HA 6.1 and 5.5
- Private Networks
  - Reserve a network for Oracle
  - Oracle needs a network with no heartbeat etc.
  - PowerHA releases before 6.1 supported this; 7.1 didn't
  - PowerHA 7.1.1 restores the ability to mark a network as private
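The Mount Guard rule (a second mount without an intervening unmount is rejected, because mount state lives on the disk) can be pictured with a toy model. The class below is purely illustrative: `GuardedFilesystem`, the node names and the path are invented here, and this is not how JFS2 implements the feature.

```python
class GuardedFilesystem:
    """Toy model of a filesystem whose mount state is stored with the data."""

    def __init__(self, name):
        self.name = name
        self.mounted_by = None  # stands in for the on-disk mount state

    def mount(self, node):
        # Any node reading the shared disk sees the same state, so a
        # second mount is refused until the first holder unmounts.
        if self.mounted_by is not None:
            raise RuntimeError(
                f"{self.name} is already mounted by {self.mounted_by}"
            )
        self.mounted_by = node

    def unmount(self, node):
        if self.mounted_by != node:
            raise RuntimeError(f"{self.name} is not mounted by {node}")
        self.mounted_by = None


fs = GuardedFilesystem("/shared/data")  # hypothetical mount point
fs.mount("nodeA")
try:
    fs.mount("nodeB")  # double mount attempt
except RuntimeError as err:
    print("rejected:", err)
fs.unmount("nodeA")
fs.mount("nodeB")  # succeeds after the intervening unmount
print("now mounted by", fs.mounted_by)
```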

PowerHA usability changes (cont)

- Application start in debug mode
- DARE progress indicators
  - What's going on, and when is it done
  - Terminal is locked
  - Back ported to HA 6.1 and 5.5
- Mirror Pools
  - NB: PowerHA 7.1 didn't support cross-site mirroring, but PowerHA 7.1.1 has the concept of sites and uses Mirror Pools to handle cross-site mirroring
- Renaming physical volumes (optional since 7.1)
  - Shared physical volumes can be given consistent names across the cluster
  - A disk cannot be part of a VG when renamed
- Foreground application start
  - Application server can now be started in the foreground
  - Simplifies design of scripts, but poor scripts can lead to config_too_long
- Startup in debug mode
  - Warning: exit code currently not checked
  - User can respond immediately to start failure

PowerHA usability changes (cont)

- Network changes
  - New ways to specify the source IP address for outgoing network traffic
  - The following are the new policies for Service IP Distribution Preference:
    - Anti-Collocation with 1st Service: each service label will be placed on a different adapter and the service address is the source address of all outgoing traffic
    - Collocation with 1st Service: all the service labels are placed on one adapter and the customer can choose an address as a source for all outgoing traffic
    - Anti-Collocation with Persistent with 1st Service: each service label will be the source address
  - The swap adapter will use the new transfer option of ifconfig
    - This should help with problems associated with default and user specified routes
  - CLCOMD now uses all unrestricted interfaces

PowerHA usability changes (cont)

- Two-node disk heartbeat
  - Easy set up, change and test (only 5.5 and 6.1)
- New heartbeat tuning parameters
  - Grace Period: the amount of time (seconds) the node will wait before marking a node as DOWN. Accepted values are between 5 and 30 seconds.
  - Failure Cycle: the frequency of the heartbeat. Accepted values are between 1 and 20 seconds.
  - Settings apply to all networks across the cluster
- Notes on migration
  - Check carefully, as not many configurations can be migrated
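A small pre-flight check can catch out-of-range values before they are entered in SMIT. The sketch below encodes only the two ranges quoted on the slide. The function name `check_heartbeat_settings` is hypothetical and nothing here talks to a cluster.

```python
# Accepted ranges in seconds, as stated on the slide.
GRACE_PERIOD_RANGE = (5, 30)
FAILURE_CYCLE_RANGE = (1, 20)


def check_heartbeat_settings(grace_period, failure_cycle):
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    for label, value, (low, high) in (
        ("Grace Period", grace_period, GRACE_PERIOD_RANGE),
        ("Failure Cycle", failure_cycle, FAILURE_CYCLE_RANGE),
    ):
        if not low <= value <= high:
            problems.append(f"{label} {value}s is outside {low}-{high}s")
    return problems


print(check_heartbeat_settings(10, 5))   # both values in range
print(check_heartbeat_settings(60, 0))   # both values out of range
```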

PowerHA usability changes (cont)

- Repository Resiliency
  - In PowerHA 7.1.0, the node shuts down when the repository disk fails
    - Disk failure or lost connection
  - CAA will provide Repository Resiliency
    - Requires AIX 6.1.7 SP4 or AIX 7.1.0 SP3, PowerHA 7.1.1 SP1
  - Node continues running even on repository disk failure, using locally cached information
    - Kept in the kernel
  - User can provide a new disk on which to rebuild the repository
  - No changes allowed while the repository is out of service
- On repository failure
  - Message posted to hacmp.out
    - Repeated on the config_too_long pattern
  - DARE and sync continue to function, but any CAA topology changes are rejected
  - User must recognise the repository failure and allocate a new disk
    - SMIT path under Manage the Cluster -> Select a new Repository Disk

7.1 - clmgr cluster command line

- Director plug-in needs a consistent interface for SystemMirror
  - Simplify management of clusters from Director
  - Reduce maintenance overhead
- Replacement for CLVT
  - Current Smart Assists utilize CLVT
  - Overcomes previous CLVT limitations
    - Limited trace output and logging; difficult to use
- clmgr is a hard link to clvt
  - clvt is a binary, the *only* binary in the clmgr code base; all other code is ksh93
  - /usr/es/sbin/cluster/utilities/clmgr
- Added 100% tracing coverage, with multiple levels
  - All STDERR output is written to /var/hacmp/log/clutils.log
- Fully globalized; uses the command.cat message catalog
- Added dozens of consistent error messages, and large amounts of automatic help
- Consolidated the set of supported actions and attributes

7.1 - clmgr cluster command line (cont)

Supported actions: add, delete, manage, modify, move, offline, online, query, recover, sync, view

Supported object classes: cluster, site, node, interface, network, resource_group, service_ip, persistent_ip, application_controller, application_monitor, dependency, file_collection, fallback_timer, volume_group*, logical_volume*, file_system*, physical_volume*, method*, report, snapshot, tape

* incomplete coverage of features
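clmgr invocations follow an `action object_class [name] ATTR=value ...` pattern, so they are easy to assemble from a script. The sketch below builds the command string and checks the action and object class against the lists on this slide. The helper `clmgr_command` is an illustration only, it does not validate attribute names, and it does not run anything.

```python
import shlex

# Actions and object classes as listed on the slide.
ACTIONS = {
    "add", "delete", "manage", "modify", "move", "offline",
    "online", "query", "recover", "sync", "view",
}
OBJECT_CLASSES = {
    "cluster", "site", "node", "interface", "network", "resource_group",
    "service_ip", "persistent_ip", "application_controller",
    "application_monitor", "dependency", "file_collection", "fallback_timer",
    "volume_group", "logical_volume", "file_system", "physical_volume",
    "method", "report", "snapshot", "tape",
}


def clmgr_command(action, object_class, name=None, **attrs):
    """Build a clmgr command line as a shell-safe string (not executed)."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    if object_class not in OBJECT_CLASSES:
        raise ValueError(f"unsupported object class: {object_class}")
    parts = ["clmgr", action, object_class]
    if name is not None:
        parts.append(name)
    parts += [f"{key}={value}" for key, value in attrs.items()]
    return " ".join(shlex.quote(part) for part in parts)


# Attribute names taken from the "clmgr examples" slide.
print(clmgr_command("online", "cluster", WHEN="now", MANAGE="auto",
                    BROADCAST="false", CLINFO="true"))
```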

clmgr examples

s7801p20:/usr/local/scripts# clmgr online cluster WHEN=now MANAGE=auto \
    BROADCAST=false CLINFO=true FORCE=false FIX=interactively
<snip>
s7801p22: Aug 12 2012 21:04:35 Checking for srcmstr active...
s7801p22: Aug 12 2012 21:04:35 complete.
s7801p22: Aug 12 2012 21:04:35 /usr/es/sbin/cluster/utilities/clstart: called with flags -m -G -i -B -A
s7801p22: Aug 12 2012 21:05:10 Completed execution of /usr/es/sbin/cluster/etc/rc.cluster
s7801p22: with parameters: -boot -N -A -i interactively -P cl_rc_cluster.
s7801p22: Exit status = 0

lscluster command

lscluster flags
  -i  Lists the cluster configuration interfaces on the local node.
  -n  Allows the cluster name to be queried for all interfaces.
  -s  Lists the cluster network statistics on the local node.
  -m  Lists the cluster node configuration information.
  -d  Lists the cluster storage interfaces.
  -c  Lists the cluster configuration.

s7801p20:/usr/local/scripts# lscluster -c
Cluster query for cluster pleiades returns:
Cluster uuid: 527e26c4-99b8-11e1-a0e3-1293071a2808
Number of nodes in cluster = 3
    Cluster id for node s7801p20 is 1
    Primary IP address for node s7801p20 is 10.2.55.120
    Cluster id for node s7801p21 is 2
    Primary IP address for node s7801p21 is 10.2.55.121
    Cluster id for node s7801p22 is 3
    Primary IP address for node s7801p22 is 10.2.55.122
Number of disks in cluster = 0
Multicast address for cluster is 228.2.55.120
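For scripted health checks it can help to pull a few fields out of this output. The sketch below parses the sample text shown on the slide. It assumes the line wording stays exactly as captured here, which may differ between AIX levels, and `parse_lscluster_c` is a made-up helper name.

```python
import re

# Sample output copied from the slide.
SAMPLE = """\
Cluster query for cluster pleiades returns:
Cluster uuid: 527e26c4-99b8-11e1-a0e3-1293071a2808
Number of nodes in cluster = 3
    Cluster id for node s7801p20 is 1
    Primary IP address for node s7801p20 is 10.2.55.120
    Cluster id for node s7801p21 is 2
    Primary IP address for node s7801p21 is 10.2.55.121
    Cluster id for node s7801p22 is 3
    Primary IP address for node s7801p22 is 10.2.55.122
Number of disks in cluster = 0
Multicast address for cluster is 228.2.55.120
"""


def parse_lscluster_c(text):
    """Extract cluster name, node primary IPs and multicast address."""
    info = {"name": None, "nodes": {}, "multicast": None}
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Cluster query for cluster (\S+) returns:", line)
        if match:
            info["name"] = match.group(1)
            continue
        match = re.match(r"Primary IP address for node (\S+) is (\S+)", line)
        if match:
            info["nodes"][match.group(1)] = match.group(2)
            continue
        match = re.match(r"Multicast address for cluster is (\S+)", line)
        if match:
            info["multicast"] = match.group(1)
    return info


cluster = parse_lscluster_c(SAMPLE)
print(cluster["name"], sorted(cluster["nodes"]), cluster["multicast"])
```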

clRGinfo command

s7801p20:/usr/local/scripts# clRGinfo -p
Cluster Name: pleiades
Resource Group Name: test_rg
Node                         Group State
---------------------------- ---------------
s7801p20                     ONLINE
s7801p21                     OFFLINE
s7801p22                     OFFLINE
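A common scripting question is "which node currently hosts the resource group?". The sketch below answers it from the table layout shown above. It assumes the two-column node/state table follows the dashed separator line, as in this capture, and `online_nodes` is an illustrative name rather than a PowerHA utility.

```python
# Sample output copied from the slide.
SAMPLE = """\
Cluster Name: pleiades
Resource Group Name: test_rg
Node                         Group State
---------------------------- ---------------
s7801p20                     ONLINE
s7801p21                     OFFLINE
s7801p22                     OFFLINE
"""


def online_nodes(text):
    """Return the nodes whose group state is ONLINE."""
    nodes = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("----"):
            in_table = True  # rows after the separator are node/state pairs
            continue
        if in_table:
            fields = line.split()
            if len(fields) == 2 and fields[1] == "ONLINE":
                nodes.append(fields[0])
    return nodes


print(online_nodes(SAMPLE))
```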

lssrc command output changed

s7801p20:/usr/local/scripts# lssrc -ls clstrmgrES
Current state: ST_STABLE
sccsid = "$Header: @(#) 61haes_r710_integration/14 43haes/usr/sbin/cluster/hacmprd/main.C, hacmp, 61haes_r710, 1038A_61haes_r710 2010-08-27T05:11:44-05:00$"
i_local_nodeid 0, i_local_siteid -1, my_handle 1
ml_idx[1]=0 ml_idx[2]=1 ml_idx[3]=2
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 12
local node vrmf is 7103
cluster fix level is "3"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1 NodeName - s7801p20
    PgSpFree = 126661 PvPctBusy = 0 PctTotalTimeIdle = 98.523127
DNP Values for NodeId - 2 NodeName - s7801p21
    PgSpFree = 127610 PvPctBusy = 0 PctTotalTimeIdle = 98.945318
DNP Values for NodeId - 3 NodeName - s7801p22
    PgSpFree = 126483 PvPctBusy = 0 PctTotalTimeIdle = 98.801866

cldump command

s7801p20:/usr/local/scripts# cldump
cldump: Waiting for the Cluster SMUX peer (clstrmgrES) to stabilize.............
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.

cltopinfo command

s7801p20:/usr/local/scripts# cltopinfo
Cluster Name: pleiades
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: caa_private0
Cluster IP Address: 228.2.55.120
There are 3 node(s) and 1 network(s) defined
NODE s7801p20:
    Network net_ether_02
        srvc1 10.2.50.120
        s7801p20b 172.3.1.20
NODE s7801p21:
    Network net_ether_02
        srvc1 10.2.50.120
        s7801p21 10.2.55.121
NODE s7801p22:
    Network net_ether_02
        srvc1 10.2.50.120
        S7801p22 10.2.55.122
<snip>
Resource Group test_rg
    Startup Policy          Online On Home Node Only
    Fallover Policy         Fallover To Next Priority Node In The List
    Fallback Policy         Never Fallback
    Participating Nodes     s7801p20 s7801p21 s7801p22
    Service IP Label        srvc1

Cluster wide execution

- Command is /usr/sbin/clcmd
- Provided by CAA; distributes a command to all nodes (or a subset of the nodes) in the cluster (or clusters)
- Similar to dsh

clcmd lssrc -g caa
-------------------------------
NODE s7801p22
-------------------------------
Subsystem         Group            PID          Status
 cld              caa              6750432      active
 clcomd           caa              7012576      active
 clconfd          caa              7798794      active
 solidhac         caa              6815926      active
 solid            caa              8847410      active
-------------------------------
NODE s7801p20
-------------------------------
Subsystem         Group            PID          Status
 cld              caa              5832952      active
 clcomd           caa              6553816      active
 solid            caa              7929910      active
 solidhac         caa              8454150      active
 clconfd          caa              8388622      active


IBM Systems Director: PowerHA management interface

- No charge plug-in
- Masks complexity
- Central management
- Real-time status
- Smart Assist integration
- Deployment wizards

PowerHA 7.1.2 Director Plugin Enhancements

Wizards

- Cluster Create Wizard
  - Single site and multi site deployment
- Resource Group Creation Wizard
  - Custom and Smart Assist based RG deployment
- SAP liveCache HotStandby solution Wizard
- Federated Security Setup Wizard
- Volume Group Create Wizard
  - Support for LVM Mirror Pools
- Replication (Mirror) Group Wizard
  - HyperSwap setup

Management Enhancements

- Repository disk/s management
- Resource groups management
- Snapshots, networks, log files etc.
- Reports management
- Notifications management
- Event driven callouts
- Capacity upgrade based fallovers
- HyperSwap management
- File collections

PowerHA 7.1.2 Director Plugin: Multi Site Management

System Director Plug-in: Basic Architecture


Three-tier architecture provides scalability: User Interface, Management Server, Director Agent

- User Interface
  - Web-based interface
  - Command-line interface
- Director Server
  - Central point of control
  - Supported on AIX, Linux, and Windows
  - Agent manager
  - Discovery of clusters and resources
- Director Agent
  - Automatically installed on AIX 7.1 & AIX V6.1 TL06
  - Runs on each AIX node alongside PowerHA
- Secure communication between the Director Server and the agents

System Director Plug-in Getting Started

Monitoring Services
- All communication interfaces are monitored
  - Cluster Aware AIX tells you what interfaces have been discovered on a node and information on those interfaces, including state
- All cluster disks are monitored
  - Cluster Aware AIX tells you what disks are in the cluster and information on those disks, including state
- All nodes are monitored
  - Cluster Aware AIX tells you what nodes are in the cluster and information on those nodes, including state
  - A special gossip protocol is used over the multicast address to determine node information and implement scalable reliable multicast
  - No traditional heartbeat mechanism is employed
  - Gossip packets travel over all interfaces, including storage
- All monitors are implemented at a low level of the AIX kernel, therefore they are largely insensitive to system load

LVM Split Site (Cross Site) Equivalent

Assumes SAN connected disks and nodes at two locations

- Define shared volume group with super strict mirror pools
  - Mirror pool for each location
  - Disks must be manually assigned to each mirror pool
  - Knowing which disks are where is a user responsibility
- LVM mirrors logical volume between two locations
- Resource group definition should allow forced varyon
- In the event of node and disk loss at one location
  - Volume group forced online at the other location by PowerHA
  - Mirror pool set up guarantees a local copy of the data
  - Manual recovery of repository using Repository Resiliency

Problem determination

# clctrl -tune -L
NAME                 DEF     MIN     MAX      UNIT           SCOPE
     ENTITY_NAME(UUID)                                              CUR
config_timeout       240     0       2G-1     seconds        c n
     pleiades(361d4ace-5eb0-11e2-91f0-1293071a2807)                 240
deadman_mode         a                                       c n
hb_src_disk          1       -1      3                       c
hb_src_lan           1       -1      3                       c
hb_src_san           2       -1      3                       c
link_timeout         30000   0       1171K    milliseconds   c n
node_down_delay      10000   5000    600000   milliseconds   c n
node_timeout         20000   10000   600000   milliseconds   c n
packet_ttl           32      1       64                      c n
remote_hb_factor     10      1       100                     c
repos_mode           e                                       c n
site_merge_policy    p                                       c

Problem determination

# snap caa
- Creates /tmp/ibmsupt/caa
- Contains data from each node in Data/data_time

tar -tvf s7801p20.tar
drwxr-xr-x 0 0        0 Jan 30 09:55:30 2013 s7801p20/
-rw-r--r-- 0 0     1123 Jan 30 09:55:31 2013 s7801p20/LOG
-rw-r--r-- 0 0     2554 Jan 30 09:55:30 2013 s7801p20/bootstrap_repository
-rw-r--r-- 0 0      978 Jan 30 09:55:30 2013 s7801p20/caa_tunables
-rw-r--r-- 0 0   194671 Jan 30 09:55:29 2013 s7801p20/clcomd_log.Z
-rw-r--r-- 0 0  5618196 Jan 30 09:55:30 2013 s7801p20/clcomddiag_log.Z
-rw-r--r-- 0 0     1362 Jan 30 09:55:30 2013 s7801p20/detail_repository
-rw-r--r-- 0 0      548 Jan 30 09:55:30 2013 s7801p20/lscluster_clusters
-rw-r--r-- 0 0     6144 Jan 30 09:55:30 2013 s7801p20/lscluster_network_interfaces
-rw-r--r-- 0 0     1968 Jan 30 09:55:30 2013 s7801p20/lscluster_network_statistics
-rw-r--r-- 0 0     2484 Jan 30 09:55:30 2013 s7801p20/lscluster_nodes
-rw-r--r-- 0 0     1067 Jan 30 09:55:30 2013 s7801p20/lscluster_storage_interfaces
-rw-r--r-- 0 0       76 Jan 30 09:55:30 2013 s7801p20/lsrepos_all
-rw-r--r-- 0 0      396 Jan 30 09:55:30 2013 s7801p20/swfinfo_uuids
-rw-r--r-- 0 0 10017023 Jan 30 09:55:28 2013 s7801p20/syslog_caa
-rw-r--r-- 0 0       93 Jan 30 09:55:30 2013 s7801p20/system_proc_version
-rw-r--r-- 0 0       30 Jan 30 09:55:30 2013 s7801p20/system_uname

Moving to Disaster Recovery

Requirements for HADR solution
- Recovery Time Objective
  - Time application is unavailable
- Recovery Point Objective
  - Last data point at which production is recovered in event of a failure
- Planned downtime
  - Maintenance / testing
- Geographic dispersion
  - To meet compliance regulations
- Ease of management
  - Degree of skill required compared with practicality of swaps
- Ease of deployment
  - Desire from customers for a simple solution
- Integration and support
  - Degree of integration with the OS and application will affect the success of failover
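The first two requirements are easiest to reason about with concrete times. The sketch below takes three timestamps from a failure (all of them invented for illustration) and derives the recovery point and recovery time actually achieved, which can then be compared with the stated objectives. The function name `achieved_rpo_rto` is hypothetical.

```python
from datetime import datetime


def achieved_rpo_rto(last_replicated, failure, service_restored):
    """Return (data-loss window, outage duration) as timedeltas.

    The data-loss window is what an RPO bounds: work done after the last
    replicated point is lost. The outage duration is what an RTO bounds.
    """
    rpo = failure - last_replicated
    rto = service_restored - failure
    return rpo, rto


# Hypothetical timeline for a site failure.
last_replicated = datetime(2013, 8, 14, 9, 58, 30)
failure = datetime(2013, 8, 14, 10, 0, 0)
service_restored = datetime(2013, 8, 14, 10, 5, 0)

rpo, rto = achieved_rpo_rto(last_replicated, failure, service_restored)
print(f"data-loss window: {rpo.total_seconds():.0f} s")
print(f"outage: {rto.total_seconds() / 60:.0f} min")
```

With synchronous replication the data-loss window would be expected to approach zero, while asynchronous replication trades a non-zero window for distance.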

Summary of changes
PowerHA 6.1
- DSCLI Metro Mirror VIOS
- Packaging & pricing changes
- p6/p7 CoD DLPAR support
- EMC SRDF integration
- GLVM config wizard
- Full IPv6 support

PowerHA 7.1
- Cluster Aware AIX
- IBM Director integration
- Hitachi TrueCopy & HUR async integration
- DS8700 Global Mirror integration
- Drop topology services for multicast protocol
- Storage monitoring
- HADR storage framework

PowerHA 7.1.1
- CAA repository resilience
- JFS2 Mount Guard support
- SAP Hot Standby solution
- Federated security
- SAP & MQ Smart Assists
- XIV replication integration
- Director plug-in updates

PowerHA 7.1.2
- Cluster Aware AIX: IPv6, rolling upgrade, linked clusters
- IBM Systems Director plug-in: new wizards, 2 site clusters, Enterprise Edition
- Linked and stretched clusters
- Split / merge site options with tie-breaker
- HyperSwap support for DS8k for 2 sites

PowerHA roadmap

PowerHA release life cycle strategy
- Current model
  - Major release every year
  - Requires ISV certification for every major release
- New model: implement technology level release strategy
  - Major releases as necessary
  - Minor release updates (Technology Level 0 to major release)
  - At least two technology levels per major release
  - Proposed additional 2 year service offering for last TL (under review)
- New command: halevel -s

Support planning

Roadmap
[Timeline diagram, 2011 to 2013. Items are grouped by the nearest label in the extracted slide text.]

PowerHA SystemMirror 7.1.1 TL01 (2011)
- SAP liveCache Hot Standby solution
- PowerHA federated security
- MQSeries smart assist
- Technology level release
- PowerHA Enterprise Edition 6.1: HA/DR support for XIV
- PowerHA 7.1 Director plugin: federated security management, replicated storage management, wizards update, SAP liveCache HotSwap, GLVM express wizard, multi-site cluster wizard

PowerHA SystemMirror 7.1.2 (2012)
- Smart Assists: Weblogic; Sybase; Peoplesoft
- PowerHA Enterprise Edition 7.1+: Hyperswap HA/DR
- PowerHA 7.1 Director plugin: Hyperswap HA/DR support, cluster modeling

PowerHA SystemMirror 7.1.3 (2013)
- VM HA management: VM restart, VM DR restart
- PowerHA Enterprise Edition 7.1+: PowerHA failover reversal, 3 or more sites support, operator override support
- PowerHA 7.1 Director plugin: three site support, cluster modeling, failover reversal, LPAR HA management

References

Thanks to Shawn, Mike and the US team for notes and detailed information.

- IBM PowerHA SystemMirror for AIX v7.1: http://www.redbooks.ibm.com/abstracts/sg247845.html
- PowerHA Web site: www.ibm.com/systems/power/software/availability/
- PowerHA portal: http://www-03.ibm.com/systems/power/software/availability/aix/index.html
- Online Documentation: http://www-03.ibm.com/systems/p/library/hacmp_docs.html
- PowerHA SystemMirror Marketing Page: http://www-03.ibm.com/systems/power/software/availability/aix/index.html
- PowerHA landing page on IBM.com: http://www-03.ibm.com/systems/power/software/availability/aix/index.html
- PowerHA technical forum: https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000001611
- PowerHA Comments & Questions: hafeedbk@us.ibm.com

Other useful info

- IBM Technology Service Offering for PowerHA SystemMirror XD deployment: http://www-935.ibm.com/services/us/index.wss/offering/its/a1000032
- Redbooks
  - SG24-7739: PowerHA for AIX Cookbook
  - SG24-7841: Exploiting IBM PowerHA SystemMirror Enterprise Edition
  - SG24-7845: IBM PowerHA SystemMirror 7.1 for AIX
- RedGuide: High Availability and Disaster Recovery Planning: Next-Generation Solutions for Multi server IBM Power Systems Environments
  http://www.redbooks.ibm.com/abstracts/redp4669.html?Open
- Education: PowerHA for AIX Implementation, Configuration and Administration (AN610)
  - Go to www.ibm.com/services/learning, search for AN610 or PowerHA; coming soon
- GLVM white paper: www.ibm.com/systems/resources/systems_p_os_aix_whitepapers_pdf_aix_glvm.pdf
- clmgr white paper: www.ibm.com/systems/resources/systems_power_software_availability_clmgr_tech_guide.pdf
- IBM storage virtualization offerings: www.ibm.com/systems/storage/virtualization
- Wiki: http://www.ibm.com/developerworks/wikis/display/WikiPtype/High%20Availability

Other useful info


PowerHA SystemMirror for AIX v7.1 Two-Node Quick Configuration Guide

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102216

Current Redbook

http://www.redbooks.ibm.com/redbooks.nsf/searchsite?SearchView&query=powerha

Redbook if using PowerHA Enterprise Edition with Hitachi TrueCopy

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS5098

Disaster recovery using IBM Storwize family storage with IBM PowerHA SystemMirror Enterprise Edition 7.1

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102245

Implementing PowerHA with Storwize V7000

Tips for Configuring PowerHA on Flex System POWER7 Compute Nodes

http://w3-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102181


Resources matrices and cross references


PowerHA Hardware Support Matrix

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105638

PowerHA for AIX Version Compatibility Matrix

http://w3-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347

PowerHA Enterprise Edition Support Cross Reference

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105440

Resources demo videos


- Configuring PowerHA v7.1.2 using IBM Systems Director demo: http://www.youtube.com/watch?v=zxHURigatQc
- Apply updates (Service Packs) to an active PowerHA 7.1.2 cluster: http://www.youtube.com/watch?v=fZpYiu8zAZo
- PowerHA cluster test tool demonstration: http://www.youtube.com/watch?v=zZHhCXhg1L8
- Dynamically add a node into an active PowerHA cluster: http://www.youtube.com/watch?v=bV9JdzPWTVQ
- PowerHA Enterprise Edition with XIV replication failover: http://www.youtube.com/watch?v=RJ5O0030agM

Resources - DeveloperWorks
PowerHA cluster migration to POWER7 (Chris Gibson IBM)

http://www.ibm.com/developerworks/aix/library/au-cluster-migration/index.html

PowerHA 7.1 heartbeat over SAN (Talor Holloway Advent One)

http://www.ibm.com/developerworks/aix/library/au-aix-powerha-heartbeat/index.html

Edison Group whitepaper The Value of Deep Integration


http://www-03.ibm.com/systems/power/advantages/whypower/powerha.html
"Such deep integration enables innovative features unavailable in other products. In addition, because the clustering solution and operating system evolve together, any flaws in the synthesis between the two discovered in the field are addressed, and the fixes are baked into the next release of the product. This ensures a product that continually improves over time into an extremely robust HA clustering solution."

Case study - Robert Wood Johnson University Hospital


Overview: Consistently ranked as one of America's Best Hospitals by U.S. News & World Report, Robert Wood Johnson University Hospital provides state-of-the-art care through a wide range of health care services. The 610-bed hospital based in New Brunswick, New Jersey functions as one of the nation's leading academic medical centers and is the only Level 1 Trauma Center for Central New Jersey.

Business need: To remain competitive and ensure business continuity, Robert Wood Johnson University Hospital needed to improve IT performance and implement a failover system to ensure reliable data access.

Solution: The hospital deployed IBM Power 740 Express servers running IBM AIX, IBM PowerHA SystemMirror for AIX, IBM System Storage DS4300 and NTT DATA Optimum Revenue Cycle Management software.

Benefits: Hospital staff and patients noticed vast performance improvements in accounts and records systems, and IT staff ensured data access by reducing failover time from several hours to five minutes.

AHY24 PowerHA SystemMirror for AIX: New Features and Best Practice
Questions ?
Antony (Red) Steel - ATS Senior IT Specialist IBM Aust/NZ red_steel@au.ibm.com +61 41980 3049

IBMTECHU.COM
IBM STG Technical Universities & Symposia web portal

download password: nz2013


KEY FEATURES...
- Create a personal agenda using the agenda planner
- View the agenda and agenda changes
- Use the agenda search to find the sessions and/or download presentations
- Submit session and conference evaluations
ibmtechu.com/nz
