PowerHA Workshop Part1

GTS Delivery Center Argentina
IBM Global Account
Power HA Workshop
Part 1
ing. Luciano Bez
1Understand the High Availability Technology.

2Understand the hacmp & powerha technology.
Power HA - Workshop
Instructor
ing. Luciano Martn BAEZ MOYANO
lucianobaez@ar.ibm.com
luciano.baez
http://www.luchonet.com.ar
http://www.linkedin.com/in/lucianobaez
https://www.facebook.com/lucianobaez
Resources
http://ibmurl.hursley.ibm.com/NUMX
http://ibmurl.hursley.ibm.com/NUOH
ing. Luciano Bez lucianobaez@ar.ibm.com
May 2016
Power HA - Workshop
What is a Cluster ?
How many kinds of Clusters there are ?
A Cluster is a group of servers and other

resources with a common objective.
-Server Farms
-Load Balancing
-High Availability
-High Performance Computing
May 2016
Power HA - Workshop
What is High Availability ?
High Availability is a system design approach and

associated service implementation that ensures a
prearranged level of operational performance will be met
during a contractual measurement period.
Availability refers to the ability of the user community to access the system, whether to
submit new work, update or alter existing work, or collect the results of previous work. If
a user cannot access the system, it is said to be unavailable.
Generally, the term downtime is used to refer to periods when a system is unavailable.
May 2016
Power HA - Workshop
Causes Of Downtime
Solution
Required
Disaster
Recovery
High
Availability
Downtime refers to a period of time or a
percentage of a time span that a machine
or system (usually a computer server) is
offline or not functioning, usually as a
result of either system failure (such as a
crash or routine maintenance.
(Continuous
Operations)
Reliability is not the same as Availability!

5ing. Luciano Bez lucianobaez@ar.ibm.com
May 2016
Power HA - Workshop
Availability percentage calculation

Availability is usually expressed as a percentage of uptime in a given year.
Huh?
Did
something
happen?
Checkpoint restart.
Not too bad .
Start over.
Where's all my work?
Uptime and Availability are not synonymous. A system can be up, but not available, as
in the case of a network outage.
May 2016
Power HA - Workshop
What is a Fault Tolerant system ?

Fault-tolerance or graceful degradation is the property that
enables a system, to continue operating properly in the event
of the failure of (or one or more faults within) some of its
components. If its operating quality decreases at all, the decrease
is proportional to the severity of the failure, as compared to a
navely-designed system in which even a small failure can cause
total breakdown.
Fault-tolerance is particularly sought-after in high-availability or life-critical
systems.
No single point of failure or redundancy.
Fault isolation to the failing component.
Fault containment to prevent propagation of the failure.
Availability of reversion modes.
May 2016
Power HA - Workshop
High Availability cluster design

High-availability clusters (also known as HA Clusters or Failover Clusters) are
computer clusters that are implemented primarily for the purpose of providing
high availability of services which the cluster provides. They operate by having
redundant computers or nodes which are then used to provide service when
system components fail.
Network
Cluster
Heartbeat
SAN
May 2016
Power HA - Workshop
Some concepts
Not every application can run in a high-availability cluster environment, and the necessary design
decisions need to be made early in the software design phase. In order to run in a high-availability
cluster environment, an application must satisfy at least the following technical requirements:
There must be a relatively easy way to start, stop, force-stop, and check the status of the
application. In practical terms, this means the application must have a command line interface or
scripts to control the application, including support for multiple instances of the application.
The application must be able to use shared storage (NAS/SAN).
Most importantly, the application must store as much of its state on non-volatile shared storage
as possible. Equally important is the ability to restart on another node at the last state before
failure using the saved state from the shared storage.
The application must not corrupt data if it crashes, or restarts from the saved state.
Fail over: If a Node with a clustered resource crashes, the HA clustering remedies this situation by
immediately restarting the application on another node without requiring administrative intervention.
Fail back: Is the process to back the resource to the original node, after a failover.
Heartbeat: Is a connection between nodes which is used to monitor the health and status of each
node in the cluster.
Split-Brain: Occurs when all private links go down simultaneously, but the cluster nodes still
running. If that happens, each node in the cluster may mistakenly decide that every other node has
gone down and attempt to start services that other nodes are still running. Having duplicate
instances of services may cause data corruption on the shared storage.
May 2016
Power HA - Workshop
High Availability Node configuration.

Active/Passive: Provides a fully redundant instance of each node, which is
only brought online when its associated primary node fails. This configuration
typically requires the most extra hardware.
Active/Active: Traffic intended for the failed node is either passed onto an
existing node or load balanced across the remaining nodes. This is usually only
possible when the nodes utilize a homogeneous software configuration.
N+1: Provides a single extra node that is brought online to take over the role of
the node that has failed. In the case of heterogeneous software configuration
on each primary node, the extra node must be universally capable of assuming
any of the roles of the primary nodes it is responsible for. This normally refers to
clusters which have multiple services running simultaneously; in the single
service case, this degenerates to Active/Passive.
N+M: In cases where a single cluster is managing many services, having only
one dedicated failover node may not offer sufficient redundancy. In such cases,
more than one (M) standby servers are included and available. The number of
standby servers is a tradeoff between cost and reliability requirements.
May 2016
Power HA - Workshop
High Availability Node configuration.

N-to-1: Allows the failover standby node to become the active one temporarily,
until the original node can be restored or brought back online, at which point
the services or instances must be failed-back to it in order to restore High
Availability.
N-to-N: A combination of Active/Active and N+M clusters, N to N clusters
redistribute the services, instances or connections from the failed node among
the remaining active nodes, thus eliminating (as with Active/Active) the need for
a 'standby' node, but introducing a need for extra capacity on all active nodes.
May 2016
Power HA - Workshop
Active/Passive Node Configuration

Network
Cluster
Request for Application

A
Service Access
FAIL
Application A
Heartbeat
Passive
Active Node
FAILOVER process Node
Note: The failover process could be

triggered by a unexpected fail (unplanned)
or by a planned system administratorMay
action
2016
Power HA - Workshop
Active/Active Node Configuration

Network

A
B
Service Access
Cluster
FAIL
Application A
Application B
Heartbeat
Active Node
Active Node
FAILOVER process
B
A

action
2016
Power HA - Workshop
Poor Man Cluster Node Configuration

Network

A
No
available
B
Service Access
Cluster
FAIL
UNLOAD
the
Development Database
Low importance
Application B application
(Development)
Application A
Heartbeat
(production)
Active Node
Active Node
FAILOVER process
B
A

action
2016
Power HA - Workshop
2 HACMP
High Availability Cluster Multiprocessing
(Now called IBM PowerHA SystemMirror)
May 2016
Power HA - Workshop
HACMP History
PowerHA SystemMirror 7.1
HACMP 5.2
HACMP 4.4.1
Integration with
Tivoli
Application Monitoring
Cascading w/out
fallback option
Integration of HANFS
Functionality
Selective Fallover
HACMP 4.3.1
32 Node Support
Node by Node
migration
Fast Connect
Support
HACMP Task Guides
HACMP 4.2.2
Introduced HAES
based on RSCT
monitoring topology &
event management
services from PSSP
2 Node Config Assist

File Collections
Cluster Test Tool
RG Dependencies
Self Healing Clusters
User Password Mgmt
WebSMIT
HACMP 5.1
HAS (Classic)
Dropped
Fast Disk Takeover
Custom Resource
Groups
Heartbeating over IP
Aliases
Disk Heartbeating
HACMP 4.5
Introduction of IP
Aliasing
Persistent IP Address
64-bit capable APIs
Monitoring and
recovery from loss of
VG quorum
HACMP 5.4.1
First Failure Data Capture

WPAR Integration
Consistency Group
Support DS Metro Mirror
GLVM Monitoring
Enhancements
NFSV4 Support
Multi-Node Disk Heartbeat
WebSMIT Enhancements
HACMP 5.4.0
Non Disruptive Upgrades
Fast Failure Detection
IPAT on XD Networks
Linux on Power Support
Oracle Smart Assistant
GPFS 2.3 Integration
DSCLI Support
Intermix of DS
Enclosures
HACMP 5.3
OEM Volume & FS
Support
Location Dependencies
Startup Verification
Geographic LV Mirroring
IP Distribution Policies
Cluster Aware AIX

IBM Director Integration
Hitachi TrueCopy & Global
Replicator Integration
DS8700 Global Mirror
Integration
Drop RSCT for Multi Cast
protocol
Storage Monitoring
HADR Storage Framework
PowerHA SystemMirror 6.1
DSCLI Metro Mirror VIOS

Packaging & Pricing
Changes
p6/p7 CoD DLPAR Support
EMC SRDF Integration
GLVM Config Wizard
Full IPV6 Support
PowerHA 5.5
WebSMIT Gateway Server

WebSMIT Enterprise View
Partial IPV6 Support
Asynchronous GLVM
DR Manual Recovery
Option
SVC PPRC VIO Support
May 2016
Power HA - Workshop
HACMP History
Single
server
Clusters
Site 1
Site 1
Split Site
Clusters
Multi-site
Disaster Recovery
Split Site Mirror

Site 1
Site 2
Site 1
1
112
01
1989
1992
HACMP Cluster
Active-Passive Failover
Resource Group
Management
Planned and unplanned
outage handling
2000
Disaster Recovery with
Storage: DS8K, SVC
Framework for OEM disk
and FS support
Location dependencies
Low cost Host mirroring
File Collections
Fast Disk Takeover

Redundant communication
support (Disk, Network, SCSI
target, Token ring, Serial etc)
Integration with Tivoli
Monitoring
VG failure handling
NFS HA management
Site 2
11
122 1
94
31
8
765 0
219
3
4578
6
Third Party
Storage DR
Site 1
1
112
01
Site 2
11
122 1
94
3 10
8
765
SAP HotStandby HyperSwap

SAP
liveCache HotStandby
HyperSwap
Site 1 Site 2
Active-Active
3 Site
Sites
Deployments
Active Active Sites
Site 1
Site 2
3 Site Deployments
Site 1 Site 2 Site 3
219
3
4578
6
2004
Fast Failure Detection
Framework for OEM disk
and FS support
Capacity Optimized
failovers
Low cost Host mirroring
GPFS Integration
Two Node Rapid
deployment assistant
WPAR HA Management
Browser based UI
RG Dependencies
Health Monitoring and
Verification framework
Flexible and uniform
E2E
integration
failover
policies for 1 or 2
Single
sites point of control
Application
level
granularity
DR with EMC,
Hitachi,
XIV
Distributed
server h/w mgt
NDU upgrades
E2E
0-3 sec / RTO < 1H
SelfRPO
healing
2010
2013
2012
PowerHA v7: Kernel based

clustering
PowerHA federated Security
Administration
Enhanced Split/Merge
handling
Enhanced Middleware HA
management (Smart Asists)
SAP HA management
SAP liveCache HotStandby
solution
IBM Director: Graphical
Management
Full IPv6 Support
Stretched and Linked
Clusters
HyperSwap with DS8K

Active-Active Sites support
Manual Failover DR
Tie breaker Support
3 Site support through LVM
Mirror+HyperSwap
3 Site Support through LVM
Mirror + GLVM Mirroring
Unicast clustering
Dynamic Host Name Change
support
May 2016
Power HA - Workshop
PowerHA / HACMP support matrix

Packaging Changes Introduced in version 6.1:
Standard Edition - Local Availability
Enterprise Edition - Local & Disaster Recovery
May 2016
Power HA - Workshop
PowerHA / HACMP support matrix
May 2016
Power HA - Workshop
Clustering for HA and DR

Application
PowerHA standard edition cluster

Application
PowerHA enterprise edition cluster
PowerHA enables 24x365 operational availability

Automation for planned and unplanned outages
Solutions covering simple data center to multiple-site configurations
May 2016
Power HA - Workshop
Power HA Standard and Enterprise

Standard
Edition
Enterprise
Edition
Centralized Management CSPOC
Cluster resource management
Shared Storage management
Cluster verification framework
Integrated disk heartbeat
SMIT management interfaces
AIX event/error management
Integrated heartbeat
PowerHA DLPAR HA management
Smart Assists
Multi Site HA Management
High Level Features
PowerHA GLVM async mode
GLVM deployment wizard
IBM Metro Mirror support
IBM Global Mirror support
OEM Copy Services
Highlights:
Editions to optimize software value
capture
Standard Edition targeted
at datacenter HA
Enterprise Edition targeted
at multi-site HA/DR
- Stretched Clusters
- Linked Clusters
Per processor core used + tiered
pricing structure
- Small/Med/Large
May 2016
Power HA - Workshop
PowerHA SystemMirror 7.1 Enterprise Edition

Simpler to deploy and easier to manage multi-site configurations
with IBM Systems Director, intuitive interfaces, multi-site install
wizard.
Stretched Cluster; cluster wide AIX commands, kernel based
event management single repository multicast communications.
Linked Clustering; cluster wide AIX commands, kernel based
event management, linked clusters with unicast communications &
dual Repositories.
HyperSwap for continuously available storage in two-site
topologies Cluster Split/Merge technology for managing split-site
policy scenarios.
Announce Date: Oct 3 2012

GA Date: Nov 9 2012
May 2016
Power HA - Workshop
Stretched clusters and linked cluster differences

You can use PowerHA SystemMirror management interfaces to create the following multiple-site
solutions:
Stretched cluster: Contains nodes from sites that are located at the same geographical locations.
Stretched clusters do not support HADR (High Availability Disaster Recovery) with storage
replication management.
Linked cluster: Contains nodes from sites that are located at different geographical locations.
Linked clusters support cross-site LVM mirroring and HyperSwap.
May 2016
Power HA - Workshop
HyperSwap Support by AIX-PowerHA

HyperSwap device configuration transparent to application
Applications continue to use the devices as usual - storage switching is fast seconds
Application/LVM/Middleware
Application/LVM/Middleware
/dev/hdiskX
/dev/hdiskX
/dev/hdiskY
Metro Mirror
Primary DS8K
Secondary DS8K
Traditional Metro Mirror Cluster

/dev/hdiskX
/dev/hdiskY
Metro Mirror
Primary DS8K
Secondary DS8K
HyperSwap Cluster
May 2016
Power HA - Workshop
HyperSwap for PowerHA SystemMirror

HyperSwap function in PowerHA enhances application availability for storage errors by using IBM
DS8000 metro mirroring. If you use the HyperSwap function in your environment, your applications
stay online even if errors occur on the primary storage because PowerHA SystemMirror,
transparently routes the application I/O to an auxiliary storage system.
The HyperSwap function uses a model of communication, which is called in-band, that sends the
control commands to a storage system through the same communication channel as the I/O for the
disk.
The HyperSwap function supports the following types of configurations:
Traditional Metro Mirror Peer-to-Peer Remote Copy (PPRC): The primary volume group is
only visible in the primary site and the auxiliary volume group is only visible in the auxiliary site.
HyperSwap:The primary and auxiliary volume group are visible from the same node in the
cluster.
You typically configure the HyperSwap function to be used in the following environments:.
Single node environment: A single compute node is connected to two storage systems that are
in two sites. This HyperSwap configuration is ideal to protect your environment against simple
storage failures in your environment.
Multiple site environment: A cluster has multiple nodes that are spread across two sites. This
HyperSwap configuration provides high availability and disaster recovery for your environment.
May 2016
Power HA - Workshop
HyperSwap for PowerHA SystemMirror

Mirror groups in HyperSwap for PowerHA SystemMirror represent a container of disks and have the
following characteristics: used by Cluster Aware AIX (CAA).
Mirror group contain information about the disk pairs across the site. This information is used
to configure mirroring between the sites.
Mirror groups can contact a set of logical volume manager (LVM) volume groups and a set of
raw disks that are not managed by the AIX operating system.
All the disks devices that are associated with the LVM volume groups and raw disks that are
part of a mirror group are configured for consistency. For example, the IBM DS8800 views a
mirror group as one entity regarding consistency management during replication.
The following types of mirror groups are supported:
User mirror group: Represents the middleware-related disk devices. The HyperSwap function is
prioritized internally by PowerHA SystemMirror and is considered low priority.
System mirror group: Represents critical set of disks for system operation, such as, rootvg disks
and paging space disks. These types of mirror groups are used for mirroring a copy of data that is not
used by any other node or site other than the node that host these disks.
Repository mirror group: Represents the cluster repository disks of that are used by Cluster Aware
AIX (CAA).
May 2016
Power HA - Workshop
PowerHA SystemMirror: DLPAR Value
Pros:
Automated action on acquisition of
resources (bound to the PowerHA application
server)
HMC Verification Checking for connectivity to the
HMC
Ability to Grow LPAR on Failover
Save $ on PowerHA SM Licensing
Thin Standby node
Cons:
Requires Connectivity to HMC
Potentially Slower Failover System
Specs (Takes a lot of time)
Lacks ability to grow LPAR on-fly
Ssh comunication
LPAR A
HMC
LPAR B
HMC
Backup
May 2016
Power HA - Workshop
PowerHA/HACMP topology
Networking components
Nodes: In the PowerHA context, the term node, means any

IBM pSeries system (physical or virtual) which is member of a
high availability cluster running PowerHA .
Networks: Network consist of IP and Non-IP networks. The
Non-Ip networks ensure cluster monitoring can be done if
there is total loss of IP communications. Non-IP networks are
strongly recommended to be configured to provide high
availability. (Ethernet, Ether channel, Non-Ip disk, Non-Ip
serial, etc.)
Communication interfaces: Adapter for IP networks.
Communication devices: Used for Non-IP networks (SAN).
May 2016
Power HA - Workshop
PowerHA/HACMP topology
Resource components
r
Se
P Ad
dres
s
r
ve
Serv
i ce I
p
rou
eG
um
Vol
n
io
at
ic
pl
Ap
Resource: Is a logical component

that can be put into a resource
group. Because they are logical
components, they can be moved
without human intervention.
NF
Se
xp
ort
s
Resource Group: Is a collection of resources treated as a unit along

with the nodes, they can potentially be activated on and what policies the
cluster manager should use to decide which node to choose during
startup, fallover and fallback.
le
Fi
m
te
s
sy
May 2016
Power HA - Workshop
Custom Resource Groups

Start up preferences:
* Online On Home Node Only (cascading) - (OHNO)
* Online on First Available Node (rotating or cascading
w/inactive takeover) - (OFAN)
* Online On All Available Nodes (concurrent) - (OAAN)
* Startup Distribution
Fallover Preferences:
* Fallover To Next Priority Node In The List - (FOHP)
* Fallover Using Dynamic Node Priority - (FDNP)
* Bring Offline (On Error Node Only) - (BOEN)
Fallback Preferences:
* Fallback To Higher Priority Node - (FBHP)
* Never Fallback - (NFB)
May 2016
Power HA - Workshop
Resource Groups Dependencies

* The maximum depth of the dependency tree is three levels, but any
resource group can be in a dependency relationship with any number of
other resource groups.
* Circular dependencies are not supported, and are prevented during
configuration time.
RG A
RG B
RG A
RG C
RG B
RG C
RG D
May 2016
Power HA - Workshop
Resource Groups Locations Dependencies
Online on same node: all resource groups must be

online on the same node.
Online on Different Nodes: All resource groups must
be online on different nodes.
Online on Same Site: All resource groups must be
online on the same site.
May 2016
Power HA - Workshop
Online on different Node priorities

You can assign High, Intermediate, and Low priority to each resource group.
- Higher priority resource groups take precedence over lower priority groups
at startup, fallover, and fallback.
- High priority groups can force Intermediate and Low priority groups to
move or go offline.
- Intermediate priority groups can force Low priority groups to move or go
Offline.
- Low priority groups cannot force any other groups to move or go offline.
- Groups of the same priority cannot force each other to move or go offline.
- RGs with the same priority cannot come ONLINE (startup) on the same
node.
- RGs with the same priority do not cause one another to be moved from the
node after a failover or fallback
May 2016
Power HA - Workshop
Availability components
Not just PowerHA: The final high availability solution goes beyond the
PowerHA. A high availability solution comprises a reliable OS (AIX),
applications that are tested to work in a HA cluster, storage devices,
appropriate selection of hardware, trained administrators and thorough
design and planning.
May 2016
Power HA - Workshop
So what is PowerHA/HACMP?
It's an application wich
Controls where resource groups runs.

Monitors and reacts to events. (Does fall over & does reintegration)
Provides tools for cluster wide configuration and synchronization
Cluster Manager Subsystem

clcomdES
Topology
manager
Resource
manager
Event
manager
RSCT (Topology Svcs / RMC)
SNMP
manager
snmpd
clinfoES
clstat
May 2016
Power HA - Workshop
PowerHA V6 cluster manager flow

Cluster manager is a subsystem wich:
- Controls where resource groups run, reconfigures service addresses
- Reacts to events and has tools for making cluster-wide changes and displaying status
- Relies on other AIX subsystem (ODM, LVM, CAA, RSCT, AHAFS, TCPIP, )
May 2016
Power HA - Workshop
PowerHA V7 cluster manager flow

Cluster manager is a subsystem wich:
- Controls where resource groups run, reconfigures service addresses
- Reacts to events and has tools for making cluster-wide changes and displaying status
- Relies on other AIX subsystem (ODM, LVM, CAA, RSCT, AHAFS, TCPIP, )
May 2016
Power HA - Workshop
PowerHA topology
The cluster topology represents the physical view of the cluster and how hardware cluster
components are connected using networks (IP and non-IP). To understand the operation of
PowerHA, you need to understand the underlying topology of the cluster, the role each component
plays and how PowerHA interacts. In this section we describe:
PowerHA cluster
Nodes
Sites
Policies (Split and Merge)
Networks ( physical, logical, labels, alias, multicasting,
unicasting, etc.)
Communication interfaces / devices
Persistent and Service node IP labels / addresses
Network modules (NIMs)
Topology and group services
Clients
etc.
May 2016
Power HA - Workshop
PowerHA topology
Networks
In PowerHA, the term network is used to define a logical entity that groups the communication interfaces and
devices used for communication between the nodes in the cluster, and for client access. The networks in
PowerHA can be defined as IP networks and non-IP networks. The following terms are used to describe
PowerHA networking:
IP address: The dotted decimal IP address.

IP label: The label that is associated with a particular IP address as defined by the name
resolution method (DNS or static using /etc/hosts).
Base IP label / address: The default IP label / address that is set on the
interface by AIX on
startup. The base address of the interface.
Service IP label / address: An IP label / address over which a service is provided. It can be
bound to a single node or shared by multiple nodes. Although not part of the topology, these are
the addresses that PowerHA keeps highly available.
Boot interface: Earlier versions of PowerHA have used the terms boot adapter and standby
adapter depending on the function. These have been collapsed into one term to describe any IP
network interface that can be used by PowerHA to host a service IP label / address.
IP aliases: An IP alias is an IP address that is added to an interface, rather than replacing its
base IP address. This is an AIX function that is supported by PowerHA. However, PowerHA
assigns to the IP alias the same subnet mask of the base IP address over which it is configured.
Logical network interface: The name to which AIX resolves a port (for example, en0) of a
physical network adapter.
May 2016
Power HA - Workshop
PowerHA topology
IP Address takeover mechanism
One of the key roles of PowerHA is to maintain the service IP labels / addresses highly available.
PowerHA does this by starting and stopping each service IP address as required on the
appropriate interface. When a resource group is active on a node, PowerHA supports two
methods of activating the service IP addresses:
By replacing the base (boot-time) IP address of an interface with the

service IP address. This method is known as IP address takeover (IPAT)
via IP replacement. This method also allows the takeover of a locally
administered hardware address (LAA)hardware address takeover.
By adding the service IP address as an alias on the interface, for
example, in addition to the base IP address. This method is known as IP
address takeover via IP aliasing. This is the default for PowerHA.
May 2016
Power HA - Workshop
PowerHA topology
Persistent IP Label or Address
A persistent node IP label is an IP alias that can be assigned to a network for a specified node. A
persistent node IP label is a label that:
Always stays on the same node (is node-bound)
Co-exists with other IP labels present on the same interface
Does not require installation of an additional physical interface on that node
Is not part of any resource group
Assigning a persistent node IP label for a network on a node allows you to have a highly available
node-bound address on a cluster network. This address can be used for administrative purposes
because it always points to a specific node regardless of whether PowerHA is running.
Note: It is only possible to configure one persistent node IP label per network per node. For
example, if you have a node connected to two networks defined in PowerHA, that node can be
identified via two persistent IP labels (addresses), one for each network.
May 2016
Power HA - Workshop
PowerHA topology
Device based or serial networks
Serial networks are designed to provide an alternative method for exchanging information using
heartbeat packets between cluster nodes. In case of IP subsystem or physical network failure,
PowerHA can still differentiate between a network failure and a node failure when an independent
path is available and functional.
Serial networks are point-to-point networks, and therefore, if there are more than two nodes in
the cluster, the serial links should be configured as a ring, connecting each node in the cluster.
Even though each node is only aware of the state of its immediate neighbors, the RSCT daemons
ensure that the group leader is aware of any changes in state of any of the nodes.
Even though it is possible to configure a PowerHA cluster without non-IP networks, we strongly
recommend that you use at least one non-IP connection between each node in the cluster.
The following devices are supported for non-IP (device-based) networks in PowerHA:
Serial RS232 (rs232)
Target mode SCSI (tmscsi)
Target mode SSA (tmssa)
Disk heartbeat (diskhb)
Multi-node disk heartbeat (mndhb)
May 2016
Power HA - Workshop
PowerHA topology
Split policy
A cluster split event can occur between sites when a group of nodes cannot communicate with the
remaining nodes in a cluster. For example, in a linked cluster, a split occurs if all communication
links between the two sites fail. A cluster split event splits the cluster into two or more partitions.
The following options are available for configuring a split policy:
None: A choice of None indicates that no action will be taken when a cluster split event is
detected. Each partition that is created by the cluster split event becomes an independent
cluster. Each partition can start a workload independent of the other partition. If shared volume
groups are in use, this can potentially lead to data corruption. This option is the default
setting, since manual configuratiDo not use this option if your environment is configured to use
HyperSwap for PowerHA SystemMirroron is required to establish an alternative policy..
Tie breaker: A choice of Tie Breaker indicates that a disk will be used to determine which
partitioned site is allowed to continue to operate when a cluster split event occurs. Each partition
attempts to acquire the tie breaker by placing a lock on the tie breaker disk. The tie breaker is a
SCSI disk that is accessible to all nodes in the cluster. The partition that cannot lock the disk is
rebooted. If you use this option, the merge policy configuration must also use the tie breaker
option.
May 2016
Power HA - Workshop
PowerHA topology
Merge policy
Depending on the cluster split policy, the cluster might have two partitions that run independently of
each other. You can use PowerHA SystemMirror Version 7.1.2, or later, to configure a merge
policy that allows the partitions to operate together again after communications are restored
between the partitions.
The following options are available for configuring a merge policy:
Majority: The partition with the highest number of nodes remains online. If each partition has
the same number of nodes, then the partition that has the lowest node ID is chosen. The
partition that does not remain online is rebooted, as specified by the chosen action plan. This
option is available for linked clusters. For stretched clusters to use the majority option, your
environment must be running one of the following version of the AIX operating system:
*IBM AIX 7 with Technology Level 4, or later
*AIX Version 7.2, or later.
Tie breaker: Each partition attempts to acquire the tie breaker by placing a lock on the tie
breaker disk. The tie breaker is a SCSI disk that is accessible to all nodes in the cluster. The
partition that cannot lock the disk is rebooted, or has cluster services restarted, as specified by
the chosen action plan. If you use this option, your split policy configuration must also use the
tie breaker option.
May 2016
Power HA - Workshop
PowerHA topology
Highly available NFS server
The highly available NFS server functionality is included in the PowerHA SystemMirror product
subsystem.
A highly available NFS server allows a backup processor to recover current NFS activity should the
primary NFS server fail. The NFS server special functionality includes highly available modifications
and locks on network file systems (NFS).
You can do the following:
Use the reliable NFS server capability that preserves locks and dupcache (2-node
clusters only if using NFS version 2 and version 3)
Specify a network for NFS cross-mounting
Define NFS exports and cross-mounts at the directory level v Specify export options for
NFS-exported directories and file systems
Configure two nodes to use NFS.
PowerHA SystemMirror clusters can contain up to 16 nodes. Clusters that use NFS version 2 and
version 3 can have a maximum of two nodes, and clusters that use NFS version 4 can have a
maximum of 16 nodes.
May 2016
Power HA - Workshop
PowerHA topology
PowerHA SystemMirror common cluster configurations
Standby configurations: Standby configurations are the traditional redundant hardware configurations where
one or more standby nodes stand idle, waiting for a server node to leave the cluster. (Standby configurations with
online on home node only startup policy, Standby configurations with online using distribution policy startup)
Takeover configurations: In the takeover configurations, all cluster nodes do useful work, processing part of the
cluster's workload. There are no standby nodes. Takeover configurations use hardware resources more efficiently
than standby configurations since there is no idle processor. Performance can degrade after node detachment,
however, since the load on remaining nodes increases. (One-sided takeover, Mutual takeover, Two-node mutual
takeover configuration, Eight-node mutual takeover configuration)
Cluster configurations with multitiered applications: A typical cluster configuration that could utilize parent
and child dependent resource groups is the environment in which an application such as WebSphere depends on
another application such as DB2.
Cluster configurations with resource group location dependencies: You can configure the cluster so that
certain applications stay on the same node, or on different nodes not only at startup, but during fallover and
fallback events. To do this, you configure the selected resource groups as part of a location dependency set.
Cross-site LVM mirror configurations for disaster recovery: You can set up disks that are located at two
different sites for remote LVM mirroring, using a storage area network (SAN).
Cluster configurations with dynamic LPARs: The advanced partitioning features of AIX provide the ability to
dynamically allocate system CPU, memory, and I/O slot resources (dynamic LPAR).
May 2016
Power HA - Workshop
What will PowerHA do for me ?
Heartbeats & Failure Detection
Communication across all available

interfaces (NICs, HBAs, Repository
Lun)
IP Address Takeover
Break Disk Reservations
Start / Stop Applications
Ability to define Dependencies in
multitiered Environments
Application Monitoring
Failure Notification
Application Recovery
Initiate Takeover
Custom Events (Alter processing)
Automatic Corrective Actions

Nightly Verification of configuration
Built-in Lazy Update
Documenting the environment
Cluster Snapshots
IBM Director Integration
File Collections
Propagate Files amongst the
cluster
Ease of Deployment
Smart Assistants (SAP, DB2,
OracleL)
Federated Security Integration
System Security Administration
Encrypted Filesystems
RBAC
May 2016
Power HA - Workshop
PowerHA topology
Tasks to Configure Cluster Infrastructure and ownership
Plan-out IP Addresses
Hard set interface Ips
Document DNS names
update /etc/hosts
Share Storage
Drivers / filesets
Assign Luns
Alter SAN infrastructure
Zoning
Install Applications
Start/stop scripts
Space requirements
Optimize configuration for
performance
HA Cluster Installation / Deployment
Topology & Resource Setup
Fallover Testing
Monitoring the environment
Owner
Network Admin.
Storage Admin.
Application Admin.
HA Admin.
May 2016
Power HA - Workshop
Questions
May 2016

PowerHA Workshop Part1

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

PowerHA Workshop Part1

Uploaded by

Copyright:

Available Formats

GTS Delivery Center Argentina

IBM Global Account

1Understand the High Availability Technology.

A Cluster is a group of servers and other

ing. Luciano Bez lucianobaez@ar.ibm.com

What is High Availability ?

High Availability is a system design approach and

ing. Luciano Bez lucianobaez@ar.ibm.com

Reliability is not the same as Availability!

Availability percentage calculation

What is a Fault Tolerant system ?

ing. Luciano Bez lucianobaez@ar.ibm.com

High Availability cluster design

ing. Luciano Bez lucianobaez@ar.ibm.com

High Availability Node configuration.

High Availability Node configuration.

ing. Luciano Bez lucianobaez@ar.ibm.com

Active/Passive Node Configuration

Request for Application

ing. Luciano Bez lucianobaez@ar.ibm.com

Note: The failover process could be

Active/Active Node Configuration

Request for Application

ing. Luciano Bez lucianobaez@ar.ibm.com

Note: The failover process could be

Poor Man Cluster Node Configuration

Request for Application

ing. Luciano Bez lucianobaez@ar.ibm.com

Note: The failover process could be

ing. Luciano Bez lucianobaez@ar.ibm.com

HACMP Task Guides

2 Node Config Assist

Fast Disk Takeover

64-bit capable APIs

ing. Luciano Bez lucianobaez@ar.ibm.com

First Failure Data Capture

Cluster Aware AIX

PowerHA SystemMirror 6.1

DSCLI Metro Mirror VIOS

WebSMIT Gateway Server

Split Site Mirror

Fast Disk Takeover

ing. Luciano Bez lucianobaez@ar.ibm.com

SAP HotStandby HyperSwap

PowerHA v7: Kernel based

HyperSwap with DS8K

PowerHA / HACMP support matrix

ing. Luciano Bez lucianobaez@ar.ibm.com

PowerHA / HACMP support matrix

ing. Luciano Bez lucianobaez@ar.ibm.com

Clustering for HA and DR

PowerHA standard edition cluster

PowerHA enterprise edition cluster

PowerHA enables 24x365 operational availability

Power HA Standard and Enterprise

Centralized Management CSPOC

Cluster resource management

Shared Storage management

Cluster verification framework

Integrated disk heartbeat

SMIT management interfaces

AIX event/error management

PowerHA DLPAR HA management

Multi Site HA Management