
Veritas cluster concepts

July 23, 2012

VCS Basics
A single VCS cluster consists of multiple systems connected in various combinations to shared
storage devices. VCS monitors and controls applications running in the cluster, and restarts
applications in response to a variety of hardware or software faults. Client applications
continue operation with little or no downtime. Client workstations receive service over the
public network from applications running on the VCS systems. VCS monitors the systems and
their services. VCS systems in the cluster communicate over a private network.
Switchover and Failover
A switchover is an orderly shutdown of an application and its supporting resources on one
server and a controlled startup on another server.
A failover is similar to a switchover, except the ordered shutdown of applications on the
original node may not be possible, so the services are started on another node. The process of
starting the application on the node is identical in a failover or switchover.
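From the command line, a planned switchover is typically a single command. The group and system names below are placeholders borrowed from the NFS example later in this article:

# move the service group to another node in an orderly fashion
hagrp -switch NFS_group1 -to Server2

# confirm where the group is now online
hagrp -state NFS_group1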
CLUSTER COMPONENTS:
Resources
Resources are hardware or software entities, such as disk groups and file systems, network
interface cards (NIC), IP addresses, and applications. Controlling a resource means bringing it
online (starting), taking it offline (stopping), and monitoring the resource.
Resource Dependencies:
Resource dependencies determine the order in which resources are brought online or taken
offline when their associated service group is brought online or taken offline. In VCS
terminology, resources are categorized as parents or children. Child resources must be online
before parent resources can be brought online, and parent resources must be taken offline
before child resources can be taken offline.
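A quick way to see these parent/child relationships on a running cluster is the hares command; the resource name below is illustrative:

# show the parents and children of a single resource
hares -dep Mount_home

# show every resource dependency configured in the cluster
hares -dep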
Resource Categories:
On-Off: VCS starts and stops On-Off resources as required. For example, VCS imports a disk
group when required, and deports it when it is no longer needed.
On-Only: VCS starts On-Only resources, but does not stop them. For example, VCS requires
NFS daemons to be running to export a file system. VCS starts the daemons if required, but
does not stop them if the associated service group is taken offline.
Persistent: These resources cannot be brought online or taken offline. For example, a network
interface card cannot be started or stopped, but it is required to configure an IP address. VCS
monitors Persistent resources to ensure their status and operation. Failure of a Persistent
resource triggers a service group failover.

Service Groups
A service group is a logical grouping of resources and resource dependencies. It is a
management unit that controls resource sets. A single node may host any number of service
groups, each providing a discrete service to networked clients. Each service group is
monitored and managed independently. Independent management enables a group to be
failed over automatically or manually idled for administration or maintenance without
necessarily affecting other service groups. VCS monitors each resource in a service group and,
when a failure is detected, restarts that service group. This could mean restarting it locally or
moving it to another node and then restarting it.
Types of Service Groups:
Fail-over Service Groups
A failover service group runs on one system in the cluster at a time.
Parallel Service Groups
A parallel service group runs simultaneously on more than one system in the cluster.
Hybrid Service Groups
A hybrid service group is for replicated data clusters and is a combination of the two groups
cited above. It behaves as a failover group within a system zone and a parallel group across
system zones. It cannot fail over across system zones, and a switch operation on a hybrid
group is allowed only if both systems are within the same system zone.
The ClusterService Group
The ClusterService group is a special-purpose service group that contains resources
required by VCS components. The group contains resources for Cluster Manager (Web
Console), Notification, and the wide-area connector (WAC) process used in global clusters.
The ClusterService group can fail over to any node despite restrictions such as the node
being frozen. It is the first service group to come online and cannot be auto-disabled. The
group comes online on the first node that transitions to the RUNNING state.
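Whether a group is a failover or parallel group can be confirmed from its attributes; the commands below use the ClusterService group as an example:

# current state of the ClusterService group on each system
hagrp -state ClusterService

# full attribute listing, including SystemList, AutoStartList, and Parallel
hagrp -display ClusterService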
Agents
Agents are VCS processes that manage resources of predefined resource types according to
commands received from the VCS engine, HAD. A system has one agent per resource type that
monitors all resources of that type; for example, a single IP agent manages all IP resources.
When the agent is started, it obtains the necessary configuration information from VCS. It
then periodically monitors the resources, and updates VCS with the resource status. VCS
agents are multithreaded, meaning a single VCS agent monitors multiple resources of the
same resource type on one host. VCS monitors resources when they are online and offline to
ensure they are not started on systems on which they are not supposed to run. For this
reason, VCS starts the agent for any resource configured to run on a system when the cluster
is started. If no resources of a particular type are configured, the agent is not started.
Agent Operations:
Online: Brings a specific resource ONLINE from an OFFLINE state.
Offline: Takes a resource from an ONLINE state to an OFFLINE state.
Monitor: Tests the status of a resource to determine if the resource is online or offline.
Clean: Cleans up after a resource fails to come online, fails to go offline, or fails while in an
ONLINE state.
Action: Performs actions that can be completed in a short time (typically, a few seconds), and
which are outside the scope of traditional activities such as online and offline.
Info: Retrieves specific information for an online resource.
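To make these entry points concrete, here is a minimal sketch of what a script-based monitor entry point for a custom application resource might look like. The PID-file path and resource layout are assumptions, and the exit codes follow the usual VCS convention of 110 for online and 100 for offline:

#!/bin/sh
# hypothetical monitor entry point: decide whether the application is running
RESNAME=$1                      # VCS passes the resource name first
PIDFILE=/var/run/myapp.pid      # assumed location of the application's PID file

if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    exit 110    # report the resource as online
else
    exit 100    # report the resource as offline
fi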
Multiple Systems
VCS runs in a replicated state on each system in the cluster. A private network enables the
systems to share identical state information about all resources and to recognize which
systems are active, which are joining or leaving the cluster, and which have failed. The
private network requires two communication channels to guard against network partitions.
For the VCS private network, two types of channels are available for heartbeating: network
connections and heartbeat regions on shared disks. The shared disk region heartbeat channel
is used for heartbeating only; unlike the network channels, it does not carry cluster state
information. Each cluster configuration requires at least two channels between systems, one
of which must be a network connection. The remaining channels may be a combination of
network connections and heartbeat regions on shared disks. This two-channel requirement
protects the cluster against network partitioning. It is also recommended to have at least one
heartbeat disk region on each I/O chain shared between systems. For example, consider a
two-system VCS cluster in which sysA and sysB have two private network connections and a
third channel via a heartbeat disk region on one of the shared disks. If one of the network
connections fails, two channels remain. If both network connections fail, the cluster is in
jeopardy, but connectivity remains via the heartbeat disk.
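The state of the private links and of cluster membership can be checked with the LLT and GAB utilities:

# verbose status of every LLT link to every peer node
lltstat -nvv

# GAB port membership; port a is GAB itself, port h is the VCS engine (HAD)
gabconfig -a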
Shared Storage
A VCS hardware configuration typically consists of multiple systems connected to shared
storage via I/O channels. Shared storage provides multiple systems an access path to the
same data, and enables VCS to restart applications on alternate systems when a system fails.
Cluster Control, Communications, and Membership

Cluster communications ensure VCS is continuously aware of the status of each system's
service groups and resources.
High-Availability Daemon (HAD)
The high-availability daemon, or HAD, is the main VCS daemon running on each system. It is
responsible for building the running cluster configuration from the configuration files,
distributing the information when new nodes join the cluster, responding to operator input,
and taking corrective action when something fails. It is typically known as the VCS engine.
The engine uses agents to monitor and manage resources. Information about resource states
is collected from the agents on the local system and forwarded to all cluster members. HAD

operates as a replicated state machine (RSM). This means HAD running on each node has a
completely synchronized view of the resource status on each node. The RSM is maintained
through the use of a purpose-built communications package consisting of the protocols Low
Latency Transport (LLT) and Group Membership Services/Atomic Broadcast (GAB).
Low Latency Transport (LLT)
VCS uses private network communications between cluster nodes for cluster maintenance.
The Low Latency Transport functions as a high-performance, low-latency replacement for the
IP stack, and is used for all cluster communications.
Traffic Distribution
LLT distributes (load balances) internode communication across all available private network
links. This distribution means that all cluster communications are evenly distributed across all
private network links (maximum eight) for performance and fault resilience. If a link fails,
traffic is redirected to the remaining links.
Heartbeat
LLT is responsible for sending and receiving heartbeat traffic over network links. This
heartbeat is used by the Group Membership Services function of GAB to determine cluster
membership.
The system administrator configures LLT by creating the configuration files /etc/llthosts,
which lists all the systems in the cluster, and /etc/llttab, which describes the local system's
private network links to the other systems in the cluster.
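As a rough sketch, the two files might look like this on the first node of a two-node cluster; the node IDs, cluster ID, link tags, and device names are illustrative only:

/etc/llthosts (node ID to system name mapping, identical on every node):
0 Server1
1 Server2

/etc/llttab on Server1 (node name, cluster ID, and the private links):
set-node Server1
set-cluster 101
link link1 /dev/hme:1 - ether - -
link link2 /dev/qfe:2 - ether - -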
Group Membership Services/Atomic Broadcast (GAB)
The Group Membership Services/Atomic Broadcast protocol (GAB) is responsible for cluster
membership and cluster communications.
Cluster Membership
GAB maintains cluster membership by receiving input on the status of the heartbeat from
each node via LLT. When a system no longer receives heartbeats from a peer, it marks the
peer as DOWN and excludes the peer from the cluster.
Cluster Communications
GAB's second function is reliable cluster communications. GAB provides guaranteed delivery
of point-to-point and broadcast messages to all nodes.
The system administrator configures the GAB driver by creating a configuration file
(/etc/gabtab).
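A typical /etc/gabtab contains a single line; the -n value below assumes a two-node cluster:

/sbin/gabconfig -c -n 2

The -n 2 argument tells GAB to wait until two nodes are communicating before seeding the cluster.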
Cluster Topologies

Asymmetric or Active/Passive Configuration


An application runs on a primary, or master, server. A dedicated redundant server is present to
take over on any failure. The redundant server is not configured to perform any other
functions. This configuration is the simplest and most reliable. The redundant server is on
stand-by with full performance capability.

Symmetric or Active/Active Configuration


In a symmetric configuration, each server is configured to run a specific application or service
and provide redundancy for its peer. When a failure occurs, the surviving server hosts both
application groups.
In the asymmetric example, the redundant server requires only as much processor power as
its peer. On failover, performance remains the same. In the symmetric example, the
redundant server requires not only enough processor power to run the existing application,
but also enough to run the new application it takes over.

N-to-1 Configuration
An N-to-1 configuration is based on the concept that multiple, simultaneous server failures
are unlikely; therefore, a single redundant server can protect multiple active servers. When a
server fails, its applications move to the redundant server. In this configuration, a dedicated,
redundant server is cabled to all storage and acts as a spare when a failure occurs.
The problem with this design is the issue of failback. When the original, failed server is
repaired, all services normally hosted on the server must be failed back to free the spare
server and restore redundancy to the cluster.

N + 1 Configuration
With the capabilities introduced by storage area networks (SANs), you can not only create
larger clusters, but more importantly, can connect multiple servers to the same storage. A
dedicated, redundant server is no longer required in the configuration. Instead of N-to-1
configurations, there is N+1. In an advanced N+1 configuration, an extra server in the cluster is
spare capacity only. When a server fails, the application service group restarts on the spare.
After the server is repaired, it becomes the spare. This configuration eliminates the need for
a second application failure to fail back the service group to the primary system. Any server
can provide redundancy to any other server.

N-to-N Configuration
An N-to-N configuration refers to multiple service groups running on multiple servers, with
each service group capable of being failed over to different servers in the cluster.
If any node fails, each instance is started on a different node, ensuring no single node
becomes overloaded. This configuration is a logical evolution of N + 1: it provides cluster
standby capacity instead of a standby server.

Storage Configurations

Basic Shared Storage Cluster


In this configuration, a single cluster shares access to a storage device, typically over a SAN.
An application can only be started on a node with access to the required storage. For
example, in a multi-node cluster, any node designated to run a specific database instance
must have access to the storage. When a node or application fails, all data required to start
on another node is stored on the shared disk.

Shared Nothing Cluster:


In a shared nothing cluster, there is no shared storage; each node maintains its own separate
copy of the data. VCS shared nothing clusters typically have read-only data stored locally on
both systems.

Replicated Data Cluster:


In a replicated data cluster there is no shared disk. Instead, a data replication product
synchronizes copies of data between nodes. Replication can take place at the application,
host, and storage levels. Regardless of which replication technology is used, the solution must
provide data access that is identical to that of shared disks. If a node or storage failure forces
a failover, the takeover node must possess an identical copy of the data. This typically implies
synchronous replication. At the same time, when the original server or storage is repaired, it
must be able to return to standby capability.

Global Cluster
A global cluster links clusters at separate locations and enables wide-area failover and
disaster recovery. In a global cluster, if an application or a system fails, the application is

migrated to another system within the same cluster. If the entire cluster fails, the application
is migrated to a system in another cluster.

Configuration:

Configuring VCS means conveying to the VCS engine the definitions of the cluster, service
groups, resources, and resource dependencies. VCS uses two configuration files in a default
configuration:
The main.cf file defines the entire cluster.
The types.cf file defines the resource types.
By default, both files reside in the directory /etc/VRTSvcs/conf/config. Additional files
similar to types.cf may be present if agents have been added, such as Oracletypes.cf.
In a VCS cluster, the first system to be brought online reads the configuration file and creates
an internal (in-memory) representation of the configuration. Systems brought online after the
first system derive their information from systems running in the cluster. You must stop the
cluster while you are modifying the files from the command line. Changes made by editing
the configuration files take effect when the cluster is restarted. The node on which the
changes were made should be the first node to be brought back online.
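A common offline-editing workflow, sketched here with the default configuration path, looks like this:

# stop HAD on all nodes while leaving the applications running
hastop -all -force

# edit main.cf, then verify the syntax of the configuration directory
hacf -verify /etc/VRTSvcs/conf/config

# start VCS first on the node where the changes were made
hastart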
main.cf File
Include Clauses
Include clauses incorporate additional configuration files into main.cf. These additional files
typically contain type definitions, including the types.cf file. Other type definitions must be
included as required.
Cluster Definition
This section of main.cf defines the attributes of the cluster, including the cluster name and
the names of the cluster users.
System Definition
Each system designated as part of the cluster is listed in this section of main.cf. The names
listed as system names must match the name returned by the command uname -a. System
names are preceded with the keyword system. For any system to be used in a service group
definition, it must be defined in this section.

Service Group Definition


Service group definitions in main.cf comprise the attributes of a particular service group.
Resource Definition
This section in main.cf defines each resource used in a particular service group.
Service Group Dependency Clause
To configure a service group dependency, place the keyword requires in the service group
declaration of the main.cf file. Position the dependency clause before the resource
dependency specifications and after the resource declarations.
Resource Dependency Clause
A dependency between resources is indicated by the keyword requires between two
resource names. This indicates the second resource (the child) must be online before the first
resource (the parent) can be brought online. Conversely, the parent must be offline before
the child can be taken offline. Also, faults of the children are propagated to the parent.
The following example is a basic two-node cluster exporting an NFS file system. The systems
are configured as follows:
1. Servers: Server1 and Server2
2. Storage: one disk group, shared1, managed using Veritas Volume Manager
3. File system: /export/home
4. IP address: 192.168.1.3 (resource IP_nfs1); public interface: hme0
5. Server1 is the primary location to start NFS_group1
In an NFS configuration, the resource dependencies must be configured to bring up the IP
address last. This prevents clients from accessing the server until everything is ready, and
prevents unnecessary Stale File Handle errors on the clients.
include "types.cf"

cluster demo (
    UserNames = { admin = cDRpdxPmHpzS }
)

system Server1
system Server2

// failover service group exporting /export/home over NFS
group NFS_group1 (
    SystemList = { Server1, Server2 }
    AutoStartList = { Server1 }
)

    DiskGroup DG_shared1 (
        DiskGroup = shared1
    )

    IP IP_nfs1 (
        Device = hme0
        Address = "192.168.1.3"
    )

    Mount Mount_home (
        MountPoint = "/export/home"
        BlockDevice = "/dev/vx/dsk/shared1/home_vol"
        FSType = vxfs
        FsckOpt = "-y"
        MountOpt = rw
    )

    NFS NFS_group1_16 (
        Nservers = 16
    )

    NIC NIC_group1_hme0 (
        Device = hme0
        NetworkType = ether
    )

    Share Share_home (
        PathName = "/export/home"
    )

    // resource dependency tree: the IP address comes up last
    IP_nfs1 requires Share_home
    IP_nfs1 requires NIC_group1_hme0
    Mount_home requires DG_shared1
    Share_home requires NFS_group1_16
    Share_home requires Mount_home
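With this configuration loaded, the group can be brought online and verified from the command line; the commands assume the names used above:

# bring the NFS service group online on the primary node
hagrp -online NFS_group1 -sys Server1

# summary of system, group, and resource states across the cluster
hastatus -sum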
The types.cf File
The types.cf file describes standard resource types to the VCS engine; specifically, the data
required to control a specific resource.
The following example illustrates a DiskGroup resource type definition.
type DiskGroup (
    static int OnlineRetryLimit = 1
    static str ArgList[] = { DiskGroup, StartVolumes, StopVolumes,
        MonitorOnly, MonitorReservation, tempUseFence }
    str DiskGroup
    str StartVolumes = 1
    str StopVolumes = 1
    static int NumThreads = 1
    boolean MonitorReservation = 0
    temp str tempUseFence = INVALID
)
The types definition performs two important functions. First, it defines the type of values
that may be set for each attribute.
The second critical piece of information provided by the type definition is the ArgList
attribute. The line static str ArgList[] = { xxx, yyy, zzz } defines the order in which parameters
are passed to the agents for starting, stopping, and monitoring resources.
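Both pieces of information can be inspected on a running cluster; DiskGroup is used here because it matches the example above:

# display the DiskGroup resource type, including its static attributes and ArgList
hatype -display DiskGroup

# list the attribute definitions (data type and dimension) for the type
haattr -display DiskGroup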
Attributes:
VCS components are configured using attributes. Attributes contain data about the cluster,
systems, service groups, resources, resource types, agents, and heartbeats if using global
clusters. For example, the value of a service group's SystemList attribute specifies on which
systems the group is configured and the priority of each system within the group. Attribute
values use the basic data types listed below; a command example follows the list.

Types:
String
Integer
Boolean
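Attributes are usually changed with the ha commands while the configuration is writable; the group name and priority values below are illustrative:

# open the configuration for writing
haconf -makerw

# set the SystemList priorities for a group (lower number = higher priority)
hagrp -modify NFS_group1 SystemList Server1 0 Server2 1

# save the change and return the configuration to read-only
haconf -dump -makero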

Introducing the VCS User Privilege Model


Cluster operations are enabled or restricted depending on the permissions with which you log
on to VCS. Each category is assigned specific privileges, and some categories overlap; for
example, Cluster Administrator includes privileges for Group Administrator, which includes
privileges for Group Operator.

Cluster Administrator
Users in this category are assigned full privileges, including making configuration read-write,
creating and deleting groups, setting group dependencies, adding and deleting systems, and
adding, modifying, and deleting users. All group and resource operations are allowed. Users
with Cluster Administrator privileges can also change other users' privileges and passwords.
Users in this category can create and delete resource types and execute remote commands
from Cluster Manager (Java Console) via Cluster Shell. Cluster Administrators are allowed
access to Cluster Shell if designated in the value of the attribute HaCliUserLevel.
Note: Cluster Administrators can change their own and other users' passwords only after
changing the configuration to read/write mode.
Cluster Operator
In this category, all cluster-, group-, and resource-level operations are allowed, including
modifying the user's own password and bringing service groups online.
Note: Users in this category can change their own passwords only if the configuration is in
read/write mode. Cluster Administrators can change the configuration to read/write mode.
Users in this category cannot create service groups or execute remote commands via Cluster
Shell. Additionally, users in this category can be assigned Group Administrator privileges for
specific service groups.
Group Administrator
Users in this category can perform all service group operations on specific groups, such as
bringing groups and resources online, taking them offline, and creating or deleting resources.
Additionally, users can establish resource dependencies and freeze or unfreeze service
groups. Note that users in this category cannot create or delete service groups.
Group Operator
Users in this category can bring service groups and resources online and take them offline.
Users can also temporarily freeze or unfreeze service groups.

Cluster Guest
Users in this category have read-only access, meaning they can view the configuration, but
cannot change it. They can modify their own passwords only if the configuration is in
read/write mode.
Note: By default, newly created users are assigned Cluster Guest permissions.
Review the following sample main.cf:

cluster vcs (
    UserNames = { sally = Y2hJtFnqctD76, tom = pJad09NWtXHlk,
        betty = kjheewoiueo, lou = T6jhjFYkie,
        don = gt3tgfdgttU, intern = EG67egdsak }
    Administrators = { tom }
    Operators = { sally }
)
group finance_server (
    Administrators = { betty }
    Operators = { lou, don }
)
group hr_application (
    Administrators = { sally }
    Operators = { lou, betty }
)
group test_server (
    Administrators = { betty }
    Operators = { intern, don }
)

In this sample, tom is a Cluster Administrator, sally is a Cluster Operator, and betty, lou, don,
and intern hold group-level Administrator or Operator privileges for individual service groups.
The following concepts apply to users executing commands from the command line:
Users logged on as root (or administrator) are granted privileges that exceed those of Cluster
Administrator, such as the ability to start and stop a cluster.
When non-root users execute haxxx commands, they are prompted for their VCS user name
and password to authenticate themselves. Use the halogin command to save the

authentication information so that you do not have to enter your credentials every time you
run a VCS command.
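For example, a non-root user might cache credentials once and then run status commands without being prompted again; the user name and password below are placeholders, and the halogin form shown (user name followed by password) is the commonly documented one:

# cache VCS credentials for the current operating system user
halogin vcsadmin mypassword

# subsequent ha commands no longer prompt for credentials
hastatus -sum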
User Privileges in Global Clusters
VCS enforces user privileges across clusters. A cross-cluster online or offline operation is
permitted only if the user initiating the operation has one of the following privileges:
1. Group Administrator or Group Operator privileges for the group on the remote cluster
2. Cluster Administrator or Cluster Operator privileges on the remote cluster
A cross-cluster switch operation is permitted only if the user initiating the operation has the
following privileges:
1. Group Administrator or Group Operator privileges for the group on both clusters
2. Cluster Administrator or Cluster Operator privileges on both clusters

System States
The following list describes the VCS system states.

ADMIN_WAIT: The running configuration was lost. A system transitions into this state for one
of the following reasons: the last system in the RUNNING configuration leaves the cluster
before another system takes a snapshot of its configuration and transitions to the RUNNING
state; or a system in LOCAL_BUILD state tries to build the configuration from disk and
receives an unexpected error from hacf indicating the configuration is invalid.

CURRENT_DISCOVER_WAIT: The system has joined the cluster and its configuration file is
valid. The system is waiting for information from other systems before it determines how to
transition to another state.

CURRENT_PEER_WAIT: The system has a valid configuration file and another system is doing a
build from disk (LOCAL_BUILD). When its peer finishes the build, this system transitions to
REMOTE_BUILD.

EXITING: The system is leaving the cluster.

EXITED: The system has left the cluster.

EXITING_FORCIBLY: A hastop -force command has forced the system to leave the cluster.

FAULTED: The system has left the cluster unexpectedly.

INITING: The system has joined the cluster. This is the initial state for all systems.

LEAVING: The system is leaving the cluster gracefully. When the agents have been stopped,
and when the current configuration is written to disk, the system transitions to EXITING.

LOCAL_BUILD: The system is building the running configuration from the disk configuration.

REMOTE_BUILD: The system is building a running configuration that it obtained from a peer
in a RUNNING state.

STALE_ADMIN_WAIT: The system has a stale configuration and there is no other system in the
RUNNING state from which to retrieve a configuration. If a system with a valid configuration
is started, that system enters the LOCAL_BUILD state. Systems in STALE_ADMIN_WAIT
transition to STALE_PEER_WAIT.

STALE_DISCOVER_WAIT: The system has joined the cluster with a stale configuration file. It is
waiting for information from any of its peers before determining how to transition to another
state.

STALE_PEER_WAIT: The system has a stale configuration file and another system is doing a
build from disk (LOCAL_BUILD). When its peer finishes the build, this system transitions to
REMOTE_BUILD.

UNKNOWN: The system has not joined the cluster because it does not have a system entry in
the configuration.

Thank you for reading this article. Please leave a comment if you have any questions, and I
will get back to you as soon as possible.
