
High Availability Support for Failover Clustering (HA)

3 High Availability Support for Failover Clustering (HA)
Switchover clusters guarantee high availability of the SAP system by switching critical
software units across multiple hosts in the cluster. When a primary node fails, proprietary
switchover software automatically switches the failed software unit to another hardware node
in the cluster. Applications accessing the failed software unit experience a short delay but
resume normal processing after the switchover.

The term “cluster” is used in information technology in the following different
ways: switchover cluster, software cluster, and database cluster.
The first one, switchover cluster, is described here; the second means the
functionality of redundant application servers, all running the same software.
This has the benefit of high availability but at the same time is also a feature for
scalability. The third term means high availability for databases, which may
come in different flavors.
If you come across the term “cluster”, make sure you understand which
meaning is used.
Switchover clusters also have the advantage that you can release a particular node for
system maintenance by deliberately initiating a switchover. Switchover solutions can protect
against hardware failure and operating system failure, but not operator errors or faulty
application software.
SAP NetWeaver introduces the concept of the Server Central Services (SCS) – an instance
that consists of the essential enqueue and message system services only. This has been
standard for AS Java installations and now is possible for AS ABAP also.
The benefit of having a separate SCS instance is mainly in the area of high availability. This
approach concentrates the possible single points of failure of a system into a single instance,
so that only this instance needs to be isolated and protected. Before the SCS entities were
located on a separate functional instance, protection had to be extended to the complete system.
With the introduction of the SCS, the term "central instance" becomes almost obsolete. Up to
SAP NetWeaver 7.0, the Software Deployment Manager (SDM) is installed on the central
instance, which is a single instance for the cluster. Nevertheless, the SDM is not considered
a single point of failure as deployment on productive systems usually accompanies
downtime.
To reflect this development, the term “dialog instance” is no longer used in this HA
documentation. From now on, instances running functional services are named "application
server" (AS) instances.
This means that highly available SAP NetWeaver systems have only two kinds of instances,
either an SCS or an AS instance. However, as there are currently still two central services
instances in an SAP NetWeaver Add-In installation – one for ABAP and one for Java – they
are called SCS and ASCS.
The critical components in a SAP system are:
• Central services instance (SCS and ASCS) with message server and enqueue
server
• Database instance
• SAP central file system
Other instances can be protected by running them redundantly. For example, you can add
additional application servers.

March 2008 7

A switchover cluster consists of:


• A hardware cluster of two or more different hosts to hold multiple copies of the critical
software units.
• Switchover software to detect failure in a node and switch the affected software unit
to the standby node, where it can continue operating.
• A mechanism to enable application software to seamlessly continue working with the
switched software unit by using virtual network identity of protected instances.
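
The three parts listed above can be illustrated with a small sketch. It is a toy model only, not SAP or cluster vendor software: the class SwitchoverCluster, its methods, and the host names are hypothetical and chosen for illustration. The sketch assumes that clients address the protected unit only through a virtual host name and that the switchover software picks the first healthy node as the standby.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchoverCluster:
    """Toy model of one protected software unit behind a virtual host name."""

    nodes: list[str]       # physical hosts in the hardware cluster
    virtual_host: str      # the only name that clients ever use
    active: str = field(init=False)
    healthy: dict[str, bool] = field(init=False)

    def __post_init__(self) -> None:
        self.healthy = {node: True for node in self.nodes}
        self.active = self.nodes[0]

    def resolve(self, name: str) -> str:
        """Map the virtual name to the physical host currently running the unit."""
        if name != self.virtual_host:
            raise KeyError(name)
        return self.active

    def report_failure(self, node: str) -> None:
        """Mark a node as failed and switch the unit if it was running there."""
        self.healthy[node] = False
        if node == self.active:
            standby = next((n for n in self.nodes if self.healthy[n]), None)
            if standby is None:
                raise RuntimeError("no healthy standby node left")
            self.active = standby


cluster = SwitchoverCluster(nodes=["hostA", "hostB"], virtual_host="sapscs")
cluster.report_failure("hostA")
print(cluster.resolve("sapscs"))
```

Because clients resolve only the virtual name, they do not need to be reconfigured after the switchover; they reconnect to the same name and reach the standby node.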

Design the switchover cluster together with your hardware partner. The switchover product
has to match your operating system.

We strongly recommend that all the hosts used for switchover have the same
operating system.

For example, if the central services instance runs under the Microsoft Windows
operating system and the DB runs on a UNIX platform, the central services
instance can only switch to another Windows host and the DB to another UNIX
host.

The following figure shows the essential features of a switchover setup:


The figure above shows several switchover units spanning several hardware
clusters. This is not a required setup. There are many possible switchover
scenarios and it is possible to run a switchover environment for SAP systems
with the minimal configuration of two different hosts.

3.1 Protecting System-Critical Components


This section describes switchover strategies for protecting critical components.

3.1.1 Switchover Units


Switchover units (also called switchover groups) are sets of entities that are combined in a
hardware cluster for the switchover process. In the switchover process, every entity in the
unit is switched. To keep the downtime of an entity during switchover as short as possible,
switchover units must be as small as possible.
The SAP recommendation is to group the critical components of your SAP system into the
following switchover units:
• Central services instance
• Database instance
Placing the database instance and the ASCS on two different hosts is
recommended when SAP NetWeaver is running in a multi-host environment with a heavy
database workload.

The saposcol process must run on the DB host to enable CCMS functions.
For more information, see SAP Note 20624.
• File system
Several directories in an SAP system are shared among all instances. Protecting file
systems is handled by the cluster software solution provider.

3.1.2 Failure and Switchover of SAP NetWeaver Application Server Instances

The purpose of distributing the SAP NetWeaver Application Server over several instances is
to avoid switchover procedures. If there is more than one distributed instance of the
application server, then high availability is achieved. If one application server instance
fails, the user can always reconnect to another application server instance. However, the
current session is lost in this case. If session failover is also needed, it is available for
usage type AS Java. For more information, see help.sap.com/nw2004s → SAP
NetWeaver 7.0 (2004s) → English → SAP Library → SAP NetWeaver Library → SAP
NetWeaver by Key Capabilities → Application Platform by Key Capability → Java Technology
→ Administration Manual → J2EE Engine → Application Management → Failover System.
Failover on application server instances is not recommended for performance reasons.
These instances are much bigger than the SCS instances and need more resources for the
switchover process. Nevertheless, it is a possible setup.

Please note that the installation procedure for Windows™ does not support this.


3.1.3 Internet Communication Manager (ICM)


The ICM lets the SAP system communicate with the outside world using the HTTP, HTTPS
and SMTP protocols. In its role as a server, the ICM can process requests from the Internet
that arrive as URLs with the server/port combination that the ICM can listen to. The ICM then
calls the relevant local handler for the URL in question.
The Internet Communication Manager (ICM) is implemented as an independent process
started and monitored by the ABAP dispatcher.

ICM Server Cache


The ICM Server Cache saves HTTP objects before they are sent to the client.
The HTTP request handler uses the ICM Server Cache when, for example, response pages
need to be re-used, such as the entry page of an online shop application. The first time, the
request is processed in the backend. The response is stored by the ICM server cache before
it is sent to the client. When the page is requested again and its expiration date has not
passed, the page is taken directly from the ICM server cache and sent to the client, so no
work process has to be involved. The result is much better performance.
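
The caching behavior described above can be sketched as follows. This is a simplified illustration, not the ICM implementation: ServerCache, handle_request, the backend callable, and the ttl_seconds parameter are hypothetical names, and the sketch assumes that expiry is a simple time-to-live per URL.

```python
import time
from typing import Callable, Optional


class ServerCache:
    """Toy response cache keyed by URL with a per-entry expiration time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, bytes]] = {}

    def get(self, url: str) -> Optional[bytes]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            del self._entries[url]  # expired entries are dropped
            return None
        return body

    def put(self, url: str, body: bytes, ttl_seconds: float) -> None:
        self._entries[url] = (self._clock() + ttl_seconds, body)


def handle_request(
    cache: ServerCache,
    url: str,
    backend: Callable[[str], bytes],
    ttl_seconds: float = 60.0,
) -> bytes:
    """Serve from the cache if possible; otherwise ask the backend and store."""
    cached = cache.get(url)
    if cached is not None:
        return cached  # no backend work process is involved
    body = backend(url)
    cache.put(url, body, ttl_seconds)  # stored before it is sent to the client
    return body
```

In this sketch the backend stands in for the work process: it is called on the first request for a URL and again only after the cached entry has expired.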

Failure of Internet Communication Manager


When the Internet Communication Manager (ICM) fails, the affected instance cannot
communicate using Internet protocols. Communication using the dispatcher is not affected.
The ABAP dispatcher restarts the ICM when it detects a failure. As the ICM does
not hold state information, only active requests are affected.
As there is an ICM for each SAP NetWeaver AS instance, it is not a critical component
and does not need further protection.
Sessions that have used the ICM get an error for a recurring request. With the SAP Web
Dispatcher, the sessions are directed to another server. With message-server-based
redirection, the user has to initiate a new redirection by accessing the message server.

For more information on the ICM, see help.sap.com/NW2004s:
SAP NetWeaver 7.0 (2004s) → English → SAP Library → SAP NetWeaver →
Application Platform (SAP Web Application Server) → Architecture of the SAP
Web AS → Internet Communication Manager (ICM)

3.1.4 System Landscape Directory (SLD)


The System Landscape Directory (SLD) is the central directory of system landscape
information relevant for the management of your software lifecycle. It contains a description
of your system landscape (that is, the software units that are currently installed) and a
repository of software units that can be installed in your landscape.
The SLD can be installed as one single central SLD, as one central SLD with sub-SLDs or
multiple standalone SLDs.

3.1.5 Central Services Instance Failure


The most critical part of a central services instance failure is the loss of the enqueue server.
The locks held by the SAP system are lost and the enqueue server has to be restarted
(unless you are using a replicated enqueue server). The message server is also disabled.

Communication between the different application servers in the distributed system also fails
or is impeded.
SAP NetWeaver 7.0 ensures database consistency by disabling enqueue transactions when
the enqueue server is not available.
After the central services instance has been switched over to another host, it has to be
restarted. However, external SAP NetWeaver application servers on different hosts
might still be holding open, uncommitted transactions. These can hold enqueue locks
that have been lost but are not visible anywhere in the entire SAP system.
If no precautions are taken, any user in the SAP system can then lock the same object
and change it in the database, which can cause an inconsistent database. Therefore, all
open transactions in the entire SAP system must be aborted and rolled back before the
enqueue server is restarted.

System Impact
The switchover of the central services instance has the following impact on your system:
• Transaction locks that have not yet been committed are lost at a system-wide level.
• All user input for all transactions that have not been finished with the ABAP command
COMMIT WORK needs to be re-entered.
• RFC connections are maintained.
In the case of a planned central services switchover, you need to notify the users and
give them a deadline to commit all their transactions.

To eliminate the enqueue server as a critical component, you have to set up the
enqueue server as a standalone replicated enqueue server.

Enqueue Replication Server


The enqueue replication server runs on another host and contains a replica of the lock table.
All clients and the replication server are connected to the SCS instance. If the SCS instance
fails, it is restarted by the cluster software on the replication server, and the lock table stored
on the replication server is transferred to the SCS instance. The cluster software also
ensures that access attempts from clients go through the replication enqueue server while
the SCS instance is out of action.
If the replication server fails, it can also be restarted. It retrieves the lock table from the SCS
instance when it restarts. In normal circumstances, the replication server only gets delta
information for the lock table.
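
The replication idea can be sketched with a toy model. It is not the SAP enqueue protocol: EnqueueServer, ReplicationServer, and the lock names are hypothetical, and the sketch assumes a lock table that simply maps a locked object to its owner. The replica receives a full copy when it attaches and only deltas afterwards.

```python
from typing import Optional


class ReplicationServer:
    """Holds a replica of the lock table on another host."""

    def __init__(self) -> None:
        self.table: dict[str, str] = {}

    def apply(self, op: str, obj: str, owner: str) -> None:
        """Apply one delta sent by the enqueue server."""
        if op == "set":
            self.table[obj] = owner
        else:
            self.table.pop(obj, None)


class EnqueueServer:
    """Toy lock server: maps a locked object to the owner of the lock."""

    def __init__(self, initial_locks: Optional[dict[str, str]] = None) -> None:
        self.locks: dict[str, str] = dict(initial_locks or {})
        self.replica: Optional[ReplicationServer] = None

    def attach(self, replica: ReplicationServer) -> None:
        """A (re)started replica first retrieves the complete lock table."""
        replica.table = dict(self.locks)
        self.replica = replica

    def lock(self, obj: str, owner: str) -> bool:
        if obj in self.locks:
            return False  # already locked
        self.locks[obj] = owner
        if self.replica is not None:
            self.replica.apply("set", obj, owner)  # delta only
        return True

    def unlock(self, obj: str, owner: str) -> None:
        if self.locks.get(obj) == owner:
            del self.locks[obj]
            if self.replica is not None:
                self.replica.apply("del", obj, owner)


replica = ReplicationServer()
primary = EnqueueServer()
primary.attach(replica)
primary.lock("ORDER/4711", "user_a")

# The primary fails; the restarted server takes over the replicated lock table.
restarted = EnqueueServer(initial_locks=replica.table)
print(restarted.lock("ORDER/4711", "user_b"))
```

The final call returns False because the lock held by user_a survived the failure. Without the replica, the restarted server would start with an empty table and grant the same lock a second time.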

3.1.6 Failure and Switchover of the Database Server


You can use DB reconnect in all situations where the database connection fails, such as host
failure, planned shutdown, or temporary interruption of the connection to the database host.
This feature enables automatic reconnection to the database if the last connection was
closed unexpectedly. There are two types of DB reconnect:
• Reconnect to the same database instance
The reconnect to the same database instance is only successful if the error condition
has been resolved.
• Reconnect to a standby database instance
This setup uses parallel database technology, where application hosts are connected
to one database instance with an additional database instance on another host
available as a standby instance.

The reconnect to a standby database instance is normally successful immediately
after the DB failure, if an error does not occur on this instance as well.
However, if an instance is (re-)started without being able to access the database, the
instance stops. There is no reconnect at startup time. The same applies to the restarted work
process in usage type AS ABAP: if the initial connect fails, the work process is stopped and is
not restarted.

DB Reconnect – AS ABAP
In the event of a database host failure, the network connection of SAP work processes to the
DBMS is lost. If a work process encounters an error in the database connection, the built-in
“DB Reconnect” mechanism starts, and tries to re-establish the database connection.
The DB reconnect feature makes sure that all work processes of all SAP instances are
automatically reconnected to the DB as soon as the DB service is restarted and becomes
available again. The work processes can transparently recover after temporary DB failure.
To the end user, the temporary DBMS failure is almost fully transparent, apart from the time
taken for the DB service to be switched over and become operational again. The functionality
depends on the type of access service involved – that is, dialog, batch, or update.
For more information about the database reconnect feature for ABAP, see
help.sap.com/nw2004s → SAP NetWeaver 7.0 (2004s) → English → SAP Library → SAP
NetWeaver Library → SAP NetWeaver by Key Capabilities → Solution Lifecycle Management
by Key Capabilities → SAP High Availability → SAP NetWeaver AS ABAP: High Availability →
DB Reconnect (AS-ABAP)

Technical Details
The DB reconnect feature prevents all work processes from being shut down, so the SAP
instances do not have to be restarted.
If pre-defined errors are returned from the database call to a work process, this process is set
to “reconnect” status. The transaction run by the process is terminated while the process
keeps running and informs all other work processes (regardless of the type) on the host
about the database restart. If the database is not available, all work processes switch to this
status within a short period of time.
Whenever a user request is received, a work process in this status tries to reconnect to the
database system before it starts the requested transaction. If the database is accessible
again, a work process in the reconnect state lets the transaction start without terminating.
This is transparent to the user. If the database connection cannot be re-established, the
transaction does not start and the user is informed by a popup message about the lost
database connection.
For more information, see SAP Note 98051.
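
The reconnect status described above can be pictured as a small state machine. The following sketch is an illustration under assumptions, not the behavior of the real work process: WorkProcess, DatabaseUnavailable, and the returned message strings are hypothetical, and the notification of the other work processes on the host is left out.

```python
from typing import Any, Callable


class DatabaseUnavailable(Exception):
    """Stands in for the pre-defined database errors that trigger a reconnect."""


class WorkProcess:
    """Toy work process with 'connected' and 'reconnect' states."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect
        # No reconnect at startup: if this first connect fails, the error
        # propagates and the process never starts.
        self._conn = connect()
        self.state = "connected"

    def mark_reconnect(self) -> None:
        """Drop the connection but keep the process itself running."""
        self._conn = None
        self.state = "reconnect"

    def run(self, transaction: Callable[[Any], str]) -> str:
        if self.state == "reconnect":
            # Try to reconnect before the requested transaction starts.
            try:
                self._conn = self._connect()
            except DatabaseUnavailable:
                return "error: database connection lost"
            self.state = "connected"
        try:
            return transaction(self._conn)
        except DatabaseUnavailable:
            # The running transaction is terminated; the process survives.
            self.mark_reconnect()
            return "error: transaction terminated"
```

A request that arrives after the database is available again passes through the reconnect branch and then runs normally, which is why the interruption is transparent to that user.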

DB Reconnect – AS Java
The DB reconnect has to be handled by the application. Depending on the programming
model used for database access in your applications, SAP Web AS Java provides the
following reconnect mechanisms:
• Connections using OpenSQL and NativeSQL with Java Database Connectivity
(JDBC)
• Connections using VendorSQL with direct JDBC connection
For more details on the database reconnect feature for Java, see
help.sap.com/nw2004s → SAP NetWeaver 7.0 (2004s) → English → SAP Library → SAP
NetWeaver Library → SAP NetWeaver by Key Capabilities → Solution Lifecycle Management
by Key Capabilities → SAP High Availability → SAP Web AS Java: High Availability →
System Failure (AS Java) → Persistence Layer and Databases.


3.2 High Availability Installation Scenario Rules


The following set of rules is created to serve as a base when setting up custom installation
scenarios. These rules are meant for different operating systems and some of them may be
obsolete for a particular operating system.
1. Database instance and central services instance run in different switchover groups.
Databases have significant impact on switchover due to processing of transaction
logs, thus it is not recommended to include them together with other components in a
switchover unit.
The central services do not use or need a lot of resources; therefore they can be
switched very fast. Thus, it is better to have independent switchover groups for both
services.
2. Java SCS/ABAP SCS instances should be in a single switchover group.
As SCS instances contain quite small processes that are very important to the
whole environment, it is recommended to keep them alone in their switchover group.
This eliminates the case in which SCS instances have to wait for larger processes to
start.
3. AS may run in a switchover group.
It is possible to run an application server in a switchover group, but this is not
necessary. In case of a switchover, all user sessions will be lost if the application
does not support session persistence. This configuration is not supported with the
installation provided for Microsoft Windows™.
4. Several switchover groups may run on the same hardware cluster.
It is possible to run several switchover groups on one hardware cluster if the HA
software allows this. It is also an option to distribute them to several hardware
clusters.
5. The use of an enqueue replication server is strongly recommended.
The enqueue service handles locks in the application server. If the service fails
and is not replicated, there may still be running processes in the system that continue
to use old locks. To keep integrity, the server restarts as soon as it detects such a
situation. Because this detection can only occur when the enqueue service is
contacted, there is a short timeframe of risk.
By using a replicated enqueue server, this risk is completely avoided and integrity
can be assured under all circumstances.
6. If an ABAP central instance exists, it has to be in a switchover group.
This is only relevant for installations that will be made highly available later and have
been installed in standard mode. In this way, the critical components are part of the
main process that defines the old central instance. You either have to follow this rule
or reinstall your system in HA mode.
7. Additional AS may serve the same SAP System.
It is possible to have additional non-protected application servers in the same
system, either on additional hardware or on the hardware cluster. This does not
influence high availability issues and is only relevant for performance.
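
Some of the rules above lend themselves to an automated check of a planned layout. The following sketch covers rules 1, 2, and 5 only and is not an SAP tool: check_layout, the component labels "SCS", "ASCS", and "DB", and the group names are hypothetical conventions chosen for this example.

```python
CENTRAL_SERVICES = {"SCS", "ASCS"}


def check_layout(groups: dict[str, set[str]], replicated_enqueue: bool) -> list[str]:
    """Return findings for a planned layout of switchover groups.

    groups maps a switchover group name to the components it contains.
    """
    findings: list[str] = []
    for name, members in groups.items():
        central = members & CENTRAL_SERVICES
        if central and "DB" in members:
            findings.append(
                f"rule 1: group {name!r} mixes central services with the database"
            )
        if central and members - CENTRAL_SERVICES:
            findings.append(
                f"rule 2: central services are not alone in group {name!r}"
            )
    if not replicated_enqueue:
        findings.append("rule 5: no enqueue replication server configured")
    return findings


layout = {"cs_group": {"SCS", "ASCS"}, "db_group": {"DB"}}
print(check_layout(layout, replicated_enqueue=True))
```

An empty result means that the layout passes the three checked rules. The remaining rules describe options or depend on the installation mode and are not covered by this sketch.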
