IBM Tivoli System Automation for Multiplatforms (TSA MP) Rolling Upgrade Procedure for a TSAMP Automated HADR environment
This whitepaper is written for a DB2 v9.5/9.7 Automated Failover HADR environment. The goal is to show how to keep the database available on one server while maintenance of any sort is carried out on the other server.

October 2011

Authors: Gareth Holl, IBM Tivoli Support (gholl@us.ibm.com)

Table of Contents
Introduction
High Level Overview
A. Display Initial System State
B. Save the automation policy to an XML file
C. Disable Critical Resource Protection
D. Deactivate the Standby Database (on node02)
E. Shutdown the DB2 Instance on the Standby Node (node02)
F. Offline the Standby Node (node02)
G. Maintenance time for the Standby node (node02)
H. Bring the Standby Node (node02) back Online
I. Start the DB2 instance on the Standby node (node02)
J. Perform a controlled failover (to move online resources off current primary node)
K. Repeat steps D through F where the new standby node is now the original primary node (node01)
L. Maintenance Time for the New Standby node (node01)
M. Online the node and restart the current Standby DB2 instance
N. [OPTIONAL] Perform a Controlled Failover (Failback)
O. Post Maintenance Task - Activating new RSCT code level (if RSCT was upgraded by any means)
P. Post Maintenance Task - Activate new TSA MP code level (if TSAMP was upgraded, including Fixpack upgrade)
Q. Post Maintenance Task - Check each component has been migrated completely (if RSCT or TSAMP upgrades performed)
R. Post Maintenance Task - Re-enable Critical Resource Protection
S. Check the state of your HADR environment
T. [OPTIONAL] Restore/Activate the automation policy previously saved
Appendix - References and more information

Introduction
This whitepaper provides a step-by-step approach to preparing each server for planned maintenance activities in a rolling upgrade fashion. It is tailored for a DB2 v9.5/9.7 High Availability Disaster Recovery (HADR) environment. This procedure keeps the database accessible throughout the upgrade process; however, automated failover and other automated recovery actions are unavailable until all steps have been completed.

This document assumes that an Automated Failover HADR environment has been created via the db2haicu tool (for details on constructing such an Automated Failover HADR environment, please consult the white paper "Automated cluster controlled HADR configuration setup using the IBM DB2 high availability instance configuration utility", available at www.ibm.com/developerworks/data/library/long/dm0907hadrdb2haicu/index.html).

The rolling upgrade procedure is documented step by step, beginning with step A and concluding with step T. Consider each step carefully before deciding to skip it. Finally, an Appendix is included for additional information.

The environment used to build this step-by-step guide was a two-node cluster using TSAMP v3.2.1.2 and RSCT v3.1.0.4. TSAMP was being used to manage/automate a DB2 v9.7.0.4 HADR environment. However, the same procedure applies to other combinations of the same software set.

In the following text, these conventions apply:

Bold indicates commands that you type.

The prompt indicates the userid that will issue the command. For example:

    root@node01:# indicates that the command is issued by the 'root' user on node 'node01'
    db2inst1@node02% indicates that the command is issued by the 'db2inst1' user on node 'node02'

High Level Overview


The rolling upgrade process consists of the following steps:

1) Stop DB2 and TSAMP services on the current standby node
2) Perform required maintenance on the current standby node
3) Bring the TSAMP and DB2 services back online on the current standby node
4) Perform a takeover so that maintenance can be repeated on the other node
5) Repeat steps 1 to 3 on the other node (the new standby)
6) [Optional] Perform a takeover to move the primary database back to the original node

The purpose of doing it this way is to maintain access to the DB2 database with only a minimal interruption to service during a necessary takeover step. It is recommended that a planned maintenance period be scheduled in case of unforeseen problems.

Please note: if your maintenance activities include upgrading the TSAMP software from v2.2 directly to v3.2 (or some later fixpack of v3.1), you will not be able to use the rolling upgrade procedure documented in this guide. You will need to use the entire domain migration technique instead (a whitepaper is available from the TSAMP Support team).

A. Display Initial System State


Use the lsrpdomain, lsrpnode, and lssam commands to list and verify the current cluster state and resource states.

    root@node01:# export CT_MANAGEMENT_SCOPE=2
    root@node01:# lsrpdomain
    Name    OpState RSCTActiveVersion MixedVersions TSPort GSPort
    hadrdom Online  3.1.0.4           No            12347  12348

    root@node01:# lsrpnode
    Name   OpState RSCTVersion
    node02 Online  3.1.0.4
    node01 Online  3.1.0.4

    root@node01:# lssam
    Online IBM.ResourceGroup:db2_db2inst1_node02_0-rg Nominal=Online
            '- Online IBM.Application:db2_db2inst1_node02_0-rs
                    '- Online IBM.Application:db2_db2inst1_node02_0-rs:node02
    Online IBM.ResourceGroup:db2_db2inst1_node01_0-rg Nominal=Online
            '- Online IBM.Application:db2_db2inst1_node01_0-rs
                    '- Online IBM.Application:db2_db2inst1_node01_0-rs:node01
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
            |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                    |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node02
                    '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node01
            '- Online IBM.ServiceIP:db2ip_10_20_30_40-rs
                    |- Offline IBM.ServiceIP:db2ip_10_20_30_40-rs:node02
                    '- Online IBM.ServiceIP:db2ip_10_20_30_40-rs:node01

Verify the state of the HADR pair using native DB2 commands, for example:

    db2inst1@node01% db2pd -hadr -db hadrdb
    Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05

    HADR Information:
    Role    State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Primary Peer  Sync     0                0

    ConnectStatus ConnectTime                          Timeout
    Connected     Sun Aug 8 11:00:58 2010 (1241544058) 120
    .

B. Save the automation policy to an XML file


Save the current DB2/HADR resources, groups, equivalencies, and relationships (the automation policy) to an XML file in case of problems that might require restoration/activation of the policy. The TSAMP command sampolicy is used for this task:

    root@node01:# sampolicy -s hadr_policy_YYYYMMDD.xml

C. Disable Critical Resource Protection


Disable Critical Resource Protection to prevent unexpected server reboots:

    root@node01:# chrsrc -c IBM.PeerNode CritRsrcProtMethod=5

Verify the change in setting as shown:

    root@node01:# lsrsrc -c IBM.PeerNode CritRsrcProtMethod
    Resource Class Persistent Attributes for IBM.PeerNode
    resource 1:
            CritRsrcProtMethod = 5

D. Deactivate the Standby Database (on node02)


As the instance owner, deactivate the database on the standby node only:

    db2inst1@node02:% db2 deactivate database hadrdb

This should cause the HADR Resource Group to be automatically locked, as shown in the 'lssam' output. This is expected: by design, the DB2 engine locks the HADR Resource Group whenever the HADR pair is not in a Peer state, to prevent TSAMP from attempting any failover activity.

E. Shutdown the DB2 Instance on the Standby Node (node02)


As the instance owner, stop the DB2 instance on the standby node only:

    db2inst1@node02:% db2stop force

This should cause the DB2 instance Resource Group on the standby node to be locked so that TSAMP does not attempt to restart the DB2 instance.

F. Offline the Standby Node (node02)


From the standby node, run the following as the root user:

    root@node02:# stoprpnode -f node02
    root@node02:# lsrpdomain
    Name    OpState RSCTActiveVersion MixedVersions TSPort GSPort
    hadrdom Offline 3.1.0.4           No            12347  12348

Wait until the OpState of the domain shows Offline (from the standby node), as shown above, before proceeding to the next step. Note that the domain will still show Online if you check from the other node; the 'lsrpnode' command (run from the online node) will show the opposite node as Offline.

G. Maintenance time for the Standby node (node02)


At this point, the standby node is in a state where maintenance activities can be performed on the OS, DB2, or TSAMP/RSCT, including any necessary reboots.

H. Bring the Standby Node (node02) back Online


The standby node needs to be brought back Online after you've completed all maintenance activities on that node. This can be done with the startrpdomain command from the standby node:

    root@node02:# startrpdomain hadrdom
    root@node02:# lsrpdomain
    Name    OpState RSCTActiveVersion MixedVersions TSPort GSPort
    hadrdom Online  3.1.0.4           No            12347  12348

Re-issue the lsrpdomain command until the domain changes from Pending online to Online. The domain must reach Online before you proceed to the next step.
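The re-check of lsrpdomain described above can be scripted. The following is a minimal sketch rather than part of the original procedure: the function names domain_opstate and wait_for_domain, the retry count, and the sleep interval are assumptions made for illustration. It also assumes that lsrpdomain prints the column layout shown in this paper, with the domain name in the first column and OpState in the second.

```shell
# Print the OpState column for one domain, reading lsrpdomain output on stdin.
# Assumption: Name is column 1 and OpState is column 2, as in this paper.
# A two-word state such as "Pending online" is reported as "Pending".
domain_opstate() {
    awk -v dom="$1" '$1 == dom { print $2 }'
}

# Hypothetical helper: poll lsrpdomain until DOMAIN reports STATE.
# Usage: wait_for_domain DOMAIN STATE [TRIES] [SLEEP_SECONDS]
# Returns 0 when the state is reached, 1 when the tries are used up.
wait_for_domain() {
    dom=$1
    want=$2
    tries=${3:-30}
    pause=${4:-10}
    while [ "$tries" -gt 0 ]; do
        state=$(lsrpdomain | domain_opstate "$dom")
        [ "$state" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep "$pause"
    done
    return 1
}
```

Run as root on the node being brought back, for example: wait_for_domain hadrdom Online. The same helper can wait for Offline after stoprpnode.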

I. Start the DB2 instance on the Standby node (node02)


As the instance owner, start the DB2 instance on the standby node:

    db2inst1@node02:% db2start

Activate the database if it is not automatically activated:

    db2inst1@node02:% db2 activate database hadrdb

Check that the database comes up in Standby mode and that it reaches a Peer state:

    db2inst1@node02% db2pd -hadr -db hadrdb
    Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05

    HADR Information:
    Role    State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Standby Peer  Sync     0                0

    ConnectStatus ConnectTime                           Timeout
    Connected     Fri Feb 25 11:21:30 2011 (1241544058) 120
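The Role and State check can also be done with a small filter instead of by eye. This is a minimal sketch, assuming the db2pd -hadr layout shown in this paper, in which a header line beginning with Role and State is followed directly by the line of values. hadr_role_state is a hypothetical helper name, not a DB2 command.

```shell
# Print "<Role> <State>" from db2pd -hadr output read on stdin.
# Assumption: the values line directly follows the "Role State ..." header,
# as in the DB2 v9.5/9.7 output shown in this paper.
hadr_role_state() {
    awk '
        found { print $1, $2; exit }               # line after the header: the values
        $1 == "Role" && $2 == "State" { found = 1 } # remember that the header was seen
    '
}
```

On the standby node this could be used as: db2pd -hadr -db hadrdb | hadr_role_state. For the output shown above it prints Standby Peer.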

J. Perform a controlled failover (to move online resources off current primary node)
For a DB2 v9.5/9.7 HADR environment, a controlled failover is performed by issuing the DB2 takeover command on the current standby node:

    db2inst1@node02:% db2 takeover hadr on database hadrdb

Confirm that the HADR roles swap between the nodes and that Peer state is maintained:

    db2inst1@node02% db2pd -hadr -db hadrdb
    Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05

    HADR Information:
    Role    State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Primary Peer  Sync     0                0

    ConnectStatus ConnectTime                           Timeout
    Connected     Fri Feb 25 11:20:58 2011 (1241544058) 120
    .

K. Repeat steps D through F where the new standby node is now the original primary node (node01).
As the instance owner, deactivate the database and stop the instance on the new standby node only:

    db2inst1@node01:% db2 deactivate database hadrdb
    db2inst1@node01:% db2stop force

From the new standby node, run the following as the root user:

    root@node01:# stoprpnode -f node01
    root@node01:# lsrpdomain
    Name    OpState RSCTActiveVersion MixedVersions TSPort GSPort
    hadrdom Offline 3.1.0.4           No            12347  12348

Wait until the OpState of the domain shows Offline, as shown above, before proceeding to the next step. Note that the domain will still show Online if you check from the other node; the 'lsrpnode' command (run from the online node) will show the opposite node as Offline.
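Because steps D through F are repeated with only the node name changed, it can help to write them down in parameterized form. The sketch below only prints the commands for review; it executes nothing. standby_offline_plan is a hypothetical helper name, and the node and database names are supplied by the caller.

```shell
# Print (do not run) the commands of steps D to F for one standby node.
# Usage: standby_offline_plan NODE DATABASE
standby_offline_plan() {
    node=$1
    db=$2
    printf 'as instance owner on %s: db2 deactivate database %s\n' "$node" "$db"
    printf 'as instance owner on %s: db2stop force\n' "$node"
    printf 'as root on %s: stoprpnode -f %s\n' "$node" "$node"
    printf 'as root on %s: lsrpdomain (repeat until OpState is Offline)\n' "$node"
}
```

Calling standby_offline_plan node01 hadrdb lists the four actions for the new standby node in the order they must be issued.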

L. Maintenance Time for the New Standby node (node01)


At this point, the node is in a state where maintenance activities can be performed on the OS, DB2, or TSAMP/RSCT, including any necessary reboots.

M. Online the node and restart the current Standby DB2 instance
Once maintenance activities have been completed, bring the current standby node (node01) back online within the domain, restart the DB2 instance, and where necessary activate the database.

    root@node01:# startrpdomain hadrdom
    root@node01:# lsrpdomain
    Name    OpState RSCTActiveVersion MixedVersions TSPort GSPort
    hadrdom Online  3.1.0.4           No            12347  12348

Re-issue the lsrpdomain command until the domain changes from Pending online to Online.

As the instance owner, start the DB2 instance on the standby node:

    db2inst1@node01:% db2start

Activate the database if it is not automatically activated:

    db2inst1@node01:% db2 activate database hadrdb

Check that the database comes up in Standby mode and that it reaches a Peer state:

    db2inst1@node01% db2pd -hadr -db hadrdb
    Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05

    HADR Information:
    Role    State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Standby Peer  Sync     0                0

    ConnectStatus ConnectTime                           Timeout
    Connected     Fri Feb 25 11:21:30 2011 (1241544058) 120

N. [OPTIONAL] Perform a Controlled Failover (Failback)


If you prefer that your primary HADR database reside on the node it was originally on (node01) before this rolling upgrade procedure, issue a DB2 takeover command from the current standby node (node01):

    db2inst1@node01:% db2 takeover hadr on database hadrdb

Check that the roles swap and Peer state is maintained:

    db2inst1@node01% db2pd -hadr -db hadrdb
    Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05

    HADR Information:
    Role    State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Primary Peer  Sync     0                0

    ConnectStatus ConnectTime                           Timeout
    Connected     Fri Feb 25 11:20:58 2011 (1241544058) 120
    .

    db2inst1@node02% db2pd -hadr -db hadrdb
    Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05

    HADR Information:
    Role    State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Standby Peer  Sync     0                0

    ConnectStatus ConnectTime                           Timeout
    Connected     Fri Feb 25 11:21:30 2011 (1241544058) 120

O. Post Maintenance Task - Activating new RSCT code level (if RSCT was upgraded by any means)
If you've patched the AIX operating system, it's possible that the RSCT software has also been upgraded. If you've applied a Fixpack to the TSAMP software, it's also very likely that the RSCT software has been upgraded. If this is the case, the online domain will show MixedVersions (lsrpdomain) as Yes, and RSCTVersion (lsrpnode) will show the new RSCT level. Assuming you've installed the new code on each server, RSCTVersion should be the same on each server. RSCTVersion will differ from RSCTActiveVersion (lsrpdomain) because the newly installed RSCT level has not been activated yet. If you're expecting a change to the RSCT level but MixedVersions is still set to No, wait a couple of minutes and re-check using lsrpdomain and lsrpnode.

To activate the new RSCT level, issue the following commands:

    root@node01:# export CT_MANAGEMENT_SCOPE=2
    root@node01:# runact -c IBM.PeerDomain CompleteMigration Options=0
    Resource Class Action Response for CompleteMigration
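Before running CompleteMigration, the version columns can be compared mechanically. This is a minimal sketch, assuming the lsrpdomain and lsrpnode column layouts shown in this paper (RSCTActiveVersion in column 3 of lsrpdomain, RSCTVersion in column 3 of lsrpnode, and a single-word OpState). The function names are hypothetical.

```shell
# Print RSCTActiveVersion for one domain, reading lsrpdomain output on stdin.
rsct_active_version() {
    awk -v dom="$1" '$1 == dom { print $3 }'
}

# Print the distinct RSCTVersion values, reading lsrpnode output on stdin.
# The first line is the column header and is skipped.
rsct_node_versions() {
    awk 'NR > 1 && NF >= 3 { print $3 }' | sort -u
}
```

If rsct_node_versions prints a single value and that value differs from rsct_active_version, the new level is installed on every node but has not yet been activated.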

P. Post Maintenance Task - Activate new TSA MP code level (if TSAMP was upgraded, including Fixpack upgrade)
Before activating the new TSA MP code level, ensure that TSAMP (IBM.RecoveryRM) has finished initializing by checking that In Config State is set to TRUE:

    root@node01:# lssrc -ls IBM.RecoveryRM | grep "In Config State"
    In Config State : TRUE

Now activate the new TSA MP level:

    root@node01:# samctrl -m
    Ready to Migrate! Are you Sure? [Y|N]:.
    Y

Q. Post Maintenance Task - Check each component has been migrated completely (if RSCT or TSAMP upgrades performed)
Ensure that MixedVersions is no longer Yes for the Cluster component (RSCT), and that the RSCTActiveVersion shows the same level as the RSCTVersion on each server:

    root@node01:# lsrpdomain
    Name    OpState RSCTActiveVersion MixedVersions TSPort GSPort
    hadrdom Online  3.1.0.4           No            12347  12348

    root@node01:# lsrpnode
    Name   OpState RSCTVersion
    node02 Online  3.1.0.4
    node01 Online  3.1.0.4

Ensure that the Active Version Number (AVN) matches the Installed Version Number (IVN) for TSA MP:

    root@node01:# lssrc -ls IBM.RecoveryRM | grep VN
    Our IVN : 3.2.1.2
    Our AVN : 3.2.1.2
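The IVN/AVN comparison can be reduced to an exit status for use in a checklist script. This is a minimal sketch, assuming the "Our IVN : x" and "Our AVN : x" line format shown in this paper. tsamp_versions_match is a hypothetical helper name.

```shell
# Read "lssrc -ls IBM.RecoveryRM" output on stdin.
# Exit status 0 when both versions are present and equal, 1 otherwise.
tsamp_versions_match() {
    awk -F: '
        /Our IVN/ { gsub(/[ \t]/, "", $2); ivn = $2 }  # installed version
        /Our AVN/ { gsub(/[ \t]/, "", $2); avn = $2 }  # active version
        END { exit !(ivn != "" && ivn == avn) }
    '
}
```

For example: lssrc -ls IBM.RecoveryRM | tsamp_versions_match && echo "TSA MP migration complete".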

R. Post Maintenance Task - Re-enable Critical Resource Protection


Turn critical resource protection (CritRsrcProtMethod) back on:

    root@node01:# chrsrc -c IBM.PeerNode CritRsrcProtMethod=1

Verify the change in setting as shown:

    root@node01:# lsrsrc -c IBM.PeerNode
    Resource Class Persistent Attributes for IBM.PeerNode
    resource 1:
            CritRsrcProtMethod = 1

S. Check the state of your HADR environment


At this point your DB2 resources should be showing Online; the lssam output should look similar to the following:

    root@node01:# lssam
    Online IBM.ResourceGroup:db2_db2inst1_node02_0-rg Nominal=Online
            '- Online IBM.Application:db2_db2inst1_node02_0-rs
                    '- Online IBM.Application:db2_db2inst1_node02_0-rs:node02
    Online IBM.ResourceGroup:db2_db2inst1_node01_0-rg Nominal=Online
            '- Online IBM.Application:db2_db2inst1_node01_0-rs
                    '- Online IBM.Application:db2_db2inst1_node01_0-rs:node01
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
            |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                    |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node02
                    '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node01
            '- Online IBM.ServiceIP:db2ip_9_26_124_22-rs
                    |- Offline IBM.ServiceIP:db2ip_9_26_124_22-rs:node02
                    '- Online IBM.ServiceIP:db2ip_9_26_124_22-rs:node01

Again, confirm the HADR role and state on each server to be sure all is in good order:

    db2inst1@node01% db2pd -hadr -db hadrdb
    Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05

    HADR Information:
    Role    State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Primary Peer  Sync     0                0

    ConnectStatus ConnectTime                           Timeout
    Connected     Fri Feb 25 11:20:58 2011 (1241544058) 120
    .

    db2inst1@node02% db2pd -hadr -db hadrdb
    Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05

    HADR Information:
    Role    State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Standby Peer  Sync     0                0

    ConnectStatus ConnectTime                           Timeout
    Connected     Fri Feb 25 11:21:30 2011 (1241544058) 120

T. [OPTIONAL] Restore/Activate the automation policy previously saved


If the automation policy is incomplete for some reason, you can resolve this by reactivating the automation policy from the XML file saved in section B, using the TSAMP command sampolicy as follows:

    root@node01:# sampolicy -a hadr_policy_YYYYMMDD.xml

Appendix - References and more information


Automated Cluster Controlled HADR (High Availability Disaster Recovery) Configuration Setup using the IBM DB2 High Availability Instance Configuration Utility (db2haicu), by Steve Raspudic, Malaravan Ponnuthurai (IBM Canada Ltd./IBM Toronto Software Lab), June 2009
http://www.ibm.com/developerworks/data/library/long/dm-0907hadrdb2haicu/

IBM Red Book: High Availability and Disaster Recovery Options for DB2 on Linux, UNIX, and Windows, by Whei-Jen Chen, Masafumi Otsuki, Paul Descovich, Selvaprabhu Arumuggharaj, Toshihiko Kubo and Yong Jun Bi (IBM Corporation), February 2009
http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247363.html

IBM Tivoli System Automation for Multiplatforms (Version 3 Release 1) product/technical documentation
http://publib.boulder.ibm.com/tividd/td/IBMTivoliSystemAutomationforMultiplatforms3.1.html

IBM Tivoli System Automation for Multiplatforms (Version 3 Release 2) product/technical documentation
http://www.ibm.com/developerworks/wikis/display/tivolidoccentral/Tivoli+System+Automation+for+Multiplatforms

Reliable Scalable Cluster Technology (RSCT) Administration Guide
http://publib.boulder.ibm.com/infocenter/clresctr

IBM DB2 9.5 and DB2 9.7 for Linux, UNIX, and Windows Information Centers on the Web
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp

Copyright IBM Corporation, 2011

IBM Corporation
Software Group
Route 100
Somers, NY 10589
U.S.A.

Produced in the United States of America
February 2010
All Rights Reserved

Neither this document nor any part of it may be copied or reproduced in any form or by any means or translated into another language, without the prior consent of the above-mentioned copyright owner.

IBM makes no warranties or representations with respect to the content hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. IBM assumes no responsibility for any errors that may appear in this document. The information contained in this document is subject to change without any notice. IBM reserves the right to make any such changes without obligation to notify any person of such revision or changes. IBM makes no commitment to keep the information contained herein up to date.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

Additional Notices Language

This information was developed for products and services offered in the U.S.A. Information about non-IBM products is based on information available at the time of first publication of this document and is subject to change.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

    IBM Director of Licensing
    IBM Corporation
    North Castle Drive
    Armonk, NY 10504-1785
    U.S.A.

For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

    Intellectual Property Licensing
    Legal and Intellectual Property Law
    IBM Japan, Ltd.
    3-2-12, Roppongi, Minato-ku, Tokyo 106-8711 Japan

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems, and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.
