This whitepaper is written for a DB2 v9.5/9.7 Automated Failover HADR environment. The goal is to show how to keep the database available on one server while maintenance of any sort is carried out on the other server.
October 2011
Table of Contents
Introduction
High Level Overview
A. Display Initial System State
B. Save the automation policy to an XML file
C. Disable Critical Resource Protection
D. Deactivate the Standby Database (on node02)
E. Shutdown the DB2 Instance on the Standby Node (node02)
F. Offline the Standby Node (node02)
G. Maintenance time for the Standby node (node02)
H. Bring the Standby Node (node02) back Online
I. Start the DB2 instance on the Standby node (node02)
J. Perform a controlled failover (to move online resources off current primary node)
K. Repeat steps D through F where the new standby node is now the original primary node (node01)
L. Maintenance Time for the New Standby node (node01)
M. Online the node and restart the current Standby DB2 instance
N. [OPTIONAL] Perform a Controlled Failover (Failback)
O. Post Maintenance Task - Activating new RSCT code level (if RSCT was upgraded by any means)
P. Post Maintenance Task - Activate new TSA MP code level (if TSAMP was upgraded, including Fixpack upgrade)
Q. Post Maintenance Task - Check each component has been migrated completely (if RSCT or TSAMP upgrades performed)
R. Post Maintenance Task - Re-enable Critical Resource Protection
S. Check the state of your HADR environment
T. [OPTIONAL] Restore/Activate the automation policy previously saved
Appendix - References and more information
Introduction
This whitepaper provides a step-by-step approach to preparing each server for planned maintenance activities in a rolling upgrade fashion. It is tailored for a DB2 v9.5/9.7 High Availability Disaster Recovery (HADR) environment. This procedure keeps the database accessible throughout the upgrade process; however, automated failover and other automated recovery actions are unavailable until all steps have been completed.
This document assumes that an Automated Failover HADR environment has been created via the db2haicu tool (for details on constructing such an environment, please consult the white paper "Automated cluster controlled HADR configuration setup using the IBM DB2 high availability instance configuration utility", available at www.ibm.com/developerworks/data/library/long/dm0907hadrdb2haicu/index.html).
The rolling upgrade procedure is documented step by step, beginning with step A and concluding with step T; consider each step carefully before deciding to skip it. Finally, an Appendix is included for additional information.
The environment used to build this step-by-step guide was a two-node cluster using TSAMP v3.2.1.2 and RSCT v3.1.0.4. TSAMP was being used to manage/automate a DB2 v9.7.0.4 HADR environment. However, the same procedure applies to other combinations of the same software set.
In the following text, these conventions apply:
Bold indicates commands that you type.
The prompt indicates the userid that will issue the command. For example:
root@node01:# indicates that the command is issued by the 'root' user on node 'node01'.
db2inst1@node02% indicates that the command is issued by the 'db2inst1' user on node 'node02'.
root@node01:# lsrpnode
Name    OpState  RSCTVersion
node02  Online   3.1.0.4
node01  Online   3.1.0.4
root@node01:# lssam
Online IBM.ResourceGroup:db2_db2inst1_node02_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_node02_0-rs
                '- Online IBM.Application:db2_db2inst1_node02_0-rs:node02
Online IBM.ResourceGroup:db2_db2inst1_node01_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_node01_0-rs
                '- Online IBM.Application:db2_db2inst1_node01_0-rs:node01
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node02
                '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node01
        '- Online IBM.ServiceIP:db2ip_10_20_30_40-rs
                |- Offline IBM.ServiceIP:db2ip_10_20_30_40-rs:node02
                '- Online IBM.ServiceIP:db2ip_10_20_30_40-rs:node01
Verify the state of the HADR pair using native DB2 commands, for example:
db2inst1@node01% db2pd -hadr -db hadrdb
Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05
HADR Information:
Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Primary  Peer   Sync      0                 0
ConnectStatus  ConnectTime                           Timeout
Connected      Sun Aug 8 11:00:58 2010 (1241544058)  120
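The role and state check above lends itself to scripting. The following is a minimal sketch, not part of the original procedure: a hypothetical helper, hadr_role_state, reads db2pd -hadr output on standard input and prints the Role and State values. It assumes the output contains a header row beginning with "Role State" followed directly by a row of values, and that Role and State are each a single word.

```shell
#!/bin/sh
# hadr_role_state (hypothetical helper, not a DB2 command):
# read `db2pd -hadr -db <dbname>` output on stdin and print "<Role> <State>"
# taken from the row that follows the "Role State ..." header row.
hadr_role_state() {
    awk '
        grab { print $1, $2; exit }                 # values row: Role, State
        $1 == "Role" && $2 == "State" { grab = 1 }  # header row found
    '
}

# Possible use on either node (assumption: run as the instance owner):
#   db2pd -hadr -db hadrdb | hadr_role_state
```

Under those assumptions, the output shown above for node01 would yield "Primary Peer". Comparing the string on both nodes before and after each step gives a quick way to confirm that exactly one node is Primary and that the pair is in Peer state.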
Verify the change in setting as shown:
root@node01:# lsrsrc -c IBM.PeerNode CritRsrcProtMethod
Resource Class Persistent Attributes for IBM.PeerNode
resource 1:
        CritRsrcProtMethod = 5
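If this verification is to be repeated from a script, the attribute value can be extracted from the lsrsrc output. This is a sketch only; crit_rsrc_prot_method is a hypothetical helper name, and it assumes the attribute is printed as "CritRsrcProtMethod = <value>" on a line of its own, as in the output shown above.

```shell
#!/bin/sh
# crit_rsrc_prot_method (hypothetical helper):
# read `lsrsrc -c IBM.PeerNode CritRsrcProtMethod` output on stdin and
# print only the numeric value of the CritRsrcProtMethod attribute.
crit_rsrc_prot_method() {
    awk -F'=' '
        /CritRsrcProtMethod[ \t]*=/ {
            gsub(/[ \t]/, "", $2)   # strip blanks around the value
            print $2
            exit
        }
    '
}

# Possible use (as root on either node):
#   value=$(lsrsrc -c IBM.PeerNode CritRsrcProtMethod | crit_rsrc_prot_method)
#   [ "$value" = "5" ] || echo "CritRsrcProtMethod is $value, expected 5" >&2
```

The value 5 is the one shown in this step; the same helper can be used later in the procedure, when the document shows the value returning to 1.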
Wait until the OpState of the domain shows Offline (from the standby node), as shown above, before proceeding to the next step. Note that the domain will still show Online if you check from the other node; the 'lsrpnode' command (from the online node) will show the opposite node as Offline.
Re-issue the lsrpdomain command until the domain changes from Pending online to Online. The domain must be Online before you proceed to the next step.
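Re-issuing lsrpdomain by hand can be replaced with a small polling loop. The sketch below is illustrative only: domain_opstate and wait_for_domain are hypothetical helper names, the LSRPDOMAIN variable is an assumption added so the command can be substituted during testing, and the default of 30 attempts at 10-second intervals is arbitrary. It assumes lsrpdomain prints the domain name in the first column and OpState in the second.

```shell
#!/bin/sh
# domain_opstate DOMAIN (hypothetical helper):
# read `lsrpdomain` output on stdin and print the first word of the OpState
# column for DOMAIN. "Pending online" is therefore reported as "Pending".
domain_opstate() {
    awk -v dom="$1" '$1 == dom { print $2; exit }'
}

# wait_for_domain DOMAIN WANTED [TRIES] [DELAY_SECONDS] (hypothetical helper):
# poll lsrpdomain until DOMAIN reports WANTED (for example Online or Offline).
# Returns 0 on success, 1 if the state was not reached within TRIES attempts.
wait_for_domain() {
    dom=$1 wanted=$2 tries=${3:-30} delay=${4:-10}
    while [ "$tries" -gt 0 ]; do
        # LSRPDOMAIN is an assumed override hook; it defaults to lsrpdomain.
        state=$(${LSRPDOMAIN:-lsrpdomain} 2>/dev/null | domain_opstate "$dom")
        [ "$state" = "$wanted" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Possible use (as root on the node being brought back):
#   wait_for_domain hadrdom Online || echo "hadrdom is not Online yet" >&2
```

The same function can wait for Offline after the node is stopped, by passing Offline as the second argument.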
Activate the database if it is not automatically activated:
db2inst1@node02:% db2 activate database hadrdb
Check that the database comes up in Standby mode and that it reaches a Peer state:
db2inst1@node02% db2pd -hadr -db hadrdb
Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05
HADR Information:
Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Standby  Peer   Sync      0                 0
ConnectStatus  ConnectTime                            Timeout
Connected      Fri Feb 25 11:21:30 2011 (1241544058)  120
J. Perform a controlled failover (to move online resources off current primary node)
For a DB2 v9.5/9.7 HADR environment, a controlled failover is performed by issuing the DB2 takeover command on the current standby node:
db2inst1@node02:% db2 takeover hadr on database hadrdb
Confirm that the HADR roles swap on each node and that Peer state is maintained:
db2inst1@node02% db2pd -hadr -db hadrdb
Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05
HADR Information:
Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Primary  Peer   Sync      0                 0
ConnectStatus  ConnectTime                            Timeout
Connected      Fri Feb 25 11:20:58 2011 (1241544058)  120
K. Repeat steps D through F where the new standby node is now the original primary node (node01).
As the instance owner, deactivate the database and stop the instance on the new standby node only:
db2inst1@node01:% db2 deactivate database hadrdb
db2inst1@node01:% db2stop force
From the new standby node, run the following as the root user:
root@node01:# stoprpnode -f node01
root@node01:# lsrpdomain
Name     OpState  RSCTActiveVersion  MixedVersions  TSPort  GSPort
hadrdom  Offline  3.1.0.4            No             12347   12348
Wait until the OpState of the domain shows Offline, as shown above, before proceeding to the next step. Note that the domain will still show online if you check from the other node ... the 'lsrpnode' command (from the online node) will show the opposite node as Offline.
M. Online the node and restart the current Standby DB2 instance
Once maintenance activities have been completed, bring the current standby node (node01) back online within the domain, restart the DB2 instance, and where necessary activate the database.
root@node01:# startrpdomain hadrdom
root@node01:# lsrpdomain
Name     OpState  RSCTActiveVersion  MixedVersions  TSPort  GSPort
hadrdom  Online   3.1.0.4            No             12347   12348
Issue the lsrpdomain command until the domain changes from Pending online to Online.
As the instance owner, start the DB2 instance on the standby node:
db2inst1@node01:% db2start
Activate the database if it is not automatically activated:
db2inst1@node01:% db2 activate database hadrdb
Check that the database comes up in Standby mode and that it reaches a Peer state:
db2inst1@node01% db2pd -hadr -db hadrdb
Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05
HADR Information:
Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Standby  Peer   Sync      0                 0
ConnectStatus  ConnectTime                            Timeout
Connected      Fri Feb 25 11:21:30 2011 (1241544058)  120
Check that the roles swap and Peer state is maintained:
db2inst1@node01% db2pd -hadr -db hadrdb
Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05
HADR Information:
Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Primary  Peer   Sync      0                 0
ConnectStatus  ConnectTime                            Timeout
Connected      Fri Feb 25 11:20:58 2011 (1241544058)  120
db2inst1@node02% db2pd -hadr -db hadrdb
Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05
HADR Information:
Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Standby  Peer   Sync      0                 0
ConnectStatus  ConnectTime                            Timeout
Connected      Fri Feb 25 11:21:30 2011 (1241544058)  120
O. Post Maintenance Task - Activating new RSCT code level (if RSCT was upgraded by any means)
If you've patched the AIX operating system, it's possible the RSCT software has also been upgraded. If you've applied a Fixpack to the TSAMP software, it's also very likely the RSCT software has been upgraded. If this is the case, the online domain will show MixedVersions (lsrpdomain) as Yes, and RSCTVersion (lsrpnode) will show the new RSCT level. Assuming you've installed the new code on each server, RSCTVersion should be the same on each server. RSCTVersion would be different from the RSCTActiveVersion (lsrpdomain) since the newly installed RSCT level has not been activated yet. If you're expecting a change to the RSCT level but MixedVersions is still set to No, wait a couple of minutes and re-check using lsrpdomain and lsrpnode.
To activate the new RSCT level, issue the following commands:
root@node01:# export CT_MANAGEMENT_SCOPE=2
root@node01:# runact -c IBM.PeerDomain CompleteMigration Options=0
Resource Class Action Response for CompleteMigration
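Whether this step is needed can be decided from the MixedVersions column described above. The following sketch is an illustration under stated assumptions, not part of the original procedure: mixed_versions is a hypothetical helper that assumes the first line of lsrpdomain output is the header row and that OpState is a single word (for example Online), so that the columns line up by position.

```shell
#!/bin/sh
# mixed_versions (hypothetical helper):
# read `lsrpdomain` output on stdin and print the value found under the
# MixedVersions header, located by its position in the header row.
mixed_versions() {
    awk '
        NR == 1 {
            # Remember which field of the header row is MixedVersions.
            for (i = 1; i <= NF; i++) if ($i == "MixedVersions") col = i
            next
        }
        col { print $col; exit }   # first data row: print that column
    '
}

# Possible use (as root, with the domain Online):
#   if [ "$(lsrpdomain | mixed_versions)" = "Yes" ]; then
#       echo "RSCT migration pending: run the CompleteMigration action"
#   fi
```

If the helper prints No when an RSCT change is expected, the advice above still applies: wait a couple of minutes and check again before concluding that no migration is required.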
P. Post Maintenance Task - Activate new TSA MP code level (if TSAMP was upgraded, including Fixpack upgrade)
Before activating the new TSA MP code level, ensure TSAMP (IBM.RecoveryRM) has finished initializing by checking that "In Config State" is set to TRUE:
root@node01:# lssrc -ls IBM.RecoveryRM | grep "In Config State"
In Config State : TRUE
Now activate the new TSA MP level:
root@node01:# samctrl -m
Ready to Migrate! Are you Sure? [Y|N]: Y
Q. Post Maintenance Task - Check each component has been migrated completely (if RSCT or TSAMP upgrades performed)
Ensure that MixedVersions is no longer Yes for the Cluster component (RSCT), and that the RSCTActiveVersion shows the same level as the RSCTVersion on each server:
root@node01:# lsrpdomain
Name     OpState  RSCTActiveVersion  MixedVersions  TSPort  GSPort
hadrdom  Online   3.1.0.4            No             12347   12348
root@node01:# lsrpnode
Name    OpState  RSCTVersion
node02  Online   3.1.0.4
node01  Online   3.1.0.4
Ensure that the Active Version Number (AVN) matches the Installed Version Number (IVN) for TSA MP:
root@node01:# lssrc -ls IBM.RecoveryRM | grep VN
Our IVN : 3.2.1.2
Our AVN : 3.2.1.2
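The IVN/AVN comparison can also be expressed as a pass/fail check. This is a minimal sketch under assumptions: tsamp_versions_match is a hypothetical helper, and it assumes the lssrc output contains one "Our IVN : <version>" line and one "Our AVN : <version>" line, as shown above.

```shell
#!/bin/sh
# tsamp_versions_match (hypothetical helper):
# read `lssrc -ls IBM.RecoveryRM` output on stdin. Exit status is 0 when
# "Our IVN" and "Our AVN" are both present and equal, and 1 otherwise.
tsamp_versions_match() {
    awk -F':' '
        /Our IVN/ { ivn = $2; gsub(/[ \t]/, "", ivn) }   # installed version
        /Our AVN/ { avn = $2; gsub(/[ \t]/, "", avn) }   # active version
        END { exit !(ivn != "" && ivn == avn) }
    '
}

# Possible use (as root on each node):
#   lssrc -ls IBM.RecoveryRM | tsamp_versions_match \
#       || echo "TSA MP migration incomplete: IVN and AVN differ" >&2
```

Returning an exit status rather than printing text keeps the check easy to combine with other post-maintenance checks in a single script.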
Verify the change in setting as shown:
root@node01:# lsrsrc -c IBM.PeerNode
Resource Class Persistent Attributes for IBM.PeerNode
resource 1:
        CritRsrcProtMethod = 1
Again confirm HADR role and state on each server to be sure all is in good order:
db2inst1@node01% db2pd -hadr -db hadrdb
Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05
HADR Information:
Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Primary  Peer   Sync      0                 0
ConnectStatus  ConnectTime                            Timeout
Connected      Fri Feb 25 11:20:58 2011 (1241544058)  120
db2inst1@node02% db2pd -hadr -db hadrdb
Database Partition 0 -- Database HADRDB -- Active -- Up 0 days 00:00:05
HADR Information:
Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Standby  Peer   Sync      0                 0
ConnectStatus  ConnectTime                            Timeout
Connected      Fri Feb 25 11:21:30 2011 (1241544058)  120
Copyright IBM Corporation, 2011
IBM Corporation Software Group
Route 100
Somers, NY 10589 U.S.A.
Produced in the United States of America February 2010
All Rights Reserved
Neither this document nor any part of it may be copied or reproduced in any form or by any means or translated into another language, without the prior consent of the above-mentioned copyright owner.
IBM makes no warranties or representations with respect to the content hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. IBM assumes no responsibility for any errors that may appear in this document. The information contained in this document is subject to change without any notice. IBM reserves the right to make any such changes without obligation to notify any person of such revision or changes. IBM makes no commitment to keep the information contained herein up to date.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.
Additional Notices Language
This information was developed for products and services offered in the U.S.A. Information about non-IBM products is based on information available at the time of first publication of this document and is subject to change. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan, Ltd.
3-2-12, Roppongi, Minato-ku, Tokyo 106-8711 Japan
Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems, and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.