
Enterprise Manager Cluster Database Pages

Learning Objective

After completing this topic, you should be able to

identify how to use Enterprise Manager cluster database pages

1. Cluster Database Home page


The Cluster Database Home page serves as a crossroads for managing and monitoring all
aspects of your RAC database. From this page, you can access the other main cluster
database tabs: Performance, Availability, Server, Schema, Data Movement,
Software and Support, and Topology.

Supplement
Selecting the link title opens the resource in a new browser window.

Style considerations
View more information on the style considerations for Oracle 11g Database used
in this course.
On this page, you find General, High Availability, Space Summary, and Diagnostic
Summary sections for information that pertains to your cluster database as a whole. The
number of instances is displayed for the RAC database, in addition to the status.
A RAC database is considered to be up if at least one instance has the database open.
You can access the Cluster Home page by clicking the Cluster link in the General section
of the page.

Graphic
The General section contains options with linked values. These options are Status
as Up, Instances as 2, Availability (%) as 100 (Last 24 hours), and Cluster as
vx_cluster02. Other options in this section are Time Zone as EST, Database
Name as RDBB, and Version as 11.1.0.6.0. This section also contains a View All
Properties link and the Shutdown and Black Out buttons. The Host CPU section
contains a bar with the legends Other and RDBB. The bar's scale is marked at 0, 25,
50, 75, and 100%, and it shows a Load of 0.72. The load is also available as a
separate link. The Active Sessions section contains a bar with a scale of 0, 2, 4, 6,
and 8. The legends for the bar are Wait, User I/O, and CPU. The bar indicates that
the value of Maximum CPU is 8. The Diagnostic Summary section contains the
Interconnect Alerts, ADDM Findings, and Active Incidents options. The values of
all options in this section are zero. The Interconnect Alerts and Active Incidents
options have tick marks. The Space Summary section has Database Size (GB) as
1.645, Problem Tablespaces as 0, and Segment Advisor Recommendations as 0.
These values are hyperlinked. The Policy Violations count is displayed as 0.
Other items of interest include the date of the last RMAN backup, archiving information,
space utilization, and an alert summary. By clicking the link next to the Flashback
Database Logging label, you can go to the Recovery Settings page from where you can
change various recovery parameters.
The Alerts table shows all open recent alerts. Click the alert message in the Message
column for more information about the alert. When an alert is triggered, the name of the
metric for which the alert was triggered is displayed in the Name column.

Graphic
The High Availability section has Last Backup as n/a. This section also has
options with hyperlinked information. These options are Usable Flash Recovery Area
(%) as 94.45 and Flashback Database Logging as Disabled. The section also
contains the Problem Services option as 1 with an exclamation mark. The Alerts
section has a table with columns such as Severity, Target Name, Target Type,
Category, Name, Impact, and Message.
The Related Alerts table provides information about alerts for related targets such as
Listeners and Hosts, and contains details about the message, the time the alert was
triggered, the value, and the time the alert was last checked.
The Policy Trend Overview page (accessed by clicking the Compliance Score link)
provides a comprehensive view of compliance over a period of time for a group or for
targets that contain other targets. Using the tables and graphs, you can easily
watch for trends in progress and changes.

Graphic
The Policy Violations section contains information such as All, Critical Rules
Violated, Critical Security Patches, and Compliance Score (%).
The Security At a Glance page shows an overview of the security health of the enterprise
for all the targets or specific groups. This helps you to quickly focus on security issues by
showing statistics about security policy violations and noting the critical security patches
that have not been applied.

Graphic
In this example, the Target Group selected in the Security At a Glance page is
RDBB. The page displays security information using the Violation Flux and
Compliance Score (%) graphs.
The Job Activity table displays a report of the job executions that shows the scheduled,
running, suspended, and problem (stopped/failed) executions for all Enterprise Manager
jobs on the cluster database.

Graphic
The Job Activity section has a Create Job drop-down list box with a Go button.
This section contains a table for jobs scheduled to start. It contains three columns:
Status, Submitted to the Cluster Database, and Submitted to any member.
There are four rows, one for each status. The four types of status are Scheduled,
Suspended, Running, and Problem. The section on Critical Patch Advisories for
Oracle Homes contains Patch Advisories, Affected Oracle Homes, and Oracle
MetaLink Credentials.
The Instances table lists the instances for the cluster database, their availability, alerts,
policy violations, performance findings, and related ASM Instance. Click an instance
name to go to the Home page for that instance. Click the links in the table to get more
information about a particular alert, advice, or metric.

Graphic
The Home Tab contains an Instances table. It has seven columns named Name,
Status, Alerts, Policy Violations, Compliance Score (%), ASM Instance, and ADDM
Findings. There are 15 Related Links such as Access and All Metrics.

Question
Which area of Enterprise Manager shows the date of the last RMAN backup?
Options:
1. Security at a Glance page
2. Policy Trend Overview page
3. Job Activity table
4. Cluster Database Home page

Answer

Option 1: Incorrect. The Security At a Glance page shows an overview of the
security health of the enterprise for all the targets or specific groups. This helps
you to quickly focus on security issues by showing statistics about security policy
violations and noting the critical security patches that have not been applied.
Option 2: Incorrect. The Policy Trend Overview page, accessed by clicking the
Compliance Score link, provides a comprehensive view of compliance over a period of
time for a group or for targets that contain other targets. Using the tables and graphs,
you can easily watch for trends in progress and changes.
Option 3: Incorrect. The Job Activity table displays a report of the job executions
that shows the scheduled, running, suspended, and problem executions for all
Enterprise Manager jobs on the cluster database.
Option 4: Correct. The Cluster Database Home page serves as a crossroads for
managing and monitoring all aspects of your RAC database. On this page, you
find General, High Availability, Space Summary, and Diagnostic Summary
sections for information that pertains to your cluster database as a whole. Items of
interest include the date of the last RMAN backup, archiving information, space
utilization, and an alert summary.
Correct answer(s):
4. Cluster Database Home page
The Database Instance Home page enables you to view the current state of the instance
by displaying a series of metrics that portray its overall health. This page provides a
launch point for the performance, administration, and maintenance of the instance
environment.
You can access the Database Instance Home page by clicking one of the instance names
from the Instances section of the Cluster Database Home page. This page has basically
the same sections as the Cluster Database Home page.
The difference is that tasks and monitored activities from these pages apply primarily to a
specific instance. For example, clicking the Shutdown button from this page shuts down
only this one instance. However, clicking the Shutdown button from the Cluster Database
Home page gives you the option of shutting down all or specific instances.

Graphic
This example has information about the Database Instance: RDBB_RDBB1.
There are seven tabs: Home, Performance, Availability, Server, Schema, Data
Movement, and Software and Support. The Home tab shows three sections:
General, Host CPU, and Active Sessions. The General section contains the
Shutdown and Black Out buttons. It also shows the Status of the Database
Instance as Up, the Instance Name as RDBB1, and Version as 11.1.0.6.0. In the
Host CPU section, Load is 0.14 and Paging is 0.00. The Active Sessions section
indicates 4 for Maximum CPU.
By scrolling down on this page, you see the Alerts, Related Alerts, Policy Violations, Job
Activity, and Related Links sections. These provide information similar to that provided in
the same sections of the Cluster Database Home page.
The Server tab is the classical Administration page, with one important difference from its
corresponding single-instance version: each database-level task is prefixed with a
small icon representing a cluster database.

Graphic
Some of the sections of the Server tab are Storage, Database Configuration,
Oracle Scheduler, Statistics Management, Resource Manager, and Security.
Storage has links such as Control Files and Tablespaces. Database Configuration
has links such as Memory Advisors. Oracle Scheduler has links such as Jobs and
Chains. Statistics Management has links such as AWR Baselines. Resource
Manager has links such as Getting Started and Consumer Groups. The Security
section has links such as Users, Roles, Profiles, and Audit Settings.
The Cluster Home page is accessed by clicking the Cluster link located in the General
section of the Cluster Database Home page. The cluster is represented as a composite
target composed of nodes and cluster databases. An overall summary of the cluster is
provided here.
The Cluster Home page displays several sections including General, Configuration, and
Diagnostic Summary.
The Cluster Home page also has Cluster Databases, Alerts, and Hosts.
The General section provides a quick view of the status of the cluster, providing basic
information such as current Status, Availability (%), Up nodes, and Clusterware Home
and Version.
The Configuration section allows you to view the operating systems (including Hosts and
OS Patches) and hardware (including Hardware configuration and Hosts) for the cluster.
The Cluster Databases table displays the cluster databases (optionally associated with
corresponding services) associated with this cluster, their availability, and any alerts on
those databases. The Alerts table provides information about any alerts that have been
issued along with the severity rating of each.
It also includes a Hosts table that displays the hosts for the cluster, their availability,
corresponding alerts, CPU and memory utilization percentage, and total I/O per second.

Graphic
The Cluster Databases table has six columns. These are named Name, Status,
Alerts, Policy Violations, Compliance Score (%), and Version. The Alerts section
has a table with eight columns. They are Severity, Target Name, Target Type,
Category, Name, Impact, Message, and Alert Triggered.

2. Configuration section
The Cluster Home page is invaluable for locating configuration-specific data. Locate the
Configuration section on the Cluster Home page. The View drop-down list allows you to
inspect hardware and operating system overview information.
Click the Hosts link, and then click the Hardware Details link of the host that you want.

Graphic
The Configuration section has a View drop-down list box and a Collection
Problems table. It has two columns named Hardware and Hosts. The values in
this table are i686 GenuineIntel i686 and 2. Clicking the 2 in the Hosts column
from the Configuration page opens the Hardware: i686 GenuineIntel i686 in
composite target vx_cluster02 page. This page contains a table with the Host,
Operating System, and Hardware Details columns. This table contains two rows,
with the Host column containing vx0306.us.oracle.com and vx0313.us.oracle.com.
There are infinity icons in the Hardware Details column.
On the Hardware Details page, you find detailed information regarding your CPU, disk
controllers, network adapters, and so on. This information can be very useful when
determining the Linux patches for your platform.

Graphic
This page contains options such as Hostname and System Configuration. This
page also contains three tables: CPUs, IO Devices, and Network Interfaces. The
CPUs table has six columns, of which five are displayed: CPU Speed (MHz),
Vendor, PROM Revision, ECACHE (MB), and CPU Implementation. The IO
Devices table has five columns, of which four are displayed: Name, Vendor, Bus
Type, and Frequency (MHz). The Network Interfaces table has eight columns, of
which seven are displayed: Name, INET Address, Maximum Transfer Unit,
Broadcast Address, Mask, Flags, and MAC Address.
Click History to access the hardware history information for the host. Some hardware
information is not available, depending on the hardware platform.

Note
The Local Disk Capacity (GB) field shows the disk space that is physically
attached (local) to the host. This value does not include disk space that may be
available to the host through networked file systems.
The Operating System Details General page displays the following operating system
details for a host:

Graphic
The Configuration window contains the View drop-down list box and the Collection
Problems table. The table contains the Operating Systems, Hosts, and OS
Patches columns. The table contains a row listing Red Hat Enterprise Linux AS
release 4 (Nahant Update 5) 2.6.16.29 xenU with 2 Hosts.
The Operating System page contains a three-column table with Host, Hardware,
and Operating System Details columns. The table contains two rows containing
information about two hosts, the first of which is vx0306.us.oracle.com. The table
also contains an infinity icon in the Operating System Details column. Clicking this
icon opens the Operating System Details page.

general information, such as the distributor version and the maximum swap space of the
operating system, and

information about operating system details
The Source column displays where Enterprise Manager obtained the value for each
operating system property.

Graphic
The Operating System Details page contains information such as Host and
Vendor. The page also contains three tabs: General, File Systems, and
Packages. The General tab includes Operating System Properties.
To see a list of changes to the operating system properties, click History.
The Operating System Details File Systems page displays the following information about
one or more file systems for the selected hosts:

Graphic
The File Systems tab includes a table containing four columns named Resource
Name, Type, Mount Location, and Mount Options.

name of the file system on the host

type of mounted file system, for example, ufs or nfs

directory where the file system is mounted, and

the mount options for the file system, for example ro, nosuid, or nobrowse
The Operating System Details Packages page displays information about the operating
system packages that have been installed on a host.

Graphic
The Packages tab includes package details such as 4Suite 1.0 - 3.
The Oracle Enterprise Manager Topology Viewer enables you to visually see the
relationships between target types for each host of your cluster database. You can zoom
in or out, pan, and see selection details. These views can also be used to launch various
administration functions.

Graphic
The Topology tab has the Overview section that contains pictorial representations
of the cluster. The Selection Details section contains the Name, Type, Host, and
Critical Alerts.
The Topology Viewer populates icons on the basis of your system configuration. If a
listener is serving an instance, a line connects the listener icon and the instance icon.
Possible target types are

interface

listener

ASM instance, and

database instance
If the Show Configuration Details option is not selected, the topology shows the
monitoring view of the environment, which includes general information such as alerts
and overall status. If you select the Show Configuration Details option, additional
details are shown in the Selection Details window, which are valid for any topology view.
For instance, the Listener component would also show the machine name and port
number. You can click an icon and then right-click to display a menu of available actions.

You can use Enterprise Manager to administer alerts for RAC environments. Enterprise
Manager distinguishes between database- and instance-level alerts in RAC
environments. Enterprise Manager also responds to metrics from across the entire RAC
database and publishes alerts when thresholds are exceeded.
Enterprise Manager interprets both predefined and customized metrics. You can also
copy customized metrics from one cluster database instance to another or from one RAC
database to another. A recent alert summary can be found on the Database Control
Home page. Notice that alerts are sorted by relative time and target name.
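Outside Enterprise Manager, you can also inspect the server-generated alerts directly from
SQL*Plus. The following is a minimal sketch using the standard DBA_OUTSTANDING_ALERTS
view; the column selection is illustrative only.

Code
-- List currently outstanding server-generated alerts for the database
SELECT object_type, object_name, reason
FROM dba_outstanding_alerts
ORDER BY object_type, object_name;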

Graphic
There are five sections: Alerts, Related Alerts, Policy Violations, Security, and
Job Activity. The Alerts section has a table with seven columns. They are Severity,
Target Name, Target Type, Category, Name, Impact, and Message. The table
contains five rows with RDBB_RDBB1 and RDBB_RDBB2 under the Target Name
column.
Alert thresholds for instance-level alerts, such as archive log alerts, can be set at the
instance target level. This enables you to receive alerts for the specific instance if
performance exceeds your threshold.

Graphic
The Metric and Policy Settings pages for Cluster Database RDBB and Database
Instance RDBB_RDBB1 are open. Each contains the Metric Thresholds and Policies
tabs. The Metric Thresholds tabbed page is open. This page contains a table that
displays the Metrics with thresholds for the cluster database and the database
instance, respectively.
You can also configure alerts at the database level, such as setting alerts for tablespaces.
This enables you to avoid receiving duplicate alerts at each instance.
It is also possible to view the metric across the cluster in a comparative or overlay
fashion. To view this information, click the Compare Targets link at the bottom of the
corresponding metric page. When the Compare Targets page appears, choose the
instance targets that you want to compare by selecting them and then clicking the Move
button. If you want to compare the metric data from all targets, then click the Move All
button. After making your selections, click the OK button to continue.
The Metric summary page appears next. Depending on your needs, you can accept the
default timeline of 24 hours or select a more suitable value from the View Data drop-down
list. If you want to add a comment regarding the event for future reference, then enter a
comment in the Comment for Most Recent Alert field, and then click the Add Comment
button.

Graphic
The Compare Targets page includes the Available Targets and Selected Targets
sections.
In a RAC environment, you can see a summary of the alert history for each participating
instance directly from the Cluster Database Home page.
The following steps are performed in the drill-down process:

click the Alert History link in the Related Links section of the Cluster Database Home page

check the Alert History page, where the summary for both instances is displayed
The Alert History page includes two database instances with a pictorial representation of the
history. These instances are RDBB_RDBB1 and RDBB_RDBB2.

click one of the instance's links to go to the corresponding Alert History page for that instance,
and
The Alert History page of the RDBB_RDBB1 database instance contains a table with two
columns, Metric and History. The table contains six rows that include links such as Audited User,
Database Time Spent Waiting (%), Global Cache Average Current Block Request Time (centiseconds), Global Cache Blocks Lost, Instance Status, and Unmounted.

access a corresponding alert page by choosing the alert of your choice


You can use Enterprise Manager to define blackouts for all managed targets of your RAC
database to prevent alerts from being recorded. Blackouts are useful when performing
scheduled or unscheduled maintenance or other tasks that might trigger extraneous or
unwanted events. You can define blackouts for an entire cluster database or for specific
cluster database instances.
You can perform the following steps to create a blackout event:

click the Setup link at the top of any Enterprise Manager page and then click the Blackouts link on
the left, which shows the Blackouts page

click the Create button, which shows the Create Blackout: Properties page
The Create Blackout: Properties page contains a pictorial representation of the steps for creating a
Blackout.

enter a name or tag in the Name field; you can also enter a descriptive comment in the
Comments field
The page also contains the Name and Comments text boxes with the Reason drop-down list box
and a Run jobs during the blackout checkbox.

enter a reason for the blackout in the Reason field

choose a target Type from the drop-down list in the Targets area of the Properties page and click
the Cluster Database in the Available Targets list (in this example the entire Cluster Database
RDBB is chosen)

click the Move button to move your choice to the Selected Targets list, and

click the Next button to continue


The Create Blackout: Member Targets page appears next. Expand the Selected
Composite Targets tree and ensure that all targets that must be included appear in the
list. Continue and define your schedule as you normally would.

Graphic
This page contains a table with the Name, Type, and Blackout columns and four
rows. The first row contains Selected Composite Targets under Name and nothing
under the Type and Blackout columns. The second row contains RDBB under the
Name column, Cluster Database under the Type column, and a drop-down list box
under the Blackout column. The drop-down list box has three options: All current
member targets, Full blackout (all members at blackout start time), and Selected
member targets. The Full blackout (all members at the blackout start time) option
is selected. The third row contains RDBB_RDBB1 under Name, Database
Instance under Type and a selected checkbox under the Blackout column. The
fourth row contains RDBB_RDBB2 under Name, Database Instance under Type
and a selected checkbox under the Blackout column.

Question
You are performing scheduled maintenance on a cluster database instance and
want to avoid recording unwanted events. What can be done in Enterprise
Manager to prevent these events from being recorded?
Options:
1. Define a blackout
2. Create an instance-level alert
3. Create a database-level alert
4. Modify the alert history for the instance

Answer
Option 1: Correct. You can use Enterprise Manager to define blackouts for all
managed targets of your RAC database to prevent alerts from being recorded.
Blackouts are useful when performing scheduled or unscheduled maintenance or
other tasks that might trigger extraneous or unwanted events. You can define
blackouts for an entire cluster database or for specific cluster database instances.
Option 2: Incorrect. Alert thresholds for instance-level alerts, such as archive log
alerts, can be set at the instance target level. This enables you to receive alerts for
the specific instance if performance exceeds your threshold.
Option 3: Incorrect. You can configure alerts at the database level. For example,
you can set alerts for tablespaces. This enables you to avoid receiving duplicate
alerts at each instance.
Option 4: Incorrect. In a RAC environment, you can see a summary of the alert
history for each participating instance directly from the Cluster Database Home
page. However, you cannot modify the alert history.
Correct answer(s):
1. Define a blackout

Summary
The Cluster Database Home page acts as a crossroads for managing and monitoring the
RAC database. From this page, you can access different tabbed pages such as
Performance, Availability, Server, Schema, Data Movement, Software and Support, and
Topology. You also find General, High Availability, Space Summary, and Diagnostic
Summary sections.
On the Cluster Database Home page, the configuration section lets you inspect hardware
and operating system overview information. You can also access hardware history
information for the host. However, the amount of information available depends on the
hardware platform. The Oracle Enterprise Manager Topology Viewer enables you to
visually see the relationships between target types for each host of your cluster database.
You can use Enterprise Manager to administer alerts for RAC environments, where it
distinguishes between database-level and instance-level alerts. Blackouts for extraneous or
unwanted alerts can also be configured.

Style considerations
Although certain aspects of the Oracle 11g Database are case and spacing insensitive, a common coding
convention has been used throughout all aspects of this course.
This convention uses lowercase characters for schema, role, user, and constraint names, and for
permissions, synonyms, and table names (with the exception of the DUAL table). Lowercase characters
are also used for column names and user-defined procedure, function, and variable names shown in
code.

Uppercase characters are used for Oracle keywords and functions, for view, table, schema, and column
names shown in text, for column aliases that are not shown in quotes, for packages, and for data
dictionary views.
The spacing convention requires one space after a comma and one space before and after operators that
are not Oracle-specific, such as +, -, /, and <. There should be no space between an Oracle-specific
keyword or operator and an opening bracket, between a closing bracket and a comma, between the last
part of a statement and the closing semicolon, or before a statement.
String literals in single quotes are an exception to all of the convention rules provided here. Please use
this convention for all interactive parts of this course.

Start and Stop RAC Databases and Instances


Learning Objective

After completing this topic, you should be able to

recognize how to start and stop RAC databases and instances

1. RAC Databases and Instances


With Real Application Clusters or RAC, each instance writes to its own set of online redo
log files, and the redo written by an instance is called a thread of redo, or thread.
Thus, each redo log file group used by an instance is associated with the same thread
number determined by the value of the THREAD initialization parameter. If you set the
THREAD parameter to a nonzero value for a particular instance, the next time the instance
is started, it will try to use that thread.

Graphic
In this example, there are two nodes, Node1 and Node2, which contain the RAC01
and RAC02 instances. These nodes are connected using a two-way dotted line.
Node1 is also connected to Thread 1 containing groups 1 to 3. Node2 is
connected to Thread 2 containing groups 4 and 5. A dotted line from RAC02 also
connects to THREAD 1. The space between the threads contains a box labeled
Shared storage. This box contains an SPFILE with RAC01.THREAD=1 and
RAC02.THREAD=2.
Because an instance can use a thread as long as that thread is enabled and not in use by
another instance, it is recommended to set the THREAD parameter to a nonzero value,
with each instance using a different value.
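For example, a minimal sketch of assigning instance-specific THREAD values in the shared
SPFILE, using the instance names from the figure:

Code
ALTER SYSTEM SET thread = 1 SCOPE=SPFILE SID='RAC01';
ALTER SYSTEM SET thread = 2 SCOPE=SPFILE SID='RAC02';
-- The new thread assignments take effect the next time each instance starts.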

Graphic
The Create Redo Log Group page contains the Group # text box with the value 5.
This page also contains the File size text box with the value 51200 entered and
KB selected in the corresponding drop-down list box. The Thread # text box
contains the value 1.
You associate a thread number with a redo log file group by using the ALTER DATABASE
ADD LOGFILE THREAD statement. You enable a thread number by using the ALTER
DATABASE ENABLE THREAD statement. Before you can enable a thread, it must have at
least two redo log file groups.

Code
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 4;
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 5;
ALTER DATABASE ENABLE THREAD 2;
By default, a database is created with one enabled public thread. An enabled public
thread is a thread that has been enabled by using the ALTER DATABASE ENABLE
PUBLIC THREAD statement.
Such a thread can be acquired by an instance with its THREAD parameter set to zero.
Therefore, you need to create and enable additional threads when you add instances to
your database.
The maximum possible value for the THREAD parameter is the value assigned to the
MAXINSTANCES parameter specified in the CREATE DATABASE statement.

Note
You can use Enterprise Manager to administer redo log groups in a RAC
environment.
The Oracle database automatically manages undo segments within a specific undo
tablespace that is assigned to an instance.
Under normal circumstances, only the instance assigned to the undo tablespace can
modify the contents of that tablespace. However, all instances can always read all undo
blocks for consistent-read purposes.
Also, any instance can update any undo tablespace during transaction recovery, as long
as that undo tablespace is not currently used by another instance for undo generation or
transaction recovery.

Graphic
This example includes two interconnected nodes: Node1 containing the
instance RAC01 and Node2 containing the instance RAC02. RAC01 is also
connected to three undo tablespaces: undotbs1, undotbs2, and undotbs3. RAC02
is connected to the undo tablespaces undotbs3 and undotbs2. RAC01 and RAC02
can perform consistent reads on all the connected undo tablespaces. The two
instances can also update all the connected undo tablespaces during transaction
recovery.
You assign undo tablespaces in your RAC database by specifying a different value for
the UNDO_TABLESPACE parameter for each instance in your SPFILE or individual PFILEs.
If you do not set the UNDO_TABLESPACE parameter, then each instance uses the first
available undo tablespace. If undo tablespaces are not available, the SYSTEM rollback
segment is used.

Graphic
The code to assign undo tablespaces in your RAC database is the following:
...
RAC01.UNDO_TABLESPACE=undotbs3
RAC02.UNDO_TABLESPACE=undotbs2
...
You can dynamically switch undo tablespace assignments by executing the ALTER
SYSTEM SET UNDO_TABLESPACE statement with the SID clause. You can run this
command from any instance. In this example, the following steps are performed:

Graphic
The command that executes the ALTER SYSTEM SET UNDO_TABLESPACE
statement is the following:
ALTER SYSTEM SET UNDO_TABLESPACE=undotbs3 SID='RAC01';

the previously used undo tablespace assigned to instance RAC01 remains assigned to it until the
RAC01 instance's last active transaction commits and

the pending offline tablespace may be unavailable for other instances until all transactions
against that tablespace are committed

Note
You cannot simultaneously use Automatic Undo Management or AUM and manual
undo management in a RAC database. It is highly recommended that you use the
AUM mode.
In a RAC environment, multiple instances can have the same RAC database open at the
same time. Also, shutting down one instance does not interfere with the operation of other
running instances.
The procedures for starting up and shutting down RAC instances are identical to the
procedures used in single-instance Oracle. There is only one exception.
The SHUTDOWN TRANSACTIONAL command with the LOCAL option is useful to shut down
an instance after all active transactions on the instance have either committed or rolled
back.
Transactions on other instances do not block this operation. If you omit the LOCAL option,
then this operation waits until transactions on all other instances that started before the
shutdown are issued either a COMMIT or a ROLLBACK.
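A minimal sketch of this command, issued from a SQL*Plus session on the instance you
want to stop:

Code
-- Wait only for this instance's active transactions, then shut down
SHUTDOWN TRANSACTIONAL LOCAL
-- Without LOCAL, the shutdown also waits for transactions that were
-- started on the other instances before the command was issued.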
You can start up and shut down instances by using Enterprise Manager, SQL*Plus, or
Server Control, which is referred to as SRVCTL. Both Enterprise Manager and SRVCTL
provide options to start up and shut down all the instances of a RAC database with a
single step.
Shutting down a RAC database mounted or opened by multiple instances means that you
need to shut down every instance accessing that RAC database. However, having only
one instance opening the RAC database is enough to declare the RAC database open.
If you want to start up or shut down only one instance, and if you are connected to your
local node, you must first ensure that your current environment includes the SID for the
local instance. To start up or shut down your local instance, initiate a SQL*Plus session
connected as SYSDBA or SYSOPER, and then issue the required command (for example,
STARTUP).
You can start multiple instances from a single SQL*Plus session on one node by way of
Oracle Net Services. To achieve this, you must connect to each instance by using a Net
Services connection string (typically an instance-specific alias from your tnsnames.ora
file). For example, you can use a SQL*Plus session on a local node to shut down two
instances on remote nodes by connecting to each using the instance's individual alias
name.

Code

[stc-raclin01] $ echo $ORACLE_SID
RACDB1
sqlplus / as sysdba
SQL> startup
SQL> shutdown
[stc-raclin02] $ echo $ORACLE_SID
RACDB2
sqlplus / as sysdba
SQL> startup
SQL> shutdown
/*OR*/
[stc-raclin01] $ sqlplus / as sysdba
SQL> startup
SQL> shutdown
SQL> connect sys/oracle@RACDB2 as sysdba
SQL> startup
SQL> shutdown
Consider the example that assumes that the alias name for the second instance is
RACDB2. In this example, there is no need to connect to the first instance using its
connect descriptor because the command is issued from the first node with the correct
ORACLE_SID.
It is not possible to start up or shut down more than one instance at a time in SQL*Plus,
so you cannot start or stop all the instances for a cluster database with a single SQL*Plus
command. To verify that instances are running on any node, look at
V$ACTIVE_INSTANCES.
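For example, a quick check that can be run from any instance (a minimal sketch):

Code
-- One row is returned for each running instance of the cluster database
SELECT inst_number, inst_name FROM v$active_instances;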


Note
SQL*Plus is integrated with Oracle Clusterware to make sure that corresponding
resources are correctly handled during start up and shut down of instances by
using SQL*Plus.
To start and stop RAC instances with SRVCTL, you can use certain commands.

Code
srvctl start|stop database -d <db_name>
  [-o <open|mount|nomount|normal|transactional|immediate|abort>]
  [-c <connect_str> | -q]
srvctl start|stop instance -d <db_name> -i <inst_name_list>
  [-o <open|mount|nomount|normal|transactional|immediate|abort>]
  [-c <connect_str> | -q]
srvctl start|stop database
The srvctl start database command starts a cluster database, its enabled instances,
and its services. The srvctl stop database command stops a database, its instances,
and its services.
srvctl start|stop instance
The srvctl start instance command starts instances of a cluster database. This
command also starts all enabled and nonrunning services that have the listed instances
either as preferred or as available instances. The srvctl stop instance command
stops instances as well as all enabled and running services that have these instances as
either preferred or available instances. You must disable an object that you intend to keep
stopped after you issue an srvctl stop command; otherwise, Oracle Clusterware or OC
can restart it as a result of another planned operation.
For commands that use a connect string, if you do not provide a connect string, then
SRVCTL uses / as sysdba to perform the operation. The -q option asks for a connect
string from standard input. SRVCTL does not support concurrent executions of commands
on the same object.


Therefore, run only one SRVCTL command at a time for each database, service, or other
object. To use the START and STOP options of the SRVCTL command, your service must
be an OC-enabled, nonrunning service.

Code
srvctl start instance -d RACDB -i RACDB1,RACDB2
srvctl stop instance -d RACDB -i RACDB1,RACDB2
srvctl start database -d RACDB -o open

Question
Which two utilities provide options to start up and shut down all the instances of a
RAC database with a single step?
Options:
1. AUM
2. SQL*Plus
3. Server Control
4. Enterprise Manager

Answer
Option 1: Incorrect. Automatic Undo Management or AUM is not a utility that
provides options to start up and shut down instances in a RAC database. AUM
enables the Oracle database to automatically manage undo segments within a
specific undo tablespace that is assigned to an instance.
Option 2: Incorrect. You can only start up or shut down one instance at a time
using SQL*Plus. You cannot start or stop all the instances for a cluster database
with a single SQL*Plus command.
Option 3: Correct. You can start up and shut down instances by using Enterprise
Manager, SQL*Plus, or Server Control or SRVCTL. Both Enterprise Manager and
SRVCTL provide options to start up and shut down all the instances of a RAC
database with a single step. The srvctl start database command starts a
cluster database, its enabled instances, and its services. The srvctl stop
database command stops a database, its instances, and its services.
Option 4: Correct. You can start up and shut down instances by using Enterprise
Manager, SQL*Plus, or Server Control or SRVCTL. Both Enterprise Manager and
SRVCTL provide options to start up and shut down all the instances of a RAC
database with a single step.
Correct answer(s):
3. Server Control
4. Enterprise Manager
By default, Oracle Clusterware is configured to start the VIP, listener, instance, ASM,
database services, and other resources during system boot. It is possible to modify some
resources to have their AUTO_START profile parameter set to the value of 2.
This means that after node reboot, or when Oracle Clusterware is started, resources with
AUTO_START=2 need to be started manually via srvctl. This is designed to assist in
problem troubleshooting and system maintenance.
Starting with Oracle Database 10g Release 2, when changing resource profiles through
srvctl, the command tool automatically modifies the profile attributes of other
dependent resources given the current prebuilt dependencies. To accomplish this, use
the command shown to modify databases.

Code
srvctl modify database -d <dbname> -y AUTOMATIC|MANUAL
To implement Oracle Clusterware and Real Application Clusters, it is best to have Oracle
Clusterware start the defined Oracle resources during system boot, which is the default.
Consider these two examples. The first example uses the srvctl config database
command to display the current policy for the RACB database. As you can see, it is
currently set to its default AUTOMATIC.
The second statement uses the srvctl modify database command to change the
current policy to MANUAL for the RACB database.

Code
$ srvctl config database -d RACB -a
ex0044 RACB1 /u01/app/oracle/product/10.2.0/db_1
ex0045 RACB2 /u01/app/oracle/product/10.2.0/db_1
DB_NAME: RACB
ORACLE_HOME: /u01/app/oracle/product/10.2.0/db_1
SPFILE: +DGDB/RACB/spfileRACB.ora
DOMAIN: null
DB_ROLE: null

START_OPTIONS: null
POLICY: AUTOMATIC
ENABLE FLAG: DB ENABLED
$
When you add a new database by using the srvctl add database command, that
database is by default placed under the control of Oracle Clusterware using the
AUTOMATIC policy.
However, to directly set the policy to MANUAL, you can use the add database statement.

Code
srvctl add database -d RACZ -y MANUAL

Note
You can also use this procedure to configure your system to prevent Oracle
Clusterware from autorestarting failed database instances more than once.
The following steps depict how you can add and remove redo log groups in a RAC
database environment.
You want to use Database Control to create two new redo log groups in your database.
The two groups must pertain to the thread number three, and each group must have only
one 51200 KB member called redo05.log and redo06.log, respectively. You perform the
following steps:

from Database Control Home page, click the Server tab

on the Cluster Database Server page, click Redo Log Groups in the Storage section

on the Redo Log Groups page, click Create and

on the Create Redo Log Group page, leave the current value of the Group # field as it is (5).
Make sure that the File size field is set to 51200 KB. Set the Thread # field to 3. When you are
finished, click OK.
After clicking OK on the Create Redo Log Group page you are taken back to the Redo
Log Groups page. Here you see a confirmation message that a new object was
successfully created.
You then perform the following steps:

on the Redo Log Groups page, click Create

on the Create Redo Log Group page, leave the current value of the Group # field as it is (6).
Make sure that the File size field is set to 51200 KB. Set the Thread # field to 3. When you are
finished, click OK and

this takes you back to the Redo Log Groups page where you should again see a confirmation
message indicating the successful creation of a new object
You determine that you need to destroy redo thread number three. Make sure that in the
end both instances are up and running and managed by Oracle Clusterware.
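For reference, a SQL*Plus sketch of the equivalent DDL is shown here; the ASM file
destination is hypothetical, and the statements assume that groups 5 and 6 do not
already exist:

Code
ALTER DATABASE ADD LOGFILE THREAD 3
  GROUP 5 ('+DGDB/RDBB/redo05.log') SIZE 51200K;
ALTER DATABASE ADD LOGFILE THREAD 3
  GROUP 6 ('+DGDB/RDBB/redo06.log') SIZE 51200K;
ALTER DATABASE ENABLE THREAD 3;
-- The supplement script below disables the thread and drops both groups again.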

Supplement
Selecting the link title opens the resource in a new browser window.

Oracle Clusterware
View the code required to alter the database.

Summary
With Real Application Clusters or RAC, each instance writes to its own set of online redo
log files, and the redo written by an instance is called a thread of redo, or thread. Before
you can enable a thread, it must have at least two redo log file groups. By default, a
database is created with one enabled public thread.
In a RAC environment, multiple instances can have the same RAC database open at the
same time, and shutting down one instance does not interfere with the operation of other
running instances. You can start up and shut down instances by using Enterprise
Manager, SQL*Plus, or Server Control or SRVCTL. Both Enterprise Manager and SRVCTL
provide options to start up and shut down all the instances of a RAC database with a
single step.

Oracle Clusterware
y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`
z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`
DBNAME=`ps -ef | grep dbw0_RDB | grep -v grep | grep -v callout1 | awk '{ print $8 }' | sed 's/1/''/' | sed 's/ora_dbw0_/''/'`
I1NAME=$DBNAME"1"
I2NAME=$DBNAME"2"
export ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1
export ORACLE_SID=$I1NAME
echo "Reset thread to 2 for second instance ..."
$ORACLE_HOME/bin/sqlplus -s /NOLOG <<EOF
connect / as sysdba
ALTER SYSTEM SET thread = 2 SCOPE=SPFILE SID='$I2NAME';
EOF
echo "Stop second instance ..."
/u01/crs11g/bin/srvctl stop instance -d $DBNAME -i $I2NAME
echo "Restart second instance ..."
/u01/crs11g/bin/srvctl start instance -d $DBNAME -i $I2NAME
echo "Removing thread 3 from database ..."
$ORACLE_HOME/bin/sqlplus -s /NOLOG <<EOF
connect / as sysdba
alter database disable thread 3;
alter database drop logfile group 5;
alter database drop logfile group 6;
EOF
/u01/crs11g/bin/crs_stat -t
Reset thread to 2 for second instance ...
System altered.
Stop second instance ...
Restart second instance ...
Removing thread 3 from database ...
Database altered.

Database altered.

Database altered.
Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    vx0306
ora....B2.inst application    ONLINE    ONLINE    vx0313
ora.RDB.db     application    ONLINE    ONLINE    vx0313
ora....SM1.asm application    ONLINE    ONLINE    vx0306
ora....06.lsnr application    ONLINE    ONLINE    vx0306
ora.vx0306.gsd application    ONLINE    ONLINE    vx0306
ora.vx0306.ons application    ONLINE    ONLINE    vx0306
ora.vx0306.vip application    ONLINE    ONLINE    vx0306
ora....SM2.asm application    ONLINE    ONLINE    vx0313
ora....13.lsnr application    ONLINE    ONLINE    vx0313
ora.vx0313.gsd application    ONLINE    ONLINE    vx0313
ora.vx0313.ons application    ONLINE    ONLINE    vx0313
ora.vx0313.vip application    ONLINE    ONLINE    vx0313
[oracle@vx0306 less04]$

Modify Initialization Parameters in a RAC Environment


Learning Objective

After completing this topic, you should be able to

recognize how to modify initialization parameters in a RAC environment

1. Modifying initialization parameters


When you create the database, the DBCA creates an SPFILE in the file location that you
specify. This location can be an Automatic Storage Management or ASM disk group,
cluster file system file, or a shared raw device. If you manually create your database, then
it is recommended to create an SPFILE from a PFILE.
All instances in the cluster database use the same SPFILE at startup. Because the
SPFILE is a binary file, do not edit it. Instead, change the SPFILE parameter settings by
using Enterprise Manager or ALTER SYSTEM SQL statements.
RAC uses a traditional PFILE only if an SPFILE does not exist or if you specify PFILE in
your STARTUP command. Using an SPFILE simplifies administration, keeps parameter
settings consistent, and guarantees that parameter settings persist across database
shutdown and startup. In addition, you can configure RMAN to back up your SPFILE.
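A minimal RMAN sketch of such a backup, assuming RMAN is already connected to the
target database:

Code
RMAN> BACKUP SPFILE;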
In order for each instance to use the same SPFILE at startup, each instance uses its own
PFILE file that contains only one parameter called SPFILE. The SPFILE parameter
points to the shared SPFILE on your shared storage.
By naming each PFILE using the init<SID>.ora format, and by putting them in the
$ORACLE_HOME/dbs directory of each node, a STARTUP command uses the shared
SPFILE.
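For example, a minimal sketch of the local PFILE on the first node; the ASM path shown
is hypothetical:

Code
# $ORACLE_HOME/dbs/initRACDB1.ora
SPFILE='+DATA/RACDB/spfileRACDB.ora'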
You can modify the value of your initialization parameters by using the ALTER SYSTEM
SET command. This is the same as with a single-instance database except that you have
the possibility to specify the SID clause in addition to the SCOPE clause.
By using the SID clause, you can specify the SID of the instance where the value takes
effect. Specify SID='*' if you want to change the value of the parameter for all
instances. Specify SID='sid' if you want to change the value of the parameter only for
the instance sid.
This setting takes precedence over previous and subsequent ALTER SYSTEM SET
statements that specify SID='*'. If the instances are started up with an SPFILE, then
SID='*' is the default if you do not specify the SID clause.

Code
ALTER SYSTEM SET <dpname> SCOPE=MEMORY sid='<sid|*>';
If you specify an instance other than the current instance, then a message is sent to that
instance to change the parameter value in its memory if you are not using the SPFILE
scope.
The combination of SCOPE=MEMORY and SID='sid' of the ALTER SYSTEM RESET
command allows you to override the precedence of a currently used <sid>.<dparam>
entry. This allows for the current *.<dparam> entry to be used, or for the next created
*.<dparam> entry to be taken into account on that particular sid.

Code
ALTER SYSTEM RESET <dpname> SCOPE=MEMORY sid='<sid>';
You can remove a line from your SPFILE, using this ALTER SYSTEM RESET command.

Code

ALTER SYSTEM RESET <dpname> SCOPE=SPFILE sid='<sid|*>';


You can access the Initialization Parameters page by clicking the Initialization
Parameters link on the Cluster Database: RDBB Server page.
The Current tabbed page displays the values currently used by the initialization
parameters of all the instances accessing the RAC database. You can filter the
Initialization Parameters page to show only those parameters that meet the criteria of the
filter that you entered in the Name field.
The Instance column shows the instances for which the parameter has the value listed in
the table. An asterisk (*) indicates that the parameter has the same value for all remaining
instances of the cluster database.

Graphic
In this example, the open_cursors parameter is selected in the Current tabbed page
and its value is entered as 300.
Choose a parameter from the Select column and perform one of these steps:

click Add to add the selected parameter to a different instance. Enter a new instance name and
value in the newly created row in the table or

click Reset to reset the value of the selected parameter. Note that you may reset only those
parameters that do not have an asterisk in the Instance column. The value of the selected column
is reset to the value of the remaining instances.

Note
For both Add and Reset buttons, the ALTER SYSTEM command uses
SCOPE=MEMORY.
The SPFile tabbed page displays the current values stored in your SPFILE.
As on the Current tabbed page, you can add or reset parameters. However, if you select
the checkbox labeled "Apply changes in SPFile mode to the current running instance(s).
For static parameters, you must restart the database", then the ALTER SYSTEM
command uses SCOPE=BOTH. If this checkbox is not selected, SCOPE=SPFILE is used.
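In terms of the underlying SQL, the two cases correspond to statements like the following;
open_cursors is used purely as an illustrative dynamic parameter:

Code
-- Checkbox selected: change the SPFILE and the running instances
ALTER SYSTEM SET open_cursors = 300 SCOPE=BOTH SID='*';
-- Checkbox not selected: change the SPFILE only
ALTER SYSTEM SET open_cursors = 300 SCOPE=SPFILE SID='*';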

Graphic
The Initialization Parameters page has two tabs: Current and SPFile. The SPFile
tabbed page is displayed here. The Apply changes in SPFile mode to the current
running instance(s). For static parameters, you must restart the database
checkbox is selected. This tabbed page also has a table with Add and Reset
buttons. It also has columns such as Select, Instance, Name, Help, Value,
Comments, Type, Constraint, Basic, and Dynamic.
Click Apply to accept and generate your changes.
There are several RAC initialization parameters.

Graphic
The Initialization Parameters page has a table with Select, Instance, Name, Help,
Revisions, Value, Comments, Type, Basic, Modified, Dynamic, and Category as its
columns.
cluster_database
cluster_database parameter enables a database to be started in cluster mode. Set
this to TRUE.
cluster_database_instances
cluster_database_instances sets the number of instances in your RAC
environment. A proper setting for this parameter can improve memory use.
cluster_interconnects
cluster_interconnects specifies the cluster interconnect when there is more than
one interconnect. Refer to your Oracle platform-specific documentation for the use of this
parameter, its syntax, and its behavior. You typically do not need to set the
cluster_interconnects parameter.
Do not set the cluster_interconnects parameter for common configurations, for
example, if you have only one cluster interconnect. Similarly, if the default cluster
interconnect meets the bandwidth requirements of your RAC database, which is typically
the case, do not set this parameter. Other configurations for which you need not set the
cluster_interconnects parameter include those where NIC bonding is used for the
interconnect, or where OIFCFG's global configuration already specifies the correct cluster
interconnects. The parameter needs to be specified only as an override for OIFCFG.
db_name
If you set a value for the db_name parameter in instance-specific parameter files, then the
setting must be identical for all instances.
There are a few more RAC initialization parameters like DISPATCHERS and SPFILE. Set
the DISPATCHERS parameter to enable a shared-server configuration, which is a server
that is configured to allow many user processes to share very few server processes. With
shared-server configurations, many user processes connect to a dispatcher.


The DISPATCHERS parameter may contain many attributes. Oracle recommends that you
configure at least the PROTOCOL and LISTENER attributes. PROTOCOL specifies the
network protocol for which the dispatcher process generates a listening end point.
LISTENER specifies an alias name for the Oracle Net Services listeners. Set the alias to
a name that is resolved through a naming method, such as a tnsnames.ora file.
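A minimal sketch of such a setting; the listener alias LISTENERS_RDBB is an assumed
name that would be resolved through tnsnames.ora:

Code
ALTER SYSTEM SET dispatchers = '(PROTOCOL=TCP)(LISTENER=LISTENERS_RDBB)'
  SCOPE=BOTH SID='*';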
When you use the SPFILE parameter, all RAC database instances must use the same
SPFILE, and the file must be on shared storage.
Another RAC-specific parameter is MAX_COMMIT_PROPAGATION_DELAY.
Starting with Oracle Database 10g Release 2, the MAX_COMMIT_PROPAGATION_DELAY
parameter is deprecated. By default, commits on one instance are immediately visible on
all the other instances (broadcast-on-commit propagation).
This parameter is retained for backward compatibility only. This parameter specifies the
maximum amount of time allowed before the System Change Number or SCN held in the
System Global Area or SGA of an instance is refreshed by the log writer process, also
known as LGWR.
It determines whether the local SCN should be refreshed from the SGA when getting the
snapshot SCN for a query. With previous releases, you should not alter the default setting
for this parameter except under a limited set of circumstances. For example, under
unusual circumstances involving rapid updates and queries of the same data from
different instances, the SCN might not be refreshed in a timely manner.
Another RAC Initialization Parameter is the THREAD parameter. If specified, it must have
unique values on all instances. This parameter specifies the number of the redo thread to
be used by an instance. You can specify any available redo thread number as long as
that thread number is enabled and is not used.
Certain initialization parameters that are critical at database creation or that affect certain
database operations must have the same value for every instance in RAC. Specify these
parameter values in the SPFILE, or in each init_dbname.ora file on each instance.
These parameters must have the same value on all instances.
Parameters that require identical settings include

ACTIVE_INSTANCE_COUNT

ARCHIVE_LAG_TARGET

COMPATIBLE

CLUSTER_DATABASE / CLUSTER_DATABASE_INSTANCES

CONTROL_FILES

DB_BLOCK_SIZE

DB_DOMAIN

DB_FILES, and

DB_NAME
Some more parameters that require identical settings include

DB_RECOVERY_FILE_DEST/DB_RECOVERY_FILE_DEST_SIZE

DB_UNIQUE_NAME

INSTANCE_TYPE

PARALLEL_MAX_SERVERS

REMOTE_LOGIN_PASSWORD_FILE

MAX_COMMIT_PROPAGATION_DELAY

TRACE_ENABLED, and

UNDO_MANAGEMENT
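One quick way to confirm that such a parameter really is identical across instances is to
compare its value in GV$PARAMETER; a minimal sketch, using db_block_size as the
example:

Code
SELECT inst_id, name, value
FROM gv$parameter
WHERE name = 'db_block_size'
ORDER BY inst_id;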

Note
The setting for DML_LOCKS and RESULT_CACHE_MAX_SIZE must be identical on
every instance only if set to zero. Disabling the result cache on some instances
may lead to incorrect results.
Some parameters such as INSTANCE_NAME require unique settings. These include
THREAD or ROLLBACK_SEGMENTS
If you use the THREAD or ROLLBACK_SEGMENTS parameter, it is recommended that you
set unique values for them by using the SID identifier in the SPFILE.
INSTANCE_NUMBER
You must set a unique value for INSTANCE_NUMBER for each instance and you cannot use
a default value. The Oracle server uses the instance_number parameter to distinguish
among instances at startup. The Oracle server uses the thread number to assign redo
log groups to specific instances. To simplify administration, use the same number for both
the thread and instance_number parameters.
UNDO_TABLESPACE, and
If you specify UNDO_TABLESPACE with Automatic Undo Management enabled, set this
parameter to a unique undo tablespace name for each instance.
ASM_PREFERRED_READ_FAILURE_GROUPS
Using the ASM_PREFERRED_READ_FAILURE_GROUPS initialization parameter, you can
specify a list of preferred read failure group names. The disks in those failure groups
become the preferred read disks.
Thus, every node can read from its local disks. The setting for this parameter is instance
specific, and the values do not need to be the same on all instances.
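A minimal sketch of setting this parameter on one ASM instance; the disk group name
DATA, failure group FG1, and ASM SID +ASM1 are assumed values:

Code
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.FG1'
  SCOPE=BOTH SID='+ASM1';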

Question
Which RAC initialization parameters require an identical value for every instance
in RAC?
Options:
1. THREAD
2. DB_NAME
3. CLUSTER_DATABASE
4. INSTANCE_NUMBER

Answer
Option 1: Incorrect. The Oracle server uses the THREAD number to assign redo
log groups to specific instances. It is recommended that you set a unique value for
this initialization parameter by using the SID identifier in the SPFILE.
Option 2: Correct. If you set a value for DB_NAME in instance-specific parameter
files, then the setting must be identical for all instances.
Option 3: Correct. The CLUSTER_DATABASE initialization parameter enables a
database to be started in cluster mode. Set this to TRUE. This parameter should
be identical for every instance in RAC.
Option 4: Incorrect. You must set a unique value for the INSTANCE_NUMBER
initialization parameter for each instance and you cannot use a default value.
Correct answer(s):
2. DB_NAME
3. CLUSTER_DATABASE

Summary
When you create a database, the DBCA creates an SPFILE in the file location that you
specify. All instances in the cluster database use the same SPFILE at startup. The
SPFILE parameter points to the shared SPFILE on your shared storage. When the
database is created manually, you create an SPFILE from PFILE.
Another RAC initialization parameter is the THREAD parameter. If specified, it must have
unique values on all instances. This parameter specifies the number of the redo thread to
be used by an instance. You can specify any available redo thread number as long as
that thread number is enabled and is not used.

Managing Instances in RAC


Learning Objective

After completing this topic, you should be able to

identify ways to manage instances in a RAC environment

1. Quiescing RAC databases


To quiesce a RAC database, use the ALTER SYSTEM QUIESCE RESTRICTED statement
from one instance. It is not possible to open the database from any instance while the
database is in the process of being quiesced from another instance.
After all non-DBA sessions become inactive, the ALTER SYSTEM QUIESCE RESTRICTED
statement executes and the database is considered to be quiesced. In a RAC
environment, this statement affects all instances.
The following conditions apply to RAC:

if you have issued the ALTER SYSTEM QUIESCE RESTRICTED statement but the Oracle server has not finished processing it, you cannot open the database
you cannot open the database if it is already in a quiesced state, and
the ALTER SYSTEM QUIESCE RESTRICTED and ALTER SYSTEM UNQUIESCE statements affect
all instances in a RAC environment, not just the instance that issues the command

Cold backups cannot be taken when the database is in a quiesced state because the
Oracle background processes may still perform updates for internal purposes even when
the database is in a quiesced state.
Also, the file headers of online data files continue to appear as if they are being
accessed. They do not look the same as if a clean shutdown were done.
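As a minimal sketch of the statements discussed above, the quiesce and unquiesce cycle, issued from any one instance of the cluster database, looks like this.

Code
SQL> ALTER SYSTEM QUIESCE RESTRICTED;
System altered.
SQL> -- perform the administrative work that requires a quiesced database
SQL> ALTER SYSTEM UNQUIESCE;
System altered.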

Question
Which two statements about quiescing a RAC database are true?
Options:
1. You cannot open the database if it is already in a quiesced state
2. Cold backups cannot be taken when the database is in a quiesced state
3. It is possible to open the database from any instance while the database is in the process of being quiesced from another instance
4. The ALTER SYSTEM QUIESCE RESTRICTED statement only affects the instance that issued the command

Answer
Option 1: Correct. You cannot open the database if it is already in a quiesced
state. To quiesce a RAC database, use the ALTER SYSTEM QUIESCE
RESTRICTED statement from one instance.
Option 2: Correct. Cold backups cannot be taken when the database is in a
quiesced state because the Oracle background processes may still perform
updates for internal purposes even when the database is in a quiesced state.
Option 3: Incorrect. It is not possible to open the database from any instance
while the database is in the process of being quiesced from another instance.
After all non-DBA sessions become inactive, the ALTER SYSTEM QUIESCE
RESTRICTED statement executes and the database is considered to be quiesced.
Option 4: Incorrect. The ALTER SYSTEM QUIESCE RESTRICTED and ALTER
SYSTEM UNQUIESCE statements affect all instances in a RAC environment, not
just the instance that issues the command.
Correct answer(s):
1. You cannot open the database if it is already in a quiesced state
2. Cold backups cannot be taken when the database is in a quiesced state

Starting with Oracle RAC 11g R1, you can use the ALTER SYSTEM KILL SESSION
statement to terminate a session on a specific instance. This code illustrates by
terminating a session started on a different instance than the one used to terminate the
problematic session.

Code
SQL> SELECT SID, SERIAL#, INST_ID
  2  FROM GV$SESSION WHERE USERNAME='JFV';

       SID    SERIAL#    INST_ID
---------- ---------- ----------
       140       3340          2

SQL> ALTER SYSTEM KILL SESSION '140,3340,@2';

System altered.

SQL>
If the session is performing some activity that must be completed (such as waiting for a
reply from a remote database or rolling back a transaction), Oracle database waits for this
activity to complete, marks the session as terminated, and then returns control to you.
If the waiting lasts a minute, then Oracle database marks the session to be terminated
and returns control to you with a message that the session is marked to be terminated.
The PMON background process then marks the session as terminated when the activity
is complete.

Code
ALTER SYSTEM KILL SESSION '140,3340,@2'
*
ERROR at line 1:
ORA-00031: session marked for kill

Note
You can also use the IMMEDIATE clause at the end of the ALTER SYSTEM
command to immediately terminate the session without waiting for outstanding
activity to complete.
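For example, reusing the session identified above, a minimal sketch of the IMMEDIATE variant looks like this.

Code
SQL> ALTER SYSTEM KILL SESSION '140,3340,@2' IMMEDIATE;
System altered.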
Most SQL statements affect the current instance. You can use SQL*Plus to start and stop
instances in the RAC database. You do not need to run SQL*Plus commands as root on
UNIX-based systems or as Administrator on Windows-based systems.
You need only the proper database account with the privileges that you normally use for
single-instance Oracle database administration.

Graphic
The table contains the SQL*Plus command and Associated instance columns.
The table contains seven rows; for the ARCHIVE LOG command, the associated
instance is Generally affects the current instance, for CONNECT, the associated
instance is Affects the default instance if no instance is specified in the CONNECT
command. For HOST, the associated instance is Affects the node running the
SQL*Plus session, and for RECOVER, it is Does not affect any particular
instance, but rather the database. For SHOW PARAMETER and SHOW SGA, it is
Show the current instance parameter and SGA information and for STARTUP and
SHUTDOWN, the associated instance is Affect the current instance. Finally, for
the SHOW INSTANCE command, the associated instance is Displays information
about the current instance.
The following are some examples of how SQL*Plus commands affect instances (a combined sketch follows this list):

the ALTER SYSTEM CHECKPOINT LOCAL statement affects only the instance to which you are currently connected, rather than the default instance or all instances
ALTER SYSTEM CHECKPOINT or ALTER SYSTEM CHECKPOINT GLOBAL affects all instances in the cluster database

ALTER SYSTEM SWITCH LOGFILE affects only the current instance

the ALTER SYSTEM ARCHIVE LOG CURRENT statement helps you to force a global log switch, and

the INSTANCE option of ALTER SYSTEM ARCHIVE LOG enables you to archive each online redo
log file for a specific instance
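A short sketch of the statements listed above, as they might be issued from a SQL*Plus session connected to one instance:

Code
-- Checkpoint only the instance you are connected to
ALTER SYSTEM CHECKPOINT LOCAL;
-- Checkpoint every instance in the cluster database
ALTER SYSTEM CHECKPOINT GLOBAL;
-- Switch the online redo log of the current instance only
ALTER SYSTEM SWITCH LOGFILE;
-- Force a log switch and archiving across the cluster
ALTER SYSTEM ARCHIVE LOG CURRENT;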
Wallets used by RAC instances for Transparent Data Encryption may be a local copy of a common wallet shared by multiple nodes, or a shared copy residing on shared storage that all of the nodes can access.
A deployment with a single wallet on a shared disk requires no additional configuration to
use Transparent Data Encryption.

Graphic
In this example, there are three nodes - Node1, Node2, and Noden. Each node
contains a Wallet and two Master keys. The statement ALTER SYSTEM SET
ENCRYPTION KEY is connected to Node1. And Wallet and Master keys of Node1
are connected to those of Node2 and Noden respectively, and this connection is
named as Manual copy.

If you want to use local copies, you must copy the wallet and make it available to all of the
other nodes after initial configuration. For systems using Transparent Data Encryption
with encrypted wallets, you can use any standard file transport protocol. For systems
using Transparent Data Encryption with obfuscated wallets, file transport through a
secured channel is recommended.
The wallet must reside in the directory specified by the setting for the WALLET_LOCATION
or ENCRYPTION_WALLET_LOCATION parameter in sqlnet.ora.
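For reference, a typical sqlnet.ora entry looks like the following sketch; the directory path is an assumption and must point to the wallet location you actually use on each node.

Code
ENCRYPTION_WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /u01/app/oracle/admin/wallet)))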
The local copies of the wallet need not be synchronized for the duration of Transparent Data Encryption usage until the server master key is rekeyed through the ALTER SYSTEM SET ENCRYPTION KEY SQL statement. Each time you run the ALTER SYSTEM SET ENCRYPTION KEY statement at a database instance, you must again copy the wallet residing on that node and make it available to all of the other nodes.
To avoid unnecessary administrative overhead, reserve rekeying for exceptional cases
where you are certain that the server master key is compromised and that not rekeying it
would cause a serious security problem.

2. ASM general architecture


Automatic Storage Management or ASM is part of the database kernel. One portion of the
ASM code allows for the startup of a special instance called an ASM instance. ASM
instances do not mount databases but instead manage the metadata needed to make
ASM files available to ordinary database instances.
Both ASM instances and database instances have access to a common set of disks
called disk groups. Database instances access the contents of ASM files directly,
communicating with an ASM instance only to obtain information about the layout of these
files.

Graphic
In this example, there are two nodes Node1 and Node2. It also has three ASM
disk groups named ASM disk group Tom, ASM disk group Bob, and ASM disk
group Harry. Node1 has elements such as DB instance SID=sales1, Group
Services tom=+ASM1 bob=+ASM1 harry=+ASM1, DBW0, RBAL, ASMB, FG,
ASM instance SID=+ASM1, GMON, ARB0, ARBA, and DB instance SID=test1.
Node2 has elements such as Group Services tom=+ASM2 bob=+ASM2
harry=+ASM2, ASM instance SID=+ASM2, DB instance SID=test2, ASMB, FG,
DBW0, ARBA, ARB0, RBAL, and GMON. The ASM disk groups Tom, Harry and
Bob have two ASM disks in each. DBW0 and RBAL of Node1 and Node 2 are
connected and it also connects to ASM disk group Tom, ASM disk group Bob, and
ASM disk group Harry. ASMB and FG are connected to each other in both the
nodes. ASM instance SID=+ASM1 and ASM instance SID=+ASM2 are connected.

The ASM instance is connected to the Group Services internally in both the
nodes.
An ASM instance contains three new types of background processes. The first type is
responsible for coordinating rebalance activity for disk groups, and is called RBAL. The
second type actually performs the data extent movements.
There can be many of these at a time, and they are called ARB0, ARB1, and so on. The
third type is responsible for certain disk group-monitoring operations that maintain ASM
metadata inside disk groups. The disk group monitor process is called GMON.
Each database instance that uses ASM has two new background processes called ASMB
and RBAL. In a database instance, RBAL performs global opens of the disks in the disk
groups. ASMB runs in database instances and connects to foreground processes in ASM
instances. Over those connections, periodic messages are exchanged to update statistics
and to verify that both instances are healthy.
During operations that require ASM intervention, such as a file creation by a database
foreground, the database foreground connects directly to the ASM instance to perform the
operation.
An ASMB process is started dynamically when an ASM file is first accessed. When
started, the ASM background connects to the desired ASM instance and maintains that
connection until the database instance no longer has any files open in the disk groups
served by that ASM instance. Database instances are allowed to connect to only one
ASM instance at a time, so they have at most one ASMB background process.
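On UNIX-like systems, you can usually spot these background processes from the operating system. The following sketch assumes the common ora_/asm_ process-name prefixes (for example, asm_rbal_+ASM1 in the ASM instance and ora_asmb_test1 in a database instance); adjust the pattern to your instance names.

Code
$ ps -ef | egrep 'asm_(rbal|gmon|arb)|ora_(asmb|rbal)' | grep -v grep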
Like RAC, the ASM instances themselves may be clustered, using the existing Global
Cache Services or GCS infrastructure. There is usually one ASM instance per node on a
cluster. As with existing RAC configurations, ASM requires that the operating system
make the disks globally visible to all of the ASM instances, irrespective of node.
Database instances communicate only with ASM instances on the same node. If there
are several database instances for different databases on the same node, they must
share the same single ASM instance on that node.
A disk group can contain files for many different Oracle databases. Thus, multiple
database instances serving different databases can access the same disk group even on
a single system without RAC. Alternatively, one Oracle database may also store its files in
multiple disk groups managed by the same ASM instance.
Group Services is used to register the connection information needed by the database
instances to find ASM instances. When an ASM instance mounts a disk group, it registers
the disk group and connect string with Group Services.
The database instance knows the name of the disk group, and can therefore use it to look

up connection information for the correct ASM instance. Group Services is a functionality
provided by Oracle Clusterware, which is automatically installed on every node that runs
Oracle Database 10g.
If an ASM instance fails, all Oracle database instances dependent on that ASM instance
also fail. Note that a file system failure usually crashes a node. In a single ASM instance
configuration, if the ASM instance fails while disk groups are open for update, then after
the ASM instance reinitializes, it reads the disk group's log and recovers all transient
changes.
With multiple ASM instances sharing disk groups, if one ASM instance fails, then another
ASM instance automatically recovers transient ASM metadata changes caused by the
failed instance.
The failure of a database instance does not affect ASM instances.
Each disk group is self-describing, containing its own file directory, disk directory, and
other data such as metadata logging information. ASM automatically protects its
metadata by using mirroring techniques even with external redundancy disk groups.
With multiple ASM instances mounting the same disk groups, if one ASM instance fails,
another ASM instance automatically recovers transient ASM metadata changes caused
by the failed instance. This situation is called ASM instance recovery, and is automatically
and immediately detected by the global cache services.

Graphic
The ASM instance recovery situation has three states of Disk group A. In the first state, "Both instances mount disk group," Node1 and Node2 have +ASM1 and +ASM2 respectively. +ASM1 and +ASM2 are connected to each other and to Disk group A. In the second state, "ASM instance failure," +ASM1 and Disk group A have a cross mark on them. In the third state, "Disk group repaired by surviving instance," +ASM1 is absent from Node1. +ASM2 is connected to Disk group A.
With multiple ASM instances mounting different disk groups, or in the case of a single
ASM instance configuration, if an ASM instance fails when ASM metadata is open for
update, then the disk groups that are not currently mounted by any other ASM instance
are not recovered until they are mounted again.
When an ASM instance mounts a failed disk group, it reads the disk group log and
recovers all transient changes. This situation is called ASM crash recovery.
Therefore, when using ASM clustered instances, it is recommended to have all ASM
instances always mounting the same set of disk groups. However, it is possible to have a

disk group on locally attached disks that are visible to only one node in a cluster, and
have that disk group mounted on only that node where the disks are attached.

Note
The failure of an Oracle database instance is not significant here because only
ASM instances update ASM metadata.
In order to enable ASM instances to be clustered together in a RAC environment, each
ASM instance initialization parameter file must set its CLUSTER_DATABASE parameter to
TRUE.
This enables the global cache services to be started on each ASM instance. Although it is
possible for multiple ASM instances to have different values for their ASM_DISKGROUPS
parameter, it is recommended for each ASM instance to mount the same set of disk
groups. This enables disk groups to be shared among ASM instances for recovery
purposes.
In addition, all disk groups used to store one RAC database must be shared by all ASM
instances in the cluster.
Consequently, if you are sharing disk groups among ASM instances, their
ASM_DISKSTRING initialization parameter must point to the same set of physical media.
However, this parameter does not need to have the same setting on each node.
For example, assume that the physical disks of a disk group are mapped by the OS on
node A as /dev/rdsk/c1t1d0s2, and on node B as /dev/rdsk/c2t1d0s2. Although
both nodes have different disk string settings, they locate the same devices via the OS
mappings.
This situation can occur when the hardware configurations of node A and node B are different; for example, when the nodes use different controllers. ASM handles this situation because it inspects the contents of the disk header block to determine the disk group to which a disk belongs, rather than attempting to maintain a fixed list of path names.
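A hedged sketch of that example: each ASM instance points its ASM_DISKSTRING at the path the local OS uses for the same physical disks. The instance names +ASM1 and +ASM2 follow the convention used earlier in this topic.

Code
-- On node A, connected to +ASM1
ALTER SYSTEM SET asm_diskstring = '/dev/rdsk/c1t1d0s2' SCOPE=SPFILE SID='+ASM1';
-- On node B, connected to +ASM2
ALTER SYSTEM SET asm_diskstring = '/dev/rdsk/c2t1d0s2' SCOPE=SPFILE SID='+ASM2';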
You can use SRVCTL to perform the ASM administration tasks such as
ADD
The ADD option adds Oracle Cluster Registry or OCR information about an ASM instance to run under Oracle Clusterware or OC. This option also enables the resource.
ENABLE
The ENABLE option enables an ASM instance to run under OC for automatic startup, or
restart.
DISABLE
The DISABLE option disables an ASM instance to prevent inappropriate automatic restarts by OC. DISABLE also prevents any startup of that ASM instance using SRVCTL.
START, and
The START option starts an OC-enabled ASM instance. SRVCTL uses the SYSDBA
connection to perform the operation.
STOP
The STOP option stops an ASM instance by using the shutdown normal, transactional,
immediate, or abort option.
Other tasks that you can perform using SRVCTL are
CONFIG
The CONFIG option displays the configuration information stored in the OCR for a
particular ASM instance.
STATUS
The STATUS option obtains the current status of an ASM instance.
REMOVE, and
The REMOVE option removes the configuration of an ASM instance.
MODIFY INSTANCE
The MODIFY INSTANCE command can be used to establish a dependency between an
ASM instance and a database instance.
Adding and enabling an ASM instance is automatically performed by the DBCA when
creating the ASM instance.
These are some examples where ASM and SRVCTL is used with RAC.

Code
$ srvctl start asm -n clusnode1
$ srvctl stop asm -n clusnode1 -o immediate
$ srvctl add asm -n clusnode1 -i +ASM1 -o /ora/ora10
$ srvctl modify instance -d crm -i crm1 -s +asm1
$ srvctl disable asm -n clusnode1 -i +ASM1

$ srvctl start asm -n clusnode1

This example starts up the only existing ASM instance on the CLUSNODE1 node. The -o option allows you to specify in which mode you want to open the instance: open is the default, but you can also specify mount or nomount.
$ srvctl stop asm -n clusnode1 -o immediate
This example is an immediate shutdown of the only existing ASM instance on CLUSNODE1.
$ srvctl add asm -n clusnode1 -i +ASM1 -o /ora/ora10
$ srvctl modify instance -d crm -i crm1 -s +asm1
The first example adds to the OCR the OC information for +ASM1 on CLUSNODE1. You need to specify the ORACLE_HOME of the instance. Although this should not be needed when you use the DBCA, if you manually create ASM instances, you should also create an OC dependency between database instances and ASM instances. This ensures that the ASM instance starts up before the database instance starts and allows database instances to be cleanly shut down before ASM instances. To establish the dependency, use a command similar to srvctl modify instance -d crm -i crm1 -s +asm1 for each corresponding instance.
$ srvctl disable asm -n clusnode1 -i +ASM1
This example prevents OC from automatically restarting +ASM1.
When you add a new disk group from an ASM instance, this disk group is not
automatically mounted by other ASM instances. If you want to mount the newly added
disk group on all ASM instances, for example, by using SQL*Plus, then you need to
manually mount the disk group on each ASM instance.
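For example, a minimal sketch of the manual approach, run once while connected to each ASM instance; the disk group name DATA2 is hypothetical.

Code
SQL> ALTER DISKGROUP DATA2 MOUNT;
Diskgroup altered.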
However, if you are using Enterprise Manager or EM to add a disk group, then the disk
group definition includes a checkbox to indicate whether the disk group is automatically
mounted to all the ASM clustered database instances. This is also true when you mount
and dismount ASM disk groups by using Database Control where you can use a
checkbox to indicate which instances mount or dismount the ASM disk group.

Graphic
The Automatic Storage Management: +ASM1_vx0306.us.oracle.com page
contains the Create, Mount All, and Dismount All buttons. This page contains the
Mount this disk group on all Automatic Storage Management instances in this
cluster checkbox, which is selected.
On the Automatic Storage Management Performance page, click the Disk Group I/O
Cumulative Statistics link in the Additional Monitoring Links section. On the Disk Group
I/O Cumulative Statistics page, click the corresponding disk group name.
A performance page is displayed showing clusterwide performance information for the corresponding disk group. By clicking one of the proposed links, you can see an instance-level performance details graph.

Graphic
The Disk Group: FRA page contains the General, Performance, Templates, and
Files tabs. The Performance tabbed page includes the MS per Operation and MB
per Second graphs in the Response Time section. The MB per Second graph
contains the I/O Throughput link.
This example describes the instance-level performance details graph for I/O Throughput
in disk group.

Graphic
This page contains a graph with MB per second on the Y axis. The legends of the
graph are +ASM2_vx0313.us.oracle.com and +ASM1_vx0306.us.oracle.com.

Summary
The ALTER SYSTEM QUIESCE RESTRICTED statement, issued from one instance, is used to quiesce a RAC database. While the database is being quiesced, it cannot be opened from another instance. After all non-DBA sessions become inactive, the ALTER SYSTEM QUIESCE RESTRICTED statement executes and the database is said to be quiesced.
Automatic Storage Management is part of the database kernel, and one portion of the ASM code allows the startup of a special instance called an ASM instance. ASM instances do not mount databases but manage the metadata needed to make ASM files available to ordinary database instances. Both ASM instances and database instances have access to a common set of disks called disk groups.

Managing Oracle Clusterware and Resources


Learning Objectives

After completing this topic, you should be able to

identify ways to manage Oracle Clusterware and resources

identify the functions of the voting disk and important CSS parameters

1. Oracle Clusterware and resources

Oracle Clusterware is a portable cluster infrastructure that provides High Availability or


HA to RAC databases and other applications.
Oracle Clusterware makes applications highly available by monitoring the health of the
applications, by restarting applications on failure, by relocating applications to another
cluster node when the currently used node fails or when the application can no longer run
in the current node.
In the case of node failure, certain types of protected applications, such as a database instance, are not failed over to surviving nodes.
Here, a cluster is a collection of two or more nodes where the nodes share a common
pool of storage used by the Oracle Clusterware system files (OCR and voting disk), a
common network interconnect, and a common operating system.
The Oracle Clusterware example depicts a possible three-node configuration where
Node1 runs a RAC database instance, a listener, and application A, all protected by
Oracle Clusterware.
On Node2, only one RAC database instance and a listener are protected by Oracle
Clusterware. On Node3, one application B is protected by Oracle Clusterware.
Oracle Clusterware monitors all protected applications periodically, and based on the
defined failover policy, it can restart them either on the same node or relocate them to
another node, or it can decide not to restart them at all.

Graphic
This example includes three nodes connected with Oracle Clusterware system
files. Node1 includes Protected App A, RAC DB Inst, Listener, ORACLE_HOME,
and CRS HOME. Node2 includes ORACLE_HOME, and CRS HOME to its left
and RAC DB Inst and Listener to its right. Node3 includes Protected App B and
CRS HOME.

Note
Although Oracle Clusterware is a required component for using RAC, it does not
require a RAC license when used only to protect applications other than RAC
databases.
On UNIX, the Oracle Clusterware stack is run from entries in /etc/inittab with
respawn. On Windows, it is run using the services controller.
The Cluster Synchronization Services Daemon or OCSSD process runs in both vendor
clusterware and nonvendor clusterware environments. It integrates with existing vendor

clusterware, when present.


OCSSD's primary job is internode health monitoring, primarily using the network interconnect as well as voting disks, and database/ASM instance endpoint discovery via group services. OCSSD runs as the oracle user, and a failure exit causes a machine reboot to prevent data corruption in the event of a split brain.

Graphic
In the flowchart, init includes four processes namely oprocd, evmd, ocssd, and
crsd. The evmd process forwards to evmlogger and then to racgevtf and invokes
callout. ocssd integrates with Voting disk. The crsd process maintains OCR along
with racgwrap, racgmain, and racgimon to execute action.
The Process Monitor Daemon process or OPROCD is spawned in any nonvendor
clusterware environment. If OPROCD detects problems, it kills a node. It runs as root.
This daemon is used to detect hardware and driver freezes on the machine.
If a machine was frozen long enough for the other nodes to evict it from the cluster, it
needs to kill itself to prevent any I/O from being reissued to the disk after the rest of the
cluster has remastered locks.
The Event Management Daemon or EVMD process forwards cluster events when things
happen. It spawns a permanent child evmlogger that, on demand, spawns children such
as racgevtf to invoke callouts. It runs as oracle, and is restarted automatically on
failure.
The Cluster Ready Services Daemon or CRSD process is the engine for High Availability
operations. It manages Oracle Clusterware registered applications and starts, stops,
checks, and fails them over via special action scripts.
CRSD spawns dedicated processes called RACGIMON that monitor the health of the
database and ASM instances and host various feature threads such as Fast Application
Notification or FAN. One RACGIMON process is spawned for each instance.
CRSD maintains configuration profiles as well as resource statuses in OCR or Oracle
Cluster Registry. It runs as root and is restarted automatically on failure.
In addition, CRSD can spawn temporary children to execute particular actions:

racgeut (Execute Under Timer), to kill actions that do not complete after a certain amount of
time
racgmdb (Manage Database), to start/stop/check instances

racgchsn (Change Service Name), to add/delete/check service names for instances

racgons, to add/remove ONS configuration to OCR, and

racgvip, to start/stop/check instance virtual IP


The RACG infrastructure is used to deploy the Oracle Database in a highly available
clustered environment. This infrastructure is mainly implemented using the racgwrap
script that invokes the racgmain program.
It is used by CRS to execute actions for all node-centric resources as well as to proxy
actions for all instance-centric resources to RACGIMON. Basically, this infrastructure is
responsible for managing all ora.* resources.

Question
Which process of the Oracle Clusterware stack is the engine for High Availability
operations?
Options:
1. CRSD
2. EVMD
3. OCSSD
4. OPROCD

Answer
Option 1: Correct. The Cluster Ready Services Daemon or CRSD process is the
engine for High Availability operations. It manages Oracle Clusterware registered
applications and starts, stops, checks, and fails them over via special action
scripts. CRSD spawns dedicated processes called RACGIMON that monitor the
health of the database and ASM instances and host various feature threads such
as Fast Application Notification or FAN. One RACGIMON process is spawned for
each instance. CRSD maintains configuration profiles as well as resource statuses
in OCR or Oracle Cluster Registry. It runs as root and is restarted automatically on
failure.
Option 2: Incorrect. The Event Management Daemon or EVMD process forwards
cluster events when the events occur. It spawns a permanent child evmlogger
that, on demand, spawns children such as racgevtf to invoke callouts. It runs as
oracle, and is restarted automatically on failure.
Option 3: Incorrect. The Cluster Synchronization Services Daemon or OCSSD
process runs in both vendor clusterware and nonvendor clusterware

environments. It integrates with existing vendor clusterware, when present.


OCSSD's primary job is internode health monitoring, primarily using the network interconnect, as well as voting disks, and database/ASM instance endpoint discovery via group services. OCSSD runs as the oracle user, and a failure exit causes a machine reboot to prevent data corruption in the event of a split brain.
Option 4: Incorrect. The Process Monitor Daemon or OPROCD process is
spawned in any nonvendor clusterware environment. If OPROCD detects
problems, it kills a node. It runs as root and is used to detect hardware and driver
freezes on the machine. If a machine was frozen long enough for the other nodes
to evict it from the cluster, it needs to kill itself to prevent any I/O from being
reissued to the disk after the rest of the cluster has remastered locks.
Correct answer(s):
1. CRSD
When a node of Oracle Clusterware comes up, the Oracle Clusterware processes start
up automatically. You can control this by using crsctl commands.
You may have to manually control the Oracle Clusterware stack while applying patches or
during any planned outages.

Code
# crsctl stop crs -wait
# crsctl start crs -wait
# crsctl disable crs
# crsctl enable crs

In addition, these commands can be used by third-party clusterware when used in


combination with Oracle Clusterware. You can stop the Oracle Clusterware stack by using
the crsctl stop crs command.
You can also start the Oracle Clusterware stack by using the crsctl start crs
command. The wait option displays progress and status for each daemon. Without this
option, the command returns immediately.
Use the crsctl disable crs command to disable Oracle Clusterware from being
started in a subsequent reboot. This command does not stop the currently running Oracle
Clusterware stack. Use the crsctl enable crs command to enable Oracle Clusterware
to be started in a subsequent reboot.

Code

# crsctl stop crs -wait
# crsctl start crs -wait
# crsctl disable crs
# crsctl enable crs

Note
You must run these commands as root.
CRS is the primary program for managing High Availability operations of applications
within the cluster. Applications that CRS manages are called resources. By default, CRS
can manage RAC resources such as database instance, ASM instances, listeners,
instance VIPs, services, ONS, and GSD.
However, CRS is also able to manage other types of application processes and
application VIPs. CRS resources are managed according to their configuration
parameters (resource profile) stored in OCR and an action script stored anywhere you
want.
The resource profile contains information such as the check interval, failure policies, the
name of the action script, privileges that CRS should use to manage the application, and
resource dependencies. The action script must be able to start, stop, and check the
application.
CRS provides some important facilities to support the life cycle of a resource (a combined sketch follows this list):

Graphic
Life cycle of a resource diagram begins with crs_profile and proceeds to
crs_register, crs_start, crs_stat, crs_relocate, crs_stop, and finally crs_unregister.

crs_profile creates and edits a resource profile.

crs_register adds the resource to the list of applications managed by CRS.

crs_start starts the resource according to its profile. After a resource is started, its application
process is continuously monitored by CRS using a check action at regular intervals. Also, when the
application goes offline unexpectedly, it is restarted and/or failed over to another node according to
its resource profile.

crs_stat informs you about the current status of a list of resources.

crs_relocate moves the resource to another node of the cluster.

crs_unregister removes the resource from the monitoring scope of CRS.
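Putting the life cycle together, a rough sketch for a hypothetical application resource named myapp with a hypothetical action script is shown below; the exact crs_profile options vary by release, so treat the flags as illustrative rather than definitive.

Code
$ crs_profile -create myapp -t application -a /u01/crs/scripts/myapp.scr
$ crs_register myapp
$ crs_start myapp
$ crs_stat -t myapp
$ crs_relocate myapp
$ crs_stop myapp
$ crs_unregister myapp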

The crs_stat -t command shows you all the resources that are currently under Oracle Clusterware control. In the example, only resources starting with the prefix ora. exist.
These are the resources that implement RAC high availability in a clustered environment.
By default, Oracle Clusterware can control databases, database and ASM instances,
VIP/ONS/GSD/Listener (also called nodeapps), services, and service members.

Code
$ <CRS HOME>/bin/crs_stat -t
Name                            Type         Target   State    Host
--------------------------------------------------------------------
ora.atlhp8.ASM1.asm             application  ONLINE   ONLINE   atlhp8
ora.atlhp8.LISTENER_ATLHP8.lsnr application  ONLINE   ONLINE   atlhp8
ora.atlhp8.gsd                  application  ONLINE   ONLINE   atlhp8
ora.atlhp8.ons                  application  ONLINE   ONLINE   atlhp8
ora.atlhp8.vip                  application  ONLINE   ONLINE   atlhp8
ora.atlhp9.ASM2.asm             application  ONLINE   ONLINE   atlhp9
ora.atlhp9.LISTENER_ATLHP9.lsnr application  ONLINE   ONLINE   atlhp9
ora.atlhp9.gsd                  application  ONLINE   ONLINE   atlhp9
ora.atlhp9.ons                  application  ONLINE   ONLINE   atlhp9
ora.atlhp9.vip                  application  ONLINE   ONLINE   atlhp9
ora.xwkE.JF1.cs                 application  ONLINE   ONLINE   atlhp8
ora.xwkE.JF1.xwkE1.srv          application  ONLINE   ONLINE   atlhp8
ora.xwkE.JF1.xwkE2.srv          application  ONLINE   ONLINE   atlhp9
ora.xwkE.db                     application  ONLINE   ONLINE   atlhp9
ora.xwkE.xwkE1.inst             application  ONLINE   ONLINE   atlhp8
ora.xwkE.xwkE2.inst             application  ONLINE   ONLINE   atlhp9

If the Target status for the resources is ONLINE, it means that at the next node restart, Oracle Clusterware will try to start them up automatically.

Graphic
The Target column in the code output shows ONLINE for all 16 resources.

Code
$ <CRS HOME>/bin/crs_stat -t
Name                            Type         Target   State    Host
--------------------------------------------------------------------
ora.atlhp8.ASM1.asm             application  ONLINE   ONLINE   atlhp8
ora.atlhp8.LISTENER_ATLHP8.lsnr application  ONLINE   ONLINE   atlhp8
ora.atlhp8.gsd                  application  ONLINE   ONLINE   atlhp8
ora.atlhp8.ons                  application  ONLINE   ONLINE   atlhp8
ora.atlhp8.vip                  application  ONLINE   ONLINE   atlhp8
ora.atlhp9.ASM2.asm             application  ONLINE   ONLINE   atlhp9
ora.atlhp9.LISTENER_ATLHP9.lsnr application  ONLINE   ONLINE   atlhp9
ora.atlhp9.gsd                  application  ONLINE   ONLINE   atlhp9
ora.atlhp9.ons                  application  ONLINE   ONLINE   atlhp9
ora.atlhp9.vip                  application  ONLINE   ONLINE   atlhp9
ora.xwkE.JF1.cs                 application  ONLINE   ONLINE   atlhp8
ora.xwkE.JF1.xwkE1.srv          application  ONLINE   ONLINE   atlhp8
ora.xwkE.JF1.xwkE2.srv          application  ONLINE   ONLINE   atlhp9
ora.xwkE.db                     application  ONLINE   ONLINE   atlhp9
ora.xwkE.xwkE1.inst             application  ONLINE   ONLINE   atlhp8
ora.xwkE.xwkE2.inst             application  ONLINE   ONLINE   atlhp9

State shows you the current status of the resource. Target can be ONLINE or
OFFLINE. State can be ONLINE, OFFLINE, or UNKNOWN. UNKNOWN results from a failed
start/stop action, and can be reset only by a crs_stop -f resourceName command.
The combination of Target and State can be used to derive whether a resource is
starting or stopping. Host shows you the name of the host on which the resource is
managed.

Code
$ <CRS HOME>/bin/crs_stat -t
Name                            Type         Target   State    Host
--------------------------------------------------------------------
ora.atlhp8.ASM1.asm             application  ONLINE   ONLINE   atlhp8
ora.atlhp8.LISTENER_ATLHP8.lsnr application  ONLINE   ONLINE   atlhp8
ora.atlhp8.gsd                  application  ONLINE   ONLINE   atlhp8
ora.atlhp8.ons                  application  ONLINE   ONLINE   atlhp8
ora.atlhp8.vip                  application  ONLINE   ONLINE   atlhp8
ora.atlhp9.ASM2.asm             application  ONLINE   ONLINE   atlhp9
ora.atlhp9.LISTENER_ATLHP9.lsnr application  ONLINE   ONLINE   atlhp9
ora.atlhp9.gsd                  application  ONLINE   ONLINE   atlhp9
ora.atlhp9.ons                  application  ONLINE   ONLINE   atlhp9
ora.atlhp9.vip                  application  ONLINE   ONLINE   atlhp9
ora.xwkE.JF1.cs                 application  ONLINE   ONLINE   atlhp8
ora.xwkE.JF1.xwkE1.srv          application  ONLINE   ONLINE   atlhp8
ora.xwkE.JF1.xwkE2.srv          application  ONLINE   ONLINE   atlhp9
ora.xwkE.db                     application  ONLINE   ONLINE   atlhp9
ora.xwkE.xwkE1.inst             application  ONLINE   ONLINE   atlhp8
ora.xwkE.xwkE2.inst             application  ONLINE   ONLINE   atlhp9

Note
Using the crs_stat -t command truncates the resource names for formatting reasons. The output example reestablishes the entire names for clarity.
You can use the crs_stat -p resource_name command to print the OCR contents for the named resource. The example indicates what you get for a RAC database instance. Not all attributes are mandatory for each resource.
The following describes the most important attributes:

Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1

CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

NAME is the name of the application resource


The NAME attribute in the code is displayed as the following:
NAME=ora.JFDB.JFDB1.inst
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

TYPE must be APPLICATION for all CRS resources


The TYPE attribute in the code is displayed as the following:
TYPE=application
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance

FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

ACTION_SCRIPT is the name and location of the action script used by CRS to start, check, and stop the application. The default path is <CRS HOME>/crs/script.
The ACTION_SCRIPT attribute in the code is displayed as the following:
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

ACTIVE_PLACEMENT defaults to 0. When set to 1, Oracle Clusterware reevaluates the placement


of a resource during addition or restart of a cluster node
The ACTIVE_PLACEMENT attribute in the code is displayed as the following:
ACTIVE_PLACEMENT=0
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1

CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

There are other attributes:

Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

AUTO_START is a flag indicating whether Oracle Clusterware should automatically start a resource after a cluster restart, regardless of whether the resource was running before the cluster restart. When set to 0, Oracle Clusterware starts the resource only if it was running before the restart. When set to 1, Oracle Clusterware always starts the resource after a restart. When set to 2, Oracle Clusterware never restarts the resource (regardless of the resource's state when the node stopped).
The AUTO_START attribute in the code is displayed as the following:
AUTO_START=1
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap

ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

CHECK_INTERVAL is the time interval, in seconds, between repeated executions of the check
command for the application.
The CHECK_INTERVAL attribute in the code is displayed as the following:
CHECK_INTERVAL=600
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

DESCRIPTION is a description of the resource.


The DESCRIPTION attribute in the code is displayed as the following:
DESCRIPTION=CRS application for Instance
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application

ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

FAILOVER_DELAY is the amount of time, in seconds, that Oracle Clusterware waits before
attempting to restart or fail over a resource.
The FAILOVER_DELAY attribute in the code is displayed as the following:
FAILOVER_DELAY=0
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

More attributes are described:

Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap

ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

FAILURE_INTERVAL is the interval, in seconds, during which Oracle Clusterware applies the
failure threshold. If the value is zero (0), then tracking of failures is disabled.
The FAILURE_INTERVAL attribute in the code is shown as the following:
FAILURE_INTERVAL=0
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

FAILURE_THRESHOLD is the number of failures detected within a specified FAILURE_INTERVAL before Oracle Clusterware marks the resource as unavailable and no longer monitors it. If a resource's check script fails this many times, then the resource is stopped and set offline. If the value is zero (0), then tracking of failures is disabled. The maximum value is 20.
The FAILURE_THRESHOLD attribute in the code is shown as the following:
FAILURE_THRESHOLD=0
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst

TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

HOSTING_MEMBERS is an ordered list of cluster nodes separated by blank spaces that can host the resource. Run the olsnodes command to see your node names.
The HOSTING_MEMBERS attribute in the code is shown as the following:
HOSTING_MEMBERS=atlhp8
Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

PLACEMENT defines these placement policies to specify how Oracle Clusterware chooses
the cluster node on which to start the resource:

Graphic

The PLACEMENT attribute in the code is shown as the following:


PLACEMENT=restricted

Code
$ <CRS HOME>/bin/crs_stat -p ora.JFDB.JFDB1.inst
NAME=ora.JFDB.JFDB1.inst
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10g/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for Instance
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=atlhp8
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5

$ <CRS HOME>/bin/crs_stat -t ora.xwkE.xwkE1.inst


balanced
Oracle Clusterware favors starting or restarting the application on the node that is currently
running the fewest resources. The host with the fewest resources running is chosen. If no
node is favored by these criteria, then any available node is chosen.
favored, and
Oracle Clusterware refers to the list of nodes in the HOSTING_MEMBERS attribute of the
application profile. Only cluster nodes that are in this list and that satisfy the resource
requirements are eligible for placement consideration.
The order of the hosting nodes determines which node runs the application. If none of the
nodes in the hosting node list are available, then Oracle Clusterware places the application
on any available node. This node may or may not be included in the HOSTING_MEMBERS
list.
restricted
Similar to the favored policy, except that if none of the nodes on the hosting list are
available, then Oracle Clusterware does not start or restart the application. A restricted
placement policy ensures that the application never runs on a node that is not on the list,
even if you manually relocate it to that node.

The remaining important attributes are REQUIRED_RESOURCES, which is an ordered list


of resource names separated by blank spaces that this resource depends on. Oracle
Clusterware relocates or stops an application if a required resource becomes unavailable.
Therefore, in the example on the previous page, it is clear that to start the JFDB1
instance, the ASM instance ASM1 must be started first.
And RESTART_ATTEMPTS is the number of times that Oracle Clusterware attempts to restart a resource on a single cluster node before attempting to relocate the resource. After the time period that you have indicated by the setting for UPTIME_THRESHOLD has elapsed, Oracle Clusterware resets the value of the restart counter (RESTART_COUNTS) to 0. Basically, RESTART_COUNTS cannot exceed RESTART_ATTEMPTS within the UPTIME_THRESHOLD period.

Code
REQUIRED_RESOURCES=ora.atlhp8.ASM1.asm
RESTART_ATTEMPTS=5
The crs_stat -t resource_name command shows you the named resource's state. The Target status for the resource is ONLINE, meaning that at the next node restart, Oracle Clusterware will try to start up the instance. State shows you the current status of the instance.

Code
Name            Type         Target   State    Host
----------------------------------------------------
ora....E1.inst  application  ONLINE   ONLINE   atlhp8

Supplement
Selecting the link title opens the resource in a new browser window.

Managing Oracle Clusterware


View more information to learn about managing Oracle Clusterware and
resources.
Launch window

2. Voting Disk functions and CSS parameters


CSS is the service that determines which nodes in the cluster are available, and provides
cluster group membership and simple locking services to the other processes.
CSS typically determines node availability via communication through a dedicated private
network with a voting disk used as a secondary communication mechanism.
Basically, this is done by sending heartbeat messages through the network and the voting disk, as illustrated in the Main Voting Disk function graphic.

Graphic
The Main Voting Disk function nodes contain three CSS nodes and a Voting disk.
The nodes are interconnected and connected to the Voting disk and, therefore,
can see each other.
The voting disk is a shared raw disk partition or file on a clustered file system that is
accessible to all nodes in the cluster. Its primary purpose is to help in situations where the
private network communication fails.
When that happens, the cluster is unable to have all nodes remain available because
they are no longer able to synchronize I/O to the shared disks. Therefore, some of the
nodes must go offline.
The voting disk is then used to communicate the node state information used to
determine which nodes go offline.

Graphic
The Main Voting Disk function contains a second set which is a split-brain set. It
contains three CSS nodes and one voting disk. In this set, the third node can no
longer communicate through private interconnect. Other nodes cannot see its
heartbeats and evict that node using the voting disk. After eviction, the third node
stops.
Without the voting disk, it can become impossible for an isolated node to determine whether it is experiencing a network failure or whether the other nodes are no longer available.
It would then be possible for the cluster to get into a state where multiple subclusters of
nodes would have unsynchronized access to the same database files.
This situation is commonly referred to as the cluster split-brain problem.
When others can no longer see Node3's heartbeats, they decide to evict that node by
using the voting disk.

When Node3 reads the removal message, it generally reboots itself to make sure all
outstanding write I/Os are lost.
In addition to the voting disk mechanism, a similar mechanism also exists for RAC
database instances. At the instance level, the control file is used by all participating
instances for voting. This is necessary because there can be cases where instances
should be evicted, even if network connectivity between nodes is still in good shape.
For example, if LMON or LMD is stuck on one instance, it could then be possible to end
up with a frozen cluster database. Therefore, instead of allowing a clusterwide hang to
occur, RAC evicts the problematic instance(s) from the cluster. When the problem is
detected, the instances race to get a lock on the control file. The instance that obtains the
lock tallies the votes of the instances to decide membership. This is called Instance
Membership Reconfiguration or IMR.
The CSS misscount parameter represents the maximum time, in seconds, that a
network heartbeat across the interconnect can be missed before entering into a cluster
reconfiguration for node eviction purposes.
The default value of the misscount parameter is 30 seconds. The misscount parameter's value drives cluster membership reconfigurations and directly affects the availability of the cluster. Its default setting should be acceptable.
Modifying this value can influence not only the timeout interval for I/O to the voting disk, but also the tolerance for missed network heartbeats across the interconnect. This directly affects database and cluster availability.
The CSS misscount default value, when using vendor (non-Oracle) clusterware, is also
30 seconds, and you should not change the default misscount value if you are using
vendor clusterware.
The CSS disktimeout parameter represents the maximum time, in seconds, that a disk
heartbeat can be missed (outside cluster reconfiguration events) before entering into a
cluster reconfiguration for node eviction purposes.
Its default value is 200 seconds.
However, if I/O latencies to the voting disk are greater than the default internal I/O
timeout, the cluster may experience CSS node evictions.
The most common cause in these latencies relates to multipath I/O software drivers and
the reconfiguration times resulting from a failure in the I/O path.
Therefore, until the underlying storage I/O latency is resolved, disktimeout can be temporarily increased to the maximum I/O latency to the voting disk, including latencies resulting from I/O path reconfiguration, plus one second (M+1).
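If you do need to inspect or adjust these CSS settings, something like the following sketch applies; it assumes the crsctl get/set css syntax available in this release, and the value 230 is only an example of M+1 for a measured maximum latency of 229 seconds. Run the commands as root and revert the change once the storage issue is fixed.

Code
# crsctl get css misscount
# crsctl get css disktimeout
# crsctl set css disktimeout 230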

Question
What are characteristics of the CSS misscount parameter?
Options:
1. Defaults to 200 seconds
2. Represents network heartbeat timeouts
3. Determines disk I/O timeouts during reconfiguration
4. Can be temporarily changed when experiencing very long I/O latencies to voting disks

Answer
Option 1: Incorrect. The default value of the disktimeout parameter is 200 seconds. The default value of the misscount parameter is 30 seconds.
Option 2: Correct. The CSS misscount parameter represents the maximum time,
in seconds, that a network heartbeat across the interconnect can be missed
before entering into a cluster reconfiguration for node eviction purposes.
Option 3: Correct. The misscount parameter's value drives cluster membership reconfigurations and directly affects the availability of the cluster. Its default setting should be acceptable. Modifying this value can influence not only the timeout interval for I/O to the voting disk, but also the tolerance for missed network heartbeats across the interconnect. This directly affects database and cluster availability.
Option 4: Incorrect. You should not change the default misscount value if you are
using vendor clusterware. The CSS disktimeout parameter can be changed
temporarily when experiencing very long I/O latencies to voting disks.
Correct answer(s):
2. Represents network heartbeat timeouts
3. Determines disk I/O timeouts during reconfiguration

Summary
RAC databases and other applications get High Availability or HA function from Oracle
Clusterware. You can manage the Oracle Clusterware stack using the crsctl command.
RAC resources such as database instances, ASM instances, listeners, instance VIPs,

services, ONS, and GSD are managed by CRS, which is the primary program for managing HA operations of applications within the cluster.
The voting disk is a shared raw disk partition or file on a clustered file system that is accessible to all nodes in the cluster; it helps when the private network communication fails. A state in which multiple subclusters of nodes have unsynchronized access to the same database files is commonly referred to as the cluster split-brain problem.

Managing Voting Disks and OCR


Learning Objectives

After completing this topic, you should be able to

recognize steps you take to configure voting disks

recognize how to manage OCR files and locations

1. Configuring voting disks


CSS availability can be improved by configuring it with multiple voting disks. Using only
one voting disk is adequate for clusters that are configured to use a single, highly
available shared disk, where both the database files and the CSS voting disk reside.
However, it is desirable to use multiple copies of the voting disk when using less reliable
storage.
Also, you can use multiple voting disks so that you do not have to rely on a multipathing
solution. The way voting disk multiplexing is implemented forces you to have at least
three voting disks.
To avoid a single point of failure, your multiplexed voting disk should be located on
physically independent storage devices with a predictable load well below saturation.
When using multiplexed copies of the voting disk, CSS multiplexes voting data to all the
voting disks.When CSS needs to read the voting disk, it reads all the information from all
the voting disks. If strictly more than half of the voting disks are up and contain consistent
information, CSS can use that consistent data in the same way as a single voting disk
configuration.
If less than half of the voting disks have readable consistent data, CSS will need to self-terminate, just as in the situation where a single voting disk cannot be read by CSS. This self-termination prevents disjoint subclusters from forming.
You can have up to 32 voting disks, but use the formula v = f * 2 + 1 to determine the
number of voting disks, where v is the number of voting disks and f is the number of
disk failures you want to survive. For example, to survive two disk failures you need
2 * 2 + 1 = 5 voting disks.

Note
A typical voting disk configuration comprises between three and five disks.
During Oracle Clusterware installation, you can multiplex your voting disk by using the
Specify Voting Disk Location screen of the Oracle Universal Installer. This allows you to
specify three voting disk locations.
However, you can dynamically add and remove voting disks after installing Oracle
Clusterware by using the following commands as the root user:

Code
# crsctl add css votedisk <new voting disk path>
# crsctl delete css votedisk <old voting disk path>

to add a voting disk
The command to add a voting disk is:
# crsctl add css votedisk <new voting disk path>

to remove a voting disk
The command to remove a voting disk is:
# crsctl delete css votedisk <old voting disk path>

Note
Here both paths are fully qualified paths.

If your cluster is down, you can use the -force option (at the very end of the crsctl
command) to modify the voting disk configuration with either of these commands without
interacting with active Oracle Clusterware daemons. However, using the -force option
while any cluster node is active may corrupt your configuration.

Code
# crsctl add css votedisk <new voting disk path> -force
# crsctl delete css votedisk <old voting disk path> -force

Supplement
Selecting the link title opens the resource in a new browser window.

Voting disk configuration


View information about adding mirrors to your voting disk configuration.
Launch window

Question
Voting disks are a vital resource for your cluster availability. What considerations
must be made when using voting disk multiplexing?
Options:
1.

You must have a minimum of 3 voting disks

2.

You can have up to a maximum of 16 voting disks

3.

You should use multiple voting disks when using less than reliable storage

4.

You must specify the voting disk locations during the installation of Oracle
Clusterware

Answer
Option 1: Correct. The way voting disk multiplexing is implemented forces you to
have at least three voting disks. To avoid a single point of failure, your multiplexed
voting disk should be located on physically independent storage devices with a
predictable load well below saturation. A typical voting disk configuration
comprises between three and five disks.
Option 2: Incorrect. You can have up to 32 voting disks, but use the following
formula to determine the number of voting disks you should use: v = f *2+1, where

v is the number of voting disks, and f is the number of disk failures you want to
survive.
Option 3: Correct. Using only one voting disk is adequate for clusters that are
configured to use a single, highly available shared disk, where both the database
files and the CSS voting disk reside. However, it is desirable to use multiple copies
of the voting disk when using less reliable storage. Also, you can use multiple
voting disks so that you do not have to rely on a multipathing solution.
Option 4: Incorrect. Voting disk configuration can be changed dynamically. You
can add and remove voting disks after installing Oracle Clusterware by using the
crsctl add css votedisk path and crsctl delete css votedisk path
commands, respectively.
Correct answer(s):
1. You must have a minimum of 3 voting disks
3. You should use multiple voting disks when using less than reliable storage
There should be no need to back up a voting disk. Simply add a new one and drop a bad
one. It is recommended to use symbolic links to specify your voting disk paths. This is
because the voting disk paths are directly stored in OCR, and editing the OCR file directly
is not supported.
By using symbolic links to your voting disks, it becomes easier to restore your voting
disks if their original locations can no longer be used as a restore location.

A new backup of one of your available voting disks should be taken any time a new node
is added, or an existing node is removed. The recommended way to do that is to use the
dd command (ocopy in Windows environments).
As a general rule on most platforms, including Linux and Sun, the block size for the dd
command should be at least 4 KB to ensure that the backup of the voting disk gets
complete blocks.
Before backing up your voting disk with the dd command, make sure that you have
stopped Oracle Clusterware on all nodes. The crsctl query css votedisk command lists the
voting disks currently used by CSS. This can help you to determine which voting disk to
back up, and then you can follow the procedure to back up and restore your voting disk.

Code
$ crsctl query css votedisk
$ dd if=<voting disk path> of=<backup path> bs=4k
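As a minimal sketch of that procedure, assuming the voting disk is the raw device
/dev/raw/raw2 (the path shown in the ocrdump example later in this topic) and /backup is a
hypothetical local backup directory:

Code
# crsctl stop crs
# dd if=/dev/raw/raw2 of=/backup/votedisk_raw2.bak bs=4k
# dd if=/backup/votedisk_raw2.bak of=/dev/raw/raw2 bs=4k
# crsctl start crs

The first dd command takes the backup; the second shows how that backup could later be
restored by reversing the if and of arguments. Oracle Clusterware is stopped on all nodes
for both operations.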

Note
If you lose all your voting disks and you do not have any backup, you must
reinstall Oracle Clusterware.

Question
Which are true statements concerning the backup and recovery of your voting
disks in a Linux environment?
Options:
1.

Backing up a voting disk using the dd command must be done offline

2.

It is recommended that you edit the OCR directly when specifying the voting disk
paths

3.

The crsctl query css votedisk command is used to determine the number of
voting disks that are needed

4.

If no voting disk backup is available when all of your voting disks are lost, you must
reinstall Oracle Clusterware

Answer
Option 1: Correct. A new backup of one of your available voting disks should be
taken any time a new node is added, or an existing node is removed. The
recommended way to do that in a Linux environment is to use the dd command.
Before backing up your voting disk with the dd command, make sure that you
have stopped Oracle Clusterware on all nodes.
Option 2: Incorrect. It is recommended that you use symbolic links to specify your
voting disk paths. This is because the voting disk paths are directly stored in OCR,
and editing the OCR file directly is not supported. By using symbolic links to your
voting disks, it becomes easier to restore your voting disks if their original
locations can no longer be used as a restore location.
Option 3: Incorrect. The crsctl query css votedisk command lists the voting
disks currently used by CSS. This can help you to determine which voting disk to
back up.

Option 4: Correct. If you lose all your voting disks and you do not have any
backup, you must reinstall Oracle Clusterware. The recommended way to create a
backup of your voting disks in a Linux environment is to use the dd command.
Correct answer(s):
1. Backing up a voting disk using the dd command must be done offline
4. If no voting disk backup is available when all of your voting disks are lost, you
must reinstall Oracle Clusterware

2. Managing OCR files


Cluster configuration information is maintained in Oracle Cluster Registry or OCR. OCR
relies on a distributed shared-cache architecture for optimizing queries, and clusterwide
atomic updates against the cluster repository. Each node in the cluster maintains an in-memory copy of OCR, along with the Cluster Ready Services Daemon or CRSD that
accesses its OCR cache.
Only one of the CRS processes actually reads from and writes to the OCR file on shared
storage. This process is responsible for refreshing its own local cache, as well as the
OCR cache on other nodes in the cluster.
For queries against the cluster repository, the OCR clients communicate directly with the
local OCR process on the node from which they originate. When clients need to update
OCR, they communicate through their local CRS process to the CRS process that is
performing input/output or I/O for writing to the repository on disk.

Graphic
In this flowchart, there are three nodes named Node1, Node2, Node3 and a
component labeled Shared storage. The components in Node1 and Node3 are
OCR cache, CRS process, and Client process. These components are
interconnected. The components in Node2 are OCR cache, and CRS process.
These components are also interconnected. The components in Shared storage
are OCR primary file, and OCR mirror file and they're interconnected. The OCR
mirror file is highlighted. The Shared storage is connected to CRS process in
Node2. Similarly, CRS process in Node1 and Node3 are connected to CRS
process in Node2. The OCR cache in Node1 and Node3 are also connected to
CRS process in Node 2.
The main OCR client applications are the Oracle Universal Installer or OUI, SRVCTL,
Enterprise Manager or EM, the Database Configuration Assistant or DBCA, the Database
Upgrade Assistant or DBUA, NetCA, and the Virtual Internet Protocol Configuration
Assistant or VIPCA.

Furthermore, OCR maintains dependency and status information for application


resources defined within Oracle Clusterware, specifically databases, instances, services,
and node applications.
The installation process for Oracle Clusterware gives you the option of automatically
mirroring OCR. This creates a second OCR file (the OCR mirror file) to duplicate the
original OCR file (the primary OCR file).
You can put the OCR mirror file on a cluster file system or on a shared raw device.
Although it is recommended to mirror your OCR, you are not forced to do it during
installation. The name of the OCR configuration file on a UNIX-based system is
ocr.loc, and the OCR file location variables are ocrconfig_loc and
ocrmirrorconfig_loc.
It is strongly recommended that you use mirrored OCR files if the underlying storage is
not RAID. This prevents OCR from becoming a single point of failure.
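For illustration only, on a Linux system with a mirrored OCR the ocr.loc file might contain
entries along these lines (the device paths shown are hypothetical):

Code
$ cat /etc/oracle/ocr.loc
ocrconfig_loc=/oradata/OCR1
ocrmirrorconfig_loc=/oradata/OCR2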

Note
OCR also serves as a configuration file in single-instance configurations that use ASM,
where there is one OCR per node.
Every clustering technology requires a repository through which the clustering software
and other cluster-aware application processes can share information. Oracle Clusterware
uses Oracle Cluster Registry to store information about resources it manages. This
information is stored in a tree-like structure using key-value pairs.

Graphic
This flowchart is a tree structure with root as its main component. The main
branches are SYSTEM, DATABASE, and CRS. The subbranches of SYSTEM are
css, CRS HOME, evm, crs, and OCR. The subbranches of DATABASE are
NODEAPPS, LOG, ASM, DATABASES, and ONS. DATABASES is further
subdivided into SERVICE and INSTANCE. The CRS branch isn't subdivided.
Following are the main branches composing the OCR structure:

The SYSTEM keys contain data related to the main Oracle Clusterware processes such as CSSD,
CRSD, and EVMD. For example, CSSD keys contain information about the misscount parameter
and voting disk paths.

The DATABASE keys contain data related to the RAC databases that you registered with Oracle
Clusterware and you have information about instances, nodeapps, services, and so on.

The last category of keys that you can find in OCR relate to the resource profiles used by Oracle
Clusterware to maintain availability of the additional application you registered. These resources
include the additional application VIPs, the monitoring scripts, and the check interval values.
Using the ocrdump -xml command, the following XML data was obtained.

Code
<KEY>
<NAME>SYSTEM.css.diskfile</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE>
<![CDATA[ /dev/raw/raw2 ]]>
</VALUE>
</KEY>
<KEY>
<NAME>DATABASE.DATABASES.xwke</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE>
<![CDATA[ xwkE ]]>
</VALUE>
</KEY>
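As a sketch of how such dumps can be produced with the ocrdump utility (the output file
name and key name below are illustrative):

Code
# ocrdump /tmp/ocr_dump.txt
# ocrdump -stdout -keyname SYSTEM.css -xml

The first command writes the full OCR contents to a text file, and the second writes only the
SYSTEM.css subtree to standard output in XML format.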
You use the ocrconfig tool (the main configuration tool for Oracle Cluster Registry) to
generate logical backups of OCR
Generate logical backups of OCR using the -export option, and use them later to restore
your OCR information using the -import option.
upgrade or downgrade OCR
Upgrade or downgrade OCR is one of the functions for which you can use the ocrconfig
tool.
use the -showbackup option
Use the -showbackup option to view the generated backups (by default, OCR is backed
up on a regular basis). These backups are generated in a default location that you can
change using the -backuploc option. If need be, you can then restore physical copies of
your OCR using the -restore option. You can also manually create OCR backups using
the -manualbackup option.
use the -replace ocr option
Use the -replace ocr or -replace ocrmirror options to add, remove, or replace the
primary OCR file or the OCR mirror file.
use the -overwrite option, and
Use the -overwrite option under the guidance of Support Services because it allows
you to overwrite some OCR protection mechanisms when one or more nodes in your
cluster cannot start because of an OCR corruption.
use the -repair option
Use the -repair option to change the parameters listing the OCR and OCR mirror
locations.
The ocrcheck tool enables you to verify the integrity of both OCR and its mirror.
Use the ocrdump utility to write the OCR contents (or part of it) to a text or an XML file.
OCR contains important cluster and database configuration information for RAC and
Oracle Clusterware. One of the Oracle Clusterware instances (CRSD master) in the
cluster automatically creates OCR backups every four hours, and CRS retains the last
three copies.
The CRSD process also creates an OCR backup at the beginning of each day and of
each week, and retains the last two copies and you can see the content of the default
backup directory of the CRSD master.

Code
$ cd $ORACLE_BASE/Crs/cdata/jfv_clus
$ ls -lt
-rw-r--r-- 1 root root 4784128 Jan  9 02:54 backup00.ocr
-rw-r--r-- 1 root root 4784128 Jan  9 02:54 day_.ocr
-rw-r--r-- 1 root root 4784128 Jan  8 22:54 backup01.ocr
-rw-r--r-- 1 root root 4784128 Jan  8 18:54 backup02.ocr
-rw-r--r-- 1 root root 4784128 Jan  8 02:54 day.ocr
-rw-r--r-- 1 root root 4784128 Jan  6 02:54 week_.ocr
-rw-r--r-- 1 root root 4005888 Dec 30 14:54 week.ocr

Although you cannot customize the backup frequencies or the number of retained copies,
you can identify the name and location of the automatically retained copies by using the
ocrconfig -showbackup command.
The default target location of each automatically generated OCR backup file is this
directory.

Code
<CRS Home>/cdata/<cluster name>

It is recommended to change this location to one that is shared by all nodes in the cluster
by using the ocrconfig -backuploc <new location> command. This command
takes one argument that is the full path directory name of the new location.

Code
# ocrconfig -backuploc /shared/bak


Because of the importance of OCR information, it is also recommended to manually
create copies of the automatically generated physical backups.
You can use any backup software to copy the automatically generated backup files, and it
is recommended to do that at least once daily to a different device from where the primary
OCR resides.
You can perform an OCR backup on demand using the -manualbackup option. The
backup is generated in the location that you specify with the -backuploc option.

Graphic
The code to perform an OCR backup is the following:
# ocrconfig -manualbackup

Code
# ocrconfig -manualbackup
# ocrconfig -export file_name
In addition, you should also export the OCR contents before and after making significant
configuration changes such as adding or deleting nodes from your environment,
modifying Oracle Clusterware resources, or creating a database.
Use the ocrconfig -export command as the root user to generate OCR logical
backups. You need to specify a file name as the argument of the command, and it

generates a binary file that you should not try to edit. Most configuration changes that you
make not only change the OCR contents but also cause file and database object creation.
Some of these changes are often not restored when you restore OCR. Do not perform an
OCR restore as a correction to revert to previous configurations if some of these
configuration changes fail. This may result in an OCR with contents that do not match the
state of the rest of your system.

Graphic
The code to generate an OCR logical backup is the following:
# ocrconfig -export file_name

Code
# ocrconfig -manualbackup
# ocrconfig -export file_name

Note
If you try to export OCR while an OCR client is running, you get an error.
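For instance, before a significant change such as adding a node, you might take both a
physical and a logical backup; the export file name and the /shared/backup directory below
are hypothetical:

Code
# ocrconfig -manualbackup
# ocrconfig -export /shared/backup/ocr_before_addnode.dmp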
Use the following steps in a procedure to restore OCR on UNIX-based systems:

Code
$ ocrconfig -showbackup
# ocrdump -backupfile file_name
# crsctl stop crs
# ocrconfig -restore <CRS HOME>/cdata/jfv_clus/day.ocr
# crsctl start crs
$ cluvfy comp ocr -n all

identify the OCR backups


Identify the OCR backups by using the ocrconfig -showbackup command. You can
execute this command from any node as user oracle. The output tells you on which node
and which path to retrieve both automatically and manually generated backups. Use the
auto or manual argument to display only one category.
review the contents of the backup
Review the contents of the backup by using ocrdump -backupfile file_name, where
file_name is the name of the backup file.

stop Oracle Clusterware


Stop Oracle Clusterware on all the nodes of your cluster by executing the crsctl stop
crs command on all the nodes as the root user.
perform the restore by applying an OCR backup
Perform the restore by applying an OCR backup file that you identified in step one using
the following command as the root user, where file_name is the name of the OCR file that
you want to restore. Make sure that the OCR devices that you specify in the OCR
configuration file (/etc/oracle/ocr.loc) exist and that these OCR devices are valid
before running the ocrconfig -restore file_name command.
restart Oracle Clusterware, and
Restart Oracle Clusterware on all the nodes in your cluster by restarting each node or by
running the crsctl start crs command as the root user.
run the command to verify OCR integrity
Run the cluvfy comp ocr -n all command to verify OCR integrity, where the -n all
argument retrieves a listing of all the cluster nodes that are configured as part of your
cluster.
Use the following steps of the procedure to import OCR on UNIX-based systems:

Code
ocrconfig -export file_name
# crsctl stop crs
# ocrconfig -import /shared/export/ocrback.dmp
# crsctl start crs
$ cluvfy comp ocr -n all
identify the OCR export file
Identify the OCR export file that you want to import by identifying the OCR export file that
you previously created.
stop Oracle Clusterware
Stop Oracle Clusterware on all the nodes in your RAC database by executing the crsctl
stop crs command on all the nodes as the root user.
perform the import by applying an OCR export file
Perform the import by applying the OCR export file that you identified in step one using the
ocrconfig -import file_name command, where file_name is the name of the OCR export
file from which you want to import OCR information.
restart Oracle Clusterware, and
Restart Oracle Clusterware on all the nodes in your cluster by restarting each node using
the crsctl start crs command as the root user.

run the Cluster Verification Utility or CVU command

Run the cluvfy comp ocr -n all command to verify OCR integrity, where the -n all
argument retrieves a listing of all the cluster nodes that are configured as part of your
cluster.
Consider this example that explains how to replace the existing OCR mirror file. It is
assumed that you already have an OCR mirror, and that this mirror is no longer working
as expected. Such reorganization can be triggered because you received an OCR failure
alert in Enterprise Manager, or because you saw an alert directly in the Oracle
Clusterware alert log file.
Using the ocrcheck command, you clearly see that the OCR mirror is no longer in sync
with the primary OCR. You then issue the ocrconfig -replace ocrmirror filename
command to replace the existing mirror with a copy of your primary OCR.
In the example, filename can be a new file name if you decide to also relocate your OCR
mirror file. If it is the primary OCR file that is failing, and if your OCR mirror is still in good
health, you can use the ocrconfig -replace ocr filename command instead.

Code
# ocrcheck
Status of Oracle Cluster Registry is as follows:
Version                  :          2
Total space (kbytes)     :     200692
Used space (kbytes)      :       3752
Available space (kbytes) :     196940
ID                       :  495185602
Device/File Name         : /oradata/OCR1
Device/File integrity check succeeded
Device/File Name         : /oradata/OCR2
Device/File needs to be synchronized with the other device
# ocrconfig -replace ocrmirror /oradata/OCR2

Supplement
Selecting the link title opens the resource in a new browser window.

Replacing corrupted OCR file


View more information about mirroring your OCR file and replacing a corrupted
OCR file.

Launch window
This example explains a replace scenario.
However, you can also use similar ocrconfig commands to add or remove either the
primary or the mirror OCR file. You can perform either of the following steps:


execute ocrconfig -replace ocr|ocrmirror filename, which adds the primary or mirror OCR file
to your environment if it does not already exist, or

execute ocrconfig -replace ocr|ocrmirror without a file name, which removes the primary or the mirror OCR file
Use the ocrconfig -repair command to repair inconsistent OCR configuration
information.
The OCR configuration information is stored in these locations.

Graphic
The paths that store the OCR configuration information are the following:
/etc/oracle/ocr.loc on Linux and AIX
/var/opt/oracle/ocr.loc on Solaris and HP-UX, and
registry key HKEY_LOCAL_MACHINE\SOFTWARE\Oracle\ocr on Windows.
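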
You may need to repair an OCR configuration on a particular node if your OCR
configuration changes while that node is stopped. For example, you may need to repair

the OCR on a node that was not up while you were adding, replacing, or removing an
OCR.
In this example, the OCR mirror file is added on the first node of your cluster while the
second node is not running Oracle Clusterware. You cannot perform this operation on a
node on which Oracle Clusterware is running.

Code
# crsctl stop crs
# ocrconfig -replace ocrmirror /OCRMirror
# ocrconfig -repair ocrmirror /OCRMirror
# crsctl start crs

Note
This repairs the OCR configuration information only; it does not repair OCR itself.
The following is a list of important considerations when you use the ocrconfig
-replace command:

if you are using raw devices, make sure that the file name exists before issuing an add or replace
operation using ocrconfig

to be able to execute an add, replace, or remove operation using ocrconfig, you must be
logged in as the root user

the OCR file that you are replacing can be either online or offline

if you remove a primary OCR file, then the mirrored OCR file becomes the primary OCR file, and

do not perform an OCR removal operation unless there is at least one other active OCR file
online

Supplement
Selecting the link title opens the resource in a new browser window.

Restore an OCR file


View information about restoring an OCR file.
Launch window

Question
What is an important consideration that should be taken into account when you
use the ocrconfig -replace command?
Options:
1.

The OCR file that you are replacing must be offline

2.

Any user can execute an add operation using ocrconfig

3.

An OCR removal operation should not be performed unless at least one other OCR
file is online

4.

Removing the primary OCR file will result in Oracle Clusterware needing to be
reinstalled

Answer
Option 1: Incorrect. The OCR file that you are replacing can be either online or
offline.
Option 2: Incorrect. You must be the root user to be able to add, replace, or
remove an OCR file while using ocrconfig.
Option 3: Correct. Do not perform an OCR removal operation unless there is at
least one other active OCR file online.
Option 4: Incorrect. If you remove a primary OCR file, then the mirrored OCR file
becomes the primary OCR file.
Correct answer(s):
3. An OCR removal operation should not be performed unless at least one other
OCR file is online

Summary
When using less reliable storage, CSS availability can be improved by configuring it with
multiple voting disks located on physically independent storage devices. Use the formula
v = f * 2 + 1 to determine the number of voting disks; the maximum is 32.
Cluster configuration information is maintained in the Oracle Cluster Registry or OCR.
OCR also maintains dependency and status information for application resources defined
within Oracle Clusterware, specifically databases, instances, services, and node
applications. The ocrconfig tool is the main configuration tool for OCR.

VIP Addresses and CRS Framework


Learning Objectives

After completing this topic, you should be able to

recognize the procedure for changing VIP addresses

recognize how to use the CRS framework to register an application

1. Changing the VIP address


The VIP address is a static IP address with a virtual host name defined and resolved
through either the DNS or your hosts file.
During Oracle Clusterware installation, you are prompted to enter a virtual IP and virtual
host name for each of the nodes in the cluster. These are stored in OCR, and different
components within the Oracle Clusterware HA framework depend on these VIPs.
If, for some reasons, you want to change the VIP address, use the following procedure on
each node, one at a time:

Code
$ ifconfig -a
$ srvctl stop instance -d DB -i DB1
$ srvctl stop asm -n node1
# srvctl stop nodeapps -n node1
$ ifconfig -a [ + $ crs_stat ]
/etc/hosts
# srvctl modify nodeapps -n node1 -A
192.168.2.125/255.255.255.0/eth0
# srvctl start nodeapps -n node1
confirm the current IP address
Confirm the current IP address for the VIP by running the ifconfig -a command. On
Windows, run the ipconfig /all command. This should show the current VIP bound
to one of the network interfaces.
stop all resources that are dependent on the VIP
Stop all resources that are dependent on the VIP on that node: first stop the database
instance, and then the ASM instance. When done, stop nodeapps.
verify that the VIP is no longer running

Verify that the VIP is no longer running by executing the ifconfig -a command again,
and confirm that its interface is no longer listed in the output. If the interface still shows as
online, this is an indication that a resource which is dependent on the VIP is still running.
The crs_stat -t command can help to show resources that are still online.
make any changes necessary to all nodes
Make any changes necessary to all nodes' /etc/hosts files (on UNIX), or
\WINNT\System32\drivers\etc\hosts files on Windows, and make the necessary
DNS changes, to associate the new IP address with the old host name.
modify nodeapps
Modify nodeapps and provide the new virtual IP address. Use the srvctl modify
nodeapps command with the -A option. This command should be run as root, and you
specify the new IP address (192.168.2.125), the corresponding netmask
(255.255.255.0), and the interface that you want the VIP to use (eth0).
start nodeapps again, and
Start nodeapps again after modifying nodeapps.
repeat the same steps
Repeat the same steps for all the nodes in the cluster. You can stay connected from the
first node because srvctl is a clusterwide management tool.
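As a quick sanity check after nodeapps is restarted on a node, reusing the address and
node name from the example above, you might run:

Code
$ ifconfig -a | grep 192.168.2.125
$ srvctl status nodeapps -n node1
$ crs_stat -t

These commands confirm that the new VIP is bound to a network interface and that the
node applications and the resources that depend on them are back online.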
If only the IP address is changed, it is not necessary to make changes to the
listener.ora, tnsnames.ora and initialization parameter files, provided they are
using the virtual host names.
If changing both the virtual host name and the VIP address for a node, it will be
necessary to modify those files with the new virtual host name.
For the listener.ora file, you can use netca to remove the old listener and create a
new listener. In addition, changes will need to be made to the tnsnames.ora file of any
clients connecting to the old virtual host name.
When installing Oracle Clusterware and RAC, it is possible for you to specify wrong
information during the OUI interview regarding the public and interconnect interfaces that
Oracle Clusterware should use.
If that happens, Oracle Clusterware will be able to start at the end of the installation
process, but you might end up having trouble later communicating with other nodes in
your cluster.
If either the interface, IP subnet, or IP address for both your public network and
interconnect are incorrect or need to be changed, you should make the changes using
the Oracle Interface Configuration Tool or oifcfg because this will update the
corresponding OCR information.

In this example, the IP subnets for both the public and private networks are incorrect:

Code
$ <CRS HOME>/bin/oifcfg getif
eth0 139.2.156.0 global public
eth1 192.168.0.0 global cluster_interconnect
$ oifcfg delif -global eth0
$ oifcfg setif -global eth0/139.2.166.0:public
$ oifcfg delif -global eth1
$ oifcfg setif -global eth1/192.168.1.0:cluster_interconnect
$ oifcfg getif
eth0 139.2.166.0 global public
eth1 192.168.1.0 global cluster_interconnect

you get the current interfaces information by using the getif option
This code is used to get the current interface information:
$ <CRS HOME>/bin/oifcfg getif
eth0 139.2.156.0 global public
eth1 192.168.0.0 global cluster_interconnect

you delete the entry corresponding to the public interface first by using the delif option, and then
enter the correct information by using the setif option
This code is used to delete the entry corresponding to the public interface and enter the correct
information:
$ oifcfg delif -global eth0
$ oifcfg setif -global eth0/139.2.166.0:public

you do the same for your private interconnect, and


This code is used for the private interconnect:
$ oifcfg delif -global eth1
$ oifcfg setif -global eth1/192.168.1.0:cluster_interconnect

you check that the new information is correct


This code is used to check that the new information is correct:

$ oifcfg getif
eth0 139.2.166.0 global public
eth1 192.168.1.0 global cluster_interconnect

Note
A network interface can be stored as a global interface or as a node-specific
interface. An interface is stored as a global interface when all the nodes of a RAC
cluster have the same interface connected to the same subnet (recommended). It
is stored as a node-specific interface only when there are some nodes in the
cluster that have a different set of interfaces and subnets.
Oracle Clusterware provides two publicly available components that can be used to help
protect any application on a cluster:
the High Availability framework and
The High Availability framework provides facilities to manage your applications under CRS
protection via command-line tools such as crs_register, crs_start, and crs_stop.
This framework is also used to automatically invoke control scripts that you created so that
CRS can start, stop, and monitor your applications. OCR is used as a repository to define
failover policies and other important parameters for CRS to control your applications.
the C API
The C API can be used to directly manipulate OCR to define how CRS should protect an
application. This API can be used to modify, at runtime, how the application should be
managed by CRS.

If the application you want CRS to protect is accessed by way of a network, you have the
possibility to create a Virtual Internet Protocol address for your application. This is
referred to as an application VIP.
Application VIPs created by Oracle Clusterware are able to fail over from one Network
Interface Card or NIC to another on the same node as well as from one NIC to another
one located on another node in case all public networks are down on a given node.
In addition, your application might need to store configuration files on a disk. To share
these files among nodes, Oracle Corporation also provides you with the Oracle Cluster
File System or OCFS.
Most of the differences between resources attached to application VIPs and RAC VIPs
reside in the fact that they are configured differently within Oracle Clusterware.
For example, it makes no sense from a RAC perspective to fail over either a database
instance or listener because there is already a listener and an instance waiting on another
node. Therefore, the listener does not listen on any other VIPs than the one node-specific
VIP.
Looking at the CRS profile of those resources, you will see the differences. Also, most of
the time, there are many applications attached to a RAC VIP such as listeners, database
instances, and ASM instances.
Although it is possible to associate an application VIP to multiple applications, this is not
recommended because if one of the applications cannot be started or restarted on a
node, it will be failed over to another node with the VIP, which in turn will force the other
applications to be also relocated.
This is especially true if the applications are independent.
However, one noticeable difference between a RAC VIP and an application VIP is that
after a RAC VIP is failed over to a surviving node, it no longer accepts connections
(NAK), thus forcing clients that are trying to access that address to reconnect using
another address.
If it accepted new connections, then when a failback occurs after the node is back again,
the connections going through the VIP on the failed-over node would be lost because the
interface is gone.
Application VIPs, on the other side, are fully functional after they are failed over, and
continue to accept connections. Application VIPs are mainly used when the application
cannot be restarted on a node.
RAC VIPs are mainly used when there is a node failure because clients can use other
nodes to connect.

Question
What is the first step that should be performed when changing the VIP address of
a cluster node in a Linux environment?
Options:
1.

Verify that the VIP is no longer running

2.

Stop all resources depending on the VIP

3.

Determine the interface used to support your VIP

4.

Start nodeapps and all resources depending on it

Answer
Option 1: Incorrect. This is the third step that should be performed when changing
the VIP address of a cluster node.
Verify that the VIP is no longer running by executing the ifconfig -a command
again, and confirm that its interface is no longer listed in the output. If the interface
still shows as online, this is an indication that a resource that is dependent on the
VIP is still running. The crs_stat -t command can help to show resources that
are still online.
Option 2: Incorrect. This is the second step that should be performed when
changing the VIP address of a cluster node. Stop all resources that are dependent
on the VIP on that node: first stop the database instance, and then the ASM
instance. When done, stop nodeapps.
Option 3: Correct. This is the first step that should be performed when changing
the VIP address of a cluster node. Confirm the current IP address for the VIP by
running the ifconfig -a command. On Windows, run the ipconfig /all
command. This should show you the current VIP bound to one of the network
interfaces.
Option 4: Incorrect. This is the sixth, and final, step that should be performed
when changing the VIP address of a cluster node. After starting nodeapps, you
can begin the process again on another node.
Correct answer(s):
3. Determine the interface used to support your VIP

2. Registering an application using CRS

There are some basic steps you need to follow to register an application that is monitored
by the CRS framework. If your application is accessed via the network, and if you want
your application to be still available after some network problems, it is recommended that
you create an application VIP for your application.
First, you should create an application profile to define the network information relating to
this VIP for example, the name of the public network adapter to use, the IP address,
and the netmask. In the profile, you should also specify the usrvip
action script provided by Oracle Clusterware. You can then use the default values for the
failover policies.
Use the crs_register command to add this application VIP to the list of managed
applications.
On UNIX-based operating systems, the application VIP script must run as the root user.
So, using crs_setperm, you can change the owner of the VIP to root. Using the same
command tool, you can also enable another user, such as oracle, to start the
application VIP.
When done, you can use the crs_start command to start the VIP application.
You should then
create an action script
You can now create an action script to support the start, check, and stop actions on your
application.
create the profile for your application
Create the profile for your application. You should use enough resource attributes to define
at least the action script location and name, the check interval, the failover policies, and
the required application VIP resource (if necessary). You can manage application
availability by specifying starting resources during cluster or node startup, restarting
applications that fail, and relocating applications to other nodes if they cannot run in their
current location.
define under which user your application should be running
Like for the VIP application, you can define under which user your application should be
running as well as which user can start your application. That is why on UNIX-based
platforms, Oracle Clusterware must run as the root user, and on Windows-based
platforms, Oracle Clusterware must run as Administrator.
register your application, and
When done, you can register your application by using the crs_register command.
start your application

You are then ready to start your application that is going to be monitored by Oracle
Clusterware. Do this by executing the crs_start command.
Here is an example that protects the apache application using Oracle Clusterware:

Code
# crs_profile -create AppVIP1 -t application \
-a <CRS HOME>/bin/usrvip \
-o oi=eth0,ov=144.25.214.49,on=255.255.252.0
# crs_register AppVIP1
# crs_setperm AppVIP1 -o root
# crs_setperm AppVIP1 -u user:oracle:r-x
$ crs_start AppVIP1
create the AppVIP1 application
You create the AppVIP1 application VIP profile by using the crs_profile -create
command. In order, the parameters specified in this example are the name of the
application VIP, the application type, the predefined action script usrvip located in <CRS
HOME>/bin, the name of the public network adapter, the VIP address used to locate your
application regardless of the node it is running on, and the netmask used for the VIP. The
result of this command is to create a text file called AppVIP1.cap in <CRS
HOME>/crs/profile. This file contains the attributes and is read by crs_register. If
your session is not running as the root user, the .cap file is created in <CRS
HOME>/crs/public.
register your application
Use the crs_register command to register your application VIP with Oracle
Clusterware.
run as the root user
On UNIX-based operating systems, the application VIP action script must run as the root
user. As the root user, change the owner of the resource as shown using the
crs_setperm -o command.
manage your application, and
As the root user, enable the oracle user to manage your application VIP via CRS
commands. Use the crs_setperm -u command.
start the application
As the oracle user, start the application VIP using the crs_start command.
After the application VIP is functional, you can write the action script for your application.
The example can be used by Oracle Clusterware as an action script to protect the apache
application. It is a shell script that can parse one argument with three different values. It

uses the apachectl command tool to start and stop the apache application on your
node. It uses the wget command to check whether a Web page can be accessed.
These are the three actions CRS will perform while protecting your application. For the
next steps, it is assumed that this script is called myApp1.scr.

Code
#!/bin/sh
VIPADD=144.25.214.49
HTTDCONFLOC=/etc/httpd/conf/httpd.conf
WEBCHECK=http://$VIPADD:80/icons/apache_pb.gif
case $1 in
'start')
  /usr/bin/apachectl -k start -f $HTTDCONFLOC
  RET=$?
  ;;
'stop')
  /usr/bin/apachectl -k stop
  RET=$?
  ;;
'check')
  /usr/bin/wget -q --delete-after $WEBCHECK
  RET=$?
  ;;
*)
  RET=0
  ;;
esac
exit $RET

Note
Make sure you distribute this script on all nodes of your cluster in the same
location. The default location is assumed to be <CRS HOME>/crs/script in this
case.
You need to complete the remaining steps to register an application that is monitored by
the CRS framework.

Code
# crs_profile -create myApp1 -t application -r AppVIP1 \
-a myApp1.scr -o ci=5,ra=2

# crs_register myApp1
# crs_setperm myApp1 -o root
# crs_setperm myApp1 -u user:oracle:r-x
$ crs_start myApp1
myApp1.scr
Create the profile for your application. Here your resource is called myApp1. It uses
myApp1.scr as its action script and depends on the AppVIP1 application. If AppVIP1 fails
or if it is relocated to another node, then Oracle Clusterware stops or moves the myApp1
application. The example also defines its check interval to be five seconds, and the
number of attempts to restart the application to two. This means that Oracle Clusterware
will fail over the application to another node after a second local failure happens.
crs_register
The crs_register command registers myApp1 with Oracle Clusterware.
crs_setperm myApp1 -o
Because you want the apache server listening on the default port 80, you want the
application to execute as the root user. As the root user, change the owner of the
resource, using the crs_setperm -o command.
crs_setperm myApp1 -u
As the root user, enable the oracle user to manage your application VIP via CRS
commands. Use the crs_setperm -u command.
crs_start
As the oracle user, start myApp1 by using the crs_start command.
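Once both resources are started, you might verify that Oracle Clusterware is monitoring
them, for example:

Code
$ crs_stat -t
$ crs_stat myApp1

The tabular output of crs_stat -t should show AppVIP1 and myApp1 as ONLINE, and
crs_stat myApp1 displays the full status of the application resource.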

Question
After creating an application VIP, what is the next step in registering an application
that is monitored by the CRS framework?
Options:
1.

Set permissions on the application

2.

Create the profile for the application

3.

Start the application by using crs_start command

4.

Create an action script to support the start, check, and stop actions on the
application

Answer
Option 1: Incorrect. This is the fourth step in registering an application that is
monitored by the CRS framework. Like for the VIP application, you can define
under which user your application should be running as well as which user can
start your application. Setting the permissions on the application occurs after
creating the profile for the application.
Option 2: Incorrect. This is the third step in registering an application that is
monitored by the CRS framework. You should use enough resource attributes to
define at least the action script location and name, the check interval, the failover
policies, and the required application VIP resource. Creating the profile for the
application occurs after creating an action script to support the start, check, and
stop actions on the application.
Option 3: Incorrect. This is the sixth, and final, step in registering an application
that is monitored by the CRS framework. After registering the application by using
the crs_register command, you are then ready to start the application that is
going to be monitored by Oracle Clusterware. Do this by executing the
crs_start command.
Option 4: Correct. After creating an application VIP, the next step is to create an
action script that accepts three parameters: the start parameter should start
the application, the check parameter should confirm that the application is up,
and the stop parameter should stop the application.
Correct answer(s):
4. Create an action script to support the start, check, and stop actions on the
application

Summary
During Oracle Clusterware installation, you enter a virtual IP and virtual host name for
each node, and these are stored in OCR. To change the VIP address, you follow a set
procedure on each node, one node at a time. If you have incorrect values for the interface,
IP subnet, or IP address of your public network or interconnect, you should correct them
using the Oracle Interface Configuration Tool or oifcfg.
One noticeable difference between a RAC VIP and an application VIP is that when a RAC
VIP fails over to a surviving node, it no longer accepts connections, whereas an application
VIP that fails over to a surviving node continues to accept connections.
After creating a functional application VIP, you can write an action script for your
application and create a profile for it, and then register the application using the CRS
framework. As the root user, you can enable the oracle user to manage your application
VIP through CRS commands.

RAC Backup and Recovery Settings and RMAN


Learning Objectives

After completing this topic, you should be able to

recognize RAC recovery and backup settings

identify how to use RMAN in a RAC environment

1. Recovering RAC and back up settings


RAC backup and recovery is almost identical to other Oracle Database backup and
recovery operations. This is because you are backing up and recovering a single
database.
The main difference is that with RAC you are dealing with multiple threads of redo log
files.
Although RAC provides you with methods to avoid or to reduce down time due to a failure
of one or more (but not all) of your instances, you must still protect the database itself,
which is shared by all the instances.
This means that you need to consider disk backup and recovery strategies for your
cluster database just as you would for a non-clustered database.
To minimize the potential loss of data due to disk failures, you may want to use disk
mirroring technology (available from your server or disk vendor). As in non-clustered
databases, you can have more than one mirror if your vendor allows it, to help reduce the
potential for data loss and to provide you with alternative backup strategies.
For example, with your database in ARCHIVELOG mode and with three copies of your
disks, you can remove one mirror copy and perform your backup from it while the two
remaining mirror copies continue to protect ongoing disk activity.
To do this correctly, you must first put the tablespaces into backup mode and then, if
required by your cluster or disk vendor, temporarily halt disk operations by issuing the
ALTER SYSTEM SUSPEND command.
After the statement completes, you can break the mirror and then resume normal
operations by executing the ALTER SYSTEM RESUME command and taking the
tablespaces out of backup mode.
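A minimal sketch of that sequence for a single tablespace named USERS follows; the
mirror-split step itself depends on your storage vendor's tools and is shown only as a
placeholder comment:

Code
SQL> ALTER TABLESPACE users BEGIN BACKUP;
SQL> ALTER SYSTEM SUSPEND;
-- break or split the third mirror copy with your vendor's tools here
SQL> ALTER SYSTEM RESUME;
SQL> ALTER TABLESPACE users END BACKUP;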

During backup and recovery operations involving archived log files, the Oracle server
determines the file destinations and names from the control file.
If you use RMAN, the archived log file path names can also be stored in the optional
recovery catalog. However, the archived log file path names do not include the node
name, so RMAN expects to find the files it needs on the nodes where the channels are
allocated.
If you use a cluster file system, your instances can all write to the same archive log
destination. This is known as the cluster file system scheme. Backup and recovery of the
archive logs are easy because all logs are located in the same directory.
If a cluster file system is not available, then Oracle recommends that local archive log
destinations be created for each instance with NFS-read mount points to all other
instances. This is known as the local archive with Network File System or NFS scheme.
During backup, you can either back up the archive logs from each host or select one host
to perform the backup for all archive logs.
During recovery, one instance may access the logs from any host without having to first
copy them to the local destination. Using either scheme, you may want to provide a
second archive destination to avoid single points of failure.
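A minimal sketch of the cluster file system scheme, assuming shared mount points
/cfs/arch and /cfs/arch2 (both hypothetical), would set the same destinations on every
instance:

Code
SQL> ALTER SYSTEM SET log_archive_dest_1='LOCATION=/cfs/arch' SID='*';
SQL> ALTER SYSTEM SET log_archive_dest_2='LOCATION=/cfs/arch2' SID='*';

With the local archive with NFS scheme, each instance would instead point its first
destination at its own local directory, which the other nodes mount read-only.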

Question
What are characteristics of using a cluster file system for backup and recovery
operations involving archived log files?
Options:
1.

Archive logs from each instance are written to the same file location

2.

Each instance can read mounted archive destinations of all instances

3.

A second archive destination should be provided to avoid single points of failure

4.

Local archive log destinations are created for each instance with NFS-read mount
points to all other instances

Answer
Option 1: Correct. If you use a cluster file system, your instances can all write to
the same archive log destination. This is known as the cluster file system scheme.
Back up and recovery of the archive logs are easy because all logs are located in
the same directory.
Option 2: Incorrect. When the local archive with Network File System (NFS)
scheme is used, you can either back up the archive logs from each host or select

one host to perform the backup for all archive logs. During recovery, one instance
may access the logs from any host without having to first copy them to the local
destination.
Option 3: Correct. Using either the cluster file system scheme or the local archive
with NFS scheme, you may want to provide a second archive destination to avoid
single points of failure.
Option 4: Incorrect. If a cluster file system is not available, then Oracle
recommends that local archive log destinations be created for each instance with
NFS-read mount points to all other instances. This is known as the local archive
with NFS scheme.
Correct answer(s):
1. Archive logs from each instance are written to the same file location
3. A second archive destination should be provided to avoid single points of failure
To use a flash recovery area in RAC, you must place it on an ASM disk group, a cluster
file system, or on a shared directory that is configured through certified NFS for each
RAC instance. That is, the flash recovery area must be shared among all the instances of
a RAC database.
You access the Cluster Database backup and recovery related tasks by clicking the
Availability tab on the Cluster Database home page.
On the Availability tabbed page, you can use RMAN to perform a range of backup and
recovery operations, such as scheduling backups, performing recovery when necessary,
and configuring backup and recovery settings.
There are also links related to Oracle Secure Backup and Service management.

Graphic
The Availability tabbed page contains a section named Backup/Recovery. This
section has two subsections named Setup and Manage. The Setup subsection
has three links Backup Settings, Recovery Settings, and Recovery Catalog
Settings. The Manage subsection has six links Schedule Backup, Manage
Current Backups, Backup Reports, Manage Restore Points, Perform Recovery,
and View and Manage Transactions.
You can use Enterprise Manager to configure important recovery settings for your cluster
database.
On the Cluster Database home page, click the Availability tab, and then click the
Recovery Settings link.

Graphic
The Recovery Settings page consists of sections that include Instance Recovery,
Media Recovery, and Flash Recovery.
From here, you can ensure that your database is in archivelog mode and configure flash
recovery settings. With a RAC database, if the Archive Log Destination setting is not the
same for all instances, the field appears blank, with a message indicating that instances
have different settings for this field.
In this case, entering a location in this field sets the archive log location for all instances
of the database. You can assign instance specific values for an archive log destination by
using the Initialization Parameters page.
You can run the ALTER DATABASE SQL statement to change the archiving mode in RAC
as long as the database is mounted by the local instance but not open in any instances.
You do not need to modify parameter settings to run this statement. Set the initialization
parameters DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE to the
same values on all instances to configure a flash recovery area in a RAC environment.
For any archived redo log configuration, uniquely identify the archived redo logs with the
LOG_ARCHIVE_FORMAT parameter.
The format of this parameter is operating system specific and it can include text strings,
one or more variables, and a file name extension. All of the thread parameters, in either
upper or lower case, are mandatory for RAC.
This enables the Oracle Database to create unique names for archive logs across the
incarnation.

Graphic
The Archived Redo File Conventions in RAC table contains three columns named
Parameter, Description, and Example. The %r parameter is described as
Resetlogs identifier and its example is log_1_62_23452345, the %R parameter's
description is Padded resetlogs identifier with log_1_62_0023452345.
The %s parameter is the Log sequence number, not padded, for example
log_251. The %S parameter is Log sequence number, left-zero-padded, for
example log_0000000251, the %t parameter is a Thread number, not padded, for
example, log_1. And the %T parameter's description is provided as Thread
number, left-zero-padded, for example log_0001.

This requirement is in effect when the COMPATIBLE parameter is set to 10.0 or greater.
Use the %R or %r parameter to include the resetlogs identifier to avoid overwriting the
logs from a previous incarnation. If you do not specify a log format, then the default is
operating system specific and includes %t, %s, and %r.
As an example, if the instance associated with redo thread number 1 sets
LOG_ARCHIVE_FORMAT to log_%t_%s_%r.arc, then its archived redo log files are
named

log_1_1000_23435343.arc

log_1_1001_23452345.arc, and

log_1_1002_23452345.arc
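Tying these settings together, a sketch of the corresponding initialization parameter
changes might look as follows, assuming a hypothetical +FRA disk group and a 4 GB
recovery area:

Code
SQL> ALTER SYSTEM SET db_recovery_file_dest_size=4G SCOPE=BOTH SID='*';
SQL> ALTER SYSTEM SET db_recovery_file_dest='+FRA' SCOPE=BOTH SID='*';
SQL> ALTER SYSTEM SET log_archive_format='log_%t_%s_%r.arc' SCOPE=SPFILE SID='*';

The recovery area size is set before its location, and LOG_ARCHIVE_FORMAT is a static
parameter, so it is changed in the SPFILE and takes effect after the instances are restarted.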
Persistent backup settings can be configured using Enterprise Manager. On the Database
Control home page, click the Availability tab, and then click the Backup Settings link.
You can configure disk settings such as the directory location of your disk backups, and
level of parallelism.
You can also choose the default backup type:

Graphic
These default backup type options are displayed as radio buttons. This page also
contains the Parallelism and Disk Backup Location fields.

Backup Set

Compressed Backup Set, or

Image Copy
You can also specify important tape-related settings such as the number of available tape
drives and vendor-specific media management parameters.

2. Using RMAN in a RAC environment


Oracle Recovery Manager or RMAN can use stored scripts, interactive scripts, or an
interactive GUI front end. When using RMAN with your RAC database, use stored scripts
to initiate the backup and recovery processes from the most appropriate node.
If you use different Oracle Home locations for your RAC instances on each of your nodes,
create a snapshot control file in a location that exists on all your nodes. The snapshot
control file is only needed on the nodes on which RMAN performs backups. The snapshot
control file does not need to be globally available to all instances in a RAC environment,
though.
You can use a cluster file system file, a shared raw device, or a local directory that exists
on each node in your cluster, as shown in the following example.
For recovery, you must ensure that each recovery node can access the archive log files
from all instances by using one of the archive schemes discussed earlier, or make the
archived logs available to the recovering instance by copying them from another location.

Code
RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO
'/oracle/db_files/snaps/snap_prod1.cf';
The snapshot control file is a temporary file that RMAN creates to resynchronize from a
read-consistent version of the control file. RMAN needs a snapshot control file only when
resynchronizing with the recovery catalog or when making a backup of the current control
file.
In a RAC database, the snapshot control file is created on the node that is making the
backup. You need to configure a default path and file name for these snapshot control
files that are valid on every node from which you might initiate an RMAN backup.
Run this RMAN command to determine the configured location of the snapshot control
file.

Code
RMAN> SHOW SNAPSHOT CONTROLFILE NAME;
/u01/app/oracle/product/11.1.0/dbs/scf/snap_prod.cf
You can change the configured location of the snapshot control file. For example, on
UNIX-based systems you can specify the snapshot control file location as
snap_prod.cf located in the ASM disk group +FRA by entering the following at the
RMAN prompt.
This command globally sets the configuration for the location of the snapshot control file
throughout your cluster database.

Code
RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO
'+FRA/SNAP/snap_prod.cf';
RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO
'/ocfs/oradata/dbs/scf/snap_prod.cf';
RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO
'/dev/sdj2';

Note
The CONFIGURE command creates persistent settings across RMAN sessions.
If you set CONFIGURE CONTROLFILE AUTOBACKUP to ON, RMAN automatically creates a
control file and an SPFILE backup after you run the BACKUP or COPY command.
RMAN can also automatically restore an SPFILE if required to start an instance to
perform recovery. This means that the default location for the SPFILE must be available
to all nodes in your RAC database.
These features are important in disaster recovery because RMAN can restore the control
file even without a recovery catalog. RMAN can restore an autobackup of the control file
even after the loss of both the recovery catalog and the current control file.

Code
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
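
For illustration, a minimal sketch of how these autobackups might be used during disaster recovery follows, with the instance started in NOMOUNT mode; if no recovery catalog is available, you would first identify the database with SET DBID, and in practice you restart the instance with the restored SPFILE before restoring the control file.

Code
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE SPFILE FROM AUTOBACKUP;
RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP;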
You can change the default name that RMAN gives to this file with the CONFIGURE
CONTROLFILE AUTOBACKUP FORMAT command. If you specify an absolute path name in
this command, this path must exist identically on all nodes that participate in backups.

Code
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR
DEVICE TYPE DISK TO '+FRA';

Note
RMAN performs the control file autobackup on the first allocated channel. When
you allocate multiple channels with different parameters (especially if you allocate
a channel with the CONNECT command), you must determine which channel will
perform the automatic backup. Always allocate the channel for the connected
node first.
When cross-checking on multiple RAC nodes, configure the cluster so that all backups
can be accessed by every node, regardless of which node created the backup. When the
cluster is configured this way, you can allocate channels at any node in the cluster during
restore or cross-check operations.

If you cannot configure the cluster so that each node can access all backups, then, during
restore and cross-check operations, you must allocate channels on multiple nodes by
providing the CONNECT option to the CONFIGURE CHANNEL command so that every
backup can be accessed by at least one node.
If some backups are not accessible during cross-check because no channel was
configured on the node that can access those backups, the backups are marked
EXPIRED in the RMAN repository after the cross-check.
For example, you can use CONFIGURE CHANNEL . . . CONNECT in an Oracle RAC
configuration in which tape backups are created on various nodes in the cluster and each
backup is accessible only on the node on which it is created.
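
For illustration, a minimal sketch of such a per-node channel configuration followed by a cross-check; the connect strings reuse the service names shown later in this topic and are assumptions for your environment.

Code
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'sys/rac@RACDB1';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'sys/rac@RACDB2';
RMAN> CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'sys/rac@RACDB3';
RMAN> CROSSCHECK BACKUP;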
When making backups in parallel, RMAN channels can connect to a different instance in
the cluster.
The two possible configurations are

Code
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt
CONNECT='sys/rac@RACDB1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt
CONNECT='sys/rac@RACDB2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt
CONNECT='sys/rac@RACDB3';
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL DEVICE TYPE sbt CONNECT='sys/rac@BR';
dedicate channels to specific instances and
If you want to dedicate channels to specific instances, you can control at which instance
the channels are allocated by using separate connect strings for each channel
configuration as explained in this example.
define a special service for your backup and recovery jobs
If you define a special service for your backup and recovery jobs, you can use the second
example. If you configure this service with load balancing turned on, the channels are
allocated at a node as decided by the load-balancing algorithm.
During backup, the instances to which the channels connect must be either all mounted or
all open. For example, if the RACDB1 instance has the database mounted but the RACDB2
and RACDB3 instances have the database open, the backup fails.
In some cluster database configurations, some nodes of the cluster have faster access to
certain data files than to other data files. RMAN automatically detects this, which is known
as node affinity awareness.
When deciding which channel to use to back up a particular data file, RMAN gives
preference to the nodes with faster access to the data files that you want to back up.
For example, if you have a three-node cluster, and if node 1 has faster read/write access
to data files 7, 8, and 9 than do the other nodes, then node 1 has greater node affinity to
those files than nodes 2 and 3 and RMAN will take advantage of this automatically.

Code
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt
CONNECT='sys/rac@RACDB1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt
CONNECT='sys/rac@RACDB2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt
CONNECT='sys/rac@RACDB3';
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL DEVICE TYPE sbt CONNECT='sys/rac@BR';
In Oracle Database 10g, RAC allows the use of nondeterministic connect strings that can
connect to different instances based on RAC features such as load balancing.
Therefore, to support RAC, the RMAN polling mechanism no longer depends on
deterministic connect strings, and makes it possible to use RMAN with connect strings
that are not bound to a specific instance in the grid environment.

Code
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
Previously, if you wanted to use RMAN parallelism and spread a job between many
instances, you had to manually allocate an RMAN channel for each instance.
In Oracle Database 10g, to use dynamic channel allocation, you do not need separate

CONFIGURE CHANNEL CONNECT statements anymore. You only need to define your
degree of parallelism by using a command such as CONFIGURE DEVICE TYPE disk
PARALLELISM, and then run backup or restore commands.
RMAN then automatically connects to different instances and does the job in parallel. The
grid environment selects the instances that RMAN connects to, based on load balancing.

Code
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
As a result of this, configuring RMAN parallelism in a RAC environment becomes as
simple as setting it up in a non-RAC environment. By configuring parallelism when
backing up or recovering a RAC database, RMAN channels are dynamically allocated
across all RAC instances.

Code
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;

Note
RMAN has no control over the selection of the instances. If you require a
guaranteed connection to an instance, you should provide a connect string that
can connect only to the required instance.

Question
What are the benefits of RMAN channel support for the GRID in Oracle Database
10g in a RAC environment?
Options:
1.

RAC allows the use of nondeterministic connect strings

2.

RMAN has complete control over the selection of the instances

3.

It simplifies the use of parallelism with RMAN in a RAC environment

4.

It is not based on the load-balancing characteristics of the grid environment

Answer

Option 1: Correct. In Oracle Database 10g, RAC allows the use of


nondeterministic connect strings that can connect to different instances based on
RAC features such as load balancing. Therefore, to support RAC, the RMAN
polling mechanism no longer depends on deterministic connect strings, and
makes it possible to use RMAN with connect strings that are not bound to a
specific instance in the grid environment.
Option 2: Incorrect. RMAN has no control over the selection of the instances. If
you require a guaranteed connection to an instance, you should provide a connect
string that can connect only to the required instance.
Option 3: Correct. In previous versions of Oracle Database, if you wanted to use
RMAN parallelism and spread a job between many instances, you had to
manually allocate an RMAN channel for each instance.
In Oracle Database 10g, to use dynamic channel allocation, you do not need
separate CONFIGURE CHANNEL CONNECT statements anymore. You only need to
define your degree of parallelism by using a command such as CONFIGURE
DEVICE TYPE disk PARALLELISM, and then run backup or restore commands.
RMAN then automatically connects to different instances and does the job in
parallel.
Option 4: Incorrect. The grid environment selects the instances that RMAN
connects to, based on load balancing. As a result of this, configuring RMAN
parallelism in a RAC environment becomes as simple as setting it up in a non-RAC environment. By configuring parallelism when backing up or recovering a
RAC database, RMAN channels are dynamically allocated across all RAC
instances.
Correct answer(s):
1. RAC allows the use of nondeterministic connect strings
3. It simplifies the use of parallelism with RMAN in a RAC environment
Recovery Manager automatically discovers which nodes of a RAC configuration can
access the files that you want to back up or restore.
Recovery Manager autolocates these files:

backup pieces during backup or restore

archived redo logs during backup, and

data file or control file copies during backup or restore

If you use a noncluster file system local archiving scheme, then a node can read only
those archived redo logs that were generated by an instance on that node.
RMAN never attempts to back up archived redo logs on a channel that it cannot read.
During a restore operation, RMAN automatically performs the autolocation of backups. A
channel connected to a specific node attempts to restore only those files that were
backed up to the node.
For example, assume that log sequence 1001 is backed up to the drive attached to
Node1, whereas log 1002 is backed up to the drive attached to Node2. If you then
allocate channels that connect to each node, then the channel connected to Node1 can
restore log 1001 (but not 1002), and the channel connected to Node2 can restore log
1002 (but not 1001).
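
A minimal sketch of such a restore follows; the connect strings are assumptions that stand in for services on Node1 and Node2, and the sequence range matches the example above.

Code
RMAN> RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt CONNECT 'sys/rac@RACDB1';
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt CONNECT 'sys/rac@RACDB2';
  RESTORE ARCHIVELOG FROM SEQUENCE 1001 UNTIL SEQUENCE 1002;
}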

Summary
RAC backup and recovery is similar to that of Oracle Database backup and recovery
operations, except that RAC deals with multiple threads of redo log files. You can use
Enterprise Manager to configure important recovery settings for your cluster database.
You need to consider disk backup and recovery strategies for your cluster database just
as you would for a non-clustered database.
You can use RMAN with stored scripts, interactive scripts, or an interactive GUI front end.
If you use different Oracle Home locations for your RAC instances on each of your nodes,
create a snapshot control file in a location that exists on all your nodes. The snapshot
control file is a temporary file that RMAN creates to resynchronize from a read-consistent
version of a control file.

Configure RAC Backup


Learning Objective

After completing this topic, you should be able to

recognize how to configure backup and recovery in RAC

1. Distributing backups
When configuring the backup options for RAC, you can have the following configurations:
network backup server
Network backup server is a dedicated backup server that performs and manages backups
for the cluster and the cluster database. None of the nodes have local backup appliances.

one local drive, and


In the one local drive configuration, one node has access to a local backup appliance and performs and
manages backups for the cluster database. All nodes of the cluster should be on a cluster
file system to be able to read all data files, archived redo logs, and SPFILEs. It is
recommended that you do not use the noncluster file system archiving scheme if you have
backup media on only one local drive.
multiple drives
In the multiple drives configuration, each node has access to a local backup appliance and can write to its
own local backup media.
In the cluster file system scheme, any node can access all the data files, archived redo
logs, and SPFILEs.
In the noncluster file system scheme, you must write the backup script so that the backup
is distributed to the correct drive and path for each node.
For example, node 1 can back up the archived redo logs whose path names begin with
/arc_dest_1, node 2 can back up the archived redo logs whose path names begin with
/arc_dest_2, and node 3 can back up the archived redo logs whose path names begin
with /arc_dest_3.
In a cluster file system backup scheme, each node in the cluster has read access to all
the data files, archived redo logs, and SPFILEs. This includes Automatic Storage
Management or ASM, cluster file systems, and Network Attached Storage or NAS.
When backing up to only one local drive in the cluster file system backup scheme, it is
assumed that only one node in the cluster has a local backup appliance such as a tape
drive. In this case, run these one-time configuration commands.

Code
RMAN> CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
RMAN> CONFIGURE DEFAULT DEVICE TYPE TO sbt;
Because any node performing the backup has read/write access to the archived redo logs
written by the other nodes, the backup script for any node is simple.
In this case, the tape drive receives all data files, archived redo logs, and SPFILEs.

Code
RMAN> BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;

When backing up to multiple drives in the cluster file system backup scheme, it is
assumed that each node in the cluster has its own local tape drive.
You need to perform a one-time configuration so that one channel is configured for each
node in the cluster. This is a one-time configuration step. For example, enter this at the
RMAN prompt.

Code
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT
'user1/passwd1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT
'user2/passwd2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT
'user3/passwd3@node3';
Similarly, you can perform this configuration for a device type of DISK. This backup script,
which you can run from any node in the cluster, distributes the data files, archived redo
logs, and SPFILE backups among the backup drives.
For example, if the database contains 10 data files and 100 archived redo logs are on
disk, then the node 1 backup drive can back up data files 1, 3, and 7 and logs 1-33. Node
2 can back up data files 2, 5, and 10 and logs 34-66. The node 3 backup drive can back
up data files 4, 6, 8, and 9 as well as archived redo logs 67-100.

Code
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
In a noncluster file system environment, each node can back up only its own local
archived redo logs. For example, node 1 cannot access the archived redo logs on node 2
or node 3 unless you configure the network file system for remote access.
If you do not configure NFS for remote access, you must distribute the backup to multiple
drives. However, if you configure NFS for backups, then you can back up to only one drive.
When backing up to multiple drives in a noncluster file system backup scheme, it is
assumed that each node in the cluster has its own local tape drive. You can perform a
similar one-time configuration to configure one channel for each node in the cluster.

Graphic

The code to perform a one-time configuration to configure one channel for each
node in the cluster is the following:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'usr1/pwd1@n1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'usr2/pwd2@n2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'usr3/pwd3@n3';
Similarly, you can perform this one-time configuration for a device type of DISK. Develop
a production backup script for whole database backups that you can run from any node.
With the BACKUP example, the data file backups, archived redo logs, and SPFILE
backups are distributed among the different tape drives. However, channel 1 can read
only the logs archived locally on /arc_dest_1.
This is because the autolocation feature restricts channel 1 to back up only the archived
redo logs in the /arc_dest_1 directory. Because node 2 can read files only in the
/arc_dest_2 directory, channel 2 can back up only the archived redo logs in the
/arc_dest_2 directory, and so on. The important point is that all logs are backed up, but
they are distributed among the different drives.

Graphic
The production backup script that you can run from any node is the
following:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
Media recovery of a database that is accessed by RAC may require at least one archived
log file for each thread. However, if a thread's online redo log contains enough recovery
information, restoring archived log files for any thread is unnecessary.
If you use RMAN for media recovery and you share archive log directories, you can
change the destination of the automatic restoration of archive logs with the SET clause to
restore the files to a local directory of the node where you begin recovery.
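
For illustration, a minimal sketch of such a run block follows; the local restore directory is an assumption.

Code
RMAN> RUN {
  SET ARCHIVELOG DESTINATION TO '/u01/app/oracle/arch_restore';
  RESTORE ARCHIVELOG ALL;
  RECOVER DATABASE;
}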
If you backed up the archive logs from each node without using a central media
management system, you must first restore all the log files from the remote nodes and
move them to the host from which you will start recovery with RMAN.
However, if you backed up each node's log files using a central media management
system, you can use RMAN's AUTOLOCATE feature. This enables you to recover a
database using the local tape drive on the remote node.

If recovery reaches a time when an additional thread was enabled, the recovery process
requests the archived log file for that thread. If you are using a backup control file, when
all archive log files are exhausted, you may need to redirect the recovery process to the
online redo log files to complete recovery.
If recovery reaches a time when a thread was disabled, the process informs you that the
log file for that thread is no longer needed.
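
For illustration, a minimal SQL*Plus sketch of redirecting recovery to the online redo logs when using a backup control file follows; the online redo log member path is hypothetical.

Code
SQL> RECOVER DATABASE USING BACKUP CONTROLFILE;
-- At the prompt for a missing archived log, supply the path of the
-- corresponding online redo log member, for example (hypothetical):
--   +DATA/rdbb/onlinelog/group_2.262.657100213
SQL> ALTER DATABASE OPEN RESETLOGS;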

Question
Which statement about media recovery of a database that is accessed by RAC is
true?
Options:
1.

Media recovery requires an archived log file from each thread

2.

Backups must have been performed using a central media management system for
recovery to be possible

3.

Recovery processes request additional threads enabled during the recovery period

4.

Manual checks will need to be done to ensure if any threads have been disabled

Answer
Option 1: Incorrect. Media recovery of a database that is accessed by RAC may
require at least one archived log file for each thread. However, if a thread's online
redo log contains enough recovery information, restoring archived log files for any
thread is unnecessary.
Option 2: Incorrect. If you backed up the archive logs from each node without
using a central media management system, you must first restore all the log files
from the remote nodes and move them to the host from which you will start
recovery with RMAN.
Option 3: Correct. If recovery reaches a time when an additional thread was
enabled, the recovery process requests the archived log file for that thread.
Option 4: Incorrect. If recovery reaches a time when a thread was disabled, the
process informs you that the log file for that thread is no longer needed.
Correct answer(s):
3. Recovery processes request additional threads enabled during the recovery
period

Summary

RAC allows three possible configurations for its backup options: network backup server,
one local drive, and multiple drives. In a cluster file system, any node can access all data
files, archived redo logs, and SPFILEs. In the noncluster file system scheme, you must
write the backup script so that the backup is distributed to the correct drive and path for
each node.

Configuring Backup and Recovery in RAC


Learning Objective

After completing this topic, you should be able to

perform RAC backup and recovery using Enterprise Manager

Exercise overview
You are the database administrator for an Oracle Database 11g system and you need to
implement a backup and recovery strategy for the RDBB1 cluster database. You want to
enable the archiving of the redo logs in order to guarantee that you can recover all
committed transactions in the event of an operating system or disk failure. You also want
to configure backup settings in support of RAC, configure an RMAN recovery catalog,
and schedule a full database backup.
In this exercise, you're required to set ARCHIVELOG mode, configure backup settings,
configure RMAN, and perform a database backup.
This involves the following tasks:

setting ARCHIVELOG mode

configuring backup settings

configuring RMAN

performing a database backup

Task 1: Setting ARCHIVELOG mode


Use Enterprise Manager to place the database in ARCHIVELOG mode. Enable
Flashback Recovery and adjust the Flash Recovery Area Size to 1 GB. Specify
"administrator" as the username and "p@ssworD" as the password for the cluster
credentials. Specify "sys" as the username and "oracle" as the password for the database
credentials. Click Refresh on the Request in Process page to complete your
configuration.

Steps list
Instructions
1. Click the Availability tab
2. Click the Recovery Settings hyperlink in the Backup/Recovery Setup section
3. Select the ARCHIVELOG Mode* checkbox
4. Select the Enable Flashback Database - flashback logging can be used for fast database point-in-time
recovery* checkbox
5. Type 1 in the Flash Recovery Area Size text box and click the scroll bar down arrow
6. Click Apply and click Yes
7. Type administrator in the Username text box and p@ssworD in the Password text box for the cluster
credentials
8. Type sys in the Username text box and oracle in the Password text box for the database credentials
9. Click Continue
10. Click Refresh

Task 2: Configuring backup settings


Configure backup settings in support of RAC. Set disk parallelism to two and the backup
type to compressed. Specify that the backup policy should include autobackups of the
control file and spfile for every backup and unchanged files should be skipped. Specify a
host username of "administrator" and password of "p@ssworD."

Steps list
Instructions
1. Click the Backup Settings hyperlink in the Backup/Recovery Setup section
2. Type 2 in the Parallelism text box
3. Select the Compressed Backup Set radio button
4. Type administrator in the Username text box
5. Type p@ssworD in the Password text box
6. Click the Policy tab
7. Select the Automatically backup the control file and server parameter file (SPFILE) with every backup and
database structural change checkbox
8. Select the Optimize the whole database backup by skipping unchanged files such as read-only and
offline datafiles that have been backed up checkbox and click OK

Task 3: Configuring RMAN

Create an RMAN recovery catalog and configure it to be used for database RDBB1, which
is hosted on sqldb0773 and uses port 1521. Specify "rman" for the Recovery Catalog
username and password. Specify "administrator" as the host username and "p@ssworD"
as the host password. You have already created the recovery catalog and the rman user.

Steps list
Instructions
1. Click Recovery Catalog Settings hyperlink in the Backup/Recovery Setup section
2. Click the Add Recovery Catalog button
3. Type sqldb0773 in the Host text box, type 1521 in the Port text box, and type RDBB1 in the SID text box
4. Type rman in the Recovery Catalog Username text box, type rman in the Recovery Catalog Password text box,
and click Next
5. Click Finish
6. Select the Use Recovery Catalog radio button and ensure sqldb0773:1521:RDBB1 is selected from the
Recovery Catalog drop-down list
7. Type administrator in the Username text box
8. Type p@ssworD in the Password text box and click OK

Task 4: Performing a database backup


Perform a one-time, full database backup. You want the backup to run after business
hours when the system is not busy. Schedule the backup to run in the evening at 11:00
PM. Specify "administrator" as the host username and "p@ssworD" as the host
password.

Steps list
Instructions
1. Click the Schedule Backup hyperlink in the Manage section
2. Type administrator in the Username text box, type p@ssworD in the Password text box, and click the scroll
bar up arrow
3. Click the Schedule Customized Backup button
4. Click Next twice
5. Select the One Time (Later) radio button
6. Type 11 in the Start Time hour text box and type 00 in the Start Time minutes text box
7. Click Next
8. Click the Submit Job button

Diagnosing Oracle Clusterware Components


Learning Objective

After completing this topic, you should be able to

recognize how to carry out diagnostics-related activities on Oracle Clusterware components

1. Diagnostic data management in RAC


It is strongly recommended to set up Network Time Protocol or NTP on all cluster nodes,
even before you install RAC. This will synchronize the clocks among all nodes, and
facilitate analysis of tracing information based on time stamps as well as results from
queries issued on GV$ views.
Adjusting clocks by more than 15 minutes can cause instance evictions. It is strongly
advised to shut down all instances before date/time adjustments.

Supplement
Selecting the link title opens the resource in a new browser window.

Style Considerations
View more information on the style considerations for Oracle 11g Database used
in this course.
Launch window
Oracle Clusterware uses a unified log directory structure to consolidate the Oracle
Clusterware component log files. This consolidated structure simplifies diagnostic
information collection and assists during data retrieval and problem analysis.
The following are the main directories used by Oracle Clusterware to store its log files:

CRS logs are in this directory. The crsd.log file is archived every 10 MB (crsd.l01, crsd.l02, and so on).
The ORA_CRS_HOME log directory includes subdirectories such as crsd, cssd, evmd, racg, and client. The
racg directory holds logs for racgeut, racgevtf, and racgmain. CRS logs are in the
$ORA_CRS_HOME/log/<hostname>/crsd/ directory.

CSS logs are in this directory. The cssd.log file is archived every 20 MB (cssd.l01, cssd.l02, and so on).
CSS logs are in the $ORA_CRS_HOME/log/<hostname>/cssd/ directory.

EVM logs are in this directory.


EVM logs are in the $ORA_CRS_HOME/log/<hostname>/evmd/ directory.
The racg executable subdirectory is another directory used by Oracle Clusterware to
store its log files. Depending on the resource, specific logs are stored in these directories.
In the last directory, imon_<service>.log is archived every 10 MB for each service.
Each RACG executable has a subdirectory assigned exclusively for that executable. The
name of the racg executable subdirectory is the same as the name of the executable.

Graphic
Specific logs are in the
$ORA_CRS_HOME/log/<hostname>/racg/ and the
$ORACLE_HOME/log/<hostname>/racg/ directories.
There are a few more main directories used by Oracle Clusterware to store these logs
and alerts:

the SRVM (srvctl) and OCR (ocrdump, ocrconfig, ocrcheck) logs are in these directories
and
SRVM and OCR logs are in the $ORA_CRS_HOME/log/<hostname>/client/ and the
$ORACLE_HOME/log/<hostname>/client/ directories.

important Oracle Clusterware alerts can be found in alert<nodename>.log in this directory


Important Oracle Clusterware alerts can be found in the $ORA_CRS_HOME/log/<hostname>
directory.
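
As a quick check, you might list the consolidated log tree on one node. The following is a minimal sketch; the host name and the exact entries shown are illustrative and can vary by release.

Code
$ ls $ORA_CRS_HOME/log/`hostname`
admin  alertvx0306.log  client  crsd  cssd  evmd  racg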
Use the diagcollection.pl script to collect diagnostic information from an Oracle
Clusterware installation. The diagnostics provide additional information so that Oracle
Support can resolve problems. This script is located in $ORA_CRS_HOME/bin. Before
executing the script, you must be logged in as the root user, and you must set these
environment variables - ORACLE_BASE, ORACLE_HOME, ORA_CRS_HOME, and
HOSTNAME.
This example describes how to invoke the script to collect the diagnostic information.
When invoked with the -collect option, the script generates the four files in the local
directory. Mainly, basData.tar.gz contains log files from the $ORACLE_BASE/admin
directory. crsData.tar.gz contains log files from
$ORA_CRS_HOME/log/<hostname>.

Code

# export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
# export ORA_CRS_HOME=/u01/crs1020
# export ORACLE_BASE=/u01/app/oracle
# cd $ORA_CRS_HOME/bin
# ./diagcollection.pl -collect

The ocrData.tar.gz files contain the results of an ocrdump, ocrcheck, and the list
of ocr backups. oraData.tar.gz contains log files from
$ORACLE_HOME/log/<hostname>. If you invoke the script with the -collect option,
and you already have the four files generated from a previous run in the local directory,
the script asks you if you want to overwrite the existing files.
You can also invoke the script with the -clean option to clean out the files generated
from a previous run in your local directory. Alternatively, you can invoke the script to just
capture a subset of the log files. You can do so by adding extra options after the
-collect option. These options include -crs for collecting Oracle Clusterware logs,
-oh for collecting ORACLE_HOME logs, -ob for collecting ORACLE_BASE logs, or -all for
collecting all logs.
The -all option is the default. The -coreanalyze option enables you to extract to text
files only core files found in the generated files.

Code
# export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
# export ORA_CRS_HOME=/u01/crs1020
# export ORACLE_BASE=/u01/app/oracle
# cd $ORA_CRS_HOME/bin
# ./diagcollection.pl -collect
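
For illustration, minimal sketches of the subset collection and cleanup invocations described above follow, assuming the same environment settings as the previous example.

Code
# ./diagcollection.pl -collect -crs
# ./diagcollection.pl -clean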

Problems that span Oracle RAC instances can be the most difficult types of problems to
diagnose. For example, you may need to correlate the trace files from across multiple
instances, and merge the trace files.
Oracle Database Release 11g includes an advanced fault diagnosibility infrastructure for
collecting and managing diagnostic data, and uses the Automatic Diagnostic Repository
or ADR file-based repository for storing the database diagnostic data.
When you create the ADR base on a shared disk, you can place ADR homes for all
instances of the same Oracle RAC database and the corresponding ASM instances under
the same ADR Base. With shared storage, you can use the ADRCI command-line tool to
correlate diagnostics across all instances because some ADRCI commands (such as
SHOW INCIDENT) can work with multiple ADR homes simultaneously.

Note

Although not required, it is recommended that you share ADR base with your RAC
databases. However, if you are using shared Oracle homes, you must share your
ADR base.
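
For illustration, a minimal ADRCI session is sketched below; with a shared ADR base, SHOW INCIDENT iterates over all the ADR homes that SHOW HOMES lists.

Code
$ adrci
adrci> SHOW HOMES
adrci> SHOW INCIDENT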
This example displays the diagnostic architecture of RAC instances in your cluster. This
architecture enables trace processing to incur very low overhead. The DIAG process was
introduced in Oracle9i database to manage all diagnostics-related activities, acting as a
broker between online debugging tools and regular database processes.
All debugging commands are issued through the DIAG process on the same node to
reach their intended targets. This DIAG process then coordinates with DIAG processes
on other nodes of the same cluster to complete the commands. Activities such as setting
trace levels, archiving the in-memory trace logs to files, and taking memory/crash dumps
are done by the DIAG processes resulting in very little overhead to the database server.
By default, minimal tracing is always on for foreground and background processes, and
all trace information is written into in-memory buffers within the System Global Area or
SGA instead of being written into files directly. Via the online diagnostic tools, you can
instruct DIAG to set trace levels, archive the in-memory trace logs to files, and take
memory dumps. This can be done for one or all processes on all instances.
Offline tools then transform the archived logs into human-readable formats, load them
into database for query, or display them with the GUI interface of the trace navigation tool
used by Oracle Support. All these trace files have .trw as their file extension, so they can
be distinguished from regular process trace files. Also these trace files are circular, similar
to the memory buffers to limit the file size.

2. Cluster verify stages and components


Cluster Verification Utility or CVU is provided with Oracle Clusterware and Oracle
Database 10g Release 2 (10.2) with Real Application Clusters.
The purpose of CVU is to enable you to verify during setup and configuration that all
components required for a successful installation of Oracle Clusterware and a RAC
database are installed and configured correctly, and to provide you with ongoing
assistance any time you need to make changes to your RAC cluster.
The two types of CVU commands include
stage commands and
Stage commands are CVU commands that are used to test system setup and readiness
for successful software installation, database creation, or configuration change steps.
These commands are also used to validate successful completion of specific cluster
configuration steps.
component commands

Component commands are CVU commands that are used to check individual cluster
components, and determine their state.
It is recommended to use stage checks during the installation of Oracle Clusterware and
RAC.
In addition, you can use CVU to verify a particular component while the stack is running
or to isolate a cluster subsystem for diagnosis. During the diagnostic mode of operation,
CVU tries to establish a reason for the failure of any verification task to help diagnose a
problem.

Note
CVU is a nonintrusive tool in the sense that it does not try to fix any issues it finds.

Question
Which are true statements about the use of Cluster Verification Utility or CVU?
Options:
1.

There are two types of CVU commands

2.

It is recommended that you use component checks during the installation of Oracle
Clusterware and RAC

3.

The CVU will help locate and fix any issues that may potentially cause problems
during the installation of Oracle Clusterware and RAC

4.

The CVU can be used before and after the installation of your RAC cluster to ensure
all components are installed and configured correctly

Answer
Option 1: Correct. There are two types of CVU commands, stage and component.
Stage commands are CVU commands used to test system setup and readiness
for successful software installation, database creation, or configuration change
steps. Component commands are CVU commands used to check individual
cluster components and determine their state.
Option 2: Incorrect. It is recommended that you use stage checks during the
installation of Oracle Clusterware and RAC. Stage commands are CVU
commands used to test system setup and readiness for successful software
installation, database creation, or configuration change steps. These commands
are also used to validate successful completion of specific cluster configuration
steps.

Option 3: Incorrect. CVU is a nonintrusive tool in the sense that it does not try to
fix any issues it finds.
Option 4: Correct. The purpose of CVU is to enable you to verify during setup and
configuration that all components required for a successful installation of Oracle
Clusterware or Oracle Clusterware and a RAC database are installed and
configured correctly. It also provides ongoing assistance any time you need to
make changes to your RAC cluster.
Correct answer(s):
1. There are two types of CVU commands
4. The CVU can be used before and after the installation of your RAC cluster to
ensure all components are installed and configured correctly
A stage is a specific phase of an Oracle Clusterware or RAC deployment. Before
performing any operations in a stage, a predefined set of checks must be performed to
ensure the readiness of cluster for that stage. These checks are known as pre-checks for
that stage.
Similarly, a predefined set of checks must be performed after completion of a stage to
ensure the correct execution of operations within that stage. These checks are known as
post-checks for that stage.
You can list verifiable stages with the cluvfy stage -list command. All stages have
pre or post steps and some stages have both. There are valid stage options and stage
names.
-post hwos
The -post hwos stage option is the post-check for the hardware and operating system.
-pre cfs
The -pre cfs stage option is the pre-check for CFS setup.
-post cfs
The -post cfs stage option is the post-check for CFS setup.
-pre crsinst
The -pre crsinst stage option is the pre-check for CRS installation.
-post crsinst
The -post crsinst stage option is the post-check for CRS installation.
-pre dbinst
The -pre dbinst stage option is the pre-check for database installation.
-pre dbcfg

The -pre dbcfg stage option is the pre-check for database configuration.
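
For illustration, minimal sketches of a stage listing, a pre-check, and a post-check follow; the node names are assumptions.

Code
$ cluvfy stage -list
$ cluvfy stage -pre crsinst -n node1,node2 -verbose
$ cluvfy stage -post crsinst -n all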
CVU supports the notion of component verification. The verifications in this category are
not associated with any specific stage. A component can range from a basic one, such as
free disk space, to a complex one (spanning over multiple subcomponents), such as the
Oracle Clusterware stack. Availability, integrity, or any other specific behavior of a cluster
component can be verified.
You can list verifiable CVU components with the cluvfy comp -list command.
These are the verifiable CVU components:

nodereach that checks the reachability between nodes

nodecon, which checks the node connectivity

cfs that checks the Oracle Cluster File System integrity (the sharedness check for the file
system is supported for OCFS2 versions 1.2.1 and later)

ssa, which checks the shared storage accessibility

space that checks the space availability, and

sys, which checks the minimum system requirements


The verifiable CVU components also include

clu, which checks the cluster integrity

clumgr that checks the cluster manager integrity

ocr that checks the OCR integrity

crs, which checks the CRS integrity

nodeapp that checks the node application's existence

admprv, which checks the administrative privileges, and

peer that compares the properties with peers

3. Cluster verify configuration

The Cluster Verification Utility or CVU was first released in Oracle Clusterware release
10.2.0.1.0. CVU supports 11gR1, 10gR2, and 10gR1 for the Oracle Clusterware and
RAC products. CVU is available in three different forms:
form 1
It is available on Oracle Technology Network or OTN. From there, you need to download
the package and unzip it to a local directory <cvhome>. You can use the cluvfy
command from the <cvhome>/bin.
Optionally, you can set the CV_DESTLOC environment variable. This should point to a
writable area on all nodes. CVU attempts to copy the necessary bits as required to this
location. If this variable is not set, CVU uses /tmp as the default.
form 2, and
It is available on the 11.1 Oracle software DVD as a packaged version. Use
runcluvfy.sh, which is needed when nothing is installed yet. You can find it in Disk1.
form 3
It is installed on both 11.1 Oracle Clusterware and RAC homes. Make use of cluvfy if the
CRS software stack is installed. If the CRS software is installed, you can find cluvfy
under $ORA_CRS_HOME/bin.
For manual installation, you need to install CVU on only one node. CVU deploys itself on
remote nodes during executions that require access to remote nodes.
You can use the CVU's configuration file to define specific inputs for the execution of the
CVU. The path for the configuration file is $CV_HOME/cv/admin/cvu_config. Here
are some keys supported in cvu_config.

Code
$ cat cvu_config
# Configuration file for CVU
# Version: 011405
#
#CV_ORACLE_RELEASE=11gR1
#CV_NODE_ALL=
CV_RAW_CHECK_ENABLED=TRUE
CV_ASSUME_DISTID=Taroon
#CV_XCHK_FOR_SSH_ENABLED=TRUE

#ORACLE_SRVM_REMOTESHELL=/usr/bin/ssh
#ORACLE_SRVM_REMOTECOPY=/usr/bin/scp
CV_NODE_ALL
If set, CV_NODE_ALL specifies the list of nodes that should be picked up when Oracle
Clusterware is not installed and the -n all option has been used in the command line.
CV_RAW_CHECK_ENABLED
If set to TRUE, the CV_RAW_CHECK_ENABLED key enables the check for accessibility of
shared SCSI disks on Red Hat release 3.0 and higher. This shared disk accessibility check
requires that you install a cvuqdisk rpm on all the nodes. By default, this key is set to TRUE
and shared disk check is enabled.
CV_ASSUME_DISTID
The CV_ASSUME_DISTID key specifies the distribution ID that CVU uses. For example, to
make CVU work with SuSE 9 ES, set it to Pensacola.
CV_XCHK_FOR_SSH_ENABLED
If set to TRUE, CV_XCHK_FOR_SSH_ENABLED enables the X-Windows check for verifying
user equivalence with ssh. By default, this entry is commented out and X-Windows check
is disabled.
ORACLE_SRVM_REMOTESHELL
If set, ORACLE_SRVM_REMOTESHELL specifies the location for the ssh/rsh command to
override CVU's default value. By default, this entry is commented out and the tool uses
/usr/sbin/ssh and /usr/sbin/rsh.
If CVU does not find a key entry defined in the configuration file, the CVU searches for the
environment variable that matches the name of the key; otherwise, the CVU uses a
default.
ORACLE_SRVM_REMOTECOPY
If set, ORACLE_SRVM_REMOTECOPY specifies the location for the scp or rcp command to
override the CVU default value. By default, this entry is commented out and the CVU uses
/usr/bin/scp and /usr/sbin/rcp.
If the CVU does not find a key entry defined in the configuration file, the CVU searches
for the environment variable that matches the name of the key. If the environment
variable is set, the CVU uses its value. Otherwise it uses a default value for that entity.

Code
$ cat cvu_config
# Configuration file for CVU
# Version: 011405

#
#CV_ORACLE_RELEASE=11gR1
#CV_NODE_ALL=
CV_RAW_CHECK_ENABLED=TRUE
CV_ASSUME_DISTID=Taroon
#CV_XCHK_FOR_SSH_ENABLED=TRUE
#ORACLE_SRVM_REMOTESHELL=/usr/bin/ssh
#ORACLE_SRVM_REMOTECOPY=/usr/bin/scp

Question
In a Linux environment, which key can be set in the CVU's configuration file to
enable the check for accessibility of shared SCSI disks?
Options:
1.

CV_NODE_ALL

2.

CV_RAW_CHECK_ENABLED

3.

ORACLE_SRVM_REMOTESHELL

4.

CV_XCHK_FOR_SSH_ENABLED

Answer
Option 1: Incorrect. If this key is set in the CVU's configuration file, it specifies the
list of nodes that should be picked up when Oracle Clusterware is not installed
and the -n all option has been used in the command line.
Option 2: Correct. If this key is set to TRUE in the CVU's configuration file, it
enables the check for accessibility of shared SCSI disks on Red Hat release 3.0
and higher. This shared disk accessibility check requires that you install a
cvuqdisk rpm on all the nodes. By default, this key is set to TRUE and shared disk
check is enabled.
Option 3: Incorrect. If this key is set in the CVU's configuration file, it specifies the
location for the ssh/rsh command to override CVU's default value. By default,
this entry is commented out and the tool uses /usr/sbin/ssh and
/usr/sbin/rsh.

Option 4: Incorrect. If this key is set to TRUE in the CVU's configuration file, it
enables the X-Windows check for verifying user equivalence with ssh. By default,
this entry is commented out and X-Windows check is disabled.
Correct answer(s):
2. CV_RAW_CHECK_ENABLED
To provide the CVU with a list of all the nodes of a cluster, you can use the -n all option
while executing a command. The CVU attempts to obtain the node list in the following
sequence:

Code
$ cat cvu_config
# Configuration file for CVU
# Version: 011405
#
#CV_ORACLE_RELEASE=11gR1
#CV_NODE_ALL=
CV_RAW_CHECK_ENABLED=TRUE
CV_ASSUME_DISTID=Taroon
#CV_XCHK_FOR_SSH_ENABLED=TRUE
#ORACLE_SRVM_REMOTESHELL=/usr/bin/ssh
#ORACLE_SRVM_REMOTECOPY=/usr/bin/scp
1. if vendor clusterware is available, the CVU selects all the configured nodes from the vendor
clusterware using the lsnodes utility
2. if Oracle Clusterware is installed, the CVU selects all the configured nodes from Oracle Clusterware using
the olsnodes utility, and
3. if neither the vendor nor Oracle Clusterware is installed, the CVU searches for a value for the
CV_NODE_ALL key in the configuration file
If the vendor and Oracle Clusterware are not installed and if no key named
CV_NODE_ALL exists in the configuration file, the CVU searches for a value for the
CV_NODE_ALL environmental variable. If you have not set this variable, the CVU reports
an error.

Code
$ cat cvu_config
# Configuration file for CVU
# Version: 011405
#
#CV_ORACLE_RELEASE=11gR1
#CV_NODE_ALL=
CV_RAW_CHECK_ENABLED=TRUE
CV_ASSUME_DISTID=Taroon
#CV_XCHK_FOR_SSH_ENABLED=TRUE
#ORACLE_SRVM_REMOTESHELL=/usr/bin/ssh
#ORACLE_SRVM_REMOTECOPY=/usr/bin/scp
These are possible Cluster verify examples.

Code
$ cluvfy comp sys -n node1,node2 -p crs -verbose
$ cluvfy comp ssa -n all -s /dev/sda1
$ cluvfy comp space -n all -l /home/product -z 5G
$ cluvfy comp nodereach -n node2 -srcnode node1
$ cluvfy comp nodecon -n node1,node2 -i eth0 -verbose
$ cluvfy comp sys -n node1,node2 -p crs -verbose
In this example, to verify the minimal system requirements on the nodes before installing
Oracle Clusterware or RAC, the sys component verification command is used. To check
the system requirements for installing RAC, use the -p database argument, and to check
the system requirements for installing Oracle Clusterware, use the -p crs argument.
To check the system requirements for installing Oracle Clusterware or RAC from Oracle
Database 10g release 1 (10.1), use the -r 10gR1 argument. The example verifies the
system requirements for installing Oracle Clusterware on the cluster nodes known as

node1 and node2. The -verbose option can be used with any command. It basically
gives you more information in the output.
$ cluvfy comp ssa -n all -s /dev/sda1
To verify whether storage is shared among the nodes in your cluster database or to
identify all of the storage that is available on the system and can be shared across the
cluster nodes, the component verification command ssa is used in this example. This
example uses the -s option to specify the path.
$ cluvfy comp space -n all -l /home/product -z 5G
You can use this example, if you are planning to install more software on the local
/home/product file system of each node in the cluster, and that software will take up 5
GB on each node. This command is successful if 5 GB is available in /home/product of
every node; otherwise, it fails.
$ cluvfy comp nodereach -n node2 -srcnode node1
To verify the reachability of the cluster nodes from the local node or from any other cluster
node, the component verification command nodereach is used in this example. This
example tries to check whether node2 can be reached from node1.
$ cluvfy comp nodecon -n node1,node2 -i eth0 -verbose
In this example, to verify the connectivity between the cluster nodes through all of the
available network interfaces or through specific network interfaces, the component
verification command nodecon is used. The example checks whether node1 and node2
can communicate through the eth0 network interface.
Without the -i option, the CVU discovers all the network interfaces that are available on
the cluster nodes, reviews the interfaces' corresponding IP addresses and subnets,
obtains the list of interfaces that are suitable for use as VIPs and the list of interfaces
suitable for use as private interconnects, and verifies the connectivity between all the
nodes through those interfaces.
To verify user accounts and administrative permissions-related issues for user
equivalence, Oracle Clusterware installation, and RAC installation, the component
verification command admprv from this example is used.

Code
$ cluvfy comp admprv -n all -o user_equiv -verbose
On Linux and UNIX platforms, this example verifies user equivalence for all the nodes by
first using ssh and then using rsh if the ssh check fails.
To verify the equivalence only through ssh, use the -sshonly option. By default, the
equivalence check does not verify X-Windows configurations, such as when you have
disabled X-forwarding with the setting of the DISPLAY environment variable.

To verify X-Windows aspects during user equivalence checks, set the


CV_XCHK_FOR_SSH_ENABLED key to TRUE in the configuration file before you run the
command.

Code
$ cluvfy comp admprv -n all -o user_equiv -verbose
There are three arguments that verify specific permissions:

Code
$ cluvfy comp admprv -n all -o user_equiv -verbose

-o crs_inst argument verifies whether you have permissions to install Oracle Clusterware

-o db_inst argument verifies the permissions that are required for installing RAC, and

-o db_config argument verifies the permissions that are required for creating a RAC database
or for modifying a RAC database's configuration
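
For illustration, minimal sketches of these three checks follow; the -d option in the last command points to the database Oracle home, and the path shown is an assumption.

Code
$ cluvfy comp admprv -n all -o crs_inst -verbose
$ cluvfy comp admprv -n all -o db_inst -verbose
$ cluvfy comp admprv -n all -o db_config -d /u01/app/oracle/product/11.1.0/db_1 -verbose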
These are a few more interesting examples of Cluster verify.

Code
$ cluvfy comp nodeapp -n all -verbose
$ cluvfy comp peer -n all -verbose | more
$ cluvfy comp nodeapp -n all -verbose
This example verifies the existence of node applications, namely VIP, ONS, and GSD, on
all the nodes. To verify the integrity of all the Oracle Clusterware components, use the
component verification crs command. To verify the integrity of each individual Cluster
Manager Subcomponent (CSS), use the component verification command clumgr.
To verify the integrity of Oracle Cluster Registry, use the component verification ocr
command. To check the integrity of your entire cluster, which means to verify that all the
nodes in the cluster have the same view of the cluster configuration, use the component
verification clu command.
$ cluvfy comp peer -n all -verbose | more
This example compares all the nodes and determines whether any differences exist
between the values of preselected properties. This is successful if the same setup is found

across all the nodes. You can also use the comp peer command with the -refnode
option to compare the properties of other nodes against the reference node.
This command allows you to specify the -r 10gR1 option. The truncated list of the
preselected properties includes Total memory, Swap space, Kernel version, System
architecture, Package existence for various components (glibc, make,
binutils, gcc, and compat-db), Group existence for "oinstall", Group
existence for "dba", and User existence for "nobody".
This code shows you the output of the $ cluvfy comp crs -n all -verbose
command. This command checks the complete Oracle Clusterware stack.

Code
$ cluvfy comp crs -n all -verbose

Verifying CRS integrity

Checking CRS integrity...
Checking daemon liveness...

Liveness of all the daemons
  Node Name     CRS daemon    CSS daemon    EVM daemon
  ------------  ------------  ------------  ------------
  atlhp9        yes           yes           yes
  atlhp8        yes           yes           yes

Checking CRS health...
Check: Health of CRS
  Node Name     CRS OK?
  ------------  ------------
  atlhp9        yes
  atlhp8        yes
Result: CRS health check passed.

CRS integrity check passed.
Verification of CRS integrity was successful.

Supplement
Selecting the link title opens the resource in a new browser window.

Configuring Oracle Clusterware


View the various steps involved in diagnosing Oracle Clusterware components
and fixing Oracle Clusterware issues.

Launch window

Summary
It is recommended to set up Network Time Protocol or NTP on all nodes before installing
RAC. Oracle Clusterware uses a unified log directory structure to consolidate Oracle
Clusterware component log files. The diagcollection.pl script is used to collect
diagnostic information from an Oracle Clusterware installation.
Cluster Verification Utility or CVU helps to verify that you have a well-formed cluster. It is
recommended to use stage checks during the installation of Oracle Clusterware and
RAC.
CVU is available from the Oracle Technology Network or OTN, the Oracle software DVD,
and it is also installed in both Oracle Clusterware and RAC homes. For manual
installation, you need to install CVU on only one node. CVU deploys itself on remote
nodes during executions that require access to remote nodes.

Configuring Oracle Clusterware


The following short exercises are intended to familiarize you with inspecting and
diagnosing your Oracle Clusterware components, including log locations and the
assorted utilities provided to aid in data collection activities.
Step 1:
In this exercise you stop the crsd process uncleanly and inspect the logs that are generated. Using the ps
command, find the process ID for the crsd process and kill it with a signal 9. Wait a few moments,
change directory to /u01/crs11g/log/<hostname>, and inspect the various log files that are generated.

Using the ps command find the process ID of the crsd process. Use the kill command with
sudo to kill the process since it is owned by root.

ps -ef|grep -i crsd
ps -ef | grep -i "crsd" | grep -v grep | grep -v killcrs | awk
'{print "sudo kill -9 " $2 }' >
/home/oracle/solutions/less10/z.sh
chmod 777 /home/oracle/solutions/less10/z.sh
echo "Killing crsd..."
/home/oracle/solutions/less10/z.sh
root      3220     1  0 Nov16 ?      00:00:00 /bin/sh /etc/init.d/init.crsd run
root      7658  3220  0 Nov16 ?      00:07:17 /u01/crs11g/bin/crsd.bin reboot
oracle   26952 26950  0 02:54 pts/1  00:00:00 grep -i crsd
Killing crsd...

Change directory to /u01/crs11g/log/<hostname> and find the log files generated by the
termination of the crsd process.

cd /u01/crs11g/log/$y
find . -type f -print
./crsd/crsdOUT.log
./crsd/crsd.log
./racg/ora.RDB.db.log
./alertvx0306.log

Inspect the various files that were generated, starting with the Clusterware Alert log.


Step 2:
Use the crsctl command to check the health of your clusterware. Use the CLUVFY command to check
the viability of the cluster nodeapps.

Run the crsctl command from the /u01/crs11g/bin directory.

cd /u01/crs11g/bin
./crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy

Run CLUVFY from the Oracle Clusterware home's bin directory to check the nodeapps.

/u01/crs11g/bin/cluvfy comp nodeapp -n all -verbose


Verifying node application existence

Checking node application existence...

Checking existence of VIP node application
  Node Name     Required      Status        Comment
  ------------  ------------  ------------  ------------
  vx0306        yes           exists        passed
  vx0313        yes           exists        passed
Result: Check passed.

Checking existence of ONS node application
  Node Name     Required      Status        Comment
  ------------  ------------  ------------  ------------
  vx0306        no            exists        passed
  vx0313        no            exists        passed
Result: Check passed.

Checking existence of GSD node application
  Node Name     Required      Status        Comment
  ------------  ------------  ------------  ------------
  vx0306        no            exists        passed
  vx0313        no            exists        passed
Result: Check passed.

Verification of node application existence was successful.
Step 3:
Using the crs_stat command, find all CRS configuration data for the VIP resource located on your second
node.
cd /u01/crs11g/bin
./crs_stat -p ora.${z}.vip
NAME=ora.vx0313.vip
TYPE=application
ACTION_SCRIPT=/u01/crs11g/bin/racgwrap
ACTIVE_PLACEMENT=1
AUTO_START=1
CHECK_INTERVAL=15
DESCRIPTION=CRS application for VIP on a node
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=vx0313
OPTIONAL_RESOURCES=
PLACEMENT=favored
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=0
SCRIPT_TIMEOUT=60
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=eth0
USR_ORA_INST_NOT_SHUTDOWN=

USR_ORA_LANG=
USR_ORA_NETMASK=255.255.252.0
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=10.216.4.75
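If you only want the current state of that VIP resource rather than its full profile, crs_stat also accepts a resource name with the -t option. A minimal sketch using the vx0313 VIP shown above:

Code
cd /u01/crs11g/bin
./crs_stat -t ora.vx0313.vip    # tabular status for just this resource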
Step 4:
Determine the file(s) the OCR is using. Determine the total space available and what is currently being
used.

Run the ocrcheck command from the /u01/crs11g/bin directory.

cd /u01/crs11g/bin
./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     616652
         Used space (kbytes)      :       3848
         Available space (kbytes) :     612804
         ID                       :  992212279
         Device/File Name         : /dev/sdb1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/sdb2
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

In this example, the OCR is using the files /dev/sdb1 and /dev/sdb2. The total space is 616652 KB,
and of that, 3848 KB is currently in use.
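As a complementary check (not part of the original step), CVU can verify OCR integrity across all cluster nodes. A minimal sketch:

Code
/u01/crs11g/bin/cluvfy comp ocr -n all -verbose    # checks OCR integrity on every node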

Fixing Oracle Clusterware issues


An error is introduced into the cluster synchronization service on your cluster. Observe the results and
inspect your logs to identify and diagnose the problem. Repair the problem and return your cluster to
normal operation.

Your nodes will now reboot in reaction to the problem that was introduced. There is something
seriously wrong!

Step 1:
You ping the first node in your cluster.
[oracle@vx0309 ~]$ ping 10.216.4.19
PING 10.216.4.19 (10.216.4.19) 56(84) bytes of data.
64 bytes from 10.216.4.19: icmp_seq=0 ttl=64 time=10.2 ms
64 bytes from 10.216.4.19: icmp_seq=1 ttl=64 time=0.352 ms
64 bytes from 10.216.4.19: icmp_seq=2 ttl=64 time=0.323 ms
64 bytes from 10.216.4.19: icmp_seq=3 ttl=64 time=0.325 ms

--- 10.216.4.19 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.323/2.817/10.271/4.303 ms, pipe 2
[oracle@vx0301 ~]$
Step 2:
You must now inspect your Oracle Clusterware log files to find the problem.

When the node is stable, change your directory to your log directory, /u01/crs11g/log/<host
name>. List the log files. The first file you should look at is the Clusterware alert log; in this
example, that file is alertvx0308.log.

y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`


z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`
cd /u01/crs11g/log/$y*
ls -al
total 52
drwxr-xr-t 8 root oinstall 4096 Nov 22 02:33 .
drwxr-xr-x 4 oracle oinstall 4096 Nov 26 02:04 ..
drwxr-x--- 2 oracle oinstall 4096 Nov 22 02:33 admin
-rw-rw-r--  1 oracle oinstall 18421 Nov 26 02:48 alertvx0308.log
drwxrwx--- 2 oracle oinstall 4096 Nov 26 02:48 client
drwxr-x--- 2 root oinstall 4096 Nov 22 02:34 crsd
drwxr-x--- 4 oracle oinstall 4096 Nov 26 02:54 cssd
drwxr-x--- 2 oracle oinstall 4096 Nov 22 05:32 evmd
drwxrwx--T 5 oracle oinstall 4096 Nov 22 02:42 racg
Step 3:
Looking at the alert log, what could be the cause of the problem?

You can see that the voting disk appears to be intermittently offline. This would certainly cause
problems with CSS.

Select analyzing the alert log to view the complete code.

Step 4:
The alert log indicates that further information can be found. Where would you have to look?

In the /u01/crs11g/log/<host name>/cssd/ocssd.log file. The real problem is revealed here. You can
see that the voting disk, /dev/sdb5, is corrupted. This was the cause of the reboot.

Select the ocssd log to view the complete code.

Step 5:
Fix the diagnosed problem.

Although Oracle Clusterware is able to function in this situation, it is necessary to recover the
voting disk file, /dev/sdb5. A backup was made at the beginning of this practice, called vdisk.bak.
After you stop Oracle Clusterware on both nodes, use the dd command and specify a 4K block
size. Execute the command.

[oracle@vx0308 less10]$ sudo /u01/crs11g/bin/crsctl stop crs


Stopping resources.
This could take several minutes.
Successfully stopped Oracle Clusterware resources
Stopping Cluster Synchronization Services.
Shutting down the Cluster Synchronization Services daemon.
Shutdown request successfully issued.
[oracle@vx0309 ~]$ sudo /u01/crs11g/bin/crsctl stop crs
Stopping resources.
This could take several minutes.
Successfully stopped Oracle Clusterware resources
Stopping Cluster Synchronization Services.
Shutting down the Cluster Synchronization Services daemon.
Shutdown request successfully issued.
[oracle@vx0309 ~]$
dd if=/home/oracle/solutions/less10/vdisk.bak of=/dev/sdb5 bs=4k
154224+1 records in
154224+1 records out
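For context, the vdisk.bak file restored above would typically have been created earlier, while the cluster was still healthy, with a similar dd command. A hedged sketch assuming the same device and backup location:

Code
# Back up the voting disk while Oracle Clusterware is healthy
dd if=/dev/sdb5 of=/home/oracle/solutions/less10/vdisk.bak bs=4k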
Step 6:
After you fixed the problem, what should you do?

Restart Oracle Clusterware on both nodes by running the crsctl command as root.

y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`


z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`

HOST=`hostname|cut -c 1-10`
sudo /u01/crs11g/bin/crsctl start crs
ssh $z sudo /u01/crs11g/bin/crsctl start crs
Attempting to start Oracle Clusterware stack
The CRS stack will be started shortly
Attempting to start Oracle Clusterware stack
The CRS stack will be started shortly

Step 7:
Using the crs_stat command, check the status of your CRS stack and nodeapps. Be patient; it takes a
few minutes for the components to restart.
/u01/crs11g/bin/crs_stat
NAME=ora.RDB.RDB1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0308
NAME=ora.RDB.RDB2.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0309
NAME=ora.RDB.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0309
NAME=ora.vx0308.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0308
NAME=ora.vx0308.LISTENER_VX0308.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0308
NAME=ora.vx0308.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0308

NAME=ora.vx0308.ons
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0308
NAME=ora.vx0308.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0308
NAME=ora.vx0309.ASM2.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0309
NAME=ora.vx0309.LISTENER_VX0309.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0309
NAME=ora.vx0309.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0309
NAME=ora.vx0309.ons
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0309
NAME=ora.vx0309.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on vx0309
Step 8:
The database instances will be the last things that are started and may take several minutes to do so.
What could be the cause of that delay? Because it may take too long to restart both instances, you can
manually start them if needed.

Take a look at the database alert log on your first node and check what is happening. The delay,
of course, is caused by instance recovery. Remember that the problem introduced at the
beginning of the demonstration caused the Oracle Clusterware stack to crash and reboot the
node, which of course crashed the database as well.

y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`


z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`

DBNAME=`ps -ef | grep dbw0_RDB | grep -v grep | grep -v callout1 | awk '{ print $8 }' | sed 's/1/''/' | sed 's/ora_dbw0_/''/'`
I1NAME=$DBNAME"1"
I2NAME=$DBNAME"2"
cat /u01/app/diag/rdbms/rdb*/RDB*/trace/alert*
sleep 100
/u01/crs11g/bin/srvctl start instance -d $DBNAME -i $I2NAME
/u01/crs11g/bin/srvctl start instance -d $DBNAME -i $I1NAME
Learning Objective

After completing this topic, you should be able to

identify how to enable debugging in RAC

1. Oracle Clusterware debugging


You can use various tools to manipulate Oracle Cluster Registry or OCR such as
ocrdump, ocrconfig, ocrcheck, and srvctl.
These utilities create log files in this directory.

Graphic
The utilities create log files in the directory
$ORA_CRS_HOME/log/<hostname>/client/.
To change the amount of logging, edit this file.

Graphic
To change the amount of logging, edit the
$ORA_CRS_HOME/srvm/admin/ocrlog.ini file.
The default logging level is 0, which basically means minimum logging. When
mesg_logging_level is set to 0, which is its default value, only error conditions are
logged. You can change this setting to 3 or 5 for detailed logging information.
If that is not enough, you can also change the logging and trace level for each of the
components used to manipulate OCR. To do that, edit the entries containing
comploglvl and comptrclvl in ocrlog.ini.
You could add the three lines in this code to ocrlog.ini to turn on additional
debugging information. A typical example of when you might have to change the ocrlog.ini
file is a situation where you get errors while using either the ocrdump or ocrconfig tools.

Code
mesg_logging_level = 5
comploglvl="OCRAPI:5 ; OCRSRV:5; OCRCAC:5; OCRMAS:5;
OCRCONF:5; OCRRAW:5"
comptrclvl="OCRAPI:5 ; OCRSRV:5; OCRCAC:5; OCRMAS:5;
OCRCONF:5; OCRRAW:5"

Note
You should never execute the commands on your production environment unless
explicitly asked by Oracle Support.
You may be requested to enable tracing to capture additional information for problem
resolution with Oracle Clusterware resources when working with Oracle Support.
Because the procedures described here may adversely affect performance, perform
these activities only with the assistance of Oracle Support.
To generate additional trace information for a particular running resource, as the root
user, you can use crsctl to enable resource debugging using this syntax.

Syntax
crsctl debug log res "<resource name>:1"
This example enables debugging for the ora.atlhp8.vip resource. This has the effect
of setting the USR_ORA_DEBUG attribute to 1 in the resource's profile. This
setting is enforced before running the start, stop, or check action scripts for the
resource.

Code
# crsctl debug log res "ora.atlhp8.vip:1"

If you are asked to enable tracing for all resources, add these lines to the racgwrap
script in both $ORA_CRS_HOME/bin and $ORACLE_HOME/bin.

Graphic
The following lines are added to racgwrap script to enable tracing for all
resources:
_USR_ORA_DEBUG=1
export _USR_ORA_DEBUG

Code
# Set _USR_ORA_DEBUG to enable more tracing
#_USR_ORA_DEBUG=1 && export _USR_ORA_DEBUG
_USR_ORA_DEBUG=1
export _USR_ORA_DEBUG
$ORACLE_HOME/bin/racgmain "$@"
status=$?
exit $status
After you capture all the trace information, do not forget to either execute the
corresponding commands or remove the added lines from the racgwrap scripts to switch
off debugging.

Code
crsctl debug log res "<resource name>:0"
The main Oracle Clusterware daemons (crsd, cssd, and evmd) use various internal
modules during their execution.
You can use crsctl commands as the root user to enable dynamic debugging for the
Oracle Clusterware modules. The crsctl lsmodules crs, css, or evm commands are
used to list the module's components that can be used for debugging.
The example lists the commands for crs.

Code
$ crsctl lsmodules crs

CRSUI    CRSCOMM  CRSRTI   CRSMAIN  CRSPLACE CRSAPP   CRSRES   CRSCOMM
CRSOCR   CRSTIMER CRSEVT   CRSD     CLUCLS   CSSCLNT  COMMCRS  COMMNS
When asked by Oracle Support, you can then use these commands to enable additional
logging.

Code
crsctl debug log <module name> <component>:<debugging level>
crsctl debug statedump crs|css|evm
crsctl debug trace css|crs|evm

crsctl debug log <module name> <component>:<debugging level>


In the crsctl debug log <module name> <component>:<debugging level>
command, <module name> is the name of the module, crs, evm, or css; <component
name> is the name of the corresponding component obtained using the crsctl
lsmodules command; and <debugging level> is a level from 1 to 5.
crsctl debug statedump crs|css|evm
The crsctl debug statedump crs, css, or evm command dumps state information for
crs, css, or evm modules.
crsctl debug trace css|crs|evm
The crsctl debug trace css, crs, or evm command dumps crs, css, or evm in-memory tracing caches.
The example explains how to dynamically enable additional logging (level 5) for these
CRS components CRSEVT, CRSAPP, CRSTIMER, and CRSRES.

Code
# crsctl debug log crs CRSEVT:5,CRSAPP:5,CRSTIMER:5,CRSRES:5

Note
Don't execute these commands on a production system without explicit guidance
from Oracle Support.

2. Trace control and hang analysis


All Java-based tools and utilities that are available in RAC are invoked by executing
scripts of the same name as the tool or utility.

This includes the Cluster Verification Utility (cluvfy), the Database Configuration
Assistant (dbca), the Database Upgrade Assistant (dbua), the Net Configuration
Assistant (netca), the Virtual Internet Protocol Configuration Assistant (vipca), Server
Control (srvctl), and the Global Services Daemon (gsdctl).
For example, to run the Database Configuration Assistant, enter the command dbca.
By default, Oracle enables traces for dbca and dbua. The resulting log files are written to
these directory paths respectively.

Code
$ORACLE_HOME/cfgtoollogs/dbca/
$ORACLE_HOME/cfgtoollogs/dbua/
For cluvfy, gsdctl, srvctl, and vipca, you can set the SRVM_TRACE environment
variable to TRUE to make the system generate traces. Traces are written to either log files
or standard output.

Code
$ export SRVM_TRACE=TRUE
$ srvctl config database -d xwkE > /tmp/srvctl.trc
$ cat /tmp/srvctl.trc
/u01/app/oracle/product/Crs/jdk/jre/bin/java -classpath
/u01/app/oracle/product/Crs/jlib/netcfg.jar:srvctl.jar -DTRACING.ENABLED=true -DTRACING.LEVEL=2
oracle.ops.opsctl.OPSCTLDriver config database -d xwkE
[main] [9:47:27:454] [OPSCTLDriver.setInternalDebugLevel:165] tracing is true at level 2 to file null
[main] [9:47:27:496] [OPSCTLDriver.<init>:95] Security manager is set
For example, the system writes traces to log files in this path for cluvfy. However, it
writes traces directly to the standard output for srvctl.

Code
$ORA_CRS_HOME/cv/log/ for cluvfy
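For instance, here is a minimal sketch of tracing a cluvfy run; the component check chosen here is only an example, and the resulting trace ends up under $ORA_CRS_HOME/cv/log/:

Code
export SRVM_TRACE=TRUE
cluvfy comp nodecon -n all -verbose    # node connectivity check, traced to $ORA_CRS_HOME/cv/log/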

Note
Don't execute these commands on a production system without explicit guidance
from Oracle Support.

Question
Oracle Support has asked you to enable tracing for all Oracle Clusterware
resources. How is this accomplished?
Options:
1. Set the mesg_logging_level attribute to 5 in the ocrlog.ini file
2. Set the SRVM_TRACE environment variable to TRUE using a terminal window
3. Set the environment variable _USR_ORA_DEBUG to 1 in the racgwrap scripts
4. Change USR_ORA_DEBUG resource attribute to 1 for each Oracle Clusterware resource

Answer
Option 1: Incorrect. The ocrdump, ocrconfig, ocrcheck, and srvctl utilities
create log files in $ORA_CRS_HOME/log/<hostname>/client/. When
mesg_logging_level is set to 0 in the ocrlog.ini file, which is its default value,
only error conditions are logged. You can change this setting to 3 or 5 for detailed
logging information.
Option 2: Incorrect. For the Java-based tools cluvfy, gsdctl, srvctl, and
vipca, you can set the SRVM_TRACE environment variable to TRUE to make the
system generate traces. Traces are written to either log files or standard output.
Option 3: Correct. You can enable tracing for all resources by setting the
_USR_ORA_DEBUG environment variable to 1 in the racgwrap script in both
$ORA_CRS_HOME/bin and $ORACLE_HOME/bin.
Option 4: Incorrect. To generate additional trace information for a particular
running resource, as the root user, you can use the crsctl command to enable
resource debugging by changing the USR_ORA_DEBUG resource attribute to 1 for
the specific resource. This can be done for all resources. However, this is not the
best method to enable resource debugging for all Oracle Clusterware resources.
Correct answer(s):
3. Set the environment variable _USR_ORA_DEBUG to 1 in the racgwrap scripts
The diagnostic tracing facility provides various controls on its tracing behavior. You can
specify the controls through a number of interfaces. Don't execute the commands for

these interfaces on a production system without explicit guidance from Oracle Support.
The interfaces are
initialization parameter during instance startup
The initialization parameter TRACE_ENABLED is visible to the users to enable or disable
the tracing mechanism. By default, its value is TRUE, and minimal tracing takes place.
However, you can set it to FALSE to meet some high-end benchmark requirements.
SQL statements
At run time of an instance, tracing behavior can also be modified through SQL statements
such as ALTER TRACING and ALTER SYSTEM SET. ALTER TRACING is used to turn on or
off the tracing mechanism, to flush trace buffers to disk, and to enable or disable trace
events. ALTER SYSTEM SET is used to change value of initialization parameters.
fixed table views, and
There are two fixed table views related to this tracing mechanism X$TRACE_EVENTS and
X$TRACE. They are used for online monitoring of tracing characteristics and contents of
trace buffers in SGA.
oradebug during run time
It is possible to execute oradebug commands during run time on remote instances from a
local instance and get the results back to the local node. You can use the -r and -g
options to do that.
Crash dump is one of the most important features of the DIAG process.
During an instance crash, DIAG sends out a dump message to peer DIAG processes in
the cluster and then creates a system state dump. Each remote DIAG process then
flushes its trace information to the disks.
That way a cluster snapshot can be reused later by Oracle Support for better diagnostics.
During the instance cleanup procedure, DIAG is the second to last process to be
terminated because it needs to perform trace flushing to the file system. By default, the
terminating process, usually PMON, gives a small amount of time to DIAG for dumping.
Instance freezing is not required in order to obtain the snapshot of traces across all
instances. This is because all traces with the execution history required for diagnosis are already
stored in the memory buffers and are dumped to file after a DIAG process receives the
crash notification. Traces for the moment of the crash are likely to be in this history.

Note
A dump directory, named cdmp_<timestamp>, is created at the
BACKGROUND_DUMP_DEST location and all trace dump files are placed in this
directory on the remote nodes.

These are examples of using the DIAG trace control interface, provided you can connect
normally to your database.

Code
SQL> alter tracing enable "10425:10:135";
SQL> SELECT trclevel,status,procs FROM x$trace_events
2> WHERE event=10425;
SQL> alter tracing disable "ALL";
SQL> alter tracing flush;
SQL> oradebug setmypid
Statement processed.
SQL> oradebug setinst all
Statement processed.
SQL> oradebug -g def hanganalyze 3
Hang Analysis in /u01/app/oracle/diag/rdbms/racdb/racdb1/trace/racdb1_diag_11347.trc
SQL>
SQL> alter tracing enable "10425:10:135";
This example is used from a SQL*Plus session to turn on tracing for event 10425 at level
10 for process ID 135.
SQL> SELECT trclevel,status,procs FROM x$trace_events
2> WHERE event=10425;
You can then query x$trace_events to determine which events are traced (STATUS=1)
at which level. This view contains 1,000 event statuses. You can also query x$trace to
retrieve trace records directly from the SGA.
SQL> alter tracing disable "ALL";
This statement is used to disable tracing for all events.
SQL> alter tracing flush;
This command is used to archive trace logs related to your process. Using this command,
you can also flush trace buffers of all other processes. Using some undocumented
parameters, it is also possible to continuously flush trace buffers to disk instead of having
to manually flush them.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug setinst all

Statement processed.
SQL> oradebug -g def hanganalyze 3
Hang Analysis in /u01/app/oracle/diag/rdbms/racdb/racdb1/trace/racdb1_diag_11347.trc
SQL>
This example shows you how to use the hang analyzer throughout your RAC cluster. After
you are connected to SQL*Plus as SYSDBA, you attach oradebug to your process.
Then you set the instance list to all instances in your cluster, and execute the hang
analyzer at level 3. This executes a hang analysis clusterwide. When executed, the
command returns the name of the trace file containing the result of this analysis.
Don't execute these commands on a production system without explicit guidance from
Oracle Support.

Sometimes even connecting to the instance can hang. In this case there is a slightly
unsafe (uses remote Oradebug commands) but useful mechanism to collect diagnostic
dumps. This mechanism leverages prelim connections that were introduced in Oracle
Database 10.1.
Prelim connections create an authenticated foreground process that just attaches to the SGA.
This is enough to execute Oradebug commands. At the same time, because no process
state object, session state object, and so on are created, you are guaranteed to be able to
connect to the hung instance.


The example explains such a case where you cannot access your instance normally.

Code
$ sqlplus /nolog

SQL*Plus: Release 11.1.0.6.0 - Production on Fri Nov 2 22:51:24 2007

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

SQL> set _prelim on
SQL> connect / as sysdba
Prelim connection established
SQL> oradebug setorapname reco
Oracle pid: 19, Unix process pid: 11381, image: oracle@edcdr12p1.us.oracle.com (RECO)
SQL> oradebug dump hanganalyze_global 1
Statement processed.
SQL> oradebug dump systemstate_global 267
Statement processed.
SQL> oradebug setorapname diag
Oracle pid: 4, Unix process pid: 11347, image: oracle@edcdr12p1.us.oracle.com (DIAG)
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/racdb1/racdb11/trace/racdb11_diag_11347.trc
SQL>

Note
REMOTE ORADEBUG commands are unsafe. Only run these commands with
explicit guidance from Oracle Support.
This example specifies the RECO process, not DIAG, because the system will use RECO
to message DIAG to do the global hang analysis and system state dumps.
In addition, Oracle Database 11gR1 introduces the V$WAIT_CHAINS view (in Enterprise
Manager) which displays information about blocked sessions. A wait chain comprises
sessions that are blocked by one another.
Each row represents a blocked and blocker session pair. If a wait chain is not a cyclical
wait chain, then the last row for the chain does not have a blocker.
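A minimal sketch of querying this view from SQL*Plus; the columns selected here are a commonly used subset and may vary slightly between releases:

Code
SQL> SELECT chain_id, instance, sid, blocker_instance, blocker_sid,
  2         wait_event_text
  3  FROM   v$wait_chains
  4  ORDER  BY chain_id;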

Some of the new 11gR1 background processes include
ACMS
The Atomic Controlfile to Memory Server or ACMS per-instance process is an agent that
contributes to ensuring a distributed SGA memory update. This process verifies if SGA
memory update is either globally committed on success or globally aborted in the event of
a failure in an Oracle RAC environment.
DBRM
The DataBase Resource Manager or DBRM process is responsible for setting resource
plans and other resource manager related tasks.
DIAG
The DIAGnosability or DIAG process performs diagnostic dumps and executes global
oradebug commands.
DIA0, and
DIAgnosability or DIA0 process 0 (only 0 is currently being used) is responsible for hang
detection and deadlock resolution.

EMNC
EMoN Coordinator (event monitor coordinator) or EMNC is the background server process
used for database event management and notifications.
A few more new 11gR1 background processes are
GTXj
The Global Txn process j (0-j) or GTXj processes provide transparent support for XA global
transactions in an Oracle RAC environment. The database autotunes the number of these
processes based on the workload of XA global transactions. Global transaction processes
are only seen in an Oracle RAC environment.
KATE
Konductor of ASM Temporary Errands or KATE performs proxy I/O to an ASM metafile
when a disk goes offline.
MARK
Mark Au for Resync Koordinator or MARK marks ASM allocation units as stale following a
missed write to an offline disk.
PING, and
The PING process governs the interconnect latency measurement.
VKTM
The Virtual Keeper of TiMe or VKTM process is responsible for providing a wall-clock time
(updated every second) and reference-time counter (updated every 20 ms and available
only when running at elevated priority.)

Question
Which new Oracle Database 11g background process is an agent that contributes
to ensuring a distributed SGA memory update is either globally committed or
globally aborted?
Options:
1. ACMS
2. DBRM
3. KATE
4. VKTM

Answer
Option 1: Correct. The Atomic Controlfile to Memory Server per-instance process
or ACMS is an agent that contributes to ensuring a distributed SGA memory

update is either globally committed on success or globally aborted in the event of


a failure in an Oracle RAC environment.
Option 2: Incorrect. The DataBase Resource Manager or DBRM process is
responsible for setting resource plans and other resource manager related tasks.
Option 3: Incorrect. The Konductor of ASM Temporary Errands or KATE process
performs proxy I/O to an ASM metafile when a disk goes offline.
Option 4: Incorrect. The Virtual Keeper of TiMe or VKTM process is responsible
for providing a wall-clock time (updated every second) and reference-time counter
(updated every 20 ms and available only when running at elevated priority).
Correct answer(s):
1. ACMS

Summary
Tools used to manipulate OCR include ocrdump, ocrconfig, ocrcheck, and srvctl.
These tools generate log files in specific directory paths. You can also change the logging
and trace level for each of the components used to manipulate OCR. To do that, edit the
entries containing comploglvl and comptrclvl in ocrlog.ini. You can use crsctl
commands as the root user to enable dynamic debugging for Oracle Clusterware
modules.
All Java-based tools and utilities that are available in RAC are invoked by executing
scripts of the same name as the tool or utility. Oracle enables traces for dbca and dbua by
default. In cases where connecting to the instance can hang, you can use remote
Oradebug commands to collect diagnostic dumps. However, this method is considered
slightly unsafe.

Add a Node to a RAC Cluster


Learning Objective

After completing this topic, you should be able to

recognize how to add nodes and instances in a RAC database

1. Adding a new node in a RAC database


There are mainly three methods you can use to add and delete nodes in a RAC
environment:

silent cloning procedures that enable you to copy images of Oracle Clusterware and RAC
software onto the other nodes that have identical hardware and software
the Enterprise Manager Grid Control that is basically a GUI interface to cloning procedures, and
interactive or silent procedures using addNode.sh/rootdeletenode.sh and the Database
Configuration Assistant or DBCA
The preferred method to add multiple nodes and instances to RAC databases is to use
the cloning procedures. This is especially relevant when you are massively deploying
software across your enterprise.
You can directly use Oracle Universal Installer or OUI and DBCA to add one node to and
delete one node from your cluster.
These are the main steps you need to follow to add a new node to your RAC cluster:

install and configure OS and hardware for the new node

add Oracle Clusterware to the new node

configure ONS for the new node

add ASM home to the new node

add RAC home to the new node

add a listener to the new node, and

add a database instance to the new node

Note
The step that involves adding ASM home to the new node is optional.
Basically, you are going to use OUI to copy the Oracle Clusterware software as well as
the RAC software to the new node. For each main step, you have to do some manual
configurations.

Note
For all the add node and delete node procedures for UNIX-based systems,
temporary directories such as /tmp, $TEMP, or $TMP should not be shared
directories. If your temporary directories are shared, set your temporary
environment variable, such as $TEMP, to a nonshared location on a local node.
In addition, use a directory that exists on all the nodes.

Question
When adding a new node to your RAC cluster, which optional step occurs after
configuring ONS for the new node?
Options:
1. Add a listener to the new node
2. Add ASM home to the new node
3. Add Oracle Clusterware to the new node
4. Install and configure OS and hardware for new node

Answer
Option 1: Incorrect. Adding a listener to the new node occurs after adding a RAC
home and before adding a database instance. You need to use netca to begin
this process.
Option 2: Correct. This step occurs after configuring ONS for the new node and is
needed only if you use a specific home directory to host ASM. If you run ASM and
your RAC database out of the same Oracle Home, you can skip this step. From
the first node, you need to execute the addNode.sh script from the ASM home to
begin this process.
Option 3: Incorrect. After ensuring that the OS and hardware are installed and
configured properly on the node you want to add to your RAC cluster, and before
you configure ONS for the new node, you must add Oracle Clusterware to the new
node. This process can be started by executing the addNode.sh script located in
your Oracle Clusterware home directory on the first node, as the oracle user.
Option 4: Incorrect. Installing and configuring the OS and hardware for the new
node is the first step that should be performed when adding a new node to your
RAC cluster. Before you can proceed with the Oracle Clusterware installation on
the node you want to add to your RAC cluster, you must make sure that all
operating system and hardware prerequisites are met.
Correct answer(s):
2. Add ASM home to the new node

Question

Which is the final step when adding a new node to your RAC cluster?
Options:
1. Add a listener to the new node
2. Configure ONS for the new node
3. Add Oracle Clusterware to the new node
4. Add a database instance to the new node

Answer
Option 1: Incorrect. Adding a listener to the new node occurs after adding a RAC
home and before adding a database instance. You need to use netca to begin
this process.
Option 2: Incorrect. Configuring ONS for the new node occurs after adding Oracle
Clusterware to the new node. However, this is not the final step in the process of
adding a new node to your RAC cluster.
Option 3: Incorrect. After ensuring that the OS and hardware are installed and
configured properly on the node you want to add to your RAC cluster and before
you configure ONS for the new node, you must add Oracle Clusterware to the new
node. This process can be started by executing the addNode.sh script located in
your Oracle Clusterware home directory on the first node, as the oracle user.
Option 4: Correct. The final step when adding a new node to your RAC cluster
should be to add a database instance to the new node. This is accomplished by
using the DBCA from the first node.
Correct answer(s):
4. Add a database instance to the new node
Before you can proceed with the Oracle Clusterware installation on the node you want to
add to your RAC cluster, you must make sure that all operating system and hardware
prerequisites are met.
After this is done, you can verify that the system has been configured properly for Oracle
Clusterware by using this Cluster Verify command from one of the nodes that is already
part of your cluster.

Syntax
cluvfy stage -pre crsinst -n <list of all nodes> -r 11gR1

This example assumes that you have only one node currently as part of your cluster, and
you want to add a new one called vx0313. If any errors are reported during the
preceding verification, fix them before proceeding to the next step.

Graphic
The code that you use to add a new node is the following:
/u01/crs11g/bin/cluvfy stage -pre crsinst -n vx0306,vx0313 -r 11gR1

Code
bash-3.00$ /u01/crs11g/bin/cluvfy stage -pre crsinst -n vx0306,vx0313 -r 11gR1
Performing pre-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "vx0306".
Checking user equivalence...
User equivalence check passed for user "oracle".
Checking administrative privileges...
User existence check passed for "oracle".
Group existence check passed for "oinstall".
Membership check for user "oracle" in group "oinstall" [as
Primary] passed.
Administrative privileges check passed.
Checking node connectivity...
Node connectivity check passed for subnet "10.216.4.0" with
node(s) vx0306,vx0313.
Add Oracle Clusterware to the new node by performing the subsequent steps. Log in as the
oracle user and execute the addNode.sh script located in your Oracle Clusterware
home directory on the first node. This script runs the Oracle Universal Installer. On the
Welcome screen, click Next.
On the Specify Cluster Nodes to Add to Installation screen, OUI recognizes the existing
nodes and asks you to enter the short public node name of the host you want to add to
your cluster. That should automatically populate the corresponding Private Node Name
and Virtual host name fields. Make sure that those three names are correct and click
Next.

Next in the Cluster Node Addition Summary screen, you can review the list of products to
be installed. Click Install.

Note
It is also possible to execute OUI in silent mode. A possible example where you
want to add a new node called newnode is:
addNode.sh -silent -responseFile myinstallresponsefile
Here, myinstallresponsefile contains:
CLUSTER_NEW_NODES = {"newnode"}
CLUSTER_NEW_PRIVATE_NODE_NAMES = {"newnode-priv"}
CLUSTER_NEW_VIRTUAL_HOSTNAMES = {"newnode-vip"}
You can now follow the installation progression from the Cluster Node Addition Progress
screen.

Graphic
The Cluster Node Addition Progress screen performs four steps to complete the
installation: Instantiation of add node scripts complete, Copying to remote nodes,
Save inventory pending, and Execution of root scripts pending. This screen also
contains a note that says You can find the log of this install session at:
/u01/app/oraInventory/logs/addNodeActions2007-11-09_06-59-28AM.log.
The OUI copies the Oracle Clusterware software to the new node, and then asks you to
run a few scripts as the root user on both nodes. Make sure that you run the scripts on
the correct node as specified one after another.

Graphic
The Execute Configuration scripts dialog box lists the location of the scripts and
the node names. The scripts rootaddnode.sh and root.sh are accessible from the
Execute Configuration scripts dialog box.
There are two scripts you have to execute:

Code
[root@vx0306 oracle]# /u01/crs11g/install/rootaddnode.sh
clscfg: EXISTING configuration version 4 detected.
clscfg: version 4 is 11 Release 1.
Attempting to add 1 new nodes to the configuration
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 2: vx0313 vx0313-priv vx0313
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
/u01/crs11g/bin/srvctl add nodeapps -n vx0313 -A vx0313-vip/255.255.252.0/eth0
[root@vx0306 oracle]#

[root@vx0313 ~]# /u01/crs11g/root.sh
Checking to see if Oracle CRS stack is already configured
OCR LOCATIONS = /dev/sdb1
OCR backup directory '/u01/crs11g/cdata/vx_cluster02' does not exist.
Creating now
Setting the permissions on OCR backup directory
Setting up Network socket directories
Oracle Cluster Registry configuration upgraded successfully
clscfg: EXISTING configuration version 4 detected.
clscfg: version 4 is 11 Release 1.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
' ...
rootaddnode.sh script and
You have to execute the rootaddnode.sh script on the first node. Basically, this script
adds the nodeapps of the new node to the OCR configuration.
root.sh script
After executing the rootaddnode.sh script, you have to execute the root.sh script
from the new node. This script starts the Oracle Clusterware stack on the new node and
then uses VIPCA or Virtual IP Configuration Assistant in silent mode for configuring
nodeapps.

Supplement
Selecting the link title opens the resource in a new browser window.

The complete root.sh script


Click to view the complete root.sh script.
Launch window

After both scripts are executed successfully, you can check your Oracle Cluster Registry
or OCR configuration. At this point, the crs_stat command reports these three new
resources on the new node. These resources correspond to nodeapps.

Graphic
The three new resources on the new node are the following:
ora.vx0313.gsd  application  ONLINE  ONLINE  vx0313
ora.vx0313.ons  application  ONLINE  ONLINE  vx0313
ora.vx0313.vip  application  ONLINE  ONLINE  vx0313

Code
bash-3.00$ /u01/crs11g/bin/crs_stat -t
Name            Type         Target    State     Host
--------------------------------------------------------------
ora....BB1.srv  application  OFFLINE   OFFLINE
ora.....JFV.cs  application  OFFLINE   OFFLINE
ora....B1.inst  application  ONLINE    ONLINE    vx0306
ora.RDBB.db     application  ONLINE    ONLINE    vx0306
ora....SM1.asm  application  ONLINE    ONLINE    vx0306
ora....06.lsnr  application  ONLINE    ONLINE    vx0306
ora.vx0306.gsd  application  ONLINE    ONLINE    vx0306
ora.vx0306.ons  application  ONLINE    ONLINE    vx0306
ora.vx0306.vip  application  ONLINE    ONLINE    vx0306
ora.vx0313.gsd  application  ONLINE    ONLINE    vx0313
ora.vx0313.ons  application  ONLINE    ONLINE    vx0313
ora.vx0313.vip  application  ONLINE    ONLINE    vx0313
bash-3.00$
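Alternatively, you could confirm the nodeapps on the new node with srvctl. A minimal sketch using the node name from this example:

Code
/u01/crs11g/bin/srvctl status nodeapps -n vx0313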
After checking the resources, perform these two steps:
1. on the Execute Configuration scripts screen, click OK to reach the end of the Oracle Clusterware
installation and
2. on the End of Installation screen, click Exit

2. Adding objects to a new node


You now need to add the new node's ONS (Oracle Notification Server) configuration
information to the shared ONS configuration information stored in OCR.

From the first node, look at the ons.config file located in the <Oracle
Clusterware home>/opmn/conf directory to determine the ONS remote port to
be used, such as 6251. You need to use this port in the racgons add_config command
to make sure that the ONS on the first node can communicate with the ONS on the new
node.

Code
bash-3.00$/u01/crs11g/bin/racgons add_config vx0313:6251
bash-3.00$
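For reference, the remote port passed to racgons is read from ons.config beforehand. A hedged sketch of what that file typically contains; the values shown here are illustrative only:

Code
cat /u01/crs11g/opmn/conf/ons.config
localport=6113        # illustrative local port
remoteport=6251       # the port passed to racgons add_config
loglevel=3
useocr=on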
You then add ASM Home to the new node. However, this step is needed only if you use a
specific home directory to host ASM. If you run ASM and your RAC database out of the
same Oracle Home, you can skip this step.

Graphic
The Welcome screen of the Oracle Universal Installer is displayed.
From the first node, you need to execute the addNode.sh script from the ASM home
directory.

Graphic
The code to execute the addNode.sh script from the first node is the following:
bash-3.00$ ./addNode.sh

Code
bash-3.00$ export
ORACLE_HOME=/u01/app/oracle/product/11.1.0/asm_1
bash-3.00$ cd $ORACLE_HOME/oui/bin
bash-3.00$ ./addNode.sh
Starting Oracle Universal Installer...
The installation scenario is identical to the one shown for the Oracle Clusterware
installation. However, in the case of an Oracle Home, you just need to select the name of
the node you want to add on the Specify Cluster Nodes to Add to Installation screen.
Run the root.sh script from the new node after OUI has copied the database software.

Graphic

The root.sh script is listed in the Scripts to be executed section on the Execute
Configuration scripts dialog box.
To add RAC Home to the new node, from the first node, execute the addNode.sh script
from the RAC home directory. The installation scenario is identical to the one shown for
the ASM home installation.

Graphic
The code you use to execute the addNode.sh script from the first node is the
following:
bash-3.00$ ./addNode.sh

Code
bash-3.00$ export
ORACLE_HOME=/u01/app/oracle/product/11.1.0/asm_1
bash-3.00$ cd $ORACLE_HOME/oui/bin
bash-3.00$ ./addNode.sh
Starting Oracle Universal Installer...
From the new node, you need to add a listener. In this example, you are adding a listener
from the ASM Home. You need to use NETCA or NETwork Configuration Assistant for
that.

Graphic
The line of code that uses the NETCA to add a listener from the ASM home is the
following:
bash-3.00$ netca

Code
bash-3.00$ . ./setasm2.sh
bash-3.00$ echo $ORACLE_HOME
/u01/app/oracle/product/11.1.0/asm_1
bash-3.00$ netca
To add a listener on the new node with the name LISTENER_<New node name>,
perform these steps:

on the Real Application Clusters, Configuration screen, select Cluster configuration and click
Next
This screen also contains the Single node configuration radio button.

on the Real Application Clusters, Active Nodes screen, select the name of the new node and click
Next
The Real Application Clusters, Active Nodes dialog box contains two nodes vx0306 and vx0313.
The node vx0313 is selected.

on the Welcome screen, select Listener configuration and click Next


This screen also includes the Naming Methods configuration, Local Net Service Name
configuration, and Directory Usage Configuration radio buttons.

on the Listener Configuration, Listener screen, select Add and click Next
This screen also contains the option buttons Reconfigure, Delete, and Rename. These three
option buttons are currently disabled.

on the Listener Configuration, Listener Name screen, enter LISTENER in the Listener name field,
and

on the Listener Configuration, Select Protocols screen, select TCP and click Next
The TCP option is available in the Selected Protocols list box. This screen also has the Available
Protocols list box that has the options TCPS and IPC.
On the Listener Configuration, TCP/IP Protocol screen, select Use the standard port
number of 1521, and click Next. Continue to click Next until you exit from NETCA.

Graphic
The Listener Configuration, TCP/IP Protocol screen also contains the Use another
port number radio button with a text box containing a default port number.
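Once NETCA completes, you could optionally verify the new listener from the new node. A minimal sketch, assuming the listener was created as LISTENER_VX0313:

Code
lsnrctl status listener_vx0313    # should show the listener up on the new node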
Before you can add your database instance to the new node, you need to add an ASM
instance to the new node. To do so, use DBCA from your ASM home.

Graphic
The code that enables you to add an ASM instance using the DBCA is the
following:
bash-3.00$ dbca

Code

bash-3.00$ . ./setasm1.sh
bash-3.00$ echo $ORACLE_HOME
/u01/app/oracle/product/11.1.0/asm_1
bash-3.00$ dbca
Use these steps to complete the task:

on the Welcome screen, click Next


This screen contains two radio buttons for types of databases. The first radio button, Oracle Real
Application Clusters database is selected by default. The other radio button is Oracle single
instance database.

on the Operations screen, select Configure Automatic Storage Management and click Next
This screen also contains five other options: Create a Database, Configure Database Options,
Delete a Database, Manage Templates, and Instance Management.

on the Node Selection screen, select the node you want to add and click Next, and
This screen provides a list box that contains two nodes: vx0306 and vx0313. Both are selected.
This screen also has two buttons named Select All and Deselect All.

after a while, DBCA prompts you for ASM instance addition on your second node, click Yes
The DBCA prompts the user to extend the ASM after informing that ASM is present on the cluster
but needs to be extended for the node vx0313.
DBCA then prompts you to enter the password for the ASM administrator. Enter your
password and click OK.
On the ASM Disk Groups screen, ensure all your disk groups mounted on all nodes of
your cluster are listed. Click Finish.

Note
By default, if your ASM instance is running out of the same oracle home, then
DBCA automatically extends ASM to the new node when you use DBCA to extend
your database instance to the new node.
You now need to add a database instance to your RAC database. You can do so by using
the DBCA from the first node.

Graphic
The code to add a database instance to the RAC database using the DBCA from
the first node is the following:
bash-3.00$ dbca

Code
bash-3.00$ . ./setrdb1.sh
bash-3.00$ echo $ORACLE_HOME
/u01/app/oracle/product/11.1.0/db_1
bash-3.00$ dbca
These are some of the steps:

on the Welcome screen, select Oracle Real Application Clusters database and click Next
This screen also provides the radio button Oracle single instance database.

on the Operations screen, select Instance Management and click Next, and
This screen also includes five other radio buttons: Create a Database, Configure Database
Options, Delete a Database, Manage Templates, and Configure Automatic Storage Management.

on the Instance Management screen, select Add an instance and click Next
This screen also includes the radio button Delete an instance.
These are the remaining steps to add a database instance to your RAC database:

on the List of cluster databases screen, select your RAC database, enter SYS credentials, and
then click Next
on the List of cluster database instances screen, click Next

on the Instance naming and node selection screen, select the node name on which you want to
add the instance, specify the name of that instance, and click Next

on the Instance Storage screen, click Finish, and


This screen shows a tree listing directory structure that has Storage as the parent directory. This
directory has the sub-directories Tablespaces, Datafiles, and Redo Log Groups. The Storage
directory is selected. This page includes information on Database Storage page and the
parameters you can specify in the page. The page also contains information on how to create and
delete objects.

on the Summary screen, check the various parameters and click OK


The Summary screen specifies that the STCDB2 instance will be added on the stc-raclin06 node.
The screen also has two tables. The first table is named Initialization Parameters and has three
columns named Instance, Name, and Value. The second table is named Tablespaces and has
three columns named Name, Type, and Extent Management. This screen also contains a button
named Save as an HTML file button.
At this point, if you are using ASM for your database storage and there is currently no
running ASM instance on your new node, DBCA detects the need for an ASM instance

creation on the new node. This must be done before the DBCA can create the database
instance on that node. Click Yes.
The assistant is now adding your instance to your RAC database on the new node. It will
also start that instance at the end of the operation.
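After DBCA finishes, you could confirm that the new instance is registered with Oracle Clusterware and running. A minimal sketch using the database name from this example:

Code
# Check the status of all instances of the RDBB database
/u01/app/oracle/product/11.1.0/db_1/bin/srvctl status database -d RDBB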
You can also add a new instance to your RAC database by using the Add Instance
wizard.
Here are the initial two steps:

click the Server tab from the Cluster Database: RDBB page and

click Add Instance in the Change Database section of the Server tabbed page
Next in the Add Instance: Cluster Credentials page, you specify the cluster and ASM
credentials. The wizard automatically adds the ASM instance before adding the database
instance if it is not already created. When done, click Next.

Graphic
This page has two sections: Cluster Credentials and ASM Credentials. The
Cluster Credentials section contains the Username and Password fields where
you can enter the relevant credentials for the install owner of the Oracle Home
from which the cluster database instances are running. These fields are
mandatory. The ASM Credentials section contains the Username and Password
fields where you can enter the relevant credentials of the SYSDBA user running
the asm instance. This section also contains the label the ASM Instance
+ASM1_vx0306.us.oracle.com.
Next in the Add Instance: Host page, you specify on which host you want to add the
database instance. These steps complete the remainder of the Add Instance wizard:

select the node in question and click Next

click Submit Job to start the job's execution on the Add Instance: Review page

click View Job to see the job's log on the Confirmation page, and

view the succeeded status after some refreshes of that page


The succeeded status in the Summary page is specified as Running. The other summary
information are provided using the labels: Scheduled, Started, Ended, Elapsed Time, Notification,
Type, Owner, Description, and Oracle Home. The page also includes the Targets field, the Status
drop-down list, and the Go button. The page also contains a table with six columns named Name,
Targets, Status, Started, Ended, and Elapsed Time (seconds).

Summary
You can add and delete nodes and instances using silent cloning procedures, Enterprise
Manager Grid Control, and interactive or silent procedures. The preferred method to add
multiple nodes and instances to RAC databases is to use cloning procedures. However,
you can directly use OUI and DBCA to add and delete single nodes.
Once Oracle Clusterware software is installed, you need to add the new node ONS
configuration information to the shared ONS configuration information stored in OCR. You
execute a series of scripts to perform actions such as adding RAC home, and a listener
service to the new node.
You can add an instance to your new node using the DBCA or using the Add Instance
wizard of the Enterprise Manager.

Delete a Node from a RAC Cluster


Learning Objective

After completing this topic, you should be able to

recognize how to delete nodes and instances in a RAC database

1. Deleting a node from a RAC cluster


These are the main steps you need to follow to delete a node from a RAC Cluster:
1. delete the instance on the node to be deleted
2. clean up the ASM instance
3. remove the listener from the node to be deleted
4. remove the node from the database
5. remove the node from ASM (when using separate ASM directory)
6. remove ONS configuration from the node to be deleted, and
7. remove the node from the clusterware
For all of the add node and delete node procedures for UNIX-based systems, temporary
directories such as /tmp, $TEMP, or $TMP, should not be shared directories. If your
temporary directories are shared, then set your temporary environment variable, such as

$TEMP, to a nonshared location on a local node. In addition, use a directory that exists on
all the nodes.
The first step is to remove the database instance from the node that you want to delete.
For that, you use the DBCA from the node you want to delete. On the Welcome screen,
select Oracle Real Application Clusters database and click Next.

Graphic
The other radio button available for selecting the type of database is Oracle single
instance database.
On the Operations screen, select Instance Management and click Next.

Graphic
This screen also contains other radio buttons like Create a Database, Configure
Database Options, Delete a Database, Manage Templates, and Configure
Automatic Storage Management.
On the Instance Management screen, select Delete an instance and click Next.

Graphic
This screen also contains the radio button Add an instance.
On the List of cluster databases screen, select the RDBB database from which you want
to delete an instance, enter sys and its password, and click Next.

Graphic
You enter sys in the Username field and the password in the Password field.

Question
When removing a node from a RAC cluster, which is the first step that should be
performed?
Options:
1. Clean up the ASM instance
2. Remove the listener from the node
3. Delete the database instance from the node
4. Remove the node from the clusterware

Answer
Option 1: Incorrect. After your database instance is removed from the node you
want to delete, you can clean up the corresponding ASM instance. To do this, you
need to use SRVCTL to first stop the ASM instance currently running on the node
that you want to remove, and then remove that ASM instance from the same
node.
Option 2: Incorrect. After the corresponding ASM instance has been cleaned up,
you can remove the listener from the node that you want to delete. This listener
can be from either the ASM home or the database home depending on when it
was created. To remove the listener, you can use NETCA.
Option 3: Correct. When removing a node from a RAC cluster, the first step is to
remove the database instance from the node that you want to delete. For that, you
use the DBCA from the node you want to delete.
Option 4: Incorrect. When removing a node from a RAC cluster, the final step is to
remove the node from the Oracle Clusterware. This is done by updating the
inventory on the node that is to be removed, removing the Oracle Clusterware
from that node, and then updating the inventory from the first node in the RAC
cluster.
Correct answer(s):
3. Delete the database instance from the node
On the List of cluster database instances screen, select the instance that you want to
delete and click Finish.

Graphic
The instance vx0313:RDBB2, which is active, is selected in this example.
In the Database Configuration Assistant dialog box, click OK to validate your choice.
This triggers the remove instance process. When completed, your instance is removed
from your cluster database.
After your database instance is removed from the node, you have to complete two tasks:

Code

[oracle@vx0306 ~]$ srvctl stop asm -n vx0313


[oracle@vx0306 ~]$ srvctl remove asm -n vx0313
[oracle@vx0306 ~]$
[oracle@vx0313 ~]$ rm -f
/u01/app/oracle/product/11.1.0/asm_1/dbs/*ASM*
[oracle@vx0313 ~]$ rm -rf /u01/app/oracle/admin/+ASM
[oracle@vx0313 ~]$
clean up the corresponding ASM instance and
To clean up the corresponding ASM instance, you need to use SRVCTL to first stop the
ASM instance currently running on the node that you want to remove, and then remove
that ASM instance from the same node.
remove files of that ASM instance
You need to manually remove the initialization parameter file of that ASM instance. You
can remove files containing the ASM string from the <ASM home>/dbs directory as shown.
After this is done, you can also remove all the log files of that ASM instance. These files
are generally located in the $ORACLE_BASE/admin directory.
The last thing you can do is to remove the associated ASM entry from the /etc/oratab
file.

Code
#
# This file is used by ORACLE utilities. It is created by root.sh
# and updated by the Database Configuration Assistant when creating
# a database.
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# Entries of the form:
#   $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# The first and second fields are the system identifier and home
# directory of the database respectively. The third field indicates
# to the dbstart utility that the database should, "Y", or should not,
# "N", be brought up at the system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
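A hedged sketch of removing that oratab entry non-interactively; the +ASM2 SID is an assumption based on the instance that ran on the deleted node, so adjust it for your environment and keep a copy of the file first:

Code
cp /etc/oratab /tmp/oratab.bak           # keep a backup copy first
sudo sed -i '/^+ASM2:/d' /etc/oratab     # drop the ASM entry for the deleted node (SID assumed)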

Question
After the database instance has been deleted from the node you want to remove
from a RAC cluster, which is the next step that should be performed?
Options:
1. Clean up the ASM instance
2. Remove the node from ASM
3. Remove the listener from the node
4. Remove the node from the database

Answer
Option 1: Correct. After your database instance is removed from the node you
want to delete, you can clean up the corresponding ASM instance. To do this, you
need to use SRVCTL to first stop the ASM instance currently running on the node
that you want to remove, and then remove that ASM instance from the same
node.
Option 2: Incorrect. Removing the node from the ASM occurs after you clean up
the ASM instance, remove the listener from the node to be deleted, and remove
the node from the database. And before you can use the Oracle Universal Installer
to remove the ASM software installation, you need to update the inventory on the
node to be deleted.
Option 3: Incorrect. After the corresponding ASM instance has been cleaned up,
the next step is to remove the listener from the node that you want to delete. This
listener can be from either the ASM home or the database home depending on
when it was created. To remove the listener, you can use NETCA.
Option 4: Incorrect. Removing the node from the database should be done after
you remove the listener from the node to be deleted. Before you can use the
Oracle Universal Installer to remove the database software installation, you need
to update the inventory on the node to be deleted.
Correct answer(s):
1. Clean up the ASM instance

2. Removing objects and installations


You can now remove the listener from the node that you want to delete. This listener can
be from either the ASM home or the database home depending on when it was created.
To remove the listener, you can use NETCA. On the Configuration screen, select Cluster
configuration and click Next.

Graphic
This screen also contains the radio button Single node configuration.
These are the remaining steps in removing the listener from the node that you want to
delete:

on the Active Nodes screen, select the node from which you want to remove the listener and click
Next
The node vx0313 is selected in this example.

on the Welcome screen, select Listener configuration and click Next


The other three radio buttons available on this screen are Naming Methods configuration, Local
Net Service Name configuration, and Directory Usage Configuration.

on the Listener screen, select Delete and click Next


The other three radio buttons available on this screen are Add, Reconfigure, and Rename.

on the Select Listener screen, select the corresponding listener, normally called LISTENER, and
click Next, and

follow the rest of the screens until the listener is removed from the node
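Once NETCA completes, you can optionally confirm that no listener resource for the
removed node remains registered with Oracle Clusterware. This is a sketch only; the
lsnr filter pattern, the node name, and the Clusterware home path are taken from this
example.

Code
# Hedged check: list any remaining CRS listener resources for node vx0313;
# no output means the listener was removed successfully.
/u01/crs11g/bin/crs_stat | grep -i lsnr | grep -i vx0313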
Before you can use the Oracle Universal Installer to remove the database software
installation, you need to update the inventory on the node to be deleted by executing this
command.
You need to execute this command from the oui/bin subdirectory in the database
home.

Graphic

The example of the command provided is the following:


bash-3.00$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0313" -local

Code
bash-3.00$ . ./setrdb2.sh
bash-3.00$ echo $ORACLE_HOME
/u01/app/oracle/product/11.1.0/db_1
bash-3.00$ echo $PATH
/bin:/opmn/bin:/Apache/Apache/bin:/dcm/bin:/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/NX/bin:/u01/app/oracle/product/11.1.0/db_1/bin
bash-3.00$
bash-3.00$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0313" -local
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 4095 MB   Passed
Checking monitor: must be configured to display at least 256 colors.   Actual 65536   Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
bash-3.00$

Syntax
./runInstaller -updateNodeList ORACLE_HOME=<Database home>
"CLUSTER_NODES=<node to be removed>" -local
After this command is executed, you can start OUI from the same directory and click
Deinstall products on the Welcome screen. Then select the database home and click
Remove. This will remove the database home from the node to be deleted.
You now need to update the corresponding inventory on the remaining nodes. You can
use this command from the first node. This command needs to be executed from the
oui/bin subdirectory of the database home.

Graphic
The example of the command provided is the following:
$ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0306"

Code
bash-3.00$ . ./setrdb1.sh
bash-3.00$ echo $ORACLE_HOME
/u01/app/oracle/product/11.1.0/db_1
bash-3.00$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0306"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 3564 MB   Passed
Checking monitor: must be configured to display at least 256 colors.   Actual 65536   Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
bash-3.00$

Syntax
./runInstaller -updateNodeList ORACLE_HOME=<Database home>
"CLUSTER_NODES=<remaining nodes>"
Before you can use the Oracle Universal Installer to remove the ASM software
installation, you need to update the inventory on the node to be deleted by executing this
command.
You need to execute this command from the oui/bin subdirectory in the ASM home.

Graphic
The example of the command provided is the following:
$ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0313" -local

Code

bash-3.00$ . ./setasm2.sh
bash-3.00$ echo $ORACLE_HOME
/u01/app/oracle/product/11.1.0/asm_1
bash-3.00$ echo $PATH
/bin:/opmn/bin:/Apache/Apache/bin:/dcm/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/NX/bin:/u01/app/oracle/product/11.1.0/asm_1/bin
bash-3.00$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0313" -local
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 4095 MB   Passed
Checking monitor: must be configured to display at least 256 colors.   Actual 65536   Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
bash-3.00$

Syntax
./runInstaller -updateNodeList ORACLE_HOME=<ASM home>
"CLUSTER_NODES=<node to be removed>" -local
After this command is executed, you can start OUI from the same directory and click
Deinstall products on the Welcome screen. Then select the ASM home and click
Remove. This will remove the ASM home from the node to be deleted.
You now need to update the corresponding inventory on the remaining nodes. You can
use the highlighted command from the first node.
This command needs to be executed from the oui/bin subdirectory of the ASM home.

Graphic
The example of the command provided is the following:
$ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0306"

Code

bash-3.00$ . ./setasm1.sh
bash-3.00$ echo $ORACLE_HOME
/u01/app/oracle/product/11.1.0/asm_1
bash-3.00$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0306"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 3564 MB   Passed
Checking monitor: must be configured to display at least 256 colors.   Actual 65536   Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
bash-3.00$

Syntax
./runInstaller -updateNodeList ORACLE_HOME=<ASM home>
"CLUSTER_NODES=<remaining nodes>"

Note
This step is not needed if you are not using a separate home directory for ASM.
Before you can use OUI to remove the Oracle Clusterware software installation from the
node to be deleted, you need to perform these steps:

Code
bash-3.00$ /u01/crs11g/bin/racgons remove_config vx0313:6200
racgons: Existing key value on vx0313 = 6251.
WARNING: vx0313:6200 does not exist.
bash-3.00$ /u01/crs11g/bin/racgons remove_config vx0313:6251
racgons: Existing key value on vx0313 = 6251.
racgons: vx0313:6251 removed from OCR.
bash-3.00$
bash-3.00$ sudo /u01/crs11g/install/rootdelete.sh
Getting local node name
NODE = vx0313
Getting local node name
NODE = vx0313
Stopping resources.
This could take several minutes.
...
bash-3.00$ /u01/crs11g/bin/olsnodes -n
vx0306 1
vx0313 2
bash-3.00$ sudo /u01/crs11g/install/rootdeletenode.sh vx0313,2
CRS-0210: Could not find resource 'ora.vx0313.ons'.
CRS-0210: Could not find resource 'ora.vx0313.vip'.
CRS-0210: Could not find resource 'ora.vx0313.gsd'
...
Step 1
On the first node, run the <Oracle Clusterware home>/bin/racgons remove_config
<Node to be removed>:6251 command. However, replace port 6251 with the value of
the remoteport entry in the ons.config file, found in the <Oracle Clusterware
home>/opmn/conf directory (see the sketch after these steps).
Step 2, and
Log in as the root user on the node to be removed, and run the <Oracle Clusterware
home>/install/rootdelete.sh command.
Step 3
Logged in as the root user on the first node, determine the node number to be deleted
using the <Oracle Clusterware home>/bin/olsnodes -n command. Then execute the
<Oracle Clusterware home>/install/rootdeletenode.sh <node name to be
deleted>,<node number to be deleted> command.
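As referenced in Step 1, the port to pass to racgons can be read directly from the
ons.config file. This is a minimal sketch, assuming the /u01/crs11g Clusterware home
used in this example.

Code
# Hedged sketch: read the remote ONS port to pass to racgons remove_config
# (the Clusterware home path is the one used in this example).
grep remoteport /u01/crs11g/opmn/conf/ons.config
# expected form of the output: remoteport=6251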

Supplement
Selecting the link title opens the resource in a new browser window.

Full commands and output


View all the code associated with these steps for this example.
Launch window
You now need to update the inventory from the node to be deleted by executing this
command.

Graphic
The example of the command provided is the following:

$ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0313" CRS=TRUE -local

Code
bash-3.00$ export ORACLE_HOME=/u01/crs11g
bash-3.00$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0313" CRS=TRUE -local
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 4095 MB   Passed
Checking monitor: must be configured to display at least 256 colors.   Actual 65536   Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
bash-3.00$

Syntax
<Oracle Clusterware home>/oui/bin/runInstaller -updateNodeList
ORACLE_HOME=<Oracle Clusterware home> "CLUSTER_NODES=<Node to be
deleted>" CRS=TRUE -local
When done, run OUI from the same directory and choose Deinstall products and
remove the Oracle Clusterware installation on the node to be deleted.
You can now update the inventory from the first node by executing the command <Oracle
Clusterware home>/oui/bin/runInstaller -updateNodeList
ORACLE_HOME=<Oracle Clusterware home> "CLUSTER_NODES=<Remaining nodes>"
CRS=TRUE.

Graphic
The example of the command provided is the following:
$ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0306" CRS=TRUE

Code

bash-3.00$ export ORACLE_HOME=/u01/crs11g
bash-3.00$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=vx0306" CRS=TRUE
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 3492 MB   Passed
Checking monitor: must be configured to display at least 256 colors.   Actual 65536   Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
bash-3.00$
To verify the removal of the node from the cluster, you need to run these three commands
from the first node:

Syntax
srvctl status nodeapps -n <Deleted node>
crs_stat | grep -i <Deleted node>
olsnodes -n

use the srvctl status nodeapps -n <Deleted node> command, which should return an
Invalid node message

use the crs_stat | grep -i <Deleted node> command, which should not return any
output, and

use the olsnodes -n command, which should list all the remaining nodes without the
deleted node
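Applied to this example, the three checks could look like the sketch below, run from
the first node. The expected results in the comments are assumptions based on the
steps above, not captured output.

Code
# Hedged verification sketch for the deleted node vx0313, run from vx0306.
srvctl status nodeapps -n vx0313   # expect an "Invalid node" style message
crs_stat | grep -i vx0313          # expect no output at all
olsnodes -n                        # expect only the remaining nodes, for example vx0306 1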

Note
You should also remove all corresponding Oracle home directories after this step.
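The leftover home directories can then be deleted from the removed node. The following
is a sketch only, using the home paths shown in this example's output; verify each path
before deleting anything.

Code
# Hedged cleanup sketch, run on the removed node vx0313; the paths are the
# ones used in this example. Double-check each path before running rm -rf.
rm -rf /u01/app/oracle/product/11.1.0/db_1     # database home
rm -rf /u01/app/oracle/product/11.1.0/asm_1    # ASM home
sudo rm -rf /u01/crs11g                        # Oracle Clusterware home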
A new auxiliary, system-managed tablespace called SYSAUX contains performance data
and combines content that was stored in different tablespaces (some of which are no
longer required) in earlier releases of the Oracle database.
This is a required tablespace for which you must plan disk space. The SYSAUX tablespace
now holds the content that was formerly stored in the DRSYS (Oracle Text data), CWMLITE
(OLAP schemas), XDB (XML features), ODM (Oracle Data Mining), and OEMREPO tablespaces.
If you add nodes to your RAC database environment, then you may need to increase the
size of the SYSAUX tablespace. Conversely, if you remove nodes from your cluster
database, then you may be able to reduce the size of your SYSAUX tablespace and thus
save valuable disk space. Use this formula to size the SYSAUX tablespace properly.
If you apply this formula to a four-node cluster, then you find that the SYSAUX tablespace
is sized around 1,300 megabytes.
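The sizing formula itself is not reproduced in this excerpt. As a hedged illustration
only, the arithmetic below assumes a fixed base of 300 MB plus 250 MB per instance;
these constants are assumptions chosen because they reproduce the 1,300 MB figure
quoted for four instances, so substitute the values from your Oracle documentation.

Code
# Hedged SYSAUX sizing sketch; BASE_MB and PER_INSTANCE_MB are assumed
# values that match the 1,300 MB example for a four-instance cluster.
BASE_MB=300
PER_INSTANCE_MB=250
INSTANCES=4
echo $(( BASE_MB + PER_INSTANCE_MB * INSTANCES ))   # prints 1300 (MB)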

Summary
You use the DBCA from the node you want to delete. To clean up the ASM instance, use
SRVCTL to first stop and then remove the ASM instance from the node. Then manually
remove the initialization parameter file of that ASM instance, along with any files
containing the ASM string and the log files of that ASM instance. Finally, remove the
associated ASM entry from the /etc/oratab file.
To remove the listener from the node to be deleted, you can use the Network
Configuration Assistant (NETCA). To remove the node from the database, ASM, and
Oracle Clusterware, you use the Oracle Universal Installer. The SYSAUX tablespace
contains performance data and consolidates content that was formerly stored in separate
tablespaces; you can adjust its size according to the number of nodes in your RAC
database environment.
