
Oracle Maximum Availability Architecture & Best Practices Technical Overview


Lawrence To, High Availability and MAA Database Team
Joe Meeks, HA/MAA Product Manager

Agenda
Oracle Maximum Availability Architecture
MAA Best Practices - Oracle Database
  Minimizing Unplanned Outages
  Minimizing Planned Outages
Resources

Oracle Maximum Availability Architecture


Integrated suite of best-of-breed HA technologies - scalable, active-active, data centric
Best Availability at Lowest Cost

Real Application Clusters & Clusterware: Fault Tolerant Server Scale-Out
Online Upgrade: Upgrade Hardware and Software Online
Data Guard: Fully Active Failover Replica
Rolling Database Upgrades
Automatic Storage Management: Fault Tolerant Storage Scale-Out
Streams: Multi-master Replication, Hub & Spoke Replication
Online Redefinition: Redefine Tables Online
Flashback: Correct Errors by Moving Back in Time
Recovery Manager & Oracle Secure Backup: Data Protection & Archival

Oracle's Integrated HA Solution Set

Unplanned Downtime
  System Failures: Real Application Clusters
  Data Failures: ASM, Flashback, RMAN & Oracle Secure Backup, Data Guard, Streams
Planned Downtime
  System Changes: Online Reconfiguration, Online Patching, Rolling Upgrades
  Data Changes: Online Redefinition
Oracle MAA Best Practices
Integrated Management
  Enterprise Manager 11g High Availability Console

MAA Integrated HA Best Practices


MAA is a blueprint for achieving HA
  Correlates HA capabilities to customer requirements
  Operational best practices
  Prevent, tolerate, and recover from outages
Tested, validated, and documented
  Database, Storage, Cluster, Network
  Oracle Enterprise Manager
  Oracle Application Server
  Oracle Applications

otn.oracle.com/deploy/availability

MAA OTN

www.oracle.com/technology/deploy/availability/htdocs/maa.htm
www.oracle.com/technology/deploy/availability/demonstrations.html


Server Scalability and High Availability


Oracle RAC
RAC pools standard low cost servers
Active-Active configuration
  Great scalability - no idle resources
Service Management Framework
  Easily manage resources across a cluster
High Availability
  Automatic failover and load balance
Runs commercial applications
  Oracle Applications, SAP, etc.
Thousands of production customers

http://www.oracle.com/technology/products/database/clustering/index.html

Results Chart by Failure


MAA Best Practices - Oracle RAC

Client
  FCF, FAN ONS, FAN OCI, JDBC connection pooling
Server
  Fast_start_mttr_target (11g)
  _fast_start_instance_recovery_target (10.2)
  CSS in real time
  VIP check interval
  Async I/O
  Listener throttling can help in some cases
MAA Best Practices
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1013583
  http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_FastRecoveryOracleClusterwareandRAC.pdf

Storage Protection & Performance


Automatic Storage Management (ASM)
ASM mirrors data across low cost modular storage arrays
  Automatically remirror when disk fails
Simplifies administration
  Add/subtract disks online
  Automatically rebalance I/O
ASM Oracle 11g enhancements
  Use mirror to automatically re-read and repair when encountering I/O problems
  Fast resync of mirror copy upon recovery from transient disk failures uses only changed blocks
  Rolling Upgrade for ASM instances

http://www.oracle.com/technology/products/database/asm/index.html

MAA Best Practices - ASM


Use clustered ASM to enable the storage GRID
Use vendor RAID with legacy storage arrays; use ASM redundancy with medium/low cost storage arrays
ASM ORACLE_HOME should be different from the RDBMS ORACLE_HOME to ease planned maintenance
For maximum protection, use at least 3 Failure Groups for normal redundancy and 4 Failure Groups for external redundancy
Ensure paths to storage feature both multipathing and fault tolerance
Two diskgroups to ease manageability (DATA and Flash Recovery Area)
Additional MAA Best Practices for ASM
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#CHDIBCCC
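The failure group and online add/rebalance recommendations above can be sketched in SQL run against the ASM instance. This is a minimal illustration, not a tested configuration: the disk group name, failure group names, and device paths are hypothetical, and the rebalance power is an arbitrary choice.

```sql
-- Assumption: three separate storage arrays, one failure group per array,
-- so a normal-redundancy mirror copy never lands on the same array.
-- All names and paths below are hypothetical.
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP array1 DISK '/dev/mapper/array1_lun1'
  FAILGROUP array2 DISK '/dev/mapper/array2_lun1'
  FAILGROUP array3 DISK '/dev/mapper/array3_lun1';

-- Add a disk online; ASM redistributes extents in the background.
ALTER DISKGROUP data
  ADD FAILGROUP array1 DISK '/dev/mapper/array1_lun2'
  REBALANCE POWER 4;

-- Monitor the rebalance while the database stays available.
SELECT group_number, operation, state, est_minutes
FROM   v$asm_operation;
```

A higher REBALANCE POWER finishes sooner at the cost of more I/O competing with the database workload.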

Protection from Human Error


Oracle Flashback Technologies
Flashback Revolutionizes Error Recovery
  Operates on just changed data
  Time to correct error equals time to make error
  Minutes instead of hours
[Chart: Time To Recover (minutes), Traditional Recovery vs. Flashback]

Correction Time = Error Time + f(DB_SIZE)

Flashback is Easy
  Single command instead of complex procedure
Less performance overhead for OLTP and batch
Great for testing especially when used with restore points

http://www.oracle.com/technology/deploy/availability/htdocs/Flashback_Overview.htm

Error Investigation with Flashback


Flashback Query
  Query all data at point in time
  select * from Salary AS OF 12:00 P.M. where ...

Flashback Version Query
  See all versions of a row between times
  See transactions that changed the row
  select * from Salary VERSIONS BETWEEN 12:00 PM and 2:00 PM where ...

Flashback Transaction Query
  See all changes made by a transaction
  select * from FLASHBACK_TRANSACTION_QUERY where xid = HEXTORAW('000200030000002D');
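The slide writes its time literals in shorthand. A fuller sketch of the three query forms follows, assuming a Salary table with a hypothetical emp_id column and arbitrary example timestamps. Only the transaction identifier is taken from the slide.

```sql
-- Flashback Query: rows as they existed at a past point in time.
-- emp_id and the timestamps are hypothetical.
SELECT *
FROM   salary AS OF TIMESTAMP
         TO_TIMESTAMP('2008-06-01 12:00:00', 'YYYY-MM-DD HH24:MI:SS')
WHERE  emp_id = 1001;

-- Flashback Version Query: every version of the row in a time window,
-- with the transaction that created each version.
SELECT versions_xid, versions_starttime, versions_endtime,
       versions_operation, s.*
FROM   salary VERSIONS BETWEEN TIMESTAMP
         TO_TIMESTAMP('2008-06-01 12:00:00', 'YYYY-MM-DD HH24:MI:SS')
         AND
         TO_TIMESTAMP('2008-06-01 14:00:00', 'YYYY-MM-DD HH24:MI:SS') s
WHERE  s.emp_id = 1001;

-- Flashback Transaction Query: all changes made by one transaction,
-- including the SQL that would undo each change.
SELECT operation, table_name, undo_sql
FROM   flashback_transaction_query
WHERE  xid = HEXTORAW('000200030000002D');
```

The VERSIONS_XID value returned by the second query is what feeds the third query during an investigation.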

Error Correction with Flashback


Correct errors at any level
  Flashback Database - restore database to time
  Flashback Table - restore contents of tables to time
  Flashback Transaction - back out transaction and all subsequent conflicting transactions
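The table-level and database-level corrections can be sketched as follows. The orders table name and the time offsets are hypothetical, and the database-level statements assume Flashback Database is already enabled and the instance has been restarted in MOUNT mode.

```sql
-- Flashback Table: rewind one table. Row movement must be enabled first.
-- "orders" and the 15-minute offset are hypothetical.
ALTER TABLE orders ENABLE ROW MOVEMENT;
FLASHBACK TABLE orders TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);

-- Flashback Database: rewind the whole database by one hour.
-- Assumes the database is mounted, not open.
FLASHBACK DATABASE TO TIMESTAMP (SYSDATE - 1/24);
ALTER DATABASE OPEN RESETLOGS;
```

Flashback Table is the narrower tool and leaves the rest of the database untouched, so it is usually the first choice when the damage is confined to known tables.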

Flashback Database Use Cases


Data Guard Integration
  Fast-start Failover Reinstate
  Snapshot Standby
Upgrade Fallback
Logical Failures & Reinstate
Fast Restore for Testing or Planned Changes
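For the upgrade fallback and fast-restore use cases, a guaranteed restore point is the usual anchor. A minimal sketch, with a hypothetical restore point name:

```sql
-- Before the planned change or test run.
-- "before_change" is a hypothetical name.
CREATE RESTORE POINT before_change GUARANTEE FLASHBACK DATABASE;

-- ... apply the change or run the test ...

-- To fall back: restart the instance in MOUNT mode, then rewind.
FLASHBACK DATABASE TO RESTORE POINT before_change;
ALTER DATABASE OPEN RESETLOGS;

-- Drop the restore point once it is no longer needed, so the
-- flashback logs it pins can be reclaimed.
DROP RESTORE POINT before_change;
```

A guaranteed restore point retains flashback logs until it is dropped, so leaving one in place indefinitely can fill the Flash Recovery Area.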

MAA Best Practices - Flashback


Enabling Flashback Database
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1014673
Recovery from human error
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/outage.htm#i1010215

Data Corruption Protection


Oracle-aware Validation, Backup, and Repair
DB_ULTRA_SAFE parameter (11g)
  Most comprehensive data corruption detection and prevention
  DB_BLOCK_CHECKING detects and prevents data block corruptions
  DB_BLOCK_CHECKSUM detects and prevents (standby only) redo and data block corruptions
  DB_LOST_WRITE_PROTECT detects writes lost by the I/O subsystem
  Best protection when used together with Data Guard standby database
Data Recovery Advisor (11g)
  Quickly diagnose and repair data failures
Oracle Recovery Manager (RMAN)
  Automate backups and management of recovery related files
Oracle Secure Backup
  Integrated tape backup and management

MAA Best Practices - Data Protection

Set DB_ULTRA_SAFE on primary and standby (11g)
  Block checking prevents memory and data corruptions. Overhead on every block change.
  Redo and data block checksum detect corruptions on the primary and protect the standby. Minimal CPU resource required.
  Lost write protection detects lost writes on the primary and protects physical standby databases from these corruptions. Minimal redo increase.
Use Data Recovery Advisor for non-RAC primary databases
Use RMAN to detect physical and logical corruptions
Use Data Guard for best comprehensive corruption protection
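Setting DB_ULTRA_SAFE might look like the sketch below, run on both the primary and the standby. It assumes an spfile is in use and that a restart is acceptable, since the parameter is not dynamically modifiable. The choice of DATA_AND_INDEX over DATA_ONLY is illustrative.

```sql
-- Assumption: the instance uses an spfile; the new value takes effect
-- at the next restart.
ALTER SYSTEM SET db_ultra_safe = DATA_AND_INDEX SCOPE = SPFILE;

-- After the restart, check what the umbrella parameter resolved to.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_ultra_safe',
                'db_block_checking',
                'db_block_checksum',
                'db_lost_write_protect');
```

DATA_ONLY is the lighter setting if the block checking overhead of DATA_AND_INDEX proves too high for the workload.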

Data Recovery Advisor - DRA


An Oracle tool that automatically diagnoses data failures, presents repair options, and executes repairs at the user's request
Determines failures based on symptoms
  E.g. an open failed because datafiles f045.dbf and f003.dbf are missing
  Failure information recorded in diagnostic repository (ADR)
  Flags problems before user discovers them, via automated health monitoring
Intelligently determines recovery strategies
  Aggregates failures for efficient recovery
  Presents only feasible recovery options
  Indicates any data loss for each option
Can automatically perform selected recovery steps
First release only supports non-RAC primary databases
Reduce downtime by eliminating confusion and automating detection and repair

Data Recovery Advisor - Wizard

Data Recovery Advisor - View Failures

Data Recovery Advisor - Manual Repair

Data Recovery Advisor - Recovery Advice

Data Recovery Advisor - Summary

Automated Disk Backups


Oracle Recovery Manager
Fully automatic disk-based backup and recovery
  Set and Forget
Nightly incremental backup rolls forward recovery area backup
  Changed blocks are tracked in production DB or standby DB
  Full scan is never needed
  Dramatically faster (20x)
Low cost ATA disks can be used for recovery area
Blocks validated during entire backup and recovery process
[Diagram: Database Area; Nightly Apply Validated Incremental; Flash Recovery Area; Weekly Archive To Tape]

http://www.oracle.com/technology/deploy/availability/htdocs/rman_overview.htm
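The changed-block tracking that makes the nightly incremental fast is switched on with one SQL statement. A minimal sketch, with a hypothetical tracking file path:

```sql
-- Enable block change tracking so incremental backups read only
-- changed blocks instead of scanning every data file.
-- The file path is hypothetical.
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
  USING FILE '/u01/app/oracle/bct/orcl_bct.f';

-- Confirm that tracking is active and where the file lives.
SELECT status, filename, bytes
FROM   v$block_change_tracking;
```

The incremental backup and roll-forward themselves are RMAN commands and are not shown here.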

RMAN Enhancements
Better performance
  Intra-file parallel backup and restore of single data files >= 1 GB
  Faster backup compression (ZLIB, ~40% faster)
Better security
  Virtual Private Catalog - grant visibility of a subset of registered databases in the catalog to specific RMAN users
Lower space consumption and faster instantiation
  Duplicate database or create standby database over the network, avoiding intermediate staging areas
Integration with Windows Volume Shadow Copy Services API
  Allows database to participate in snapshots coordinated by VSS-compliant backup management tools and storage products
  Database is automatically recovered upon snapshot restore via RMAN

MAA Best Practices - RMAN


Enable Archive Log Mode
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1006953
Use a Flash Recovery Area
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1014270
Configure backup and recovery
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1007374
Recovering from data corruption (data failures)
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/outage.htm#i1006317
Use Oracle Secure Backup's Tape Backup Management Solution
  http://www.oracle.com/technology/products/secure-backup/index.html
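The first two recommendations, plus enabling Flashback Database, can be sketched as one sequence. It assumes the instance has been cleanly shut down and restarted in MOUNT mode. The size and the '+FRA' disk group name are hypothetical.

```sql
-- Assumption: database is mounted, not open.
ALTER DATABASE ARCHIVELOG;

-- Flash Recovery Area: the size limit must be set before the location.
-- 200G and '+FRA' are hypothetical values.
ALTER SYSTEM SET db_recovery_file_dest_size = 200G SCOPE = BOTH;
ALTER SYSTEM SET db_recovery_file_dest = '+FRA' SCOPE = BOTH;

-- Flashback Database stores its logs in the Flash Recovery Area.
ALTER DATABASE FLASHBACK ON;

ALTER DATABASE OPEN;

-- Verify both settings.
SELECT log_mode, flashback_on FROM v$database;
```

Size the recovery area for the backup retention policy plus archived logs and flashback logs, not for backups alone.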

Data Availability and Disaster Protection


Oracle Data Guard
High availability
  Tolerate outages transparently
  Recover from outages quickly
  Address planned maintenance and unplanned events
Complete data protection
  Standby data must be isolated from production faults
  No data should be lost
Full systems utilization
  Standby resources should be utilized for productive use
Straightforward to manage
  Integrated, reliable, high performance

Best Failure Protection at Lowest Cost

[Diagram: Production Database; Synchronous Redo Shipping; Physical or Logical Standby DB; Automatic Failover]

Data Guard
  Synchronous or asynchronous redo shipping
  Corruptions don't propagate - most comprehensive redo and block corruption and lost write detection and protection
  Deploy on low cost servers and storage, no special network components
  Thousands of production customers

Zero Data Loss over Longer Distances


Data Guard DR Sweet Spot
  Far enough to avoid regional disaster
  Close enough for zero data loss
[Diagram: distance scale of 100 miles, 200 miles, 300+ miles, comparing Data Guard Synchronous Redo Shipping with Synchronous Disk Mirroring]

Data Guard redo transport uses an order of magnitude less network messaging than disk-based remote mirroring
  Enables zero data loss at hundreds of miles
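Synchronous redo shipping for zero data loss could be configured on the primary roughly as below. This is a sketch under several assumptions: 11g attribute syntax, standby redo logs and LOG_ARCHIVE_CONFIG already in place, and hypothetical names for the net service (standby_tns) and standby DB_UNIQUE_NAME (standby_db). The NET_TIMEOUT value is an arbitrary example.

```sql
-- Ship redo synchronously and wait for the standby to acknowledge
-- that it has been written to disk (SYNC AFFIRM).
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_tns SYNC AFFIRM NET_TIMEOUT=30 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby_db'
  SCOPE = BOTH;

-- Zero data loss without stalling the primary if the standby
-- becomes unreachable.
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;

-- Check the configured mode and the level currently achieved.
SELECT protection_mode, protection_level FROM v$database;
```

Network round-trip time is added to every commit with SYNC transport, which is why distance determines where zero data loss stays practical.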

Enhanced Automatic Failover


Fast-Start Failover supports ASYNC configurations
  Automatic failover to a standby located 1,000s of miles away
  Configurable maximum data loss
Immediate failover for user-configurable health conditions
  ENABLE FAST_START FAILOVER [CONDITION <value>];
  Examples: datafile offline, corrupted controlfile, any explicit ORA-xyz error (e.g. ORA-1578) ...
Apps can request fast-start failover
  DBMS_DG.INITIATE_FS_FAILOVER
Integrated with Oracle Cold Cluster Failover
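An application-initiated failover request through DBMS_DG.INITIATE_FS_FAILOVER might look like this PL/SQL sketch. The condition string is a hypothetical example, and treating a return value of zero as success is an assumption based on the package's documented behavior of returning an ORA error number. Verify both against the release in use.

```sql
-- Run in SQL*Plus with SET SERVEROUTPUT ON to see the messages.
DECLARE
  status BINARY_INTEGER;
BEGIN
  -- The condition string is free text passed along with the request;
  -- this wording is hypothetical.
  status := DBMS_DG.INITIATE_FS_FAILOVER(
              'Application detected an unrecoverable condition');

  -- Assumption: 0 means the request was accepted; any other value
  -- is an ORA error number explaining why it was not.
  IF status = 0 THEN
    DBMS_OUTPUT.PUT_LINE('Fast-start failover requested');
  ELSE
    DBMS_OUTPUT.PUT_LINE('Request rejected, ORA-' || status);
  END IF;
END;
/
```

The call only succeeds when fast-start failover is enabled and an observer is running, so applications should handle rejection rather than assume a failover will follow.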

Active Data Guard 11g


Real-time Query
[Diagram: Production Database; Continuous Redo Shipment and Apply; Physical Standby Database serving real-time queries]

Offload read-only queries to physical standby
Read-only scalability with standby reader farm or RAC standby
Offload fast incremental backups to physical standby
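Enabling real-time query on an existing physical standby can be sketched as the sequence below, run on the standby. It assumes Oracle Database 11g with the Active Data Guard option licensed and standby redo logs configured.

```sql
-- On the physical standby: stop Redo Apply, open read-only,
-- then restart Redo Apply while the database stays open.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;

-- Confirm the standby is open for queries.
SELECT open_mode FROM v$database;
```

Read-only sessions can then connect to the standby while it continues to apply redo from the primary.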

Snapshot Standby
Use Standby Database for Testing
[Diagram: Primary Database (updates, queries); Physical Standby Database converted to Snapshot Standby Database (queries, updates)]

Preserves zero data loss - continuous redo transport while open read-write
Similar to storage snapshots, but provides continuous DR using same storage
Can also be done using Data Guard 10g Release 2 but more manual steps
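The round trip between physical standby and snapshot standby can be sketched as follows for Oracle Database 11g. It assumes a Flash Recovery Area is configured on the standby. Instance restarts between steps are indicated in comments rather than shown.

```sql
-- On the physical standby (mounted): stop Redo Apply and convert.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
ALTER DATABASE OPEN;

-- ... run read-write tests; redo from the primary is still received
--     and archived, but not applied ...

-- To return to the standby role: restart the instance in MOUNT mode,
-- then discard the test changes and convert back.
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

-- Restart in MOUNT mode once more, then resume Redo Apply so the
-- standby catches up on the redo received during testing.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```

The longer the standby stays in snapshot mode, the more redo it must apply afterwards, which lengthens failover time in the meantime.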

MAA Best Practices - Data Guard

Configure for Oracle Database 10g
  http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#i1007026
Set DB_ULTRA_SAFE for data corruption protection (11g)
  http://download.oracle.com/docs/cd/B28359_01/server.111/b28281/hafeatures.htm#sthref84
Optimize transport, apply, and role transitions
  http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
  Data Guard Redo Transport & Network Configuration
  Data Guard Redo Apply & Media Recovery (physical standby)
  Data Guard SQL Apply (logical standby)
  Data Guard Switchover and Failover
  Data Guard Fast-Start Failover (automatic failover)
  Data Guard Client Failover for Highly Available Oracle Databases

Multi-Master Replication
Oracle Streams
[Diagram: Source Database redo logs; Capture; Propagate; Apply1 and Apply2 on Target Database; Transparent Gateway to Non-Oracle Database]

All sites active and updatable
Flexible configurations - n-way, hub & spoke, ...
Database platform / release / schema structure can differ
HA for custom apps where update conflicts can be avoided or managed


MAA Planned Maintenance Solutions


Activity: Add and remove processors and nodes
Oracle Solution: Dynamic Resource Management
Downtime: Zero

Activity: Grow and shrink memory
Oracle Solution: Automatic Shared Memory Management
Downtime: Zero

Activity: Add and remove disks; migrate to new storage; rebalance I/O; move data files; rolling upgrade
Oracle Solution: Automatic Storage Management (ASM)
Downtime: Zero

Activity: Diagnostic patches; some one-off patches
Oracle Solution: Online Patching
Downtime: Zero

MAA Planned Maintenance Solutions


Activity: System and hardware upgrades; operating system upgrades; qualified one-off patches, CPUs; CRS upgrades
Oracle Solution: Real Application Clusters (RAC), Oracle Clusterware
Downtime: Zero (< 1 min)

Activity: System, HW and cluster upgrades; migration to ASM, RAC; migration to some different platforms (Windows/Linux and some mixed DG support); patchset or database upgrade; testing for development, Q/A, upgrades
Oracle Solution: Data Guard (physical, logical, snapshot standby databases)
Downtime: < 1 min

MAA Planned Maintenance Solutions


Activity: Database upgrades; cross platform migration; cross characterset migrations; application upgrades
Oracle Solution: Oracle Streams
Downtime: < 1 minute

Activity: Database upgrades; same endianness and cross endianness platform migration
Oracle Solution: Transportable Technologies
Downtime: dependent on data file conversion time

MAA Planned Maintenance Solutions

Activity:
  Reorganize and redefine tables and their attributes
  Add, delete or change column names, types and sizes
  Create, rebuild, coalesce, move and analyze indexes
  Convert LONG and LONG RAW columns to LOB
  Change table without recompilation
  Reorganize single partition, advanced queue and clustered tables, table containing ADT
Oracle Solution: Online Redefinition
Downtime: secs

Online Reconfiguration
Scaling on Demand
CPU
  Add/remove CPUs on SMP online
Cluster Nodes
  Add/remove RAC nodes online
  Add/remove instances
  Add/remove listeners
  Add/remove services
  No data movement needed
Memory
  Grow and shrink shared memory and buffer cache online
  Auto tuning of memory online
Disk
  Add/remove ASM disks online
  Automatically rebalance

Online Patching - One-off Patches


Ability to patch running Oracle executable
  No downtime. Online patching is done on the instance level.
  No need to do rolling upgrades using RAC / Data Guard
  Many one-off patches can be patched online
  Great for diagnostic patches
Supports enabling, disabling, de-installing patches with no downtime
Integrated with OPatch
  E.g. determine if a patch can be applied online: opatch query -is_online
Initially available on Linux (32-bit) and Solaris (64-bit)
Long term goal is online patching of Critical Patch Updates (CPUs)
Refer to OpenWorld - MAA Best Practices for Online Patching
  http://www.oracle.com/technology/deploy/availability/pdf/oracle-openworld2007/s291525_maa_plannedmaint.pdf

Rolling Patch Update using CRS and RAC


[Diagram: four-step rolling patch across RAC nodes A and B]
1. Initial RAC Configuration
2. Clients on A, Patch B
3. Clients on B, Patch A
4. Upgrade Complete

Oracle Patch Upgrades, including Critical Patch Updates (CPUs)
Operating System Upgrades
Hardware Upgrades

Service Level Impacts


[Chart: CRS Rolling Patchset; SLO]

SQL Apply Rolling Database Upgrades


[Diagram: databases A and B moving from Version X to X+1; clients, redo, logs queue, upgrade]
1. Initial SQL Apply Config
2. Upgrade node B to X+1
3. Run in mixed mode to test
4. Switchover to B, upgrade A

Patch Set Upgrades
Major Release Upgrades
Cluster Software & Hardware Upgrades

Rolling Database Upgrades


Transient Logical Standby
[Diagram: Physical; Logical; Upgrade; Physical]
Leverage existing physical standby databases
Start rolling database upgrades with physical standbys
Temporarily convert physical standby to logical to perform the upgrade
  Data type restrictions limited to shorter upgrade window
No need for separate logical standby
Also possible in 10.2 (more manual steps)
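The key statements of a transient logical standby upgrade can be outlined as below. This is a partial sketch, not a complete runbook. The restore point name is hypothetical, Flashback Database is assumed to be enabled on the primary, and the upgrade, SQL Apply, and switchover steps appear only as comments.

```sql
-- 1. On the primary: mark the pre-upgrade state.
--    "pre_upgrade" is a hypothetical name.
CREATE RESTORE POINT pre_upgrade GUARANTEE FLASHBACK DATABASE;

-- 2. On the physical standby: stop Redo Apply.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- 3. On the primary: build the LogMiner dictionary into the redo stream.
BEGIN
  DBMS_LOGSTDBY.BUILD;
END;
/

-- 4. On the standby: convert to a logical standby, keeping its
--    database name and DBID so it can become a physical standby again.
ALTER DATABASE RECOVER TO LOGICAL STANDBY KEEP IDENTITY;

-- 5. Open the standby, start SQL Apply, upgrade it, let it catch up,
--    then switch over to it (steps not shown).

-- 6. On the original primary (mounted): rewind and convert.
FLASHBACK DATABASE TO RESTORE POINT pre_upgrade;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

-- 7. Restart the original primary in MOUNT mode from the upgraded
--    Oracle home and start Redo Apply; applying the redo of the
--    upgrade brings it to the new release.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;
```

Because the logical standby role lasts only for the upgrade window, SQL Apply data type restrictions matter only for changes made during that window.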

Rolling Database Upgrades


Streams
Rolling upgrade with Streams if:
  Heterogeneous platforms
  Different charactersets
  Database rolling upgrade when logical standby is not appropriate
  Application upgrades
Use shadow tables and transformations to work around data type restrictions

Streams Rolling Upgrade


Extended Data Type Support
insert into EMP values (1001, 'Smith', 'Sales', 42, sysdate, 30000, 10, 19);

[Diagram: Source Database; Capture; Propagate; Apply; Upgraded Target Database; Data Guard; Physical Standby Database. EMP is captured and applied directly. CUST changes pass through a trigger and a CUST log table on the source and a DML handler on the target.]

insert into CUST values (123, 'Acme Corp', address_typ('123 Any St', 'New York', 'NY', 10001));

Online Redefinition
All indexing operations can be done online
  Create new index, move index, defragment index
Tables can be Reorganized & Redefined online (DBMS_REDEFINITION)
  Table contents are copied to a new table
  Defragments and allows changing location, table type, partitioning
  Contents can be transformed as they are copied
  Can change columns, types, sizes - specified using SQL Select
Updates and Queries can continue uninterrupted
GUI interface to make it simple
[Diagram: Source Table; Copy Table; Transform; Result Table. Continuous Queries & Updates; Update Tracking; Store Updates; Transform Updates]
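The copy, transform, and swap flow described above maps onto a handful of DBMS_REDEFINITION calls. The sketch below assumes a hypothetical HR.EMP table with a primary key on EMPNO and columns ENAME and SAL. The interim table layout and the UPPER() transformation are illustrative only.

```sql
-- Interim table with the desired new structure (hypothetical columns).
CREATE TABLE hr.emp_interim (
  empno NUMBER,
  ename VARCHAR2(100),
  sal   NUMBER(10,2)
);

DECLARE
  num_errors PLS_INTEGER;
BEGIN
  -- Raises an error if the table cannot be redefined by primary key.
  DBMS_REDEFINITION.CAN_REDEF_TABLE('HR', 'EMP',
                                    DBMS_REDEFINITION.CONS_USE_PK);

  -- Start the copy. col_mapping is a SQL select list, so contents can
  -- be transformed on the way across.
  DBMS_REDEFINITION.START_REDEF_TABLE(
    uname        => 'HR',
    orig_table   => 'EMP',
    int_table    => 'EMP_INTERIM',
    col_mapping  => 'empno empno, UPPER(ename) ename, sal sal',
    options_flag => DBMS_REDEFINITION.CONS_USE_PK);

  -- Clone indexes, triggers, constraints, and grants onto the interim table.
  DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS(
    uname            => 'HR',
    orig_table       => 'EMP',
    int_table        => 'EMP_INTERIM',
    copy_indexes     => DBMS_REDEFINITION.CONS_ORIG_PARAMS,
    copy_triggers    => TRUE,
    copy_constraints => TRUE,
    copy_privileges  => TRUE,
    ignore_errors    => TRUE,
    num_errors       => num_errors);
  DBMS_OUTPUT.PUT_LINE('Dependent object errors: ' || num_errors);

  -- Apply the updates tracked while the copy was running, then swap.
  DBMS_REDEFINITION.SYNC_INTERIM_TABLE('HR', 'EMP', 'EMP_INTERIM');
  DBMS_REDEFINITION.FINISH_REDEF_TABLE('HR', 'EMP', 'EMP_INTERIM');
END;
/
```

Only FINISH_REDEF_TABLE takes a brief exclusive lock. If something goes wrong before that point, DBMS_REDEFINITION.ABORT_REDEF_TABLE cancels the operation and leaves the original table as it was.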

Online Operations & Redefinition Improvements


Fast add column with default value
Invisible indexes speed application migration and testing
No recompilation of dependent objects when Online Redefinition does not logically affect objects
Support Online Redefinition for tables with Materialized Views
Enhanced Online DDL execution
  DDL operations now wait if underlying resource is busy (configured through DDL_LOCK_TIMEOUT parameter)
  Some DDL operations (add/modify constraint, add column, index create/rebuild) only require shared lock
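Three of these 11g improvements can be tried with a few statements. The emp table, the status column, the index name, and the 30-second timeout are all hypothetical.

```sql
-- Let DDL wait up to 30 seconds for a busy resource instead of
-- failing immediately.
ALTER SESSION SET ddl_lock_timeout = 30;

-- Fast add column: with a NOT NULL default, the default is recorded
-- in the dictionary rather than written into every existing row.
ALTER TABLE emp ADD (status VARCHAR2(10) DEFAULT 'ACTIVE' NOT NULL);

-- Create the index invisible so the optimizer ignores it by default.
CREATE INDEX emp_status_ix ON emp (status) INVISIBLE;

-- Test it in one session only.
ALTER SESSION SET optimizer_use_invisible_indexes = TRUE;

-- Expose it to all sessions once the plans look right.
ALTER INDEX emp_status_ix VISIBLE;
```

An invisible index is still maintained on every DML statement, so it carries its full write cost while being evaluated.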


Resources
MAA Demonstrations
http://www.oracle.com/technology/deploy/availability/demonstrations.html

MAA Best Practices for High Availability 10gR2


http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/toc.htm

MAA Overview 11gR2 (detailed best practices to follow)


http://download.oracle.com/docs/cd/B28359_01/server.111/b28281.pdf

MAA Best Practice White Papers


www.oracle.com/technology/deploy/availability/htdocs/maa.htm

Oracle High Availability


www.oracle.com/ha


QUESTIONS ANSWERS
