
www.brianhitchcock.net

Session id: 31885


Dedicated to Pramitha Chowrira, the Goddess of the Rockies, Mike Waldron, the Student who became the Master, Sheryl Driscoll, who leads purely for the glory and Ann Bischoff, who took me in trade for a developer...

How STATSPACK Was Used to Solve Common Performance Issues


Brian Hitchcock
OCP 8, 8i, 9i DBA
Sun Microsystems
brian.hitchcock@sun.com
www.brianhitchcock.net

What STATSPACK Is
Set of SQL and PL/SQL
Collects performance data from v$ tables
Stores collected data in separate tables
Each collection of data is a snapshot
Reports deltas in data between snapshots
Supports ad-hoc SQL queries of the snapshot data

STATSPACK Details
Works for 8.1.7 onwards
Gathers data for a single instance
Snapshot levels
    Determine how much data is collected
    Defaults are fine
Snapshot interval of 15 minutes suggested
    Long report periods miss transient events
Reports over an instance restart are not valid

STATSPACK -- Good
Free (very cool!)
Gathers a wide range of data
    You don't know what you're looking for at first
    Root cause isn't usually obvious
Standard process to collect performance data
    Gathers the same data on all instances
    Easy to share with vendors, support groups

STATSPACK -- Not Perfect


Gathers a wide range of data
    Ocean of data
    Any information?
    Easy to get lost
Does not tell you what the problem is
    Shows what is happening in the instance
    You need to figure out if this is a problem or not
Does not tell you the solution
Does not tell you that you are done tuning...

How to Interpret Output?


Requires experience with your system
    No single way to analyze output
    Must have history of your system
    Look for possible problem areas
    Trial and error to change problem behavior
Only you can tell if you have a performance problem

STATSPACK Report Sections


Instance Summary, Efficiency
Top 5 Wait Events, Wait Events
SQL Ordered by Gets, Reads, Executions
Instance Activity Stats
Tablespace IO Stats Ordered by IOs, Tblspc-file
Buffer Pool Statistics
Rollback Segment Stats, Storage
Latch Activity, Sleep, Miss Sources
Dictionary Cache Stats
Library Cache Activity
SGA Memory Summary, Breakdown
init.ora Parameters

Documentation of output
No comprehensive documentation
Oracle 8i Reference
    Appendix A -- Wait Events defined
    Appendix B -- Enqueue Names defined
    Appendix C -- Statistics Descriptions
Database Performance Guide and Ref 9.0.1
    Chapters 21-23, Supplied packages, how to use
$ORACLE_HOME/rdbms/admin/spdoc.txt
ORACLE High-Performance Tuning with STATSPACK, Donald K. Burleson, Oracle Press, ISBN 0-07-213378-3
No explanation of
    What output means for your system

Configuration Used
Oracle 8.1.7.2
Snapshots every 15 minutes
    Snapshots taken continuously
Default STATSPACK snapshot level
Application loads and analyzes web site click stream data
    Lots of data
    More data all the time
    We don't know what vendor code looks like

Actual Use
4 performance issues in 2002
Case 1) Reports Running Slow
    STATSPACK output didn't show the problem
Case 2) Vendor Demo Slow
    STATSPACK output didn't show the problem
Case 3) Data Load Slow
    STATSPACK output led to 18x speedup (1800%)
Case 4) Data Load Time Varies
    STATSPACK output led to the root cause

Case 1) Reports Running Slow


Vendor code allows users to set up reports
    Vendor code generates SQL for report
    Long run interferes with next day's data load
STATSPACK captures SQL
    Generate explain plan(s)
    Report SQL doesn't generate where clause properly to use partition pruning
Vendor refuses to change their code
We simply removed the reports
    Performance issue resolved

Case 2) Vendor Demo Slow


New vendor sets up demo, data load slow
    Due to issues like Case 1)
Data load runs twice as fast at vendor
STATSPACK output doesn't show anything obvious
Compare configuration of vendor and our dbs
Vendor has only one redo log file per group
We had two redo log files per group
We drop one file per group, performance issue resolved

Case 3) Data Load Slow


First time loading new type of web log data
    No baseline to compare with
    Classical performance tuning doesn't always apply to the real world
Data load so slow no time for daily reporting
    Must run faster or the data won't be loaded
    We don't know if this load will run faster
Do we have a performance issue?
    Yes, data load must run faster to be useful
    No, perhaps this is as fast as it can be...

Case 3) Data Load Slow


SQL -- Highest Gets per Exec
SQL ordered by Gets for DB: BHDATA04  Instance: BHDATA04  Snaps: 4426 -4427
-> End Buffer Gets Threshold:   10000
-> Note that resources reported for PL/SQL includes the resources used by
   all SQL statements called within the PL/SQL code.  As individual SQL
   statements are also reported, it is possible and valid for the summed
   total % to exceed 100
    Buffer Gets   Executions  Gets per Exec % Total Hash Value
--------------- ------------ -------------- ------- ----------
        634,304           68        9,328.0    97.5 3948948039
SELECT t526.keyvalueid FROM bh_lqueryvalue t526 WHERE t526.queryinfoid = :ph0 ORDER BY t526.keyvalueid ASC

Bad SQL?
SQL shouldn't cost much
    Select looking for one row
    Table has two indexes
    Explain plan shows index full scan
    Should show index range scan
    Explain plan with hint to force one index
    Verify cost of each index
    Optimizer is choosing wrong index!
        Indexes added by vendor to be safe...
Drop the costly index!

Explain Plan Force Index1


Cost 4
SQL> truncate table plan_table;
Table truncated.
SQL> explain plan set Statement_Id = 'TEST' for
SELECT /*+ INDEX(t526 X_LQRYVL_QUERYIDKYVL) */ t526.keyvalueid
FROM bh_lqueryvalue t526
WHERE t526.queryinfoid = 100
ORDER BY t526.keyvalueid ASC;
Explained.
SQL> @$ORACLE_HOME/rdbms/admin/utlxpls
Plan Table
--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     3 |   24 |      1 |       |       |
|  INDEX RANGE SCAN         |X_LQRYVL_ |     3 |   24 |      3 |       |       |
--------------------------------------------------------------------------------

Explain Plan Force Index2


Cost 11,667
SQL> explain plan set Statement_Id = 'TEST' for
SELECT /*+ INDEX(t526 X_LQYVAL_KYVLQRYID) */ t526.keyvalueid
FROM bh_lqueryvalue t526
WHERE t526.queryinfoid = 100
ORDER BY t526.keyvalueid ASC;
Explained.
SQL> @$ORACLE_HOME/rdbms/admin/utlxpls
Plan Table
--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     3 |   24 |   2334 |       |       |
|  INDEX FULL SCAN          |X_LQYVAL_ |     3 |   24 |   9333 |       |       |
--------------------------------------------------------------------------------
SQL>

Solution
After dropping costly index
    Data load time was 18 hours, became 1 hour
    18:1 improvement (1800%)
Why did optimizer choose wrong index?
    No idea, Oracle requested running 18 hour data load to gather instance data
    Business users said NO!
    Indexes created by vendor, no need for both indexes
Know when to quit tuning!

What About Wait Events?


Popular DBAs
    Wait Events are all that matter
I want (desperately) to be popular too
    Return to Case 3)
    Examine Top 5 Wait Events section
    Try to understand what is causing the wait time

Case 3) Wait Events


Top 5 Wait Events
~~~~~~~~~~~~~~~~~                                                 Wait % Total
Event                                               Waits    Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
PX Deq: Execution Msg                                 335        1,156   45.30
latch free                                          1,740          522   20.45
control file parallel write                            88          329   12.89
db file sequential read                               253          323   12.66
log file parallel write                                 2           42    1.65
PX -- parallel query issues?
    Contact Oracle Tech Support

Ask the Experts


Oracle Tech Support
    Many wait events in STATSPACK output should be ignored (a bug perhaps? Or an RFE?)
    Requests the full STATSPACK report
    Tells me that the latch free wait event must be addressed
        set session_cached_cursors = 100
        Performance improvement will be significant
Data load now takes 19 hours (about 10% worse)

What Happened?
How could the experts miss the bad SQL?
Wait events are important
    If the total wait time is the largest problem
In this case
    Bad SQL dominated the overall run time
Wait event analysis
    Many events should be ignored
    Need to determine how much of total run time is due to wait events only
Go back and fix the SQL issue

Total CPU Time


From Instance Activity Stats
Instance Activity Stats for DB: BHDATA04  Instance: BHDATA04  Snaps: 4426 -4427
Statistic                                    Total   per Second    per Trans
--------------------------------- ---------------- ------------ ------------
CPU used by this session                    27,441         99.4     27,441.0
CPU used when call started                  27,475         99.6     27,475.0
Total CPU time is 27475, in units of 10 milliseconds
    10 milliseconds is 1 centisecond (cs) = 0.01 sec
    27475 units --> 27475 cs = 274.75 seconds
    Report interval was 4.60 minutes (276 seconds)
Confused? How many cs left until Happy Hour?
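The unit conversion on this slide can be checked with a few lines of Python. This is only a sketch: the helper name `cs_to_seconds` and the constant names are made up for illustration, and the two input values are copied from the Instance Activity Stats excerpt and the stated report interval.

```python
# Values copied from the slide; names are illustrative, not from STATSPACK.
CPU_USED_CS = 27_475            # "CPU used when call started", in centiseconds
REPORT_INTERVAL_S = 4.60 * 60   # report interval of 4.60 minutes, in seconds


def cs_to_seconds(centiseconds):
    """Convert centiseconds to seconds (1 cs = 0.01 s)."""
    return centiseconds / 100.0


cpu_seconds = cs_to_seconds(CPU_USED_CS)
cpu_share = cpu_seconds / REPORT_INTERVAL_S

print(f"CPU time: {cpu_seconds:.2f} s")
print(f"CPU time as a share of the report interval: {cpu_share:.1%}")
```

The CPU time comes out at almost the full length of the report interval, which is the first hint that CPU rather than waiting dominated this load.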

Total Wait Time


Look at all Wait Events
Wait Events for DB: BHDATA04  Instance: BHDATA04  Snaps: 4426 -4427
-> cs - centisecond -  100th of a second
-> ms - millisecond - 1000th of a second
-> ordered by wait time desc, waits desc (idle events last)
                                                                    Avg
                                                      Total Wait   wait  Waits
Event                               Waits   Timeouts   Time (cs)   (ms)   /txn
---------------------------- ------------ ---------- ----------- ------ ------
PX Deq: Execution Msg                 335          0       1,156     35  335.0
latch free                          1,740      1,046         522      3 ######
control file parallel write            88          0         329     37   88.0
db file sequential read               253          0         323     13  253.0
log file parallel write                 2          0          42    210    2.0
enqueue                                61          0          40      7   61.0
refresh controlfile command           602          0          37      1  602.0
PX Deq: Msg Fragment                   64          0          29      5   64.0
PX Deq: Parse Reply                    59          0          26      4   59.0
control file sequential read        2,673          0          16      0 ######
log file sync                           1          0          15    150    1.0
PX Deq: Signal ACK                      2          2          11     55    2.0
PX Deq: Join ACK                       57          0           3      1   57.0
PX Deq: Execute Reply                   5          0           3      6    5.0
SQL*Net more data to client            37          0           0      0   37.0
file open                               6          0           0      0    6.0
db file parallel write                  4          0           0      0    4.0
PX Idle Wait                        4,316      4,287     877,818   2034 ######
SQL*Net message from client         2,213          0      27,615    125 ######
SQL*Net message to client           2,214          0           1      0 ######
-> Total Wait Time 907997 cs

Real Total Wait Time


Remove idle events
    MetaLink Note:191103.1 PQ Wait Events
    STATSPACK report should filter out idle events
    Database Performance Guide and Ref 9.0.1
        Explains more about this feature

Real Total Wait Time


Idle Events
Wait Events for DB: BHDATA04  Instance: BHDATA04  Snaps: 4426 -4427
-> cs - centisecond -  100th of a second
-> ms - millisecond - 1000th of a second
-> ordered by wait time desc, waits desc (idle events last)
                                                                    Avg
                                                      Total Wait   wait  Waits
Event                               Waits   Timeouts   Time (cs)   (ms)   /txn
---------------------------- ------------ ---------- ----------- ------ ------
PX Deq: Execution Msg                 335          0       1,156     35  335.0  <----- remove
latch free                          1,740      1,046         522      3 ######
control file parallel write            88          0         329     37   88.0
db file sequential read               253          0         323     13  253.0
log file parallel write                 2          0          42    210    2.0
enqueue                                61          0          40      7   61.0
refresh controlfile command           602          0          37      1  602.0
PX Deq: Msg Fragment                   64          0          29      5   64.0
PX Deq: Parse Reply                    59          0          26      4   59.0
control file sequential read        2,673          0          16      0 ######
log file sync                           1          0          15    150    1.0
PX Deq: Signal ACK                      2          2          11     55    2.0
PX Deq: Join ACK                       57          0           3      1   57.0
PX Deq: Execute Reply                   5          0           3      6    5.0
SQL*Net more data to client            37          0           0      0   37.0
file open                               6          0           0      0    6.0
db file parallel write                  4          0           0      0    4.0
PX Idle Wait                        4,316      4,287     877,818   2034 ######  <----- remove
SQL*Net message from client         2,213          0      27,615    125 ######  <----- remove
SQL*Net message to client           2,214          0           1      0 ######
-> Total Wait Time 1397 cs
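The filtering step can be sketched in Python. The wait times below are copied from the Wait Events listing; the set of events treated as idle is an assumption, chosen because removing exactly these three reproduces the 1397 cs total on the slide. It is not taken from an Oracle-supplied idle event list, and the function name is made up for illustration.

```python
# Total wait time per event in centiseconds, copied from the report listing.
WAIT_TIME_CS = {
    "PX Deq: Execution Msg": 1156,
    "latch free": 522,
    "control file parallel write": 329,
    "db file sequential read": 323,
    "log file parallel write": 42,
    "enqueue": 40,
    "refresh controlfile command": 37,
    "PX Deq: Msg Fragment": 29,
    "PX Deq: Parse Reply": 26,
    "control file sequential read": 16,
    "log file sync": 15,
    "PX Deq: Signal ACK": 11,
    "PX Deq: Join ACK": 3,
    "PX Deq: Execute Reply": 3,
    "SQL*Net more data to client": 0,
    "file open": 0,
    "db file parallel write": 0,
    "PX Idle Wait": 877818,
    "SQL*Net message from client": 27615,
    "SQL*Net message to client": 1,
}

# ASSUMPTION: events treated as idle for this example. The choice is inferred
# from the slide's 1397 cs figure, not from Oracle documentation.
IDLE_EVENTS = frozenset({
    "PX Deq: Execution Msg",
    "PX Idle Wait",
    "SQL*Net message from client",
})


def non_idle_wait_cs(wait_time_cs, idle_events):
    """Sum the wait time of every event that is not in the idle set."""
    return sum(cs for event, cs in wait_time_cs.items()
               if event not in idle_events)


real_wait_cs = non_idle_wait_cs(WAIT_TIME_CS, IDLE_EVENTS)
print(f"Non-idle wait time: {real_wait_cs} cs")
```

Keeping the idle list as a separate set makes it easy to rerun the sum with a different definition of "idle", which is the judgment call this slide is about.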

Total Response Time


Total CPU Time + Total Wait Time
    27475 cs + 1397 cs = 28872 cs
Total Wait Time
    1397/28872 = 0.05
    5% of Total Response Time
Wait Time was never an issue!
If you don't remove the idle events
    907997/(27475+907997) = 97%
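The two percentages on this slide follow from one small formula, wait / (CPU + wait). A minimal Python sketch, using the three centisecond totals quoted in the slides (the function and constant names are illustrative only):

```python
def wait_share(cpu_cs, wait_cs):
    """Fraction of total response time (CPU + wait) that is wait time."""
    return wait_cs / (cpu_cs + wait_cs)


CPU_CS = 27_475            # total CPU time, cs
NON_IDLE_WAIT_CS = 1_397   # total wait time with idle events removed, cs
ALL_WAIT_CS = 907_997      # total wait time with idle events included, cs

print(f"Idle events removed:  {wait_share(CPU_CS, NON_IDLE_WAIT_CS):.1%}")
print(f"Idle events included: {wait_share(CPU_CS, ALL_WAIT_CS):.1%}")
```

The same CPU figure gives two opposite conclusions depending only on whether idle events are counted, so the idle filter has to be settled before any wait event tuning is attempted.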

Bad SQL Rules


Slow data load time
    Time due to Bad SQL
    Time due to All Others
        Including Wait Events
[Pie chart: Bad SQL vs. All Others]

Case 4) Data Load Time Varies


Data Load Time
    Normal 6.5 hours, Long 16 hours
    Varies randomly, no pattern
Generate STATSPACK report
    Normal, Long
    Compare reports
    Look for differences between reports
Tablespace IO Stats Section
    Normal -- 11 tablespaces accessed
    Long -- 74 tablespaces accessed

Problem and Solution


Vendor data load shouldn't touch all tables
What process would access all tables?
Production db supported by another group
    We aren't allowed to connect as oracle
    Can't see what they might be running (cron?)
Turns out
    Production DBAs decided we needed full exports
    We weren't notified
Stop the exports, performance issue goes away!

What About Cache Hit Rates?


Back to the subject of experts
    Remember when it was cool to discuss hit rates?
For Case 4)
    Compute buffer cache hit ratio from tables
    Tables larger than physical memory
    Can't have all pages in memory at once
    Buffer cache hit ratio won't be 100%
Even if we had 100%
    Bad SQL (index) was the real problem
    Buffer cache hit ratio wasn't relevant

Select Buffer Cache Hit Ratio


Case 4)
[Chart: Data Load without Exports running. Y axis: Hit Ratio (0 to 1.2). X axis: Snapshot Id]

Total Wait Time?


For Case 4), 15 minute report interval
    Wait Time is 28% of total time
Instance Activity Stats for DB: BHDATA01  Instance: BHDATA01  Snaps: 13817 -1381
Statistic                                    Total   per Second    per Trans
--------------------------------- ---------------- ------------ ------------
CPU used by this session                 1,422,921      1,556.8        154.4
CPU used when call started               2,237,915      2,448.5        242.8  <----
Top 5 Wait Events
~~~~~~~~~~~~~~~~~                                                 Wait % Total
Event                                               Waits    Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
PX Deq: Table Q Normal                            495,566    1,017,115   30.12  <-- remove
slave wait                                         25,580      744,985   22.06
PX Deq Credit: send blkd                          406,262      678,556   20.10  <-- remove
PX Deq: Execution Msg                              33,009      438,309   12.98  <-- remove
latch free                                         42,575      134,840    3.99
Wait time left after removing idle events: 879825 cs
Total time = 2237915 + 879825 = 3117740
Wait time is 879825/3117740 = 28%

Review
For the 4 issues we had, STATSPACK output
    Was useful for all 4
        Provided standard set of data for all involved
    Fixed 2 issues
        Provided the data that led to the root cause
        Verified that the fix was working
Performance improvements were substantial
    Tuning process much faster with STATSPACK
    Same process worked for all 4 issues
    Decided not to look for further improvements
        Wait Time analysis might be useful...

oraperf.com Analyzer
Website oraperf.com
    Submit STATSPACK report
    Analyzer reviews report
    Generates detailed analysis
        CPU time
        Wait time
    Gives specific advice
Not perfect, but it is fast and free!
    Has same issues with idle wait events as STATSPACK report

oraperf.com
Who or what is oraperf.com?
    From the website...
Oraperf.com is run by Anjo Kolk. Anjo has worked for over 16 years at Oracle (1985-2001). While at Oracle he worked in different countries and different departments.
Many people generate utlbstat/utlestats and statspack reports, but don't know how to interpret the data. People that do look at these reports are also mostly looking at the wrong information and end up making the wrong tuning decisions. That is why the reports are analyzed based on the YAPP method. The YAPP method will show what component of the total response time should be tuned first.
YAPP-Method -- Yet Another Performance Profiling Method
http://oraperf.com/download/yapp_anjo_kolk.pdf

oraperf.com -- Case 3)
Upload report from slow data load
Analyzer shows
    Response time
        91.63% CPU Time
        8.37% Wait Time
    Advice?
        Reduce the number of buffer gets or executions
Wait time
    Matters only as a % of total response time

oraperf.com -- Case 3)
Upload report from fast data load
Analyzer shows
    Response time
        5.16% CPU Time
        94.84% Wait Time
    Advice?
        Tune PX Deq: Execution Msg event - But this is an idle event...
Non-idle wait time is only about 25% of total time

oraperf.com -- Case 3)
Conclusion
    oraperf.com analyzer provides another tool for performance tuning
    Well worth using if only to compute
        Response Time
        CPU Time
        Wait Time - Check for idle wait events...

No Excuses
Install STATSPACK
Generate two snapshots
Generate standard report
Upload to oraperf.com
Review advice
Fast, free performance analysis!

Installing STATSPACK
Create separate tablespace
Create PERFSTAT user
Execute SQL script to create tables
Set up job to execute snapshots
Set up process to purge data over time
Set timed_statistics = TRUE
    Not required, but needed to get wait time data

Installing STATSPACK
As user SYS
create tablespace perfstat datafile '/xxx/xxx/perfstat_01.dbf' size 500M;
cd $ORACLE_HOME/rdbms/admin
sqlplus sys @spcreate.sql
Enter value for default_tablespace: perfstat
Enter value for temporary_tablespace: temp

Generate Standard Report


Report SQL supplied by Oracle
sqlplus perfstat/perfstat@<SID>
execute statspack.snap
execute statspack.snap
sqlplus perfstat/perfstat@<SID> @$ORACLE_HOME/rdbms/admin/spreport.sql
Enter value for begin_snap: 1
Enter value for end_snap: 2
Enter value for report_name: testing

Select STATSPACK Data


Query the tables directly
select to_char(snap_time,'yyyy-mm-dd HH24') mydate,
       new.name buffer_pool_name,
       (((new.consistent_gets-old.consistent_gets)+
         (new.db_block_gets-old.db_block_gets))-
        (new.physical_reads-old.physical_reads))
       /
       ((new.consistent_gets-old.consistent_gets)+
        (new.db_block_gets-old.db_block_gets)) bhr
from perfstat.stats$buffer_pool_statistics old,
     perfstat.stats$buffer_pool_statistics new,
     perfstat.stats$snapshot sn
where new.snap_id > 13125
and new.snap_id < 13149
and new.name = old.name
and new.snap_id = sn.snap_id
and old.snap_id = sn.snap_id-1;
Based on SQL from ORACLE High-Performance Tuning with STATSPACK, Donald K. Burleson, Oracle Press, ISBN 0-07-213378-3
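The arithmetic inside the query is easier to follow outside SQL. The Python sketch below applies the same formula to a pair of consecutive snapshots: logical reads are the deltas of consistent gets plus db block gets, and the hit ratio is (logical - physical) / logical. The counter values in the example are invented for illustration, and the guard against a zero denominator is an addition that the query above does not have.

```python
def buffer_hit_ratio(old, new):
    """Buffer cache hit ratio between two snapshots of cumulative counters.

    `old` and `new` are dicts holding the same three columns the query reads
    from stats$buffer_pool_statistics.
    """
    logical_reads = ((new["consistent_gets"] - old["consistent_gets"])
                     + (new["db_block_gets"] - old["db_block_gets"]))
    physical_reads = new["physical_reads"] - old["physical_reads"]
    if logical_reads == 0:
        # No buffer activity in the interval: the ratio is undefined.
        return None
    return (logical_reads - physical_reads) / logical_reads


# Hypothetical counter values, not taken from any real snapshot.
snap_old = {"consistent_gets": 1000, "db_block_gets": 200, "physical_reads": 100}
snap_new = {"consistent_gets": 1800, "db_block_gets": 400, "physical_reads": 350}

bhr = buffer_hit_ratio(snap_old, snap_new)
print(f"Buffer cache hit ratio: {bhr:.2f}")
```

Because the counters are cumulative since instance startup, only the difference between two snapshots is meaningful, which is also why a pair of snapshots that spans an instance restart gives invalid results.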

Buffer Cache Hit Ratio Case 4)


Output of SQL on previous slide
yr. mo dy Hr  BUFFER_POOL_NAME       BHR
------------- -------------------- -----
2002-09-17 02 DEFAULT               1.00
2002-09-17 02 DEFAULT               1.00
2002-09-17 02 DEFAULT                .94
2002-09-17 02 DEFAULT                .87
2002-09-17 03 DEFAULT                .90
2002-09-17 03 DEFAULT                .93
2002-09-17 03 DEFAULT                .97
2002-09-17 03 DEFAULT                .82
2002-09-17 04 DEFAULT                .78
2002-09-17 04 DEFAULT                .78
2002-09-17 04 DEFAULT                .76
2002-09-17 04 DEFAULT                .80
2002-09-17 05 DEFAULT                .69
2002-09-17 05 DEFAULT                .73
2002-09-17 05 DEFAULT                .76
2002-09-17 05 DEFAULT                .69
2002-09-17 06 DEFAULT                .81
2002-09-17 06 DEFAULT               1.00
2002-09-17 06 DEFAULT               1.00
2002-09-17 06 DEFAULT               1.00
2002-09-17 07 DEFAULT               1.00
2002-09-17 07 DEFAULT               1.00
2002-09-17 07 DEFAULT               1.00

Space Used
Snapshot size varies with
    Number of tablespaces
    Number of SQL statements captured
Db1 21 tablespaces --> 0.15 MB/snapshot
Db2 376 tablespaces --> 0.37 MB/snapshot
Assuming a snapshot every 15 minutes
    96 snapshots per day
    Db1 --> 14.4 MB/day
    Db2 --> 35.6 MB/day
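The daily growth figures are the per-snapshot size multiplied by the number of snapshots per day. A small Python sketch of that estimate, with illustrative names, is shown here. Note that multiplying the rounded 0.37 figure by 96 gives about 35.5 rather than the 35.6 on the slide; presumably the slide used an unrounded per-snapshot size, but that is an assumption.

```python
SNAPSHOT_INTERVAL_MIN = 15
snapshots_per_day = 24 * 60 // SNAPSHOT_INTERVAL_MIN


def mb_per_day(mb_per_snapshot):
    """Estimated PERFSTAT growth per day at the given per-snapshot size."""
    return mb_per_snapshot * snapshots_per_day


# Per-snapshot sizes copied from the slide.
for name, size in (("Db1", 0.15), ("Db2", 0.37)):
    print(f"{name}: {mb_per_day(size):.1f} MB/day")
```

Multiplying the daily figure by the number of days of history to keep gives a rough size for the perfstat tablespace and shows how often old snapshots need to be purged.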

Removing Snapshot Data


Oracle supplied SQL
    SQL removes snapshot data for a range of snapshot id numbers
Example
sqlplus perfstat/perfstat@<SID> @$ORACLE_HOME/rdbms/admin/sppurge
...
... (listing of all existing snapshots)
...
Specify the Lo Snap Id and Hi Snap Id range to purge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for losnapid: 4001
Using 4001 for lower bound.
Enter value for hisnapid: 5000
Using 5000 for upper bound.
Deleting snapshots 4001 - 5000.
commit;
Note: large deletes may fill rollback segments

Summary
STATSPACK
    Free, easy to install, easy to run
    Output can be very useful or confusing
    Real-world use has resulted in big performance gains
    Useful for all instances
    Standard way to gather performance data

Reminder: please complete the OracleWorld session survey


Thank you.
