Explain Plan for this session The Problem Good performance habits The approach Define the problem Instance Tuning: Examine OS Instance Tuning: Examine Oracle Stats Instance Tuning: Implement and measure changes Using SQL Trace and TKPROF SQL Trace File Example Formatting Trace Files with TKPROF TKPROF File Example
The Problem Perception Oracle Performance tuning is wizardy Why? Too many things to check.. Too many ratios to figure out.. Ratios should be > 90%, some even > 99% else performance is bad Any drop in ratios = performance loss Larger the Oracle caches = Better system performance Until 9i DBAs had a challenge to manually size SGA DBAs don't need to size SGA manually, we have AMM in 10g/11g Good Performance Habits Seek out the bottlenecks on your system Cure the disease not just the symptoms Tuning should be part of the lifecycle of an application. Most of the times tuning phase is left until the system is in production. At this time, tuning becomes a reactive fire- fighting exercise Tuning the resource hog on your system will provide you maximum benefits Set tuning goals, stop tuning when you achieve them
The Approach Define the problem Seek out the OS Bottlenecks Seek out the Oracle Bottlenecks Identify the real bottlenecks Make minimal changes as required Measure the effect of each change
Define the problem Get the business expectations (SLAs) for the program/job Identify Scope of problem. Whole instance? One program or specific operation? Identify time frame when problem occurs Quantify the slow down Identify any changes. Any OS change? Has more data been loaded to system or the data volume has grown? If symptoms can be identified as local to a program, focus on that program. Application tuning. Instance Tuning: Examine OS: Detecting CPU Bottlenecks CPU Bottlenecks (sar, top, prstat) sar u 5 5 essappu41:/apps/alice/aliceappl/fnd/11.5.0/resource> sar -o 5 5 SunOS essappu41 5.9 Generic_118558-20 sun4u 02/09/2007 06:44:47 %usr %sys %wio %idle 06:44:52 31 9 0 60
%usr This is the percentage of time that the processor is in user mode %sys This is the percentage of time that the processor is in system mode, servicing system calls. Users can cause this percentage to increase above normal levels by using system calls inefficiently %wio This is the percentage of time that the processor is waiting on completion of I/O, from disk, NFS. If the percentage is regularly high, check the I/O systems for inefficiencies %idle This is the percentage of time the processor is idle. If the percentage is high and the system is heavily loaded, there is probably a memory or an I/O problem. Instance Tuning: Examine OS: Detecting CPU Bottlenecks CPU Bottlenecks (sar, top, prstat) Example 1 essdbpu42 $sar -u 5 5 SunOS essdbpu42 5.9 Generic_112233-12 sun4u 11/19/2004 05:18:23 %usr %sys %wio %idle 05:18:28 4 3 0 93 05:18:33 9 2 0 89 05:18:38 11 5 0 84 05:18:44 2 3 0 94 05:18:49 5 8 0 87 Average 6 4 0 89
essdbpu42 $ Above output system load look likes Very good because 89 % CPU is idle Instance Tuning: Examine OS: Detecting CPU Bottlenecks
If the CPU is used by a small number of high usage programs then examine those programs If high usage programs are non-oracle, then determine If they require that amount of cpu. If so, then can their executions be delayed to off-peak hours If high CPU Usage programs are oracle processes identify high CPU usage oracle sessions, Drill down at session level to examine which sql is running and why is it taking high cpu. Trace the session if need be
Instance Tuning: Examine OS: Detecting I/O Bottlenecks I/O bottlenecks (sar, iostat) sar d 5 1 essappu41:/apps/alice/aliceappl/fnd/11.5.0/resource> sar -d 5 1
i/o requests should be evenly balanced high disk queue numbers + high service times I/O contention Check the wait events in v$system_event to see if the top wait events are i/o related
Instance Tuning: Examine OS: Detecting I/O Bottlenecks I/O bottlenecks (sar, iostat) iostat reports terminal and disk I/O activity Look for service time, disk utilization, queue length, iostat will show service times, busy rates and queue lengths by disk Important columns are svc_t, wait, %w and %b svc_t - service time in millisec.
Long service times >50 ms indicate too much head movement. Slow service times significantly impare performance of a disk. Wait - wait is the number of entries in the queue for that device waiting to be serviced %w - is the percent if time the queue is occupied. 100% would mean that the queue never got emptied during that time period %b - is the percent of time the device is busy: <35% is good > 60% is severerly bad and very little useful work is performed. Disk > 65% busy on a steady basis are severely overworked
In the above example essnasu32s0:/vol/dev2/alicedev8/db file system asvc_t is 55 ms and %b (busy) is 100 % that means system is running with poor IO performance.
If you see like this first we need to find out what are the instances are resides in that file system then what are the processes on that instances performing more IO activity
Instance Tuning: Examine OS: Detecting I/O Bottlenecks i/o wait events are db file sequential read, db file scattered read, db file single write and db file parallel write. These are all events corresponding to i/os performed against the data file headers, control files or data files
If any of these wait events correspond to high average time in AWR then investigate the problem
Check v$sqlarea and sort sqls by disk reads to find out oracle sessions doing physical reads
OEM will help in identifying such sql statements Instance Tuning: Examine OS: Detecting Memory Bottlenecks vmstat S 5 1
essappu41:/apps/alice/aliceappl/fnd/11.5.0/resource> vmstat -S 5 1 kthr memory page disk faults cpu r b w swap free si so pi po fr de sr m0 m1 m3 m4 in sy cs us sy id 0 0 0 53270408 12000224 0 0 148 65 62 0 0 0 0 0 0 3353 7206 4838 27 5 67
Do not go overboard on SGA sizing just to get high cache hit ratios
An example One interface program run was performing poor. To attain high library cache hit ratios, sga (shared pool) was increased to 2gb. On examination, severe library latch contention was found. Cause was lack of bind variable usage. On fixing the application, system performance came right back. Instance Tuning: Examine OS: Detecting Network Bottlenecks Look at the network round trip ping times. (Ping, Tracert on Windows, TraceRoute on Unix) C:\Documents and Settings\kmra>tracert essdbpu51.emrsn.com Tracing route to essdbpu51.emrsn.com [10.237.129.81] over a maximum of 30 hops: 1 <1 ms <1 ms <1 ms 142.176.225.1 2 26 ms 26 ms 24 ms 10.140.47.193 3 27 ms 27 ms 26 ms 10.140.33.5 4 28 ms 28 ms 29 ms 152.162.104.121 5 240 ms 240 ms 240 ms 152.162.102.182 6 254 ms 254 ms 255 ms 172.21.250.169 7 264 ms 281 ms 290 ms 172.21.250.222 8 255 ms 254 ms 255 ms 10.237.133.194 9 256 ms 256 ms 257 ms essdbpu51.emrsn.com [10.237.129.81] Trace complete.
If the network is causing large delays in response time, then involve network admin to find out possible causes Conversion programs need to be tested in similar network as production Example One conversion program was using db links to transfer data. Program was tested on test servers located in same data center. When this program ran in production, severe performance hit was found. On investigation when we did ping from one production server to another, there was large delay in response time as these two production servers were located in different data centers. Instance Tuning: Examine Oracle: Detecting Oracle Bottlenecks Oracle Statistics are examined and cross-referenced with OS Stats Wait events means server process/thread has to wait for an event to complete before further processing A server process can wait for A resource to become available (such as buffer or latch) An action to complete (such as an I/O) 220 Wait events in Oracle8i, about 400 in Oracle9i, more than 800 in Oracle10g and above 1100 in 11g. Instance Tuning: Oracle Wait Interface OWI Philosophy: The OWI is a performance tracking methodology that focuses on process bottlenecks, better known as wait events. OWI includes waits for I/O, locks, latches, background processes, network latencies OWI records and presents all the bottlenecks that a process encounters from start to finish. Keeps tracks of the number of times and amount of time a process spent on each bottleneck. Removing or even reducing the impact of major bottleneck improves performance. Instance Tuning: Oracle Wait Interface OWI plays a significant role in determining the database response time.
Response Time = Service Time + Wait Time
Service time is amount of time a process spends on a CPU. Wait time is the amount of time a process wait for specific resource to be available before continuing with processing.
You can improve database response time by shortening the service time, the wait time, or both.
Instance Tuning: Oracle Wait Interface OWI Components: A collection of a few dynamic performance views and extended SQL trace file. OWI provides statistics for wait events to tell how many times and how long the session waited for an event to complete. V$EVENT_NAME not a dynamic view. Contains all the wait events defined for your database. V$SYSTEM_EVENT displays aggregated stats for all wait events encountered by all oracle sessions since the instance startup. V$SESSION_EVENT contains aggregated wait event stats by session for all sessions currently connected to the instance. V$SESSION_WAIT provides details info about the event or resource that each session is waiting for Instance Tuning: Oracle Wait Interface I/O Related Wait Events: Db file sequential read -> is initiated by SQL that perform single-block read operations against indexes, undo segments. This is a single block read operation. db file scattered read -> is initiated by SQL that perform full scans operation against tables and indexes. This is a multi-block read. Direct path read -> waits are driven by SQL that perform direct read operation from the temporary or regular tablespaces. Direct path write -> waits are driven by SQL that perform direct write operation in the temporary or regular tablespaces. Locks related Wait Events: Latch free -> it means the process failed to obtain the latch in the willing-to-wait mode after spinning _SPIN_COUNT times and went to sleep. (shared pool, Library Cache, Cache Buffer Chains Latches) Enqueue -> Due to variety of enqueue types, this event can occur for many different reasons. The common causes and appropriate action to take depends on enqueue type and the mode that the sessions are competing for. Buffer busy waits -> occurs when a session wants to access a data block in the buffer cache that is currently in use by other session.
Instance Tuning: Examine Oracle Start with v$system_event to determine the top waits on your system Column names: Event, Total_waits, Total_timeouts, Time_waited, Avg_wait Drill down to v$session_event Same column names + SID. Event information for the session. Stats are lost as session log off. Remember to get the SQL executed by SID Get all the SQL for the session in question by joining v$session_wait, v$session, v$sql Get the bottlenecking segment for this SQL by joining v$session_wait, dba_data_files, dba_extents
Instance Tuning: Examine Oracle Initialization Parameters that effect performance SGA_MAX_SIZE DB_CACHE_SIZE PGA_AGGREGATE_TARGET SHARED_POOL_SIZE Metalink Note 216205.1 Database Initialization Parameters and Configurations for Oracle Apps 11i. A poorly sized buffer cache results in excessive buffer gets and physical I/O A poorly sized shared pool results in library cache and shared pool latch contention due to lack of space
Instance Tuning: Implement and Measure Changes Identify two or three changes that could potentially improve the performance Implement one change at a time Measure affect of change against the baseline data collected in performance definition phase Its an iterative process. Know when to stop tuning. Using SQL Trace and TKPROF alter session set events '10046 trace name context forever, level n'; If the level is 4 or 12 then the trace file contains information about the bind variables. If the level is 8 or 12 then the trace file contains information about the events that the session waited for. Without any level it is equivalent to alter session set sql_trace=true Some relevant init.ora parameters user_dump_dest - Trace file location timed_statistics=TRUE - Enables Timing max_dump_file_size - Max size trace files (unlimited by default) _trace_files_public=TRUE -changes permissions on trace files from 640 to 644 sql_trace=TRUE - enables trace globally Using SQL Trace and TKPROF How to find your trace file select spid from v$process a, v$session b where a.addr=b.paddr and b.sid= (select distinct sid from v$mystat)
ALTER SESSION SET TRACEFILE_IDENTIFIER= bala select username, traceid from v$process
How to Trace Someone Elses Session Exec sys.dbms_system.set_ev (sid, serial#, 10046, 12, ) - to turn on Exec sys.dbms_system.set_ev (sid, serial#, 10046, 0, ) - to turn off
Oracle Form level trace Help -> Diagnostics -> Trace -> Trace with binds and waits
Setting the Trace for MWA/telnet session or a particular user Log in as a System Administrator & move to Profile/System Check off the USER box - and enter username that you are tracing Search on the profile option - 'Initialization SQL Statement - Custom. Set following at the user level - with the following string -
BEGIN FND_CTL.FND_SESS_CTL('','', '', 'TRUE','','ALTER SESSION SET TRACEFILE_IDENTIFIER='||''''||bala ||''''||' EVENTS ='||''''||' 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 '||''''); END; SQL Trace File Example Dump file bala_ora_461_DM_USER2.trc
*** TRACE DUMP CONTINUED FROM FILE ***
System name: SunOS Node name: essdbsu41 Release: 5.9 Version: Generic_112233-12 Machine: sun4u Instance name: betsys2 Redo thread mounted by this instance: 1 Oracle process number: 74 Unix process pid: 461, image: oracle@essdbsu41 (TNS V1-V3)
*** SESSION ID:(41.34778) 2012-05-25 04:53:31.081 ===================== PARSING IN CURSOR #30 len=104 dep=2 uid=173 oct=42 lid=173 tim=30531412 hv=3741596375 ad='5de1c6c0' ALTER SESSION SET TRACEFILE_IDENTIFIER='DM_USER2' EVENTS =' 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 ' END OF STMT EXEC #30:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=30531412 ===================== PARSING IN CURSOR #21 len=194 dep=1 uid=173 oct=47 lid=173 tim=30531412 hv=1642868942 ad='5de1ce8c' BEGIN FND_CTL.FND_SESS_CTL('','', '', 'TRUE','','ALTER SESSION SET TRACEFILE_IDENTIFIER='||''''||'DM_USER2' ||''''||' EVENTS ='||''''||' 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12 '||''''); END; END OF STMT EXEC #21:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=4,tim=30531412 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0 =====================
SQL Trace File Example STAT #13 id=1 cnt=0 STAT#13 Cursor identifier STAT #13 id=2 cnt=1 id=1 Row source id STAT #13 id=3 cnt=1 cnt=1 Rows processed STAT #13 id=4 cnt=1 STAT #13 id=5 cnt=1
2. SQL> SELECT relative_fno, owner, segment_name, segment_type FROM dba_extents WHERE file_id = &file AND &block BETWEEN block_id AND block_id + blocks - 1; 2 3 4 Enter value for file: 1045 old 3: WHERE file_id = &file new 3: WHERE file_id = 1045 Enter value for block: 99752 old 4: AND &block BETWEEN block_id AND block_id + blocks - 1 new 4: AND 99752 BETWEEN block_id AND block_id + blocks - 1
Formatting Trace Files with TKPROF tkprof produces a formatted output from the trace files generated by SQL trace or event 10046. The output generated by tkprof can be sorted by a variety of parameters to separate the more expensive statements from the rest. The first part of the parameter specifies the stage prs - parse exe - execution fch - fetch The second part specifies the actual statistic mis - misses in library cache dsk - physical reads qry - consistent reads cu - current read cpu - CPU time ela - elapsed time
TKPROF: Release 11.2.0.1.0 - Production on Sun Oct 21 18:04:27 2012
Sort options: default
******************************************************************************** count =number of times OCI procedure was executed cpu =cpu time in seconds executing elapsed =elapsed time in seconds executing disk =number of physical reads of buffers from disk query =number of buffers gotten for consistent read current =number of buffers gotten in current mode (usually for update) rows =number of rows processed by the fetch or execute call ********************************************************************************
alter session set events='10046 trace name context forever, level 12'
Misses in library cache during parse: 0 Optimizer goal: CHOOSE Parsing user id: 173
Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 1 0.00 0.00 SQL*Net message from client 1 0.00 0.00 ******************************************************************************** TKPROF File Example 1
TKPROF: Release 11.2.0.1.0 - Production on Sun Oct 21 18:04:27 2012
Sort options: default
******************************************************************************** count =number of times OCI procedure was executed cpu =cpu time in seconds executing elapsed =elapsed time in seconds executing disk =number of physical reads of buffers from disk query =number of buffers gotten for consistent read current =number of buffers gotten in current mode (usually for update) rows =number of rows processed by the fetch or execute call ********************************************************************************
alter session set events='10046 trace name context forever, level 12'
Misses in library cache during parse: 0 Optimizer goal: CHOOSE Parsing user id: 173
Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 1 0.00 0.00 SQL*Net message from client 1 0.00 0.00 ******************************************************************************** TKPROF File Example 2 Sort options: exeela fchela ******************************************************************************** count = number of times OCI procedure was executed cpu = cpu time in seconds executing elapsed = elapsed time in seconds executing disk = number of physical reads of buffers from disk query = number of buffers gotten for consistent read current = number of buffers gotten in current mode (usually for update) rows = number of rows processed by the fetch or execute call ********************************************************************************
SELECT CPT.SERIAL_NUMBER PT_SL_NUMBER , DRA.SERIAL_NUMBER DR_SL_NUMBER , CPT.INVENTORY_ITEM_ID PT_ITEM_ID , DRA.INVENTORY_ITEM_ID DR_ITEM_ID FROM CSD_PRODUCT_TXNS_V CPT , CSD_REPAIRS DRA WHERE ACTION_TYPE IN('RMA', 'WALK_IN_RECEIPT') AND DRA.REPAIR_LINE_ID = :B1 AND CPT.REPAIR_LINE_ID = DRA.REPAIR_LINE_ID AND NVL(CPT.SERIAL_NUMBER_CONTROL_CODE,1) > 1
[H] shows that this query is visiting 297 buffers to find the rows to update [I] shows that only 3 buffer are visited performing the update [J] shows that only 1 row is updated.
Reading 297 buffers to update 1 rows is a lot of work and would tend to indicate that the access path being used is not particularly efficient. Perhaps there is an index missing that would improve the access performance?