Professional Documents
Culture Documents
Performance
The measure of efficiency for an application or multiple applications running in the same environment Is usually measured in: k/
Can be shortened by: Reducing contention and wait times, particularly disk I/O wait times Using faster components Reducing the amount of time the resources are needed
is commonly measured in transactions per second (tps), but it can also be measured per minute, per hour, per day, and so on
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 2
Database Performance
Can be thought by using the concepts of supply and demand
Users demand information from the database / k The DBMS supplies information to those requesting it t . the demand for The rate at which the DBMS supplies e b u information can be termed database performance t
se Five factors influence c database performance: / Workload / : p Throughput t t Resource h Optimization Contention
Page 3
s c Throughput defines the overall capability of the / / data : computer to process p t Composite of htI/O speed, CPU speed, parallel
capabilities of the machine, and the efficiency of the operating system and system software
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/
Page 4
tpcondition in which two or more Contention is t the componentsh of the workload are attempting to use a single resource in a conflicting way
As contention increases, throughput decreases
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 5
Performance Tuning
Adjusting various parameters and design choices to improve system performance for a specific application. Tuning is best done by / k 1. identifying bottlenecks, and t . e 2. eliminating them. b tu Can tune a database system e at 3 levels:
Hardware -- e.g., add disks to speed up I/O, add memory to increase buffer hits, move to a faster processor. Database system parameters -- e.g., set buffer size to avoid paging of buffer, set checkpointing intervals to limit log size. System may have automatic tuning. Higher level database design, such as the schema, indices and transactions (more later)
t t h
/ : p
s c /
Page 7
Bottlenecks
Performance of most systems (at least before they are tuned) usually limited by performance of one or a few components: these are called bottlenecks
. e Worth spending most time on 20% of code that take 80% of time b u t Bottlenecks may be in hardware (e.g. disks are very busy, e s CPU is idle), or in software c / / Removing one bottleneck often exposes another : p t De-bottlenecking consists of repeatedly finding t h
bottlenecks, and removing them
This is a heuristic
E.g. 80% of the code may take up 20% of time and 20% of code takes up 80% of time
/ k t
Page 8
Identifying Bottlenecks
With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served / k t with a queue Can model database as a queueing system . for each service be
transactions repeatedly do the following
request a service, wait in queue for the service, and get serviced
Bottlenecks in a database system typically show up as very / : high utilizations (and correspondingly, very long queues) of a p particular servicett
E.g. disk vs CPU utilization 100% utilization leads to very long waiting time:
s c /
u t e
Rule of thumb: design system for about 70% utilization at peak load utilization over 90% should be avoided IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 9
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 10
Tunable Parameters
Tuning Tuning Tuning Tuning Tuning of of of of of hardware schema / k indices t . e b materialized views u t e s transactions c
p t t
/ / :
Page 11
Tuning of Hardware
Even well-tuned transactions typically require a few I/O operations
Typical disk supports about 100 random I/O operations per second Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)
s transaction can be reduced Number of I/O operationsc per by keeping more data / in/ memory
reducing number of disks required, but has a memory cost
u t e
. e b
/ k t
: p If all data is in memory, I/O needed only for writes t Keeping frequently ht used data in memory reduces disk accesses,
Page 12
s c / Buying memory: If accesses are more then saving is greater than cost / : current disk and memory prices leads to: Solving above equation with p t a page that is randomly accessed is t 5-minute rule: if h
(by buying sufficient memory!)
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/
u t e
. e b
/ k t
e b urequires lots of disks to For reasonably large r and w, this t e handle workload s c than RAID 1 to handle load! RAID 5 may require more / disks / of disks by RAID 5 (by using parity, as Apparent saving of number : p done by RAID 1) may be illusory! opposed to the mirroring t ht 5 is fine when writes are rare and data Thumb rule: RAID
is very large, but RAID 1 is preferable otherwise
If you need more disks to handle I/O load, just mirror them since disk capacities these days are enormous!
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 15
RAID 1 requires r + 2w I/O operations per second RAID 5 requires: r + 4w I/O operations per second
k t .
Sometimes splitting normalized tables can improve performance Can split tables in two ways: Adds complexity to the applications
Adding redundant columns Adding derived attributes Collapsing tables Duplicating tables Cluster together on the same disk page records that would match in a frequently required join,
Denormalization:
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 16
might choose horizontal splitting if a table stores huge amounts of rarely used historical data, and the applications have high performance needs for current data in the same table
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 17
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 18
/ both of the above conditions Makes even more sense when / are true p:
When a table contains very long columns that are accessed infrequently,
Some columns are accessed more frequently than other columns The table has wide rows, and splitting the table reduces the number of pages that need to be read
e s c
b u t
frequently used columns With shorter rows, more data rows fit on a data page, so for many queries, fewer pages can be accessed
t t hseparate table can greatly speed the retrieval of the more placing them in a
Page 19
Tuning the Database Design Schema tuning (cont.) Vertical Splitting Example
account relation with the following schema: / can be split into two relations: be
/ These two representations are equivalent and in the / : (BCNF). But, they could give p maximum normal form t t different performance characteristic, depends on the h information that usually retrieved at the same time. E.g. Branch-name need not be fetched unless required
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 20
: most frequent queries require All or nearly all of the p t set of joined data access to the t full happlications perform table scans when A majority of joining tables Computational complexity of derived columns requires temporary tables or excessively complex queries
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/
Page 21
Disadvantages:
It can increase the size of tables In some instances, it simplifies coding; in others, it makes coding more complex.
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 22
/ : It usually speeds retrieval but can slow data modification p t It is always application-specific and must be reevaluated if the t h application changes
s c /
u t e
. e b
/ k t
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 23
Tuning the Database Design Schema tuning (cont.) Adding Redundant Columns
To eliminate frequent joins for many/ queries k t For example, if performing frequent joins on the . e b titleauthor and authors tables to retrieve the u t e authors last name, you s can add the au_lname c / column to titleauthor . / : The problems with tp this solution are that it:
Have to make changes to two tables, and possibly to many rows in one of the tables.
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 24
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 25
h still need access to just the pairs of data If some users from the two tables, this access can be restored by using queries that select only the needed columns or by using views
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 26
u t be very sure that the data integritye requirements are well documented and s well known to all application developers and to those who must maintain c / applications / : Batch reconciliation, run at appropriate intervals, to bring the p t into agreement denormalized data t back h If 100 percent consistency is not required at all times
. e b
/ k t
From an integrity point of view, triggers provide the best solution, although they can be costly in terms of performance
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 28
Columns that are frequently referenced in the ORDER BY and e sconsidered for indexes GROUP BY clauses should c be / on columns with a high number of / Indexes should be created : when used as filter conditions in p unique values, or columns t the WHERE clause return a low percentage of rows of data t h from a table The effective use of indexes requires a thorough knowledge of table relationships, query and transaction requirements, and the data itself
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 30
b u t
p t t be used on columns that contain a high Indexes shouldh not number of NULL values
However, the batch job's performance is slowed considerably by the index Might consider dropping the index before the batch job, and then recreating the index after the job has completed
/ / :
e s c
Page 31
Overheads
Space Time for view maintenance
. e Immediate view maintenance: done u asb part of update t time overhead paid by update transaction e s only when required Deferred view maintenance: done c update transaction is not / affected, but system time is spent on view / maintenance : until updated, p the view may be out-of-date t t Preferable to denormalized schema since view h
/ k t
Page 32
Tuning the Database Design Materialized Views (cont.) How to choose set of materialized views
Helping one transaction type by introducing a / k materialized view may hurt others t . Choice of materialized views depends on costs be
u t Users often have no idea of actual cost of operations e s c Overall, manual selection of materialized views / / : is tedious p t t h Some database systems provide tools to
help DBA choose views to materialize
Materialized view selection wizards
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 33
Tuning of Transactions
Basic approaches to tuning of transactions
Improve set orientation Reduce lock contention
Rewriting of queries to improve performance was important / this less in the past, but smart optimizers have made k t . important e b Communication overhead and query handling overheads u t significant part of cost of each ecall
Combine multiple embedded SQL/ODBC/JDBC queries into a single set-oriented query
/ : Set orientation -> fewer calls to database p tthat computes total salary for each department using E.g. tune program t h query by instead using a single query that computes a separate SQL
total salaries for all department at once (using group by)
s c /
e b E.g. large query to compute bank statistics and regular u t e bank transactions s c / To reduce contention / : p Use multi-version concurrency control t t E.g. Oracle which support multi-version 2PL hsnapshots
Use degree-two consistency (cursor-stability) for long transactions
Drawback: result may be approximate
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 35
/ k Use mini-batch transactions to limit number of updates that t e. if a single large a single transaction can carry out. b E.g., u transaction updates every record of a very large relation, log t e may grow too big. s c of ``mini-transactions,'' each * Split large transaction into/ batch / performing part of the : updates p Hold locks across transactions in a mini-batch to ensure serializability t If lock table size htis a problem can release locks, but at the cost of
serializability
and also greatly increase recovery time after a crash, and may even exhaust log space during recovery if recovery algorithm is badly designed!
* In case of failure during a mini-batch, must complete its remaining portion on recovery, to ensure atomicity.
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 36
u t The optimizers, in some cases,e seem to read from the bottom of the s WHERE clause up. Therefore, you would want to place the most c / restrictive condition last in the WHERE clause / : Rewriting the SQL statement using the IN predicate p t instead of the OR t h operator consistently
The join conditions should be in the first position(s) of the WHERE clause followed by the filter clause(s) The effective placement of the most restrictive condition in the query requires knowledge of how the optimizer operates
. e b
/ k t
Page 37
Performance Simulation
Performance simulation using queuing model useful to predict bottlenecks as well as the effects of tuning changes, even without access to real system Queuing model as we saw earlier
. usually omits some Simulation model is quite detailed, but e low level details ub c / / t e s
/ k t
Model service time, but disregard details of service E.g. approximate disk read time by using an average disk read time
Experiments can be run on model, and provide an estimate : p average throughput/response time of measures sucht as t h Parameters can be tuned in model and then replicated in real system
E.g. number of disks, memory, algorithms, etc
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 38
Performance Benchmarks
Suites of tasks used to quantify the performance of software systems / systems, k Important in comparing database t . e more especially as systems become b u t standards compliant. e s c Commonly used performance measures: //
transaction to return of result) Availability or mean time to failure
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/
: p Throughput (transactions per second, or tps) t t h Response time (delay from submission of
Page 39
E.g., suppose a system runs transaction type A at 99 tps and transaction type B at 1 tps. Given an equal mixture of types A and B, throughput is not (99+1)/2 = 50 tps. Running one transaction of each type takes time 1+.01 seconds, giving a throughput of 1.98 tps. To compute average throughput, use harmonic mean: n
p t t
/ / :
e s c
b u t
t . e
Interference (e.g. lock contention) makes even this incorrect if different transaction types run concurrently
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 40
u t e systems tuned to one of Architecture of some database s c the two classes / / : E.g. Teradata is tuned to decision support p t Others try to balance t the two requirements h E.g. Oracle, with snapshot support for long read-only transaction
including online analytical processing, or OLAP applications require good query evaluation algorithms and query optimization.
. e b
/ k t
Page 41
Benchmarks Suites
The Transaction Processing Council (TPC) benchmark suites are widely used.
/ TPC-A and TPC-B: simple OLTP application k t . modeling a bank teller application with and e b without communication tu / / TPC-C: complex OLTP application modeling an : p t inventory system ht
Current standard for OLTP benchmarking Not used anymore
e s c
Page 42
. e Total of 22 queries with emphasis on aggregation b u prohibits materialized views t eand foreign keys permits indices only on primary s c as TPC-H, but without any / TPC-R: (R for reporting) same / views and indices : restrictions on materialized p t TPC-W: (W fort Web) End-to-end Web service benchmark h bookstore, with combination of static and modeling a Web
dynamically generated pages
/ k t
Page 43
. e bsizes to be scaled up TPC benchmark requires database u t with increasing transactions-per-second e s where more customers means reflects real world applications c / more database size and more transactions-per-second / : p External audit of TPC performance numbers mandatory t ht claims can be trusted TPC performance
/ k t
Page 44
multiple streams running in parallel each generates queries, with one parallel update stream
Composite query / per hour metric: square root of : product of power and throughput metrics p t t Composite metric hprice/performance
s c /
u t e
Page 45