Database Tuning PDF

/ k t .
e Database Tuning b u t e s c / / Widagdo : Tricya p t Departemen Teknik Informatika ht

Institut Teknologi Bandung
Page 1
IF-ITB/TW/22 Dec'03 IF3211 Database Tuning http://csetube.weebly.com/
Performance
The measure of efficiency for an application or multiple applications running in the same environment Is usually measured in: k/
t task takes to response time: the time that a single . complete be / : p s c / u t e
t t throughput: hthe volume of work completed in a fixed

time period
Can be shortened by: Reducing contention and wait times, particularly disk I/O wait times Using faster components Reducing the amount of time the resources are needed
is commonly measured in transactions per second (tps), but it can also be measured per minute, per hour, per day, and so on
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 2
Database Performance
Can be thought by using the concepts of supply and demand
Users demand information from the database / k The DBMS supplies information to those requesting it t . the demand for The rate at which the DBMS supplies e b u information can be termed database performance t
se Five factors influence c database performance: / Workload / : p Throughput t t Resource h Optimization Contention
IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/
Page 3
Factors Influencing Database Performance

Workload that is requested of the DBMS defines the demand
A combination of online transactions, batch jobs, ad / k and system hoc queries, data warehousing analysis, t . e commands directed through the system b u Can fluctuate drastically, sometimes can be predicted et
s c Throughput defines the overall capability of the / / data : computer to process p t Composite of htI/O speed, CPU speed, parallel
capabilities of the machine, and the efficiency of the operating system and system software
Page 4
Factors Influencing Database Performance (cont.)

Resources of the system include database kernel, disk space, memory, cache controllers, and microcode / k Optimization of queries is primarily accomplished t . e internal to the DBMS b
tpcondition in which two or more Contention is t the componentsh of the workload are attempting to use a single resource in a conflicting way
As contention increases, throughput decreases
u Many factors that need to be t optimized: e s SQL formulation c / Database parameters :/
Database Performance Definition

Database performance can be defined as the optimization of resource use to increase / throughput and minimize contention, k t . enabling the largest possible workload to e b u be processed t e s c Performance management tasks not / / : covered by the definition should be handled p t t h other that the DBA or at a by someone minimum shared between the DBA and other technicians
Performance Tuning
Adjusting various parameters and design choices to improve system performance for a specific application. Tuning is best done by / k 1. identifying bottlenecks, and t . e 2. eliminating them. b tu Can tune a database system e at 3 levels:
Hardware -- e.g., add disks to speed up I/O, add memory to increase buffer hits, move to a faster processor. Database system parameters -- e.g., set buffer size to avoid paging of buffer, set checkpointing intervals to limit log size. System may have automatic tuning. Higher level database design, such as the schema, indices and transactions (more later)
t t h
/ : p
s c /
Page 7
Bottlenecks
Performance of most systems (at least before they are tuned) usually limited by performance of one or a few components: these are called bottlenecks
. e Worth spending most time on 20% of code that take 80% of time b u t Bottlenecks may be in hardware (e.g. disks are very busy, e s CPU is idle), or in software c / / Removing one bottleneck often exposes another : p t De-bottlenecking consists of repeatedly finding t h
bottlenecks, and removing them
This is a heuristic
E.g. 80% of the code may take up 20% of time and 20% of code takes up 80% of time
/ k t
Page 8
Transactions request a sequence of services

e.g. CPU, Disk I/O, locks
Identifying Bottlenecks
With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served / k t with a queue Can model database as a queueing system . for each service be
transactions repeatedly do the following
request a service, wait in queue for the service, and get serviced
Bottlenecks in a database system typically show up as very / : high utilizations (and correspondingly, very long queues) of a p particular servicett
E.g. disk vs CPU utilization 100% utilization leads to very long waiting time:
s c /
u t e
Rule of thumb: design system for about 70% utilization at peak load utilization over 90% should be avoided IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/ Page 9
Queues In A Database System
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 10
Tunable Parameters
Tuning Tuning Tuning Tuning Tuning of of of of of hardware schema / k indices t . e b materialized views u t e s transactions c
p t t
/ / :
Page 11
Tuning of Hardware
Even well-tuned transactions typically require a few I/O operations
Typical disk supports about 100 random I/O operations per second Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)
s transaction can be reduced Number of I/O operationsc per by keeping more data / in/ memory
reducing number of disks required, but has a memory cost
u t e
. e b
/ k t
: p If all data is in memory, I/O needed only for writes t Keeping frequently ht used data in memory reduces disk accesses,
Page 12
Hardware Tuning: Five-Minute Rule

Question: which data to keep in memory:
If a page is accessed n times per second, keeping it in memory saves
n* price-per-disk-drive accesses-per-second-per-disk
Cost of keeping page in memory

price-per-MB-of-memory pages-per-MB-of-memory
Break-even point: value of n for which above costs are equal
s c / Buying memory: If accesses are more then saving is greater than cost / : current disk and memory prices leads to: Solving above equation with p t a page that is randomly accessed is t 5-minute rule: if h
(by buying sufficient memory!)
u t e
. e b
/ k t
used more frequently than once in 5 minutes it should be kept in memory

Page 13
Hardware Tuning: One-Minute Rule

For sequentially accessed data, more pages can be read per second. Assuming sequential reads of 1MB of data at a time: / k t 1-minute rule: sequentially accessed data . that is accessed once or u more be in a minute t should be kept in memory e s c / Prices of disk and memory have changed greatly / over the years,t but p: the ratios have not changed much ht
so rules remain as 5 minute and 1 minute rules, not 1 hour or 1 second rules!
To use RAID 1 or RAID 5?
Hardware Tuning: Choice of RAID Level

RAID 5 requires 2 block reads and 2 block writes to write out one data block
Depends on ratio of reads and writes
If an application requires r reads and w writes / per second
e b urequires lots of disks to For reasonably large r and w, this t e handle workload s c than RAID 1 to handle load! RAID 5 may require more / disks / of disks by RAID 5 (by using parity, as Apparent saving of number : p done by RAID 1) may be illusory! opposed to the mirroring t ht 5 is fine when writes are rare and data Thumb rule: RAID
is very large, but RAID 1 is preferable otherwise
If you need more disks to handle I/O load, just mirror them since disk capacities these days are enormous!
RAID 1 requires r + 2w I/O operations per second RAID 5 requires: r + 4w I/O operations per second
k t .
Tuning the Database Design Schema tuning

Can be done in several ways:
Splitting tables
Horizontally Vertically
Sometimes splitting normalized tables can improve performance Can split tables in two ways: Adds complexity to the applications
Adding redundant columns Adding derived attributes Collapsing tables Duplicating tables Cluster together on the same disk page records that would match in a frequently required join,
Denormalization:
t t h
/ : p
s c /
u t e
. e b
/ k t
compute join very efficiently when required.
Page 16
Tuning the Database Design Schema tuning (cont.)

Horizontal Splitting Placing rows in two separate tables, depending on data values in one or more columns / k t Use horizontal splitting if: . e b A table is large, and reducing u its size reduces the number t of index pages read in a query e s c The table split corresponds to a natural separation of the / / geographical sites or historical rows, such as different : p t versus current tdata
might choose horizontal splitting if a table stores huge amounts of rarely used historical data, and the applications have high performance needs for current data in the same table
Table splitting distributes data over the physical media

Horizontal Splitting Example
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 18

Vertical Splitting Partition relations to isolate the data that is accessed most often only fetch needed information / k Keeps the relation in normal form t . Use vertical splitting if: e
/ both of the above conditions Makes even more sense when / are true p:
When a table contains very long columns that are accessed infrequently,
Some columns are accessed more frequently than other columns The table has wide rows, and splitting the table reduces the number of pages that need to be read
e s c
b u t
frequently used columns With shorter rows, more data rows fit on a data page, so for many queries, fewer pages can be accessed
t t hseparate table can greatly speed the retrieval of the more placing them in a
Page 19
Tuning the Database Design Schema tuning (cont.) Vertical Splitting Example
account relation with the following schema: / can be split into two relations: be
k account(account-number, brach-name,t balance) .
/ These two representations are equivalent and in the / : (BCNF). But, they could give p maximum normal form t t different performance characteristic, depends on the h information that usually retrieved at the same time. E.g. Branch-name need not be fetched unless required
u branch-name) account-branch(account-number, t e balance) s account-balance(account-number, c
Tuning the Database Design Schema tuning (cont.) Denormalization

Can be done with tables or columns / k Assumes prior normalization t . e of how the data is Requires a thorough knowledge b u t being used e s Used if: /c
: most frequent queries require All or nearly all of the p t set of joined data access to the t full happlications perform table scans when A majority of joining tables Computational complexity of derived columns requires temporary tables or excessively complex queries
Page 21

Denormalization (cont.)
Denormalization can improve performance by :
Minimizing the need for joins Reducing the number of foreign keys on tables Reducing the number of indexes, saving storage space, and reducing data modification time Precomputing aggregate values Reducing the number of tables (in some cases)
Disadvantages:
It can increase the size of tables In some instances, it simplifies coding; in others, it makes coding more complex.
/ : It usually speeds retrieval but can slow data modification p t It is always application-specific and must be reevaluated if the t h application changes
s c /
u t e
. e b
/ k t

Denormalization (cont.)
Issues to examine when considering denormalization include:
What are the critical transactions, and what is the expected response time? How often are the transactions executed? What tables or columns do the critical transactions use? How many rows do they access each time? What is the mix of transaction types: select, insert, update, and delete? What is the usual sort order? What are the concurrency expectations? How big are the most frequently accessed tables? Do any processes compute summaries? Where is the data physically located?
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 23
Tuning the Database Design Schema tuning (cont.) Adding Redundant Columns
To eliminate frequent joins for many/ queries k t For example, if performing frequent joins on the . e b titleauthor and authors tables to retrieve the u t e authors last name, you s can add the au_lname c / column to titleauthor . / : The problems with tp this solution are that it:
t h Requires maintenance of new columns

Requires more disk space
Have to make changes to two tables, and possibly to many rows in one of the tables.

Adding Derived Columns
Can eliminate some joins and reduce the time needed to produce aggregate values E.g. The total_sales column in the titles table
eliminating both the join and the aggregate at runtime increases storage needs, and requires maintenance of the derived column whenever changes are made to the titles table
t t h
/ : p
s c /
u t e
. e b
/ k t
Page 25
Tuning the Database Design Schema tuning (cont.) Collapsing Tables

If most users need to see the full set of joined data / k from two tables, collapsing the two tables into one t . can improve performance by eliminating the join e b u The data from the two tables must be in a one-tot e s one relationship to collapse tables c / / Eliminates the join, but loses the conceptual : p separation of the t t data
h still need access to just the pairs of data If some users from the two tables, this access can be restored by using queries that select only the needed columns or by using views
Tuning the Database Design Schema tuning (cont.) Duplicating Tables

If a group of users / regularly needs only a k t . subset of data, the e b critical table subset can u t e be duplicated for that s c / group / : p Minimizes contention, t t h but requires managing redundancy

Managing Denormalized Data
Need to ensure data integrity by using:
Triggers, which can update derived or duplicated data anytime the base data changes Application logic, using transactions in each application that update denormalized data, to ensure that changes are atomic
u t be very sure that the data integritye requirements are well documented and s well known to all application developers and to those who must maintain c / applications / : Batch reconciliation, run at appropriate intervals, to bring the p t into agreement denormalized data t back h If 100 percent consistency is not required at all times
. e b
/ k t
From an integrity point of view, triggers provide the best solution, although they can be costly in terms of performance
Tuning the Database Design Index tuning

Create appropriate indices to speed up slow queries/updates / k Speed up slow updates by removing excess indices t . e (tradeoff between queries and updates) b u t Choose type of index (B-tree/hash) appropriate for e s c most frequent types of queries. / / : p to make clustered Choose which index t t h Index tuning wizards look at past history of queries and updates (the workload) and recommend which indices would be best for the workload
Tuning the Database Design Index tuning (cont.)

When should indexes be considered Unique indexes are implicitly used in conjunction with a primary key for the primary key to work / k for an index Foreign keys are also excellent candidates t because they are often used to join the e. parent table
Most, if not all, columns used for table joins should be indexed
Columns that are frequently referenced in the ORDER BY and e sconsidered for indexes GROUP BY clauses should c be / on columns with a high number of / Indexes should be created : when used as filter conditions in p unique values, or columns t the WHERE clause return a low percentage of rows of data t h from a table The effective use of indexes requires a thorough knowledge of table relationships, query and transaction requirements, and the data itself
b u t
Tuning the Database Design Index tuning (cont.)

When should indexes be avoided Indexes should not be used on small tables Indexes should not be used on columns that return a high / condition in a percentage of data rows when used as a k filter t query's WHERE clause . e update jobs run can Tables that have frequent, large batch b be indexed tu
Columns that are frequently manipulated should not be indexed

Maintenance on the index can become excessive
p t t be used on columns that contain a high Indexes shouldh not number of NULL values
However, the batch job's performance is slowed considerably by the index Might consider dropping the index before the batch job, and then recreating the index after the job has completed
/ / :
e s c
Page 31
Tuning the Database Design Materialized Views

Materialized views can help speed up certain queries
Particularly aggregate queries
Overheads
Space Time for view maintenance
maintenance is systems responsibility, not programmers

Avoids inconsistencies caused by errors in update programs
. e Immediate view maintenance: done u asb part of update t time overhead paid by update transaction e s only when required Deferred view maintenance: done c update transaction is not / affected, but system time is spent on view / maintenance : until updated, p the view may be out-of-date t t Preferable to denormalized schema since view h
/ k t
Page 32
Tuning the Database Design Materialized Views (cont.) How to choose set of materialized views
Helping one transaction type by introducing a / k materialized view may hurt others t . Choice of materialized views depends on costs be
u t Users often have no idea of actual cost of operations e s c Overall, manual selection of materialized views / / : is tedious p t t h Some database systems provide tools to
help DBA choose views to materialize
Materialized view selection wizards
Tuning of Transactions
Basic approaches to tuning of transactions
Improve set orientation Reduce lock contention
Rewriting of queries to improve performance was important / this less in the past, but smart optimizers have made k t . important e b Communication overhead and query handling overheads u t significant part of cost of each ecall
Combine multiple embedded SQL/ODBC/JDBC queries into a single set-oriented query
/ : Set orientation -> fewer calls to database p tthat computes total salary for each department using E.g. tune program t h query by instead using a single query that computes a separate SQL
total salaries for all department at once (using group by)
s c /
Use stored procedures: avoids re-parsing and re-optimization of query

Tuning of Transactions (Cont.)

Reducing lock contention Long transactions (typically read-only) that examine large parts of a relation result in lock / k t contention with update transactions .
e b E.g. large query to compute bank statistics and regular u t e bank transactions s c / To reduce contention / : p Use multi-version concurrency control t t E.g. Oracle which support multi-version 2PL hsnapshots
Use degree-two consistency (cursor-stability) for long transactions
Drawback: result may be approximate
Long update transactions cause several problems

Exhaust lock space Exhaust log space
Tuning of Transactions (Cont.)
/ k Use mini-batch transactions to limit number of updates that t e. if a single large a single transaction can carry out. b E.g., u transaction updates every record of a very large relation, log t e may grow too big. s c of ``mini-transactions,'' each * Split large transaction into/ batch / performing part of the : updates p Hold locks across transactions in a mini-batch to ensure serializability t If lock table size htis a problem can release locks, but at the cost of
serializability
and also greatly increase recovery time after a crash, and may even exhaust log space during recovery if recovery algorithm is badly designed!
* In case of failure during a mini-batch, must complete its remaining portion on recovery, to ensure atomicity.
Tuning of SQL Statements

Proper arrangement of tables in the FROM clause
List the smaller tables first and the larger tables last
Proper order of join conditions
Avoiding the HAVING clause Avoid large sort operation
u t The optimizers, in some cases,e seem to read from the bottom of the s WHERE clause up. Therefore, you would want to place the most c / restrictive condition last in the WHERE clause / : Rewriting the SQL statement using the IN predicate p t instead of the OR t h operator consistently
The join conditions should be in the first position(s) of the WHERE clause followed by the filter clause(s) The effective placement of the most restrictive condition in the query requires knowledge of how the optimizer operates
. e b
/ k t
Page 37
Performance Simulation
Performance simulation using queuing model useful to predict bottlenecks as well as the effects of tuning changes, even without access to real system Queuing model as we saw earlier
. usually omits some Simulation model is quite detailed, but e low level details ub c / / t e s
Models activities that go on in parallel
/ k t
Model service time, but disregard details of service E.g. approximate disk read time by using an average disk read time
Experiments can be run on model, and provide an estimate : p average throughput/response time of measures sucht as t h Parameters can be tuned in model and then replicated in real system
E.g. number of disks, memory, algorithms, etc
Performance Benchmarks
Suites of tasks used to quantify the performance of software systems / systems, k Important in comparing database t . e more especially as systems become b u t standards compliant. e s c Commonly used performance measures: //
transaction to return of result) Availability or mean time to failure
: p Throughput (transactions per second, or tps) t t h Response time (delay from submission of
Page 39
Performance Benchmarks (Cont.)

Suites of tasks used to characterize performance
single task not enough for complex systems
Beware when computing average throughput of different transaction types k/
E.g., suppose a system runs transaction type A at 99 tps and transaction type B at 1 tps. Given an equal mixture of types A and B, throughput is not (99+1)/2 = 50 tps. Running one transaction of each type takes time 1+.01 seconds, giving a throughput of 1.98 tps. To compute average throughput, use harmonic mean: n
p t t
/ / :
e s c
b u t
t . e
1/t1 + 1/t2 + + 1/tn
Interference (e.g. lock contention) makes even this incorrect if different transaction types run concurrently
Database Application Classes

Online transaction processing (OLTP)
requires high concurrency and clever techniques to speed up commit processing, to support a high rate of update transactions.
Decision support applications
u t e systems tuned to one of Architecture of some database s c the two classes / / : E.g. Teradata is tuned to decision support p t Others try to balance t the two requirements h E.g. Oracle, with snapshot support for long read-only transaction
including online analytical processing, or OLAP applications require good query evaluation algorithms and query optimization.
. e b
/ k t
Page 41
Benchmarks Suites
The Transaction Processing Council (TPC) benchmark suites are widely used.
/ TPC-A and TPC-B: simple OLTP application k t . modeling a bank teller application with and e b without communication tu / / TPC-C: complex OLTP application modeling an : p t inventory system ht
Current standard for OLTP benchmarking Not used anymore
e s c
Page 42
Benchmarks Suites (Cont.)

TPC benchmarks (cont.)
TPC-D: complex decision support application
Superceded by TPC-H and TPC-R
TPC-H: (H for ad hoc) based on TPC-D with some extra queries
. e Total of 22 queries with emphasis on aggregation b u prohibits materialized views t eand foreign keys permits indices only on primary s c as TPC-H, but without any / TPC-R: (R for reporting) same / views and indices : restrictions on materialized p t TPC-W: (W fort Web) End-to-end Web service benchmark h bookstore, with combination of static and modeling a Web
dynamically generated pages
Models ad hoc queries which are not known beforehand
/ k t
Page 43
TPC Performance Measures

TPC performance measures
transactions-per-second with specified constraints on response time transactions-per-second-per-dollar accounts for cost of owning system
. e bsizes to be scaled up TPC benchmark requires database u t with increasing transactions-per-second e s where more customers means reflects real world applications c / more database size and more transactions-per-second / : p External audit of TPC performance numbers mandatory t ht claims can be trusted TPC performance
/ k t
Page 44
TPC Performance Measures

Two types of tests for TPC-H and TPC-R
Power test: runs queries and updates sequentially, then takes mean to find queries per hour / k Throughput test: runs queries and updates t . e concurrently b
multiple streams running in parallel each generates queries, with one parallel update stream
Composite query / per hour metric: square root of : product of power and throughput metrics p t t Composite metric hprice/performance
s c /
u t e
Page 45

Database Tuning PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Database Tuning PDF

Uploaded by

Copyright:

Available Formats

/ k t .

e Database Tuning b u t e s c / / Widagdo : Tricya p t Departemen Teknik Informatika ht

IF-ITB/TW/22 Dec'03 IF3211 Database Tuning http://csetube.weebly.com/

t task takes to response time: the time that a single . complete be / : p s c / u t e

t t throughput: hthe volume of work completed in a fixed

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

Factors Influencing Database Performance

Factors Influencing Database Performance (cont.)

u Many factors that need to be t optimized: e s SQL formulation c / Database parameters :/

Database Performance Definition

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

Transactions request a sequence of services

Queues In A Database System

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

Hardware Tuning: Five-Minute Rule

Cost of keeping page in memory

Break-even point: value of n for which above costs are equal

used more frequently than once in 5 minutes it should be kept in memory

Hardware Tuning: One-Minute Rule

To use RAID 1 or RAID 5?

Hardware Tuning: Choice of RAID Level

Depends on ratio of reads and writes

If an application requires r reads and w writes / per second

Tuning the Database Design Schema tuning

compute join very efficiently when required.

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

Tuning the Database Design Schema tuning (cont.)

Table splitting distributes data over the physical media

Tuning the Database Design Schema tuning (cont.)

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

Tuning the Database Design Schema tuning (cont.)

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

k account(account-number, brach-name,t balance) .

u branch-name) account-branch(account-number, t e balance) s account-balance(account-number, c

Tuning the Database Design Schema tuning (cont.) Denormalization

Tuning the Database Design Schema tuning (cont.)

Tuning the Database Design Schema tuning (cont.)

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

t h Requires maintenance of new columns

Tuning the Database Design Schema tuning (cont.)

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

Tuning the Database Design Schema tuning (cont.) Collapsing Tables

Tuning the Database Design Schema tuning (cont.) Duplicating Tables

Tuning the Database Design Schema tuning (cont.)

Tuning the Database Design Index tuning

Tuning the Database Design Index tuning (cont.)

Tuning the Database Design Index tuning (cont.)

Columns that are frequently manipulated should not be indexed

Tuning the Database Design Materialized Views

maintenance is systems responsibility, not programmers

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

Use stored procedures: avoids re-parsing and re-optimization of query

Tuning of Transactions (Cont.)

Long update transactions cause several problems

Tuning of Transactions (Cont.)

Tuning of SQL Statements

Proper order of join conditions

Avoiding the HAVING clause Avoid large sort operation

IF-ITB/TW/22 Des'03 IF3211 Database Tuning http://csetube.weebly.com/

Models activities that go on in parallel