You are on page 1of 9

Execution Plan

When SQL Server executes a query, it will use an execution plan if one already exists. If
there is no preexisting plan, SQL Server follows a series of steps to create a plan. The
objective of this is to find an acceptable plan, not necessarily the optimal plan. The effort to
find the best possible plan might take longer than to execute an acceptable, but not
optimal plan. To come to this plan, the following steps are taken by the Query Optimizer.
Each step might produce one or more plans. Each plan is assigned a total cost for
execution, and SQL Server uses the plan with the lowest costs.

• First, SQL Server checks if a trivial plan is available. If so, it will go ahead with this
plan. For example, with an INSERT statement using the VALUES clause, there is
only one possible execution.

• If no trivial plan is available, SQL will try to simplify the query, so that a trivial plan
will be available. This will not result in a new plan, but helps SQL to analyze the
query. At this point SQL server will load any statistics that will help it in the cost-
based process that follows. This cost-based process has three steps.
o First is the transaction processing phase. In this phase, SQL Server picks
out plans for simple queries typical of transaction processing databases. If
a plan is formed with a total cost below a threshold, it will use this plan.

o If a cheap plan can not be found, the Quick Plan phase starts. In this
phase, SQL includes choices that are often useful in more complex
queries. This might include use of indexes and nested loop joins. Again, if
a cheap enough plan is found, SQL Server will use this plan.

o The last phase is the Full Optimization Phase. In this phase, SQL Server
compares every possible execution plan and then goes with the cheapest
one. This phase has a time limit. When the time limit is reached, SQL
Server goes ahead with the cheapest plan found up until that moment.

Statistics

This process -- except for the trivial plan -- is based solely on statistics. Bad statistics lead
to bad execution plans. If an index is placed on a small table, the optimal plan might be to
ignore the index and use a table scan. If the statistics are not updated while the table
grows significantly, SQL Server will still assume that the table is small and use the table
scan, even if this is no longer the optimal plan.

In most cases it is wise to let SQL Server automatically update statistics. To turn on the
automatic updates for all statistics in a database, if not already on, execute the following
statement:

USE master
EXEC sp_dboption 'MyDatabase', ' auto update statistics', 'true'

To turn on the automatic updates for all statistics on a specific table, such as Clients,
execute the following statement:

USE MyDatabase
EXEC sp_autostats Clients, 'ON'
To manually update the statistics on specific table, such as Clients, execute the following
command:

USE MyDatabase
UPDATE STATISTICS Clients

Other options for updating statistics can be found on this website.

How Statistics Work

Keeping statistics up to date is crucial for optimal performance of a database. In some


cases, however, an available execution plan is not ideal for the given situation. To
understand when and why this happens, it is important to understand how statistics work.

The following command displays the statistics for a given index (index IX1a on table
Clients)

USE MyDatabase
DBCC SHOW_STATISTICS (Clients, IX1a)

This gives all statistical information on this index. Among the date and time of the last
update and it's density, it returns samples of the index. This sample is taken at random
from the database and gives the following information:
Figure 1: DBCC SHOW_STATISTICS output

The last part of this output is important. On the basis of these records, SQL determines the
expected number of hits for a search parameter. The output has the following entries:

RANGE_HI_KEY: The upper value of the histogram step. For example, 'Smit'

RANGE_ROWS: The number of rows of the histogram step, excluding the occurrences of
the upper bound. For example, there are 66,811 rows between 'Smit' and 'Snijders'.

EQ_ROWS: The number of rows equal to the value of the upper bound of the histogram
step. For example, there are 48,760 rows with the value 'Smit'.

DISTINCT_RANGE_ROWS: The number of distinct values in the histogram step,


excluding the upper bound. For example, there are 462 distinct values between 'Smit' and
'Snijders' (not including either).

AVG_RANGE_ROWS: The average number of occurrences for each unique value in the
histogram step. This is calculated as RANGE_ROWS / DISTINCT_RANGE_ROWS. For
example, between 'Smit' and 'Snijders' are 66811 rows with 462 distinct values, resulting in
an expected 144.61255 rows for unique value between 'Smit' and 'Snijders'.

The Estimated Execution Plan for a search for 'Smits' (between 'Smit' and 'Snijders')
confirms this number:
Figure 2: Estimated Execution Plan - estimated row count 'Smits'

On the basis of this estimate, the following steps in an execution plan will be determined.

If a search value is matched in the statistics table, the value for the EQ_ROWS will be the
estimated row count, as can be seen in the case for searching for 'Smit':
Figure 3: Estimated Execution Plan - estimated row count 'Smit'

Variance

In the case of last names, there is a very large difference in the number of occurrences per
distinct value. In our database, we host approximately 18 million records with about
500.000 different last names. This gives an average of 36 records per name. In fact, half of
these names only occur once. Another 100.000 names occur less than 10 times. The
frequently used names occur up to 95,000 times. The variance is extremely high. In cases
of such extremely high variance, the use of statistics in this manner is not very accurate.

If a second indexed search parameter is provided, SQL Server might decide not to use this
index, but to use a filter to extract the second parameter. When in practice the actual row
count is much higher, this plan is less ideal and might lead to extreme query times.

The entry for Smit is found directly in the statistics. Therefore the estimated row count
equals the actual row count of 48,760. The same applies to the street name 'Kerkstraat'
with a rowcount of 42,722. If a search is made for last name 'Smit' and street name
'Kerkstraat' the following plan is used:
Figure 4: Execution Plan - Search for Smit and Kerkstraat

As can be seen, optimal use is made of the two available indexes. If the same query is
made for 'Smits', where the estimated row count is 144 (as calculated above) and an
actual row count of 25,718 (a given) and 'Kerkstraat', the following plan is used:
Figure 5: Execution Plan - Search for Smits and Kerkstraat

Because SQL Server expects 144 rows, the effort of filtering the records for the second
parameter is predicted to be faster than to use two indexes and cross referencing these.
However, the actual row count is much higher, causing this plan to be far from ideal.

Even though the number of occurrences for 'Smits' is approximately half of the number for
'Smit', this query takes three and a half minutes to execute while the query for Smit only
takes about 1 second.

Work Around

So how do you solve this problem? Statistics can not be expanded. They are designed to
be kept small, so that the process of determining an execution plan is kept small. Even if
the statistics could be doubled in size, this problem will still arise.

The option of using an index hint will only be helpful if used in all the queries with a search
parameter in this field. Adding an index hint is a manual job, requiring knowledge of the
dataset. A human will encounter the same difficulties in determining a best use of indexes
as SQL Server itself. Also, automating the use of index hints in a production environment is
time-consuming and far from efficient.
In these situations, a work around is the best option. By changing the specific index into a
clustered index, a bookmark lookup is no longer necessary. This saves considerable time
and effort. If the statistics are off and the number of hits exceeds the expected number, all
relevant records are clustered together and require little extra time.

In the above examples, if the index on gnaam is a clustered index, the execution plan is as
follows:

Figure 6: Execution Plan - Search for Smits and Kerkstraat with Clustered Index

This query takes 7 seconds to execute. A significant improvement.

Index Planning

The effect of changing currently existing clustered index into a non-clustered index should
be examined. There can only be one clustered index on a table. In general, the difference
in performance for a field containing unique values is minimal between a clustered and a
non-clustered index. If, however, this field is used in a one to many relationships, the
general performance for the overall queries to the database might decline.
It is wise to take all these aspects into consideration when designing the index structure for
your databases. It is often worth the effort to try different configurations and go with the
one best suitable. Of course using the Index Tuning Wizard will help in this process.