How to Optimize Queries (Theory and Practice)


August 7, 2001

This article assumes you already know SQL and want to optimize queries.

The advice applies to queries on any database that supports SQL-92 or later, and much of it also helps when optimizing programs that don't use a database.

The reasons to optimize


Time is money and people don't like to wait, so programs are expected to be fast.

On the Internet and in client/server programming, this is even more true: many people are waiting for the DB at the same time, which makes response times even longer.

Faster servers help, but their effect is often small compared to the speed of the algorithm used. Therefore, the solution lies in optimization.

Theory of optimization
There are many ways to optimize Databases and queries. My method is the following.

Look at the DB Schema and see if it makes sense

Databases often have poor designs and are not normalized, which can greatly affect their speed. As a general rule, learn the three Normal Forms and apply them at all times. The normal forms above the 3rd Normal Form are stricter still and are rarely needed in everyday design. De-normalization is something different: it deliberately breaks some normalization rules to make the Database faster.

What I suggest is to stick to the 3rd Normal Form unless you are a DBA (which means you know the higher forms and the trade-offs of de-normalization, and know what you're doing). De-normalization is usually done at a later time, once a real performance problem shows up, not during design.

Only query what you really need

Filter as much as possible

Your Where Clause is the most important part for optimization.

Select only the fields you need

Avoid "Select *" -- specify only the fields you need; the query will be faster and will use less bandwidth.

Be careful with joins

Joins are expensive in terms of time. Make sure that you use all the keys that relate the two tables together, and don't join to tables you don't use. Always try to join on indexed fields. The join type is important as well (INNER, OUTER, ...).

Optimize queries and stored procedures (Most Run First)

Queries are normally very fast. You can generally retrieve many records in less than a second, even with joins, sorting and calculations. As a rule of thumb, if your query takes longer than a second, you can probably optimize it.

Start with the queries that are run most often, as well as the queries that take the most time to execute.

Add, remove or modify indexes

If your query does Full Table Scans, indexes and proper filtering can solve what is normally a very time-consuming process. All primary keys need indexes because they make joins faster (most databases create that index automatically when you declare the key). This also means that all tables need a primary key. You can also add indexes on the fields you often use for filtering in the Where Clauses.

Indexes work best on compact fields such as Integers and other Numbers. A Boolean field holds only two values, so an index on it helps only when the value you filter on is rare. On the other hand, you probably don't want to use indexes on Blobs, long VarChars and Long Strings.

Be careful when adding indexes, because they need to be maintained by the database. If you do many updates on an indexed field, maintaining the index might take more time than it saves.

In the Internet world, read-only tables are very common. When a table is read-only, you can add indexes with less negative impact because the indexes don't need to be maintained (or only rarely need maintenance).
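The effect of an index on a filtered query can be checked from the execution plan. The following is a minimal sketch in Python, using the built-in sqlite3 module with an in-memory database. The Employees layout, the DeptID column and the index name idx_employees_dept are assumptions made for the illustration, and other databases expose their plans through different commands.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the human-readable plan lines SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the plan detail


conn = sqlite3.connect(":memory:")
# Hypothetical table, only for this illustration.
conn.execute(
    "CREATE TABLE Employees ("
    "EmpID INTEGER PRIMARY KEY, Name TEXT, DeptID INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees (Name, DeptID, Salary) VALUES (?, ?, ?)",
    [("emp%d" % i, i % 10, 1000 + i) for i in range(1000)],
)

query = "SELECT Name, Salary FROM Employees WHERE DeptID = 1"

# Plan without an index on the filtered column.
plan_before = plan_for(conn, query)

# Add an index on the column used in the Where Clause and look again.
conn.execute("CREATE INDEX idx_employees_dept ON Employees (DeptID)")
plan_after = plan_for(conn, query)

print(plan_before)
print(plan_after)
```

On SQLite, the first plan typically reports a scan of the whole table and the second a search that uses the new index; the exact wording differs between versions.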


Move Queries to Stored Procedures (SP)

Stored Procedures are usually better and faster than queries sent from the program, for the following reasons:

1. Stored Procedures are typically precompiled, or have their execution plan cached by the database, while ad hoc SQL code may need to be parsed and planned on each call. This usually makes them faster.

2. SPs don't use as much bandwidth because you can do many queries in one SP. Intermediate results also stay on the server; only the final results are returned.

3. Stored Procedures are run on the server, which is typically faster.

4. Calculations on the data done in code (VB, Java, C++, ...) are not as fast as in an SP in most cases, because the data has to travel to the program first.

5. It keeps your DB access code separate from your presentation layer, which makes it easier to maintain (three-tier model).

Remove unneeded Views

Views are a special type of query -- they are not tables. They are logical and not physical, so every time you run select * from MyView, you run the query that makes the view and your query on the view.

If you always need the same information, views could be good.

If you have to filter the View, it's like running a query on a query -- it can be slower, depending on how well the DB's optimizer merges the two.

Tune DB settings

You can tune the DB in many ways: update the statistics used by the optimizer, run optimization options, make the DB read-only, etc. That takes a broader knowledge of the DB you work with and is mostly done by the DBA.

Using Query Analyzers

Many databases come with a tool for running and optimizing queries. SQL Server has a tool called the Query Analyzer, which is very useful for optimizing. You can write queries, execute them and, more importantly, see the execution plan. You use the execution plan to understand what SQL Server does with your query.


Optimization in Practice

Example 1:

I want to retrieve the name and salary of the employees of the R&D department.

Original:

Query : Select * From Employees

In Program : Add a filter on Dept or use a command such as : if Dept = "R&D"

Corrected :

Select Name, Salary From Employees Where Dept = 'R&D'

In the corrected version, the DB filters the data, because it filters faster than the program.

Also, you only need the Name and Salary, so only ask for that.

Much less data will travel over the network, and therefore your performance will improve.
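As a rough illustration of Example 1, here is a small sketch in Python with the built-in sqlite3 module. The table contents and the extra Notes column are made up for the demonstration; the point is only that both approaches give the same answer while the corrected query fetches far less data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Employees table; Notes stands in for columns we do not need.
conn.execute(
    "CREATE TABLE Employees ("
    "EmpID INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, Salary INTEGER, Notes TEXT)"
)
conn.executemany(
    "INSERT INTO Employees (Name, Dept, Salary, Notes) VALUES (?, ?, ?, ?)",
    [
        ("Ann", "R&D", 5000, "long text we never read"),
        ("Bob", "Sales", 4000, "long text we never read"),
        ("Cid", "R&D", 5500, "long text we never read"),
        ("Dee", "HR", 4500, "long text we never read"),
    ],
)

# Original: fetch every row and every column, then filter in the program.
all_rows = conn.execute("SELECT * FROM Employees").fetchall()
filtered_in_program = [(row[1], row[3]) for row in all_rows if row[2] == "R&D"]

# Corrected: let the DB filter, and ask only for the fields we need.
filtered_in_db = conn.execute(
    "SELECT Name, Salary FROM Employees WHERE Dept = ?", ("R&D",)
).fetchall()

print(len(all_rows), "rows fetched by the original query")
print(len(filtered_in_db), "rows fetched by the corrected query")
```

The corrected query also passes the department as a bound parameter instead of pasting it into the SQL text, which is a common habit but not something the example depends on.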

Example 2 (Sorting):
Original:

Select Name, Salary
From Employees
Where Dept = 'R&D'
Order By Salary

Do you need that Order By Clause? Often, people use Order By in development to check that the returned data is OK; remove it if you don't need it.

If you do need the data sorted, do it in the query, not in the program.

Example 3:

Original:

For i = 1 to 2000

Call Query : Select salary From Employees Where EmpID = Parameter(i)

Next i

Corrected (assuming the parameters are the EmpIDs 1 to 2000):

Select salary From Employees Where EmpID >= 1 and EmpID <= 2000

The original runs 2,000 separate queries, which uses a lot of network bandwidth and will make your whole system slow.
You should do as much as possible in the Query or Stored Procedure. Going back and forth is plain stupid.

Although this example seems simple, there are more complex examples on that theme.

Sometimes, the processing is so heavy that you think it's better to do it in the code, but it's probably not.

Sometimes, your Stored Procedure will be better off creating a temporary table, inserting data in it and returning it than going back and forth 10,000 times. You might have a slower query that saves time on a greater number of records or that saves bandwidth.
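Example 3 can be tried out with a short sketch in Python and the built-in sqlite3 module. The table contents are invented, and the parameters are assumed to be simply the EmpIDs 1 to 2000. With an in-memory database there is no network involved, so this only shows that the single query returns the same data as the 2,000 round trips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees (EmpID, Salary) VALUES (?, ?)",
    [(emp_id, 30000 + emp_id) for emp_id in range(1, 3001)],
)

# Original: one query per employee, 2,000 round trips to the DB.
one_by_one = []
for emp_id in range(1, 2001):
    row = conn.execute(
        "SELECT Salary FROM Employees WHERE EmpID = ?", (emp_id,)
    ).fetchone()
    one_by_one.append(row[0])

# Corrected: a single query over the whole range of IDs.
in_one_query = [
    row[0]
    for row in conn.execute(
        "SELECT Salary FROM Employees "
        "WHERE EmpID >= 1 AND EmpID <= 2000 ORDER BY EmpID"
    )
]

print(len(one_by_one), len(in_one_query))
```

The Order By is there only so the two lists can be compared position by position.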

Example 4 (Weak Joins):

You have two tables Orders and Customers. Customers can have many orders.

Original:

Select O.ItemPrice, C.Name
From Orders O, Customers C

Corrected:

Select O.ItemPrice, C.Name
From Orders O, Customers C
Where O.CustomerID = C.CustomerID


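In the original, the join condition was not there at all (the same problem appears when the join is not made on all the keys). Without it, the DB pairs every order with every customer, which returns so many records that your query might take hours. It's a common mistake for beginners.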

Corrected 2:

Depending on the DB you use, you will need to specify the Join type you want in different ways.

In SQL Server, the query would be written as:

Select O.ItemPrice, C.Name
From Orders O INNER JOIN Customers C ON O.CustomerID = C.CustomerID

Choose the right join type (INNER, OUTER, LEFT, ...).

Note that in SQL Server, Microsoft suggests you write joins as in Corrected 2 instead of putting the join in the Where Clause. The explicit form is clearer and makes it harder to forget a join condition; whether it also runs faster depends on the optimizer.
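To see how quickly a missing join condition blows up the result, here is a small sketch in Python with the built-in sqlite3 module. The 5 customers and 20 orders are invented data, and count_rows is a hypothetical helper written for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                         CustomerID INTEGER, ItemPrice REAL);
""")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(c, "customer%d" % c) for c in range(1, 6)],          # 5 customers
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(o, (o % 5) + 1, 10.0 * o) for o in range(1, 21)],    # 20 orders
)


def count_rows(sql):
    """Count the rows a query returns without fetching them all."""
    return conn.execute("SELECT COUNT(*) FROM (" + sql + ")").fetchone()[0]


# Original: no join condition, every order is paired with every customer.
no_join = count_rows("SELECT O.ItemPrice, C.Name FROM Orders O, Customers C")

# Corrected: join condition in the Where Clause.
where_join = count_rows(
    "SELECT O.ItemPrice, C.Name FROM Orders O, Customers C "
    "WHERE O.CustomerID = C.CustomerID"
)

# Corrected 2: explicit INNER JOIN.
inner_join = count_rows(
    "SELECT O.ItemPrice, C.Name FROM Orders O "
    "INNER JOIN Customers C ON O.CustomerID = C.CustomerID"
)

print(no_join, where_join, inner_join)  # prints: 100 20 20
```

With real tables of thousands of customers and millions of orders, the same multiplication is what makes the original query run for hours.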

Example 5 (Weak Filters):

This is a more complicated example, but it illustrates filtering at its best.

We have two tables -- Products (ProductID, DescID, Price) and Description (DescID, LangID, Text). There are 100,000 Products and unfortunately we need them all.

There are 100 languages (LangID 1 is English). We only want the English descriptions for the products.
We are expecting 100,000 rows (ProductName, Price).

First try:

Select D.Text As ProductName, P.Price
From Products P INNER JOIN Description D On P.DescID = D.DescID
Where D.LangID = 1

That works, but it can be really slow: if the optimizer does not apply the filter first, your DB needs to match 100,000 Products with 10,000,000 Description records (100,000 products x 100 languages) and only then filter on LangID = 1.

The solution is to filter on LangID = 1 before joining the tables.

Corrected:

Select D.Text As ProductName, P.Price
From (Select DescID, Text From Description Where LangID = 1) D
INNER JOIN Products P On D.DescID = P.DescID

Now the join only has to match 100,000 Products with the 100,000 English descriptions, which will be much faster. You could also make that query a Stored Procedure to make it faster still.
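Both forms of the Example 5 query can be compared with a sketch in Python and the built-in sqlite3 module. It uses a scaled-down data set (50 products and 4 languages, both invented) and assumes the column is named LangID, as in the queries. It only checks that the two queries return the same rows; how much faster the filtered version runs depends on the database and its optimizer.

```python
import sqlite3

N_PRODUCTS, N_LANGS = 50, 4  # scaled down from 100,000 products x 100 languages

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY,
                           DescID INTEGER, Price REAL);
    CREATE TABLE Description (DescID INTEGER, LangID INTEGER, Text TEXT,
                              PRIMARY KEY (DescID, LangID));
""")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(p, p, 1.5 * p) for p in range(1, N_PRODUCTS + 1)],
)
conn.executemany(
    "INSERT INTO Description VALUES (?, ?, ?)",
    [
        (p, lang, "product %d in language %d" % (p, lang))
        for p in range(1, N_PRODUCTS + 1)
        for lang in range(1, N_LANGS + 1)
    ],
)

# First try: join everything, then filter on the language.
first_try = conn.execute("""
    SELECT D.Text AS ProductName, P.Price
    FROM Products P INNER JOIN Description D ON P.DescID = D.DescID
    WHERE D.LangID = 1
""").fetchall()

# Corrected: keep only the English descriptions, then join.
corrected = conn.execute("""
    SELECT D.Text AS ProductName, P.Price
    FROM (SELECT DescID, Text FROM Description WHERE LangID = 1) D
    INNER JOIN Products P ON D.DescID = P.DescID
""").fetchall()

print(len(first_try), len(corrected))
```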

Example 6 (Views):
Create View v_Employees AS

Select * From Employees

Select * From v_Employees

This is just like running Select * From Employees twice.

You should not use the view in that case.

If you always use the data for the R&D employees and don't want to give everyone rights on that table because salaries are confidential, you could use a view like this:

Create View v_RDEmployees AS
Select Name, Salary From Employees Where Dept = 1

(Dept 1 is R&D. The view name is written without the "&", which most databases do not accept in an unquoted identifier.)

You would then give the rights on the view v_RDEmployees to some people and restrict the rights on the Employees table to the DBA only.

That would possibly be a good use of views.
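The restricted view can be sketched in Python with the built-in sqlite3 module. The employee rows are invented, and the view is named v_RDEmployees (without the "&") so that it is a valid identifier. SQLite has no user rights, so the granting step is not shown; the sketch only demonstrates that the view exposes just the R&D rows and just the two chosen fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, Name TEXT,
                            Dept INTEGER, Salary INTEGER);
    INSERT INTO Employees (Name, Dept, Salary) VALUES
        ('Ann', 1, 5000), ('Bob', 2, 4000), ('Cid', 1, 5500), ('Dee', 3, 4500);

    -- Dept 1 is R&D; the view hides every other department and column.
    CREATE VIEW v_RDEmployees AS
        SELECT Name, Salary FROM Employees WHERE Dept = 1;
""")

cursor = conn.execute("SELECT * FROM v_RDEmployees ORDER BY Name")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)  # prints: ['Name', 'Salary']
print(view_rows)     # prints: [('Ann', 5000), ('Cid', 5500)]
```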

Conclusion
I hope this will help you make your queries faster and your databases more optimized. This should make your program look better and can possibly mean money, especially for high-load web applications, where it means your program can serve more transactions per hour and you often get paid by transaction.

While you can put the above examples into practice in your database of choice, the preceding tips are especially true for major databases like Oracle or SQL Server.

37 Comments

By robin March 20 2009 3:49 AMPDT


Hi frn

i have 2 query

SQL1 : select intake from class ;

SQL 2 : select Intake from class where intake = SQL1.intake

how can i combine these two also it shouldnt take lot of time for execution ...

thnaks in advance..

this is just an sample to get an idea for me,...



Reply by prashant July 15 2009 3:58 PMPDT
hint: do a self join.

Reply by Ugo Conti January 9 2011 2:24 PMPDT


use exist, if u only want to valid the key

By Ramakrishna March 26 2009 10:19 AMPDT


Thanks for the above information and that information is good . Can you show some more examples.


By Beginer April 30 2009 8:51 AMPDT


Select t1.intake from class t1 inner join class t2 on t1.intake = t2.intake

--(and t1.rows = t2.rows)


By Hemalatha May 18 2009 3:49 AMPDT


Thanks for the above information. It helped me a lot..Great work..Plz keep on updating the information to help us more.


By Siva May 22 2009 2:57 AMPDT


Hi, this is very nice article. its very useful for me for optimizing queries. thank u so much. keep on giving..... cheers, Siva


By anonymous May 25 2009 5:19 AMPDT


This is a good article..

By anonymous June 5 2009 5:30 AMPDT


Good article


By sajan July 15 2009 2:23 AMPDT


wonderfull articile......

my bow to the author


By fernet July 17 2009 3:23 AMPDT


To be honest, only example 5 help me, but thanks for that ;)


By Peeyush August 7 2009 4:59 AMPDT


Nice simple code/examples. Thanks


By hana September 2 2009 12:36 PMPDT


Very helpful, thanks.


By Rajesh September 16 2009 4:39 AMPDT


Nice Example. Can we ahve more examples? Thanks.



By bob ama October 6 2009 4:42 AMPDT
"Even if you use faster servers, this has been proven to be a small factor compared to the speed of the algorithm used. Therefore,

the solution lies in optimization."

Utter garbage

Just adding more memory normally solves numerous problems and is a lot cheaper than spending time optimising something


By hardit October 9 2009 3:18 AMPDT


nice xplanation..........................

good


By sash October 16 2009 3:00 AMPDT


hi guys,

how can i optimize this query?

select count(*) from SMIS.APERS P, SMIS.ASUBJECT S

where P.apnum =S.apnum;

thank you very much my graduation depends on this!


By this is a good artical November 25 2009 8:13 AMPDT


This is good article for beginners;

By By pravin Gotmare November 25 2009 8:15 AMPDT


This is good article for beginners as well as experience people too.


By Jai January 11 2010 12:30 PMPDT


This article is simply superb and good for everyone to learn.Thanks 2 the author.


By newbee January 12 2010 4:53 PMPDT


select * from vw_TERR_ACTIVITY join

S_OPTY on (vw_TERR_ACTIVITY.OPTY_ID = S_OPTY.ROW_ID) join

S_REVN on S_REVN.OPTY_ID = S_OPTY.ROW_ID left join

S_PROD_INT on S_REVN.PROD_ID = S_PROD_INT.ROW_ID join

Interface.dbo.VIASYS_REPORTS on (Interface.dbo.VIASYS_REPORTS.PROD_NAME = S_PROD_INT.NAME)

where Interface.dbo.VIASYS_REPORTS.REPORT_ID = '16'
