
SQL joins - multi-table queries

By Ian Gilfillan
Most of you should be familiar with basic queries, SELECTs, UPDATEs and DELETEs using one table. But to harness the true power of relational databases it is vital to master queries using more than one table. This article will introduce you to database joins - queries using 2 or more tables.
This article assumes you know how to perform basic SQL queries. If you don't, I suggest you read Simple SQL: Getting Started With SQL first. A warning though - SQL implementations are notoriously non-standard, with almost every DBMS having its own extensions, as well as exclusions, especially when it gets to the realm of inner and outer joins. And version by version, they change. So, although most of these examples should work with most implementations, they won't work with all. The final word should come from the documentation of your particular database installation.
Let's do a quick recap for those who may be unsure. We will perform queries on the table below, containing data about tourist guides. The table is defined with:
CREATE TABLE tour_guides(
employee_number INT NOT NULL,
employee_name VARCHAR(100),
hourly_rate INT,
PRIMARY KEY(employee_number));
To add records into this table, we use:
INSERT INTO tour_guides(
employee_number,
employee_name,
hourly_rate)
VALUES(981,'Xoliswe Xaba',25);
INSERT INTO tour_guides(
employee_number,
employee_name,
hourly_rate)
VALUES(978,'Siyabonge Nomvete',30);
INSERT INTO tour_guides(
employee_number,
employee_name,
hourly_rate)
VALUES(942,'Jean-Marc Ithier',35);
INSERT INTO tour_guides(
employee_number,
employee_name,
hourly_rate)
VALUES(923,'Tyrone Arendse',32);
Note that you can also use a shortcut INSERT statement, leaving out the field names if you supply a value for every field, in the order the fields were defined, e.g.:
INSERT INTO tour_guides VALUES(982,'Mathew Booth',25);
I don't suggest using the shortcut, however. Particularly if you're running your SQL from within an application, the table structure may change, and then the SQL may no longer be valid. For example, I may add another field, such as months_employed. Now the above INSERT statement will not work. You will get an error something like this (MySQL's wording):
Column count doesn't match value count at row 1
If you had written the statement as:
INSERT INTO tour_guides(
employee_number,
employee_name,
hourly_rate)
VALUES(982,'Mathew Booth',25);
it would have worked. For this reason, I suggest you don't use the shortcut - it makes your queries much less flexible, and less able to survive changes in table structure.
After the above, the table contains:
employee_number  employee_name      hourly_rate
978              Siyabonge Nomvete  30
942              Jean-Marc Ithier   35
923              Tyrone Arendse     32
981              Xoliswe Xaba       25
982              Mathew Booth       25
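To see the shortcut problem for yourself, here is a minimal sketch using Python's built-in sqlite3 module and a throwaway in-memory SQLite database (my choice for illustration; the article is not tied to any one DBMS). SQLite words its error differently from the MySQL message quoted above, but the statement fails for the same reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tour_guides(
    employee_number INT NOT NULL,
    employee_name VARCHAR(100),
    hourly_rate INT,
    PRIMARY KEY(employee_number))""")

# The table structure changes after the application's SQL was written.
conn.execute("ALTER TABLE tour_guides ADD months_employed INT")

# Shortcut INSERT: depends on the number and order of the table's fields.
shortcut_error = None
try:
    conn.execute("INSERT INTO tour_guides VALUES(982,'Mathew Booth',25)")
except sqlite3.Error as exc:
    shortcut_error = str(exc)
print("shortcut INSERT failed:", shortcut_error)

# Explicit field list: still valid; the new field is simply left NULL.
conn.execute("""INSERT INTO tour_guides(
    employee_number, employee_name, hourly_rate)
    VALUES(982,'Mathew Booth',25)""")
print(conn.execute("SELECT * FROM tour_guides").fetchall())
# [(982, 'Mathew Booth', 25, None)]
```

The explicit version leaves months_employed as NULL (None in Python) for the new row, which an UPDATE can fill in later.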
Deleting data from the table is also easy. Perhaps we have entered Xoliswe Xaba prematurely, as he has not accepted our generous offer of employment, and taken another job. To remove him, we use the statement:
DELETE FROM tour_guides WHERE employee_number=981;
We use the employee number, and not the employee name, because according to our table definition the employee number is unique (the primary key). It is impossible for there to be another employee 981, while, unlikely as it may seem, there may, at least in theory, be another Xoliswe Xaba. This is why we almost always give each table a field that holds a unique code for every record.
Let's assume we need to know how long someone has worked for us, so we add the months_employed field talked about earlier. We use:
ALTER TABLE tour_guides ADD months_employed INT;
After the DELETE, and the creation of the new field, the table now looks as follows:
employee_number  employee_name      hourly_rate  months_employed
978              Siyabonge Nomvete  30           0
942              Jean-Marc Ithier   35           0
923              Tyrone Arendse     32           0
982              Mathew Booth       25           0
See how all the months_employed values show as zero. Be aware that this depends on your DBMS and on how the field is declared: many DBMSs, PostgreSQL and SQLite among them, fill a newly added column with NULL unless you give it a default, for example ALTER TABLE tour_guides ADD months_employed INT DEFAULT 0. To start adding the months_employed values, we use the UPDATE statement, as follows:
UPDATE tour_guides
SET months_employed=6
WHERE employee_number=978;
UPDATE tour_guides
SET months_employed=12
WHERE employee_number=942;
UPDATE tour_guides
SET months_employed=6
WHERE employee_number=923;
Now our table contains the following:
employee_number  employee_name      hourly_rate  months_employed
978              Siyabonge Nomvete  30           6
942              Jean-Marc Ithier   35           12
923              Tyrone Arendse     32           6
982              Mathew Booth       25           0
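Whether you actually see zeros after the ALTER TABLE depends on your DBMS. The following sketch, again assuming Python's sqlite3 module and an in-memory SQLite database, replays the DELETE, ALTER TABLE and UPDATE steps twice: once with the ALTER TABLE exactly as written, where SQLite leaves the new field NULL, and once with an explicit DEFAULT 0, which reproduces the table above. The setup uses a single multi-row INSERT for brevity.

```python
import sqlite3

# The recap data, inserted with explicit field lists as recommended above.
SETUP = """
CREATE TABLE tour_guides(
  employee_number INT NOT NULL,
  employee_name VARCHAR(100),
  hourly_rate INT,
  PRIMARY KEY(employee_number));
INSERT INTO tour_guides(employee_number, employee_name, hourly_rate)
VALUES (981,'Xoliswe Xaba',25), (978,'Siyabonge Nomvete',30),
       (942,'Jean-Marc Ithier',35), (923,'Tyrone Arendse',32),
       (982,'Mathew Booth',25);
-- Delete by primary key, so exactly one row can match.
DELETE FROM tour_guides WHERE employee_number=981;
"""

def guides_after(alter_sql):
    """Build a fresh in-memory database, then add the new field."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(alter_sql)
    return conn

# As written in the article: SQLite fills existing rows with NULL (None).
plain = guides_after("ALTER TABLE tour_guides ADD months_employed INT")
plain_months = [m for (m,) in plain.execute(
    "SELECT months_employed FROM tour_guides")]
print(plain_months)  # [None, None, None, None]

# With an explicit DEFAULT, existing rows start at zero as in the table above.
conn = guides_after(
    "ALTER TABLE tour_guides ADD months_employed INT DEFAULT 0")
conn.executescript("""
UPDATE tour_guides SET months_employed=6 WHERE employee_number=978;
UPDATE tour_guides SET months_employed=12 WHERE employee_number=942;
UPDATE tour_guides SET months_employed=6 WHERE employee_number=923;
""")
final_rows = conn.execute("SELECT * FROM tour_guides").fetchall()
for row in final_rows:
    print(row)
```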
All of this putting data in, changing and taking out is of course of no use unless you can retrieve the data. This is done with the SELECT statement. For example,
SELECT employee_name FROM tour_guides;
returns
employee_name
Siyabonge Nomvete
Jean-Marc Ithier
Tyrone Arendse
Mathew Booth
while
SELECT employee_name FROM tour_guides WHERE hourly_rate>30;
returns
employee_name
Jean-Marc Ithier
Tyrone Arendse
If all this SELECT stuff is a bit over your head, and you're unsure of the powerful ways it can limit and perform calculations on the data, I suggest you first read a more introductory article, such as Simple SQL: Getting Started With SQL.
The second table
All of this so far should be familiar to regular readers of SQLwire. Now we introduce more tables, which will show the real power of relational databases. First, a quick introduction to relational databases. Why are they given this name? The answer comes from the fact that, unlike earlier database structures (hierarchical and network), relational databases allow potentially every file, or table, to relate to every other one. They do this by using a common field. Let's add another couple of tables. (For more on database design, I suggest you read the article on database normalization.) We add the following tables:
tour_locations
location_code  location_name
1024           Table Mountain
1025           Robben Island
1026           Kruger National Park
1027           St Lucia
tour_expeditions
location_code  employee_number  hours_worked  tourgroup_size
1024           978              5             8
1027           942              8             4
1025           923              3             20
1026           982              6             8
1024           978              5             8
1025           978              3             16
I assume by now you can do the CREATE and INSERT statements to populate the above tables.
Now you should be able to see the reason for the term relational database. The way these tables relate is by the common fields they have - tour_expeditions joins to tour_guides through the field employee_number, and to tour_locations through location_code. Note that the field names do not have to be the same in both tables, as long as their definitions are the same (i.e. both INT in this case). Try to make as many fields as possible NOT NULL (fields where there cannot logically be a NULL value). For example, the fields location_code and employee_number in the table tour_expeditions are good candidates. Make them NOT NULL now, and we'll reap the benefits later!
Now comes the crux. How would we answer the question "Which employees worked in which locations?" The secret here is to use the fields that relate in each table to join. Let's first answer a simpler question to introduce the concept: "Which employee_numbers worked in which locations?" We would use the following query:
SELECT employee_number,location_name
FROM tour_locations,tour_expeditions
WHERE tour_locations.location_code = tour_expeditions.location_code;
This returns:
employee_number  location_name
978              Table Mountain
942              St Lucia
923              Robben Island
982              Kruger National Park
978              Table Mountain
978              Robben Island
How did we get this query? The first part, immediately after the SELECT, lists the fields we want to return. Easy enough - employee_number and location_name. The second part, after the FROM, provides the tables that contain the fields. In this case it's clearly tour_locations for location_name. But which table to choose for employee_number? Both tour_expeditions and tour_guides contain this field. Here, we have to look at which table is related to tour_locations. Since tour_locations is related only to tour_expeditions (through the location_code field), we could only use tour_expeditions. And the third part, after the WHERE, tells us which fields the relation exists on, that is, which fields are being joined.
The usual SELECT rules apply. To bring back only the employee_numbers that gave a tour to Table Mountain, and to bring back only unique records (notice that the above query brought back a duplicate row, as there are 2 records that apply), we use:
SELECT DISTINCT employee_number,location_name
FROM tour_locations,tour_expeditions
WHERE tour_locations.location_code = tour_expeditions.location_code AND
location_name='Table Mountain';
employee_number  location_name
978              Table Mountain
is the only row returned. Note how the DISTINCT keyword collapses identical rows into one. Without it, we would have returned 2 identical rows, one for each time employee 978 gave a tour to Table Mountain.
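If you want to try these two queries, here is a sketch that loads the new tables into an in-memory SQLite database via Python's sqlite3 module. The CREATE statements are my assumption, since the article leaves them to the reader; the rows are those listed in the tables above.

```python
import sqlite3

# Assumed CREATE statements; rows as listed in the article's tables.
SETUP = """
CREATE TABLE tour_locations(
  location_code INT NOT NULL,
  location_name VARCHAR(100));
CREATE TABLE tour_expeditions(
  location_code INT NOT NULL,
  employee_number INT NOT NULL,
  hours_worked INT,
  tourgroup_size INT);
INSERT INTO tour_locations(location_code, location_name)
VALUES (1024,'Table Mountain'), (1025,'Robben Island'),
       (1026,'Kruger National Park'), (1027,'St Lucia');
INSERT INTO tour_expeditions(location_code, employee_number,
                             hours_worked, tourgroup_size)
VALUES (1024,978,5,8), (1027,942,8,4), (1025,923,3,20),
       (1026,982,6,8), (1024,978,5,8), (1025,978,3,16);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Join condition in the WHERE clause: one result row per matching pair.
all_pairs = conn.execute("""
SELECT employee_number,location_name
FROM tour_locations,tour_expeditions
WHERE tour_locations.location_code = tour_expeditions.location_code
""").fetchall()

# DISTINCT removes the duplicate produced by 978's two Table Mountain tours.
table_mountain = conn.execute("""
SELECT DISTINCT employee_number,location_name
FROM tour_locations,tour_expeditions
WHERE tour_locations.location_code = tour_expeditions.location_code AND
location_name='Table Mountain'
""").fetchall()

print(len(all_pairs))   # 6
print(table_mountain)   # [(978, 'Table Mountain')]
```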
How, then, do we return the names of the employees, as we originally requested, not just their numbers? To do so, we join the tour_expeditions table to the tour_guides table, on the employee_number field, as follows:
SELECT DISTINCT
tour_expeditions.employee_number,employee_name,location_name
FROM
tour_locations,tour_expeditions,tour_guides
WHERE tour_locations.location_code = tour_expeditions.location_code AND
tour_expeditions.employee_number=tour_guides.employee_number;
This brings back:
employee_number  employee_name      location_name
978              Siyabonge Nomvete  Table Mountain
923              Tyrone Arendse     Robben Island
978              Siyabonge Nomvete  Robben Island
982              Mathew Booth       Kruger National Park
942              Jean-Marc Ithier   St Lucia
Note the changes we made to our original join. We've added employee_name to the fields returned and tour_guides to the table list, and we've had to add the table name to the employee_number field of tour_expeditions, making it tour_expeditions.employee_number. Now that 2 tables contain an employee_number field, we need to specify which one to use; most DBMSs reject an unqualified name here as ambiguous. In this case the choice makes no difference to the results, since the join condition makes the two values equal, but with other kinds of join it can. And finally, we've added a join condition to the WHERE clause, pointing out the relation to use to join the 2 tables.
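The sketch below (Python's sqlite3 module with an in-memory SQLite database; the column types of tour_locations and tour_expeditions are assumed) runs the three-table join and then the same query with employee_number left unqualified. SQLite rejects the second version as ambiguous rather than guessing which table you meant.

```python
import sqlite3

# Article's tables and rows; the column types are assumptions.
SETUP = """
CREATE TABLE tour_guides(
  employee_number INT NOT NULL,
  employee_name VARCHAR(100),
  hourly_rate INT,
  months_employed INT,
  PRIMARY KEY(employee_number));
CREATE TABLE tour_locations(
  location_code INT NOT NULL,
  location_name VARCHAR(100));
CREATE TABLE tour_expeditions(
  location_code INT NOT NULL,
  employee_number INT NOT NULL,
  hours_worked INT,
  tourgroup_size INT);
INSERT INTO tour_guides(employee_number, employee_name,
                        hourly_rate, months_employed)
VALUES (978,'Siyabonge Nomvete',30,6), (942,'Jean-Marc Ithier',35,12),
       (923,'Tyrone Arendse',32,6), (982,'Mathew Booth',25,0);
INSERT INTO tour_locations(location_code, location_name)
VALUES (1024,'Table Mountain'), (1025,'Robben Island'),
       (1026,'Kruger National Park'), (1027,'St Lucia');
INSERT INTO tour_expeditions(location_code, employee_number,
                             hours_worked, tourgroup_size)
VALUES (1024,978,5,8), (1027,942,8,4), (1025,923,3,20),
       (1026,982,6,8), (1024,978,5,8), (1025,978,3,16);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

rows = conn.execute("""
SELECT DISTINCT
tour_expeditions.employee_number,employee_name,location_name
FROM tour_locations,tour_expeditions,tour_guides
WHERE tour_locations.location_code = tour_expeditions.location_code AND
tour_expeditions.employee_number=tour_guides.employee_number
""").fetchall()
for row in rows:
    print(row)

# Same query with employee_number unqualified: two tables have that field.
ambiguous_error = None
try:
    conn.execute("""
    SELECT DISTINCT employee_number,employee_name,location_name
    FROM tour_locations,tour_expeditions,tour_guides
    WHERE tour_locations.location_code = tour_expeditions.location_code AND
    tour_expeditions.employee_number=tour_guides.employee_number
    """)
except sqlite3.Error as exc:
    ambiguous_error = str(exc)
print(ambiguous_error)
```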
Left joins
Let's add another record to the tour_guides table.
INSERT INTO tour_guides(
employee_number,
employee_name)
VALUES(983,'Nelson Jiranga');
Now run the query again:
SELECT DISTINCT
tour_expeditions.employee_number,
employee_name,location_name
FROM tour_locations,
tour_expeditions,
tour_guides
WHERE tour_locations.location_code = tour_expeditions.location_code AND
tour_expeditions.employee_number=tour_guides.employee_number;
We get identical results:
employee_number  employee_name      location_name
978              Siyabonge Nomvete  Table Mountain
923              Tyrone Arendse     Robben Island
978              Siyabonge Nomvete  Robben Island
982              Mathew Booth       Kruger National Park
942              Jean-Marc Ithier   St Lucia
This makes sense, as our new tour guide has not yet undertaken any tours. He does not yet appear in the tour_expeditions table, so the join finds no tour_expeditions row to pair him with, and he is left out.
But what if we want all the employees back, regardless of whether they have undertaken a tour or not? We need to state this explicitly, and we do so using a LEFT JOIN (also called a LEFT OUTER JOIN). To introduce the concept, try the following query:
SELECT DISTINCT employee_name
FROM tour_guides
LEFT JOIN tour_expeditions ON
tour_guides.employee_number = tour_expeditions.employee_number;
This returns:
employee_name
Siyabonge Nomvete
Jean-Marc Ithier
Tyrone Arendse
Mathew Booth
Nelson Jiranga
Note the syntax is almost the same, except that the table names are separated by LEFT JOIN, not a comma, and ON is used for the fields to be joined, rather than WHERE.
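Here is a minimal sketch contrasting the comma join with the LEFT JOIN, using Python's sqlite3 module, an in-memory SQLite database and the article's rows including Nelson Jiranga (the column types are assumptions). Only the LEFT JOIN keeps the guide who has no tours.

```python
import sqlite3

# Article's tables and rows, plus Nelson Jiranga; column types are assumed.
SETUP = """
CREATE TABLE tour_guides(
  employee_number INT NOT NULL,
  employee_name VARCHAR(100),
  hourly_rate INT,
  months_employed INT,
  PRIMARY KEY(employee_number));
CREATE TABLE tour_expeditions(
  location_code INT NOT NULL,
  employee_number INT NOT NULL,
  hours_worked INT,
  tourgroup_size INT);
INSERT INTO tour_guides(employee_number, employee_name,
                        hourly_rate, months_employed)
VALUES (978,'Siyabonge Nomvete',30,6), (942,'Jean-Marc Ithier',35,12),
       (923,'Tyrone Arendse',32,6), (982,'Mathew Booth',25,0);
INSERT INTO tour_guides(employee_number, employee_name)
VALUES (983,'Nelson Jiranga');
INSERT INTO tour_expeditions(location_code, employee_number,
                             hours_worked, tourgroup_size)
VALUES (1024,978,5,8), (1027,942,8,4), (1025,923,3,20),
       (1026,982,6,8), (1024,978,5,8), (1025,978,3,16);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Comma join: Nelson has no tour_expeditions row, so no pair is formed.
inner_names = {name for (name,) in conn.execute("""
SELECT DISTINCT employee_name
FROM tour_guides,tour_expeditions
WHERE tour_guides.employee_number = tour_expeditions.employee_number
""")}

# LEFT JOIN: every tour_guides row is kept, whether it matches or not.
left_names = {name for (name,) in conn.execute("""
SELECT DISTINCT employee_name
FROM tour_guides
LEFT JOIN tour_expeditions ON
tour_guides.employee_number = tour_expeditions.employee_number
""")}

print(sorted(left_names - inner_names))  # ['Nelson Jiranga']
```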
So, going back to our original question - how do we return the employee numbers, names and locations of all guides, including those who have not yet given a tour? The query is as follows:
SELECT DISTINCT
tour_guides.employee_number,employee_name,location_name
FROM tour_guides
LEFT JOIN tour_expeditions ON
tour_guides.employee_number = tour_expeditions.employee_number
LEFT JOIN tour_locations ON
tour_locations.location_code=tour_expeditions.location_code;
This now returns:
employee_number  employee_name      location_name
978              Siyabonge Nomvete  Table Mountain
923              Tyrone Arendse     Robben Island
978              Siyabonge Nomvete  Robben Island
982              Mathew Booth       Kruger National Park
942              Jean-Marc Ithier   St Lucia
983              Nelson Jiranga     NULL
Note that Nelson Jiranga now appears in the results table, and as he has not yet led any tours, his location_name is NULL.
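This sketch (Python's sqlite3 module, an in-memory SQLite database, assumed column types) runs the three-table LEFT JOIN; Nelson Jiranga comes back as (983, 'Nelson Jiranga', None), None being Python's NULL. It also shows why the query takes employee_number from tour_guides: in Nelson's unmatched row every tour_expeditions field is NULL, including its employee_number.

```python
import sqlite3

# Article's tables and rows, plus Nelson Jiranga; column types are assumed.
SETUP = """
CREATE TABLE tour_guides(
  employee_number INT NOT NULL,
  employee_name VARCHAR(100),
  hourly_rate INT,
  months_employed INT,
  PRIMARY KEY(employee_number));
CREATE TABLE tour_locations(
  location_code INT NOT NULL,
  location_name VARCHAR(100));
CREATE TABLE tour_expeditions(
  location_code INT NOT NULL,
  employee_number INT NOT NULL,
  hours_worked INT,
  tourgroup_size INT);
INSERT INTO tour_guides(employee_number, employee_name,
                        hourly_rate, months_employed)
VALUES (978,'Siyabonge Nomvete',30,6), (942,'Jean-Marc Ithier',35,12),
       (923,'Tyrone Arendse',32,6), (982,'Mathew Booth',25,0);
INSERT INTO tour_guides(employee_number, employee_name)
VALUES (983,'Nelson Jiranga');
INSERT INTO tour_locations(location_code, location_name)
VALUES (1024,'Table Mountain'), (1025,'Robben Island'),
       (1026,'Kruger National Park'), (1027,'St Lucia');
INSERT INTO tour_expeditions(location_code, employee_number,
                             hours_worked, tourgroup_size)
VALUES (1024,978,5,8), (1027,942,8,4), (1025,923,3,20),
       (1026,982,6,8), (1024,978,5,8), (1025,978,3,16);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

rows = conn.execute("""
SELECT DISTINCT
tour_guides.employee_number,employee_name,location_name
FROM tour_guides
LEFT JOIN tour_expeditions ON
tour_guides.employee_number = tour_expeditions.employee_number
LEFT JOIN tour_locations ON
tour_locations.location_code=tour_expeditions.location_code
""").fetchall()
for row in rows:
    print(row)

# Taking the number from tour_expeditions instead loses it for Nelson.
nelson = conn.execute("""
SELECT tour_expeditions.employee_number,employee_name
FROM tour_guides
LEFT JOIN tour_expeditions ON
tour_guides.employee_number = tour_expeditions.employee_number
WHERE employee_name='Nelson Jiranga'
""").fetchall()
print(nelson)  # [(None, 'Nelson Jiranga')]
```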
Rewriting subselects as joins
Many existing queries make use of what are called subselects (selects within selects). For example, try the following query, which returns all employees who've worked with a tourgroup of more than 10 people:
SELECT employee_name
FROM tour_guides
WHERE employee_number IN
(SELECT employee_number
FROM tour_expeditions
WHERE tourgroup_size>10);
The results are:
employee_name
Siyabonge Nomvete
Tyrone Arendse
This query resolves in 2 steps - first the inner query (which returns 923 and 978) is resolved. Then we are left with
SELECT employee_name
FROM tour_guides
WHERE employee_number IN
(923,978);
which resolves to the results above. But we've already seen another, and usually better, way to do this: the join. You can rewrite this query as:
SELECT employee_name
FROM tour_guides,tour_expeditions
WHERE tourgroup_size>10 AND
tour_guides.employee_number=tour_expeditions.employee_number;
Why do I say this is better? There are 2 reasons. One is that some DBMSs (such as early versions of MySQL) do not support nested selects. The second is that a nested select can often be rewritten as a join, and the join is usually more efficient. On big, heavily used tables, where performance is vital, you will want to do without nested selects as much as possible.
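To check that the subselect and the join agree, here is a sketch using Python's sqlite3 module with an in-memory SQLite database (column types assumed). It also shows one caveat: the join returns one row per matching tour, so a guide with two large tours appears twice. The extra tour it inserts is hypothetical, not part of the article's data; SELECT DISTINCT restores the one-row-per-guide result of the subselect.

```python
import sqlite3

# Article's tables and rows; column types are assumed.
SETUP = """
CREATE TABLE tour_guides(
  employee_number INT NOT NULL,
  employee_name VARCHAR(100),
  hourly_rate INT,
  months_employed INT,
  PRIMARY KEY(employee_number));
CREATE TABLE tour_expeditions(
  location_code INT NOT NULL,
  employee_number INT NOT NULL,
  hours_worked INT,
  tourgroup_size INT);
INSERT INTO tour_guides(employee_number, employee_name,
                        hourly_rate, months_employed)
VALUES (978,'Siyabonge Nomvete',30,6), (942,'Jean-Marc Ithier',35,12),
       (923,'Tyrone Arendse',32,6), (982,'Mathew Booth',25,0);
INSERT INTO tour_guides(employee_number, employee_name)
VALUES (983,'Nelson Jiranga');
INSERT INTO tour_expeditions(location_code, employee_number,
                             hours_worked, tourgroup_size)
VALUES (1024,978,5,8), (1027,942,8,4), (1025,923,3,20),
       (1026,982,6,8), (1024,978,5,8), (1025,978,3,16);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

sub = sorted(n for (n,) in conn.execute("""
SELECT employee_name
FROM tour_guides
WHERE employee_number IN
(SELECT employee_number
FROM tour_expeditions
WHERE tourgroup_size>10)
"""))

JOIN_SQL = """
SELECT {distinct} employee_name
FROM tour_guides,tour_expeditions
WHERE tourgroup_size>10 AND
tour_guides.employee_number=tour_expeditions.employee_number
"""
joined = sorted(n for (n,) in conn.execute(JOIN_SQL.format(distinct="")))
print(sub == joined)  # True with the article's data

# Hypothetical extra tour (not in the article): a second large group for 978.
conn.execute("""INSERT INTO tour_expeditions(location_code, employee_number,
    hours_worked, tourgroup_size) VALUES(1027,978,4,12)""")
dup = [n for (n,) in conn.execute(JOIN_SQL.format(distinct=""))]
dedup = [n for (n,) in conn.execute(JOIN_SQL.format(distinct="DISTINCT"))]
print(dup.count("Siyabonge Nomvete"), dedup.count("Siyabonge Nomvete"))  # 2 1
```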
Let's take another example. How could we find all tour guides who have not yet given a tour? We could write
SELECT employee_name
FROM tour_guides
WHERE employee_number NOT IN
(SELECT employee_number
FROM tour_expeditions);
And this would return
employee_name
Nelson Jiranga
But, using the same principle as before, we could rewrite this as a join, in this case a LEFT JOIN (remember that a LEFT JOIN returns every row of the left-hand table, even when nothing in the right-hand table matches it). Try the following:
SELECT employee_name
FROM tour_guides
LEFT JOIN tour_expeditions ON
tour_guides.employee_number = tour_expeditions.employee_number
WHERE tour_expeditions.employee_number IS NULL;
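Now we see an advantage of declaring employee_number NOT NULL. Since the field can never hold a NULL of its own, a NULL in tour_expeditions.employee_number can only mean that the LEFT JOIN found no matching row, so the IS NULL test reliably picks out the guides who have not given a tour. This kind of query is often more efficient than the nested select. Declaring a field NOT NULL can also save space, as the DBMS does not have to record whether the field is NULL or not.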
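The sketch below (Python's sqlite3 module, an in-memory SQLite database, assumed column types) confirms that the NOT IN subselect and the LEFT JOIN ... IS NULL rewrite both find Nelson Jiranga. It also shows a trap in the NOT IN form: if the subquery yields even one NULL, NOT IN is never true and the query returns nothing. The UNION ALL SELECT NULL is only there to simulate such a NULL.

```python
import sqlite3

# Article's tables and rows, plus Nelson Jiranga; column types are assumed.
SETUP = """
CREATE TABLE tour_guides(
  employee_number INT NOT NULL,
  employee_name VARCHAR(100),
  hourly_rate INT,
  months_employed INT,
  PRIMARY KEY(employee_number));
CREATE TABLE tour_expeditions(
  location_code INT NOT NULL,
  employee_number INT NOT NULL,
  hours_worked INT,
  tourgroup_size INT);
INSERT INTO tour_guides(employee_number, employee_name,
                        hourly_rate, months_employed)
VALUES (978,'Siyabonge Nomvete',30,6), (942,'Jean-Marc Ithier',35,12),
       (923,'Tyrone Arendse',32,6), (982,'Mathew Booth',25,0);
INSERT INTO tour_guides(employee_number, employee_name)
VALUES (983,'Nelson Jiranga');
INSERT INTO tour_expeditions(location_code, employee_number,
                             hours_worked, tourgroup_size)
VALUES (1024,978,5,8), (1027,942,8,4), (1025,923,3,20),
       (1026,982,6,8), (1024,978,5,8), (1025,978,3,16);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

not_in = conn.execute("""
SELECT employee_name
FROM tour_guides
WHERE employee_number NOT IN
(SELECT employee_number
FROM tour_expeditions)
""").fetchall()

anti_join = conn.execute("""
SELECT employee_name
FROM tour_guides
LEFT JOIN tour_expeditions ON
tour_guides.employee_number = tour_expeditions.employee_number
WHERE tour_expeditions.employee_number IS NULL
""").fetchall()
print(not_in, anti_join)  # [('Nelson Jiranga',)] [('Nelson Jiranga',)]

# The NOT IN trap: one NULL among the subquery's values and no row passes.
with_null = conn.execute("""
SELECT employee_name
FROM tour_guides
WHERE employee_number NOT IN
(SELECT employee_number FROM tour_expeditions UNION ALL SELECT NULL)
""").fetchall()
print(with_null)  # []
```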
Self joins
For this exercise, we first need to INSERT another tour guide, as follows:
INSERT INTO tour_guides
VALUES(999,'Jon Qwelane',30,5);
Now consider another request. We want to find the names of all the employees who have the same hourly rate as Siyabonge Nomvete. Again, we can do this with a nested select:
SELECT employee_name
FROM tour_guides
WHERE hourly_rate IN
(SELECT hourly_rate
FROM tour_guides
WHERE employee_name='Siyabonge Nomvete');
But again, a join is preferable. In this case it will be a self-join, as all the data that we need is in the one table, tour_guides. So, we could use the following, more efficient, query:
SELECT e1.employee_name
FROM tour_guides e1,tour_guides e2
WHERE e1.hourly_rate=e2.hourly_rate AND
e2.employee_name='Siyabonge Nomvete';
This returns:
employee_name
Siyabonge Nomvete
Jon Qwelane
There are a few important points to notice here. We could not have used the following query, as some of you may have thought:
SELECT employee_name
FROM tour_guides
WHERE employee_number=employee_number AND
employee_name='Siyabonge Nomvete';
The reason is that we need to see the table as two separate tables to be joined. This query only returns "Siyabonge Nomvete", satisfying the final condition. In order to make the DBMS see the query as a join, we need to provide an alias for the tables. We give them the names e1 and e2.
Also important is why we use e1 in SELECT e1.employee_name and e2 in e2.employee_name='Siyabonge Nomvete'. These 2 have to come from the 2 'different' versions of the table. If we chose the employee_name from the same table that we impose the condition on (WHERE employee_name='Siyabonge Nomvete'), of course we're only going to get that one result back.
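Here is a sketch of the self-join (Python's sqlite3 module, an in-memory SQLite database, assumed column types), with Nelson Jiranga and Jon Qwelane added as in the article. It runs the nested select, the aliased self-join and the unaliased query side by side. Nelson's hourly_rate is NULL, so he never matches anyone.

```python
import sqlite3

# tour_guides as it stands at this point in the article; types are assumed.
SETUP = """
CREATE TABLE tour_guides(
  employee_number INT NOT NULL,
  employee_name VARCHAR(100),
  hourly_rate INT,
  months_employed INT,
  PRIMARY KEY(employee_number));
INSERT INTO tour_guides(employee_number, employee_name,
                        hourly_rate, months_employed)
VALUES (978,'Siyabonge Nomvete',30,6), (942,'Jean-Marc Ithier',35,12),
       (923,'Tyrone Arendse',32,6), (982,'Mathew Booth',25,0),
       (999,'Jon Qwelane',30,5);
INSERT INTO tour_guides(employee_number, employee_name)
VALUES (983,'Nelson Jiranga');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

nested = sorted(n for (n,) in conn.execute("""
SELECT employee_name
FROM tour_guides
WHERE hourly_rate IN
(SELECT hourly_rate
FROM tour_guides
WHERE employee_name='Siyabonge Nomvete')
"""))

# Aliases e1 and e2 give two independent copies of the same table.
self_join = sorted(n for (n,) in conn.execute("""
SELECT e1.employee_name
FROM tour_guides e1,tour_guides e2
WHERE e1.hourly_rate=e2.hourly_rate AND
e2.employee_name='Siyabonge Nomvete'
"""))

# Without aliases there is only one copy, so the name condition
# limits the result to Siyabonge alone.
no_alias = [n for (n,) in conn.execute("""
SELECT employee_name
FROM tour_guides
WHERE employee_number=employee_number AND
employee_name='Siyabonge Nomvete'
""")]

print(nested)     # ['Jon Qwelane', 'Siyabonge Nomvete']
print(self_join)  # ['Jon Qwelane', 'Siyabonge Nomvete']
print(no_alias)   # ['Siyabonge Nomvete']
```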
Good luck with your joins. Remember to keep it as simple as possible for your DBMS, with as few nested selects as possible, and you'll see the benefits in your applications!
