Design Theory for Relational DBs:
Functional Dependencies, Decompositions, Normal Forms
Introduction to Databases
CSCC43 Winter 2012
Ryan Johnson
Thanks to Manos Papagelis, John Mylopoulos, Arnold Rosenbloom
and Renee Miller for material in these slides

Database Design Theory
Guides systematic improvements to database schemas
General idea:
Express constraints on the data
Use these to decompose the relations
Ultimately, get a schema that is in a normal form that
guarantees certain desirable properties
"Normal" in the sense of conforming to a standard
The process of converting a schema to a normal form is called
normalization
Goal #1: remove redundancy
Consider this schema:

Student Name  Student Email   Course  Instructor
Xiao          xiao@gmail      CSCC43  Johnson
Xiao          xiao@gmail      CSCD08  Bretscher
Jaspreet      jaspreet@utsc   CSCC43  Johnson

What if
Xiao changes email addresses? (update anomaly: every row for Xiao must change)
Xiao drops CSCD08? (deletion anomaly: we also lose the fact that Bretscher teaches CSCD08)
UTSC creates a new course, CSCC44? (insertion anomaly: no row until some student enrolls)
Multiple relations => exponentially worse
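Goal #2: expressing constraints
Consider the following sets of schemas:
Students(utorid, name, email)
vs.
Students(utorid, name)
Emails(utorid, address)
(the first stores one email per student; the second can store several)
Consider also:
House(street, city, value, owner, propertyTax)
vs.
House(street, city, value, owner)
TaxRates(city, value, propertyTax)
(the second records that the tax depends only on city and value)
Dependencies and constraints are domain-dependent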
Part I:
Functional Dependencies

Functional dependencies
Let X, Y be sets of attributes from relation R
X → Y is an assertion about tuples in R
Any tuples which agree in all attributes of X must also agree in all
attributes of Y
"X functionally determines Y"
Or, "The values of attributes Y are a function of those in X"
Not necessarily an easy function to compute, mind you
=> Consider X → h, where h is the hash of attributes in X
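A functional dependency can be checked against one concrete relation instance by looking for two tuples that agree on X but disagree on Y. The Python sketch below assumes rows are dicts keyed by attribute name; the function name `fd_holds` and the sample rows are made up for illustration. Passing the check only means this instance does not violate X → Y; it cannot show that the FD holds over the whole domain.

```python
def fd_holds(rows, x, y):
    """True if no two rows agree on every attribute in x but differ on some attribute in y."""
    seen = {}  # X-values -> Y-values of the first row seen with those X-values
    for row in rows:
        x_vals = tuple(row[a] for a in x)
        y_vals = tuple(row[a] for a in y)
        if seen.setdefault(x_vals, y_vals) != y_vals:
            return False  # two tuples agree on X but disagree on Y
    return True

# Made-up sample rows, for illustration only
rows = [
    {"street": "1 Main", "city": "Toronto", "tax": 5},
    {"street": "2 Main", "city": "Toronto", "tax": 5},
    {"street": "1 Main", "city": "Ottawa", "tax": 4},
]
print(fd_holds(rows, ["city"], ["tax"]))     # True
print(fd_holds(rows, ["street"], ["city"]))  # False: "1 Main" appears in two cities
```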
Notational conventions
a, b, c: specific attributes
A, B, C: sets of (unnamed) attributes
abc → def: same as {a, b, c} → {d, e, f}
Most common to see singletons (X → y or abc → d)
FD: relaxes the concept of a key
Functional dependency: X → Y
Superkey: X → R
A superkey must include all remaining attributes
of the relation on the RHS
An FD can involve just a subset of them
Example:
Houses(street, city, value, owner, tax)
street, city → value, owner, tax (both FD and key)
city, value → tax (FD only)

Rules and principles about FDs
Rules
The splitting/combining rule
Trivial FDs
The transitive rule
Algorithms related to FDs
the closure of a set of attributes of a relation
a minimal basis of a relation
The Splitting/Combining rule of FDs
Attributes on right independent of each other
Consider a,b,c → d,e,f
"Attributes a, b, and c functionally determine d, e, and f"
=> No mention of d relating to e or f directly
Splitting rule (useful to split up the right side of an FD):
if abc → def holds, then abc → d, abc → e, and abc → f hold
Combining rule (useful to combine right sides):
if abc → d, abc → e, abc → f hold, then abc → def holds

Splitting FDs: example
Consider the relation and FD
EmailAddress(user, domain, firstName, lastName)
user, domain → firstName, lastName
The following hold:
user, domain → firstName
user, domain → lastName
The following do NOT hold!
user → firstName, lastName
domain → firstName, lastName
No safe way to split the left side
Gotcha: "doesn't hold" means "not all tuples", which is not the same as "all tuples not"

Trivial FDs
Not all functional dependencies are useful
A → A always holds
abc → a also always holds (right side is a subset of left side)
An FD is trivial when its right side is a subset of its left side;
an attribute on both sides is a trivial part of the FD
Simplify by removing L ∩ R from R
abc → ad becomes abc → d
Or, in singleton form, delete trivial FDs
abc → a and abc → d becomes just abc → d

Transitive rule
The transitive rule holds for FDs
Consider the FDs: a → b and b → c; then a → c holds
Consider the FDs: ad → b and b → cd; then ad → cd holds, or
just ad → c (because of the trivial dependency rule)
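Taken together, the splitting rule and the removal of trivial FDs give a simple normalization step. A Python sketch, assuming each FD is a pair of strings of single-letter attribute names (so ("abc", "ad") stands for abc → ad); the helper name `split_and_simplify` is ours.

```python
def split_and_simplify(fds):
    """Apply the splitting rule, then drop trivial singletons (RHS attribute already in the LHS)."""
    result = []
    for lhs, rhs in fds:
        for attr in rhs:  # splitting rule: X -> abc gives X -> a, X -> b, X -> c
            fd = (frozenset(lhs), attr)
            if attr not in lhs and fd not in result:  # skip trivial FDs and duplicates
                result.append(fd)
    return result

for lhs, attr in split_and_simplify([("abc", "ad")]):
    print("".join(sorted(lhs)), "->", attr)  # abc -> d
```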
Identifying functional dependencies
FDs are domain knowledge
Intrinsic features of the data you're dealing with
Something you know (or assume) about the data
Database engine cannot identify FDs for you
Designer must specify them as part of schema
DBMS can only enforce FDs when told to
Coincidence or FD?

ID    Email             City      Country  Surname
1983  tom@gmail.com     Toronto   Canada   Fairgrieve
8624  mar@bell.com      London    Canada   Samways
9141  scotty@gmail.com  Winnipeg  Canada   Samways
1204  birds@gmail.com   Aachen    Germany  Lakemeyer

What if we try to infer FDs from the data?
ID → email, city, country, surname
email → city, country, surname
city → country
surname → country
Domain knowledge required to validate FDs
DBMS cannot safely infer FDs from the data either
It has only a finite sample of the data
An FD constrains the entire domain
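To see why a sample cannot settle the question, the sketch below lists every single-attribute FD a → b that the four rows in the table happen to satisfy. It is illustrative Python (the tuple layout and the helper `satisfied` are our own choices): the FDs listed above all show up, together with spurious ones such as City → ID, simply because every city in the sample is unique.

```python
from itertools import permutations

cols = ("ID", "Email", "City", "Country", "Surname")
rows = [
    (1983, "tom@gmail.com", "Toronto", "Canada", "Fairgrieve"),
    (8624, "mar@bell.com", "London", "Canada", "Samways"),
    (9141, "scotty@gmail.com", "Winnipeg", "Canada", "Samways"),
    (1204, "birds@gmail.com", "Aachen", "Germany", "Lakemeyer"),
]

def satisfied(a, b):
    """True if, in this sample, equal values in column a always come with equal values in column b."""
    i, j = cols.index(a), cols.index(b)
    seen = {}
    return all(seen.setdefault(r[i], r[j]) == r[j] for r in rows)

# Every ordered pair of distinct columns whose FD the sample does not contradict
sample_fds = [(a, b) for a, b in permutations(cols, 2) if satisfied(a, b)]
for a, b in sample_fds:
    print(a, "->", b)
```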
Keys and FDs
Consider relation R with attributes A
Superkey
Any S ⊆ A s.t. S → A
=> Any subset of A which determines all remaining attributes in A
Candidate key (or key)
C ⊆ A s.t. C → A and X → A does not hold for any X ⊂ C
=> A superkey which contains no other superkeys
=> Remove any attribute and you no longer have a key
Primary key
The candidate key we use to identify the relation
=> Always exists, only one allowed, doesn't matter which C we use
Candidate keys vs. superkeys
Consider these relations
Students(ID, surname, name, email, address, major)
Houses(street, city, value, owner, tax)
What are the candidate keys?
Students: ID, what else?
Houses: ?
What other superkeys exist?
Students: {ID, surname}, {ID, name}, {ID, name, surname}
Houses: ?
Prime attributes?
Students: ?
Houses: ?
Prime attribute
∃ candidate key C s.t. x ∈ C (attribute that participates in at least one key)
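For small schemas, candidate keys can be found by brute force: test attribute subsets smallest-first with the attribute-closure test described later in this part, and skip any superset of a key already found. The Python sketch assumes the two Houses FDs from the earlier example are the complete FD set; under that assumption it reports {street, city} as the only candidate key, which makes street and city the prime attributes. Function names are our own.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Minimal superkeys, found by checking subsets smallest-first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(cand, fds) >= relation:
                keys.append(cand)
    return keys

houses = {"street", "city", "value", "owner", "tax"}
houses_fds = [
    ({"street", "city"}, {"value", "owner", "tax"}),
    ({"city", "value"}, {"tax"}),
]
keys = candidate_keys(houses, houses_fds)
prime = set().union(*keys)
print(keys, prime)
```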
Cyclic functional dependencies?
Attributes on right side of one FD may appear
on left side of another!
Simple example: assume relation (A, B) & FDs: A → B, B → A
What does this say about A and B?
Example:
studentID → email, email → studentID

Inferring functional dependencies
Problem
Given FDs X1 → a1, X2 → a2, etc.
Does some FD Y → B (not given) also hold?
Consider the dependencies
A → B, B → C
Intuitively, A → C also holds
The given FDs entail (imply) it (transitivity rule)

Geometric view of FDs
Let D be the domain of tuples in R
Every possible tuple is a point in D
FD X on R restricts tuples in R to a subset of D
Points in D which violate X cannot be in R
Example: D(x, y, z)
xy → z
=> z = abs(x) + abs(y)
z → x, y
=> x = y = abs(z)/2
[Figure: sample points in D(x, y, z): (0,0,0), (1,1,0), (0,0,1), (1,1,2), (2,2,4), (3,2,1), (1,2,3)]
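The points in the figure can be classified directly. The Python sketch below treats the two example functions as the constraints, so a point may appear in R only if z = abs(x) + abs(y) and x = y = abs(z)/2; the point list is read off the figure and the function name `allowed` is ours.

```python
points = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 2), (2, 2, 4), (3, 2, 1), (1, 2, 3)]

def allowed(p):
    """True if point p lies on both example FD functions."""
    x, y, z = p
    xy_to_z = z == abs(x) + abs(y)   # xy -> z, realized as z = |x| + |y|
    z_to_xy = x == y == abs(z) / 2   # z -> x, y, realized as x = y = |z| / 2
    return xy_to_z and z_to_xy

in_r = [p for p in points if allowed(p)]
print(in_r)  # [(0, 0, 0), (1, 1, 2), (2, 2, 4)]
```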
Closure test for FDs
Given attribute set A and FD set F
Denote A+ as the closure of A relative to F
=> A+ = set of all attributes given or implied by A (via F)
Computing the [transitive] closure of A
Start: A+ = A, F' = F
While ∃ X ∈ F' s.t. LHS(X) ⊆ A+:
  A+ = A+ ∪ RHS(X)
  F' = F' - {X}
At end: A → B ∀ B ∈ A+
How to prove it in the general case?
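The loop above translates almost line for line into Python. A sketch, assuming F is a list of (LHS, RHS) pairs of attribute sets; as in the algorithm, each FD is removed from F' once it has fired, and the loop stops when a full pass adds nothing. Applied to the R(a, b, c, d, e, f) example that follows, it yields {a, b}+ = {a, b, c, d, e, f}.

```python
def attribute_closure(attrs, fds):
    """Compute attrs+ relative to fds, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)        # Start: A+ = A
    remaining = list(fds)       # F' = F
    progress = True
    while progress:
        progress = False
        for fd in list(remaining):
            lhs, rhs = fd
            if lhs <= closure:          # LHS(X) is contained in A+
                closure |= rhs          # A+ = A+ U RHS(X)
                remaining.remove(fd)    # F' = F' - {X}
                progress = True
    return closure

fds = [({"a", "b"}, {"c"}), ({"a", "c"}, {"d"}), ({"c"}, {"e"}), ({"a", "d", "e"}, {"f"})]
print(sorted(attribute_closure({"a", "b"}, fds)))  # ['a', 'b', 'c', 'd', 'e', 'f']
```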
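Closure test: example
Consider R(a, b, c, d, e, f)
with FDs ab → c, ac → d, c → e, ade → f
Find A+ if A = ab, i.e., find {a, b}+
Start with {a, b}
ab → c gives {a, b, c}
ac → d gives {a, b, c, d}
c → e gives {a, b, c, d, e}
ade → f gives {a, b, c, d, e, f}
=> {a, b}+ = {a, b, c, d, e, f}, so ab is a superkey of R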
Example: closure test
F: AB → C
   A → D
   D → E
   AC → B

X     X+
A     {A, D, E}
AB    {A, B, C, D, E}
AC    {A, C, B, D, E}
B     {B}
D     {D, E}

Is AB → E entailed by F? Yes
Is D → C entailed by F? No
Result: X+ allows us to determine all FDs of the form
X → Y entailed by F
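The entailment questions reduce to one subset test: F entails X → Y exactly when Y ⊆ X+. A Python sketch using this example's F, with attribute sets written as strings of single letters (our own shorthand) and a compact fixed-point version of the closure loop:

```python
def closure(attrs, fds):
    """Fixed-point attribute closure: keep firing FDs whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def entails(fds, lhs, rhs):
    """F entails lhs -> rhs iff every attribute of rhs is in lhs+."""
    return set(rhs) <= closure(lhs, fds)

F = [("AB", "C"), ("A", "D"), ("D", "E"), ("AC", "B")]
print(entails(F, "AB", "E"))  # True
print(entails(F, "D", "C"))   # False
```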
Discarding redundant FDs
Minimal basis: opposite extreme from closure
Given a set of FDs F, want to minimize F' s.t.
F' ⊆ F+
F' entails X ∀ X ∈ F
Properties of a minimal basis F'
RHS is always a singleton
If any FD is removed from F', the result no longer entails F
If for any FD in F' we remove one or more attributes from
the LHS, the result is no longer equivalent to F
Constructing a minimal basis
Straightforward but time-consuming
1. Split all RHS into singletons
2. ∀ X ∈ F', test whether J = (F' - X)+ is still equivalent to F+
=> Might make F' too small
3. ∀ i ∈ LHS(X), ∀ X ∈ F', let LHS(X') = LHS(X) - i
Test whether (F' - X + X')+ is still equivalent to F+
=> Might make F' too big
4. Repeat (2) and (3) until neither makes progress
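A Python sketch of steps 1 to 4, assuming FDs are given as (LHS, RHS) strings of single-letter attributes. Step 3 replaces X by X' only when F already entails X', which matches the equivalence test in step 3 because X' always implies X. The outcome can depend on the order in which FDs are examined; with the order used in the worked example that follows, it reproduces M = {A → C, B → A, D → B}. Names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (frozenset LHS, single RHS attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_basis(fds):
    """fds: list of (lhs, rhs) strings, e.g. ("B", "ABC") for B -> ABC."""
    # 1. Split every RHS into singletons (dict.fromkeys drops duplicates, keeps order)
    basis = list(dict.fromkeys((frozenset(lhs), a) for lhs, rhs in fds for a in rhs))
    progress = True
    while progress:  # 4. Repeat until neither step changes anything
        progress = False
        # 2. Drop an FD if the remaining ones entail it (trivial FDs always go)
        for fd in list(basis):
            rest = [g for g in basis if g != fd]
            if fd[1] in closure(fd[0], rest):
                basis = rest
                progress = True
        # 3. Drop an LHS attribute if F still entails the smaller FD
        for i, (lhs, a) in enumerate(basis):
            for attr in sorted(lhs):
                smaller = lhs - {attr}
                if smaller and a in closure(smaller, basis):
                    lhs = smaller
                    progress = True
            basis[i] = (lhs, a)
        basis = list(dict.fromkeys(basis))  # a shortened LHS may duplicate another FD
    return basis


for lhs, a in minimal_basis([("A", "AC"), ("B", "ABC"), ("D", "ABC")]):
    print("".join(sorted(lhs)), "->", a)  # A -> C, B -> A, D -> B (one per line)
```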
Minimal Basis: Example
Relation R: R(A, B, C, D)
Defined FDs:
F = {A → AC, B → ABC, D → ABC}
Find the minimal basis M of F
1st Step
H = {A → A, A → C, B → A, B → B, B → C, D → A, D → B, D → C}

Minimal Basis: Example (cont.)
2nd Step
A → A, B → B: can be removed as trivial
A → C: can't be removed, as there is no other LHS with A
B → A: can't be removed, because for J = H - {B → A}, B+ = BC
B → C: can be removed, because for J = H - {B → C}, B+ = ABC
D → A: can be removed, because for J = H - {D → A}, D+ = ABCD
D → B: can't be removed, because for J = H - {D → B}, D+ = CD
D → C: can be removed, because for J = H - {D → C}, D+ = ABCD
Step outcome => H = {A → C, B → A, D → B}

Minimal Basis: Example (cont.)
3rd Step
H doesn't change, as all LHS in H are single attributes
4th Step
H doesn't change
Minimal Basis: M = H = {A → C, B → A, D → B}
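A hand-computed basis can be double-checked by testing that it is equivalent to the original set: every FD in each set must be entailed by the other, which is again a matter of closure tests. A Python sketch using F and M from this example (FDs as (LHS, RHS) strings, a representation chosen for brevity; function names are ours):

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def entails_all(g, h):
    """True if every FD in h follows from the FDs in g."""
    return all(set(rhs) <= closure(lhs, g) for lhs, rhs in h)

def equivalent(g, h):
    return entails_all(g, h) and entails_all(h, g)

F = [("A", "AC"), ("B", "ABC"), ("D", "ABC")]
M = [("A", "C"), ("B", "A"), ("D", "B")]
print(equivalent(F, M))  # True
```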
Minimal Basis: Example 2
Relation R: R(A, B, C)
Defined FDs:
A → B, A → C, B → C, B → A, C → A, C → B
AB → C, AC → B, BC → A
A → BC
A → A
Possible Minimal Bases:
{A → B, B → A, B → C, C → B} or
{A → B, B → C, C → A}
Representing FDs as graphs
Insight: treat an FD as a directed edge in a graph
Entire LHS becomes a closed node (or node cluster)
Each attribute of RHS becomes an open node
Draw edge from LHS to RHS
OK to merge open node(s) with a matching closed node
=> Illegal to merge open nodes with each other directly!
Example: FD set as graph
Example 1: a → bc, b → c, d → b
Example 2: ab → c, c → d, ce → a
[Figure: FD graphs for Example 1 (c appears as two different sinks) and Example 2 (d not prime)]

Terminology in terms of graphs
Superkey: set of nodes which reaches all sinks
Candidate key: any non-redundant set of sources which
reaches all sinks (e.g., removing any source orphans 1+ sinks)
=> Every source attribute is prime, but a prime attribute need not be a source
Part II:
Schema decomposition

FDs and redundancy
Given relation R and FDs F
R often exhibits anomalies due to redundancy
F identifies many (not all) of the underlying problems
Idea
Use F to identify "good" ways to split relations
Split R into 2+ smaller relations having less redundancy
Split up F into subsets which apply to the new relations
(compute the projection of functional dependencies)

Schema decomposition
Given relation R and FDs F
Caveat: entirely possible to lose information
F+ may entail FD X which is not in (∪i Fi)+
=> Decomposition lost some FDs
Possible to have R ⊂ ⋈i Ri (the join of the pieces has extra tuples)
=> Decomposition lost some relationships

Splitting relations: example
Consider the following relation:

Student Name  Student Email   Course  Instructor
Xiao          xiao@gmail      CSCC43  Johnson
Xiao          xiao@gmail      CSCD08  Bretscher
Jaspreet      jaspreet@utsc   CSCC43  Johnson

One possible decomposition:
Students(email, name)
Courses(name, instructor)
Taking(studentEmail, courseName)
Goal: minimize anomalies without losing info
We'll revisit information loss later
Gotcha: lossy-join decomposition
Consider a relation with one more tuple:

Student Name  Student Email   Course  Instructor
Xiao          xiao@gmail      CSCC43  Johnson
Xiao          xiao@gmail      CSCD08  Bretscher
Jaspreet      jaspreet@utsc   CSCC43  Johnson
Mary          mary@utsc       CSCD08  Rosenburg

Decomposed as before into Students, Taking, and Courses:
Taking ⋈ Courses has bogus tuples!
Mary is not taking Bretscher's section of D08
Xiao is not in Rosenburg's section of D08
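The bogus tuples can be reproduced by projecting the four-row relation onto the three schemas and joining the pieces back together. A Python sketch, assuming relations are plain sets of tuples and spelling out the natural join by hand for this case:

```python
R = {
    ("Xiao", "xiao@gmail", "CSCC43", "Johnson"),
    ("Xiao", "xiao@gmail", "CSCD08", "Bretscher"),
    ("Jaspreet", "jaspreet@utsc", "CSCC43", "Johnson"),
    ("Mary", "mary@utsc", "CSCD08", "Rosenburg"),
}

# Project onto the decomposed schemas
students = {(email, name) for name, email, _, _ in R}
courses = {(course, instr) for _, _, course, instr in R}
taking = {(email, course) for _, email, course, _ in R}

# Natural join Students ⋈ Taking ⋈ Courses on email and course
joined = {
    (name, email, course, instr)
    for email, course in taking
    for s_email, name in students if s_email == email
    for c_course, instr in courses if c_course == course
}

bogus = joined - R  # tuples the join invents
for t in sorted(bogus):
    print(t)
```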
Information loss with decomposition
Decompose R into S and T
Consider FD a → b, with a only in S and b only in T
FD loss
Attributes a and b no longer in same relation
=> Must join T and S to enforce a → b (expensive)
Join loss
LHS and RHS no longer in same relation, no other connection
Neither (S ∩ T) → S nor (S ∩ T) → T in F+
=> Joining T and S produces bogus tuples (irreparable)
In our example:
({email, course} ∩ {course, instructor}) = {course}
course ↛ instructor and course ↛ email
Why did this happen? How to prevent it?
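For a decomposition into two relations, the join-loss condition becomes a mechanical test: the split is lossless if (S ∩ T)+ contains all of S or all of T. A Python sketch; for the course example it assumes, as argued above, that no FD has course on its left side, and for the Movies split it uses studio → studioAddr from the Movies example later in this part. Function names are ours.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(s, t, fds):
    """Splitting S ∪ T into S and T is join-lossless iff (S ∩ T)+ covers S or T."""
    shared = closure(s & t, fds)
    return s <= shared or t <= shared

# Course example: no FD has course on the LHS
print(lossless_binary({"email", "course"}, {"course", "instructor"}, []))  # False

# Movies example: studio -> studioAddr lets the join recover the original
movie_fds = [({"studio"}, {"studioAddr"})]
print(lossless_binary({"title", "year", "studio"}, {"studio", "studioAddr"}, movie_fds))  # True
```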
FD loss as a graph
[Figure: FD graph over title, year, star, studio, studioAddr, labeled "lost FD!"]
Joining recovers original relation
because studio → studioAddr

Join loss as a graph
[Figure: FD graph over title, year, star, salary, studio, studioAddr, labeled "join loss"]

Projecting FDs
Once we've split a relation we have to refactor
our FDs to match
Each FD must only mention attributes from one relation
Similar to geometric projection
Many possible projections (depends on how we slice it)
Keep only the ones we need (minimal basis)
FD projection algorithm
Start with Fi = ∅
For each subset X of Ri
  Compute X+
  For each attribute a in X+
    If a is in Ri
      add X → a to Fi
Compute the minimal basis of Fi
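A Python sketch of the projection loop, assuming single-letter attribute names and FDs stored as pairs of frozensets. It skips trivial FDs and the empty and full subsets, and it drops X → a when a smaller LHS already determines a; the final minimal-basis step is omitted. On ABC with A → B and B → C it returns A → B, A → C and B → C, matching the example that follows. Function names are ours.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(fds, ri):
    """Project fds onto attribute set ri: for each subset X of ri, add X -> a for a in X+ ∩ ri."""
    projected = []
    for size in range(1, len(ri)):  # skip the empty set and ri itself
        for combo in combinations(sorted(ri), size):
            x = frozenset(combo)
            for a in sorted((closure(x, fds) & ri) - x):  # a not in X: ignore trivial FDs
                # A smaller LHS already determining a makes X -> a a weaker FD
                if any(lhs < x and rhs == {a} for lhs, rhs in projected):
                    continue
                projected.append((x, frozenset({a})))
    return projected

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
for lhs, rhs in project_fds(F, frozenset("ABC")):
    print("".join(sorted(lhs)), "->", "".join(rhs))
```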
Projection is expensive
Suppose R1 has n attributes
How many subsets of R1 are there? (2^n)

Making projection more efficient
Ignore trivial dependencies
Ignore trivial subsets
The empty set or the set of all attributes (neither yields a non-trivial FD)
Ignore supersets of X if X+ = R
They can only give us weaker FDs (with more on the LHS)

Example: Projecting FDs
ABC with FDs A → B and B → C
A+ = ABC; yields A → B, A → C
We do not need to compute AB+ or AC+
B+ = BC; yields B → C
C+ = C; yields nothing.
BC+ = BC; yields nothing.

Example Continued
Resulting FDs: A → B, A → C, and B → C
Projection onto AC: A → C
Only FD that involves a subset of {A, C}
Projection onto BC: B → C
Only FD that involves a subset of {B, C}

Part III:
Normal forms
Motivation for normal forms
Identify a "good" schema
For some definition of "good"
Avoid anomalies, redundancy, etc.
Many normal forms
1st
2nd
3rd
Boyce-Codd
...and several more we won't discuss

1st normal form (1NF)
No multi-valued attributes allowed
Imagine storing a list/set of things in an attribute
=> Not really even expressible in RA
Counterexample
Course(name, instructor, [student, email]*)
Redundancy in non-list attributes

Name    Instructor  Students (StudentName, StudentEmail)
CSCC43  Johnson     (Xiao, xiao@gmail), (Jaspreet, jaspreet@utsc)
CSCD08  Rosenburg   (Mary, mary@utsc), (Jaspreet, jaspreet@utsc)

1NF in terms of graphs?
We only need to graph the schema
=> Structure of tuples does not vary from tuple to tuple
Consider again our example
=> Cannot capture the structure at schema level only
2nd normal form (2NF)
Non-prime attributes depend on entire candidate keys
Consider non-prime (i.e., not part of a key) attribute a
Then there is no FD X → a where X is a proper subset of some candidate key
Counterexample
Movies(title, year, star, studio, studioAddress, salary)
FD: title, year → studio; studio → studioAddress; star → salary
(key: {title, year, star}; studio depends only on title, year, and salary only on star)

Title          Year  Star    Studio     StudioAddr   Salary
Star Wars      1977  Hamill  Lucasfilm  1 Lucas Way  $100,000
Star Wars      1977  Ford    Lucasfilm  1 Lucas Way  $100,000
Star Wars      1977  Fisher  Lucasfilm  1 Lucas Way  $100,000
Patriot Games  1992  Ford    Paramount  Cloud 9      $2,000,000
Last Crusade   1989  Ford    Lucasfilm  1 Lucas Way  $1,000,000
2NF in terms of graphs
Require a path from every source to every sink
No trivial edges allowed!
Disconnected components violate 2NF
Watch for node clusters which are subsets of candidate keys
[Figure: Movies FD graph over title, year, star, studio, studioAddr, salary, with labels "trivial" and "redundant"]
No path from title, year to salary,
nor from star to studio/studioAddr

3rd normal form (3NF)
Non-prime attr. depend only on candidate keys
Consider FD X → a
Either a ∈ X OR X is a superkey OR a is prime (part of a key)
=> No transitive dependencies allowed
Counterexample:
studio → studioAddr
(studioAddr depends on studio, which is not a candidate key)
3NF in terms of graphs
3NF violation: transitive dependency
[Figure: FD graph over title, year, studio, studioAddr, labeled "lost" and "redundant FD"]

Title          Year  Studio     StudioAddr
Star Wars      1977  Lucasfilm  1 Lucas Way
Patriot Games  1992  Paramount  Cloud 9
Last Crusade   1989  Lucasfilm  1 Lucas Way
3NF, dependencies, and join loss
Theorem: always possible to convert a schema to join-lossless,
dependency-preserving 3NF
Caveat: always possible to create schemas in 3NF for
which these properties do not hold
FD loss example:
MovieInfo(title, year, studioName)
StudioAddress(title, year, studioAddress)
=> Cannot enforce studioName → studioAddress
[Figure: FD graphs over studioName and studioAddr]
Join loss example:
Movies(title, year, star)
StarSalary(star, salary)
=> Movies ⋈ StarSalary yields bogus tuples (irreparable)
Note: OK for decomposition to lose redundant FDs

Graphs and lossy decomposition
Loss: an FD which spans two relations
Join loss if no transitive connection between the two nodes
=> No set of joins can reconstruct the connection
Our 3NF example showed a lost dependency
title, year → studioAddr
=> No join loss because title, year → studioName → studioAddr

Boyce-Codd normal form (BCNF)
One additional restriction over 3NF
All non-trivial FDs have superkey LHS
Counterexample
CanadianAddress(street, city, province, postalCode)
Candidate keys: {street, postalCode}, {street, city, province}
FD: postalCode → city, province
Satisfies 3NF: city, province are both prime
Violates BCNF: postalCode is not a superkey
=> Possible anomalies involving postalCode
Do we care? How often do postal codes change?
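The BCNF condition can be tested with closures: a non-trivial FD X → Y is a violation when X+ does not cover the whole relation. A Python sketch for CanadianAddress; besides postalCode → city, province it assumes the FD street, city, province → postalCode, which is what makes {street, city, province} a candidate key. For a relation's own FD set, checking the given FDs is enough; FDs obtained by projection need the full projected set. Function names are ours.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs whose LHS is not a superkey of relation."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= relation
    ]

address = {"street", "city", "province", "postalCode"}
address_fds = [
    ({"postalCode"}, {"city", "province"}),
    ({"street", "city", "province"}, {"postalCode"}),  # assumed: makes {street, city, province} a key
]
for lhs, rhs in bcnf_violations(address, address_fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['postalCode'] -> ['city', 'province']
```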
Another Example
emps(emp_id, emp_name, emp_phone, dept_name, emp_city, emp_straddr)
empadds(emp_city, emp_zip, emp_straddr)
FDs:
emp_id → emp_name, emp_phone, dept_name
emp_city, emp_straddr → emp_zip
emp_zip → emp_city
The FD emp_zip → emp_city is preserved in the relation
empadds, but emp_zip is not a key. The schema is not in BCNF.
The attribute emp_city is prime (there is a key {emp_city,
emp_straddr}). Hence the schema is in 3NF.
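The 3NF argument can be replayed in code: compute the candidate keys, collect the prime attributes, and require of each non-trivial FD X → a that X is a superkey or a is prime. A Python sketch for empadds, assuming the two FDs above that mention only its attributes form its complete FD set; names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Minimal superkeys, found by testing subsets smallest-first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) >= relation:
                keys.append(cand)
    return keys

def third_nf_violations(relation, fds):
    """Pairs (X, a) where X -> a is non-trivial, X is not a superkey, and a is not prime."""
    prime = set().union(*candidate_keys(relation, fds))
    return [
        (lhs, a) for lhs, rhs in fds for a in rhs - lhs
        if not closure(lhs, fds) >= relation and a not in prime
    ]

empadds = {"emp_city", "emp_zip", "emp_straddr"}
empadds_fds = [
    ({"emp_city", "emp_straddr"}, {"emp_zip"}),
    ({"emp_zip"}, {"emp_city"}),
]
print(candidate_keys(empadds, empadds_fds))
print(third_nf_violations(empadds, empadds_fds))  # []: emp_city is prime
```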
More Examples

Manager  Project  Branch      Division
Brown    Mars     Chicago     1
Green    Jupiter  Birmingham  1
Green    Mars     Birmingham  1
Hoskins  Saturn   Birmingham  2
Hoskins  Venus    Birmingham  2

Functional dependencies:
Manager → Branch, Division: each manager works at one
branch and manages one division
Branch, Division → Manager: for each branch and division
there is a single manager
Project, Branch → Division, Manager: for each branch, a
project is allocated to a single division and has a sole
manager responsible

A good decomposition

Manager  Branch      Division
Brown    Chicago     1
Green    Birmingham  1
Hoskins  Birmingham  2

Project  Branch      Division
Mars     Chicago     1
Jupiter  Birmingham  1
Mars     Birmingham  1
Saturn   Birmingham  2
Venus    Birmingham  2

Note: The first relation has a second key {Branch, Division}
The decomposition is in 3NF but not in BCNF; moreover, it is
lossless and dependencies are preserved
This example demonstrates that BCNF is too strong a condition
to impose on a relational schema
Limits of decomposition
Pick two:
Lossless join
Dependency preservation
Anomaly-free
3NF
Always allows join-lossless and dependency-preserving
May allow some anomalies
BCNF
Always excludes anomalies
May give up dependency preservation (a join-lossless BCNF decomposition always exists)
Use domain knowledge to choose 3NF vs. BCNF