
Design Theory for Relational DBs:
Functional Dependencies, Decompositions, Normal Forms

Introduction to databases
CSCC43 Winter 2012
Ryan Johnson
Thanks to Manos Papagelis, John Mylopoulos, Arnold Rosenbloom and Renee Miller for material in these slides

Database Design Theory
  Guides systematic improvements to database schemas
  General idea:
    Express constraints on the data
    Use these to decompose the relations
  Ultimately, get a schema that is in a "normal form" that guarantees certain desirable properties
    "Normal" in the sense of conforming to a standard
  The process of converting a schema to a normal form is called normalization

Goal #1: remove redundancy
  Consider this schema

    StudentName  StudentEmail   Course  Instructor
    Xiao         xiao@gmail     CSCC43  Johnson
    Xiao         xiao@gmail     CSCD08  Bretscher
    Jaspreet     jaspreet@utsc  CSCC43  Johnson

  What if...
    Xiao changes email addresses? (update anomaly)
    Xiao drops CSCD08? (deletion anomaly)
    UTSC creates a new course, CSCC44? (insertion anomaly)

  Multiple relations => exponentially worse

Goal #2: expressing constraints
  Consider the following sets of schemas:
    Students(utorid, name, email)
    vs.
    Students(utorid, name)
    Emails(utorid, address)
  Consider also:
    House(street, city, value, owner, propertyTax)
    vs.
    House(street, city, value, owner)
    TaxRates(city, value, propertyTax)

  Dependencies, constraints are domain-dependent

Part I:
Functional Dependencies

Functional dependencies
  Let X, Y be sets of attributes from relation R
  X -> Y is an assertion about tuples in R
    Any tuples which agree in all attributes of X must also agree in all attributes of Y
  "X functionally determines Y"
    Or, "The values of attributes Y are a function of those in X"
    Not necessarily an easy function to compute, mind you
    => Consider X -> h, where h is the hash of attributes in X
  Notational conventions
    "a", "b", "c": specific attributes
    "A", "B", "C": sets of (unnamed) attributes
    abc -> def: same as {a,b,c} -> {d,e,f}
  Most common to see singletons (X -> y or abc -> d)
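
The definition above can be checked mechanically against one instance of a relation. The following is a minimal sketch, assuming rows are stored as Python dicts; the helper name `fd_holds` and the lowercase column names are illustrative and do not come from the slides. A True result only says the FD is not violated by these particular rows, it does not prove the FD holds in the domain.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if all rows that agree on lhs also agree on rhs."""
    seen = {}  # maps each observed LHS value to the RHS value it must keep
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two rows agree on lhs but differ on rhs
    return True


# The instance shown under "Goal #1" (column names are assumed)
rows = [
    {"name": "Xiao", "email": "xiao@gmail", "course": "CSCC43", "instructor": "Johnson"},
    {"name": "Xiao", "email": "xiao@gmail", "course": "CSCD08", "instructor": "Bretscher"},
    {"name": "Jaspreet", "email": "jaspreet@utsc", "course": "CSCC43", "instructor": "Johnson"},
]

print(fd_holds(rows, ["email"], ["name"]))         # email -> name
print(fd_holds(rows, ["email"], ["course"]))       # email -> course
print(fd_holds(rows, ["course"], ["instructor"]))  # course -> instructor
```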

FD: relaxes the concept of a "key"
  Functional dependency: X -> Y
  Superkey: X -> R
  A superkey must include all remaining attributes of the relation on the RHS
  An FD can involve just a subset of them
  Example:
    Houses(street, city, value, owner, tax)
    street, city -> value, owner, tax (both FD and key)
    city, value -> tax (FD only)

Rules and principles about FDs
  Rules
    The splitting/combining rule
    Trivial FDs
    The transitive rule
  Algorithms related to FDs
    the closure of a set of attributes of a relation
    a minimal basis of a relation

The Splitting/Combining rule of FDs
  Attributes on right independent of each other
    Consider a,b,c -> d,e,f
    "Attributes a, b, and c functionally determine d, e, and f"
    => No mention of d relating to e or f directly
  Splitting rule (useful to split up right side of FD)
    abc -> def becomes abc -> d, abc -> e and abc -> f
  No safe way to split left side
    abc -> def is NOT the same as ab -> def and c -> def!
  Combining rule (useful to combine right sides):
    if abc -> d, abc -> e, abc -> f hold, then abc -> def holds

Splitting FDs example
  Consider the relation and FD
    EmailAddress(user, domain, firstName, lastName)
    user, domain -> firstName, lastName
  The following hold
    user, domain -> firstName
    user, domain -> lastName
  The following do NOT hold!
    user -> firstName, lastName
    domain -> firstName, lastName
  Gotcha: "doesn't hold" = "not all tuples" != "all tuples not"

Trivial FDs
  Not all functional dependencies are useful
    A -> A always holds
    abc -> a also always holds (right side is subset of left side)
  FD with an attribute on both sides is "trivial"
    Simplify by removing L ∩ R from R
      abc -> ad becomes abc -> d
    Or, in singleton form, delete trivial FDs
      abc -> a and abc -> d becomes just abc -> d

Transitive rule
  The transitive rule holds for FDs
    Consider the FDs: a -> b and b -> c; then a -> c holds
    Consider the FDs: ad -> b and b -> cd; then ad -> cd holds, or just ad -> c (because of the trivial dependency rule)

Identifying functional dependencies
  FDs are domain knowledge
    Intrinsic features of the data you're dealing with
    Something you know (or assume) about the data
  Database engine cannot identify FDs for you
    Designer must specify them as part of schema
    DBMS can only enforce FDs when told to
  DBMS cannot safely "optimize" FDs either
    It has only a finite sample of the data
    An FD constrains the entire domain

Coincidence or FD?

    ID    Email             City      Country  Surname
    1983  tom@gmail.com     Toronto   Canada   Fairgrieve
    8624  mar@bell.com      London    Canada   Samways
    9141  scotty@gmail.com  Winnipeg  Canada   Samways
    1204  birds@gmail.com   Aachen    Germany  Lakemeyer

  What if we try to infer FDs from the data?
    ID -> email, city, country, surname
    email -> city, country, surname
    city -> country
    surname -> country
  Domain knowledge required to validate FDs
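
To see why a finite sample is misleading, the four rows above can be scanned for every single-attribute FD that happens not to be violated. This is only an illustrative sketch; the names `columns`, `sample`, `holds_in_sample` and `apparent` are made up for the example.

```python
from itertools import permutations

columns = ["id", "email", "city", "country", "surname"]
sample = [
    (1983, "tom@gmail.com", "Toronto", "Canada", "Fairgrieve"),
    (8624, "mar@bell.com", "London", "Canada", "Samways"),
    (9141, "scotty@gmail.com", "Winnipeg", "Canada", "Samways"),
    (1204, "birds@gmail.com", "Aachen", "Germany", "Lakemeyer"),
]


def holds_in_sample(lhs, rhs):
    """True if no two sample rows agree on column lhs but differ on column rhs."""
    i, j = columns.index(lhs), columns.index(rhs)
    mapping = {}
    for row in sample:
        if mapping.setdefault(row[i], row[j]) != row[j]:
            return False
    return True


# Every ordered pair of distinct columns whose FD is not violated by the sample
apparent = [(l, r) for l, r in permutations(columns, 2) if holds_in_sample(l, r)]
for lhs, rhs in apparent:
    print(f"{lhs} -> {rhs}")
```

The list includes surname -> country, which the sample cannot refute but which is a coincidence of these four rows rather than a property of the domain.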

Keys and FDs
  Consider relation R with attributes A
  Superkey
    Any S ⊆ A s.t. S -> A
    => Any subset of A which determines all remaining attributes in A
  Candidate key (or key)
    C ⊆ A s.t. C -> A and X -> A does not hold for any X ⊂ C
    => A superkey which contains no other superkeys
    => Remove any attribute and you no longer have a key
  Primary key
    The candidate key we use to identify the relation
    => Always exists, only one allowed, doesn't matter which C we use
  Prime attribute
    ∃ candidate key C s.t. x ∈ C (attribute that participates in at least one key)

Candidate keys vs. superkeys
  Consider these relations
    Students(ID, surname, name, email, address, major)
    Houses(street, city, value, owner, tax)
  What are the candidate keys?
    Students: ID, what else?
    Houses: ?
  What other superkeys exist?
    Students: {ID, surname}, {ID, name}, {ID, name, surname}, ...
    Houses: ?
  Prime attributes?
    Students: ?
    Houses: ?
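
For a small relation the candidate keys can be found by brute force: try attribute subsets from smallest to largest and keep those whose closure is the whole relation and which contain no smaller key. The sketch below assumes the two FDs given earlier for Houses; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate subsets by increasing size; exponential, fine for small schemas."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


houses = {"street", "city", "value", "owner", "tax"}
house_fds = [
    ({"street", "city"}, {"value", "owner", "tax"}),
    ({"city", "value"}, {"tax"}),
]
keys = candidate_keys(houses, house_fds)
prime = set().union(*keys)  # attributes that appear in at least one key
print(keys, prime)
```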

Cyclic functional dependencies?
  Attributes on right side of one FD may appear on left side of another!
  Simple example: assume relation (A, B) & FDs: A -> B, B -> A
    What does this say about A and B?
  Example
    studentID -> email, email -> studentID

Geometric view of FDs
  Let D be the domain of tuples in R
    Every possible tuple is a point in D
  FD X on R restricts tuples in R to a subset of D
    Points in D which violate X cannot be in R
  Example: D(x, y, z)
    xy -> z
      => z = abs(x) + abs(y)
    z -> x,y
      => x = y = abs(z)/2
  [Figure: sample points in D plotted against the two FDs, including (1,1,0), (1,1,2), (0,0,1), (0,0,0), (2,2,4), (3,2,1), (1,2,3)]

Inferring functional dependencies
  Problem
    Given FDs X1 -> a1, X2 -> a2, etc.
    Does some FD Y -> B (not given) also hold?
  Consider the dependencies
    A -> B, B -> C
    Intuitively, A -> C also holds
    The given FDs "entail" (imply) it (transitivity rule)
  How to prove it in the general case?

Closure test for FDs
  Given attribute set A and FD set F
    Denote A_F+ as the closure of A relative to F
    => A_F+ = set of all attributes B such that A -> B is given or implied by F
  Computing the [transitive] closure of A
    Start: A_F+ = A, F' = F
    While ∃ X ∈ F' s.t. LHS(X) ⊆ A_F+:
      A_F+ = A_F+ ∪ RHS(X)
      F' = F' - {X}
    At end: A -> B ∀ B ∈ A_F+
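
The closure computation can be sketched in a few lines of Python. This is one possible rendering of the loop, not the course's reference code; FDs are written as (lhs, rhs) pairs of attribute strings, and `remaining` plays the role of the shrinking copy of F.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    remaining = list(fds)  # FDs that have not fired yet
    progress = True
    while progress:
        progress = False
        for fd in list(remaining):
            lhs, rhs = fd
            if set(lhs) <= result:      # LHS already inside the closure
                result |= set(rhs)      # so the RHS joins the closure
                remaining.remove(fd)    # each FD needs to fire only once
                progress = True
    return result


def entails(fds, lhs, rhs):
    """lhs -> rhs is entailed by fds exactly when rhs is inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# FDs over single-letter attributes, so "ab" stands for {a, b}
fds = [("ab", "c"), ("ac", "d"), ("c", "e"), ("ade", "f")]
print(sorted(closure("ab", fds)))
```

Because every attribute of R(a,b,c,d,e,f) ends up in the closure of {a,b}, the same routine doubles as a superkey test.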

Closure test example
  Consider R(a,b,c,d,e,f) with FDs ab -> c, ac -> d, c -> e, ade -> f
  Find A+ if A = ab, or find {a,b}+
  [Figure: step-by-step diagram of attributes a, b, c, d, e, f being added to the closure]
  {a,b}+ = {a,b,c,d,e,f}, or ab -> cdef
  => ab is a candidate key!

Example: Closure Test
  F: AB -> C
     A -> D
     D -> E
     AC -> B

    X    X_F+
    A    {A,D,E}
    AB   {A,B,C,D,E}
    AC   {A,C,B,D,E}
    B    {B}
    D    {D,E}

  Is AB -> E entailed by F? Yes
  Is D -> C entailed by F? No
  Result: X_F+ allows us to determine all FDs of the form X -> Y entailed by F

Discarding redundant FDs
  Minimal basis: opposite extreme from closure
  Given a set of FDs F, want to minimize F' s.t.
    F' ⊆ F
    F' entails X ∀ X ∈ F
  Properties of a minimal basis F'
    RHS is always singleton
    If any FD is removed from F', F' is no longer a minimal basis
    If for any FD in F' we remove one or more attributes from the LHS of F', the result is no longer a minimal basis

Constructing a minimal basis
  Straightforward but time-consuming
  1. Split all RHS into singletons
  2. ∀ X ∈ F', test whether J = (F' - X)+ is still equivalent to F+
     => Might make F' too small
  3. ∀ i ∈ LHS(X), ∀ X ∈ F', let LHS(X') = LHS(X) - i
     Test whether (F' - X + X')+ is still equivalent to F+
     => Might make F' too big
  4. Repeat (2) and (3) until neither makes progress
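
The four steps can be sketched as follows. This is an illustrative implementation under some assumptions: FDs are (lhs, rhs) pairs of attribute strings, equivalence is tested through attribute closures rather than by materializing F+, and the result depends on the order in which FDs are tried (a relation can have several minimal bases).

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_basis(fds):
    # Step 1: split every RHS into singletons, dropping trivial FDs on the way.
    basis = [(frozenset(l), frozenset([a])) for l, r in fds for a in sorted(r) if a not in l]
    progress = True
    while progress:
        progress = False
        basis = list(dict.fromkeys(basis))  # remove exact duplicates, keep order
        # Step 2: drop an FD if the remaining FDs still entail it.
        for fd in list(basis):
            rest = [g for g in basis if g != fd]
            if fd[1] <= closure(fd[0], rest):
                basis = rest
                progress = True
        # Step 3: drop an LHS attribute if the smaller LHS still determines the RHS.
        for i, (lhs, rhs) in enumerate(basis):
            for a in sorted(lhs):
                smaller = lhs - {a}
                if smaller and rhs <= closure(smaller, basis):
                    basis[i] = (smaller, rhs)
                    lhs = smaller
                    progress = True
    # Step 4 is the enclosing while loop: repeat until neither step changes anything.
    return basis


f = [("A", "AC"), ("B", "ABC"), ("D", "ABC")]
for lhs, rhs in minimal_basis(f):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```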

Minimal Basis: Example
  Relation R: R(A,B,C,D)
  Defined FDs:
    F = {A -> AC, B -> ABC, D -> ABC}
  Find the minimal basis M of F

  1st Step
    H = {A -> A, A -> C, B -> A, B -> B, B -> C, D -> A, D -> B, D -> C}

  2nd Step
    A -> A, B -> B: can be removed as trivial
    A -> C: can't be removed, as there is no other LHS with A
    B -> A: can't be removed, because for J = H - {B -> A}, B+ = BC
    B -> C: can be removed, because for J = H - {B -> C}, B+ = ABC
    D -> A: can be removed, because for J = H - {D -> A}, D+ = DBAC (still contains A)
    D -> B: can't be removed, because for J = H - {D -> B}, D+ = DC
    D -> C: can be removed, because for J = H - {D -> C}, D+ = DBAC
    Step outcome => H = {A -> C, B -> A, D -> B}

Minimal Basis: Example (cont.)
  3rd Step
    H doesn't change as all LHS in H are single attributes
  4th Step
    H doesn't change
  Minimal Basis: M = H = {A -> C, B -> A, D -> B}

Minimal Basis: Example 2
  Relation R: R(A,B,C)
  Defined FDs:
    A -> B, A -> C, B -> C, B -> A, C -> A, C -> B
    AB -> C, AC -> B, BC -> A
    A -> BC
    A -> A
  Possible Minimal Bases:
    {A -> B, B -> A, B -> C, C -> B} or
    {A -> B, B -> C, C -> A}

Representing FDs as graphs
  Insight: treat an FD as a directed edge in a graph
    Entire LHS becomes a "closed" node (or node cluster)
    Each attribute of RHS becomes an "open" node
    Draw edge from LHS to RHS
  OK to merge open node(s) with a matching closed node
    => Illegal to merge open nodes with each other directly!

Example: FD set as graph
  Example 1: a -> bc, b -> c, d -> b
  [Figure: graph over nodes a, b, c, d; annotations: "d not prime (dominated by a)" and "c: two different sinks"]
  Example 2: ab -> c, c -> d, ce -> a
  [Figure: graph over nodes a, b, c, d, e]

Terminology in terms of graphs
  Superkey: set of nodes which reaches all sinks
  Candidate key: any non-redundant set of sources which reaches all sinks (e.g. removing any source orphans 1+ sinks)
  => Source node <=/=> prime attribute

Part II:
Schema decomposition

FDs and redundancy
  Given relation R and FDs F
    R often exhibits anomalies due to redundancy
    F identifies many (not all) of the underlying problems
  Idea
    Use F to identify "good" ways to split relations
    Split R into 2+ smaller relations having less redundancy
    Split up F into subsets which apply to the new relations (compute the projection of functional dependencies)

Schema decomposition
  Given relation R and FDs F
    Split R into Ri s.t. ∀i Ri ⊂ R (no new attributes)
    Split F into Fi s.t. ∀i F entails Fi (no new FDs)
    Fi involves only attributes in Ri
  Caveat: entirely possible to lose information
    F+ may entail FD X which is not in (∪i Fi)+
    => Decomposition lost some FDs
    Possible to have R ⊂ ⋈i Ri
    => Decomposition lost some relationships
  Goal: minimize anomalies without losing info
    We'll revisit information loss later

Splitting relations example
  Consider the following relation:

    StudentName  StudentEmail   Course  Instructor
    Xiao         xiao@gmail     CSCC43  Johnson
    Xiao         xiao@gmail     CSCD08  Bretscher
    Jaspreet     jaspreet@utsc  CSCC43  Johnson

  One possible decomposition
    Students(email, name)
    Courses(name, instructor)
    Taking(studentEmail, courseName)

Gotcha: lossy join decomposition
  Consider a relation with one more tuple

    StudentName  StudentEmail   Course  Instructor
    Xiao         xiao@gmail     CSCC43  Johnson
    Xiao         xiao@gmail     CSCD08  Bretscher
    Jaspreet     jaspreet@utsc  CSCC43  Johnson
    Mary         mary@utsc      CSCD08  Rosenburg

  Students ⋈ Taking ⋈ Courses has bogus tuples!
    Mary is not taking Bretscher's section of D08
    Xiao is not in Rosenburg's section of D08
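
The bogus tuples can be reproduced by projecting the four-tuple relation onto the three smaller schemas and joining them back. This is a small sketch using plain Python sets as relations; the variable names are illustrative.

```python
original = {
    ("Xiao", "xiao@gmail", "CSCC43", "Johnson"),
    ("Xiao", "xiao@gmail", "CSCD08", "Bretscher"),
    ("Jaspreet", "jaspreet@utsc", "CSCC43", "Johnson"),
    ("Mary", "mary@utsc", "CSCD08", "Rosenburg"),
}

# Project onto Students(email, name), Courses(name, instructor), Taking(email, course)
students = {(email, name) for name, email, _, _ in original}
courses = {(course, instr) for _, _, course, instr in original}
taking = {(email, course) for _, email, course, _ in original}

# Natural join Students ⋈ Taking ⋈ Courses
rejoined = {
    (name, email, course, instr)
    for email, name in students
    for t_email, course in taking if t_email == email
    for c_course, instr in courses if c_course == course
}

bogus = rejoined - original
for row in sorted(bogus):
    print(row)
```

Every original tuple survives the round trip, but the join cannot tell which CSCD08 instructor goes with which student, so two extra tuples appear.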

Information loss with decomposition
  Decompose R into S and T
    Consider FD a -> b, with a only in S and b only in T
  FD loss
    Attributes a and b no longer in same relation
    => Must join T and S to enforce a -> b (expensive)
  Join loss
    LHS and RHS no longer in same relation, no other connection
    Neither (S ∩ T) -> S nor (S ∩ T) -> T in F+
    => Joining T and S produces bogus tuples (irreparable)
  In our example:
    ({email, course} ∩ {course, instructor}) = {course}
    course -/-> instructor and course -/-> email
  Why did this happen? How to prevent it?
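
The join-loss condition for a two-way split translates directly into a closure test: the shared attributes must determine all of S or all of T. The sketch below assumes two hypothetical FD sets for the student example, one without and one with course -> instructor, purely to show both outcomes.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(s, t, fds):
    """True if (S ∩ T) -> S or (S ∩ T) -> T follows from fds."""
    reach = closure(set(s) & set(t), fds)
    return set(s) <= reach or set(t) <= reach


taking = {"email", "course"}
courses = {"course", "instructor"}

# Assumed FD sets, for illustration only
no_course_fd = [({"email"}, {"name"})]
with_course_fd = no_course_fd + [({"course"}, {"instructor"})]

print(lossless_binary(taking, courses, no_course_fd))    # course determines neither side
print(lossless_binary(taking, courses, with_course_fd))  # course -> instructor covers Courses
```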

FD loss as a graph
  [Figure: FD graph over title, year, star, studio, studioAddr, salary, before and after decomposition; annotation: "lost FD!"]
  Joining recovers original relation because studio -> studioAddr

Join loss as a graph
  [Figure: FD graph over title, year, star, studio, studioAddr; annotation: "join loss"]

Projecting FDs
  Once we've split a relation we have to refactor our FDs to match
    Each FD must only mention attributes from one relation
  Similar to geometric projection
    Many possible projections (depends on how we slice it)
    Keep only the ones we need (minimal basis)

FD projection algorithm
  Start with Fi = ∅
  For each subset X of Ri
    Compute X+
    For each attribute a in X+
      If a is in Ri
        add X -> a to Fi
  Compute the minimal basis of Fi
  Projection is expensive
    Suppose R1 has n attributes
    How many subsets of R1 are there?
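
A direct, unoptimized rendering of the projection loop might look like the sketch below. Two simplifications are assumptions of this example rather than part of the algorithm as stated: trivial FDs (a already in X) are not emitted, the empty set and the full set Ri are skipped because they can only contribute trivial FDs, and the final minimal-basis step is left out.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(ri, fds):
    """Project fds onto the attribute set ri; visits 2^n - 2 subsets of ri."""
    ri = set(ri)
    projected = []
    for size in range(1, len(ri)):  # skip the empty set and ri itself
        for combo in combinations(sorted(ri), size):
            x = set(combo)
            # keep only non-trivial consequences that stay inside ri
            for a in sorted((closure(x, fds) & ri) - x):
                projected.append((frozenset(x), a))
    return projected


# R(A, B, C) with A -> B and B -> C
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(project_fds("AC", fds))
print(project_fds("BC", fds))
```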

Making projection more efficient
  Ignore trivial dependencies
    No need to add X -> A if A is in X itself
  Ignore trivial subsets
    The empty set or the set of all attributes (both are subsets of X)
  Ignore supersets of X if X+ = R
    They can only give us "weaker" FDs (with more on the LHS)

Example: Projecting FDs
  ABC with FDs A -> B and B -> C
    A+ = ABC; yields A -> B, A -> C
      We do not need to compute AB+ or AC+
    B+ = BC; yields B -> C
    C+ = C; yields nothing.
    BC+ = BC; yields nothing.

Example Continued
  Resulting FDs: A -> B, A -> C, and B -> C
  Projection onto AC: A -> C
    Only FD that involves a subset of {A,C}
  Projection on BC: B -> C
    Only FD that involves subset of {B,C}

Part III:
Normal forms

Motivation for normal forms
  Identify a "good" schema
    For some definition of "good"
    Avoid anomalies, redundancy, etc.
  Many normal forms
    1st
    2nd
    3rd
    Boyce-Codd
    ... and several more we won't discuss
  BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF (focus on 3NF/BCNF)

1st normal form (1NF)
  No multi-valued attributes allowed
    Imagine storing a list/set of things in an attribute
    => Not really even expressible in RA
  Counterexample
    Course(name, instructor, [student, email]*)
    Redundancy in non-list attributes

    Name    Instructor  StudentName  StudentEmail
    CSCC43  Johnson     Xiao         xiao@gmail
                        Jaspreet     jaspreet@utsc
    CSCD08  Rosenburg   Mary         mary@utsc
                        Jaspreet     jaspreet@utsc

1NF in terms of graphs?
  We only need to graph the schema
    => Structure of tuples does not vary from tuple to tuple
  Consider again our example (the Course table above)
    => Cannot capture the structure at schema level only

2nd normal form (2NF)
  Non-prime attributes depend on candidate keys
    Consider non-prime (i.e. not part of a key) attribute "a"
    Then ∃ FD X s.t. X -> a and X is a candidate key
  Counterexample
    Movies(title, year, star, studio, studioAddress, salary)
    FD: title, year -> studio; studio -> studioAddress; star -> salary

    Title          Year  Star    Studio     StudioAddr   Salary
    Star Wars      1977  Hamill  Lucasfilm  1 Lucas Way  $100,000
    Star Wars      1977  Ford    Lucasfilm  1 Lucas Way  $100,000
    Star Wars      1977  Fisher  Lucasfilm  1 Lucas Way  $100,000
    Patriot Games  1992  Ford    Paramount  Cloud 9      $2,000,000
    Last Crusade   1989  Ford    Lucasfilm  1 Lucas Way  $1,000,000

2NF in terms of graphs
  Require a path from every source to every sink
    No trivial edges allowed!
    Disconnected components violate 2NF
    Watch for node clusters which are subsets of candidate keys
  [Figure: FD graph over title, year, star, studio, studioAddr, salary; annotations: "trivial", "redundant", "no path from title year to salary, nor from title year star to studio/studioAddr"]

3rd normal form (3NF)
  Non-prime attr. depend only on candidate keys
    Consider FD X -> a
    Either a ∈ X OR X is a superkey OR a is prime (part of a key)
    => No transitive dependencies allowed
  Counterexample:
    studio -> studioAddr
    (studioAddr depends on studio which is not a candidate key)

3NF in terms of graphs
  3NF violation: transitive dependency
  [Figure: FD graph over title, year, studio, studioAddr; annotation: "lost redundant FD"]

    Title          Year  Studio     StudioAddr
    Star Wars      1977  Lucasfilm  1 Lucas Way
    Patriot Games  1992  Paramount  Cloud 9
    Last Crusade   1989  Lucasfilm  1 Lucas Way

3NF, dependencies, and join loss
  Theorem: always possible to convert a schema to join-lossless, dependency-preserving 3NF
  Caveat: always possible to create schemas in 3NF for which these properties do not hold
  Join loss example 1:
    MovieInfo(title, year, studioName)
    StudioAddress(title, year, studioAddress)
    => Cannot enforce studioName -> studioAddress
  Join loss example 2:
    Movies(title, year, star)
    StarSalary(star, salary)
    => Movies ⋈ StarSalary yields bogus tuples (irreparable)

Graphs and lossy decomposition
  Loss: an FD which spans two relations
  Join loss if no transitive connection between the two nodes
    => No set of joins can reconstruct the connection
  Our 3NF example showed a lost dependency
    title, year -> studioAddr
    => No join loss because title, year -> studioName -> studioAddr
  Note: OK for decomposition to lose redundant FDs
  [Figure: graph fragments over studioName and studioAddr]

Boyce-Codd normal form (BCNF)
  One additional restriction over 3NF
    All non-trivial FDs have superkey LHS
  Counterexample
    CanadianAddress(street, city, province, postalCode)
    Candidate keys: {street, postalCode}, {street, city, province}
    FD: postalCode -> city, province
    Satisfies 3NF: city, province both prime (each is part of the key {street, city, province})
    Violates BCNF: postalCode is not a superkey
    => Possible anomalies involving postalCode
  Do we care? How often do postal codes change?
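
The 3NF and BCNF conditions differ only in the "a is prime" escape clause, so one checker can serve both. The sketch below runs it on CanadianAddress. As assumptions of this example, the FD street, city, province -> postalCode is added to reflect the stated candidate key, and the set of prime attributes is read off the two candidate keys instead of being computed.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations(attributes, fds, prime=frozenset()):
    """Non-trivial FDs X -> a where X is not a superkey and a is not in prime.

    With prime left empty this is the BCNF test; passing the prime
    attributes turns it into the 3NF test.
    """
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attributes:
            continue  # X is a superkey, so this FD is fine in both normal forms
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((lhs, a))
    return bad


address = {"street", "city", "province", "postalCode"}
address_fds = [
    (frozenset({"postalCode"}), frozenset({"city", "province"})),
    # assumed: follows from the candidate key {street, city, province}
    (frozenset({"street", "city", "province"}), frozenset({"postalCode"})),
]
prime = {"street", "postalCode", "city", "province"}  # union of both candidate keys

bcnf_bad = violations(address, address_fds)
third_nf_bad = violations(address, address_fds, prime)
print(bcnf_bad)
print(third_nf_bad)
```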

Another Example
  emps(emp_id, emp_name, emp_phone, dept_name, emp_city, emp_straddr)
  empadds(emp_city, emp_zip, emp_straddr)
  FDs:
    emp_id -> emp_name emp_phone dept_name
    emp_city emp_straddr -> emp_zip
    emp_zip -> emp_city
  The FD emp_zip -> emp_city is preserved in the relation empadds but emp_zip is not a key. The schema is not in BCNF.
  The attribute emp_city is prime (there is a key emp_city emp_straddr). Hence the schema is in 3NF.

More Examples

    Manager  Project  Branch      Division
    Brown    Mars     Chicago     1
    Green    Jupiter  Birmingham  1
    Green    Mars     Birmingham  1
    Hoskins  Saturn   Birmingham  2
    Hoskins  Venus    Birmingham  2

  Functional dependencies:
    Manager -> Branch, Division: each manager works at one branch and manages one division
    Branch, Division -> Manager: for each branch and division there is a single manager
    Project, Branch -> Division, Manager: for each branch, a project is allocated to a single division and has a sole manager responsible

A good decomposition

    Manager  Branch      Division
    Brown    Chicago     1
    Green    Birmingham  1
    Hoskins  Birmingham  2

    Project  Branch      Division
    Mars     Chicago     1
    Jupiter  Birmingham  1
    Mars     Birmingham  1
    Saturn   Birmingham  2
    Venus    Birmingham  2

  Note: The first relation has a second key {Branch, Division}
  The decomposition is in 3NF but not in BCNF; moreover, it is lossless and dependencies are preserved
  This example demonstrates that BCNF is too strong a condition to impose on a relational schema

Limits of decomposition
  Pick two...
    Lossless join
    Dependency preservation
    Anomaly-free
  3NF
    Always allows join lossless and dependency preserving
    May allow some anomalies
  BCNF
    Always excludes anomalies
    May give up one of join lossless or dependency preserving
  Use domain knowledge to choose 3NF vs. BCNF
