COMPILER DESIGN - QUICK GUIDE
http://www.tutorialspoint.com/compiler_design/compiler_design_quick_guide.htm
Copyright © tutorialspoint.com
COMPILER DESIGN - OVERVIEW
Computers are a balanced mix of software and hardware. Hardware is just a piece of mechanical device, and its functions are controlled by compatible software. Hardware understands instructions in the form of electronic charge, which is the counterpart of binary language in software programming. Binary language has only two alphabets, 0 and 1. To instruct the hardware, codes must be written in binary format, which is simply a series of 1s and 0s. It would be a difficult and cumbersome task for computer programmers to write such codes, which is why we have compilers to generate such codes for us.
Language Processing System
We have learnt that any computer system is made of hardware and software. The hardware understands a language which humans cannot understand. So we write programs in high-level language, which is easier for us to understand and remember. These programs are then fed into a series of tools and OS components to get the desired code that can be used by the machine. This is known as Language Processing System.
file:///C:/Users/ANANDA/Downloads/Compiler%20Design%20%20Quick%20Guide.htm
1/61
11/29/2015
The high-level language is converted into binary language in various phases. A compiler is a program that converts high-level language to assembly language. Similarly, an assembler is a program that converts the assembly language to machine-level language.
Let us first understand how a program, using a C compiler, is executed on a host machine.
User writes a program in C language (high-level language).
The C compiler compiles the program and translates it to an assembly program (low-level language).
An assembler then translates the assembly program into machine code (object).
A linker tool is used to link all the parts of the program together for execution (executable machine code).
A loader loads all of them into memory and then the program is executed.
Before diving straight into the concepts of compilers, we should understand a few other tools that work closely with compilers.
Preprocessor
A preprocessor, generally considered as a part of the compiler, is a tool that produces input for compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.
Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine language. The difference lies in the way they read the source code or input. A compiler reads the whole source code at once, creates tokens, checks semantics, generates intermediate code, translates the whole program and may involve many passes. In contrast, an interpreter reads a statement from the input, converts it to an intermediate code, executes it, then takes the next statement in sequence. If an error occurs, an interpreter stops execution and reports it, whereas a compiler reads the whole program even if it encounters several errors.
Assembler
An assembler translates assembly language programs into machine code. The output of an assembler is called an object file, which contains a combination of machine instructions as well as the data required to place these instructions in memory.
Linker
Linker is a computer program that links and merges various object files together in order to make an executable file. All these files might have been compiled by separate assemblers. The major task of a linker is to search and locate referenced modules/routines in a program and to determine the memory location where these codes will be loaded, making the program instructions have absolute references.
Loader
Loader is a part of the operating system and is responsible for loading executable files into memory and executing them. It calculates the size of a program (instructions and data) and creates memory space for it. It initializes various registers to initiate execution.
Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for platform (B) is called a cross-compiler.
Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it into the source code of another programming language is called a source-to-source compiler.
Compiler Architecture
A compiler can broadly be divided into two phases based on the way it compiles.
Analysis Phase
Known as the front-end of the compiler, the analysis phase of the compiler reads the source program, divides it into core parts and then checks for lexical, grammar and syntax errors. The analysis phase generates an intermediate representation of the source program and symbol table, which should be fed to the Synthesis phase as input.
Synthesis Phase
Known as the back-end of the compiler, the synthesis phase generates the target program with the help of intermediate source code representation and symbol table.
A compiler can have many phases and passes.
Pass: A pass refers to the traversal of a compiler through the entire program.
Phase: A phase of a compiler is a distinguishable stage, which takes input from the previous stage, processes and yields output that can be used as input for the next stage. A pass can have more than one phase.
Phases of Compiler
The compilation process is a sequence of various phases. Each phase takes input from its previous stage, has its own representation of the source program, and feeds its output to the next phase of the compiler. Let us understand the phases of a compiler.
Lexical Analysis
The first phase, the scanner, works as a text scanner. This phase scans the source code as a stream of characters and converts it into meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens as:
<token-name, attribute-value>
Syntax Analysis
The next phase is called the syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements are checked against the source code grammar, i.e., the parser checks if the expression made by the tokens is syntactically correct.
Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the language. For example, it checks that values are assigned between compatible data types and reports errors such as adding a string to an integer. Also, the semantic analyzer keeps track of identifiers, their types and expressions, and whether identifiers are declared before use. The semantic analyzer produces an annotated syntax tree as an output.
Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code of the source code for the target machine. It represents a program for some abstract machine. It is in between the high-level language and the machine language. This intermediate code should be generated in such a way that it is easy to translate into the target machine code.
Code Optimization
The next phase does code optimization of the intermediate code. Optimization can be assumed as something that removes unnecessary code lines and arranges the sequence of statements in order to speed up the program execution without wasting resources (CPU, memory).
Code Generation
In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language. The code generator translates the intermediate code into a sequence of (generally) re-locatable machine code. This sequence of machine instructions performs the same task as the intermediate code would.
Symbol Table
It is a data structure maintained throughout all the phases of a compiler. All the identifiers' names along with their types are stored here. The symbol table makes it easier for the compiler to quickly search the identifier record and retrieve it. The symbol table is also used for scope management.
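One common way to support scope management is a stack of dictionaries, one per open scope. The sketch below assumes a hypothetical `SymbolTable` class that stores only a type per name. Real compilers keep richer records, but the lookup order (innermost scope first) is the essential idea.

```python
class SymbolTable:
    """Minimal scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]                 # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:                  # redeclaration in the same scope
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = type_

    def lookup(self, name):
        # search from the innermost scope outwards
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                        # undeclared identifier
```

For example, after `insert("x", "int")`, `enter_scope()` and `insert("x", "float")`, a lookup of `x` finds `float`. After `exit_scope()`, it finds `int` again.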
COMPILER DESIGN - LEXICAL ANALYSIS
Lexical analysis is the first phase of a compiler. It takes the modified source code from language preprocessors that are written in the form of sentences. The lexical analyzer breaks these syntaxes into a series of tokens, removing any whitespace or comments in the source code.
If the lexical analyzer finds a token invalid, it generates an error. The lexical analyzer works closely with the syntax analyzer. It reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer when it demands.
Tokens
Lexemes are said to be a sequence of characters (alphanumeric) in a token. There are some predefined rules for every lexeme to be identified as a valid token. These rules are defined by grammar rules, by means of a pattern. A pattern explains what can be a token, and these patterns are defined by means of regular expressions.
In a programming language, keywords, constants, identifiers, strings, numbers, operators and punctuation symbols can be considered as tokens.
For example, in C language, the variable declaration line
int value = 100;
contains the tokens:
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
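A lexer that produces exactly these token classes can be sketched with Python's `re` module. The token names and the small keyword list below are illustrative assumptions, not a complete C specification. Keywords are listed before identifiers, and `\b` keeps `int` from matching the start of a longer name.

```python
import re

# Hypothetical token specification for a tiny C-like subset (order matters).
TOKEN_SPEC = [
    ("keyword", r"\b(?:int|float|return)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("constant", r"\d+"),
    ("operator", r"[=+\-*/]"),
    ("symbol", r"[;,(){}]"),
    ("whitespace", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token-name, attribute-value) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"invalid character {source[pos]!r} at {pos}")
        if match.lastgroup != "whitespace":   # whitespace is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

`tokenize("int value = 100;")` returns the five tokens listed above, from `("keyword", "int")` to `("symbol", ";")`.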
Specifications of Tokens
Let us understand how the language theory undertakes the following terms:
Alphabets
Any finite set of symbols is an alphabet: {0,1} is a set of binary alphabets, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is a set of Hexadecimal alphabets, {a-z, A-Z} is a set of English language alphabets.
Strings
Any finite sequence of alphabets is called a string. Length of the string is the total number of occurrences of alphabets, e.g., the length of the string tutorialspoint is 14 and is denoted by |tutorialspoint| = 14. A string having no alphabets, i.e. a string of zero length, is known as an empty string and is denoted by ε (epsilon).
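Special Symbols
A typical high-level language contains the following symbols:
Arithmetic Symbols: Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
Punctuation: Comma (,), Semicolon (;), Dot (.), Arrow (->)
Assignment: =
Special Assignment: +=, /=, *=, -=
Comparison: ==, !=, <, <=, >, >=
Preprocessor: #
Location Specifier: &
Logical: &, &&, |, ||, !
Shift Operator: >>, >>>, <<, <<<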
Language
A language is considered as a set of strings over some finite set of alphabets. The set itself may be finite or infinite. Computer languages are considered as sets, and mathematical set operations can be performed on them. Regular languages, which include all finite languages, can be described by means of regular expressions.
Regular Expressions
The lexical analyzer needs to scan and identify only a finite set of valid strings/tokens/lexemes that belong to the language in hand. It searches for the pattern defined by the language rules.
Regular expressions have the capability to express regular languages by defining a pattern for finite strings of symbols. The grammar defined by regular expressions is known as regular grammar. The language defined by regular grammar is known as regular language.
Regular expression is an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for a set of strings. Programming language tokens can be described by regular languages. The specification of regular expressions is an example of a recursive definition. Regular languages are easy to understand and have efficient implementation.
There are a number of algebraic laws that are obeyed by regular expressions, which can be used to manipulate regular expressions into equivalent forms.
Operations
The various operations on languages are:
Union of two languages L and M is written as
$L \cup M = \{\, s \mid s \in L \text{ or } s \in M \,\}$
Concatenation of two languages L and M is written as
$LM = \{\, st \mid s \in L \text{ and } t \in M \,\}$
The Kleene Closure of a language L is written as
$L^{*}$ = zero or more occurrences of language L.
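These three operations can be tried out on small finite languages represented as Python sets of strings. Because $L^{*}$ is infinite whenever L contains a non-empty string, the closure below is truncated at an assumed `max_len`. This cut-off is a hypothetical parameter used only for illustration.

```python
from itertools import product

def union(L, M):
    return L | M

def concat(L, M):
    # every string of L followed by every string of M
    return {s + t for s, t in product(L, M)}

def kleene_closure(L, max_len):
    """Strings of L* up to max_len characters ("" plays the role of ε)."""
    result, frontier = {""}, {""}            # L^0 = {ε}
    while frontier:
        # extend the newest strings by one more string of L
        frontier = {s for s in concat(frontier, L) if len(s) <= max_len} - result
        result |= frontier
    return result
```

For instance, `kleene_closure({"a"}, 3)` yields `{"", "a", "aa", "aaa"}`.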
Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
Union: (r)|(s) is a regular expression denoting $L(r) \cup L(s)$
Concatenation: (r)(s) is a regular expression denoting $L(r)L(s)$
Kleene closure: (r)* is a regular expression denoting $(L(r))^{*}$
(r) is a regular expression denoting L(r)
Precedence and Associativity
*, concatenation (.), and | (pipe sign) are left associative
* has the highest precedence
Concatenation (.) has the second highest precedence.
| (pipe sign) has the lowest precedence of all.
Representing valid tokens of a language in regular expression
If x is a regular expression, then:
x* means zero or more occurrences of x.
i.e., it can generate { ε, x, xx, xxx, xxxx, … }
x+ means one or more occurrences of x.
i.e., it can generate { x, xx, xxx, xxxx, … } or x.x*
x? means at most one occurrence of x,
i.e., it can generate either {x} or {ε}.
[a-z] is all lower-case alphabets of English language.
[A-Z] is all upper-case alphabets of English language.
[0-9] is all decimal digits used in mathematics.
Representing occurrence of symbols using regular expressions
letter = [a-z] or [A-Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
sign = [ + | - ]
Representing language tokens using regular expressions
Decimal = (sign)?(digit)+
Identifier = (letter)(letter | digit)*
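The Decimal and Identifier definitions translate almost directly into Python's `re` syntax. One difference: inside a character class, `|` would be a literal character, so the sign class is written `[+-]`. `fullmatch` is used because the whole lexeme, not just a prefix, must fit the pattern. The `classify` helper is a hypothetical name used only for this illustration.

```python
import re

DECIMAL = re.compile(r"[+-]?[0-9]+")              # (sign)?(digit)+
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")  # (letter)(letter | digit)*

def classify(lexeme):
    # fullmatch succeeds only if the entire lexeme matches the pattern
    if DECIMAL.fullmatch(lexeme):
        return "decimal"
    if IDENTIFIER.fullmatch(lexeme):
        return "identifier"
    return "invalid"
```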
The only problem left with the lexical analyzer is how to verify the validity of a regular expression used in specifying the patterns of keywords of a language. A well-accepted solution is to use finite automata for verification.
Finite Automata
Finite automata is a state machine that takes a string of symbols as input and changes its state accordingly. Finite automata is a recognizer for regular expressions. When an input string is fed into finite automata, it changes its state for each literal. If the input string is successfully processed and the automata reaches its final state, it is accepted, i.e., the string just fed is said to be a valid token of the language in hand.
The mathematical model of finite automata consists of:
Finite set of states (Q)
Finite set of input symbols (Σ)
One Start state (q0)
Set of final states (qf)
Transition function (δ)
The transition function (δ) maps a pair consisting of a state from Q and an input symbol from Σ to a state in Q: $\delta : Q \times \Sigma \rightarrow Q$
Finite Automata Construction
Let L(r) be a regular language recognized by some finite automata (FA).
States: States of FA are represented by circles. State names are written inside circles.
Start state: The state from where the automata starts is known as the start state. Start state has an arrow pointed towards it.
Intermediate states: All intermediate states have at least two arrows, one pointing to and another pointing out from them.
Final state: If the input string is successfully parsed, the automata is expected to be in this state. Final state is represented by double circles. It may have any number of arrows pointing to it and pointing out from it.
Transition: The transition from one state to another state happens when a desired symbol in the input is found. Upon transition, automata can either move to the next state or stay in the same state. Movement from one state to another is shown as a directed arrow, where the arrow points to the destination state. If automata stays on the same state, an arrow pointing from a state to itself is drawn.
Example: We assume FA accepts any three-digit binary value ending in digit 1. FA = {Q(q0, qf), Σ(0,1), q0, qf, δ}
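A table-driven simulation makes the example concrete. The text names only q0 and qf. To enforce "exactly three digits", this sketch assumes extra states q1, q2, q3 and an implicit non-accepting trap state called "dead". These names are hypothetical and chosen for illustration. The transition table is the δ of the model above.

```python
# Hypothetical DFA for "exactly three binary digits, ending in 1".
# q1/q2 count digits read; q3 means three digits read with the last one 0.
TRANSITIONS = {
    ("q0", "0"): "q1", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "qf",
}
START, FINAL = "q0", {"qf"}

def accepts(string):
    state = START
    for symbol in string:
        # any transition not in the table leads to the trap state "dead"
        state = TRANSITIONS.get((state, symbol), "dead")
    return state in FINAL
```

With this table, `accepts("101")` is true, while `"100"`, `"1011"` and `"01"` are rejected.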
Longest Match Rule
When the lexical analyzer reads the source code, it scans the code letter by letter. When it encounters a whitespace, operator symbol, or special symbol, it decides that a word is completed.
For example:
int intvalue;
While scanning both lexemes up to 'int', the lexical analyzer cannot determine whether it is the keyword int or the initials of the identifier intvalue.
The Longest Match Rule states that the lexeme scanned should be determined based on the longest match among all the tokens available.
The lexical analyzer also follows rule priority, where a reserved word (e.g., a keyword) of a language is given priority over user input. That is, if the lexical analyzer finds a lexeme that matches any existing reserved word, it should treat the lexeme as that reserved word, so a reserved word cannot be used as an identifier (such misuse is reported as an error).
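A hand-written scanner shows both rules working together. It first consumes the longest run of identifier characters, and only then checks whether the complete lexeme is a reserved word. The keyword set is an illustrative assumption, and every non-alphanumeric character is treated as a one-character symbol for brevity.

```python
KEYWORDS = {"int", "float", "if", "else"}   # illustrative keyword set

def scan(source):
    """Scanner applying the longest-match rule, then rule priority."""
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            j = i
            # longest match: keep consuming while the lexeme can still grow
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            lexeme = source[i:j]
            # rule priority: a complete lexeme equal to a reserved word is a keyword
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
            tokens.append((kind, lexeme))
            i = j
        else:
            tokens.append(("symbol", ch))
            i += 1
    return tokens
```

For `int intvalue;`, the scanner yields the keyword `int`, the identifier `intvalue` (not `int` followed by `value`), and the symbol `;`.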
COMPILER DESIGN - SYNTAX ANALYSIS
Syntax analysis or parsing is the second phase of a compiler. In this chapter, we shall learn the basic concepts used in the construction of a parser.
We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern rules. But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of the regular expressions. Regular expressions cannot check balancing tokens, such as parentheses. Therefore, this phase uses context-free grammar (CFG), which is recognized by push-down automata.
CFG, on the other hand, is a superset of Regular Grammar.
It implies that every Regular Grammar is also context-free, but there exist some problems which are beyond the scope of Regular Grammar. CFG is a helpful tool in describing the syntax of programming languages.
Context-Free Grammar
In this section, we will first see the definition of context-free grammar and introduce terminologies used in parsing technology.
A context-free grammar has four components:
A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar.
A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which strings are formed.
A set of productions (P). The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal called the left side of the production, an arrow, and a sequence of tokens and/or non-terminals, called the right side of the production.
One of the non-terminals is designated as the start symbol (S), from which derivation begins.
The strings are derived from the start symbol by repeatedly replacing a non-terminal (initially the start symbol) by the right side of a production for that non-terminal.
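Example
We take the problem of the palindrome language, which cannot be described by means of Regular Expression. That is, $L = \{\, w \mid w = w^{R} \,\}$, where $w^{R}$ is the reverse of w, is not a regular language. But it can be described by means of CFG, as illustrated below:
G = ( V, Σ, P, S )
Where:
V = { Q, Z, N }
Σ = { 0, 1 }
P = { Q → Z | Q → N | Q → 0 | Q → 1 | Q → ε | Z → 0Q0 | N → 1Q1 }
S = Q
This grammar describes the palindrome language, such as: 1001, 11100111, 00100, 1010101, 11111, etc.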
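Because every rule either stops (Q → 0 | 1 | ε) or wraps Q in a matching pair of symbols (Z → 0Q0, N → 1Q1), we can check whether Q derives a string by peeling one pair off each end and recursing. This is a minimal sketch tied to this particular grammar, not a general CFG recognizer. The function name `derives_Q` is hypothetical.

```python
def derives_Q(w):
    """True if Q =>* w under Q -> Z | N | 0 | 1 | ε, Z -> 0Q0, N -> 1Q1."""
    if w in ("", "0", "1"):             # Q -> ε | 0 | 1
        return True
    if len(w) >= 2 and w[0] == w[-1] == "0":
        return derives_Q(w[1:-1])       # Q -> Z -> 0 Q 0
    if len(w) >= 2 and w[0] == w[-1] == "1":
        return derives_Q(w[1:-1])       # Q -> N -> 1 Q 1
    return False
```

All five example strings are accepted, and a non-palindrome such as `1101` is rejected.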
Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. The output of this phase is a parse tree.
This way, the parser accomplishes two tasks: parsing the code while looking for errors, and generating a parse tree as the output of the phase.
Parsers are expected to parse the whole code even if some errors exist in the program. Parsers use error recovering strategies, which we will learn later in this chapter.
Derivation
A derivation is basically a sequence of production rules applied in order to get the input string. During parsing, we take two decisions for some sentential form of input:
Deciding the non-terminal which is to be replaced.
Deciding the production rule by which the non-terminal will be replaced.
To decide which non-terminal is to be replaced with a production rule, we have two options.
Left-most Derivation
If the sentential form of an input is scanned and replaced from left to right, so that the left-most non-terminal is always replaced first, it is called left-most derivation. The sentential form derived by the left-most derivation is called the left-sentential form.
Right-most Derivation
If we scan and replace the input with production rules from right to left, so that the right-most non-terminal is always replaced first, it is known as right-most derivation. The sentential form derived from the right-most derivation is called the right-sentential form.
Example
Production rules:
E → E + E
E → E * E
E → id
Input string: id + id * id
The left-most derivation is:
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
Notice that the left-most non-terminal is always processed first.
The right-most derivation is:
E ⇒ E + E
⇒ E + E * E
⇒ E + E * id
⇒ E + id * id
⇒ id + id * id
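Both derivations can be replayed mechanically. At each step, the function finds the left-most (or right-most) non-terminal in the current sentential form and replaces it with the right side of the chosen production. The numbering of the three productions (1, 2, 3) is an assumption made for this sketch.

```python
PRODUCTIONS = {1: ("E", ["E", "+", "E"]), 2: ("E", ["E", "*", "E"]), 3: ("E", ["id"])}
NONTERMINALS = {"E"}

def derive(choices, leftmost=True):
    """Apply the numbered productions in order; return every sentential form."""
    form = ["E"]
    forms = [" ".join(form)]
    for number in choices:
        lhs, rhs = PRODUCTIONS[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]   # which non-terminal to rewrite
        if form[i] != lhs:
            raise ValueError("production does not apply to this non-terminal")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms
```

`derive([2, 1, 3, 3, 3])` reproduces the left-most derivation above. `derive([1, 2, 3, 3, 3], leftmost=False)` reproduces the right-most one.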
Parse Tree
A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are derived from the start symbol. The start symbol of the derivation becomes the root of the parse tree. Let us see this by an example from the last topic.
We take the left-most derivation of a + b * c (where a, b and c are each an id):
The left-most derivation is:
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
Step 1: E ⇒ E * E; the root E gets the children E, * and E.
Step 2: E * E ⇒ E + E * E; the left child E gets the children E, + and E.
Step 3: E + E * E ⇒ id + E * E; the left-most E gets the leaf id.
Step 4: id + E * E ⇒ id + id * E; the next E gets the leaf id.
Step 5: id + id * E ⇒ id + id * id; the last E gets the leaf id.
In a parse tree:
All leaf nodes are terminals.
All interior nodes are non-terminals.
In-order traversal gives the original input string.
A parse tree depicts associativity and precedence of operators. The deepest sub-tree is traversed first; therefore, the operator in that sub-tree gets precedence over the operator in the parent nodes.
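The tree built in the five steps can be written out as nested nodes, which makes the listed properties easy to check. Reading the leaves left to right gives back `a + b * c`. Evaluating children before parents applies the deeper `+` first. The `Node` class, the helpers, and the sample values a = 2, b = 3, c = 4 are assumptions for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)

def E(*children):
    return Node("E", list(children))

def leaf(symbol):
    return Node(symbol)

def leaves(node):
    """Collect the leaves left to right (the in-order reading of this tree)."""
    if not node.children:
        return [node.symbol]
    return [s for child in node.children for s in leaves(child)]

def evaluate(node, env):
    """Children are evaluated before their parent, so deeper operators act first."""
    if not node.children:
        return env[node.symbol]
    if len(node.children) == 1:                 # E -> id
        return evaluate(node.children[0], env)
    left, op, right = node.children             # E -> E + E or E -> E * E
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op.symbol == "+" else a * b

# Tree from the left-most derivation E => E * E => E + E * E => ...
tree = E(E(E(leaf("a")), leaf("+"), E(leaf("b"))), leaf("*"), E(leaf("c")))
```

With a = 2, b = 3 and c = 4, this tree evaluates to 20, i.e. (a + b) * c, because the `+` sub-tree is the deeper one.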
Types of Parsing
Syntax analyzers follow production rules defined by means of context-free grammar. The way the production rules are implemented (derivation) divides parsing into two types: top-down parsing and bottom-up parsing.
Top-down Parsing
When the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol to the input, it is called top-down parsing.
Recursive descent parsing: It is a common form of top-down parsing. It is called recursive as it uses recursive procedures to process the input. A general recursive descent parser may need backtracking.
Backtracking: If one derivation of a production fails, the syntax analyzer restarts the process using a different rule of the same production. This technique may process the input string more than once to determine the right production.
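A small recursive descent parser with backtracking can be written with one procedure per non-terminal. Each procedure yields every position where its non-terminal could end, so a caller can fall back to the next alternative when a later match fails. The grammar S → c A d, A → a | a b is a hypothetical example, not taken from the text. On input `cabd`, the parser first tries A → a, fails to find `d`, then backtracks to A → a b.

```python
# Hypothetical grammar:  S -> c A d ;  A -> a | a b  (alternatives tried in order)

def parse_A(s, i):
    """Yield every position where an A starting at i could end."""
    if s.startswith("a", i):
        yield i + 1                      # A -> a
    if s.startswith("ab", i):
        yield i + 2                      # A -> a b (reached only on backtracking)

def parse_S(s, i):
    if s.startswith("c", i):
        for j in parse_A(s, i + 1):      # each alternative of A is a choice point
            if s.startswith("d", j):
                yield j + 1

def accepts(s):
    return any(end == len(s) for end in parse_S(s, 0))
```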
Bottom-up Parsing
As the name suggests, bottom-up parsing starts with the input symbols and tries to construct the parse tree up to the start symbol.
Example:
Input string: a + b * c
Production rules:
S → E
E → E + T
E → E * T
E → T
T → id
Let us start bottom-up parsing
a + b * c
Read the input and check if any production matches with the input:
a + b * c
T + b * c
E + b * c
E + T * c
E * c
E * T
E
S
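The same reduction sequence can be produced by a naive shift-reduce loop. Whenever the top of a stack matches the right side of a production, it reduces; otherwise it shifts the next token. Trying longer right-hand sides first makes `E + T` reduce before the lone `T`. This greedy choice happens to work for this grammar but is an assumption of the sketch. Practical bottom-up parsers decide with LR parsing tables instead.

```python
# Productions from the example (S -> E is applied once the input is consumed).
PRODUCTIONS = [("E", ["E", "+", "T"]), ("E", ["E", "*", "T"]),
               ("E", ["T"]), ("T", ["id"])]

def shift_reduce(tokens):
    """Naive shift-reduce parse; returns (accepted, sentential forms)."""
    stack, rest = [], list(tokens)
    forms = [" ".join(tokens)]
    # try longer right-hand sides first so "E + T" wins over "T" alone
    by_length = sorted(PRODUCTIONS, key=lambda p: -len(p[1]))
    while True:
        for lhs, rhs in by_length:
            if stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]          # reduce
                forms.append(" ".join(stack + rest))
                break
        else:
            if not rest:
                break                              # nothing to reduce or shift
            stack.append(rest.pop(0))              # shift
    if stack == ["E"]:
        stack = ["S"]                              # S -> E
        forms.append("S")
    return stack == ["S"], forms
```

Running it on `id + id * id` (a, b and c as id) yields the same forms as the steps above, from `T + id * id` down to `S`.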
Ambiguity
A grammar G is said to be ambiguous if it has more than one parse tree (left or right derivation) for at least one string.
Example
E → E + E
E → E - E
E → id
For the string id + id - id, the above grammar generates two parse trees: one groups the string as (id + id) - id and the other as id + (id - id).
A language for which every grammar is ambiguous is said to be inherently ambiguous. Ambiguity in grammar is not good for compiler construction. No general method can detect and remove ambiguity automatically, but it can be removed either by re-writing the whole grammar without ambiguity, or by setting and following associativity and precedence constraints.
Associativity
If an operand has operators on both sides, the side on which the operator takes this operand is decided by the associativity of those operators. If the operation is left-associative, then the operand will be taken by the left operator; if the operation is right-associative, the right operator will take the operand.
Example
Operations such as Addition, Multiplication, Subtraction, and Division are left associative. If the expression contains:
id op id op id
it will be evaluated as:
(id op id) op id
For example, (id + id) + id
Operations like Exponentiation are right associative, i.e., the order of evaluation in the same expression will be:
id op (id op id)
For example, id ^ (id ^ id)
Precedence
If two different operators share a common operand, the precedence of operators decides which will take the operand. That is, 2+3*4 can have two different parse trees, one corresponding to (2+3)*4 and another corresponding to 2+(3*4). By setting precedence among operators, this problem can be easily removed. As in the previous example, mathematically * (multiplication) has precedence over + (addition), so the expression 2+3*4 will always be interpreted as:
2 + (3 * 4)
These methods decrease the chances of ambiguity in a language or its grammar.
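Precedence and associativity can be enforced directly while parsing, without rewriting the grammar. The sketch below uses precedence climbing, one common technique for this. It handles only integer operands and parentheses, and the precedence numbers are assumptions that follow the usual mathematical order. Left-associative operators require a strictly higher precedence on their right side, while the right-associative `^` accepts an equal one.

```python
import operator
import re

PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}   # higher binds tighter
RIGHT_ASSOC = {"^"}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}

def tokenize(text):
    return re.findall(r"\d+|[-+*/^()]", text)

def parse(tokens, min_prec=1):
    """Precedence climbing: returns (tree, remaining tokens)."""
    lhs, tokens = parse_atom(tokens)
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens[0]
        # left-assoc: right operand may only contain tighter operators
        next_min = PREC[op] if op in RIGHT_ASSOC else PREC[op] + 1
        rhs, tokens = parse(tokens[1:], next_min)
        lhs = (op, lhs, rhs)
    return lhs, tokens

def parse_atom(tokens):
    if tokens[0] == "(":
        inner, rest = parse(tokens[1:])
        return inner, rest[1:]               # drop the matching ")"
    return int(tokens[0]), tokens[1:]

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))
```

`2+3*4` parses as `("+", 2, ("*", 3, 4))` and evaluates to 14. `8-3-2` groups as `(8-3)-2`, giving 3. `2^3^2` groups as `2^(3^2)`, giving 512.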
Left Recursion
A grammar becomes left-recursive if it has any non-terminal 'A' whose derivation contains 'A' itself as the left-most symbol. Left-recursive grammar is considered to be a problematic situation for top-down parsers. Top-down parsers start parsing from the Start symbol, which in itself is a non-terminal. So, when the parser encounters the same non-terminal in its derivation, it becomes hard for it to judge when to stop parsing the left non-terminal, and it goes into an infinite loop.
Example:
(1) A => Aα | β
(2) S => Aα | β
A => Sd
(1) is an example of immediate left recursion, where A is any non-terminal symbol and α and β represent strings of grammar symbols (terminals and/or non-terminals), with β not starting with A.
(2) is an example of indirect left recursion.
A top-down parser will first parse the A, which in turn will yield a string consisting of A itself, and the parser may go into a loop forever.
Removal of Left Recursion
One way to remove left recursion is to use the following technique:
The production
A => Aα | β
is converted into the following productions
A => βA'
A' => αA' | ε
This does not impact the strings derived from the grammar, but it removes immediate left recursion.
The second method is to use the following algorithm, which should eliminate all direct and indirect left recursions (assuming the grammar has no cycles and no ε-productions).
Algorithm
START
Arrange non-terminals in some order like A1, A2, A3, …, An
for each i from 1 to n
{
    for each j from 1 to i-1
    {
        replace each production of form Ai => Ajγ
        with Ai => δ1γ | δ2γ | δ3γ | … | δnγ
        where Aj => δ1 | δ2 | … | δn are the current Aj productions
    }
    eliminate immediate left recursion among the Ai productions
}
END
Example
The production set
S => Aα | β
A => Sd
after applying the above algorithm, should become
S => Aα | β
A => Aαd | βd
and then, remove immediate left recursion using the first technique.
A => βdA'
A' => αdA' | ε
Now none of the productions has either direct or indirect left recursion.
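The algorithm can be sketched in Python by representing each production's right side as a tuple of symbols, with `()` standing for ε. The primed-name convention (`A'`) follows the text. As the algorithm itself states, the sketch assumes the input grammar has no cycles and no ε-productions.

```python
def eliminate_immediate(nt, prods):
    """A -> A α1 | ... | β1 | ...  becomes  A -> β1 A' | ...,  A' -> α1 A' | ... | ε."""
    recursive = [p[1:] for p in prods if p and p[0] == nt]
    others = [p for p in prods if not p or p[0] != nt]
    if not recursive:
        return {nt: prods}
    new_nt = nt + "'"
    return {
        nt: [p + (new_nt,) for p in others],
        new_nt: [alpha + (new_nt,) for alpha in recursive] + [()],  # () is ε
    }

def eliminate_left_recursion(grammar, order):
    """Productions are tuples of symbols; `order` is A1, A2, ..., An."""
    g = {nt: list(prods) for nt, prods in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            expanded = []
            for p in g[a_i]:
                if p and p[0] == a_j:
                    # Ai -> Aj γ  becomes  Ai -> δ1 γ | ... | δn γ
                    expanded.extend(delta + p[1:] for delta in g[a_j])
                else:
                    expanded.append(p)
            g[a_i] = expanded
        g.update(eliminate_immediate(a_i, g[a_i]))
    return g
```

Applied to the example grammar with order S, A, it leaves S unchanged and produces A => βdA' and A' => αdA' | ε.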
Left Factoring
If more than one grammar production rule has a common prefix string, then the top-down parser cannot make a choice as to which of the productions it should take to parse the string in hand.
Example
If a top-down parser encounters a production like
A => αβ | αγ | …
then it cannot determine which production to follow to parse the string, as both productions start from the same terminal (or non-terminal). To remove this confusion, we use a technique called left factoring.
Left factoring transforms the grammar to make it useful for top-down parsers. In this technique, we make one production for each common prefix, and the rest of the derivation is added by new productions.
Example
The above productions can be written as
A => αA'
A' => β | γ | …
Now the parser has only one production per prefix, which makes it easier to take decisions.
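One round of left factoring can be sketched as follows. Productions that share a first symbol are grouped, their longest common prefix is factored out, and the remainders are moved into a new primed non-terminal. Only one pass is shown; the new non-terminals may themselves need factoring again. Symbol names are placeholders, and `()` stands for ε.

```python
def common_prefix(seqs):
    prefix = []
    for symbols in zip(*seqs):             # walk all productions in lock-step
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return tuple(prefix)

def left_factor(nt, prods):
    """One round of left factoring for the productions of `nt`."""
    groups = {}
    for p in prods:
        groups.setdefault(p[:1], []).append(p)   # group by first symbol
    result, count = {nt: []}, 0
    for first, group in groups.items():
        if len(group) == 1 or not first:         # nothing shared: keep as is
            result[nt].extend(group)
            continue
        prefix = common_prefix(group)
        count += 1
        new_nt = nt + "'" * count
        result[nt].append(prefix + (new_nt,))
        result[new_nt] = [p[len(prefix):] for p in group]   # () means ε
    return result
```

For `A => αβ | αγ`, this yields `A => αA'` and `A' => β | γ`, matching the example above.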
First and Follow Sets
An important part of parser table construction is to create first and follow sets. These sets can provide the actual position of any terminal in the derivation. This is done to create the parsing table, where the entry T[A, t] = α records the production A → α with which A is replaced when the next input terminal is t.
First Set
This set is created to know what terminal symbol is derived in the first position by a non-terminal. For example,
α → t β
That is, α derives t (terminal) in the very first position. So, t ∈ FIRST(α).
Algorithm for calculating First set
Look at the definition of the FIRST(α) set:
if α is a terminal, then FIRST(α) = { α }.
if α is a non-terminal and α → ε is a production, then ε ∈ FIRST(α).
if α is a non-terminal and α → γ1 γ2 γ3 … γn is a production, then t is in FIRST(α) whenever t is in FIRST(γi) and γ1 … γi-1 can all derive ε; if all of γ1 … γn can derive ε, then ε ∈ FIRST(α).
First set can be seen as: $\mathrm{FIRST}(\alpha) = \{\, t \mid \alpha \Rightarrow^{*} t\beta \,\} \cup \{\, \varepsilon \mid \alpha \Rightarrow^{*} \varepsilon \,\}$
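Follow Set
Likewise, we calculate what terminal symbol immediately follows a non-terminal α in production rules. We do not consider what the non-terminal can generate; instead, we see what would be the next terminal symbol that follows the productions of a non-terminal.
Algorithm for calculating Follow set:
if α is a start symbol, then the end-of-input marker $ is in FOLLOW(α).
if α is a non-terminal and has a production α → AB, then FIRST(B) is in FOLLOW(A), except ε.
if α is a non-terminal and has a production α → AB, where B can derive ε (or a production α → A), then FOLLOW(α) is in FOLLOW(A).
Follow set can be seen as: $\mathrm{FOLLOW}(\alpha) = \{\, t \mid S \Rightarrow^{*} \beta\,\alpha\,t\,\gamma \,\}$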
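Both sets are usually computed together by fixed-point iteration. The rules above are applied to every production repeatedly until no set grows any further. The grammar at the bottom (E → T E', E' → + T E' | ε, T → id) is a small hypothetical example. Productions are tuples of symbols, and `()` stands for an ε-production.

```python
EPS = "ε"

def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols X1 X2 ... Xn."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:        # Xi cannot vanish, so stop here
            return result
    result.add(EPS)                       # every Xi can derive ε (or n = 0)
    return result

def first_and_follow(grammar, start):
    nonterminals = set(grammar)
    first = {nt: set() for nt in nonterminals}
    for prods in grammar.values():
        for p in prods:
            for sym in p:
                if sym not in nonterminals:
                    first[sym] = {sym}    # FIRST(t) = {t} for a terminal t
    follow = {nt: set() for nt in nonterminals}
    follow[start].add("$")                # end-of-input marker
    changed = True
    while changed:                        # repeat until no set grows
        changed = False
        for lhs, prods in grammar.items():
            for p in prods:
                size = len(first[lhs])
                first[lhs] |= first_of_sequence(p, first)
                changed |= len(first[lhs]) != size
                for i, sym in enumerate(p):
                    if sym not in nonterminals:
                        continue
                    trailer = first_of_sequence(p[i + 1:], first)
                    size = len(follow[sym])
                    follow[sym] |= trailer - {EPS}
                    if EPS in trailer:    # sym can end this production
                        follow[sym] |= follow[lhs]
                    changed |= len(follow[sym]) != size
    return {nt: first[nt] for nt in nonterminals}, follow

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("id",)],
}
FIRST, FOLLOW = first_and_follow(GRAMMAR, "E")
```

For this grammar, FIRST(E') = {+, ε} and FOLLOW(T) = {+, $}.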
Error-recovery Strategies
A parser should be able to detect and report any error in the program. It is expected that when an error is encountered, the parser should be able to handle it and carry on parsing the rest of the input. Mostly it is expected from the parser to check for errors, but errors may be encountered at various stages of the compilation process. A program may have the following kinds of errors at various stages:
Lexical: name of some identifier typed incorrectly
Syntactical: missing semicolon or unbalanced parenthesis
Semantical: incompatible value assignment
Logical: code not reachable, infinite loop
There are four common error-recovery strategies that can be implemented in the parser to deal with errors in the code.
Panic mode
When a parser encounters an error anywhere in the statement, it ignores the rest of the statement by not processing input from the erroneous input up to a delimiter, such as a semicolon. This is the easiest way of error recovery, and it also prevents the parser from developing infinite loops.
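A toy parser illustrates panic-mode recovery. It assumes a hypothetical language whose only statement form is `name = number ;`. When a statement does not match, the parser records one error, discards tokens up to and including the next `;` (the synchronizing delimiter), and continues with the following statement.

```python
def parse_statements(tokens):
    """Toy parser for `ID = NUM ;` statements with panic-mode recovery."""
    errors, parsed, i = [], [], 0
    while i < len(tokens):
        stmt = tokens[i:i + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            parsed.append((stmt[0], int(stmt[2])))
            i += 4
            continue
        errors.append(f"syntax error in statement starting at token {i}")
        # panic mode: discard input up to the next ';' delimiter
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        i += 1                                   # skip the ';' itself
    return parsed, errors
```

For `x = 1 ; y = = 2 ; z = 3 ;`, the middle statement is reported once and skipped, and both `x` and `z` are still parsed.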
Statement mode
When a parser encounters an error, it tries to take corrective measures so that the rest of the inputs of the statement allow the parser to parse ahead. For example, inserting a missing semicolon, replacing a comma with a semicolon, etc. Parser designers have to be careful here because one wrong correction may lead to an infinite loop.
Error productions
Some common errors that may occur in the code are known to the compiler designers. The designers can create an augmented grammar with productions that generate these erroneous constructs, so that the parser recognizes them when they are encountered.
Global correction
The parser considers the program in hand as a whole, tries to figure out what the program is intended to do, and tries to find the closest match for it that is error-free. When an erroneous input (statement) X is fed, it creates a parse tree for some closest error-free statement Y. This may allow the parser to make minimal changes in the source code, but due to the complexity (time and space) of this strategy, it is rarely implemented in practice.
Abstract Syntax Trees
Parse tree representations are not easy for the compiler to process, as they contain more details than actually needed. Take a typical parse tree for an arithmetic expression as an example.
If watched closely, we find that most of the leaf nodes are single children of their parent nodes. This information can be eliminated before feeding the tree to the next phase. By hiding the extra information, we obtain a more compact abstract tree.
ASTs are important data structures in a compiler, carrying the least unnecessary information. ASTs are more compact than a parse tree and can be easily used by a compiler.
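The step from parse tree to AST can be sketched as a recursive walk. It skips single-child chains and turns each binary rule `X -> Y op Z` into an operator node. The grammar used to build the sample parse tree (E → E + T | T, T → T * F | F, F → id) and the tuple form of the AST are assumptions made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def to_ast(node):
    """Collapse single-child chains and hoist operators into interior nodes."""
    if not node.children:                 # leaf: an operand or operator token
        return node.label
    if len(node.children) == 1:           # chain such as T -> F -> id: skip it
        return to_ast(node.children[0])
    left, op, right = node.children       # assumes binary rules X -> Y op Z
    return (op.label, to_ast(left), to_ast(right))

def n(label, *children):
    return Node(label, list(children))

# Parse tree for a + b * c under E -> E + T | T, T -> T * F | F, F -> id
parse_tree = n("E",
               n("E", n("T", n("F", n("a")))),
               n("+"),
               n("T", n("T", n("F", n("b"))), n("*"), n("F", n("c"))))
```

The thirteen-node parse tree collapses to `("+", "a", ("*", "b", "c"))`.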
Limitations of Syntax Analyzers
Syntax analyzers receive their inputs, in the form of tokens, from lexical analyzers. Lexical analyzers are responsible for the validity of a token supplied to the syntax analyzer. Syntax analyzers have the following drawbacks:
it cannot determine if a token is valid,
it cannot determine if a token is declared before it is being used,
it cannot determine if a token is initialized before it is being used,
it cannot determine if an operation performed on a token type is valid or not.
These tasks are accomplished by the semantic analyzer, which we shall study in Semantic Analysis.
COMPILER DESIGN - SEMANTIC ANALYSIS
We have learnt how a parser constructs parse trees in the syntax analysis phase. The plain parse tree constructed in that phase is generally of no use for a compiler, as it does not carry any information about how to evaluate the tree. The productions of context-free grammar, which make up the rules of the language, do not accommodate how to interpret them.
For example
E → E + T
The above CFG production has no semantic rule associated with it, and it cannot help in making any sense of the production.
Semantics
Semantics of a language provide meaning to its constructs, like tokens and syntax structure. Semantics help interpret symbols, their types, and their relations with each other. Semantic analysis judges whether the syntax structure constructed in the source program derives any meaning or not.
CFG + semantic rules = Syntax Directed Definitions
For example:
int a = "value";
should not issue an error in the lexical and syntax analysis phases, as it is lexically and structurally correct, but it should generate a semantic error, as the type of the assignment differs. These rules are set by the grammar of the language and evaluated in semantic analysis. The following tasks should be performed in semantic analysis:
Scope resolution
Type checking
Array-bound checking
Semantic Errors
We have mentioned some of the semantic errors that the semantic analyzer is expected to recognize:
Type mismatch
Undeclared variable
Reserved identifier misuse.
Multiple declaration of variable in a scope.
Accessing an out of scope variable.
Actual and formal parameter mismatch.
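A few of these checks can be sketched over a hypothetical, already-parsed statement list. Each item is either `("decl", type, name, value)` or `("use", name)`. The mini-language and its two types (`int` and `string`) are assumptions. The checker detects type mismatch, multiple declaration, and undeclared variables using a dictionary as the symbol table.

```python
TYPE_OF = {int: "int", str: "string"}   # Python value type -> source type

def check(statements):
    """Return semantic errors for ("decl", type, name, value) / ("use", name)."""
    symtab, errors = {}, []
    for stmt in statements:
        if stmt[0] == "decl":
            _, declared, name, value = stmt
            if name in symtab:
                errors.append(f"multiple declaration of {name!r}")
            actual = TYPE_OF[type(value)]
            if actual != declared:
                errors.append(f"type mismatch: {actual} assigned to {declared} {name!r}")
            symtab[name] = declared
        elif stmt[0] == "use" and stmt[1] not in symtab:
            errors.append(f"undeclared variable {stmt[1]!r}")
    return errors
```

The text's example `int a = "value";` corresponds to `check([("decl", "int", "a", "value")])`, which reports a single type mismatch.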
Attribute Grammar
Attribute grammar is a special form of context-free grammar where some additional information (attributes) is appended to one or more of its non-terminals in order to provide context-sensitive information. Each attribute has a well-defined domain of values, such as integer, float, character, string, and expressions.
Attribute grammar is a medium to provide semantics to the context-free grammar, and it can help specify the syntax and semantics of a programming language. Attribute grammar (when viewed as a parse tree) can pass values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }
The part in braces, to the right of the production, contains the semantic rule that specifies how the grammar should be interpreted. Here, the values of the child non-terminals E and T are added together and the result is copied to the parent non-terminal E.
Semantic attributes may be assigned values from their domain at the time of parsing and evaluated at the time of assignment or conditions. Based on the way the attributes get their values, they can be broadly divided into two categories: synthesized attributes and inherited attributes.
Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume the following production:
S → ABC
If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of ABC are synthesized to S.
As in our previous example (E → E + T), the parent node E gets its value from its child nodes. Synthesized attributes never take values from their parent nodes or any sibling nodes.
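A minimal C sketch of the synthesized attribute E.value from the production E → E + T: the tree is evaluated post-order, so each parent's rule runs only after its children already have their values. The grammar is reduced to E → E + T | T with T → num for illustration, and the node layout and the names leaf, plus and eval are hypothetical.

```c
#include <stdlib.h>

typedef struct Node {
    char kind;                 /* '+' for E -> E + T, 'n' for a number leaf */
    int value;                 /* the synthesized attribute */
    struct Node *left, *right;
} Node;

static Node *new_node(char kind) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->kind = kind;
    return n;
}

Node *leaf(int v) {            /* T -> num: the value comes from the token */
    Node *n = new_node('n');
    n->value = v;
    return n;
}

Node *plus(Node *e, Node *t) { /* E -> E + T */
    Node *n = new_node('+');
    n->left = e;
    n->right = t;
    return n;
}

/* Post-order: both children are evaluated before the parent's rule
 * E.value = E1.value + T.value runs, so values only flow upward. */
int eval(Node *n) {
    if (n->kind == '+')
        n->value = eval(n->left) + eval(n->right);
    return n->value;
}
```

For 2 + 3 + 4, built as plus(plus(leaf(2), leaf(3)), leaf(4)), eval returns 9 and the inner node's attribute becomes 5.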
Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from parent and/or siblings. As in the following production,
S → ABC
A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A, and B.
Expansion: When a non-terminal is expanded to terminals as per a grammatical rule
Reduction: When a terminal is reduced to its corresponding non-terminal according to grammar rules. Syntax trees are parsed top-down and left to right. Whenever reduction occurs, we apply its corresponding semantic rules (actions).
Semantic analysis uses Syntax Directed Translations to perform the above tasks.
The semantic analyzer receives an AST (Abstract Syntax Tree) from its previous stage (syntax analysis).
The semantic analyzer attaches attribute information to the AST; the result is called an Attributed AST.
Attributes are two-tuple values: <attribute name, attribute value>
For example:
int value = 5;
<type, "integer">
<presentvalue, "5">
For every production, we attach a semantic rule.
S-attributed SDT
If an SDT uses only synthesized attributes, it is called an S-attributed SDT. These attributes are evaluated using S-attributed SDTs that have their semantic actions written after the production (right-hand side).
As depicted above, attributes in S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent nodes depend upon the values of the child nodes.
L-attributed SDT
This form of SDT uses both synthesized and inherited attributes, with the restriction of not taking values from right siblings.
In L-attributed SDTs, a non-terminal can get values from its parent, child, and sibling nodes. As in the following production:
S → ABC
S can take values from A, B, and C (synthesized). A can take values from S only. B can take values from S and A. C can get values from S, A, and B. No non-terminal can get values from the sibling to its right.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.
We may conclude that if a definition is S-attributed, then it is also L-attributed, as L-attributed definitions enclose S-attributed definitions.
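The classic declaration grammar D → T L makes the L-attributed idea concrete: the type synthesized by T is handed to L as an inherited attribute, and each identifier receives it while the input is scanned depth-first, left to right. This C sketch rests on stated assumptions: the grammar, the helper names (parse_decl, parse_list, addtype) and the single-letter identifiers are hypothetical simplifications.

```c
#include <ctype.h>
#include <string.h>

#define MAX_IDS 16

static const char *src;               /* input cursor */
static char names[MAX_IDS];           /* identifiers seen so far */
static const char *types[MAX_IDS];    /* type given to each identifier */
static int count;

static void skip_spaces(void) {
    while (*src == ' ') src++;
}

/* addtype(id, type): record the inherited type for an identifier. */
static void addtype(char id, const char *type) {
    if (count < MAX_IDS) {
        names[count] = id;
        types[count] = type;
        count++;
    }
}

/* L -> id L',  L' -> , id L' | epsilon.  `in` is the inherited L.in. */
static int parse_list(const char *in) {
    skip_spaces();
    if (!isalpha((unsigned char)*src)) return 0;
    addtype(*src++, in);                      /* addtype(id, L.in) */
    skip_spaces();
    if (*src == ',') {
        src++;
        return parse_list(in);                /* L'.in = L.in */
    }
    return 1;                                 /* L' -> epsilon */
}

/* D -> T L : T.type is synthesized, then passed down as L.in. */
int parse_decl(const char *text) {
    const char *type;
    src = text;
    count = 0;
    skip_spaces();
    if (strncmp(src, "int", 3) == 0) { type = "int"; src += 3; }
    else if (strncmp(src, "float", 5) == 0) { type = "float"; src += 5; }
    else return 0;
    return parse_list(type) && *src == '\0';
}
```

Values only ever flow from the parent (D) or from a left sibling (T) into L, which is exactly the restriction an L-attributed definition imposes.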
COMPILER DESIGN - PARSER
In the previous chapter, we understood the basic concepts involved in parsing. In this chapter, we will learn the various types of parser construction methods available.
Parsing can be defined as top-down or bottom-up based on how the parse tree is constructed.
Top-Down Parsing
We have learnt in the last chapter that the top-down parsing technique parses the input, and starts constructing a parse tree from the root node, gradually moving down to the leaf nodes. The types of top-down parsing are depicted below:
Recursive Descent Parsing
Recursive descent is a top-down parsing technique that constructs the parse tree from the top, and the input is read from left to right. It uses procedures for every terminal and non-terminal entity. This parsing technique recursively parses the input to make a parse tree, which may or may not require back-tracking. But the grammar associated with it (if not left-factored) cannot avoid back-tracking. A form of recursive-descent parsing that does not require any back-tracking is known as predictive parsing.
This parsing technique is regarded as recursive, as it uses context-free grammar, which is recursive in nature.
Back-tracking
Top-down parsers start from the root node (start symbol) and match the input string against the production rules to replace them (if matched). To understand this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For an input string: read, a top-down parser will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter of the input, i.e. 'r'. The very production of S (S → rXd) matches with it. So the top-down parser advances to the next input letter (i.e. 'e'). The parser tries to expand non-terminal 'X' and checks its production from the left (X → oa). It does not match with the next input symbol. So the top-down parser backtracks to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is accepted.
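The behaviour traced above can be reproduced with a small recursive-descent parser in C. Each procedure saves the input position, tries one alternative, and restores the position (backtracks) before trying the next. This is a sketch: the function names are hypothetical, and backtracking happens only between the alternatives of a single procedure, which is enough for this grammar but not for every grammar.

```c
#include <stdbool.h>
#include <stddef.h>

/* Consume one expected character, or leave *pos unchanged. */
static bool match(const char *in, size_t *pos, char c) {
    if (in[*pos] == c) { (*pos)++; return true; }
    return false;
}

static bool parse_X(const char *in, size_t *pos) {
    size_t start = *pos;
    if (match(in, pos, 'o') && match(in, pos, 'a')) return true;  /* X -> oa */
    *pos = start;                                                 /* backtrack */
    if (match(in, pos, 'e') && match(in, pos, 'a')) return true;  /* X -> ea */
    *pos = start;
    return false;
}

static bool parse_Z(const char *in, size_t *pos) {
    size_t start = *pos;
    if (match(in, pos, 'a') && match(in, pos, 'i')) return true;  /* Z -> ai */
    *pos = start;
    return false;
}

static bool parse_S(const char *in, size_t *pos) {
    size_t start = *pos;
    if (match(in, pos, 'r') && parse_X(in, pos) && match(in, pos, 'd'))
        return true;                                              /* S -> rXd */
    *pos = start;                                                 /* backtrack */
    if (match(in, pos, 'r') && parse_Z(in, pos) && match(in, pos, 'd'))
        return true;                                              /* S -> rZd */
    *pos = start;
    return false;
}

/* The string is accepted only if S derives all of it. */
bool accepts(const char *in) {
    size_t pos = 0;
    return parse_S(in, &pos) && in[pos] == '\0';
}
```

For "read", parse_X first tries X → oa, fails on 'e', restores the position and succeeds with X → ea, just as in the trace. "raid" is accepted only after S → rXd fails and the parser backtracks to S → rZd.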
Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which production is to be used to replace the input string. The predictive parser does not suffer from backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the next input symbols. To make the parser back-tracking free, the predictive parser puts some constraints on the grammar and accepts only a class of grammar known as LL(k) grammar.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both the stack and the input contain an end symbol $ to denote that the stack is empty and the input is consumed. The parser refers to the parsing table to take any decision on the input and stack element combination.
In recursive descent parsing, the parser may have more than one production to choose from for a single instance of input, whereas in a predictive parser, each step has at most one production to choose. There might be instances where there is no production matching the input string, causing the parsing procedure to fail.
LL Parser
An LL parser accepts LL grammar. LL grammar is a subset of context-free grammar but with some restrictions to get the simplified version, in order to achieve easy implementation. LL grammar can be implemented by means of both algorithms, namely, recursive-descent or table-driven.
LL parser is denoted as LL(k). The first L in LL(k) stands for scanning the input from left to right, the second L in LL(k) stands for left-most derivation, and k itself represents the number of look-ahead symbols. Generally k = 1, so LL(k) may also be written as LL(1).
LL Parsing Algorithm
We may stick to deterministic LL(1) for parser explanation, as the size of the table grows exponentially with the value of k. Secondly, if a given grammar is not LL(1), then usually, it is not LL(k), for any given k.
Given below is an algorithm for LL(1) parsing:
Input:
   string ω
   parsing table M for grammar G
Output:
   If ω is in L(G) then left-most derivation of ω,
   error otherwise.
Initial State: $S on stack (with S being start symbol)
   ω$ in the input buffer
SET ip to point the first symbol of ω$.
repeat
   let X be the top stack symbol and a the symbol pointed by ip.
   if X ∈ Vt or X = $
      if X = a
         POP X and advance ip.
      else
         error()
      endif
   else   /* X is non-terminal */
      if M[X, a] = X → Y1 Y2 ... Yk
         POP X
         PUSH Yk, Yk-1, ..., Y1   /* Y1 on top */
         Output the production X → Y1 Y2 ... Yk
      else
         error()
      endif
   endif
until X = $   /* empty stack */
A grammar G is LL(1) if, for any two distinct productions A → α | β of G:
for no terminal a do both α and β derive strings beginning with a;
at most one of α and β can derive the empty string;
if β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
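The LL(1) algorithm above can be sketched in C for a small illustrative grammar (an assumption, not taken from the text): E → T E', E' → + T E' | ε, T → i | ( E ). The parsing table M is hand-written from that grammar's FIRST and FOLLOW sets, and the function names and the one-character symbol encoding ('e' stands for E') are likewise assumptions.

```c
#include <stdbool.h>
#include <string.h>

static bool is_nonterminal(char X) {
    return X == 'E' || X == 'e' || X == 'T';
}

/* M[X, a]: the right-hand side to push, "" for epsilon, NULL for error. */
static const char *M(char X, char a) {
    switch (X) {
    case 'E':                                   /* E -> T E' */
        return (a == 'i' || a == '(') ? "Te" : NULL;
    case 'e':                                   /* E' -> + T E' | epsilon */
        if (a == '+') return "+Te";
        return (a == ')' || a == '$') ? "" : NULL;
    case 'T':                                   /* T -> i | ( E ) */
        if (a == 'i') return "i";
        return (a == '(') ? "(E)" : NULL;
    default:
        return NULL;
    }
}

/* `input` must end with '$'. Returns true if it is in L(G). */
bool ll1_parse(const char *input) {
    char stack[128];
    int top = 0;
    const char *ip = input;                     /* SET ip to first symbol */
    stack[top++] = '$';
    stack[top++] = 'E';                         /* $S on the stack */
    for (;;) {
        char X = stack[top - 1];
        char a = *ip;
        if (!is_nonterminal(X)) {               /* X is a terminal or $ */
            if (X != a) return false;           /* error() */
            if (X == '$') return true;          /* until X = $ */
            top--;                              /* POP X ... */
            ip++;                               /* ... and advance ip */
        } else {
            const char *rhs = M(X, a);
            if (rhs == NULL) return false;      /* error() */
            size_t n = strlen(rhs);
            top--;                              /* POP X */
            if (top + (int)n > (int)sizeof stack) return false;
            for (size_t k = n; k > 0; k--)      /* PUSH Yk ... Y1 */
                stack[top++] = rhs[k - 1];      /* Y1 ends on top */
        }
    }
}
```

Because every table entry holds at most one production, the parser never has to guess, which is the practical meaning of the three LL(1) conditions.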
Bottom-up Parsing
Bottom-up parsing starts from the leaf nodes of a tree and works in upward direction till it reaches the root node. Here, we start from a sentence and then apply production rules in reverse manner in order to reach the start symbol. The image given below depicts the bottom-up parsers available.
Shift-Reduce Parsing
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as shift-step and reduce-step.
Shift step: The shift step refers to the advancement of the input pointer to the next input symbol, which is called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated as a single node of the parse tree.
Reduce step: When the parser finds a complete grammar rule (RHS) and replaces it with its LHS, it is known as a reduce-step. This occurs when the top of the stack contains a handle. To reduce, a POP function is performed on the stack, which pops off the handle and replaces it with the LHS non-terminal symbol.
LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It accepts a wide class of context-free grammars, which makes it the most general of the efficient, non-backtracking syntax analysis techniques. LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of a right-most derivation in reverse, and k denotes the number of look-ahead symbols used to make decisions.
There are three widely used algorithms available for constructing an LR parser:
SLR(1) (Simple LR Parser):
Works on the smallest class of grammars
Few states, hence a very small table
Simple and fast construction
LR(1) (LR Parser):
Works on the complete set of LR(1) grammars
Generates a large table and a large number of states
Slow construction
LALR(1) (Look-Ahead LR Parser):
Works on an intermediate size of grammar
Number of states is the same as in SLR(1)
LR Parsing Algorithm
Here we describe a skeleton algorithm of an LR parser:
token = next_token()
repeat forever
   s = top of stack
   if action[s, token] = "shift si" then
      PUSH token
      PUSH si
      token = next_token()
   else if action[s, token] = "reduce A ::= β" then
      POP 2 * |β| symbols
      s = top of stack
      PUSH A
      PUSH goto[s, A]
   else if action[s, token] = "accept" then
      return
   else
      error()
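Below is a C sketch of the skeleton above for an assumed grammar E → E + T | T, T → n. The ACTION and GOTO tables were constructed by hand with the SLR(1) method; as in the skeleton, the stack holds alternating grammar symbols and states, so a reduction pops 2 * |β| entries. All names (lr_parse, action, goto_state) are hypothetical.

```c
#include <stdbool.h>

enum { ERR, SHIFT, REDUCE, ACCEPT };
typedef struct { int kind, arg; } Action;   /* arg: state or rule number */

/* Rules: (1) E -> E + T   (2) E -> T   (3) T -> n */
static const char lhs[] = { 0, 'E', 'E', 'T' };
static const int rhs_len[] = { 0, 3, 1, 1 };    /* |beta| per rule */

static int column(char t) {                     /* terminal -> table column */
    switch (t) {
    case 'n': return 0;
    case '+': return 1;
    case '$': return 2;
    default:  return -1;
    }
}

static const Action action[6][3] = {
    /*            n             +             $           */
    /* 0 */ { { SHIFT, 3 }, { ERR, 0 },    { ERR, 0 }    },
    /* 1 */ { { ERR, 0 },   { SHIFT, 4 },  { ACCEPT, 0 } },
    /* 2 */ { { ERR, 0 },   { REDUCE, 2 }, { REDUCE, 2 } },
    /* 3 */ { { ERR, 0 },   { REDUCE, 3 }, { REDUCE, 3 } },
    /* 4 */ { { SHIFT, 3 }, { ERR, 0 },    { ERR, 0 }    },
    /* 5 */ { { ERR, 0 },   { REDUCE, 1 }, { REDUCE, 1 } },
};

static int goto_state(int s, char A) {          /* goto[s, A], -1 if empty */
    if (s == 0 && A == 'E') return 1;
    if (s == 0 && A == 'T') return 2;
    if (s == 4 && A == 'T') return 5;
    return -1;
}

/* `input` must end with '$'. */
bool lr_parse(const char *input) {
    int stack[256];
    int top = 0;
    const char *token = input;                  /* token = next_token() */
    stack[top++] = 0;                           /* start state */
    for (;;) {
        int s = stack[top - 1];                 /* s = top of stack */
        int col = column(*token);
        if (col < 0) return false;
        Action a = action[s][col];
        if (a.kind == SHIFT) {
            if (top + 2 > 256) return false;
            stack[top++] = *token;              /* PUSH token */
            stack[top++] = a.arg;               /* PUSH si */
            token++;                            /* token = next_token() */
        } else if (a.kind == REDUCE) {
            top -= 2 * rhs_len[a.arg];          /* POP 2 * |beta| symbols */
            s = stack[top - 1];
            int g = goto_state(s, lhs[a.arg]);
            if (g < 0) return false;
            stack[top++] = lhs[a.arg];          /* PUSH A */
            stack[top++] = g;                   /* PUSH goto[s, A] */
        } else {
            return a.kind == ACCEPT;            /* accept, or error() */
        }
    }
}
```

Only the two tables change from grammar to grammar; the driver loop stays the same, which is why LR parser generators emit tables rather than code.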
LL vs. LR
LL | LR
Does a leftmost derivation. | Does a rightmost derivation in reverse.
Starts with the root nonterminal on the stack. | Ends with the root nonterminal on the stack.
Ends when the stack is empty. | Starts with an empty stack.
Uses the stack for designating what is still to be expected. | Uses the stack for designating what is already seen.
Builds the parse tree top-down. | Builds the parse tree bottom-up.
Continuously pops a nonterminal off the stack, and pushes the corresponding right hand side. | Tries to recognize a right hand side on the stack, pops it, and pushes the corresponding nonterminal.
Expands the non-terminals. | Reduces the non-terminals.
Reads the terminals when it pops one off the stack. | Reads the terminals while it pushes them on the stack.
Pre-order traversal of the parse tree. | Post-order traversal of the parse tree.
COMPILER DESIGN - RUN-TIME ENVIRONMENT
A program as a source code is merely a collection of text (code, statements, etc.), and to make it alive, it requires actions to be performed on the target machine. A program needs memory resources to execute instructions. A program contains names for procedures, identifiers, etc., that require mapping with the actual memory location at runtime.
By runtime, we mean a program in execution. Runtime environment is a state of the target machine, which may include software libraries, environment variables, etc., to provide services to the processes running in the system.
Runtime support system is a package, mostly generated with the executable program itself, that facilitates communication between the process and the runtime environment. It takes care of memory allocation and de-allocation while the program is being executed.
Activation Trees
A program is a sequence of instructions combined into a number of procedures. Instructions in a procedure are executed sequentially. A procedure has a start and an end delimiter and everything inside it is called the body of the procedure. The procedure identifier and the sequence of finite instructions inside it make up the body of the procedure.
The execution of a procedure is called its activation. An activation record contains all the necessary information required to call a procedure. An activation record may contain the following units (depending upon the source language used).
Temporaries: Stores temporary and intermediate values of an expression.
Local Data: Stores local data of the called procedure.
Machine Status: Stores machine status such as registers, program counter, etc., before the procedure is called.
Control Link: Stores the address of the activation record of the caller procedure.
Access Link: Stores the information of data which is outside the local scope.
Actual Parameters: Stores actual parameters, i.e., parameters which are used to send input to the called procedure.
Return Value: Stores return values.
Whenever a procedure is executed, its activation record is stored on the stack, also known as the control stack. When a procedure calls another procedure, the execution of the caller is suspended until the called procedure finishes execution. At this time, the activation record of the called procedure is stored on the stack.
We assume that the program control flows in a sequential manner and, when a procedure is called, its control is transferred to the called procedure. When a called procedure is executed, it returns the control back to the caller. This type of control flow makes it easier to represent a series of activations in the form of a tree, known as the activation tree.
To understand this concept, we take a piece of code as an example:
. . .
printf("Enter Your Name: ");
scanf("%s", username);
show_data(username);
printf("Press any key to continue...");
. . .
int show_data(char *user)
{
   printf("Your name is %s", user);
   return 0;
}
. . .
Below is the activation tree of the code given.
Now we understand that procedures are executed in a depth-first manner; thus stack allocation is the most suitable form of storage for procedure activations.
Storage Allocation
Runtime environment manages runtime memory requirements for the following entities:
Code: It is known as the text part of a program that does not change at runtime. Its memory requirements are known at compile time.
Procedures: Their text part is static but they are called in a random manner. That is why stack storage is used to manage procedure calls and activations.
Variables: Variables are known at runtime only, unless they are global or constant. Heap memory allocation scheme is used for managing allocation and de-allocation of memory for variables at runtime.
Static Allocation
In this allocation scheme, the compilation data is bound to a fixed location in the memory and it does not change when the program executes. As the memory requirement and storage locations are known in advance, a runtime support package for memory allocation and de-allocation is not required.
Stack Allocation
Procedure calls and their activations are managed by means of stack memory allocation. It works in last-in-first-out (LIFO) method and this allocation strategy is very useful for recursive procedure calls.
Heap Allocation
Variables local to a procedure are allocated and de-allocated only at runtime. Heap allocation is used to dynamically allocate memory to the variables and claim it back when the variables are no longer required.
Except for the statically allocated memory area, both stack and heap memory can grow and shrink dynamically and unexpectedly. Therefore, they cannot be provided with a fixed amount of memory in the system.
As shown in the image above, the text part of the code is allocated a fixed amount of memory. Stack and heap memory are arranged at the extremes of total memory allocated to the program. Both shrink and grow against each other.
Parameter Passing
The communication medium among procedures is known as parameter passing. The values of the variables from a calling procedure are transferred to the called procedure by some mechanism. Before moving ahead, first go through some basic terminologies pertaining to the values in a program.
r-value
The value of an expression is called its r-value. The value contained in a single variable also becomes an r-value if it appears on the right-hand side of the assignment operator. r-values can always be assigned to some other variable.
l-value
The location of memory (address) where an expression is stored is known as the l-value of that expression. Only an expression that has an l-value can appear on the left-hand side of an assignment operator.
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7, 12, and variables like day, week, month and year, all have r-values. Only variables have l-values, as they also represent the memory location assigned to them.
For example:
7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.
Formal Parameters
Variables that take the information passed by the caller procedure are called formal parameters. These variables are declared in the definition of the called function.
Actual Parameters
Variables whose values or addresses are being passed to the called procedure are called actual parameters. These variables are specified in the function call as arguments.
Example:
#include <stdio.h>
void fun_two(int formal_parameter)    /* receives what the caller passes */
{
   printf("%d\n", formal_parameter);
}
void fun_one(void)
{
   int actual_parameter = 10;
   fun_two(actual_parameter);         /* actual_parameter is the argument */
}
Formal parameters hold the information of the actual parameter, depending upon the parameter passing technique used. It may be a value or an address.
Pass by Value
In the pass by value mechanism, the calling procedure passes the r-value of actual parameters and the compiler puts that into the called procedure's activation record. Formal parameters then hold the values passed by the calling procedure. If the values held by the formal parameters are changed, it should have no impact on the actual parameters.
Pass by Reference
In the pass by reference mechanism, the l-value of the actual parameter is copied to the activation record of the called procedure. This way, the called procedure now has the address (memory location) of the actual parameter, and the formal parameter refers to the same memory location. Therefore, if the value pointed to by the formal parameter is changed, the impact should be seen on the actual parameter, as both refer to the same location.
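C itself passes every argument by value, so the usual way to show the difference is to simulate pass by reference by passing an address. In this sketch (function names hypothetical), swap_by_value exchanges only its own copies, while swap_by_reference writes through the caller's l-values.

```c
/* Pass by value: a and b are copies in the callee's activation record. */
void swap_by_value(int a, int b) {
    int tmp = a;
    a = b;
    b = tmp;          /* only the copies change; the caller sees nothing */
}

/* Pass by reference, simulated: the callee receives the l-values. */
void swap_by_reference(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;         /* the caller's variables are exchanged */
}
```

After int p = 1, q = 2; a call to swap_by_value(p, q) leaves p and q unchanged, whereas swap_by_reference(&p, &q) leaves p == 2 and q == 1.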
Pass by Copy-restore
This parameter passing mechanism works similarly to pass-by-reference, except that the changes to actual parameters are made when the called procedure ends. Upon function call, the values of actual parameters are copied into the activation record of the called procedure. Formal parameters, if manipulated, have no real-time effect on actual parameters (only their values were copied in), but when the called procedure ends, the values of the formal parameters are copied back to the l-values of the actual parameters.
Example:
int y;
calling_procedure()
{
   y = 10;
   copy_restore(y);   // value of y is copied in; its l-value is kept for copy-out
   print y;           // prints 99
}
copy_restore(int x)
{
   x = 99;   // y still has value 10 (unaffected)
   y = 0;    // y is now 0
}
When this function ends, the value of formal parameter x is copied back to the actual parameter y. Even though the value of y was changed to 0 before the procedure ended, the value of x (99) is copied into the l-value of y, so the program prints 99. Under pass-by-reference, x and y would name the same location, y = 0 would overwrite the 99, and the program would print 0; copy-restore gives the same result as pass-by-reference only when no such aliasing occurs.
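C has no copy-restore mode, so this sketch performs the copy-in and copy-out by hand around a body that mirrors the example above. The helper names are hypothetical. Running the same body with direct aliasing (plain pass by reference) shows why the two mechanisms can disagree.

```c
int y;                                     /* the global from the example */

/* Body of copy_restore; x points at whatever storage the caller chose. */
static void copy_restore_body(int *x) {
    *x = 99;
    y = 0;
}

/* Copy-restore call: copy the value in, run the body on a local copy,
 * then copy the result back into the actual parameter's l-value. */
static void call_copy_restore(int *actual) {
    int local = *actual;                   /* copy-in */
    copy_restore_body(&local);             /* y becomes 0, local becomes 99 */
    *actual = local;                       /* copy-out overwrites y with 99 */
}

int run_copy_restore(void) {
    y = 10;
    call_copy_restore(&y);
    return y;                              /* 99 */
}

int run_by_reference(void) {
    y = 10;
    copy_restore_body(&y);                 /* x aliases y: 99, then 0 */
    return y;                              /* 0 */
}
```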
Pass by Name
Languages like Algol provide a new kind of parameter passing mechanism that works like the preprocessor in the C language. In the pass by name mechanism, the name of the procedure being called is replaced by its actual body. Pass-by-name textually substitutes the argument expressions in a procedure call for the corresponding parameters in the body of the procedure so that it can now work on actual parameters, much like pass-by-reference.
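Jensen's device is the standard demonstration of pass-by-name: the argument expression is re-evaluated every time the parameter is used, so it observes changes to the loop variable. A C macro gives the same textual-substitution behaviour, as the preprocessor analogy suggests. SUM_BY_NAME and the two wrapper functions are hypothetical; unlike true pass-by-name, a macro does not protect against name clashes between the caller's and the callee's variables.

```c
/* `term` is substituted textually and re-evaluated on every iteration,
 * so it sees the current value of the variable passed as `i`. */
#define SUM_BY_NAME(i, lo, hi, term, result)     \
    do {                                         \
        (result) = 0;                            \
        for ((i) = (lo); (i) <= (hi); (i)++)     \
            (result) += (term);                  \
    } while (0)

/* 1*1 + 2*2 + ... + n*n: the "argument" k * k depends on k. */
int sum_of_squares(int n) {
    int k, total;
    SUM_BY_NAME(k, 1, n, k * k, total);
    return total;
}

/* a[0] + ... + a[len-1]: the same macro, a different expression. */
int sum_array(const int *a, int len) {
    int k, total;
    SUM_BY_NAME(k, 0, len - 1, a[k], total);
    return total;
}
```

sum_of_squares(4) evaluates k * k four times with k = 1, 2, 3, 4 and returns 30.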
COMPILER DESIGN - SYMBOL TABLE
Symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc. Symbol table is used by both the analysis and the synthesis parts of a compiler.
A symbol table may serve the following purposes depending upon the language in hand:
To store the names of all entities in a structured form at one place.
To verify if a variable has been declared.
To implement type checking, by verifying assignments and expressions in the source code are semantically correct.
To determine the scope of a name (scope resolution).
A symbol table is simply a table which can be either linear or a hash table. It maintains an entry for each name in the following format:
<symbol name, type, attribute>
For example, if a symbol table has to store information about the following variable declaration:
static int interest;
then it should store the entry such as:
<interest, int, static>
The attribute clause contains the entries related to the name.
Implementation
If a compiler is to handle a small amount of data, then the symbol table can be implemented as an unordered list, which is easy to code, but it is only suitable for small tables. A symbol table can be implemented in one of the following ways:
Linear (sorted or unsorted) list
Binary Search Tree
Hash table
Among all these, symbol tables are mostly implemented as hash tables, where the source code symbol itself is treated as a key for the hash function and the return value is the information about the symbol.
Operations
A symbol table, either linear or hash, should provide the following operations.
insert()
This operation is more frequently used by the analysis phase, i.e., the first half of the compiler where tokens are identified and names are stored in the table. This operation is used to add information in the symbol table about unique names occurring in the source code. The format or structure in which the names are stored depends upon the compiler in hand.
An attribute for a symbol in the source code is the information associated with that symbol. This information contains the value, state, scope, and type of the symbol. The insert() function takes the symbol and its attributes as arguments and stores the information in the symbol table.
For example:
int a;
should be processed by the compiler as:
insert(a, int);
lookup()
The lookup() operation is used to search for a name in the symbol table to determine:
if the symbol exists in the table.
if it is declared before it is being used.
if the name is used in the scope.
if the symbol is initialized.
if the symbol is declared multiple times.
The format of the lookup() function varies according to the programming language. The basic format should match the following:
lookup(symbol)
This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol exists in the symbol table, it returns its attributes stored in the table.
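As a sketch of the hash-table implementation and its two operations, the C code below chains colliding entries within buckets and stores each entry in the <symbol name, type, attribute> format. The names, the bucket count and the hash function (a simple multiplicative string hash) are assumptions, and lookup() returns NULL where the text says 0. This table is flat; scoped lookup is a separate concern.

```c
#include <stdlib.h>
#include <string.h>

#define BUCKETS 211

typedef struct Symbol {
    char *name;
    char *type;
    char *attribute;
    struct Symbol *next;         /* next entry in the same bucket */
} Symbol;

static Symbol *buckets[BUCKETS];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % BUCKETS;
}

static char *copy_string(const char *s) {
    char *p = malloc(strlen(s) + 1);
    if (p != NULL) strcpy(p, s);
    return p;
}

/* lookup(symbol): the entry if present, NULL (the text's 0) otherwise. */
Symbol *lookup(const char *name) {
    for (Symbol *p = buckets[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* insert(symbol, type, attribute): 1 on success, 0 on a duplicate
 * declaration or when memory runs out. */
int insert(const char *name, const char *type, const char *attribute) {
    if (lookup(name) != NULL)
        return 0;
    Symbol *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->name = copy_string(name);
    p->type = copy_string(type);
    p->attribute = copy_string(attribute != NULL ? attribute : "");
    if (p->name == NULL || p->type == NULL || p->attribute == NULL) {
        free(p->name); free(p->type); free(p->attribute); free(p);
        return 0;
    }
    unsigned h = hash(name);
    p->next = buckets[h];        /* push onto the front of the chain */
    buckets[h] = p;
    return 1;
}
```

For the declarations in the text, insert("interest", "int", "static") and insert("a", "int", NULL) both succeed, and lookup("interest")->type is "int".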
Scope Management
A compiler maintains two types of symbol tables: a global symbol table, which can be accessed by all the procedures, and scope symbol tables that are created for each scope in the program.
To determine the scope of a name, symbol tables are arranged in a hierarchical structure as shown in the example below:
. . .
int value = 10;
void pro_one()
{
   int one_1;
   int one_2;
   {                 /* inner scope 1 */
      int one_3;
      int one_4;
   }
   int one_5;
   {                 /* inner scope 2 */
      int one_6;
      int one_7;
   }
}
void pro_two()
{
   int two_1;
   int two_2;
   {                 /* inner scope 3 */
      int two_3;
      int two_4;
   }
   int two_5;
}
. . .
The above program can be represented in a hierarchical structure of symbol tables:
The global symbol table contains names for one global variable (int value) and two procedure names, which should be available to all the child nodes shown above. The names mentioned in the pro_one symbol table (and all its child tables) are not available for pro_two symbols and its child tables.
This symbol table data structure hierarchy is stored in the semantic analyzer, and whenever a name needs to be searched in a symbol table, it is searched using the following algorithm:
first, a symbol is searched in the current scope, i.e., the current symbol table.
if the name is found, the search is complete; otherwise it is searched in the parent symbol table until either the name is found or the global symbol table has been searched for the name.
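The search rule above (current table first, then each parent up to the global table) can be sketched in C with one small table per scope and a pointer to the enclosing scope. The type and function names are hypothetical, and each per-scope table is a plain linked list rather than a hash table for brevity.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    const char *name;
    const char *type;
    struct Entry *next;
} Entry;

typedef struct Scope {
    Entry *names;              /* this scope's own symbol table */
    struct Scope *parent;      /* enclosing scope; NULL for the global table */
} Scope;

Scope *open_scope(Scope *parent) {
    Scope *s = calloc(1, sizeof *s);
    if (s != NULL) s->parent = parent;
    return s;
}

/* Adds a name to one scope; strings are not copied, so they must outlive it. */
int declare(Scope *s, const char *name, const char *type) {
    Entry *e = malloc(sizeof *e);
    if (e == NULL) return 0;
    e->name = name;
    e->type = type;
    e->next = s->names;
    s->names = e;
    return 1;
}

/* Current table first, then each parent, ending with the global table. */
const Entry *resolve(const Scope *s, const char *name) {
    for (; s != NULL; s = s->parent)
        for (const Entry *e = s->names; e != NULL; e = e->next)
            if (strcmp(e->name, name) == 0) return e;
    return NULL;               /* not visible from this scope */
}
```

Building the example's hierarchy (global, then pro_one with inner scope 1, and pro_two as a sibling), value resolves from every scope, while one_1 is found from inner scope 1 but not from pro_two.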
COMPILER - INTERMEDIATE CODE GENERATION
A source code can directly be translated into its target machine code, so why do we need to translate the source code into an intermediate code which is then translated to its target code? Let us see the reasons why we need an intermediate code.
If a compiler translates the source language to its target machine language without having the option for generating intermediate code, then for each new machine, a full native compiler is required.
Intermediate code eliminates the need of a new full compiler for every unique machine by keeping the analysis portion the same for all the compilers.
The second part of the compiler, synthesis, is changed according to the target machine.
It becomes easier to apply the source code modifications to improve code performance by applying code optimization techniques on the intermediate code.
Intermediate Representation
Intermediate codes can be represented in a variety of ways and they have their own benefits.
High Level IR: High-level intermediate code representation is very close to the source language itself. It can be easily generated from the source code and we can easily apply code modifications to enhance performance. But for target machine optimization, it is less preferred.
Low Level IR: This one is close to the target machine, which makes it suitable for register and memory allocation, instruction set selection, etc. It is good for machine-dependent optimizations.
Intermediate code can be either language specific (e.g., Byte Code for Java) or language independent (three-address code).
Three-Address Code
Intermediate code generator receives input from its predecessor phase, semantic analyzer, in the form of an annotated syntax tree. That syntax tree then can be converted into a linear representation, e.g., postfix notation. Intermediate code tends to be machine independent code. Therefore, the code generator assumes it has an unlimited number of memory storage locations (registers) to generate code.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-expressions and then generate the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2
where r1 and r2 are used as registers in the target program.
A three-address code has at most three address locations to calculate the expression. A three-address code can be represented in two forms: quadruples and triples.
Quadruples
Each instruction in quadruples presentation is divided into four fields: operator, arg1, arg2, and result. The above example is represented below in quadruples format:
Op | arg1 | arg2 | result
*  | c    | d    | r1
+  | b    | r1   | r2
=  | r2   |      | a
Triples
Each instruction in triples presentation has three fields: op, arg1, and arg2. The results of respective sub-expressions are denoted by the position of the expression. Triples are similar to DAGs and syntax trees; they are equivalent to a DAG when representing expressions.
#   | Op | arg1 | arg2
(0) | *  | c    | d
(1) | +  | b    | (0)
(2) | =  | a    | (1)
Triples face the problem of code immovability during optimization, as the results are positional and changing the order or position of an expression may cause problems.
Indirect Triples
This representation is an enhancement over triples representation. It uses pointers instead of position to store results. This enables the optimizers to freely re-position the sub-expression to produce an optimized code.
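A post-order walk over the expression tree is one straightforward way to produce quadruples: each interior node gets a fresh temporary once both of its operands have addresses. The sketch below is written in C; the node layout, the fixed capacity and the names gen and gen_assign are assumptions. For a = b + c * d it emits (*, c, d, r1), (+, b, r1, r2) and (=, r2, , a).

```c
#include <stdio.h>

#define MAXQ 64

typedef struct Expr {
    char op;                         /* '+', '*', ... or 0 for a leaf */
    const char *name;                /* identifier, for leaves only */
    const struct Expr *l, *r;        /* operands, for interior nodes */
} Expr;

typedef struct { char op; char arg1[8], arg2[8], result[8]; } Quad;

static Quad quads[MAXQ];
static int nquads, ntemps;

/* Generates code for e; writes the address holding its value into out.
 * Returns 0 if the quadruple array is full. */
static int gen(const Expr *e, char out[8]) {
    if (e->op == 0) {                /* leaf: its own name is the address */
        snprintf(out, 8, "%s", e->name);
        return 1;
    }
    char a[8], b[8];
    if (!gen(e->l, a) || !gen(e->r, b) || nquads >= MAXQ)
        return 0;
    Quad *q = &quads[nquads++];
    q->op = e->op;
    snprintf(q->arg1, 8, "%s", a);
    snprintf(q->arg2, 8, "%s", b);
    snprintf(q->result, 8, "r%d", ++ntemps);   /* fresh temporary */
    snprintf(out, 8, "%s", q->result);
    return 1;
}

/* Emits quadruples for  target = e ; returns their count, or -1 if full. */
int gen_assign(const char *target, const Expr *e) {
    char v[8];
    nquads = ntemps = 0;
    if (!gen(e, v) || nquads >= MAXQ)
        return -1;
    Quad *q = &quads[nquads++];
    q->op = '=';
    snprintf(q->arg1, 8, "%s", v);
    q->arg2[0] = '\0';                         /* unused for a copy */
    snprintf(q->result, 8, "%s", target);
    return nquads;
}
```

Turning these quadruples into triples would simply drop the result column and refer to earlier rows by index, which is what makes triples positional.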
Declarations
A variable or procedure has to be declared before it can be used. Declaration involves allocation of space in memory and entry of type and name in the symbol table. A program may be coded and designed keeping the target machine structure in mind, but it may not always be possible to accurately convert a source code to its target language.
Taking the whole program as a collection of procedures and sub-procedures, it becomes possible to declare all the names local to the procedure. Memory allocation is done in a consecutive manner and names are allocated to memory in the sequence they are declared in the program. We use an offset variable and set it to zero {offset = 0}, which denotes the base address.
The source programming language and the target machine architecture may vary in the way names are stored, so relative addressing is used. While the first name is allocated memory starting from the memory location 0 {offset = 0}, the next name declared later should be allocated memory next to the first one.
Example:
We take the example of the C programming language and assume a target where an integer variable is assigned 2 bytes of memory and a float variable is assigned 4 bytes of memory (the actual sizes are implementation-defined).
int a;
float b;
Allocation process:
{offset = 0}
int a;
id.type = int
id.width = 2
offset = offset + id.width
{offset = 2}
float b;
id.type = float
id.width = 4
offset = offset + id.width
{offset = 6}
To enter this detail in a symbol table, a procedure enter can be used. This method may have the following structure:
enter(name, type, offset)
This procedure should create an entry in the symbol table, for variable name, having its type set to type and relative address offset in its data area.
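Here is a C sketch of the allocation process above, with enter() recording <name, type, offset> in a small array. The widths follow the text's example (int = 2 bytes, float = 4 bytes) rather than any particular real target, and declare() and width_of() are hypothetical helpers.

```c
#include <string.h>

#define MAX_ENTRIES 32

typedef struct { const char *name; const char *type; int offset; } Entry;

static Entry symtab[MAX_ENTRIES];
static int nentries;
static int offset;                 /* {offset = 0}: base of the data area */

/* enter(name, type, offset): create the symbol-table entry. */
static void enter(const char *name, const char *type, int off) {
    if (nentries < MAX_ENTRIES)
        symtab[nentries++] = (Entry){ name, type, off };
}

/* id.width for the two types used in the example. */
static int width_of(const char *type) {
    return strcmp(type, "float") == 0 ? 4 : 2;   /* int -> 2 */
}

/* One declaration "type name;": enter it, then offset = offset + width. */
void declare(const char *type, const char *name) {
    enter(name, type, offset);
    offset += width_of(type);
}
```

Calling declare("int", "a") and then declare("float", "b") places a at offset 0 and b at offset 2, leaving offset at 6, matching the trace above.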
COMPILER DESIGN - CODE GENERATION
Code generation can be considered as the final phase of compilation. Through post code generation, optimization process can be applied on the code, but that can be seen as a part of code generation phase itself. The code generated by the compiler is an object code of some lower-level programming language, for example, assembly language. We have seen that the source code written in a higher-level language is transformed into a lower-level language that results in a lower-level object code, which should have the following minimum properties:
It should carry the exact meaning of the source code.
It should be efficient in terms of CPU usage and memory management.
We will now see how the intermediate code is transformed into target object code (assembly code, in this case).
Directed Acyclic Graph
Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see the flow of values flowing among the basic blocks, and offers optimization too. DAG provides easy transformation on basic blocks. DAG can be understood here:
Leaf nodes represent identifiers, names or constants.
Interior nodes represent operators.
Interior nodes also represent the results of expressions or the identifiers/names where the values are to be stored or assigned.
Example:
t0 = a + b
t1 = t0 + c
d = t0 + t1
[t0 = a + b]
[t1 = t0 + c]
[d = t0 + t1]
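A common way to build this DAG is value numbering: before creating an operator node, look for an existing node with the same operator and the same children, and reuse it if found. The C sketch below (array layout, limits and names are assumptions) processes statements of the form x = y op z. For the three statements above it creates 6 nodes, and a later e = a + b would reuse the node built for t0.

```c
#include <string.h>

#define MAX_NODES 64
#define MAX_NAMES 64

typedef struct {
    char op;            /* 0 for a leaf, otherwise the operator */
    int left, right;    /* operand node ids (interior nodes only) */
    const char *leaf;   /* initial name or constant (leaves only) */
} DagNode;

static DagNode nodes[MAX_NODES];
static int nnodes;

static const char *labels[MAX_NAMES];  /* names attached to nodes ... */
static int label_node[MAX_NAMES];      /* ... and the node each one labels */
static int nlabels;

static int find_label(const char *n) {
    for (int i = 0; i < nlabels; i++)
        if (strcmp(labels[i], n) == 0) return i;
    return -1;
}

/* Node currently holding the value of n; a leaf is created on first use. */
static int node_for(const char *n) {
    int i = find_label(n);
    if (i >= 0) return label_node[i];
    if (nnodes >= MAX_NODES || nlabels >= MAX_NAMES) return -1;
    nodes[nnodes] = (DagNode){ 0, -1, -1, n };
    labels[nlabels] = n;
    label_node[nlabels++] = nnodes;
    return nnodes++;
}

/* Processes  x = y op z ; returns the id of the node now labelled x. */
int assign(const char *x, const char *y, char op, const char *z) {
    int l = node_for(y), r = node_for(z);
    if (l < 0 || r < 0) return -1;
    int n = -1;
    for (int i = 0; i < nnodes && n < 0; i++)   /* common subexpression? */
        if (nodes[i].op == op && nodes[i].left == l && nodes[i].right == r)
            n = i;
    if (n < 0) {                                 /* no: make a new node */
        if (nnodes >= MAX_NODES) return -1;
        nodes[nnodes] = (DagNode){ op, l, r, NULL };
        n = nnodes++;
    }
    int k = find_label(x);                       /* (re)attach label x */
    if (k < 0) {
        if (nlabels >= MAX_NAMES) return -1;
        labels[nlabels] = x;
        k = nlabels++;
    }
    label_node[k] = n;
    return n;
}
```

This sketch treats a + b and b + a as different; a fuller version would canonicalize commutative operators before the search.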
Peephole Optimization
This optimization technique works locally on the source code to transform it into an optimized code. By locally, we mean a small portion of the code block at hand. These methods can be applied on intermediate codes as well as on target codes. A bunch of statements is analyzed and checked for the following possible optimizations:
Redundant instruction elimination
At source code level, the following can be done by the user:
int add_ten(int x)
{
   int y, z;
   y = 10;
   z = x + y;
   return z;
}
int add_ten(int x)
{
   int y;
   y = 10;
   y = x + y;
   return y;
}
int add_ten(int x)
{
   int y = 10;
   return x + y;
}
int add_ten(int x)
{
   return x + 10;
}
At compilation level, the compiler searches for instructions redundant in nature. Multiple loading and storing of instructions may carry the same meaning even if some of them are removed. For example:
MOV x, R0
MOV R0, R1
Provided the value in R0 is not used afterwards, we can delete the first instruction and rewrite the sequence as:
MOV x, R1
Unreachable code
Unreachable code is a part of the program code that is never accessed because of programming constructs. Programmers may have accidentally written a piece of code that can never be reached.
Example:
int add_ten(int x)
{
   return x + 10;
   printf("value of x is %d", x);
}
In this code segment, the printf statement will never be executed, as the program control returns back before it can execute; hence printf can be removed.
Flow of control optimization
There are instances in a code where the program control jumps back and forth without performing any significant task. These jumps can be removed. Consider the following chunk of code:
...
MOV R1, R2
GOTO L1
...
L1: GOTO L2
L2: INC R1
In this code, label L1 can be removed as it passes the control to L2. So instead of jumping to L1 and then to L2, the control can directly reach L2, as shown below:
...
MOV R1, R2
GOTO L2
...
L2: INC R1
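Jump-to-jump elimination, as in the example above, can be sketched in C over a tiny hypothetical instruction encoding: each GOTO whose target label holds another unconditional GOTO is redirected to the final destination. Deleting the now-unused L1 instruction is a separate step (valid only if control cannot fall into it) and is not shown.

```c
typedef enum { OP_MOV, OP_INC, OP_GOTO } Opcode;

typedef struct {
    int label;        /* label number of this instruction, or -1 */
    Opcode op;
    int target;       /* label jumped to, for OP_GOTO */
} Instr;

static int index_of_label(const Instr *code, int n, int label) {
    for (int i = 0; i < n; i++)
        if (code[i].label == label) return i;
    return -1;
}

/* Follows chains GOTO L1 -> (L1: GOTO L2) -> ...; the hop limit stops
 * cycles such as L1: GOTO L1. Returns how many jumps were redirected. */
int thread_jumps(Instr *code, int n) {
    int changed = 0;
    for (int i = 0; i < n; i++) {
        if (code[i].op != OP_GOTO) continue;
        int target = code[i].target;
        for (int hops = 0; hops < n; hops++) {
            int j = index_of_label(code, n, target);
            if (j < 0 || code[j].op != OP_GOTO || code[j].target == target)
                break;                       /* final destination reached */
            target = code[j].target;
        }
        if (target != code[i].target) {
            code[i].target = target;
            changed++;
        }
    }
    return changed;
}
```

Encoding the example as MOV, GOTO 1, (1:) GOTO 2, (2:) INC, one pass redirects the first GOTO straight to label 2.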
Algebraic expression simplification
There are occasions where algebraic expressions can be made simple. For example, the assignment a = a + 0 leaves a unchanged and can be removed, and a = a + 1 can simply be replaced by INC a.
Strength reduction
There are operations that consume more time and space. Their strength can be reduced by replacing them with other operations that consume less time and space, but produce the same result.
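For example, x * 2 can be replaced by x << 1, which involves only one left shift. Similarly, although a * a and a^2 produce the same result, computing a^2 as a * a (a single multiplication) is much more efficient than calling a general exponentiation routine.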
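One strength-reduction rule, multiplication by a power of two, can be sketched in C as a check that returns the shift amount. The function name is hypothetical. The rewrite x * 2^k to x << k is shown for unsigned operands, since left-shifting negative signed values is undefined behaviour in C.

```c
#include <stdbool.h>

/* If multiplier == 2^k, stores k in *shift and returns true, so that
 * x * multiplier can be rewritten as x << k (for unsigned x). */
bool mul_to_shift(unsigned multiplier, unsigned *shift) {
    if (multiplier == 0 || (multiplier & (multiplier - 1)) != 0)
        return false;                 /* zero, or more than one bit set */
    unsigned k = 0;
    while ((multiplier >>= 1) != 0)   /* find the position of the bit */
        k++;
    *shift = k;
    return true;
}
```

For x * 2 the function reports a shift of 1, matching x << 1; for x * 6 it declines, since 6 is not a power of two.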
Accessingmachineinstructions
Thetargetmachinecandeploymoresophisticatedinstructions,whichcanhavethecapabilitytoperform
specificoperationsmuchefficiently.Ifthetargetcodecanaccommodatethoseinstructionsdirectly,that
willnotonlyimprovethequalityofcode,butalsoyieldmoreefficientresults.
CodeGenerator
Acodegeneratorisexpectedtohaveanunderstandingofthetargetmachinesruntimeenvironmentandits
instructionset.Thecodegeneratorshouldtakethefollowingthingsintoconsiderationtogeneratethe
code:
Targetlanguage:Thecodegeneratorhastobeawareofthenatureofthetargetlanguagefor
whichthecodeistobetransformed.Thatlanguagemayfacilitatesomemachinespecificinstructions
tohelpthecompilergeneratethecodeinamoreconvenientway.Thetargetmachinecanhaveeither
CISCorRISCprocessorarchitecture.
IRType:Intermediaterepresentationhasvariousforms.ItcanbeinAbstractSyntaxTreeAST
structure,ReversePolishNotation,or3addresscode.
Selectionofinstruction:ThecodegeneratortakesIntermediateRepresentationasinputand
convertsmapsitintotargetmachinesinstructionset.Onerepresentationcanhavemanyways
instructionstoconvertit,soitbecomestheresponsibilityofthecodegeneratortochoosethe
appropriateinstructionswisely.
Registerallocation:Aprogramhasanumberofvaluestobemaintainedduringtheexecution.
ThetargetmachinesarchitecturemaynotallowallofthevaluestobekeptintheCPUmemoryor
registers.Codegeneratordecideswhatvaluestokeepintheregisters.Also,itdecidestheregistersto
beusedtokeepthesevalues.
Ordering of instructions: Finally, the code generator decides the order in which the instructions will be executed. It creates schedules for the instructions to execute them.
Descriptors
The code generator has to track both the registers (for availability) and addresses (location of values) while generating the code. For both of them, the following two descriptors are used:
Register descriptor: The register descriptor is used to inform the code generator about the availability of registers. It keeps track of the values stored in each register. Whenever a new register is required during code generation, this descriptor is consulted for register availability.
Address descriptor: Values of the names (identifiers) used in the program might be stored at different locations during execution. Address descriptors are used to keep track of the memory locations where the values of identifiers are stored. These locations may include CPU registers, heaps, stacks, memory, or a combination of these locations.
The code generator keeps both descriptors updated in real time. For a load statement, LD R1, x, the code generator:
updates the Register Descriptor of R1 to show that it holds the value of x, and
updates the Address Descriptor of x to show that one instance of x is in R1.
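The two descriptors could be represented as in the following C sketch. It assumes a fixed number of registers (`NREGS`), variables numbered `0 .. NVARS-1`, and at most one variable per register. `RegDesc`, `AddrDesc`, `init_desc` and `on_load` are hypothetical names. `on_load` performs the two updates listed above for `LD R, x`.

```c
#include <stdbool.h>

#define NREGS 4        /* assumed number of general-purpose registers */
#define NVARS 8        /* assumed number of program variables */
#define EMPTY (-1)

/* Register descriptor: which variable (if any) each register holds.
   Simplification: at most one variable per register. */
typedef struct { int holds[NREGS]; } RegDesc;

/* Address descriptor: where each variable's current value can be found. */
typedef struct {
    bool in_memory[NVARS];       /* value is in its home memory location */
    bool in_reg[NVARS][NREGS];   /* value is (also) in register r */
} AddrDesc;

void init_desc(RegDesc *rd, AddrDesc *ad)
{
    for (int r = 0; r < NREGS; r++)
        rd->holds[r] = EMPTY;
    for (int v = 0; v < NVARS; v++) {
        ad->in_memory[v] = true;            /* initially every value is in memory */
        for (int r = 0; r < NREGS; r++)
            ad->in_reg[v][r] = false;
    }
}

/* Descriptor bookkeeping for the instruction LD R, x. */
void on_load(RegDesc *rd, AddrDesc *ad, int reg, int x)
{
    int old = rd->holds[reg];
    if (old != EMPTY)
        ad->in_reg[old][reg] = false;       /* R no longer holds its old variable */
    rd->holds[reg] = x;                     /* register descriptor: R holds x */
    ad->in_reg[x][reg] = true;              /* address descriptor: x is in R */
}
```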
Code Generation
Basic blocks consist of a sequence of three-address instructions. The code generator takes these sequences of instructions as input.
Note: If the value of a name is found at more than one place (register, cache, or memory), the register's value will be preferred over the cache and main memory. Likewise, the cache's value will be preferred over main memory. Main memory is barely given any preference.
getReg: The code generator uses the getReg function to determine the status of available registers and the location of name values. getReg works as follows:
If variable Y is already in register R, it uses that register.
Else, if some register R is available, it uses that register.
Else, if neither of the above options is possible, it chooses a register that requires the minimal number of load and store instructions.
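The three getReg rules can be sketched in C as below. The `Regs` layout, `NREGS`, `spill_cost` and `get_reg` are hypothetical, and the cost model is an assumption: each register holds at most one variable, and reusing a register is taken to cost one store if its value is dirty (not yet saved to memory) and nothing otherwise. This is one simple reading of "minimal number of load and store instructions".

```c
#include <stdbool.h>

#define NREGS 3
#define EMPTY (-1)

/* Hypothetical register state: which variable each register holds, and
   whether that value still has to be stored back to memory (dirty). */
typedef struct {
    int  holds[NREGS];
    bool dirty[NREGS];
} Regs;

/* Assumed cost of reusing register r: one store if its value is dirty. */
static int spill_cost(const Regs *rs, int r) { return rs->dirty[r] ? 1 : 0; }

/* getReg: choose a register for variable y following the three rules. */
int get_reg(const Regs *rs, int y)
{
    /* Rule 1: y is already in some register R. */
    for (int r = 0; r < NREGS; r++)
        if (rs->holds[r] == y)
            return r;

    /* Rule 2: some register is free. */
    for (int r = 0; r < NREGS; r++)
        if (rs->holds[r] == EMPTY)
            return r;

    /* Rule 3: pick the register whose reuse needs the fewest stores. */
    int best = 0;
    for (int r = 1; r < NREGS; r++)
        if (spill_cost(rs, r) < spill_cost(rs, best))
            best = r;
    return best;
}
```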
For an instruction x = y OP z, the code generator may perform the following actions. Let us assume that L is the location (preferably a register) where the output of y OP z is to be saved:
Call function getReg to decide the location of L.
Determine the present location (register or memory) of y by consulting the Address Descriptor of y. If y is not presently in register L, then generate the following instruction to copy the value of y to L:
MOV y', L
where y' represents the copied value of y.
Determine the present location of z using the same method used for y in the previous step and generate the following instruction:
OP z', L
where z' represents the copied value of z.
Now L contains the value of y OP z, which is intended to be assigned to x. So, if L is a register, update its descriptor to indicate that it contains the value of x. Update the descriptor of x to indicate that it is stored at location L.
If y and z have no further use, the registers holding them can be given back to the system.
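The steps above can be condensed into a small C emitter for `x = y OP z`. This is a sketch under strong simplifying assumptions. A single register `R0` always serves as L, so the getReg call is reduced to a fixed choice. The register descriptor is just the name held in `R0`, and the address descriptor is not modelled. A value displaced from `R0` is assumed to have no further use. `CodeGen`, `emit` and `gen` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical code-generator state with a single register R0. */
typedef struct {
    char r0_holds[16];   /* register descriptor: name held by R0 ("" if none) */
    char out[256];       /* emitted assembly, one instruction per line */
} CodeGen;

static void emit(CodeGen *cg, const char *line)
{
    strncat(cg->out, line, sizeof cg->out - strlen(cg->out) - 1);
    strncat(cg->out, "\n", sizeof cg->out - strlen(cg->out) - 1);
}

/* Translate x = y OP z, where op is a mnemonic such as "ADD". */
void gen(CodeGen *cg, const char *x, const char *y, const char *op, const char *z)
{
    char buf[64];

    /* Step 1: getReg, reduced here to always choosing L = R0. */

    /* Step 2: copy y into L unless the descriptor says it is already there. */
    if (strcmp(cg->r0_holds, y) != 0) {
        snprintf(buf, sizeof buf, "MOV %s, R0", y);
        emit(cg, buf);
    }

    /* Step 3: apply OP with z as the source operand. */
    snprintf(buf, sizeof buf, "%s %s, R0", op, z);
    emit(cg, buf);

    /* Step 4: R0 now holds x; update the register descriptor. */
    snprintf(cg->r0_holds, sizeof cg->r0_holds, "%s", x);
}
```

For the statements t = a SUB b followed by u = t ADD c, the descriptor shows that t is already in R0 when the second statement is translated, so no MOV is needed there. The emitted code is `MOV a, R0`, `SUB b, R0`, `ADD c, R0`.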
Other code constructs, like loops and conditional statements, are translated into assembly language in the usual way, using compare and jump instructions.
COMPILER DESIGN - CODE OPTIMIZATION
Optimization is a program transformation technique that tries to improve the code by making it consume fewer resources (i.e. CPU, memory) and deliver higher speed.
In optimization, high-level general programming constructs are replaced by very efficient low-level programming code. A code optimizing process must follow the three rules given below:
The output code must not, in any way, change the meaning of the program.
Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.
Optimization should itself be fast and should not delay the overall compiling process.
Efforts to optimize code can be made at various stages of the compilation process.
At the beginning, users can change or rearrange the code, or use better algorithms to write the code.
After generating intermediate code, the compiler can modify the intermediate code by improving address calculations and loops.
While producing the target machine code, the compiler can make use of the memory hierarchy and CPU registers.
Optimization can be categorized broadly into two types: machine-independent and machine-dependent.
Machine-independent Optimization
In this optimization, the compiler takes in the intermediate code and transforms a part of the code that does not involve any CPU registers and/or absolute memory locations. For example:
do
{
    item = 10;
    value = value + item;
} while (value < 100);
This code involves repeated assignment of the identifier item. If we rewrite it this way:
item = 10;
do
{
    value = value + item;
} while (value < 100);
it not only saves CPU cycles, but the transformation can also be used on any processor.
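Both versions of the loop can be compared directly in C. The function names are hypothetical. Hoisting `item = 10;` is safe here because a do-while body always runs at least once, so `item` holds 10 whenever it is read in either version.

```c
/* Original form: item = 10 is re-assigned on every iteration. */
int sum_original(int value)
{
    int item;
    do {
        item = 10;
        value = value + item;
    } while (value < 100);
    return value;
}

/* After moving the loop-invariant assignment out of the loop. */
int sum_hoisted(int value)
{
    int item = 10;
    do {
        value = value + item;
    } while (value < 100);
    return value;
}
```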
Machine-dependent Optimization
Machine-dependent optimization is done after the target code has been generated and when the code is transformed according to the target machine architecture. It involves CPU registers and may use absolute memory references rather than relative references. Machine-dependent optimizers try to take maximum advantage of the memory hierarchy.
Basic Blocks
Source code generally contains sequences of instructions that are always executed in order; these sequences are the basic blocks of the code. Control enters a basic block only at its first instruction and leaves only at its last, so there are no jumps into or out of the middle of a block. In other words, once the first instruction is executed, all the instructions in the same basic block are executed in their order of appearance without the program's flow of control leaving the block.
Constructs such as IF-THEN-ELSE and SWITCH-CASE conditional statements, and loops such as DO-WHILE, FOR, and REPEAT-UNTIL, divide a program into several basic blocks.
Basic block identification
We may use the following algorithm to find the basic blocks in a program:
Search for the header statements of all the basic blocks, i.e., the statements where a basic block starts:
The first statement of a program.
Statements that are the target of any branch (conditional or unconditional).
Statements that follow any branch statement.
A header statement and the statements following it form a basic block.
A basic block does not include the header statement of any other basic block.
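The following is a minimal C sketch of the header-statement (leader) algorithm above. It assumes each statement has already been summarised by two facts: whether it is a branch, and if so which statement index it targets. The `Stmt` struct, `find_leaders` and `assign_blocks` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical statement summary: only control flow matters here. */
typedef struct {
    bool is_branch;   /* conditional or unconditional jump */
    int  target;      /* index of the jump target, or -1 */
} Stmt;

/* Mark header (leader) statements; returns how many there are. */
int find_leaders(const Stmt *code, int n, bool *leader)
{
    for (int i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;                          /* first statement */

    for (int i = 0; i < n; i++) {
        if (!code[i].is_branch)
            continue;
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = true;         /* target of a branch */
        if (i + 1 < n)
            leader[i + 1] = true;                  /* statement after a branch */
    }

    int count = 0;
    for (int i = 0; i < n; i++)
        count += leader[i];
    return count;
}

/* Each statement belongs to the block of the nearest header at or above it. */
void assign_blocks(const bool *leader, int n, int *block_of)
{
    int b = -1;
    for (int i = 0; i < n; i++) {
        if (leader[i])
            b++;
        block_of[i] = b;
    }
}
```

For a five-statement program whose statement at index 3 is a conditional jump back to index 1, the headers are statements 0, 1 and 4, which gives three basic blocks.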
Basic blocks are important concepts from both the code generation and the optimization point of view.
Basic blocks play an important role in identifying variables that are used more than once in a single basic block. If a variable is used more than once, the register allocated to that variable need not be emptied until the block finishes execution.
Control Flow Graph
Basic blocks in a program can be represented by means of control flow graphs. A control flow graph depicts how program control is passed among the blocks. It is a useful tool that helps in optimization by locating any unwanted loops in the program.
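One way to derive the control flow graph from the basic blocks is sketched below in C. The sketch assumes statements have already been grouped into blocks numbered in program order, for example by the header-statement algorithm described earlier. It adds a jump edge from the block ending in a branch to the block containing the branch target. It also adds a fall-through edge to the next block unless the block ends in an unconditional jump. `MAXB`, `Stmt` and `build_cfg` are hypothetical names.

```c
#include <stdbool.h>

#define MAXB 16   /* assumed upper bound on the number of blocks */

/* Hypothetical statement summary used to derive control flow. */
typedef struct {
    bool is_branch;        /* any jump */
    bool unconditional;    /* true for goto, false for if ... goto */
    int  target;           /* index of the jump target statement, or -1 */
} Stmt;

/* Set edge[a][b] = true when control can pass from block a to block b.
   block_of[i] gives the block number of statement i. */
void build_cfg(const Stmt *code, int n, const int *block_of, bool edge[MAXB][MAXB])
{
    for (int a = 0; a < MAXB; a++)
        for (int b = 0; b < MAXB; b++)
            edge[a][b] = false;

    for (int i = 0; i < n; i++) {
        /* Only the last statement of a block decides where control goes next. */
        bool last_in_block = (i + 1 == n) || (block_of[i + 1] != block_of[i]);
        if (!last_in_block)
            continue;

        int from = block_of[i];
        if (code[i].is_branch && code[i].target >= 0)
            edge[from][block_of[code[i].target]] = true;   /* jump edge */
        if (i + 1 < n && !(code[i].is_branch && code[i].unconditional))
            edge[from][block_of[i + 1]] = true;            /* fall-through edge */
    }
}
```

An edge that returns to the same block or to an earlier one (a back edge) typically signals a loop, which is what makes the graph useful for finding loops to optimize.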
Loop Optimization
Most programs spend much of their execution time inside loops, so it becomes necessary to optimize the loops in order to save CPU cycles and memory. Loops can be optimized by the following techniques:
Invariant code: A fragment of code that resides in the loop and computes the same value at each iteration is called loop-invariant code. This code can be moved out of the loop so that it is computed only once, rather than in every iteration.
Induction analysis: A variable is called an induction variable if its value is altered within the loop by a loop-invariant value.
Strength reduction: Some expressions consume more CPU cycles, time, and memory than others. These expressions should be replaced with cheaper expressions without changing the result. For example, the multiplication x * 2 is more expensive in terms of CPU cycles than x << 1, and both yield the same result.
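The induction-variable and strength-reduction ideas can be illustrated together in C. In the first function, `t = i * 4` is recomputed by multiplication on every iteration. Because `i` grows by the loop-invariant step 1, `t` is itself an induction variable that grows by 4. The second function therefore keeps `t` in step with `i` and replaces the multiplication by an addition. The function names are hypothetical.

```c
/* Before: t = i * 4 is recomputed with a multiplication in every iteration. */
long offsets_mul(int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        long t = (long)i * 4;     /* t is a derived induction variable */
        sum += t;
    }
    return sum;
}

/* After: t is updated by addition, keeping t == i * 4 at the top of each iteration. */
long offsets_add(int n)
{
    long sum = 0;
    long t = 0;
    for (int i = 0; i < n; i++) {
        sum += t;
        t += 4;                   /* multiplication replaced by addition */
    }
    return sum;
}
```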
Dead-code Elimination
Dead code is one or more code statements that are:
Either never executed (unreachable),
Or, if executed, their output is never used.
Thus, dead code plays no role in any program operation and can therefore simply be eliminated.
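The second kind of dead code, results that are never used, is usually found with liveness analysis. The C sketch below applies a single backward liveness pass to one straight-line block. It assumes single-letter variable names `a` to `z`, at most 64 statements, and a caller-supplied set of variables that are live after the block. `Stmt` and `eliminate_dead` are hypothetical names. Unreachable code is not handled here.

```c
#include <stdbool.h>

/* Hypothetical straight-line three-address statement: dst = a OP b.
   Variables are single lowercase letters; '\0' marks an unused operand. */
typedef struct { char dst, a, b; } Stmt;

/* Remove assignments whose result is never used.
   live_out lists the variables still needed after the block.
   Surviving statements keep their order; returns the new count. */
int eliminate_dead(Stmt *code, int n, const char *live_out)
{
    bool live[26] = { false };
    bool keep[64];
    if (n > 64)
        return n;                            /* outside the assumed limit: no change */

    for (const char *p = live_out; *p; p++)
        live[*p - 'a'] = true;

    /* Walk backwards, tracking which variables are live. */
    for (int i = n - 1; i >= 0; i--) {
        keep[i] = live[code[i].dst - 'a'];
        if (!keep[i])
            continue;                        /* dead: result never used */
        live[code[i].dst - 'a'] = false;     /* dst is defined here */
        if (code[i].a) live[code[i].a - 'a'] = true;
        if (code[i].b) live[code[i].b - 'a'] = true;
    }

    int m = 0;
    for (int i = 0; i < n; i++)
        if (keep[i])
            code[m++] = code[i];
    return m;
}
```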
Partially dead code
There are some code statements whose computed values are used only under certain circumstances, i.e., sometimes the values are used and sometimes they are not. Such code is known as partially dead code.
The above control flow graph depicts a chunk of a program where variable a is used to hold the output of the expression x * y. Let us assume that the value assigned to a is never used inside the loop. Immediately after control leaves the loop, a is assigned the value of variable z, which is used later in the program. We conclude that the value assigned to a by a = x * y is never used anywhere, and therefore this assignment is eligible to be eliminated.
Likewise, the picture above depicts a conditional statement that is always false, implying that the code written in the true case will never be executed; hence it can be removed.
Partial Redundancy
An expression is redundant at a point if it has already been computed, with unchanged operands, on every path that reaches that point. An expression is partially redundant if it has already been computed, with unchanged operands, on some but not all of those paths. For example,
[Figure: redundant expression]
[Figure: partially redundant expression]
Loop-invariant code is partially redundant and can be eliminated by using a code-motion technique.
Another example of partially redundant code can be:
if (condition)
{
    a = y OP z;
}
else
{
    ...
}
c = y OP z;
We assume that the values of the operands (y and z) are not changed between the assignment to variable a and the assignment to variable c. Here, if the condition is true, then y OP z is computed twice; otherwise it is computed once. Code motion can be used to eliminate this redundancy, as shown below:
if (condition)
{
    ...
    tmp = y OP z;
    a = tmp;
    ...
}
else
{
    ...
    tmp = y OP z;
}
c = tmp;
Here, whether the condition is true or false, y OP z is computed only once.
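The effect of this code motion can be checked by counting how often y OP z is evaluated. In this C sketch OP is assumed to be `+`. `evaluations`, `y_op_z`, `pre_original` and `pre_moved` are hypothetical names.

```c
/* Counts how many times the expression y OP z is evaluated. */
static int evaluations = 0;

/* OP is assumed to be + for this sketch. */
static int y_op_z(int y, int z)
{
    evaluations++;
    return y + z;
}

/* Before: y OP z is computed twice when cond is true. */
int pre_original(int cond, int y, int z)
{
    int a = 0, c;
    if (cond)
        a = y_op_z(y, z);
    c = y_op_z(y, z);
    return a + c;
}

/* After code motion: every path computes y OP z exactly once into tmp. */
int pre_moved(int cond, int y, int z)
{
    int a = 0, c, tmp;
    if (cond) {
        tmp = y_op_z(y, z);
        a = tmp;
    } else {
        tmp = y_op_z(y, z);
    }
    c = tmp;
    return a + c;
}
```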