
Web Data Management 170708

Most Frequently Asked GTU Questions with Solutions

1. Discuss: XML fulfils the requirements of a semistructured data model.
Or
What is semistructured data? Justify, with a suitable example, that an XML document follows a semistructured data model.
Or
How does the semistructured data model resolve issues in existing web data management?
Or
"An XML document is a labeled, unranked, ordered tree." Explain with an example.

Solution:

A semistructured data model is based on an organization of data in labeled trees (possibly graphs) and on query languages for accessing and updating data. The labels capture the structural information. Since these models are considered in the context of data exchange, they typically propose some form of data serialization (i.e., a standard representation of data in files). Indeed, the most successful such model, namely XML, is often confused with its serialization syntax.

Tree Representation Labeled on Edges

We next present, informally, a standard semistructured data model. We start with an idea familiar to Lisp programmers: association lists, which are nothing more than label-value pairs and are used to represent record-like or tuple-like structures:
{name: "Alan", tel: 2157786, email: "agb@abc.com"}
This is simply a set of pairs such as (name, "Alan"), each consisting of a label and a value. The values may themselves be other structures, as in
{name: {first: "Alan", last: "Black"}, tel: 2157786, email: "agb@abc.com"}

Prepared By Prof. Chintan Dave, SPCE, Visnagar.

We may represent this data graphically as a tree. In one representation, the label structure is captured by tree edges, whereas data values reside at the leaves; in another, all information resides in the vertices. Such representations suggest departing from the usual assumption made about tuples or association lists that labels are unique, and we allow duplicate labels, as in {name: "Alan", tel: 2157786, tel: 2498762}.
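The "labeled, unranked, ordered tree" view can be made concrete with Python's standard xml.etree.ElementTree module. This minimal sketch serializes the association list above as XML (the root label person is a hypothetical choice) and reads it back as nested (label, value) pairs: children keep their document order, a node may have any number of children, and the label tel may repeat.

```python
import xml.etree.ElementTree as ET

# {name: {first: "Alan", last: "Black"}, tel: 2157786, tel: 2498762} written as XML.
# The root label "person" is a hypothetical choice for this sketch.
DOC = ("<person>"
       "<name><first>Alan</first><last>Black</last></name>"
       "<tel>2157786</tel>"
       "<tel>2498762</tel>"
       "</person>")

def to_labeled_tree(elem):
    """Return (label, value): the text for a leaf, else the ordered list of subtrees."""
    children = list(elem)
    if not children:
        return (elem.tag, elem.text)
    return (elem.tag, [to_labeled_tree(child) for child in children])

root = ET.fromstring(DOC)
tree = to_labeled_tree(root)
print(tree)
print([child.tag for child in root])   # ['name', 'tel', 'tel']: ordered, duplicates allowed
```

Nothing fixes how many children a node has (unranked), and the two tel children stay distinct and in order, which a plain Python dict keyed by label could not represent.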

2. Discuss the importance of DTD and XSD with an example.

OR
Compare DTD and XSD.

DTD, or Document Type Definition, and XML Schema, which is also known as XSD, are two ways of describing the structure and content of an XML document. DTD is the older of the two, and as such, it has limitations that XML Schema has tried to improve. The first difference between DTD and XML Schema is namespace awareness: XML Schema is namespace aware, while DTD is not. Namespace awareness removes the ambiguity that can result from mixing elements and attributes from multiple XML vocabularies, by giving them namespaces that put each element or attribute into context.
Part of the reason why XML Schema is namespace aware while DTD is not is the fact that XML Schema is written in XML, and DTD is not. Therefore, XML Schemas can be processed programmatically just like any XML document. XML Schema also eliminates the need to learn another language, as it is written in XML, unlike DTD.
Another key advantage of XML Schema is its ability to implement strong typing. An XML Schema can define the data type of certain elements, and even constrain it to specific lengths or values. This ability helps ensure that the data stored in the XML document is accurate. DTD lacks strong typing capabilities and has no way of validating content against data types. XML Schema has a wealth of derived and built-in data types to validate content, which provides the advantage stated above. It also has uniform data types, but as all processors and validators need to support these data types, they often cause older XML parsers to fail.
A characteristic of DTD that people often consider both an advantage and a disadvantage is the ability to define DTDs inline, which XML Schema lacks. This is good when working with small files, as it allows you to keep both the content and the schema within the same document, but for larger documents it can be a disadvantage, as you pull the content every time you retrieve the schema. This can lead to serious overhead that degrades performance.
Summary:
1. XML Schema is namespace aware, while DTD is not.
2. XML Schemas are written in XML, while DTDs are not.
3. XML Schema is strongly typed, while DTD is not.
4. XML Schema has a wealth of derived and built-in data types that are not available in DTD.
5. XML Schema does not allow inline definitions, while DTD does.
DTD Example:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT employees (Efirstname, Elastname, Etitle, Ephone, Eemail)>
<!ELEMENT Efirstname (#PCDATA)>
<!ELEMENT Elastname (#PCDATA)>
<!ELEMENT Etitle (#PCDATA)>
<!ELEMENT Ephone (#PCDATA)>
<!ELEMENT Eemail (#PCDATA)>

XSD Example:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:od="urn:schemas-microsoft-com:officedata">
  <xsd:element name="dataroot">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="employees" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="generated" type="xsd:dateTime"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="employees">
    <xsd:annotation>
      <xsd:appinfo>
        <od:index index-name="PrimaryKey" index-key="Employeeid" primary="yes" unique="yes" clustered="no"/>
        <od:index index-name="Employeeid" index-key="Employeeid" primary="no" unique="no" clustered="no"/>
      </xsd:appinfo>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Elastname" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="50"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="Etitle" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="50"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="Ephone" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="50"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="Eemail" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="50"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="Ephoto" minOccurs="0" od:jetType="text" od:sqlSType="nvarchar">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="50"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
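A DTD can state the order of the children of employees, but not that each value is a string of at most 50 characters; the XSD above states both. The sketch below hand-codes those two constraints in Python purely to show what they mean. It is not an XSD validator (real validation needs a schema-aware processor), and the instance documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Order of the xsd:sequence inside "employees". Every child is optional (minOccurs="0"),
# may appear at most once (default maxOccurs), and is an xsd:string with maxLength 50.
SEQUENCE = ["Elastname", "Etitle", "Ephone", "Eemail", "Ephoto"]
MAX_LENGTH = 50

def check_employee(emp):
    """Return the violations of the sequence order and of the maxLength facet."""
    errors = []
    positions = []
    for child in emp:
        if child.tag not in SEQUENCE:
            errors.append(f"unexpected element {child.tag}")
            continue
        positions.append(SEQUENCE.index(child.tag))
        if len(child.text or "") > MAX_LENGTH:
            errors.append(f"{child.tag} longer than {MAX_LENGTH} characters")
    # Strictly increasing positions mean: right order and no repeated element.
    if any(a >= b for a, b in zip(positions, positions[1:])):
        errors.append("children out of sequence order")
    return errors

# Hypothetical instance documents.
good = ET.fromstring("<employees><Elastname>Dave</Elastname>"
                     "<Eemail>someone@example.com</Eemail></employees>")
bad = ET.fromstring("<employees><Etitle>Professor</Etitle>"
                    "<Elastname>" + "x" * 51 + "</Elastname></employees>")
print(check_employee(good))
print(check_employee(bad))
```

Only the first kind of check (order) has a DTD counterpart; the length check relies on the typed, facet-based model that only XML Schema provides.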

3. What is XPath? Describe the XPath data model with a suitable example.

Solution:

XPath (the XML Path Language) is a language for finding information in an XML document.

What is XPath?

XPath is a syntax for defining parts of an XML document
XPath uses path expressions to navigate in XML documents
XPath contains a library of standard functions
XPath is a major element in XSLT
XPath is also used in XQuery, XPointer and XLink
XPath is a W3C recommendation

XPath Example
We will use the following XML document:

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book category="cooking">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>

<book category="children">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

<book category="web">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>

<book category="web">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>

</bookstore>

In the table below we have listed some XPath expressions and the result of each expression:

XPath Expression                      Result
/bookstore/book[1]                    Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]               Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]             Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]         Selects the first two book elements that are children of the bookstore element
//title[@lang]                        Selects all the title elements that have an attribute named lang
//title[@lang='en']                   Selects all the title elements that have a "lang" attribute with a value of "en"
/bookstore/book[price>35.00]          Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title    Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
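The expressions in the table can be tried with Python's xml.etree.ElementTree, which implements only a subset of XPath. In this sketch, paths are evaluated relative to the root element (so /bookstore/book[1] becomes book[1]), and the two things ElementTree cannot express, position()<3 and the numeric comparison price>35.00, are done in plain Python. The document is the bookstore example above, abridged to titles and prices.

```python
import xml.etree.ElementTree as ET

# The bookstore document above, abridged to the fields the expressions use.
BOOKSTORE = (
    "<bookstore>"
    '<book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>'
    '<book category="children"><title lang="en">Harry Potter</title><price>29.99</price></book>'
    '<book category="web"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>'
    '<book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>'
    "</bookstore>"
)

root = ET.fromstring(BOOKSTORE)  # the <bookstore> element; paths below are relative to it

def titles(books):
    return [book.findtext("title") for book in books]

first = titles(root.findall("book[1]"))                  # /bookstore/book[1]
last = titles(root.findall("book[last()]"))              # /bookstore/book[last()]
last_but_one = titles(root.findall("book[last()-1]"))    # /bookstore/book[last()-1]
first_two = titles(root.findall("book")[:2])             # /bookstore/book[position()<3]
with_lang = root.findall(".//title[@lang]")              # //title[@lang]
lang_en = root.findall(".//title[@lang='en']")           # //title[@lang='en']
# ElementTree predicates have no numeric comparison, so price>35.00 is filtered in Python.
expensive = titles(b for b in root.findall("book")
                   if float(b.findtext("price")) > 35.00)  # /bookstore/book[price>35.00]/title
print(first, last, last_but_one, first_two, expensive)
```

For full XPath (position(), comparisons, all axes) a complete XPath engine is needed; the stdlib module is enough only for simple navigation like this.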

4. XQuery with Example.

XQuery is to XML what SQL is to database tables.
XQuery is designed to query XML data: not just XML files, but anything that can appear as XML, including databases.

XQuery Example
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title

XQuery is the language for querying XML data
XQuery for XML is like SQL for databases
XQuery is built on XPath expressions
XQuery is supported by many major databases
XQuery is a W3C Recommendation

The examples use the following document, books.xml:

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>

<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

<book category="WEB">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>

<book category="WEB">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>

</bookstore>

Path Expression: doc("books.xml")/bookstore/book/title
Output:
<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>

5. Explain the FLWOR Expression with an example.

Solution:
Expression:
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title

Output:

<title lang="en">Learning XML</title>
<title lang="en">XQuery Kick Start</title>

FLWOR (pronounced "flower") is an acronym for "For, Let, Where, Order by, Return".
The for clause binds each book element under the bookstore element, in turn, to a variable called $x.
The where clause keeps only the book elements that have a price element with a value greater than 30.
The order by clause defines the sort order: the results are sorted by the title element.
The return clause specifies what should be returned: here, the title elements.
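Each FLWOR clause maps onto a familiar step of a Python comprehension. The sketch below evaluates the same query over an abridged copy of books.xml with xml.etree.ElementTree; it mirrors the clauses one by one but is not an XQuery engine (there is no let clause here, and the price comparison simply converts the text to a float).

```python
import xml.etree.ElementTree as ET

# books.xml above, abridged to titles and prices.
BOOKS_XML = (
    "<bookstore>"
    '<book category="COOKING"><title lang="en">Everyday Italian</title><price>30.00</price></book>'
    '<book category="CHILDREN"><title lang="en">Harry Potter</title><price>29.99</price></book>'
    '<book category="WEB"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>'
    '<book category="WEB"><title lang="en">Learning XML</title><price>39.95</price></book>'
    "</bookstore>"
)

root = ET.fromstring(BOOKS_XML)
result = [
    ET.tostring(x.find("title"), encoding="unicode")    # return $x/title
    for x in sorted(
        (x for x in root.findall("book")                 # for $x in .../bookstore/book
         if float(x.findtext("price")) > 30),            # where $x/price > 30
        key=lambda x: x.findtext("title"))               # order by $x/title
]
for line in result:
    print(line)
```

Everyday Italian is excluded because 30.00 is not greater than 30, and sorting by title puts Learning XML before XQuery Kick Start, matching the output above.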

6. What is XML Typing? Why is it essential in web data management?

Solution:

Perhaps the main difference from typing in relational systems is that typing is not compulsory for XML. It is perfectly fine to have an XML document with no prescribed type. However, when developing and using software, types are essential for interoperability, consistency, and efficiency. We describe these motivations next and conclude the section by contrasting two kinds of type checking, namely dynamic and static.
Interoperability: Schemas serve to document the interface of software components, and therefore provide a key ingredient for the interoperability between programs: a program that consumes an XML document of a given type can assume that the program that generated it has produced a document of that type.
Consistency: Similarly to dependencies in the relational model (primary keys, foreign key constraints, etc.), typing an XML document is also useful to protect data against improper updates.
Storage Efficiency: Suppose that some XML document is very regular, say, it contains a list of companies with, for each, an ID, a name, an address and the name of its CEO. This same information may be stored very compactly, for instance, without repeating the names of elements such as address for each company. Thus, a priori knowledge of the type of the data may help improve its storage.
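A minimal sketch of the storage argument, assuming a hypothetical, perfectly regular companies document: once the type guarantees that every company has the same four children in the same order, each company can be stored as a bare tuple and the element names kept only once, in the schema-derived FIELDS list, from which the XML can be rebuilt exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical regular document: every <company> has the same four children, in order.
COMPANIES = ("<companies>"
             "<company><id>1</id><name>Acme</name><address>Paris</address><ceo>Smith</ceo></company>"
             "<company><id>2</id><name>Globex</name><address>Lyon</address><ceo>Jones</ceo></company>"
             "</companies>")

# What the type tells us: the field order inside each <company>.
FIELDS = ("id", "name", "address", "ceo")

def compact(xml_text):
    """Store each company as a tuple; element names are kept once, in FIELDS."""
    root = ET.fromstring(xml_text)
    return [tuple(c.findtext(f) for f in FIELDS) for c in root.findall("company")]

def expand(rows):
    """Rebuild the XML from the compact rows using the known field order."""
    root = ET.Element("companies")
    for row in rows:
        company = ET.SubElement(root, "company")
        for field, value in zip(FIELDS, row):
            ET.SubElement(company, field).text = value
    return ET.tostring(root, encoding="unicode")

rows = compact(COMPANIES)
print(rows)
print(expand(rows) == COMPANIES)   # True: nothing is lost by dropping the tags
```

Without the type, the tags would have to be stored with every record, because nothing would guarantee that the next company has the same structure.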
Query Efficiency: Consider the following XQuery query:
for $b in doc("bib.xml")/bib//*
where $b/*/zip = 12345
return $b/title
Knowing that the document consists of a list of books, and knowing the exact type of book elements, one may be able to rewrite the query:

for $b in doc("bib.xml")/bib/book
where $b/address/zip = 12345
return $b/title
which is typically much cheaper to evaluate. Note that in the absence of a schema, similar processing is possible by first computing from the document itself a dataguide, i.e., a structural summary of all paths from the root in the document. There are also other, more involved schema inference techniques that allow attaching such an a posteriori schema to a schemaless document.
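For a tree-shaped document, a simple dataguide is just the set of distinct label paths from the root. The sketch below computes it for a hypothetical bib.xml; this is a simplified version of the idea (dataguides for graphs need more work), but it already shows the information that justifies the rewriting above.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml with the structure assumed by the rewritten query.
BIB = ("<bib>"
       "<book><title>T1</title><address><zip>12345</zip></address></book>"
       "<book><title>T2</title><address><city>Paris</city><zip>75005</zip></address></book>"
       "</bib>")

def dataguide(elem, prefix=""):
    """Set of distinct label paths from the root: a structural summary of a tree."""
    path = f"{prefix}/{elem.tag}"
    paths = {path}
    for child in elem:
        paths |= dataguide(child, path)
    return paths

guide = dataguide(ET.fromstring(BIB))
for p in sorted(guide):
    print(p)

# Where can a zip element occur? Only under /bib/book/address, so $b/*/zip with
# $b in /bib//* can only match when $b is a book and * is address.
zip_paths = sorted(p for p in guide if p.endswith("/zip"))
```

However many books the document holds, the summary stays as small as the set of distinct paths, which is why it can stand in for a missing schema during query optimization.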
Dynamic and Static Typing
Assume that XML documents (at least some of them) are associated with schemas and that programs use these schemas. In particular, they verify that processed documents are valid against them. Most of the time, such verification is dynamic. For instance, a Web server verifies the type when sending an XML document or when receiving it. Indeed, XML data tend to be checked quite often because programs prefer to verify types dynamically (when they transfer data) rather than risk running into data of unexpected structure during execution.

7. What is XQuery? Describe the doc() and collection() functions with a suitable example.

Solution:

XQuery is a functional language that is used to retrieve information stored in XML format. XQuery can be used on XML documents, relational databases containing data in XML formats, or XML databases.
Characteristics

Functional language: XQuery is a language to retrieve/query XML-based data.

Analogous to SQL: XQuery is to XML what SQL is to databases.

XPath based: XQuery uses XPath expressions to navigate through XML documents.

Widely accepted: XQuery is supported by many major databases.

W3C standard: XQuery is a W3C standard.

Benefits of XQuery

Using XQuery, both hierarchical and tabular data can be retrieved.

XQuery can be used to query tree and graph structures.

XQuery can be directly used to query web pages.

XQuery can be directly used to build web pages.

XQuery can be used to transform XML documents.

collection() Function:
The fn:collection function returns a collection, which may be any sequence of nodes, identified by a URI. Often it returns the document nodes of a number of documents. Exactly how the URI is associated with the nodes is defined by the implementation. Saxon, for example, dereferences $arg to retrieve an XML collection document that lists all of the documents in the collection. A database implementation might allow you to define collections (and add documents to them) using the product's user interface.
If $arg is a relative URI, it is resolved based on the base URI of the static context. The base URI of the static context may be set by the processor outside the scope of the query, or it may be declared in the query prolog.
The fn:collection function is stable. This means that if you call the fn:collection function more than once with the exact same argument within the same query, the result is the same, even if somehow the resources associated with the URI have changed.
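Since the mapping from collection URI to nodes is implementation-defined, the sketch below simply assumes that a collection URI names a directory and that the collection is every *.xml file in it. QueryContext is a hypothetical class standing for one query evaluation; caching per URI is what makes its collection() stable in the sense described above.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET

class QueryContext:
    """Hypothetical stand-in for one query evaluation, with a stable collection()."""

    def __init__(self):
        self._cache = {}

    def collection(self, uri):
        # Assumption: the URI is a directory and the collection is every *.xml file in it.
        if uri not in self._cache:
            paths = sorted(glob.glob(os.path.join(uri, "*.xml")))
            self._cache[uri] = [ET.parse(p) for p in paths]
        return self._cache[uri]        # same argument, same query: same result

def titles(docs):
    return [t.text for d in docs for t in d.iter("title")]

with tempfile.TemporaryDirectory() as tmp:
    for name, title in [("a.xml", "Everyday Italian"), ("b.xml", "Learning XML")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(f"<bookstore><book><title>{title}</title></book></bookstore>")
    q = QueryContext()
    first = titles(q.collection(tmp))
    # A document is added while the "query" is still running...
    with open(os.path.join(tmp, "c.xml"), "w", encoding="utf-8") as f:
        f.write("<bookstore><book><title>Harry Potter</title></book></bookstore>")
    again = titles(q.collection(tmp))                   # ...but this query does not see it
    fresh = titles(QueryContext().collection(tmp))      # a new query does
print(first, again, fresh)
```

In XQuery itself the same idea reads, for instance, for $b in collection("books")/bookstore/book return $b/title, where what "books" denotes depends on the processor.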

doc() Function:

The fn:doc function returns the document node of the resource found at the specified URI. (The related function fn:document-uri goes the other way: it returns the value of the document-uri property of a given node.)
If you are accessing documents on a file system, your implementation may require you to precede the file name with file:///, use forward slashes to separate directory names, and escape each space in the file name with %20.
Relative URI references are also allowed. If $uri is a relative URI, it is resolved based on the base URI of the static context. The base URI of the static context may be set by the processor outside the scope of the query, or it may be declared in the query prolog.
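Note that the fn:doc function returns the document node, not the root element node. Therefore, you need to include the root element node in your path: for example, doc("books.xml")/bookstore/book rather than doc("books.xml")/book.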
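The document-node rule can be illustrated with a small hypothetical helper, doc_path, written on top of ElementTree. It accepts only simple child steps and, like doc(...)/steps, requires the first step to name the root element itself, so a path that starts at a child of the root selects nothing.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def doc_path(path, steps):
    """Hypothetical analogue of doc(path)/steps for simple child steps.

    The parsed file plays the role of the document node, so the first step
    must be the root element (e.g. "bookstore/book/title").
    """
    root = ET.parse(path).getroot()
    first, _, rest = steps.partition("/")
    if first != root.tag:
        return []                    # like doc("books.xml")/book: no match
    return [root] if not rest else root.findall(rest)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "books.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<bookstore><book><title>Everyday Italian</title></book>"
                "<book><title>Learning XML</title></book></bookstore>")
    with_root = [t.text for t in doc_path(path, "bookstore/book/title")]
    without_root = doc_path(path, "book/title")
print(with_root, without_root)
```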

8. What are XPath Axes? Describe the following axes in brief.
1) Axes 2) Sibling 3) descendant 4) parent 5) attribute 6) self 7) preceding
8) descendant 9) child 10) ancestor
Solution:

Path Axes
An axis defines a node-set relative to the current node.

Axis Name             Result
ancestor              Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self      Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute             Selects all attributes of the current node
child                 Selects all children of the current node
descendant            Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self    Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following             Selects everything in the document after the closing tag of the current node
following-sibling     Selects all siblings after the current node
namespace             Selects all namespace nodes of the current node
parent                Selects the parent of the current node
preceding             Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
preceding-sibling     Selects all siblings before the current node
self                  Selects the current node
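To see how the axes relate to one another, the sketch below computes several of them for element nodes of a tiny hypothetical document. ElementTree keeps no parent links (hence the parent map) and does not model attribute or namespace nodes, so only element axes are shown.

```python
import xml.etree.ElementTree as ET

DOC = "<a><b><c/><d/></b><e><f/></e></a>"
root = ET.fromstring(DOC)
parent = {child: p for p in root.iter() for child in p}   # ElementTree stores no parent links
order = list(root.iter())                                  # all elements in document order

def ancestor(n):
    out = []
    while n in parent:
        n = parent[n]
        out.append(n)
    return out                           # nearest ancestor first

def descendant(n):
    return list(n.iter())[1:]            # iter() yields n itself first

def following_sibling(n):
    if n not in parent:
        return []
    sibs = list(parent[n])
    return sibs[sibs.index(n) + 1:]

def preceding_sibling(n):
    if n not in parent:
        return []
    sibs = list(parent[n])
    return sibs[:sibs.index(n)]

def following(n):
    # After the closing tag of n: later in document order and not a descendant of n.
    inside = set(descendant(n))
    return [m for m in order[order.index(n) + 1:] if m not in inside]

def preceding(n):
    # Before n in document order, excluding the ancestors of n.
    anc = set(ancestor(n))
    return [m for m in order[:order.index(n)] if m not in anc]

def tags(nodes):
    return [m.tag for m in nodes]

b, d = root.find("b"), root.find("b/d")
print(tags(ancestor(d)), tags(preceding_sibling(d)), tags(following(d)), tags(preceding(d)))
```

Note how following(b) skips c and d: they start after b's start tag but before its closing tag, so they belong to the descendant axis instead.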

9. Explain XLink and XPointer with an example.

Solution:

XPointer (the XML Pointer Language) allows hyperlinks to point to specific parts (fragments) of XML documents.

XPointer allows the links to point to specific parts of an XML document

XPointer uses XPath expressions to navigate in the XML document

XPointer is a W3C Recommendation

<?xml version="1.0" encoding="UTF-8"?>

<mydogs xmlns:xlink="http://www.w3.org/1999/xlink">

<mydog>
  <description>
    Anton is my favorite dog. He has won a lot of.....
  </description>
  <fact xlink:type="simple" xlink:href="http://dog.com/dogbreeds.xml#Rottweiler">
    Fact about Rottweiler
  </fact>
</mydog>

<mydog>
  <description>
    Pluto is the sweetest dog on earth......
  </description>
  <fact xlink:type="simple" xlink:href="http://dog.com/dogbreeds.xml#FCRetriever">
    Fact about flat-coated Retriever
  </fact>
</mydog>

</mydogs>
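The part after # in each href is the pointer that has to be resolved inside the target document. The sketch below resolves such pointers against a hypothetical in-memory copy of dogbreeds.xml. One simplification: a real shorthand pointer matches an attribute declared as type ID (by a DTD or schema), whereas the sketch just looks for an attribute literally named id.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of http://dog.com/dogbreeds.xml.
DOCUMENTS = {
    "http://dog.com/dogbreeds.xml": (
        "<dogbreeds>"
        '<dog id="Rottweiler"><fact>Fact about Rottweiler</fact></dog>'
        '<dog id="FCRetriever"><fact>Fact about flat-coated Retriever</fact></dog>'
        "</dogbreeds>"
    )
}

def resolve(href):
    """Resolve 'uri#name' to the element whose id attribute equals name."""
    uri, _, fragment = href.partition("#")
    root = ET.fromstring(DOCUMENTS[uri])
    if not fragment:
        return root                                   # no pointer: the whole document
    if root.get("id") == fragment:
        return root
    return root.find(f".//*[@id='{fragment}']")       # ElementTree supports [@attr='value']

target = resolve("http://dog.com/dogbreeds.xml#Rottweiler")
print(target.findtext("fact"))
```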

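XPointer also allows a shorthand method for linking to an element with an id: you can use the value of the id directly after the #, as the links above do, like this: xlink:href="http://dog.com/dogbreeds.xml#Rottweiler". The longer form, using the xpointer() scheme, would be xlink:href="http://dog.com/dogbreeds.xml#xpointer(id('Rottweiler'))".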

XLink (the XML Linking Language) defines methods for creating links within XML documents.

XLink is used to create hyperlinks within XML documents
Any element in an XML document can behave as a link
XLink supports simple links (like HTML) and extended links (for linking multiple resources together)
With XLink, the links can be defined outside the linked files
XLink is a W3C Recommendation

In HTML, the <a> element defines a hyperlink. However, this is not how it works in XML. In XML documents, you can use whatever element names you want; therefore, it is impossible for browsers to predict what link elements will be called in XML documents.
Below is a simple example of how to use XLink to create links in an XML document:
<?xml version="1.0" encoding="UTF-8"?>

<homepages xmlns:xlink="http://www.w3.org/1999/xlink">
  <homepage xlink:type="simple" xlink:href="http://www.w3schools.com">Visit W3Schools</homepage>
  <homepage xlink:type="simple" xlink:href="http://www.w3.org">Visit W3C</homepage>
</homepages>
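Because any element can act as a link, software recognizes links by attributes in the XLink namespace, not by element names. This minimal sketch lists the simple links of the homepages example with ElementTree, which expands the xlink: prefix into the namespace URI written in braces.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"   # ElementTree's form of the xlink namespace

HOMEPAGES = """<homepages xmlns:xlink="http://www.w3.org/1999/xlink">
<homepage xlink:type="simple" xlink:href="http://www.w3schools.com">Visit W3Schools</homepage>
<homepage xlink:type="simple" xlink:href="http://www.w3.org">Visit W3C</homepage>
</homepages>"""

def simple_links(xml_text):
    """Return (element name, target, link text) for every element with xlink:type="simple"."""
    root = ET.fromstring(xml_text)
    return [(e.tag, e.get(XLINK + "href"), e.text)
            for e in root.iter()
            if e.get(XLINK + "type") == "simple"]

links = simple_links(HOMEPAGES)
for link in links:
    print(link)
```

Renaming homepage to any other element name would not change the result, which is exactly the point of XLink.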

10. Explain Web Graph Mining.
Solution:

Recommendation on the Web is a general term representing a specific type of information filtering technique that attempts to present information items (queries, movies, images, books, Web pages, etc.) that are likely to be of interest to the users. With the diverse and explosive growth of Web information, how to organize and utilize the information effectively and efficiently has become more and more critical. This is especially important for Web 2.0 applications, since user-generated information is more free-style and less structured, which increases the difficulty of mining useful information from these data sources. In order to satisfy the information needs of Web users and improve the user experience in many Web applications, recommender systems have been developed.
Typically, recommender systems are based on collaborative filtering, a technique that automatically predicts the interest of an active user by collecting rating information from other similar users or items. The underlying assumption of collaborative filtering is that the active user will prefer those items which other similar users prefer. Based on this simple but effective intuition, collaborative filtering has been widely employed in some large, well-known commercial systems, including product recommendation at Amazon and movie recommendation at Netflix. However, collaborative filtering algorithms require a user-item rating matrix, containing user-specific rating preferences, to infer users' characteristics, and in most cases such rating data are unavailable, since information on the Web is less structured and more diverse.
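The collaborative filtering intuition (the active user will like what similar users liked) can be sketched in a few lines. This is one common user-based variant, with cosine similarity over co-rated items and a similarity-weighted average; the rating matrix is hypothetical, and this is not the method of any system named above.

```python
from math import sqrt

# Hypothetical user-item rating matrix; a missing key means "not rated".
RATINGS = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def cosine(u, v):
    """Cosine similarity of two users over the items both have rated."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if not common:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[v][i] for i in common)
    nu = sqrt(sum(RATINGS[u][i] ** 2 for i in common))
    nv = sqrt(sum(RATINGS[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Predict user's rating of item as a similarity-weighted average of others' ratings."""
    pairs = [(cosine(user, v), r[item]) for v, r in RATINGS.items()
             if v != user and item in r]
    total = sum(s for s, _ in pairs)
    return sum(s * rating for s, rating in pairs) / total if total else None

print(predict("alice", "m4"))   # closer to bob's 4 than to carol's 2: bob rates like alice
```

The sketch also shows the limitation stated above: predict returns None for an item nobody has rated, because without a rating matrix there is nothing to propagate.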
Fortunately, on the Web, no matter what types of data sources are used for recommendations, in most cases these data sources can be modeled as various types of graphs. If we can design a general graph recommendation algorithm, we can solve many recommendation problems. However, designing a framework for recommendations on the Web raises several challenges. The first challenge is that it is not easy to recommend latent semantically relevant results to users. Take query suggestion as an example: several outstanding issues can potentially degrade the quality of the recommendations. The first one is the ambiguity that commonly exists in natural language. For example, queries containing ambiguous terms may confuse the algorithms, which then fail to satisfy the information needs of the users.
Moreover, users tend to submit short queries consisting of only one or two terms under most circumstances, and short queries are more likely to be ambiguous. In addition, in most cases users perform a search because they have little or even no knowledge about the topic they are searching for. In order to find satisfactory answers, users have to rephrase their queries constantly. The second challenge is how to take personalization into account. Personalization is desirable in many scenarios where different users have different information needs. For instance, Amazon.com was an early adopter of personalization technology, recommending products to shoppers on its site based upon their previous purchases. The adoption of personalization will not only filter out information that is irrelevant to a person, but also provide more specific information that is increasingly relevant to a person's interests.
The last challenge is that it is time-consuming and inefficient to design different recommendation algorithms for different recommendation tasks. Hence a general framework is needed to unify the recommendation tasks on the Web. Moreover, most existing methods are complicated and require tuning a large number of parameters. To solve the problems analyzed above, a general framework has been proposed for recommendations on the Web. This framework is built upon heat diffusion on both undirected and directed graphs, and has several advantages:
It is a general method, which can be applied to many recommendation tasks on the Web.
It can provide results that are latently and semantically relevant to the original information need.
It provides a natural treatment for personalized recommendations.
The designed recommendation algorithm is scalable to very large datasets.
The empirical analysis on several large-scale datasets shows that the proposed framework is effective and efficient for generating high-quality recommendations.
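The framework's algorithm is not given here, so the following is only a generic sketch of heat diffusion on an undirected graph, with a hypothetical query graph. Heat starts at the user's query and flows along edges; the queries that end up warmest are suggested. It approximates the heat-kernel formula f(1) = exp(αH) f(0) by (I + αH/P)^P f(0), with P = steps and H the negative graph Laplacian; edge weights, personalization and the directed-graph variant are left out.

```python
# Hypothetical query graph: an edge joins two queries that led to clicks on the same URLs.
EDGES = [("jaguar", "jaguar car"), ("jaguar", "jaguar animal"),
         ("jaguar car", "car price"), ("jaguar animal", "big cats")]

def heat_diffusion(edges, source, alpha=1.0, steps=10):
    """Apply (I + alpha * H / steps) `steps` times to the initial heat vector.

    H[i][j] = 1 for an edge and H[i][i] = -degree(i), so each step moves heat
    from warmer nodes to cooler neighbours; the total heat is conserved.
    """
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    heat = {node: 0.0 for node in nbrs}
    heat[source] = 1.0                                   # all initial heat on the user's query
    for _ in range(steps):
        heat = {n: heat[n] + (alpha / steps) * sum(heat[m] - heat[n] for m in nbrs[n])
                for n in nbrs}
    return heat

def recommend(edges, source, k=2):
    """Suggest the k other nodes that received the most heat."""
    heat = heat_diffusion(edges, source)
    others = [n for n in heat if n != source]
    return sorted(others, key=lambda n: heat[n], reverse=True)[:k]

heat = heat_diffusion(EDGES, "jaguar")
print(recommend(EDGES, "jaguar"))
```

Both senses of the ambiguous query "jaguar" receive heat, and more distant queries such as "car price" receive less, which is the kind of latent, graph-based relevance the framework aims at.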

11. What is an Ontology? Describe RDF and RDFS in brief.

OR

Explain OWL.
Solution:

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects. Ontologies resemble class hierarchies in object-oriented programming, but there are several critical differences. Class hierarchies are meant to represent structures used in source code that evolve fairly slowly (typically monthly revisions), whereas ontologies are meant to represent information on the Internet and are expected to be evolving almost constantly. Similarly, ontologies are typically far more flexible, as they are meant to represent information on the Internet coming from all sorts of heterogeneous data sources. Class hierarchies, on the other hand, are meant to be fairly static and rely on far less diverse and more structured sources of data, such as corporate databases.

What is RDF?
RDF stands for Resource Description Framework
RDF is a framework for describing resources on the web
RDF is designed to be read and understood by computers
RDF is not designed for being displayed to people
RDF is commonly written in XML (the RDF/XML syntax)
RDF is a part of the W3C's Semantic Web Activity
RDF is a W3C Recommendation from 10 February 2004
RDF - Examples of Use
Describing properties for shopping items, such as price and availability
Describing time schedules for web events
Describing information about web pages (content, author, created and modified date)
Describing content and rating for web pictures
Describing content for search engines
Describing electronic libraries

RDF Resource, Property, and Property Value

RDF identifies things using Web identifiers (URIs), and describes resources with properties and property values.
Explanation of Resource, Property, and Property value:
A Resource is anything that can have a URI, such as "http://www.w3schools.com/rdf"
A Property is a Resource that has a name, such as "author" or "homepage"
A Property value is the value of a Property, such as "Jan Egil Refsnes" or "http://www.w3schools.com" (note that a property value can be another resource)
The following RDF document could describe the resource "http://www.w3schools.com/rdf":
<?xml version="1.0"?>

<RDF>
  <Description about="http://www.w3schools.com/rdf">
    <author>Jan Egil Refsnes</author>
    <homepage>http://www.w3schools.com</homepage>
  </Description>
</RDF>

The example above is simplified. Namespaces are omitted.

RDF Statements
The combination of a Resource, a Property, and a Property value forms a Statement (known as the subject, predicate and object of a Statement).
Let's look at some example statements to get a better understanding:
Statement: "The author of http://www.w3schools.com/rdf is Jan Egil Refsnes".
The subject of the statement above is: http://www.w3schools.com/rdf
The predicate is: author
The object is: Jan Egil Refsnes
Statement: "The homepage of http://www.w3schools.com/rdf is http://www.w3schools.com".
The subject of the statement above is: http://www.w3schools.com/rdf
The predicate is: homepage
The object is: http://www.w3schools.com
RDF Example
Here are two records from a CD list:

Title              Artist          Country    Company        Price    Year
Empire Burlesque   Bob Dylan       USA        Columbia       10.90    1985
Hide your heart    Bonnie Tyler    UK         CBS Records    9.90     1988

Below are a few lines from an RDF document:
<?xml version="1.0"?>

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">

<rdf:Description
rdf:about="http://www.recshop.fake/cd/EmpireBurlesque">
  <cd:artist>Bob Dylan</cd:artist>
  <cd:country>USA</cd:country>
  <cd:company>Columbia</cd:company>
  <cd:price>10.90</cd:price>
  <cd:year>1985</cd:year>
</rdf:Description>

<rdf:Description
rdf:about="http://www.recshop.fake/cd/Hideyourheart">
  <cd:artist>Bonnie Tyler</cd:artist>
  <cd:country>UK</cd:country>
  <cd:company>CBS Records</cd:company>
  <cd:price>9.90</cd:price>
  <cd:year>1988</cd:year>
</rdf:Description>
.
.
.
</rdf:RDF>
The first line of the RDF document is the XML declaration. The XML declaration is followed by the root element of RDF documents: <rdf:RDF>.
The xmlns:rdf namespace specifies that elements with the rdf prefix are from the namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#".
The xmlns:cd namespace specifies that elements with the cd prefix are from the namespace "http://www.recshop.fake/cd#".
The <rdf:Description> element contains the description of the resource identified by the rdf:about attribute.
The elements <cd:artist>, <cd:country>, <cd:company>, etc. are properties of the resource.
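Each rdf:Description above is a bundle of statements sharing one subject: rdf:about gives the subject, each child element's namespace plus local name gives the predicate, and its text gives the object. This sketch extracts those triples from an abridged copy of the CD example with ElementTree. It handles only literal-valued properties written as child elements (not rdf:resource, nested descriptions, blank nodes or other RDF/XML abbreviations), so it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The CD example above, abridged to two properties per CD.
CD_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/EmpireBurlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:year>1985</cd:year>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Hideyourheart">
    <cd:artist>Bonnie Tyler</cd:artist>
    <cd:year>1988</cd:year>
  </rdf:Description>
</rdf:RDF>"""

def triples(rdf_xml):
    """Return (subject, predicate, object) for each literal property of each rdf:Description."""
    root = ET.fromstring(rdf_xml)
    out = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            ns, _, local = prop.tag[1:].partition("}")   # "{ns}local" -> ns, local
            out.append((subject, ns + local, prop.text))
    return out

for statement in triples(CD_RDF):
    print(statement)
```

The first triple, for example, reads: subject http://www.recshop.fake/cd/EmpireBurlesque, predicate http://www.recshop.fake/cd#artist, object "Bob Dylan".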

12. Explain the web crawling process in detail.
Solution:
When most people talk about Internet search engines, they really mean World Wide Web search engines. Before the Web became the most visible part of the Internet, there were already search engines in place to help people find information on the Net. Programs with names like "gopher" and "Archie" kept indexes of files stored on servers connected to the Internet, and dramatically reduced the amount of time required to find programs and documents. In the late 1980s, getting serious value from the Internet meant knowing how to use gopher, Archie, Veronica and the rest.
Today, most Internet users limit their searches to the Web, so we limit this discussion to search engines that focus on the contents of Web pages.
Before a search engine can tell you where a file or document is, it must be found. To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling. (There are some disadvantages to calling part of the Internet the World Wide Web; a large set of arachnid-centric names for tools is one of them.) In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.
How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
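The strategy just described (start from popular seed pages, follow every link, never revisit a page) is essentially a breadth-first traversal of the link graph. To stay self-contained, this sketch crawls a hypothetical in-memory "web" (a dict from URL to HTML) instead of fetching pages over HTTP, and it ignores politeness rules, robots.txt and DNS caching.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch pages over HTTP.
WEB = {
    "http://popular.example/": '<a href="/news">News</a> <a href="http://other.example/">Other</a>',
    "http://popular.example/news": '<a href="/">Home</a>',
    "http://other.example/": '<a href="http://popular.example/news">News again</a>',
}

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(seeds, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns pages in the order they were visited."""
    frontier = deque(seeds)          # URLs waiting to be visited
    seen = set(seeds)                # URLs already queued, so none is fetched twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = WEB.get(url)
        if html is None:             # unreachable page
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)            # resolve relative links such as "/news"
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

print(crawl(["http://popular.example/"]))
```

The seen set is what keeps the spider from looping forever on pages that link back to each other, such as the home and news pages here.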
Google began as an academic search engine. In the paper that describes how the system was built, Sergey Brin and Lawrence Page give an example of how quickly their spiders can work. They built their initial system to use multiple spiders, usually three at one time. Each spider could keep about 300 connections to Web pages open at a time. At its peak performance, using four spiders, their system could crawl over 100 pages per second, generating around 600 kilobytes of data each second.
Keeping everything running quickly meant building a system to feed necessary information to the spiders. The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server's name into an address, Google had its own DNS, in order to keep delays to a minimum.
When the Google spider looked at an HTML page, it took note of two things:
The words within the page
Where the words were found
Words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search. The Google spider was built to index every significant word on a page, leaving out the articles "a," "an" and "the." Other spiders take different approaches.
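The indexing side can be sketched as follows: for each page, record every word together with where it was found (title or body), skipping the articles "a", "an" and "the". The inverted-index layout used here (word mapped to a set of (URL, location) pairs) is an assumption for illustration; the text does not describe Google's actual data structures.

```python
import re
from html.parser import HTMLParser

STOP_WORDS = {"a", "an", "the"}    # the articles left out by the early Google spider

class WordParser(HTMLParser):
    """Collect (word, location) pairs, where location is 'title' or 'body'."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        where = "title" if self.in_title else "body"
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            if word not in STOP_WORDS:
                self.words.append((word, where))

def index_page(url, html, index):
    """Add one page to an inverted index mapping word -> set of (url, location)."""
    parser = WordParser()
    parser.feed(html)
    parser.close()                 # flush any buffered text
    for word, where in parser.words:
        index.setdefault(word, set()).add((url, where))
    return index

idx = index_page("http://example.org/",
                 "<html><head><title>The Web Crawler</title></head>"
                 "<body>A spider follows the links</body></html>", {})
print(sorted(idx))
```

Keeping the location lets a later search give extra weight to a match in the title, as described above.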

13. Describe failure management in a distributed system.
Solution:

In a centralized system, if a program fails for any reason, the simple (and, actually, standard) solution is to abort and then restart its transactions. Besides, the chances of seeing a single machine fail are low. Things are quite different in the case of a distributed system with thousands of computers. Failure becomes a possibly frequent situation, due to program bugs, human errors, hardware or network problems, etc. For small tasks, it is just simpler to restart them. But for long-lasting distributed tasks, restarting them is often not an acceptable option in such settings, since errors typically occur too often. Moreover, in most cases, a failure affects a minor part of the task, which can be quickly completed provided that the system knows how to cope with faulty components.
Some common principles are found in all distributed systems that try to make themselves resilient to failures. One of the most important is independence. The task handled by an individual node should be independent from the other components. This allows recovering from a failure by only considering the task's initial state, without having to take into account complex relationships or synchronization with other tasks. Independence is best achieved in shared-nothing architectures, where both the CPU and the local disk of a server run in isolation from the other components of the servers.
Thanks to the replication methods examined earlier, a failure can usually be recovered by replacing the faulty node with a mirror. The critical question in this context is to detect that a system has met a failure. Why, for instance, is a Client unable to communicate with a server? This may be because of a failure of the server itself, or because the communication network suffers from a transient problem. The Client can wait for the failed node to come back, but this runs against availability, since the application becomes idle for an unpredictable period of time.
Failure recovery

Figure 14.5: Recovery techniques for centralized (left) and replicated architectures (right)

Figure 14.5 recalls the main aspects of data recovery in a centralized data management system, and its extension to distributed settings.
Consider first a client-server application with a single server node (left part). (1) The Client issues a write(a). The server does not immediately write a to its repository: because this involves a random access, it would be very inefficient to do so. Instead, it puts a in its volatile memory. Now, if the system crashes or if the memory is corrupted in any way, the write is lost. Therefore, the server writes to a log file (2). A log is a sequential file which supports very fast append operations. When the log manager confirms that the data is indeed on persistent storage (3), the server can send back an acknowledgment to the Client (4). Eventually, the main-memory data will be flushed to the repository (5).
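A minimal Python sketch of steps (1) to (5) on a single server. The class name `LoggedServer`, the JSON record format and the use of a JSON file as the "repository" are assumptions made for illustration; what the sketch is meant to show is only the ordering: the acknowledgment is returned after the log record has been forced to disk with `os.fsync`, while the repository is updated later.

```python
import json
import os
import tempfile


class LoggedServer:
    """Hypothetical single-node server following steps (1)-(5) of the centralized protocol."""

    def __init__(self, log_path, repo_path):
        self.log_path = log_path      # sequential, append-only log file
        self.repo_path = repo_path    # repository (here a JSON file, an assumption)
        self.memory = {}              # volatile buffer, lost if the process crashes

    def write(self, key, value):
        # (1) the Client's write(a) is first placed in volatile memory
        self.memory[key] = value
        # (2) the operation is appended to the log
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            # (3) force the record onto persistent storage before going on
            os.fsync(log.fileno())
        # (4) only now is it safe to acknowledge the Client
        return "ack"

    def flush(self):
        # (5) eventually, main-memory data is written to the repository
        repo = {}
        if os.path.exists(self.repo_path):
            with open(self.repo_path, encoding="utf-8") as f:
                repo = json.load(f)
        repo.update(self.memory)
        with open(self.repo_path, "w", encoding="utf-8") as f:
            json.dump(repo, f)
        self.memory.clear()


with tempfile.TemporaryDirectory() as workdir:
    server = LoggedServer(os.path.join(workdir, "server.log"),
                          os.path.join(workdir, "repo.json"))
    print(server.write("a", 42))   # acknowledged once the log record is durable
    server.flush()
```

A crash between (4) and (5) loses only the in-memory copy; the acknowledged write can still be rebuilt from the log.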
This is the standard recovery protocol, implemented in centralized DBMSs. In a distributed setting, the server must log a write operation not only to the local log file, but also to 1, 2 or more remote logs. The issue is close to replication methods, the main choice being to adopt either a synchronous or an asynchronous protocol.
Synchronous protocol.
The server acknowledges the Client only when all the remote nodes have sent a confirmation of the successful completion of their write() operation. In practice, the Client waits until the slowest of all the writers sends its acknowledgment. This may severely hinder the efficiency of updates, but the obvious advantage is that all the replicas are consistent.
Asynchronous protocol.
The Client application waits only until one of the copies (the fastest) has been effectively written. Clearly, this puts data consistency at risk, as a subsequent read operation may access an older version that does not yet reflect the update.
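The difference between the two protocols comes down to which confirmation the server waits for. The sketch below is a deterministic simulation, not a networked implementation: `Replica`, `ack_time` and `stale_replicas` are hypothetical names, and the latencies are made-up inputs.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    latency_ms: float   # time for this copy of the log to confirm the write (made-up input)


def ack_time(replicas, mode):
    """When the server may acknowledge the Client under each protocol."""
    latencies = [r.latency_ms for r in replicas]
    if mode == "sync":
        return max(latencies)   # wait for the slowest writer
    if mode == "async":
        return min(latencies)   # wait only for the fastest copy
    raise ValueError(f"unknown mode: {mode}")


def stale_replicas(replicas, t):
    """Copies that do not yet reflect the update at time t."""
    return [r.name for r in replicas if r.latency_ms > t]


copies = [Replica("local", 2.0), Replica("r1", 15.0), Replica("r2", 40.0)]
for mode in ("sync", "async"):
    t = ack_time(copies, mode)
    print(mode, t, stale_replicas(copies, t))
```

With copies answering after 2, 15 and 40 ms, the synchronous mode acknowledges at 40 ms with every copy up to date, while the asynchronous mode acknowledges at 2 ms and leaves r1 and r2 stale: that is the window in which a read may return an older version.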
The multi-log recovery process, synchronous or asynchronous, has a cost, but it brings availability (and reliability). If the server dies, its volatile memory vanishes and its local log cannot be used for a while. However, the closest mirror can be chosen. It reads from its own log a state equivalent to that of the dead server, and can begin to answer Client requests. This is the standard REDO protocol, described in detail in any classical textbook on centralized databases. We do not elaborate further here.
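The mirror's takeover can be sketched as replaying its log in order, so that the last logged value of each key wins. The structure of `mirrors` (a distance and a log per mirror) and the function names are assumptions for illustration; real REDO protocols also handle checkpoints and uncommitted transactions, which are not modeled here.

```python
def redo(log):
    """Rebuild the state by re-applying logged writes in order (last write wins)."""
    state = {}
    for key, value in log:
        state[key] = value
    return state


def fail_over(mirrors):
    """Choose the closest mirror and rebuild the dead server's state from its own log."""
    name = min(mirrors, key=lambda m: mirrors[m]["distance"])
    return name, redo(mirrors[name]["log"])


mirrors = {   # hypothetical mirrors: network distance and their copy of the log
    "m1": {"distance": 5, "log": [("a", 1), ("b", 2), ("a", 3)]},
    "m2": {"distance": 9, "log": [("a", 1), ("b", 2), ("a", 3)]},
}
print(fail_over(mirrors))
```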

14. Explain XLink and XPointer with example.
Solution:

XLink is used to create hyperlinks within XML documents
Any element in an XML document can behave as a link
With XLink, the links can be defined outside the linked files
XLink is a W3C Recommendation

XLink Syntax
In HTML, the <a> element defines a hyperlink. However, this is not how it works in XML. In XML documents, you can use whatever element names you want; therefore, it is impossible for browsers to predict what link elements will be called in XML documents.
Below is a simple example of how to use XLink to create links in an XML document:
<?xml version="1.0" encoding="UTF-8"?>

<homepages xmlns:xlink="http://www.w3.org/1999/xlink">
  <homepage xlink:type="simple" xlink:href="http://www.w3schools.com">Visit W3Schools</homepage>
  <homepage xlink:type="simple" xlink:href="http://www.w3.org">Visit W3C</homepage>
</homepages>
To get access to the XLink features, we must declare the XLink namespace. The XLink namespace is: "http://www.w3.org/1999/xlink".
The xlink:type and the xlink:href attributes in the <homepage> elements come from the XLink namespace.
The xlink:type="simple" creates a simple "HTML-like" link (meaning "click here to go there").
The xlink:href attribute specifies the URL to link to.

XLink Example
The following XML document contains XLink features:
<?xml version="1.0" encoding="UTF-8"?>

<bookstore xmlns:xlink="http://www.w3.org/1999/xlink">

  <book title="Harry Potter">
    <description
      xlink:type="simple"
      xlink:href="/images/HPotter.gif"
      xlink:show="new">
      As his fifth year at Hogwarts School of Witchcraft and
      Wizardry approaches, 15-year-old Harry Potter is.......
    </description>
  </book>

  <book title="XQuery Kick Start">
    <description
      xlink:type="simple"
      xlink:href="/images/XQuery.gif"
      xlink:show="new">
      XQuery Kick Start delivers a concise introduction
      to the XQuery standard.......
    </description>
  </book>

</bookstore>
Example explained:
The XLink namespace is declared at the top of the document (xmlns:xlink="http://www.w3.org/1999/xlink")
The xlink:type="simple" creates a simple "HTML-like" link
The xlink:href attribute specifies the URL to link to (in this case, an image)
The xlink:show="new" specifies that the link should open in a new window
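A generic XML parser does not follow XLinks, but since any element can carry the XLink attributes, an application can find the links simply by reading them. This Python sketch uses the standard `xml.etree.ElementTree` module on a shortened copy of the bookstore example (description text elided, XML declaration omitted); the helper name `simple_links` is hypothetical. ElementTree exposes namespaced attributes under keys of the form `{namespace-URI}local-name`.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

BOOKSTORE = """<bookstore xmlns:xlink="http://www.w3.org/1999/xlink">
  <book title="Harry Potter">
    <description xlink:type="simple" xlink:href="/images/HPotter.gif"
                 xlink:show="new">As his fifth year at Hogwarts ...</description>
  </book>
  <book title="XQuery Kick Start">
    <description xlink:type="simple" xlink:href="/images/XQuery.gif"
                 xlink:show="new">XQuery Kick Start delivers ...</description>
  </book>
</bookstore>"""


def simple_links(xml_text):
    """Return (tag, href, show) for every element declared as an XLink simple link."""
    root = ET.fromstring(xml_text)
    links = []
    for elem in root.iter():
        # Namespaced attributes are keyed as '{namespace-URI}local-name'.
        if elem.get(f"{{{XLINK}}}type") == "simple":
            links.append((elem.tag,
                          elem.get(f"{{{XLINK}}}href"),
                          elem.get(f"{{{XLINK}}}show")))
    return links


for link in simple_links(BOOKSTORE):
    print(link)
```

Note that the element name (`description`) plays no role: the namespace-qualified attributes alone make an element a link.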

XPointer
XPointer allows links to point to specific parts of an XML document
XPointer uses XPath expressions to navigate in the XML document
XPointer is a W3C Recommendation

XPointer Example
In this example, we will use XPointer in conjunction with XLink to point to a specific part of another document.
We will start by looking at the target XML document (the document we are linking to):
<?xml version="1.0" encoding="UTF-8"?>

<dogbreeds>

  <dog breed="Rottweiler" id="Rottweiler">
    <picture url="http://dog.com/rottweiler.gif" />
    <history>The Rottweiler's ancestors were probably Roman
    drover dogs.....</history>
    <temperament>Confident, bold, alert and imposing, the Rottweiler
    is a popular choice for its ability to protect....</temperament>
  </dog>

  <dog breed="FCRetriever" id="FCRetriever">
    <picture url="http://dog.com/fcretriever.gif" />
    <history>One of the earliest uses of retrieving dogs was to
    help fishermen retrieve fish from the water....</history>
    <temperament>The flat-coated retriever is a sweet, exuberant,
    lively dog that loves to play and retrieve....</temperament>
  </dog>

</dogbreeds>

The following XML document contains links to more information about the dog breed of each of my dogs:
<?xml version="1.0" encoding="UTF-8"?>

<mydogs xmlns:xlink="http://www.w3.org/1999/xlink">

  <mydog>
    <description>
      Anton is my favorite dog. He has won a lot of.....
    </description>
    <fact xlink:type="simple" xlink:href="http://dog.com/dogbreeds.xml#Rottweiler">
      Fact about Rottweiler
    </fact>
  </mydog>

  <mydog>
    <description>
      Pluto is the sweetest dog on earth......
    </description>
    <fact xlink:type="simple" xlink:href="http://dog.com/dogbreeds.xml#FCRetriever">
      Fact about flat-coated Retriever
    </fact>
  </mydog>

</mydogs>
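In the xlink:href values above, the part after `#` is an XPointer shorthand pointer: `dogbreeds.xml#Rottweiler` designates the element of dogbreeds.xml whose ID is Rottweiler. The following is a sketch of how an application might resolve such a link with Python's `xml.etree.ElementTree`, under two assumptions: the target document is available in memory (the `DOCUMENTS` table stands in for fetching the URL), and an attribute literally named `id` plays the role of the ID (strictly, a shorthand pointer relies on attributes known to be of type ID, e.g., declared in a DTD). `resolve` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory copy of the target document; a real processor would fetch the URL.
DOCUMENTS = {
    "http://dog.com/dogbreeds.xml": """<dogbreeds>
  <dog breed="Rottweiler" id="Rottweiler">
    <picture url="http://dog.com/rottweiler.gif" />
    <history>The Rottweiler's ancestors were probably Roman drover dogs...</history>
  </dog>
  <dog breed="FCRetriever" id="FCRetriever">
    <picture url="http://dog.com/fcretriever.gif" />
    <history>One of the earliest uses of retrieving dogs was to help fishermen...</history>
  </dog>
</dogbreeds>""",
}


def resolve(href):
    """Split 'document#name' and return the element whose id attribute equals name."""
    url, _, fragment = href.partition("#")
    root = ET.fromstring(DOCUMENTS[url])
    if not fragment:
        return root   # no pointer: the link designates the whole document
    # Shorthand pointer, matched here against the 'id' attribute (an assumption).
    return root.find(f".//*[@id='{fragment}']")


dog = resolve("http://dog.com/dogbreeds.xml#Rottweiler")
print(dog.get("breed"), dog.find("picture").get("url"))
```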

15. Differentiate between ranked trees and unranked trees.

OR
Automata on Trees.

Solution:
Automata on words are used to define word languages, that is, subsets of $\Sigma^*$ for some alphabet $\Sigma$. Similarly, it is possible to define tree automata whose purpose is to define subsets of the set of all trees. For technical reasons, it is easier to define tree automata for ranked trees, i.e., trees whose number of children per node is bounded.

It is easy to define a notion of right-to-left automaton, and also easy to see that, in terms of accepted languages, there is absolutely no difference between left-to-right and right-to-left automata. For trees, however, there is a difference between top-down and bottom-up automata. Intuitively, in a top-down automaton...
Bottom-Up Tree Automata:
Let us start with the example of bottom-up automata for binary trees. Similarly to word automata, a bottom-up automaton on binary trees is defined by:

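1. A finite leaf alphabet $L$
2. A finite internal alphabet $\Sigma$, with $\Sigma \cap L = \emptyset$
3. A set of states $Q$
4. A set of accepting states $F \subseteq Q$
5. A transition function $\delta$ that maps:
   - a leaf symbol $l \in L$ to a set of states $\delta(l) \in 2^Q$;
   - an internal symbol $a \in \Sigma$, together with a pair of states $(q, q')$, to a set of states $\delta(a, q, q') \in 2^Q$.

The transition function specifies a set of states for the leaf nodes. Then, if $\delta(a, q, q')$ contains $q''$, this specifies that if the left and right children of a node labeled $a$ are in states $q$ and $q'$, respectively, then the node may move to state $q''$.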
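To make the definition concrete, here is a Python sketch of a (possibly nondeterministic) bottom-up automaton on binary trees, instantiated with a Boolean-circuit automaton: leaves 0/1, internal symbols and/or, states {0, 1}, accepting states {1}. This instance is an assumption chosen for illustration. A tree is accepted if its root can reach a state of $F$. Trees are encoded, by assumption, as a leaf symbol (a string) or a triple (label, left, right).

```python
from itertools import product


class BottomUpAutomaton:
    """Bottom-up automaton on binary trees (L, Sigma, Q, F, delta)."""

    def __init__(self, leaf_delta, internal_delta, accepting):
        self.leaf_delta = leaf_delta          # leaf symbol l -> delta(l), a set of states
        self.internal_delta = internal_delta  # (a, q, q') -> delta(a, q, q'), a set of states
        self.accepting = accepting            # F, a subset of Q

    def states(self, tree):
        """All states the root of `tree` can reach (nondeterminism = sets of states)."""
        if isinstance(tree, str):             # a leaf
            return set(self.leaf_delta.get(tree, set()))
        label, left, right = tree
        reachable = set()
        for q, q2 in product(self.states(left), self.states(right)):
            reachable |= self.internal_delta.get((label, q, q2), set())
        return reachable

    def accepts(self, tree):
        return bool(self.states(tree) & self.accepting)


# Boolean-circuit instance: leaves "0"/"1", gates "and"/"or", states {0, 1}, F = {1}.
leaf_delta = {"0": {0}, "1": {1}}
internal_delta = {}
for q, q2 in product((0, 1), repeat=2):
    internal_delta[("and", q, q2)] = {q & q2}
    internal_delta[("or", q, q2)] = {q | q2}
circuit = BottomUpAutomaton(leaf_delta, internal_delta, accepting={1})

print(circuit.accepts(("and", ("or", "0", "1"), "1")))   # the circuit evaluates to 1
print(circuit.accepts(("and", "0", "1")))                # the circuit evaluates to 0
```

This particular automaton is deterministic (each transition yields a single state), but the class keeps sets of states so that nondeterministic transition functions can be plugged in unchanged.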

Figure: Two binary trees

Unranked Trees:
We have defined tree automata (and regular tree languages) over the set of ranked trees, i.e., trees where there is an a priori bound on the number of children of each node. But XML documents are unranked (take for example XHTML, in which the number of paragraphs <p> inside the body of a document is unbounded).
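Reconsider the Boolean circuit example from Example 3.2.3. Suppose we want to allow and/or gates with arbitrarily many inputs. The set of transitions of a bottom-up automaton then becomes infinite, since a separate transition is needed for every possible number of inputs.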

So an issue is to represent this infinite set of transitions. To do that, we can use regular expressions on words.
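One way to realize this idea is to read the states of a node's children, left to right, as a word and to attach one regular expression to each (label, state) pair. The sketch below assumes gates with at least one input and uses Python's `re` module; the `TRANSITIONS` table is an illustrative encoding, not a definition taken from the text. For each gate the two patterns are complementary on non-empty words over {0, 1}, so the automaton stays deterministic.

```python
import re

# Illustrative encoding: for each (gate, target state), a regular expression over
# the word formed by the children's states, read left to right.
TRANSITIONS = {
    ("and", 1): re.compile(r"1+"),             # every input is 1
    ("and", 0): re.compile(r"[01]*0[01]*"),    # at least one input is 0
    ("or", 1): re.compile(r"[01]*1[01]*"),     # at least one input is 1
    ("or", 0): re.compile(r"0+"),              # every input is 0
}
LEAVES = {"0": 0, "1": 1}


def evaluate(tree):
    """State reached at the root; tree = leaf symbol or (gate, [children...])."""
    if isinstance(tree, str):
        return LEAVES[tree]
    gate, children = tree
    word = "".join(str(evaluate(child)) for child in children)
    for (label, state), pattern in TRANSITIONS.items():
        if label == gate and pattern.fullmatch(word):
            return state
    raise ValueError(f"no transition for {gate!r} on {word!r}")


circuit = ("and", ["1", "1", ("or", ["0", "0", "1"]), "1"])
print(evaluate(circuit) == 1)   # accepted iff the root reaches a state of F = {1}
```

Four finite regular expressions thus replace the infinite family of transitions, whatever the number of inputs of each gate.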


16. Discuss the required properties of a distributed system.
Reliability
Reliability denotes the ability of a distributed system to deliver its services even when one or several of its software or hardware components fail. It definitely constitutes one of the main expected advantages of a distributed solution, based on the assumption that a participating machine affected by a failure can always be replaced by another one, without preventing the completion of a requested task. For instance, a common requirement of large e-commerce Web sites is that a user transaction should never be canceled because of a failure of the particular machine that is running that transaction. An immediate and obvious consequence is that reliability relies on redundancy of both the software components and the data. At the limit, should the entire data center be destroyed by an earthquake, it should be replaced by another one that has a replica of the users' shopping carts. Clearly, this has a cost and, depending on the application, one may more or less fully achieve such resilience for services, by eliminating every single point of failure.
Scalability
The concept of scalability refers to the ability of a system to continuously evolve in order to support a growing amount of tasks. In our setting, a system may have to scale because of an increase in data volume, or because of an increase in work, e.g., the number of transactions. We would like to achieve this scaling without performance loss. We will favor here horizontal scalability, achieved by adding new servers. But one can also consider vertical scalability, obtained by adding more resources to a single server.
To illustrate these options, suppose we have distributed the workload of an application between 100 servers, in a somewhat perfect and abstract manner, with each holding 1/100 of the data and serving 1/100 of the queries. Now suppose we get 20% more data, or 20% more queries: we can simply get 20 new servers. This is horizontal scalability, which is virtually limitless for highly parallelizable applications. We could also add extra disk/memory to the 100 servers (to handle the increase in data), and add extra memory or change the processors to faster ones (to handle the increase in queries). This is vertical scalability, which typically reaches the limits of the machine rather fast.
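The arithmetic of this example can be written down directly. A sketch under the same idealized assumption (each server keeps an equal share of data and queries); `machine_max` is a hypothetical hardware ceiling expressed in whatever resource unit is being scaled, and integer ceiling division avoids floating-point rounding.

```python
def ceil_percent(value, growth_percent):
    """value * (1 + growth_percent / 100), rounded up, using integer arithmetic only."""
    return (value * (100 + growth_percent) + 99) // 100


def horizontal(servers, growth_percent):
    """Servers needed if each one keeps the same share of data and queries."""
    return ceil_percent(servers, growth_percent)


def vertical(capacity_per_server, growth_percent, machine_max):
    """Capacity each existing server must reach, or None past the machine's limit."""
    needed = ceil_percent(capacity_per_server, growth_percent)
    return needed if needed <= machine_max else None


print(horizontal(100, 20))      # 20% more data or queries
print(vertical(64, 20, 128))    # e.g. 64 GB of RAM per server, hypothetical 128 GB ceiling
print(vertical(120, 20, 128))   # the same growth later on exceeds the ceiling
```

`horizontal(100, 20)` gives the 120 servers of the example, and can keep growing; `vertical` stops working as soon as the enlarged server no longer fits under `machine_max`, which is the limit the text refers to.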
In parallel computing, one further distinguishes weak scalability from strong scalability. The former analyzes how the time to obtain a solution varies with respect to the processor count, with a fixed data set size per processor. In the perfect case, this time remains constant (per processor), indicating the ability of the system to maintain a perfect balance. Strong scalability refers to the global throughput of a system, for a fixed data set size. If the throughput rises linearly as new servers are added, the system does not suffer from an overhead due to the management tasks associated with a distributed job. (Note that the above discussion assumes a linear complexity of the system behavior, which is true at least for basic read/write/search operations.)

Dimensions of scalability

It is actually a common situation that the performance of a system, although designed (or claimed) to be scalable, declines with the system size, due to the management or environment cost. For instance, network exchanges may become slower because machines tend to be far apart from one another. More generally, it may happen that some tasks are not distributed, either because of their inherent atomic nature or because of some flaw in the system design. At some point, these tasks (if any) limit the speedup obtained by distribution (a phenomenon known as Amdahl's law in the related context of parallel computing).
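Amdahl's law is usually written $S(n) = 1 / ((1 - p) + p/n)$, where $p$ is the fraction of the work that can be distributed and $n$ the number of servers. A short Python sketch, using exact fractions; the value $p = 0.9$ is a hypothetical choice.

```python
from fractions import Fraction


def amdahl_speedup(parallel_fraction, servers):
    """Speedup with `servers` nodes when only `parallel_fraction` of the work is distributable."""
    serial_fraction = 1 - parallel_fraction       # part that cannot be distributed
    return 1 / (serial_fraction + parallel_fraction / servers)


p = Fraction(9, 10)   # hypothetical: 90% of the work can be distributed
for n in (1, 10, 100, 1000):
    print(n, float(amdahl_speedup(p, n)))
```

However many servers are added, the speedup for this $p$ stays below $1/(1 - p) = 10$: the non-distributed 10% of the work becomes the limiting factor.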
A scalable architecture avoids this situation and attempts to balance the load evenly on all the participating nodes. Let us consider the simple case of a server that would carry out 10% more work than the others, due to some special role. This is a source of non-scalability. For small workloads, such a difference is unnoticeable, but eventually it will grow important enough to make the stressed node a bottleneck. However, a node dedicated to some administrative tasks is acceptable, provided that this work is really negligible or does not increase proportionally to the global workload.
Many architectures presented in the rest of this chapter are of the type one Master, many Servers. The Master is a node that handles a few specific tasks (e.g., adding a new server to the cluster or connecting a client) but does not participate in the core functionalities of the application. The servers hold the data set, either via full replication (each item is present on each server) or, more commonly, via sharding: the data set is partitioned, and each subset is stored on one server and replicated on a few others. This Master-Server approach is easier to manage than a cluster where all nodes play an equivalent role, and often remains valid in the long run.
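The difference between sharding and full replication can be sketched as a placement rule. The rule below (hash the key to choose a primary server, then place the copies on the next servers in the list) is a hypothetical scheme chosen for illustration, not one prescribed by the text; `hashlib` is used instead of Python's built-in `hash` so that placements are stable across runs.

```python
import hashlib


def shard_placement(key, servers, replicas=2):
    """Servers holding `key`: a primary chosen by hashing, then `replicas` copies
    on the following servers (hypothetical placement rule)."""
    if replicas + 1 > len(servers):
        raise ValueError("not enough servers for the requested number of copies")
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(servers)
    return [servers[(primary + i) % len(servers)] for i in range(replicas + 1)]


servers = ["s0", "s1", "s2", "s3", "s4"]
print(shard_placement("cart:alice", servers))              # sharding: 3 of the 5 servers
print(shard_placement("cart:alice", servers, replicas=4))  # full replication: all 5
```

Setting `replicas = len(servers) - 1` degenerates into full replication; smaller values give the more common sharded layout, where each server stores only part of the data set.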
Availability:
A task that is partially allocated to a server may become idle if the server crashes or turns out to be unavailable for any reason. In the worst case, it can be delayed until the problem is fixed or the faulty server is replaced by a replica. Availability is the capacity of a system to limit this latency as much as possible (note that this implicitly assumes that the system is already reliable: failures can be detected and repair actions initiated). This involves two different mechanisms: the failure (crash, unavailability, etc.) must be detected as soon as possible, and a quick recovery procedure must be initiated. The process of setting up a protection system to face and quickly fix node failures is usually termed failover.
The first mechanism is handled by periodically monitoring the status of each server (heartbeat). It is typically assigned to the node dedicated to administrative tasks (the Master). Implementing this mechanism in a fully distributed way is more difficult due to the absence of a well-identified manager. Structured P2P networks promote one of the nodes as a Super-peer in order to take charge of this kind of background monitoring. Note that some P2P approaches assume that a node will kindly inform its companions when it needs to leave the network, an assumption (sometimes called "fail-stop") that facilitates the design. This may be possible for some kinds of failures, but is unrealistic in many cases, e.g., for hardware errors.
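A minimal sketch of heartbeat-based detection on the Master: each server reports periodically, and a server silent for longer than a timeout is suspected. The class name, the timeout value and the explicit `now` argument (used instead of reading a clock, so that the behavior is easy to follow) are assumptions. A timeout cannot tell a crashed server from a slow network, so a suspected server is a failover candidate rather than a certain failure.

```python
class HeartbeatMonitor:
    """Master-side failure detector (hypothetical sketch)."""

    def __init__(self, timeout):
        self.timeout = timeout        # seconds of silence tolerated
        self.last_seen = {}           # server -> time of its last heartbeat

    def heartbeat(self, server, now):
        """Record that `server` reported at time `now`."""
        self.last_seen[server] = now

    def suspected(self, now):
        """Servers silent for longer than the timeout: candidates for failover."""
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("s1", now=0.0)
monitor.heartbeat("s2", now=0.0)
monitor.heartbeat("s1", now=2.0)      # s2 stops reporting
print(monitor.suspected(now=4.0))     # s2 has been silent for 4 s, more than 3 s
```

The timeout embodies the trade-off discussed above: a short one detects failures sooner (better availability) but raises more false alarms on transient network problems.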
The second mechanism is achieved through replication (each piece of data is stored on several servers) and redundancy (there should be more than one connection between servers, for instance). Providing failure management at the infrastructure level is not sufficient. As seen above, a service that runs in such an environment must also take care to adopt suitable recovery techniques for preserving the content of its volatile storage.
Efficiency:
How do we estimate the efficiency of a distributed system? Assume an operation that runs in a distributed manner and delivers a set of items as its result. Two usual measures of its efficiency are the response time (or latency), which denotes the delay to obtain the first item, and the throughput (or bandwidth), which denotes the number of items delivered in a given period unit (e.g., a second). These measures are useful to qualify the practical behavior of a system at an analytical level, expressed as a function of the network traffic. The two measures correspond to the following unit costs:
1. the number of messages globally sent by the nodes of the system, regardless of the message size;
2. the size of messages, representing the volume of data exchanged.
The complexity of operations supported by distributed data structures (e.g., searching for a specific key in a distributed index) can be characterized as a function of one of these cost units.
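These measures can be computed from a trace of one operation. In this sketch, the trace format (request time, arrival times of the result items, sizes of the messages exchanged) and the choice of measuring throughput over the whole period since the request are assumptions made for illustration.

```python
def efficiency(start, deliveries, message_sizes):
    """Summarize one distributed operation from a (hypothetical) trace.

    start: time the request was issued; deliveries: arrival times of result items;
    message_sizes: size in bytes of every message exchanged between nodes."""
    latency = min(deliveries) - start          # delay to obtain the first item
    elapsed = max(deliveries) - start
    throughput = len(deliveries) / elapsed     # items per time unit
    return {
        "latency": latency,
        "throughput": throughput,
        "messages": len(message_sizes),        # cost unit 1: number of messages
        "volume": sum(message_sizes),          # cost unit 2: volume of data exchanged
    }


print(efficiency(0.0, [0.5, 1.0, 1.5, 2.0], [120, 4096, 4096, 64]))
```

As the next paragraph stresses, such counts ignore topology, load and heterogeneity; they give rough estimates, not a precise cost model.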
Generally speaking, the analysis of a distributed structure in terms of number of messages is oversimplistic. It ignores the impact of many aspects, including the network topology, the network load and its variation, the possible heterogeneity of the software and hardware components involved in data processing and routing, etc. However, developing a precise cost model that would accurately take into account all these performance factors is a difficult task, and we have to live with rough but robust estimates of the system behavior.
