XML Motor

aXML-Motor
XML Document Parsing

version 2011.11.04
Algorithm
Abhishek Kumar ~=ABK=~
http://github.com/abhishekkr
http://www.twitter.com/abionic

Algorithm, Ruby Source and Gem:
[axml-motor] @GitHub: http://github.com/abhishekkr/axml-motor.git
rubygem's src @GitHub: http://github.com/abhishekkr/rubygem_xml_motor.git
gem install @RubyGems: http://rubygems.org/gems/xml-motor

Algorithm-Walk-through
Example XML Content:
<BODY>
  <DIV id='banner'>
    <H1>aXML-Motor</H1>
    <H5>A new algorithm based compact XML Parser with <I>no dependencies</I>.</H5>
  </DIV>
  <DIV id='details'>
    <SPAN class='github'>@github: <A href='http://github.com/abhishekkr/axml-motor.git'>axml-motor</A></SPAN>
    <DIV class='gem'>
      <SPAN id='source' class='github'>@github: <A href='http://github.com/abhishekkr/rubygem-xml-motor.git'>rubygem-xml-motor</A></SPAN>
      <SPAN class='rubygems'>@rubygems: <A href='http://rubygems.org/gems/xml-motor.git'>xml-motor</A></SPAN>
    </DIV>
    <I>It's a new algorithm implemented to build a real compact parser (v0.0.2 has less than 200 ruby source code lines) without any dependencies.</I>
  </DIV>
</BODY>

[Step.1] Split the XML Content
(1.1) Split by '<' and store the pieces as XMLNodes

[0] BODY>
[1] DIV id='banner'>
[2] H1>aXML-Motor
[3] /H1>
[4] H5>A new algorithm based compact XML Parser with
[5] I>no dependencies
[6] /I>.
[7] /H5>
[8] /DIV>
[9] DIV id='details'>
[10] SPAN class='github'>@github:
[11] A href='http://github.com/abhishekkr/axml-motor.git'>axml-motor
[12] /A>
[13] /SPAN>
[14] DIV class='gem'>
[15] SPAN id='source' class='github'>@github:
[16] A href='http://github.com/abhishekkr/rubygem-xml-motor.git'>rubygem-xml-motor
[17] /A>
[18] /SPAN>
[19] SPAN class='rubygems'>@rubygems:
[20] A href='http://rubygems.org/gems/xml-motor.git'>xml-motor
[21] /A>
[22] /SPAN>
[23] /DIV>
[24] I>It's a new algorithm implemented to build a real compact parser (v0.0.2 has less than 200 ruby source code lines) without any dependencies.
[25] /I>
[26] /DIV>
[27] /BODY>

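Step 1.1 can be sketched in a few lines of Ruby. This is only an illustration of the split described above, not the gem's actual source; the method name split_on_open_bracket and the sample string are made up for the example.

```ruby
# Hypothetical helper (not part of the gem's API): step 1.1 as String#split.
def split_on_open_bracket(xml)
  # The piece before the first '<' is empty when the input starts with a tag,
  # so empty pieces are discarded.
  xml.split('<').reject(&:empty?)
end

sample = "<H5>A compact parser with <I>no dependencies</I>.</H5>"
chunks = split_on_open_bracket(sample)

# Print the chunks in the same "[index] content" layout used above.
chunks.each_with_index { |chunk, index| puts "[#{index}] #{chunk}" }
```

Every chunk still carries its closing '>', which is what step 1.2 splits on next.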
(1.2) Split each result of step 1.1 by '>' and update XMLNodes
[0] ['BODY', '']
[1] ['DIV id=\'banner\'', '']
[2] ['H1', 'aXML-Motor']
[3] ['/H1', '']
[4] ['H5', 'A new algorithm based compact XML Parser with ']
[5] ['I', 'no dependencies']
[6] ['/I', '.']
[7] ['/H5', '']
[8] ['/DIV', '']
[9] ['DIV id=\'details\'', '']
[10] ['SPAN class=\'github\'', '@github: ']
[11] ['A href=\'http://github.com/abhishekkr/axml-motor.git\'', 'axml-motor']
[12] ['/A', '']
[13] ['/SPAN', '']
[14] ['DIV class=\'gem\'', '']
[15] ['SPAN id=\'source\' class=\'github\'', '@github: ']
[16] ['A href=\'http://github.com/abhishekkr/rubygem-xml-motor.git\'', 'rubygem-xml-motor']
[17] ['/A', '']
[18] ['/SPAN', '']
[19] ['SPAN class=\'rubygems\'', '@rubygems: ']
[20] ['A href=\'http://rubygems.org/gems/xml-motor.git\'', 'xml-motor']
[21] ['/A', '']
[22] ['/SPAN', '']
[23] ['/DIV', '']
[24] ['I', 'It\'s a new algorithm implemented to build a real compact parser (v0.0.2 has less than 200 ruby source code lines) without any dependencies.']
[25] ['/I', '']
[26] ['/DIV', '']
[27] ['/BODY', '']

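A minimal Ruby sketch of step 1.2 follows. The method name split_on_close_bracket is hypothetical, and limiting the split to two parts (so that a '>' inside the text would stay with the text) is my assumption rather than something the walk-through specifies.

```ruby
# Hypothetical helper (not the gem's API): step 1.2, one [tag, text] pair per chunk.
def split_on_close_bracket(chunks)
  chunks.map do |chunk|
    # Assumption: split at the first '>' only, so the text part is kept whole.
    tag, text = chunk.split('>', 2)
    [tag, text.to_s] # to_s guards against a chunk that has no '>' at all
  end
end

chunks = ["H5>A compact parser with ", "I>no dependencies", "/I>.", "/H5>"]
pairs = split_on_close_bracket(chunks)
pairs.each_with_index { |pair, index| puts "[#{index}] #{pair.inspect}" }
```

Closing tags end up with an empty text part unless, as with '/I' here, some text follows them before the next tag.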
(1.3) Split the first element of each entry by space/tab and mark the first part as tag_name. Split the remaining part by '=', iterating to make one key=value pair per attribute, and update XMLNodes with the result.
[0] [ ['BODY', {}], '' ]
[1] [ ['DIV', {'id'=>'banner'}], '' ]
[2] [ ['H1', {}], 'aXML-Motor' ]
[3] [ ['/H1', {}], '' ]
[4] [ ['H5', {}], 'A new algorithm based compact XML Parser with ' ]
[5] [ ['I', {}], 'no dependencies' ]
[6] [ ['/I', {}], '.' ]
[7] [ ['/H5', {}], '' ]
[8] [ ['/DIV', {}], '' ]
[9] [ ['DIV', {'id'=>'details'}], '' ]
[10] [ ['SPAN', {'class'=>'github'}], '@github: ' ]
[11] [ ['A', {'href'=>'http://github.com/abhishekkr/axml-motor.git'}], 'axml-motor' ]
[12] [ ['/A', {}], '' ]
[13] [ ['/SPAN', {}], '' ]
[14] [ ['DIV', {'class'=>'gem'}], '' ]
[15] [ ['SPAN', {'id'=>'source', 'class'=>'github'}], '@github: ' ]
[16] [ ['A', {'href'=>'http://github.com/abhishekkr/rubygem-xml-motor.git'}], 'rubygem-xml-motor' ]
[17] [ ['/A', {}], '' ]
[18] [ ['/SPAN', {}], '' ]
[19] [ ['SPAN', {'class'=>'rubygems'}], '@rubygems: ' ]
[20] [ ['A', {'href'=>'http://rubygems.org/gems/xml-motor.git'}], 'xml-motor' ]
[21] [ ['/A', {}], '' ]
[22] [ ['/SPAN', {}], '' ]
[23] [ ['/DIV', {}], '' ]
[24] [ ['I', {}], 'It\'s a new algorithm implemented to build a real compact parser (v0.0.2 has less than 200 ruby source code lines) without any dependencies.' ]
[25] [ ['/I', {}], '' ]
[26] [ ['/DIV', {}], '' ]
[27] [ ['/BODY', {}], '' ]

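Step 1.3 could look roughly like the Ruby sketch below. Note one deliberate swap: instead of splitting the attribute part on '=' as described, it scans for key='value' pairs with a regular expression, which keeps values containing '=' or spaces intact. The method name parse_tag is hypothetical, and the sketch assumes every attribute value is quoted.

```ruby
# Hypothetical helper (not the gem's API): step 1.3 for a single raw tag string.
def parse_tag(raw)
  # The first whitespace-separated part is the tag name; the rest holds attributes.
  name, rest = raw.strip.split(/\s+/, 2)
  attributes = {}
  # Assumption: values are always wrapped in matching single or double quotes.
  rest.to_s.scan(/([^\s=]+)\s*=\s*(['"])(.*?)\2/) do |key, _quote, value|
    attributes[key] = value
  end
  [name, attributes]
end

p parse_tag("SPAN id='source' class='github'")
p parse_tag("/H1")
```

Applying parse_tag to the first element of every [tag, text] pair from step 1.2 yields entries of the [ [tag_name, attributes], text ] shape listed above.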
Here, we have the XMLNodes as we wanted them. Now it is time to index them.

[Step.2] Index the processed XMLNodes

There are three things involved in indexing XMLNodes:
Tag_Name: Iterating through all elements of XMLNodes, every element has three components (tag name, attributes and text); the tag name is available at XMLNodes.all[ [TAG_NAMES, *], *]
Depth: The level of the node in the XML node tree, starting from '0'.
Index: The position of the node in the XMLNodes array.

How to Index-ify?
There will be one element per Tag_Name, holding a Hash whose keys are the 'Depth' values at which that tag is found. Each key holds an array of 2*number_of_nodes entries (the starting and ending 'Index' of each node at that depth).

Example: From the above XMLNodes, ['DIV'] would hold {1=>[1,8, 9,26], 2=>[14,23]}, because 'Tag_Name' DIV has the 'Index' sets 1,8 and 9,26 at 'Depth' 1, and the 'Index' set 14,23 at 'Depth' 2.

Indexed XMLTags for the above processed XMLNodes will be as follows:

calculated XMLTags
['BODY'] = {0=>[0,27]}
['DIV'] = {1=>[1,8, 9,26], 2=>[14,23]}
['H1'] = {2=>[2,3]}
['H5'] = {2=>[4,7]}
['I'] = {3=>[5,6], 2=>[24,25]}
['SPAN'] = {2=>[10,13], 3=>[15,18, 19,22]}
['A'] = {3=>[11,12], 4=>[16,17, 20,21]}

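The indexing pass can be sketched in Ruby with a simple depth counter. This is an illustration under assumptions, not the gem's code: the method name index_tags is hypothetical, the input is assumed to be well formed, and self-closing tags are not handled. Only tag names matter for the index, so the sample below leaves attributes and text empty.

```ruby
# Hypothetical helper (not the gem's API): build XMLTags from step 1.3 style nodes.
def index_tags(xml_nodes)
  # xml_tags[tag_name][depth] is a flat array of start/end index pairs.
  xml_tags = Hash.new { |tags, name| tags[name] = Hash.new { |depths, depth| depths[depth] = [] } }
  depth = 0
  xml_nodes.each_with_index do |((name, _attributes), _text), index|
    if name.start_with?('/')
      depth -= 1 # a closing tag is recorded at its opening tag's depth
      xml_tags[name[1..-1]][depth] << index
    else
      xml_tags[name][depth] << index
      depth += 1 # anything that follows is one level deeper
    end
  end
  xml_tags
end

# Tag names of the 28 XMLNodes from the walk-through, in order.
names = %w[BODY DIV H1 /H1 H5 I /I /H5 /DIV DIV SPAN A /A /SPAN
           DIV SPAN A /A /SPAN SPAN A /A /SPAN /DIV I /I /DIV /BODY]
xml_tags = index_tags(names.map { |name| [[name, {}], ''] })
xml_tags.each { |name, depths| puts "['#{name}'] = #{depths.inspect}" }
```

Because a tag cannot be a sibling of itself before it closes, the indexes stored under one depth always alternate between a start and its matching end.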
[Step.3] Grab My Node from processed XMLNodes using XMLTags
Now suppose I aim for a Tag_Name 'XYZ'. Look up XMLTags['XYZ'], iterate through all of its depths and extract two indexes at a time. Each pair of indexes marks the start and end node; fetch all values between those nodes from XMLNodes. This returns the set of values held by Tag_Name 'XYZ'.

Suppose a tree form such as 'ABC.XYZ' is provided. Start from the top node, 'ABC' in this context, and grab all of its Index-Ranges. Then move on to the lower nodes and keep only the Indexes that fall within the Index-Ranges provided by the earlier node. This ends with the filtered set of Indexes for 'XYZ' that fall under an Index-Range of 'ABC'.

To check for a Tag_Name with an attribute, for every filtered Index-Range, just check whether its starting node has the required attribute as one of its key-value pairs.
Example:

Case: Grabbing 'SPAN' with attribute class='github'
It's a single node; grab all its Index-Ranges: (10,13), (15,18) and (19,22).
Here, just XMLNodes[10] and XMLNodes[15] have the required attribute.
Now, grab all data between XMLNodes[10][1] to XMLNodes[13-1][1] and XMLNodes[15][1] to XMLNodes[18-1][1].
Result: ["@github: <A href='http://github.com/abhishekkr/axml-motor.git'>axml-motor</A>",
"@github: <A href='http://github.com/abhishekkr/rubygem-xml-motor.git'>rubygem-xml-motor</A>"]

Case: Grabbing 'H5.I'
Top node is 'H5'; grab all its Index-Ranges: (4,7).
Second node is 'I'; grab all its ranges falling within the ranges from the previous node: (5,6).
Now, grab all data between XMLNodes[5][1] to XMLNodes[6-1][1].
Result: ['no dependencies']

Below, you'll also see that you need not give the entire hierarchy to fetch a descendant from the child tree of any node. Giving just the major scope nodes works as well as providing the exact hierarchy.

Case: Grabbing 'DIV.A'
Top node is 'DIV'; grab all its Index-Ranges: (1,8), (9,26) and (14,23).
Second node is 'A'; grab all its ranges falling within the ranges from the previous node: (11,12), (16,17) and (20,21).
Now, grab all data between XMLNodes[11][1] to XMLNodes[12-1][1], XMLNodes[16][1] to XMLNodes[17-1][1] and XMLNodes[20][1] to XMLNodes[21-1][1].
Result: ['axml-motor', 'rubygem-xml-motor', 'xml-motor']
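
The lookup in Step 3 can be sketched in Ruby as below. It is a rough illustration under assumptions, not the gem's implementation: the method names (ranges_for, find_ranges, content_of, grab) are hypothetical, and the way inner tags are turned back into markup (single-quoted attributes) is my guess at how the results shown above are produced. The data is a hand-built slice of the walk-through's DIV id='details' subtree.

```ruby
# All [start, stop] index pairs recorded for a tag, across every depth.
def ranges_for(xml_tags, tag)
  xml_tags.fetch(tag, {}).values.flat_map { |indexes| indexes.each_slice(2).to_a }
end

# Resolve a dotted path such as 'DIV.A': keep only those ranges of each lower
# tag that lie strictly inside a range kept for the tag before it.
def find_ranges(xml_tags, path)
  tags = path.split('.')
  scope = ranges_for(xml_tags, tags.first)
  tags.drop(1).each do |tag|
    scope = ranges_for(xml_tags, tag).select do |start, stop|
      scope.any? { |outer_start, outer_stop| outer_start < start && stop < outer_stop }
    end
  end
  scope
end

# Text of the start node plus every node up to, but excluding, the end node.
# Assumption: inner tags are rebuilt as markup with single-quoted attributes.
def content_of(xml_nodes, start, stop)
  inner = xml_nodes[(start + 1)...stop].map do |(name, attributes), text|
    attribute_text = attributes.map { |key, value| " #{key}='#{value}'" }.join
    "<#{name}#{attribute_text}>#{text}"
  end
  xml_nodes[start][1] + inner.join
end

# Grab the content for a path, optionally requiring attributes on the start node.
def grab(xml_nodes, xml_tags, path, attribute = {})
  find_ranges(xml_tags, path)
    .select { |start, _stop| attribute.all? { |key, value| xml_nodes[start][0][1][key] == value } }
    .map { |start, stop| content_of(xml_nodes, start, stop) }
end

xml_nodes = [
  [['DIV', { 'id' => 'details' }], ''],
  [['SPAN', { 'class' => 'github' }], '@github: '],
  [['A', { 'href' => 'http://github.com/abhishekkr/axml-motor.git' }], 'axml-motor'],
  [['/A', {}], ''],
  [['/SPAN', {}], ''],
  [['SPAN', { 'class' => 'rubygems' }], '@rubygems: '],
  [['A', { 'href' => 'http://rubygems.org/gems/xml-motor.git' }], 'xml-motor'],
  [['/A', {}], ''],
  [['/SPAN', {}], ''],
  [['/DIV', {}], '']
]
xml_tags = {
  'DIV' => { 0 => [0, 9] },
  'SPAN' => { 1 => [1, 4, 5, 8] },
  'A' => { 2 => [2, 3, 6, 7] }
}

p grab(xml_nodes, xml_tags, 'DIV.A')
p grab(xml_nodes, xml_tags, 'SPAN', { 'class' => 'github' })
```

Passing the nodes and the index in as arguments keeps the sketch free of global state; a real parser would more likely hold both in one object.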
