12 Sep 2006
This tutorial on information modeling is the second in a series of five tutorials that can
help you prepare for the IBM™ certification Test 142, XML and Related
Technologies. This tutorial analyzes XML data, contrasts narrative documents with
record-like documents, and models a small data problem using Document Type
Definition (DTD) grammar and several iterations of the World Wide Web Consortium
(W3C) XML Schema. It finishes with a comparison of DTD and XML Schema to help
you choose one or the other in your design.
Information modeling
© Copyright IBM Corporation 1994, 2006. All rights reserved. Page 1 of 33
developerWorks® ibm.com/developerWorks
associated with information modeling, XML processing, XML rendering, and Web
services; has a thorough knowledge of core XML-related W3C recommendations;
and is familiar with well-known best practices.
Objectives
After completing this tutorial, you will know how to:
• Analyze data and documents
• Represent structure in XML syntax
• Use namespaces appropriately
• Define DTDs
• Define grammars using XML Schema
• Determine when to use a DTD versus an XML Schema
Prerequisites
This tutorial is written for developers who have a background in programming and
scripting and who have an understanding of basic computer-science models and
data structures. You should be familiar with the following XML-related,
computer-science concepts: tree traversal, recursion, and reuse of data. You should
be familiar with Internet standards and concepts, such as Web browser,
client-server, documenting, formatting, e-commerce, and Web applications.
Experience designing and implementing Java™-based computer applications and
working with relational databases is also recommended.
System requirements
To complete the steps as shown in this tutorial you will need an up-to-date browser
and a validating XML editor. XMLSpy was the XML editor used in this tutorial. See
Narrative examples
A narrative-style XML grammar can define documents that render into printed
matter. Extensible HTML (XHTML), an XML grammar for the Web, and DocBook, a
markup for technical publications, are examples of narrative markup grammars. See
Listing 1 for a simple example of a DocBook document.
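To make the narrative style concrete, here is a minimal sketch of a DocBook article. It is not the tutorial's Listing 1: the title and paragraph text are invented for illustration, and the DOCTYPE assumes the DocBook XML V4.4 DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
    "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<!-- Illustrative content only; not the tutorial's Listing 1 -->
<article>
  <title>A short narrative document</title>
  <!-- Mixed content: running text with an inline element inside it -->
  <para>Narrative markup wraps running text and can tag a phrase
    <emphasis>inline</emphasis> without breaking the sentence.</para>
</article>
```

The para element mixes character data with child elements. That kind of mixed content is typical of narrative grammars and rare in record-like ones.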
Narrative documents have many uses in addition to rendering the printed word. For
example, Speech Synthesis Markup Language (SSML) defines documents that
render as synthetic speech audio. VoiceXML (VXML) is a speech-oriented XML
grammar specified by the W3C for bidirectional human voice interaction with a
computer. Listing 2 shows a small example.
A record-like XML grammar is generally stricter than a narrative XML grammar.
Narratives are art for human consumption, while data more closely resembles science
for computational consumption. There is a plethora of standard, industry-oriented
narrative grammars. A standardized, well-known grammar is important when the eventual
readers of a document are unknown. On the other hand, many record-like grammars
are specialized, often existing for only one application.
Record-like examples
If you work with Web applications, you've probably encountered Web application
deployment descriptors and JavaServer Pages™ (JSP) tag library descriptors.
These are good examples of XML used to define data records. A Web application
server needs a structured view of the semantics and locations of deployment
artifacts. The hierarchical nature of an XML-based web.xml document fills the
requirement nicely. Listing 3 shows a deployment descriptor for a simple Spring
Framework Web application.
</web-app>
Listing 4 contains a JSP tag library descriptor used to associate a taglib name
with a Java class that implements the logic of a custom JSP markup tag.
HTML
HTML is a rather loose markup language based on Standard Generalized Markup
Language (SGML). Browsers try to make rendering assumptions about markup
omissions and errors. The result often varies across vendors or releases. HTML
doesn't conform to the simple rules of well-formed XML. See an example of
nonrigorous HTML markup in Listing 5. The document has no associated DTD. The
paragraph tag is not closed. The title tag is in lowercase, but the other tags are
uppercase. Figure 1 shows how the Firefox browser renders the document correctly
despite its sloppy markup.
<HEAD>
<title>XML Tutorial</title>
</HEAD>
<BODY>
<H1>This is a heading</h1>
<P>This is a paragraph.
<P align=center>This is centered</P>
<P><B>This is bold</P></B>
</BODY>
Note that HTML is not XML. An XML document is, by definition, well-formed. If an XML
parser encounters a well-formedness error, the input is not really an XML document,
and it is useless until you repair it.
XHTML
XHTML is a variant of HTML based on the well-formed rigor of XML
markup. Table 1 illustrates that XHTML is partially about what forms of markup are
not allowed.
You must encode an XHTML document in UTF-8 or UTF-16, or prefix it with an XML
declaration that declares the encoding in force. It must also have a DOCTYPE
declaration whose PUBLIC identifier specifies an XHTML DTD. Table 2 shows five public DTDs that describe
Exercise
In this exercise, you will create an XHTML version of the HTML data from Listing 5
and then edit it to render as shown in Figure 1. First, invoke your XML editor, create
a new document, and choose type XHTML. The XHTML should resemble Figure 2.
If you aren't using XMLSpy, you might see some differences. Check that the doctype
is XHTML 1.0 Strict.
Next, edit the boilerplate to incorporate a clean version of the sloppy HTML from
Listing 5. If you supply well-formed, valid XHTML, the validation test will pass, as
shown in Figure 3.
Notice that you have to use a Cascading Style Sheets (CSS) text-align style to center
the middle paragraph, because the XHTML 1.0 Strict DTD doesn't allow the HTML align
attribute on a paragraph. The document would not pass the validation test with
align="center" on the paragraph.
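One possible clean version of Listing 5 is sketched here, assuming the XHTML 1.0 Strict doctype. The boilerplate that your editor generates may differ in details such as the XML declaration or the lang attributes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>XML Tutorial</title>
  </head>
  <body>
    <!-- Lowercase tags, and every element explicitly closed -->
    <h1>This is a heading</h1>
    <p>This is a paragraph.</p>
    <!-- CSS replaces the align attribute, which Strict does not allow -->
    <p style="text-align: center">This is centered</p>
    <!-- Elements close in the reverse order in which they opened -->
    <p><b>This is bold</b></p>
  </body>
</html>
```

Each change corresponds to one of the sloppy spots in Listing 5: mixed-case tags, the unclosed paragraph, the align attribute, and the improperly nested bold element.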
To test rendering, save the XHTML file and view it in a browser, or click the Browser
tab in XMLSpy. Figure 4 shows the result. Notice that it renders identically to Figure 1.
Unlike the sloppy HTML example, a browser that processes the XHTML file as XML refuses
to render it if it's not well-formed, and a validating tool such as XMLSpy rejects it if
it's not valid. A single grammar error invalidates the entire document.
Browsers must make guesses to render sloppy markup in HTML files. There is no
standard for these assumptions, so the HTML might render unwanted visual artifacts
in one browser but not in another.
The valid file is XML, so every conforming XML parser can parse it, and a browser has
no markup errors to guess about when it renders the file. Moreover, notice that you
used an XML tool to create and edit the file. A real value of XHTML is that you can use
any XML tool or library to work with it.
Some authorities recommend creating a DTD or schema before you ever make a
sample document. I find this approach to be too abstract. Instead, I prefer to borrow
a page from the test-driven methods used in program design: Create a document,
then create a DTD that validates that document. The document is the test, and the
DTD is the application. You'll be able to iterate the design by changing either
document as you proceed. Later, I'll show you how to discard the DTD in favor of
creating an XML Schema for the test document, because this tutorial is about how to
model and constrain data in XML by using either approach.
You still need that test document. Your task is to model a catalog of published
books, but you might want to extend the catalog to other kinds of publications in the
future. You can think of the catalog as a list of publications. Every XML document is
a single-rooted hierarchy, so you can use the word "publications" as the root
element. See Listing 6.
Listing 6. Beginnings
The publications catalog contains zero or more books. It seems reasonable that a
book should be a child element of publications, as shown in Listing 7.
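At this stage the test document might look like the following sketch. It assumes nothing beyond the two element names introduced so far, and the number of book elements is arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<publications>
  <!-- Zero or more book children; their content is modeled in later steps -->
  <book/>
  <book/>
</publications>
```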
Each book has several common attributes: title, author, copyright, and ISBN
number. These items are attributes of a book, but do you model them as actual XML
attributes, or as XML elements? Review some capability differences between
attributes and elements, as shown in Table 3.
The title, author, copyright, and ISBN number seem to be immediate children of a
book. Will they eventually need children of their own? You're not sure at this point,
but you don't want to prevent that kind of extension for no good reason. This is a
point in favor of modeling them as elements.
The items probably need no ordering within a book as long as an application can
parse them by name, but ordering seems -- well -- more orderly. You can argue
each way with respect to ordering. Award no points to either side here.
Some of the items seem to be simple strings, but copyright is really a four-digit
number that you might enforce in a future version of a schema for your publications
catalog. In addition, you might later impose a formatting pattern on an ISBN. So
award a solid point to modeling as elements here.
If an XML binding technology, such as Java™ Architecture for XML Binding (JAXB),
is part of an architecture, then consider that elements translate into classes, while
attributes become properties of those classes. Thus, the number of classes is
proportional to the number of kinds of elements. This could mean more sizeable and
possibly more complex source code. However, the binding tool generates this code.
The source document is really the schema. Maintainers normally don't modify the
classes manually. Thus, XML binding might not be a factor in the elements versus
attribute decision. Award no points to either side for JAXB or XML binding in general.
The argument is currently two to zero in favor of elements over attributes -- for this
problem only. Verbosity and readability can be matters of personal taste or part of the
design requirements, so you must evaluate your actual design task yourself.
Some schemas allow using either an attribute or an element in a given place. Both
Apache Ant and DocBook documents allow this behavior in places.
In this tutorial, use elements for title, author, copyright, and ISBN number, with
lowercase letters for isbn. You can reserve the option to add optional or required
attributes to the book element, such as image, used to embed an optional
picture of a book, and id, used to impose a unique identifier on a book for use as a
reference key by applications. Listing 8 shows the test document at this point. It has
no DTD or XML Schema. That's your next job. This tutorial shows you how to do one
of each.
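A test document along these lines might look like the following sketch. The book data, the id values, and the image path are made-up placeholders, not the contents of the tutorial's own listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Placeholder data for illustration only -->
<publications>
  <!-- id is the reference key; image is optional -->
  <book id="b0001" image="covers/b0001.png">
    <title>Sample Title One</title>
    <author>First Author, Second Author</author>
    <copyright>2005</copyright>
    <isbn>0-1234-5678-9</isbn>
  </book>
  <book id="b0002">
    <title>Sample Title Two</title>
    <author>Third Author</author>
    <copyright>2006</copyright>
    <isbn>0-2345-6789-0</isbn>
  </book>
</publications>
```

The document is well-formed, but nothing yet constrains it: a book with a missing title or a misspelled element name would parse just as happily.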
You can impose a DTD-based grammar on the test document by describing the
document in DTD-speak, an older markup language borrowed from SGML. First,
what does it mean to impose a grammar?
An XML grammar applies at a different point. A user could alter an XML document
with Microsoft® Notepad, an application ignorant of XML or its grammar documents.
An XML grammar applies when a parser reads, or recognizes, the XML document.
Thus, an XML grammar is about reading valid information, not writing it. A
well-formed document meets the requirements of XML markup, but it can still be invalid
according to the associated grammar. Validation is a go/no-go vote during parsing: a
validity miss renders the document useless. There is no carrying on, as
browsers do with sloppy HTML.
Defining a DTD
A DTD largely consists of <!ELEMENT ... > and <!ATTLIST ... > markup
statements.
asterisk (*) for "zero or more," a plus sign (+) for "one or more," a question mark (?)
for "one or none," or no suffix for exactly one book. These suffixes come from
regular-expression notation. The catalog can hold zero or more books, so the first
line of the DTD looks like this: <!ELEMENT publications (book*)>
Each book contains exactly one title, author, copyright, and isbn element,
in that order. (For the purposes of this tutorial, multiple authors are entered in the
single author element. When you design your own DTD, consider whether an element such
as author should be allowed to occur one or more times.) Thus, the next markup
statement is: <!ELEMENT book (title, author, copyright, isbn)>
The remaining elements are leaf-node elements that contain character data. You
use parentheses to indicate containment, as usual. You need to declare the kind of
character data. The character strings are parsed character data, indicated by the
literal, #PCDATA:
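Putting the element declarations together gives a grammar like the following sketch. To keep the example self-contained, the DTD is embedded as an internal subset rather than kept in a separate file, and the book data is a made-up placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE publications [
  <!-- Root: zero or more book children -->
  <!ELEMENT publications (book*)>
  <!-- Each book: exactly one of each child, in this order -->
  <!ELEMENT book (title, author, copyright, isbn)>
  <!-- Leaf elements hold parsed character data -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT copyright (#PCDATA)>
  <!ELEMENT isbn (#PCDATA)>
]>
<publications>
  <book>
    <title>Sample Title One</title>
    <author>First Author, Second Author</author>
    <copyright>2005</copyright>
    <isbn>0-1234-5678-9</isbn>
  </book>
</publications>
```

Swapping two children, such as isbn and copyright, leaves the document well-formed but makes it invalid, because the content model for book fixes the order.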
You give each book a required unique identification key through the special ID
attribute type. In addition, you can enable an optional image attribute that
contains a URL of a picture of the book cover. The ATTLIST markup takes an
element name followed by a tuple for each attribute associated with the element.
Each tuple consists of an attribute name, its type, and an indicator of whether it's
optional or required. The DTD specification allows the 10 attribute types listed in Table
4.
You indicate a required attribute by appending #REQUIRED after the type. You can
stipulate an optional attribute by appending #IMPLIED. The grammar has one
attribute of each kind on the book element. The single ATTLIST statement looks like
this:
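A single ATTLIST declaration that covers both attributes might look like the one in this sketch. The DTD is again shown as an internal subset so that the example stands alone, and the book data and id values are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE publications [
  <!ELEMENT publications (book*)>
  <!ELEMENT book (title, author, copyright, isbn)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT copyright (#PCDATA)>
  <!ELEMENT isbn (#PCDATA)>
  <!-- One tuple per attribute: name, type, required or optional -->
  <!ATTLIST book
      id    ID    #REQUIRED
      image CDATA #IMPLIED>
]>
<publications>
  <!-- An ID value must be an XML name, so it cannot start with a digit -->
  <book id="b0001" image="covers/b0001.png">
    <title>Sample Title One</title>
    <author>First Author, Second Author</author>
    <copyright>2005</copyright>
    <isbn>0-1234-5678-9</isbn>
  </book>
  <book id="b0002">
    <title>Sample Title Two</title>
    <author>Third Author</author>
    <copyright>2006</copyright>
    <isbn>0-2345-6789-0</isbn>
  </book>
</publications>
```

Because id has type ID, a validating parser also rejects the document if two books share the same id value.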
How would you associate the DTD with the document it describes? You could embed
the DTD in the XML document as an internal subset. The textbook approach is to keep
the two documents separate. The application could explicitly use the publications.dtd
document to validate the publications2.xml document. Instead, implicitly link the DTD
to the XML document with a document type declaration:
<!DOCTYPE publications SYSTEM "publications.dtd">
Listing 10 shows the XML document linked to the new DTD. It assumes that the
DTD is located in the current directory.
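A linked document of this kind might look like the following sketch. It assumes that publications.dtd sits in the same directory and declares the elements and attributes described above; the book data is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The SYSTEM identifier is resolved relative to this document -->
<!DOCTYPE publications SYSTEM "publications.dtd">
<publications>
  <book id="b0001" image="covers/b0001.png">
    <title>Sample Title One</title>
    <author>First Author, Second Author</author>
    <copyright>2005</copyright>
    <isbn>0-1234-5678-9</isbn>
  </book>
</publications>
```

The name after DOCTYPE must match the root element, which is why it reads publications rather than book.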
What readily available tools could you use to carry out validation testing? You could
use the freely downloadable Altova XMLSpy Home Edition to test the document for
validity. Place the publications.dtd file and the publications2.xml file in the same
directory, open the XML document, and then press F8. Figure 5 shows successful
validation.
What happens when the document is not valid? Remove the required id attribute
from the first book element, then press F7 to check the document for well-formedness.
The status reports yellow. This means that the document is well-formed. Now press
F8. Figure 6 shows the result. The status is red, meaning that the document is invalid.
XMLSpy will also complain that the document is invalid if you try to save it.
You can design an XML Schema to constrain your document to a greater degree
than is possible by using the DTD. For example, an XML Schema grammar can
specify that exactly four apple elements must always be the immediate children of
a basket element. You can define new simple types, building on string types. For
instance, you could require a zipcode element to have a pattern facet of value
"\d\d\d\d\d-\d\d\d\d", so that values such as "95123-4823" are valid, but
"abcde-fghi" or "27703" are invalid.
What is the meaning of the term facet? An XML Schema considers a facet to be an
aspect of possible values for a simple data type. Table 5 shows the XML Schema
facets.
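The basket and zipcode constraints described above can be sketched in a small no-namespace schema. The type name zipPlusFour and the choice of xs:string for apple are assumptions made for this illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Occurrence constraint: exactly four apple children -->
  <xs:element name="basket">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="apple" type="xs:string"
                    minOccurs="4" maxOccurs="4"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- A user-defined simple type: a string restricted by a pattern facet -->
  <xs:simpleType name="zipPlusFour">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d\d\d\d\d-\d\d\d\d"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="zipcode" type="zipPlusFour"/>
</xs:schema>
```

A pattern must match the whole value, so "27703" fails even though it begins with five digits.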
This is a hint at the granularity you have at your disposal, but you'll begin by making
an XML Schema that matches the capability of the DTD. Later, I'll show you how to
tighten it a bit to show the advantage of schema.
To begin, declare the namespace that XML Schema itself uses. Don't use a namespace for
your own grammar until later in the tutorial; for now, only the schema vocabulary uses
one. By convention, you'd use the prefix "xs". You could use any legal prefix,
even "radish", but why obscure convention?
xmlns:xs="http://www.w3.org/2001/XMLSchema"
For now, declare that your own declared elements and attributes are unqualified:
elementFormDefault="unqualified"
attributeFormDefault="unqualified"
The following is the XML Schema root element after you put this together:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="unqualified"
attributeFormDefault="unqualified">
Next, you can begin to specify the publications, as well as the book, title,
author, copyright, publisher, and isbn element declarations. You can
specify the root element as:
<xs:element name="publications">
The publications element is a complex type that contains a sequence of book elements:
<xs:element name="publications">
<xs:complexType>
<xs:sequence>
<xs:element name="book" maxOccurs="unbounded">
The book is a subsequent complex type that contains its own sequence of title,
author, copyright, publisher, and isbn element declarations:
<xs:element name="publications">
<xs:complexType>
<xs:sequence>
<xs:element name="book" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="title"/>
<xs:element name="author"/>
<xs:element name="copyright"/>
<xs:element name="publisher"/>
<xs:element name="isbn"/>
</xs:sequence>
...
Did I forget to have you add the id attribute and the image attribute to the book
element? No, you simply defer those to the end of the complex type enclosed by the
book element.
The rules (grammar) of XML Schema state that you place attributes last in the
complex type enclosed by their element. An attribute is a schema element of the
form <xs:attribute … />.
Thus, you can add the id and image attributes, as shown here:
<xs:element name="publications">
<xs:complexType>
<xs:sequence>
<xs:element name="book" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="title"/>
<xs:element name="author"/>
<xs:element name="copyright"/>
<xs:element name="publisher"/>
<xs:element name="isbn"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string" use="required"/>
<xs:attribute name="image" type="xs:string"/>
</xs:complexType>
The id attribute is required, but the use of the image attribute defaults to optional.
Notice the type, xs:string. You could have specified a user-defined simple type based
on a string. To XML, that is still a string, but to XML Schema, it is a particular kind
of string. I'll say more about user-defined simple types a bit later.
That just about completes your schema. Add closing markup to the open elements,
as shown in Listing 11.
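With every open element closed, the schema built up in the preceding steps might read as follows. This is a sketch assembled from the fragments above rather than a copy of the tutorial's listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="unqualified"
           attributeFormDefault="unqualified">
  <xs:element name="publications">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title"/>
              <xs:element name="author"/>
              <xs:element name="copyright"/>
              <xs:element name="publisher"/>
              <xs:element name="isbn"/>
            </xs:sequence>
            <!-- Attributes come after the content model -->
            <xs:attribute name="id" type="xs:string" use="required"/>
            <xs:attribute name="image" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```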
You can open publications3.xsd in XMLSpy and press F8 to validate it against the
schema for XML Schema, identified by the namespace http://www.w3.org/2001/XMLSchema,
as if it were an XML document -- because it is an XML document.
The schema is only useful when applied to an XML document. How can you
associate a schema to a document? The application could explicitly use the
publications3.xsd document to validate the publications3.xml document. Instead, you
want to implicitly associate the schema with the XML document.
You can modify the document root element to link to the schema through a special
attribute. The schema doesn't use a namespace -- yet. You must add an attribute to
the publications root that shows the parser where to find the no-namespace
schema:
<publications xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="publications3.xsd">
That does it! See the publications3.xml document in Listing 12. It's the same
document contents as that shown in Listing 8, except for the schema association.
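A document associated with the no-namespace schema might look like this sketch. It assumes that publications3.xsd is in the same directory; the book data is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xsi:noNamespaceSchemaLocation is a hint that tells the parser
     where to find a schema that has no target namespace -->
<publications xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:noNamespaceSchemaLocation="publications3.xsd">
  <book id="b0001" image="covers/b0001.png">
    <title>Sample Title One</title>
    <author>First Author, Second Author</author>
    <copyright>2005</copyright>
    <publisher>Sample Publisher</publisher>
    <isbn>0-1234-5678-9</isbn>
  </book>
</publications>
```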
Imposing a namespace
http://rogers60.com/xmltutorial/2
Use pub for the prefix, although you can use any legal prefix that is unique within the
document and schema. With a DTD, pub:book behaves as if it were pubbook: the DTD
treats the prefix as part of the name, so it doesn't necessarily prevent name
collisions. On the other hand, the XML Schema behavior can prevent name collisions, if
properly declared, because the DNS-based URI is unique. Take your next baby step in the
evolution of the schema by giving it a target namespace and binding the prefix pub to it:
xmlns:pub="http://rogers60.com/xmltutorial/2"
You need to specify both attributes to target a namespace and use a default
namespace. The general convention is to enable unprefixed elements to assume the
default namespace, but leave unprefixed attributes out of any namespace:
elementFormDefault="qualified"
attributeFormDefault="unqualified"
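A namespaced version of the schema might look like the following sketch. It assumes a targetNamespace attribute that carries the same URI as the pub prefix; the structure is otherwise unchanged from the no-namespace version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://rogers60.com/xmltutorial/2"
           xmlns:pub="http://rogers60.com/xmltutorial/2"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="publications">
    <xs:complexType>
      <xs:sequence>
        <!-- "qualified" puts locally declared elements such as book
             in the target namespace as well -->
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title"/>
              <xs:element name="author"/>
              <xs:element name="copyright"/>
              <xs:element name="publisher"/>
              <xs:element name="isbn"/>
            </xs:sequence>
            <!-- "unqualified" leaves these local attributes in no namespace -->
            <xs:attribute name="id" type="xs:string" use="required"/>
            <xs:attribute name="image" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```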
You need to perform minor surgery on the XML document to connect it to the
namespace. The surgery is minor because the entire document is in the default
namespace. Instead of specifying a no-namespace schema location attribute for the
root publications element, you specify:
xsi:schemaLocation="http://rogers60.com/xmltutorial/2 publicationsNS4.xsd"
xmlns="http://rogers60.com/xmltutorial/2"
xmlns:pub="http://rogers60.com/xmltutorial/2"
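Assembled into a complete document, the root element might look like this sketch. It assumes that publicationsNS4.xsd declares its attributes locally and unqualified; the xmlns:xsi declaration is needed for the xsi prefix, and the book data is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- schemaLocation pairs a namespace URI with the schema file for it -->
<publications
    xmlns="http://rogers60.com/xmltutorial/2"
    xmlns:pub="http://rogers60.com/xmltutorial/2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://rogers60.com/xmltutorial/2 publicationsNS4.xsd">
  <!-- Unprefixed elements take the default namespace;
       unprefixed attributes are in no namespace -->
  <book id="b0001" image="covers/b0001.png">
    <title>Sample Title One</title>
    <author>First Author, Second Author</author>
    <copyright>2005</copyright>
    <publisher>Sample Publisher</publisher>
    <isbn>0-1234-5678-9</isbn>
  </book>
</publications>
```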
The W3C XML Schema specification supplies a set of simple built-in data types.
Table 6 lists the built-in XML Schema simple types and their descriptions. The test
case used in this tutorial declares its elements without a type attribute, so they
default to xs:anyType, which accepts plain character strings. It explicitly
specifies the string type only in the image and id attributes of the book element.
A data document schema for a complex application can become large and difficult to
maintain unless it is refactored to some normalized form. In your case, you've
declared a schema that declares elements and attributes at the structural point
where you use each. You mixed the document's structure with declarations of the
elements and attributes used to build that structure. This can obscure the clarity of
the schema to a human trying to understand or maintain it. It also reduces potential
reuse of types and makes it difficult to find a type when a maintenance person needs
to change it.
Why not centralize type declarations at one point and then refer to those types in a
separate data structure portion of the schema? If you do this, you can even break a
large schema into separate files that confine themselves to type sections or structure
sections.
Try it. Refactor the schema to separate the declarations from the structure: declare
all elements in a section at the top of the document, followed by the attributes,
followed by the document structure. The structural part refers to an element or
attribute by using a ref attribute whose value is the name of the element or attribute.
Each ref value is a qualified name, so it must use the pub namespace prefix: globally
declared elements and attributes belong to the schema's target namespace, whatever the
elementFormDefault and attributeFormDefault settings say.
This normalized schema layout sometimes makes a huge document easier to read,
because humans can read the declarations separated from the somewhat
less-verbose structure. In addition, this promotes reuse of items.
Listing 15 shows the small schema in this normalized form. It is actually longer than
the original, but it's easier to maintain because of the separation of declaration from
structure, and because of the potential for declaring something once instead of
multiple times.
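A normalized layout along these lines might look like the following sketch. The xs:string types on the leaf declarations are an assumption made for illustration; the structural part uses ref throughout.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://rogers60.com/xmltutorial/2"
           xmlns:pub="http://rogers60.com/xmltutorial/2"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">

  <!-- Element declarations -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string"/>
  <xs:element name="copyright" type="xs:string"/>
  <xs:element name="publisher" type="xs:string"/>
  <xs:element name="isbn" type="xs:string"/>

  <!-- Attribute declarations -->
  <xs:attribute name="id" type="xs:string"/>
  <xs:attribute name="image" type="xs:string"/>

  <!-- Document structure: references only, no new leaf declarations -->
  <xs:element name="publications">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="pub:title"/>
              <xs:element ref="pub:author"/>
              <xs:element ref="pub:copyright"/>
              <xs:element ref="pub:publisher"/>
              <xs:element ref="pub:isbn"/>
            </xs:sequence>
            <xs:attribute ref="pub:id" use="required"/>
            <xs:attribute ref="pub:image" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

One consequence to check in your own validator: the id and image attributes in this sketch are declared globally, so they belong to the target namespace, and a document validated against it would write them with the prefix, as pub:id and pub:image.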
After you link the test document to the schema, it still validates as before, so I won't
repeat the listing here. Now open the schema in XMLSpy and then click the Schema/WSDL
tab to render the diagram, as shown in Figure 7. Notice the connector symbols for
the sequences, the namespace-qualified labels, and the stack of pub:book
elements.
The schema shows a slight error. You want to allow an empty publications list,
but notice that there must be at least one pub:book. An empty publications list
doesn't validate. It's always good to test boundary conditions. You can repair this by
adding a minOccurs attribute to that element:
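The repair can be sketched in a trimmed schema. To keep the example short, it assumes no namespace and reduces book to a single title child; only the minOccurs attribute matters here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="publications">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs defaults to 1; setting it to 0 lets an empty
             publications element validate -->
        <xs:element name="book" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Together, minOccurs="0" and maxOccurs="unbounded" give the same "zero or more" meaning as the asterisk in a DTD content model.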
The W3C XML Schema contains built-in simple types, but part of its attraction lies in
its ability to constrain values to more granular user-defined simple types. You'll
create two simple types -- one for the isbn element and one for the copyright
element. Base each upon xs:string, but apply restrictive patterns to them. First,
tackle the copyright format. Stipulate that it is always a four-digit number. While
there are alternative approaches for this simple restriction (for example, a decimal
with a specified length), you'll use a pattern for the form "dddd" where each "d" is
a decimal digit:
<xs:simpleType name="year">
<xs:restriction base="xs:string">
<xs:pattern value="\d\d\d\d"/>
</xs:restriction>
</xs:simpleType>
Similarly, restrict an ISBN number to have the form "d-dddd-dddd-d" where each "d"
is also a decimal digit. You know that isn't the real definitive format. ISBN recently
changed from 10 digits to 13 digits, because it ran out of numbers. However, this is a
tutorial, and the pattern fits the ISBN numbers in the test document:
<xs:simpleType name="isbn">
<xs:restriction base="xs:string">
<xs:pattern value="\d-\d\d\d\d-\d\d\d\d-\d"/>
</xs:restriction>
</xs:simpleType>
Insert these two XML stanzas above the element and attribute declarations of the
normalized schema. Then you can refer to the new types anywhere you need them
by using a type attribute that names the new type, for example
<xs:element name="copyright" type="pub:year"/> and <xs:element name="isbn" type="pub:isbn"/>.
You use the namespace prefix in the type value because named types, like all global
declarations, belong to the schema's target namespace, which the schema binds to the
pub prefix. Listing 16 shows the latest revision
of the XML Schema that uses the simple types. I won't display the XML document
listing again here, because it doesn't vary, except to target the latest name of the
schema.
</xs:sequence>
<xs:attribute ref="pub:id" use="required"/>
<xs:attribute ref="pub:image" use="optional"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The test document still validates when you test it using XMLSpy. Now remove the
first dash in the first isbn element and revalidate. This should cause it to flunk
validation. Figure 8 shows what happens.
You can use a divide-and-conquer technique to make your schema even easier to
understand and maintain. Borrow a technique from programming and break the
schema into declarations that reside in separate files. First, create a schema that
constrains only the simple types, elements, and attribute declarations of your
previously refactored schema. Listing 17 shows how.
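A declarations-only schema file of this kind might look like the following sketch. The xs:string types on title, author, and publisher are assumptions made for illustration; the two simple types are the ones defined earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Declarations only: simple types, elements, and attributes -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://rogers60.com/xmltutorial/2"
           xmlns:pub="http://rogers60.com/xmltutorial/2"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">

  <!-- Simple types -->
  <xs:simpleType name="year">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d\d\d\d"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="isbn">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d-\d\d\d\d-\d\d\d\d-\d"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Element declarations -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string"/>
  <xs:element name="copyright" type="pub:year"/>
  <xs:element name="publisher" type="xs:string"/>
  <xs:element name="isbn" type="pub:isbn"/>

  <!-- Attribute declarations -->
  <xs:attribute name="id" type="xs:string"/>
  <xs:attribute name="image" type="xs:string"/>
</xs:schema>
```

An included file has to declare the same target namespace as the schema that includes it, or no target namespace at all, which is why the root attributes repeat here.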
Next, remove those items from a copy of the original schema, replacing them with
the following markup:
<xs:include schemaLocation="publicationsRedefine8.xsd"/>
This produces the structural schema shown in Listing 18. Its ref attributes refer to
the included schema. Notice how each file is easier to read. One is about element,
attribute, and type declaration. The other is about arranging those into a document
structure. When you alter the linkage in the test XML file to point to
publications8.xsd, the file validates correctly in XMLSpy.
<xs:element ref="pub:title"/>
<xs:element ref="pub:author"/>
<xs:element ref="pub:copyright"/>
<xs:element ref="pub:publisher"/>
<xs:element ref="pub:isbn"/>
</xs:sequence>
<xs:attribute ref="pub:id" use="required"/>
<xs:attribute ref="pub:image" use="optional"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Table 7 lists the basic validation features of DTDs. There is little granularity in the
control of format and types of element values and attribute values. This is usually
sufficient for narrative-style documents. Indeed, an astounding number of standard
DTDs are available for various industry-oriented narrative exchange documents.
Data record-like documents form the other major division of XML applications.
Object serialization to and from XML requires precise specification of content. Here
is where the W3C XML Schema shines. Table 8 contains a high-level description of
its constraint features. Notice that the XML Schema shows some overlap with DTD,
but XML Schema is able to compose new data types for a grammar. The overlap
features are misleading. XML Schema enables more precise control of items such
as element occurrence. You can stipulate that your publication list consist of 10 and
only 10 books, for example, by setting minOccurs and maxOccurs. A DTD can express
that only by repeating the book element 10 times in the content model.
It sounds as if XML Schema always wins over DTD for defining new data-oriented
grammars, but DTD can do one thing better than XML Schema. Remember entities?
Those are the macro-like declarations that can substitute named items into a
document. You can define them easily in a DTD. That functionality is difficult to
duplicate in XML Schema. General entities see common use in narrative grammars,
where DTD use remains entrenched.
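As a reminder of how little markup an entity takes, here is a minimal sketch. The note element and the dw entity name are hypothetical, chosen only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!-- A general entity: a named piece of replacement text -->
  <!ENTITY dw "developerWorks">
]>
<!-- The parser substitutes the replacement text for each &dw; reference -->
<note>This tutorial was published on &dw;.</note>
```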
XML Schema is another application of XML. In fact, it is constrained by its own XML
Schema. DTD is not XML, but rather a separate markup language. Some people
consider this a disadvantage of DTDs. Others point out that XML Schema is wordy
and can be hard to read. You've seen that normalizing a schema into sections
mitigates this somewhat. Schemas can be harder to write from scratch than DTDs.
Modern tools provide hinting assistance as you type, thus countering this argument
somewhat.
It's not valid to say, "Always design with schema." In the end, you must make the
decision based on your application, but now you have some arguments on either
side to guide you.
Section 7. Conclusion
Summary
Part 1 of this series covered XML architecture. This second tutorial discussed the
characteristics of data and narrative documents. It went on to model a simple case
study in XML, while showing several iterations of a grammar. Part 3 shows you how
to process XML in an application. Part 4 concentrates on transforming XML
documents into new documents, and Part 5 explains testing and tuning XML and
common related technologies.
If you study the complete series, you should have sufficient background to help you
prepare to take the IBM certification Test 142, XML and Related Technologies, to
attain the IBM Certified Solution Developer - XML and Related Technologies
certification.
Resources
Learn
• XML on developerWorks: Get the resources you need to advance your XML
skills with technical articles and tips, tutorials, standards, and IBM Redbooks.
• New to XML page (developerWorks): Browse this overview if you want to learn
about XML but don't know where to start.
• IBM XML 1.1 certification: Become an IBM Certified Developer in XML 1.1 and
related technologies.
• Introduction to XML tutorial (Doug Tidwell, developerWorks, August 2002):
Learn what XML is, why it was developed, and how it's shaping the future of
electronic commerce. You'll also cover a variety of important XML programming
interfaces and standards.
• XML Matters: Comparing W3C XML Schemas and Document Type Definitions
(DTDs) (David Mertz, developerWorks, March 2001): Compare schemas and
DTDs and clarify just what is going on in the XML schema world.
• Validating XML tutorial (Nicholas Chase, developerWorks, August 2003):
Learn what validation is and how to check a document against a Document
Type Definition (DTD) or XML Schema document.
• XML in a Nutshell, 3rd Edition (Elliotte Rusty Harold and W. Scott Means,
O'Reilly Media, 2004, ISBN: 0596007647): Check out this comprehensive XML
reference with everything from fundamental syntax rules, DTD and XML
Schema creation, XSLT transformations, processing APIs, XML 1.1, plus SAX2
and DOM Level 3.
• XML Schema Part 0: Primer Second Edition on the W3C Web site: Read about
the XML Schema and how to create schemas using the XML Schema
language.
• W3C Markup Validation Service: With this free service, check Web documents
in formats like HTML and XHTML for conformance to W3C Recommendations
and other standards.
• XHTML: Learn more about the Extensible HyperText Markup Language
(XHTML) on the Wikipedia Web site.
• VoiceXML (VXML): Read more about this XML format for interactive voice
dialogues between humans and computers on the Wikipedia Web site.
• Speech Synthesis Markup Language (SSML): Find out more about this
XML-based markup language for speech synthesis apps on the Wikipedia Web
site.
• developerWorks technical events and webcasts: Stay current with technology in
these sessions.
Get products and technologies
Trademarks
IBM, DB2, Lotus, Rational, Tivoli, and WebSphere are trademarks of IBM
Corporation in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.