Part 1: Architecture
Learn where and when to use XML in system design
29 Aug 2006
Architecture
© Copyright IBM Corporation 1994, 2006. All rights reserved. Page 1 of 17
developerWorks® ibm.com/developerWorks
Anyone working in software development for the last few years is aware that XML
provides cross-platform capabilities for data, just as the Java™ programming
language does for application logic. This series of tutorials is for anyone who wants
to go beyond the basics of using XML technologies.
This tutorial lays the groundwork for Part 2, which focuses on information modeling,
including the use of namespaces and the definition of Document Type Definition
(DTD) schemas.
This tutorial is written for Java programmers who have a basic understanding of
XML and whose skills and experience are at a beginning to intermediate level. You
should have a general familiarity with defining, validating, and reading XML
documents and a working knowledge of the Java language.
Objectives
After completing this tutorial, you will know how to:
• Implement Java classes using Java Architecture for XML Binding (JAXB)
Prerequisites
This tutorial is written for developers who have a background in programming and
scripting and who have an understanding of basic computer-science models and
System requirements
You need a system with an up-to-date browser.
Uses of XML abound, such as Asynchronous JavaScript and XML (Ajax) for
dynamic Web pages, and Rich Site Summary (RSS) for blogs and feeds. The future
will bring even more. This series focuses on the core technologies, including Simple
API for XML (SAX), Document Object Model (DOM), DTD, XML Schema, XPath,
XLink, and XQuery.
You can find an alphabet soup of acronyms out there. Just read technical articles
and you'll find XDI, RDF, REST, SVG, XUL, and many more. That's to be expected,
as XML is not just a hot topic, it's the über-hot topic. Why all the hype? The main
reason is that XML offers cross-platform, cross-language capabilities for data, just as
Java offers cross-platform support for application logic. Take a look at some uses of
XML that have hit the world market recently:
These uses are depicted in Figure 1 and Figure 2, which show how you can
integrate XML technologies in an application architecture for e-business and the
dynamic Web, respectively.
To benefit from any of these uses, you need grounding in XML technologies, which
is what this tutorial series provides.
If you've ever received a support call late at night for a system with a
less-than-optimal architecture, you know how important it is to make wise choices in
the technologies you use. Architecture has several aspects, including physical
and logical ones. Figure 3 shows an example of a physical architecture.
Definitions of system architecture abound. For the purposes of this tutorial,
let's view software system architecture as:
A particular technology can help certain areas in an architecture and not help others.
In the example system from Figure 3, XML could play a role in multiple areas:
• Browser
You can render Web pages using XML content and related XSL
stylesheets. XSLT supports this capability as well as conversion to many
different formats.
• Client request
An XMLHttpRequest is at the heart of Ajax.
• Server reply
When an XMLHttpRequest comes back to you, the response contents
can be in XML. But even if they aren't, the browser will use the DOM to
manipulate the Web page. As you'll see in Part 3 of this tutorial series, the
DOM is built from XML.
• Web services
SOAP is an XML-based protocol for exchanging information through
HTTP (in other words, over the Web). Its primary use is to request Web
services remotely. It is a successor to XML-RPC (XML Remote Procedure
Call).
• Reporting
Besides rendering for Web browsers, PDAs, and other devices, you can
render XML for reports. XSLT is useful not only for rendering Web page
content but also for rendering reports in multiple formats.
• Database
This isn't your dad's database anymore. Not wanting to be left out of the
XML opportunities, both IBM® and Oracle have come out with native XML
databases that store XML document structures and support XQuery. The
third installment of this series will cover this in more detail, but for now
keep in mind that XML is plain-text at heart, so you can store it in flat files
and databases even if you don't have an XML-aware database.
BIRT
Business Intelligence and Reporting Tools (BIRT) is an open source
Eclipse-based framework written in Java that supports the design of
reports with output to HTML and PDF. The report designs are
stored on disk as XML .rptdesign files (see Resources).
This is just one example architecture. Kevin Dick's book, XML: A Manager's Guide
(p. 216; see Resources), lists five different enterprise applications that receive
significant benefits from the use of XML:
1. Workforce automation
2. Knowledge management
4. Application integration
5. Data integration
The point is that XML can be used in many different domains, including yours.
OK, now that you have some ideas of where XML can play a part, how do you
choose which technologies and which locations in your system to actually use it? I'll
touch on a number of considerations in this part of the tutorial, so read on.
In addition, some products and frameworks use XML for configuration files. For
example, Struts uses a struts-config.xml file to define how the controlling servlet
should work; Web applications use web.xml files to define how to deploy the
application for running on a server. More peripheral uses of XML are appearing all
the time. Your applications can certainly make good use of these capabilities.
I'll focus instead on the more core, integrated use of XML technologies with your
applications. Table 1 lists some characteristics of applications, and gives advice on
when XML technologies can play a role.
Characteristic: Output targets and formats (PDA, browser, iPod, PDF)
Discussion: The more types of output, the more benefit from XML transformation.
Advice: Use XML when multiple output formats are required.

Characteristic: Content size
Discussion: The larger the content, the more performance hurdles you'll have
to overcome using XML. This leads to consideration of alternatives, such as
compression, or another format entirely, such as Abstract
Advice: Use XML when messaging and processing efficiency is less important
than interoperability and availability of standard tools.

Characteristic: Searching
Discussion: XML supports relatively simple queries through XPath and more
complex queries with the more recent XQuery. While maturing, XML technologies
have been relatively weaker at searching. It is yet to be seen if XML-aware
databases can help with this, since they store the XML in a tree structure.
See XML-aware databases.
Advice: Don't use XML documents when searching is important. Instead, store
the content in a database or use an XML-aware database.

Characteristic: Summarizing
Discussion: XML technologies are weak at summarizing data -- for example, for
reports. See XML-aware databases.
Advice: Don't use XML documents when summarization is important. Instead,
store the content in a database or use an XML-aware database.

Characteristic: Project size
Discussion: To use XML, you need a parser and code to deal with the XML
events or tree.
Advice: For small projects with simple requirements, you might not want to
incur the overhead of XML.
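To make the Searching and Summarizing entries of Table 1 concrete, here is a minimal sketch of an XPath query run through the standard javax.xml.xpath API. The class name, the orders document, and its element and attribute names are all made up for illustration. Note that each query scans the whole in-memory document, which is one reason plain XML files are weaker at searching than an indexed database.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSearchSketch {

    /** Returns the text content of every node selected by the XPath expression. */
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The document is parsed and searched from scratch on every call.
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the tutorial.
        String orders = "<orders>"
                + "<order total='120'><customer>Ann</customer></order>"
                + "<order total='40'><customer>Bob</customer></order>"
                + "<order total='300'><customer>Cy</customer></order>"
                + "</orders>";
        // Searching: customers whose order total exceeds 100.
        System.out.println(select(orders, "/orders/order[@total > 100]/customer")); // [Ann, Cy]
    }
}
```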
XML-aware databases
Database vendors want to support projects using XML technologies, but relational
databases don't make it easy to store and retrieve XML files. IBM has introduced a
new DB2® version formerly known as Viper, which supports XML data storage and
indexing in a native format (in other words, it doesn't pull the XML apart to fit a
relational model). Databases that store XML support XQuery, which is the XML
equivalent of SQL.
So, what do these new database capabilities mean for your projects? The main thing
is that you can achieve the typical strengths of databases, such as searching and
summarizing, with XML data in its native form.
Performance
In this section of the tutorial, I'll discuss some of the issues that can affect
performance when using XML technologies.
As outlined in the book, Designing Web Services with the J2EE™ 1.4 Platform:
JAX-RPC, SOAP, and XML Technologies (see Resources), you can choose from
one of four main XML processing models, available through the following APIs:
SAX and DOM are the most common programming models. Along with XSLT,
these two models are available through Java API for XML Processing (JAXP). The
XML data binding model is available through the JAXB technology.
All of these choices will be discussed later in this series of tutorials, but let's examine
the implications of the processing model on performance. Table 2 compares some
attributes of the SAX parser to the DOM parser.
SAX: Scales to large sizes with little change in memory use
DOM: Larger documents take more memory

SAX: Must write to new document to change the contents
DOM: Can manipulate the document in memory

SAX: More control over parsing, but can be more work for you
DOM: Generally, less work for you
The system requirements, as in most things, usually determine which parser to use.
Some examples include:
• Merging documents
This certainly requires working with a DOM tree. It hurts my head to think
about doing this tag-by-tag using SAX.
• Small devices
If memory is at a premium, SAX uses very little. DOM must build a tree
structure of the entire document.
• Complex manipulation
If changes to different parts of the document are required based upon
values from other portions of the document, then it will most likely be
easier to use the DOM parser.
Finally, you can also use the two parsers in tandem. For example, you can parse a
number of small documents with the SAX parser to pull out information that you
need to merge into an existing document, and modify the document using the DOM
parser and tree manipulation.
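The tandem approach might look something like the following sketch, which uses the JAXP parsers that ship with the JDK. The class name, the item element, and its name attribute are assumptions made up for this example; a real merge would follow your own document grammar.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxDomTandemSketch {

    /** SAX pass: collect the name attribute of each item element without building a tree. */
    static List<String> itemNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    names.add(atts.getValue("name"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse source document", e);
        }
        return names;
    }

    /** DOM pass: load the target document and append one item element per name in memory. */
    static Document merge(String targetXml, List<String> names) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(stream(targetXml));
            for (String name : names) {
                Element item = doc.createElement("item");
                item.setAttribute("name", name);
                doc.getDocumentElement().appendChild(item);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse target document", e);
        }
    }

    private static ByteArrayInputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical documents: one small source and one existing catalog.
        List<String> found = itemNames("<delivery><item name='ink'/><item name='pad'/></delivery>");
        Document catalog = merge("<catalog><item name='pen'/></catalog>", found);
        System.out.println(catalog.getElementsByTagName("item").getLength()); // 3
    }
}
```

The SAX pass holds only the extracted names in memory, while the DOM pass pays for a full tree only on the one document that actually has to be modified.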
StAX
A new API called Streaming API for XML (StAX) is to be released in
late 2006. It's a pull API, as opposed to SAX's push model, so it
keeps control with the application rather than the parser. StAX can
also modify the document being parsed. See Resources for more
details.
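As a rough illustration of the pull model, the following sketch uses the javax.xml.stream API (the StAX implementation bundled with the JDK since Java SE 6). The class name and the feed document are hypothetical. Because the application drives the loop, it can simply stop asking for events once it has what it needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxPullSketch {

    /** Returns the text of at most `limit` title elements, then stops pulling events. */
    static List<String> firstTitles(String xml, int limit) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // The application asks for the next event; the parser never calls back.
            while (reader.hasNext() && titles.size() < limit) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    titles.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read document", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        // Hypothetical feed document.
        String feed = "<feed><title>One</title><title>Two</title><title>Three</title></feed>";
        System.out.println(firstTitles(feed, 2)); // [One, Two]
    }
}
```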
Caching stylesheets
If you use XSLT to convert XML documents into different formats, you can cache the
compiled thread-safe stylesheet Templates in memory, and reuse them for
individual users to create their own Transformers (see Figure 4). This results in a
smaller footprint for your application, and it saves the time for parsing and compiling
the stylesheets.
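A stylesheet cache along these lines can be sketched with the JAXP Templates interface. This is only one possible shape, not the implementation from the book cited in Resources: the class name, the cache key, and the compile counter are assumptions added for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StylesheetCacheSketch {
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();
    private final TransformerFactory factory = TransformerFactory.newInstance();
    /** Counts compilations so the effect of the cache can be observed. */
    final AtomicInteger compiles = new AtomicInteger();

    /** Compiles the stylesheet on first use and returns the shared Templates afterwards. */
    Templates templatesFor(String name, String xslt) {
        return cache.computeIfAbsent(name, key -> compile(xslt));
    }

    // TransformerFactory is not guaranteed to be thread-safe, so compilation is serialized.
    private synchronized Templates compile(String xslt) {
        try {
            compiles.incrementAndGet();
            return factory.newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Could not compile stylesheet", e);
        }
    }

    /** Each call gets its own Transformer; only the compiled Templates are shared. */
    String transform(String name, String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            templatesFor(name, xslt).newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stylesheet that renders a greeting as plain text.
        String xslt = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@to'/>!</xsl:template>"
                + "</xsl:stylesheet>";
        StylesheetCacheSketch cache = new StylesheetCacheSketch();
        System.out.println(cache.transform("greeting", xslt, "<greeting to='Ann'/>"));
        System.out.println(cache.transform("greeting", xslt, "<greeting to='Bob'/>"));
        System.out.println("compiled " + cache.compiles.get() + " time(s)");
    }
}
```

Templates objects are safe to share between threads, whereas each Transformer should be used by one thread at a time, which is why the sketch creates a new Transformer per call.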
Using namespaces
As you might know already, namespaces are used to declare names in your
documents independent of names declared elsewhere. This can become an issue
when stylesheets and other documents are incorporated through statements such as
include or import. It can also be an issue if you merge multiple documents, each
with its own grammar. If you use a colon in an element or attribute name, you can
distinguish between the namespace prefix (to the left of the colon) and the name
within the context of the namespace (in other words, local to the namespace). For
example, xmlns:prefix=URI would allow you to use names like this:
prefix:myname.
An upcoming tutorial in this series will discuss namespaces at length. At this time,
though, I'll mention how namespaces affect performance. As you saw earlier, SAX is
an event-based parser. When the parser encounters a namespace declaration, it
sends the application a startPrefixMapping call and an endPrefixMapping
call. These callbacks slow down your application processing. The point is not to
avoid namespaces altogether -- in fact, you probably can't -- but rather to use them
sparingly if you think performance will be an issue.
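To see these callbacks in action, the following sketch logs every startPrefixMapping and endPrefixMapping call made by the JDK's SAX parser. The class name, the inv prefix, and the urn:example:inventory URI are invented for the example. It compares a document that declares a namespace once on the root with one that repeats the declaration on every element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PrefixMappingSketch {

    /** Parses the document and records each prefix-mapping callback the parser makes. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                log.add("start " + prefix + "=" + uri);
            }

            @Override
            public void endPrefixMapping(String prefix) {
                log.add("end " + prefix);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // prefix-mapping events require namespace processing
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Hypothetical documents with the same content but different declaration styles.
        String declaredOnce = "<doc xmlns:inv='urn:example:inventory'>"
                + "<inv:item/><inv:item/><inv:item/></doc>";
        String declaredEachTime = "<doc>"
                + "<inv:item xmlns:inv='urn:example:inventory'/>"
                + "<inv:item xmlns:inv='urn:example:inventory'/>"
                + "<inv:item xmlns:inv='urn:example:inventory'/></doc>";
        System.out.println(events(declaredOnce).size() + " callbacks");
        System.out.println(events(declaredEachTime).size() + " callbacks");
    }
}
```

Declaring a namespace once, high in the document, keeps the number of these callbacks down without giving up namespaces.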
As you know, XML documents contain tags and other content in a plain-text format.
This incurs a performance hit. What if you could speed this up? I'll discuss two ways:
JAXB and XSLT Compiler (XSLTC).
JAXB
JAXB takes XML documents and creates a semantic tree of Java objects that
represents the document contents (see Figure 5). You can then manipulate these
objects according to the rules defined in the related XML schema, which you
previously compiled and used to create a JAXB binding framework. You can also
use this framework to marshal the tree into a resulting XML document.
Besides processing documents faster, JAXB enables you to manipulate XML
through Java objects. JAXB also makes it easy to keep up with schema changes.
Figure 5. JAXB
Note: JAXB does not support the use of DTDs -- you must use XML Schema as
your schema language.
Schemas
Technically speaking, DTDs, XML Schemas (capital S), and RELAX
NG are all types of XML schema (little s). XML Schemas (capital S)
are strictly called W3C XML Schemas. In this tutorial, whenever you
see XML Schema, realize that it is the W3C language and not the
generic schema document description.
XSLT Compiler
You know what XSL Transformation is. XSLTC adds a compiled aspect to the mix.
XSLTC is composed of two parts (see Figure 6). The first part is a compiler that
creates a translet, which is a set of Java classes, from an XSL stylesheet. The
second part is a processor that applies the translet to an XML instance document to
transform it to the desired output format. This allows you to parse the stylesheet
once and reuse it later, and thus speed up processing.
Figure 6. XSLTC
Security
Applications must feature end-to-end data security when they communicate over the
Internet. No one whose computer is hit with a virus or whose site is hacked into will
question the importance of securing a company's information.
So, what is available to secure communications involving XML? At its heart, sending
XML document contents over the Internet securely involves both XML encryption
and XML digital signature.
XML encryption involves converting the content into an unintelligible form to enforce
confidentiality. Of course, the intended recipient must be able to convert it back to its
original form. XML encryption has some unique capabilities too, such as being able
to encrypt certain elements or element contents. This is useful, for example, when
conducting sales transactions between a customer, a vendor, and the customer's
bank, where different parties need to read certain portions of the document contents
but should not read other portions.
XML digital signature handles the integrity part of XML security (in other words, it
determines if content was changed in any way). Like its encryption peer, XML digital
signature allows more granularity -- in other words, you can sign portions of
documents.
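The integrity check can be sketched with the javax.xml.crypto.dsig API that ships with the JDK. This is a simplified example under several assumptions: it signs the whole document with an enveloped signature rather than a portion of it, it uses a throwaway RSA key pair instead of certificates, and the class name and the order document are made up. Changing the content after signing should cause verification to fail.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XmlSignatureSketch {
    private static final XMLSignatureFactory FACTORY = XMLSignatureFactory.getInstance("DOM");
    private static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for XML Signature processing
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Adds an enveloped Signature element covering the whole document. */
    static void sign(Document doc, KeyPair keys) {
        try {
            Reference wholeDocument = FACTORY.newReference("",
                    FACTORY.newDigestMethod(DigestMethod.SHA256, null),
                    Collections.singletonList(FACTORY.newTransform(
                            Transform.ENVELOPED, (TransformParameterSpec) null)),
                    null, null);
            SignedInfo info = FACTORY.newSignedInfo(
                    FACTORY.newCanonicalizationMethod(
                            CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null),
                    FACTORY.newSignatureMethod(RSA_SHA256, null),
                    Collections.singletonList(wholeDocument));
            FACTORY.newXMLSignature(info, null)
                    .sign(new DOMSignContext(keys.getPrivate(), doc.getDocumentElement()));
        } catch (Exception e) {
            throw new IllegalStateException("Signing failed", e);
        }
    }

    /** Returns true only if the content still matches what was signed. */
    static boolean verify(Document doc, KeyPair keys) {
        try {
            Node signature = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
            DOMValidateContext context = new DOMValidateContext(keys.getPublic(), signature);
            return FACTORY.unmarshalXMLSignature(context).validate(context);
        } catch (Exception e) {
            throw new IllegalStateException("Verification failed", e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        Document order = parse("<order><amount>100</amount></order>"); // hypothetical document
        sign(order, keys);
        System.out.println("untouched: " + verify(order, keys));
        order.getElementsByTagName("amount").item(0).setTextContent("9999");
        System.out.println("tampered:  " + verify(order, keys));
    }
}
```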
XML digital signatures raise issues of their own, such as keeping the order of
attributes stable during document manipulation, that must be handled so that the
document can be verified on the receiving end of a communication. This is beyond
the scope of this tutorial, but you can read more about it on the JavaWorld Web
site (see Resources).
Section 3. Conclusion
XML technologies have numerous uses in the marketplace. The key to their
successful integration into an application architecture is to recognize where to use
them to leverage their strengths. Knowledge of the core XML technologies as well as
an understanding of architectural choices are key to the successful introduction of
XML into your projects.
Summary
In this tutorial on Architecture, you learned how to:
Part 2 of this five-part series focuses on information modeling, including the use of
namespaces and the definition of DTDs and schemas.
Resources
Learn
• XML: A Manager's Guide, Second Edition (Kevin Dick, Addison-Wesley
Professional, 2002): Read about uses of XML technologies in enterprise
applications.
• The BIRT home page: Learn more about Business Intelligence and Reporting
Tools (BIRT).
• An Introduction to StAX (Elliotte Rusty Harold, O'Reilly Media, September 17,
2003): Read more about Streaming API for XML (StAX) in this article.
• Java and XSLT (Eric M. Burke, O'Reilly Media, September 2001): See Chapter
5 for an implementation of a stylesheet cache.
• On Systems Architecture (Tom DeMarco, proceedings of the 1995 Monterey
Workshop on Specification-Based Software Architectures, US Naval
Postgraduate School, Monterey, California, September, 1995): Look at the
technological, economic, political and sociological influences on architecture.
• Yes, you can secure your Web services documents, Part 1: XML Encryption
keeps your XML documents safe and secure by Ray Djajadinata (JavaWorld,
August 23, 2002): Learn about XML encryption -- what the technology is, why
you want to understand it, and how to implement it.
• Yes, you can secure your Web services documents, Part 2: XML Signature
ensures your XML documents' integrity (Ray Djajadinata, JavaWorld, October
11, 2002): Learn about the XML Signature standard, and how to write XML
Signature code.
• XML in a Nutshell, Third Edition (Elliotte Rusty Harold and W. Scott Means,
O'Reilly Media, 2004): Learn about parsing with SAX and DOM as well as
validating using DTDs and XML Schemas.
• Designing Web Services with the J2EE 1.4 Platform: JAX-RPC, SOAP, and
XML Technologies (Inderjeet Singh, Sean Brydon, Greg Murray, Vijay
Ramachandran, Thierry Violleau, and Beth Stearns, Addison-Wesley, 2004): In
this free online book, read about XML technologies and Web services, including
advice on application design.
• Tip: Compress XML files for efficient transmission (Uche Ogbuji,
developerWorks, April 2004): Examine working with binary XML files and
compression that prepares XML for transmission over Web services.
• Managing XML data: Native XML databases (Elliotte Rusty Harold,
developerWorks, June 2005): Read about using XML-aware databases in this
article.
• Binary XML proponents stir the waters (Michael S. Mimoso, November 2004):
Explore binary options for storing and processing XML files.
• IBM XML 1.1 certification: Become an IBM Certified Developer in XML 1.1 and
related technologies.
• XML: See developerWorks XML Zone for a wide range of technical articles and
tips, tutorials, standards, and IBM Redbooks.
• developerWorks technical events and webcasts: Stay current with technology in
these sessions.
Get products and technologies
• IBM® WebSphere® Application Server Version 6.1: Download a free trial
version of this Java 2 Enterprise Edition (J2EE) and Web services
technology-based application platform.
• IBM trial software: Build your next development project with software, available
for download directly from developerWorks.
Discuss
• XML zone discussion forums: Participate in any of several XML-centered
forums.
• developerWorks blogs: Get involved in the developerWorks community.
Trademarks
IBM, DB2, Lotus, Rational, Tivoli, and WebSphere are trademarks of IBM
Corporation in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.