You are on page 1of 5

XML: Building with Oracle XML Database

TECHNOLOGY: XML Building with Oracle XML Database By Magns Kristjnsson Exploiting XML DB capabilities, Theriak Software built a production-grade survey application.

As Published In

September/October 2004

As a software development shop involved with XML technologies for several years, Theriak Software (formerly eMR) has witnessed firsthand the many recent improvements Oracle has made in these technologies. This article explores some of the techniques we used to successfully build a production-grade, generic survey system we codeveloped with deCODE Genetics, Inc., using the XML DB capability in Oracle9i Release 2. Based in Reykjavik, Iceland, deCODE Genetics develops new drugs and DNA-based diagnostic protocols based on human genetics research tied to population data, much of which is gathered via survey. Study participants fill out extensive questionnaires that vary from research project to research project; at any given time, deCODE is conducting dozens of research projects, each of which may involve thousands of participants, numerous surveys, and hundreds of questions. To automate the survey process, we developed a Web-based survey system, trademarked as Questor by deCODE. The system allows researchers to easily author questions and compose new study-specific questionnaires from a repository of pretested questions, gathering results over the Web. Likewise, study participants complete the surveys by using a Web browser, and the system securely stores all answers in the database. The survey system is now in its second major release, having been redesigned to take advantage of some of the new or improved XML DB features of Oracle9i Release 2, such as support for XML Schema, the XML DB Repository feature, XPath, and XDBUriType. Reimplementing Questor to use these and other native XML capabilities has led to both increased simplicity (fewer tables, less code) and significantly improved performancethe system runs twice as fast as the initial release. XML Schema and XML DB One of the major changes we made in going from Questor release 1.0 to release 2.0 was to implement the native XMLType and to use XML Schema to define the documents that would be important to any survey systemquestions, surveys, and responses. XML Schema describes the structure of a set of XML documents. It is also used to validate that a given XML document conforms to the structure it defines. It also facilitates automatic definition of O-R (object-relational) database structures for storing XML data. For XML DB to use an XML schema, it must first be registered with (loaded into) the database. The function, in the DBMS_XMLSCHEMA PL/SQL package, accomplishes this task. It parses and validates the XML schema, creates entries in the Oracle data dictionary that describe the XML schema, creates SQL object definitions (based on complexTypes defined in the XML schema), and creates an XMLType table for each global element defined in the XML schema.
registerSchema()

Oracle XML DB supports XML Schema's annotation mechanism, which enables adding vendorspecific information to XML Schema. Oracle uses the annotation mechanism to give developers control over the mapping between an XML schema and the SQL object model, naming, and storage. For example, Listing 1 shows one of the XML Schema definition files used in Questor for survey data. It shows schema annotation being used to specify a default table (xdb:defaultTable="HLQ_SURVEY_XML_DTB"). Register-ing the XML schema shown in Listing 1 creates a default table (HLQ_SURVEY_XML_DTB) in the Questor database schema.

XML: Building with Oracle XML Database

To use annotations, you must declare Oracle's XDB namespace in the XML schema file, as in
xmlns:xdb=http://xmlns.oracle.com/xdb

Listing 1 also shows an annotation (xdb:SQLType="CLOB") being used to specify the storage option. XML DB supports both structured XML O-R storage and CLOB storage. Traditional CLOB storage provides document fidelity, whereas object-relational storage maintains DOM (document object model) fidelity by decomposing the XML into underlying O-R structures. You can use both types of storage in your application. O-R storage facilitates query-rewrite by the Oracle query optimizer, so that XPath queries can directly access the underlying object-relational columns. XPath queries can be evaluated without construction of the full XML document in memory, a feature that can result in better query performance. The Questor system was designed to optimize presentation, so question documents, for example, contained everything needed to render a question in the Web browser. As a result, we chose CLOB storage for the XML content, and we maintained several relational tables from the previous versions of the system. XML DB Repository Functionality Another unique feature of XML DB we used in Questor 2.0 is the XML DB Repository. This component provides path-based access, using open protocols, to resources (XML or other types of files) stored and maintained in a folder hierarchy, similar to a file system. As with traditional file systems, access to resources can be restricted by means of Access Control Lists (ACLs), and resources can be subject to versioning. XML DB Repository can store any kind of data, not just XML files. Anything that can fit in a BLOB, such as a PDF file, a JPG file, or an Excel spreadsheet, can be foldered and stored in XML DB Repository. Oracle XML DB has an embedded protocol handler that enables content in the XML DB Repository to be directly accessible via WebDAV, HTTP, or FTP. Programming with XML DB Repository The DBMS_XDB built-in package provides a wide range of functions for managing Repository resources and ACLs on resources, as well as for managing the XML DB configuration. Together with RESOURCE_VIEW and PATH_VIEW, these tools give developers query-based access to Oracle XML DB. We used the XML DB Repository to store all Questor XML documents as foldered resources. Listing 2 shows how to create a schema owner and set up the necessary Repository folders. The Repository folder hierarchy reflects that Questor documents are of different types. Given a schema owner named HLQ2, the code in Listing 2 creates two document folders: /home/HLQ2/questions and /home/HLQ2/surveys. Optionally, we can limit access to a given folder and all its subfolders, by specifying an ACL, as follows:
EXEC DBMS_XDB.SETACL('/home/&1', '/sys/acls/all_owner_acl.xml');

In Listing 3, we then specify that user HLQ2 owns the resources in the folders, by changing the appropriate resource metadata in the RESOURCE_VIEW, using the existsNode() function and XPath to locate the resources. Note the use of the UNDER_PATH operator, which returns the paths under a given path. At this point, we have folders within which to store documents and can use FTP or HTTP-based protocols, the XDBUriType, the DBMS_XDB package, or the RESOURCE_VIEW to store, retrieve, or manipulate the XML data. In our implementation, we created our own Questor XML API, using a selection of the above techniques, as the following sections outline. Questor Architecture

XML: Building with Oracle XML Database

Let's look at the Questor architecture in more detail. Questions, surveys, and responses in Questor are stored as foldered resources in the XML DB Repository. All XML instance data is stored in default XML tables (as CLOBs) owned by the Questor database schema, as specified in the respective XML schemas. In addition to the CLOBs used for XML documents, all key data needed for document lookupsuch as survey, question, and language codeis extracted from the documents as they get checked for integrity and validation. We chose this approach partly because we wanted to keep some of the relational data, such as participant account and subscription information, from the original system. We also took this approach because our main goal was to optimize the XML documents for presentation purposes. This key lookup data is stored and maintained in relational tables, with links between XML Repository and XML instance data. We simply link the relational information to survey contents and participant answers in XML format. Whether access is via FTP, HTTP, or the PL/SQL-based Questor XML API, an XML document is stored in a folder whenever it is sent to the Repository. When a document is sent to the Repository, the Oracle XML DB server does the following six things: 1. Calls dbms_xdb.createResource, using the XML instance document and the requested Repository path as parameters 2. Inspects the instance document and identifies its XML schema reference 3. Locates the registered XML schema 4. Determines into which default XML table (as annotated in the XML schema) the document should be inserted 5. Executes the insert trigger on the default XML table in the Questor database schema The insert trigger, shown in Listing 4, calls routines that perform various validation and integrity checks on the incoming XML document, raising exceptions if the document has errors and thus causing the insert operation to abort. If there are no errors, the trigger code extracts key values from the XML document that can be used for subsequent document lookup and stores them in the Link Table. The Link Table is a relational table that links the XML data to the relational data model. As shown in Listing 4, the XDBUriType() constructor generates the key that gets stored in the Link Table for subsequent retrieval operations by the Questor XML API. Calling these check routines in a default table trigger ensures the quality of the XML documents, regardless of access method (FTP, HTTP, or PL/SQL). 6. Inserts a row in XML DB's XDB$RESOURCE table with a reference to the XML instance document in the default XML table (in the Questor database schema). Native XML and the Questor XML API Let's take a closer look at how the Questor API goes about storing, retrieving, and updating XML Repository documents. We developed a custom API that consists of some PL/SQL packages in the Questor database schema. These packages, in turn, are wrapped as Java methods, running as servlets in the middle tier. We first created an object type (hlq_dockey_ty) that can be used to wrap all document keys. As shown in Listing 5, our document key (dockey) object type comprises a URI, a rowid, and a REF, as

XML: Building with Oracle XML Database

well as several member functions for manipulating the instances of this object type. Next, we constructed a document key object of type hlq_dockey_ty, as shown in Listing 6. Once these objects and functions were available, we combined them into a single function to insert or update an XML document. As shown in Listing 7, if the document already exists, the function will update the document; if it doesn't exist, the function will create the XML document. The code for retrieving an XML document via a document key is straightforward:
function hlq_get_survey_xml( p_survey_code varchar2 ,p_language_code varchar2) return clob is begin return hlq_get_survey_dockey( p_survey_code ,p_language_code).get_clob(); end;

Indexing and Searching Oracle XML DB provides several alternatives for indexing XML, including function-based indexes using XPath expressions, and traditional Oracle Text indexes. We used a combination of these techniques for the Questor system, as follows: Function-based index. Creating a function-based index by using XPath is straightforward. To get quick access to the number of questions each participant has answered at any point, we created an index on the survey participants' answers, as shown in Listing 8, storing the answer in the element /sml/survey/ progress, under the answeredResponses attribute of hlq_userdata_xml_dtb. Then we created a function-based index on that column, as follows:
create index hlq_userdata_xml_dtb_answered on hlq_userdata_xml_dtb (sys_nc_rowinfo$.extract( '/sml/survey/progress/@answeredResponses',hlq_sml.hlq_get_sml_ns).getNumberVal() );

Be sure to enable query rewrite, so that the index will be used, as follows:
alter session set query_rewrite_enabled='TRUE'; alter session set query_rewrite_integrity='TRUSTED';

If you don't enable query rewrite, the index won't be usedeven if the WHERE clause uses exactly the same extract() expression as the index. Traditional Oracle Text index. The Oracle Text index has been extended to support XMLType columns, and the CONTAINS clause also supports XPath syntax in the INPATH and HASPATH operators. For either the INPATH or the HASPATH operator to work, you must create the index with PATH_SECTION_GROUP, which is used by default. You can create all the basic Oracle Text indexes on an XML Type column, as well as on the sys_nc_rowinfo$ (also called object_value) pseudocolumn of a default XML table. Instead of specifying the sys_nc_rowinfo$ pseudocolumn in the query, we could use an alias,say, Tfor the default XML table and write value(t). Here's an example of creating an Oracle Text index of type ctxsys.context on the question XML default table:
create index hlq_question_xml_ctx on hlq_question_xml_dtb(sys_nc_rowinfo$) indextype is ctxsys.context;

After creating the index, you can issue traditional queries with the CONTAINS clause, but with a new

XML: Building with Oracle XML Database

twistusing the INPATH or HASPATH operator. For example, in the query in Listing 9, we use INPATH to determine the number of questions containing fragments including the words smoke and pipe. The result in Listing 9 shows that only 2 questions are about smoking a pipe. Similarly, we can use HASPATH to search for existence in XPath. In Listing 10, we want to ascertain the total number of multiplechoice ("MC") questions available in the question table. Apparently, 6,290 questions are multiple-choice, as shown in Listing 10. The use of // in the XPath expression causes the index to scan all <response> elements, regardless of their location in the XML. Note that the domain index is used, as shown in the execution plans in Listings 9 and 10, and both Elapsed Time and Statistics indicate that it's very efficient. Conclusion The most important advice we can give developers considering building systems with XML DB is to make sure to start with a well-designed XML schema. Spend at least as much time designing and refining the XML schemas as you would creating a data model for SQL-based applications. One development team design goal should be to design the XML schemaand thus the structure and content of the XML instance documentsto achieve a natural data granularity and grouping that best fits the application's main access and usage patterns. Overall, despite various ramp-up issues, we are pleased with XML DB, because it has enabled us to deliver a working application with less complexity, more flexibility, and better performance than was previously possible with traditional relational techniques. In short, Oracle XML DB enabled us to use XML effectively as the core technology for designing and developing Questor. Magns Kristjnsson (magnus@theriak.is) is an informatics architect with Theriak Software in Iceland and a key developer of Questor. He is an active participant in the Oracle Database 9i Release 2 and Oracle 10g beta programs, and his primary focus has been on the native XML support in the Oracle database (XML DB). Kristjnsson was previously a consultant and solutions architect for Oracle in Oslo, Norway, where he worked on several projects involving XML and other core database technologies. Next Steps DOWNLOAD Oracle Database 10g VISIT the XML technology center LEARN more about XML DB READ more about XML XML: To CLOB or Object? SQL in, XML Out

Send us your comments

You might also like