
DTD

Document Type Definition


DTD
• The DTD defines the constraints on the structure of
an XML document.
• It declares all of the document's element types,
their child element types, and the order and number of
each element type. It also declares attributes,
entities, and notations. Processing instructions and
comments may appear in a DTD, but they are not declarations.

• Types:
– Internal DTD
– External DTD
Internal DTD

• Syntax:

<!DOCTYPE root_element [
Document Type Definition (DTD):
elements/attributes/entities/notations/processing instructions/comments
]>
Example Internal DTD
<?xml version="1.0" standalone="yes" ?>
<!--open the DOCTYPE declaration - the open square bracket indicates an
internal DTD-->

<!DOCTYPE person [

<!--define the internal DTD-->


<!ELEMENT person (id,name)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!--close the DOCTYPE declaration-->
]>
<person>
<id>p101</id>
<name>peter</name>
</person>
Rules for Internal DTD

• The document type declaration must be placed between
the XML declaration and the first element (root
element) in the document.

• The keyword DOCTYPE must be followed by the name of
the root element in the XML document.

• The keyword DOCTYPE must be in upper case.


The External DTD
• External DTDs are useful for creating a common
DTD that can be shared between multiple
documents.
• Any change made to the external DTD
automatically applies to every document that
references it.
• There are two types of external DTDs:
– Private
– Public.
Rules for External DTD
• If the XML document uses elements, attributes, or entities
that are declared in an external DTD, the XML declaration
should state
standalone="no"
(declaring "yes" is a validity error when external
declarations affect the document's content):

<?xml version="1.0" standalone="no" ?>
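As a rough sketch of why this rule exists, assume a hypothetical external DTD file named task.dtd (the file name, element, and attribute are illustrative, not from the slides) that supplies a default attribute value:

```xml
<!ELEMENT task (#PCDATA)>
<!ATTLIST task status (important|normal) "normal">
```

A document that relies on that default depends on the external file, so it declares standalone="no":

```xml
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE task SYSTEM "task.dtd">
<!-- status is omitted here, so its value "normal" comes from task.dtd -->
<task>pay the phone bill</task>
```

Writing standalone="yes" in this document would be a validity error, because a processor that skipped task.dtd would see a different attribute set.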


Private External DTD
• Identified by the keyword SYSTEM.
• Intended for use by a single author or
group of authors.

• Syntax:
<!DOCTYPE root_element SYSTEM "DTD_location">

where:
DTD_location: relative or absolute URL
Example for Private external DTD

<?xml version="1.0" standalone="no" ?>
<!--standalone="no" above informs the XML processor that an external DTD
is referenced; the XML declaration must be the first thing in the file-->

<!--define the location of the external DTD using a relative URL
address-->
<!DOCTYPE Employee SYSTEM "employee.dtd">

<Employee>
<eid>101</eid>
<name>kumar</name>
</Employee>
Example contd…
The external DTD ("employee.dtd") referenced in the example
contains information about the XML document's structure:

<!ELEMENT Employee (eid,name)>
<!ELEMENT eid (#PCDATA)>
<!ELEMENT name (#PCDATA)>
Public External DTD
• Identified by the keyword PUBLIC.

• Syntax:

<!DOCTYPE root_element PUBLIC "DTD_name"
"DTD_location">

• These are intended for broad use. The "DTD_location" is used to find
the public DTD if it cannot be located by the "DTD_name".

where:
DTD_location: relative or absolute URL
DTD_name: follows the syntax
"prefix//owner_of_the_DTD//description_of_the_DTD//ISO_639_language_identifier"
Example for Public External DTD
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD> <TITLE>A typical HTML file</TITLE> </HEAD>
<BODY> This is the typical structure of an HTML file.
It follows the notation of the HTML 4.0 specification, including tags
that have been deprecated (hence the "transitional" label).
</BODY>
</HTML>
Prefixes
• The following prefixes are allowed in the DTD name:

Prefix   Definition
ISO      The DTD is an ISO standard. All ISO standards are approved.
+        The DTD is an approved non-ISO standard.
-        The DTD is an unapproved non-ISO standard.
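To see the naming syntax on a second real identifier, the sketch below uses the public XHTML 1.0 Strict DTD. Its name splits into the prefix "-", the owner "W3C", the description "DTD XHTML 1.0 Strict", and the language "EN". The page content itself is only illustrative.

```xml
<?xml version="1.0" standalone="no" ?>
<!-- prefix // owner // description // language, followed by the DTD location -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Public identifier sketch</title></head>
  <body><p>Text</p></body>
</html>
```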
Element Type Declaration
• set the rules for the type and number of elements
that may appear in an XML document

• what elements may appear inside each other, and

• what order they must appear in

syntax:
<!ELEMENT name allowable_contents>

example : <!ELEMENT employee (#PCDATA)>


Rules for Element Type
• All element types used in an XML document must be
declared in the Document Type Definition (DTD) using an
Element Type Declaration .

• An element type cannot be declared more than once .

• The name in the element type's end tag must match the
name in the element type's start tag . Element names
are case sensitive.

• The keyword ELEMENT must be in upper case


Declaring multiple children (sequence)
• Multiple children are declared using commas (,).

• Commas fix the sequence in which the children are allowed to
appear in the XML document.

<!ELEMENT parent_name (child1_name,child2_name,child3_name)>
<!ELEMENT child1_name allowable_contents>
<!ELEMENT child2_name allowable_contents>
<!ELEMENT child3_name allowable_contents>

Rules:
All of the child elements must be declared in a separate
element type declaration.
<!ELEMENT name allowable_contents>

Allowable contents and their definitions:

• EMPTY
– Refers to tags that are empty.
– <!ELEMENT IMG EMPTY>
– <IMG SRC="logo.gif"/>, or <IMG SRC="logo.gif"></IMG>
• ANY
– Refers to anything at all, as long as XML rules are followed.
• Children elements
– You can place any number of element types within another element type.
– <!ELEMENT parent_name (child1_name,child2_name,child3_name)>
• Mixed contents
– Refers to a combination of (#PCDATA) and children elements
(PCDATA = parsed character data).
– <!ELEMENT parent_name (#PCDATA|child1_name)*>
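The four kinds of allowable contents can be seen side by side in one small document. This is a minimal sketch. The element names (page, logo, para, em, notes) are invented for illustration and do not come from the slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!-- children: exactly one logo, one para, one notes, in this order -->
  <!ELEMENT page (logo,para,notes)>
  <!-- EMPTY: no content allowed, only attributes -->
  <!ELEMENT logo EMPTY>
  <!ATTLIST logo src CDATA #REQUIRED>
  <!-- mixed: text interleaved with any number of em children -->
  <!ELEMENT para (#PCDATA|em)*>
  <!ELEMENT em (#PCDATA)>
  <!-- ANY: text and any declared elements, in any order -->
  <!ELEMENT notes ANY>
]>
<page>
  <logo src="logo.gif"/>
  <para>Some <em>emphasised</em> text.</para>
  <notes>Free text, <em>markup</em>, or nothing at all.</notes>
</page>
```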
Example
<?xml version="1.0"?>
<!DOCTYPE student [
<!--'student' must contain three child elements in the order listed-->
<!ELEMENT student (id,surname,firstname)>

<!--the elements listed below may only contain text that is not markup-->
<!ELEMENT id (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
]>
<student>
<id>9216735</id> <surname>Smith</surname>
<firstname>Jo</firstname>
</student>
Declaring optional children

• Optional children are declared using the (?) operator.
• Optional means zero or one times.

syntax:
<!ELEMENT parent_name (child_name?)>
<!ELEMENT child_name allowable_contents>

Rules:
If the child element is used in the XML document, it must
be declared in a separate element type declaration.
Example

<?xml version="1.0"?>
<!DOCTYPE student [
<!--'student' can have zero or one child element of type 'dob'-->
<!ELEMENT student (id,dob?)>

<!ELEMENT id (#PCDATA)>
<!ELEMENT dob (#PCDATA)>
]>

<student>
<id>ECE06117</id>
</student>
Declaring zero or more children

Zero or more children are declared using the (*) operator.

syntax:
<!ELEMENT parent_name (child_name*)>
<!ELEMENT child_name allowable_contents>

<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (subject*)>
<!ELEMENT subject (#PCDATA)> ]>
<student>
<subject>Mathematics</subject>
<subject>Physics</subject>
<subject>Chemistry</subject>
</student>
Declaring One or More children
One or more children are declared using the (+) operator.

syntax:
<!ELEMENT parent_name (child_name+)>
<!ELEMENT child_name allowable_contents>
example:
<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (subject+)>
<!ELEMENT subject (#PCDATA)>
]>
<student>
<subject>Mathematics</subject>
</student>
Combinations of Children
A choice between children element types is declared using
the (|) operator.
syntax:
<!ELEMENT parent_name (child1_name|child2_name)>
<!ELEMENT child1_name allowable_contents>
<!ELEMENT child2_name allowable_contents>
• Example:
<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (id|surname)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
]>
<student>
<id>9216735</id>
</student>
Nested Elements
<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (surname,firstname*,dob?,(origin|sex)?)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT dob (#PCDATA)>
<!ELEMENT origin (#PCDATA)>
<!ELEMENT sex (#PCDATA)> ]>
<student>
<surname>Smith</surname>
<firstname>Jo</firstname>
<firstname>jerald</firstname>
<sex>female</sex>
</student>
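To make the combined operators in the content model above concrete, the fragments below check a few instances against it. They are separate fragments, not one document. The sketch assumes that dob and origin are declared as (#PCDATA) like the other children, and the values are invented.

```xml
<!-- content model: (surname,firstname*,dob?,(origin|sex)?) -->

<!-- valid: only the mandatory surname -->
<student><surname>Smith</surname></student>

<!-- valid: surname, no firstname, one dob, then origin -->
<student><surname>Smith</surname><dob>1990</dob><origin>UK</origin></student>

<!-- invalid: firstname must not precede surname -->
<student><firstname>Jo</firstname><surname>Smith</surname></student>

<!-- invalid: origin and sex are alternatives, so both cannot appear -->
<student><surname>Smith</surname><origin>UK</origin><sex>female</sex></student>
```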
Example
<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (surname,firstname)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT firstname (fullname,nickname)>
<!ELEMENT fullname (#PCDATA)>
<!ELEMENT nickname (#PCDATA)>
]>

<student>
<surname>Smith</surname>
<firstname>
<fullname>Josephine</fullname>
<nickname>Jo</nickname>
</firstname>
</student>
Mixed Content

Mixed content is used to declare elements that contain a
mixture of children elements and text (PCDATA).
syntax:
<!ELEMENT parent_name (#PCDATA|child1_name)*>

Rules:
• (#PCDATA) must come first in the mixed content declaration.
• The operator (*) must follow the mixed content declaration if
children elements are included.
Example 1
<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (#PCDATA|id)*>
<!ELEMENT id (#PCDATA)>
]>
<student>
Here's a bit of text mixed up with the child
element. <id>9216735</id>
You can put text anywhere, before or
after the child element. You don't even have to include the 'id'
element.
</student>
Example 2
<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (#PCDATA|id|surname|dob)*>
<!ELEMENT id (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT dob (#PCDATA)> ]>
<student>
You can put text anywhere. You can also put the elements
in any order in the document.
<surname>Smith</surname>
And, you don't have to include all the elements listed in
the element declaration.
<id>9216735</id>
</student>
Attribute List Declarations
• Attributes are additional information associated with an
element type.
• The ATTLIST declaration identifies
– the element type the attribute belongs to
– the type of the attribute
– the default value
syntax:
<!ATTLIST element_name attribute_name
attribute_type default_value>
. . .
<element_name attribute_name="attribute_value">
Rules:
• Attributes may only appear in start tags or empty-element tags.
• The keyword ATTLIST must be in upper case.
Example

<?xml version="1.0"?>
<!DOCTYPE image [
<!ELEMENT image EMPTY>
<!ATTLIST image height CDATA #REQUIRED>
<!ATTLIST image width CDATA #REQUIRED>
]>
<image height="32" width="32"/>
Attribute types

Three main attribute types:

• string type
– CDATA (character data)
• tokenized types
– ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS
• enumerated types
– NOTATION, ENUMERATED
ID

• An ID attribute value uniquely identifies the element that carries it.
• A particular ID value must not appear
more than once in an XML document.
• An element type may only have one ID
attribute.
• An ID attribute can only have an #IMPLIED or
#REQUIRED default value.
• The first character of an ID value must be a
letter, '_', or ':'.
CDATA

Character data

<?xml version="1.0"?>
<!DOCTYPE image [
<!ELEMENT image EMPTY>
<!ATTLIST image height CDATA #REQUIRED>
<!ATTLIST image width CDATA #REQUIRED> ]>
<image height="32" width="32"/>
ID Example
<?xml version="1.0"?>
<!DOCTYPE student_name [
<!ELEMENT student_name (#PCDATA)>
<!ATTLIST student_name student_no ID #REQUIRED>
]>

<student_name student_no="a9216735">Jo Smith</student_name>
IDREF

• The value of an IDREF attribute must refer to an
ID value declared elsewhere in the document.
• The first character of an IDREF value must be a
letter, '_', or ':'.
IDREF Example
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE lab_group [
<!ELEMENT lab_group (student_name)*>
<!ELEMENT student_name (#PCDATA)>
<!ATTLIST student_name student_no ID #REQUIRED>
<!ATTLIST student_name tutor_1 IDREF #IMPLIED>
<!ATTLIST student_name tutor_2 IDREF #IMPLIED> ]>
<lab_group>
<student_name student_no="a8904885">Alex
Foo</student_name>
<student_name student_no="a9011133">Sarah
Bar</student_name>
<student_name student_no="a9216735" tutor_1="a9011133"
tutor_2="a8904885">Jo Smith</student_name>
</lab_group>
IDREFS

• Allows multiple ID values separated by whitespace.
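As a sketch, the lab_group example can be rewritten so that a single attribute lists both tutors. The attribute name tutors is an assumption made for this illustration.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE lab_group [
  <!ELEMENT lab_group (student_name)*>
  <!ELEMENT student_name (#PCDATA)>
  <!ATTLIST student_name student_no ID #REQUIRED>
  <!-- tutors holds one or more ID values, separated by whitespace -->
  <!ATTLIST student_name tutors IDREFS #IMPLIED>
]>
<lab_group>
  <student_name student_no="a8904885">Alex Foo</student_name>
  <student_name student_no="a9011133">Sarah Bar</student_name>
  <student_name student_no="a9216735" tutors="a9011133 a8904885">Jo Smith</student_name>
</lab_group>
```

Each token in tutors must match an ID value that exists somewhere in the same document, just as with IDREF.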
ENTITY
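• Entities are used to reference data that acts as an abbreviation
or can be found at an external location.
• As an attribute type, ENTITY requires the attribute value to be the
name of an unparsed entity declared in the DTD.
• The first character of an ENTITY value must be a letter, '_', or ':'.
EXAMPLE (a general entity declared in the DTD and referenced in content):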
<?xml version="1.0" standalone="no"?>
<!DOCTYPE experiment_a [
<!ELEMENT experiment_a (results)>
<!ELEMENT results (#PCDATA)>
<!ENTITY name "http://www.university.com/results/experimenta/a.gif">
]>
<experiment_a>
<results>&name;</results>
</experiment_a>
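The example above declares a general entity and references it in element content. When ENTITY is used as an attribute type, the attribute value is instead the name of an unparsed entity. The sketch below shows this under some assumptions: the entity name plot, the notation name gif, its system identifier, and the attribute name image are all illustrative.

```xml
<?xml version="1.0"?>
<!DOCTYPE experiment_a [
  <!ELEMENT experiment_a (results)>
  <!ELEMENT results EMPTY>
  <!-- the notation tells the application what kind of data the entity holds -->
  <!NOTATION gif SYSTEM "image/gif">
  <!-- NDATA marks the entity as unparsed: the XML processor does not read it -->
  <!ENTITY plot SYSTEM "http://www.university.com/results/experimenta/a.gif" NDATA gif>
  <!ATTLIST results image ENTITY #REQUIRED>
]>
<experiment_a>
  <results image="plot"/>
</experiment_a>
```

Note that the attribute value is the bare entity name (plot), not a reference written as &plot;.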
NMTOKEN
Every character of an NMTOKEN value, including the first, must be a
letter, digit, '.', '-', '_', or ':'.
Example:
<?xml version="1.0"?>
<!DOCTYPE student_name [
<!ELEMENT student_name (#PCDATA)>
<!ATTLIST student_name student_no NMTOKEN #REQUIRED>
]>
<student_name student_no="9216735">Jo Smith</student_name>
NMTOKENS

Allows multiple NMTOKEN names separated by whitespace.
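A minimal sketch, assuming a hypothetical subjects attribute on the student_name element used earlier:

```xml
<?xml version="1.0"?>
<!DOCTYPE student_name [
  <!ELEMENT student_name (#PCDATA)>
  <!-- subjects holds one or more name tokens, separated by whitespace -->
  <!ATTLIST student_name subjects NMTOKENS #REQUIRED>
]>
<student_name subjects="maths physics chem-101">Jo Smith</student_name>
```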
NOTATION
• Useful when text needs to be interpreted in a particular
way.
• The first character of a NOTATION name must be a
letter, '_', or ':'.
Example:
<?xml version="1.0"?>
<!DOCTYPE code [
<!ELEMENT code (#PCDATA)>
<!NOTATION vrml PUBLIC "VRML 1.0">
<!ATTLIST code lang NOTATION (vrml) #REQUIRED>
]>
<code lang="vrml">Some VRML instructions</code>
enumerated
• make a choice between different attribute values.
• first character of an Enumerated value must be a letter,
digit, '.', '-', '_', or ':'
Example 1:
<?xml version="1.0"?>
<!DOCTYPE ToDoList [
<!ELEMENT ToDoList (task)*>
<!ELEMENT task (#PCDATA)>
<!ATTLIST task status (important|normal) #REQUIRED>
]>
<ToDoList>
<task status="important">This is an important task that
must be completed</task>
<task status="normal">This task can wait</task>
</ToDoList>
Default value
<?xml version="1.0"?>
<!DOCTYPE ToDoList [
<!ELEMENT ToDoList (task)*>
<!ELEMENT task (#PCDATA)>
<!ATTLIST task status (important|normal) "normal">
]>
<ToDoList>
<task status="important">This is an important
task.</task>
<task>This is by default a task that has a normal
status.</task>
</ToDoList>
Default value contd…
<?xml version="1.0"?>
<!DOCTYPE ToDoList [
<!ELEMENT ToDoList (task)*>
<!ELEMENT task (#PCDATA)>
<!ATTLIST task status NMTOKEN #FIXED "monthly">
]>
<ToDoList>
<task>go to the bank</task>
<task>pay the phone bill</task>
</ToDoList>
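The default declarations that appear across these examples (#REQUIRED, #IMPLIED, a literal default, and #FIXED) can be compared in a single attribute list. This is a sketch only. The attribute names id, owner, and schedule are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE ToDoList [
  <!ELEMENT ToDoList (task)*>
  <!ELEMENT task (#PCDATA)>
  <!-- id: must be given; owner: may be omitted, no default;
       status: "normal" when omitted; schedule: always "monthly" -->
  <!ATTLIST task
    id       NMTOKEN            #REQUIRED
    owner    CDATA              #IMPLIED
    status   (important|normal) "normal"
    schedule NMTOKEN            #FIXED "monthly">
]>
<ToDoList>
  <task id="t1">go to the bank</task>
  <task id="t2" owner="kumar" status="important" schedule="monthly">pay the phone bill</task>
</ToDoList>
```

A #FIXED attribute may still be written in the document, as on the second task, but only with the fixed value.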
CDATA Section
• CDATA sections are used to display markup without the XML
processor trying to interpret that markup.
• They are particularly useful when you want to display sections
of XML code.
syntax: <![CDATA[
any characters (including markup)
]]>
CDATA Section
<?xml version="1.0"?>
<!DOCTYPE body [
<!ELEMENT body (#PCDATA)>
]>
<body>
<![CDATA[
Here is an example of an internal DTD:
<!DOCTYPE lab_group [
<!ELEMENT lab_group (student_name)*>
<!ELEMENT student_name (#PCDATA)>
<!ATTLIST student_name student_no ID #REQUIRED>
<!ATTLIST student_name tutor_1 IDREF #IMPLIED>
<!ATTLIST student_name tutor_2 IDREF #IMPLIED>
]>
]]>
</body>
Predefined General Entity
Predefined General Entities   How to Declare these entities
&lt;                          <!ENTITY lt "&#38;#60;">
&gt;                          <!ENTITY gt "&#62;">
&amp;                         <!ENTITY amp "&#38;#38;">
&quot;                        <!ENTITY quot "&#34;">
&apos;                        <!ENTITY apos "&#39;">

<name>&quot;kumar&quot;</name>
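As a further sketch, the predefined entities let markup characters appear in ordinary text without being parsed as markup. The element name formula is invented for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE formula [
  <!ELEMENT formula (#PCDATA)>
]>
<!-- the application receives the text: a < b && b > c -->
<formula>a &lt; b &amp;&amp; b &gt; c</formula>
```

These five entities are recognised by every XML processor, so the document does not have to declare them.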
INTERNAL (PARSED) GENERAL ENTITY

• Internal parsed entities generally reference text.

Syntax:
<!ENTITY name "entity_value">
where:
entity_value: any characters other than the quote delimiter; '&' and '%'
may appear only as part of a reference (for example &amp; or &#38;).

<?xml version="1.0" standalone="yes" ?>


<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY js "Jo Smith"> ]>
<author>&js;</author>
