CH 2 - HCC Learning Web

CH 8
ELEMENT DECLARATIONS
Objective
 Analyzing the document
 ANY
 #PCDATA
 Child elements
 Mixed content
 Empty elements
 Comments in DTDs
Analyzing the Document
 The first step in creating a DTD for a particular document is to
understand the structure of the information that you will
encode
 Adding a DTD to a document enables you to enforce
constraints on its structure
 When designing a new XML application:
 Write some actual instance documents first
 Then design the DTD
 In a valid document, an element that has not been declared
cannot be used
 Example XML document:
http://www.cafeconleche.org/books/bible3/source/04/4-2.xml
Document Type Definitions cont…
The Elements in the Television Schedule
Element    Required Children      Optional Children
SCHEDULE   DATE, STATION(s)
DATE       Text
STATION    CHANNEL, SHOW(s)       NETWORK, CALL_LETTERS
SHOW       NAME, START_TIME(s),   EPISODE_NUMBER, START_TIME, LENGTH,
           LENGTH                 AIR_DATE, ORIGINAL_AIR_DATE,
                                  CLOSED_CAPTIONED, REPEAT, DESCRIPTION,
                                  TITLE, RATING, YEAR_MADE, STARS,
                                  DIRECTOR, WRITER, PRODUCER, CAST
CAST       ACTOR(s)
ACTOR                             GIVEN_NAME, MIDDLE_NAME,
                                  MIDDLE_INITIAL, SURNAME
WRITER                            GIVEN_NAME, MIDDLE_NAME,
Element Declarations
Element          Required Children   Optional Children
PRODUCER
DIRECTOR
NAME             Text
TYPE             Text
LETTERS          Text
NETWORK          Text
CHANNEL          Text
EPISODE_NUMBER   Text
START_TIME       Text
LENGTH           Text
AIR_DATE         Text
ANY
 The keyword ANY is a content specification
indicating that there are no restrictions on the
content of an element
<!ELEMENT SCHEDULE ANY>
 This declaration says that any declared element, as well
as plain text, can be a child of the SCHEDULE element
 Because ANY is so unrestrictive, it lets you very
quickly create a DTD that will validate a document
ANY cont…
<!ELEMENT SCHEDULE ANY>
<!ELEMENT DATE ANY>
<!ELEMENT STATION ANY>
<!ELEMENT NETWORK ANY>
<!ELEMENT CALL_LETTERS ANY>
<!ELEMENT CHANNEL ANY>
<!ELEMENT SHOW ANY>
<!ELEMENT NAME ANY>
<!ELEMENT TYPE ANY>
<!ELEMENT EPISODE_NUMBER ANY>
<!ELEMENT START_TIME ANY>
<!ELEMENT LENGTH ANY>
<!ELEMENT AIR_DATE ANY>
<!ELEMENT ORIGINAL_AIR_DATE ANY>
<!ELEMENT CLOSED_CAPTIONED ANY>
<!ELEMENT REPEAT ANY>
<!ELEMENT CAST ANY>
<!ELEMENT ACTOR ANY>
<!ELEMENT GIVEN_NAME ANY>
<!ELEMENT SURNAME ANY>
<!ELEMENT PRODUCER ANY>
<!ELEMENT DESCRIPTION ANY>
<!ELEMENT TITLE ANY>
<!ELEMENT MIDDLE_NAME ANY>
<!ELEMENT RATING ANY>
<!ELEMENT YEAR_MADE ANY>
<!ELEMENT STARS ANY>
<!ELEMENT DIRECTOR ANY>
<!ELEMENT WRITER ANY>
<!ELEMENT MIDDLE_INITIAL ANY>
ANY cont…
 The DTD on the previous slide does not say very
much
 It declares the elements but places no restrictions on
where they may appear or what they may contain
ANY Cont…
<?xml version="1.0"?>
<!DOCTYPE DATE SYSTEM "tvschedule.dtd">
<DATE>
July 3, 2003
<CAST>
<NETWORK>CBS</NETWORK>
<CALL_LETTERS>WCBS</CALL_LETTERS>
<CHANNEL>2</CHANNEL>
</CAST>
<SHOW>
Hollywood Squares
<START_TIME>19:00-0500</START_TIME>
</SHOW>
</DATE>
Figure 1: A Document That’s Valid According to the DTD
ANY Cont…
<?xml version="1.0"?>
<!DOCTYPE ACTOR SYSTEM "tvschedule.dtd">
<ACTOR>
<NAME>
<GIVEN_NAME>Frank</GIVEN_NAME>
<SURNAME>Oz</SURNAME>
</NAME>
<ROLE>Yoda</ROLE>
<DATE>May 25, 1944</DATE>
</ACTOR>
Figure 2: A Document That’s Invalid According to the DTD
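The ACTOR document above is invalid because it uses a ROLE element that the DTD never declares. As a minimal sketch, assuming we are free to extend the DTD, adding one more ANY declaration would make it valid. The internal DTD subset below is only a self-contained stand-in for tvschedule.dtd and copies just the declarations this document needs.

```xml
<?xml version="1.0"?>
<!DOCTYPE ACTOR [
  <!ELEMENT ACTOR ANY>
  <!ELEMENT NAME ANY>
  <!ELEMENT GIVEN_NAME ANY>
  <!ELEMENT SURNAME ANY>
  <!ELEMENT DATE ANY>
  <!-- Assumed addition: without this line ROLE is undeclared and the document is invalid -->
  <!ELEMENT ROLE ANY>
]>
<ACTOR>
  <NAME>
    <GIVEN_NAME>Frank</GIVEN_NAME>
    <SURNAME>Oz</SURNAME>
  </NAME>
  <ROLE>Yoda</ROLE>
  <DATE>May 25, 1944</DATE>
</ACTOR>
```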
#PCDATA
 When we want to specify that an element will contain only
text, and no child elements, we use the keyword #PCDATA
 This keyword specifies that the element contains parsed
character data, that is, text that the parser scans for markup
 A literal less-than sign (<) or ampersand (&) must be written
as an entity reference (&lt; or &amp;); greater-than (>),
apostrophe (') and double quote (") can normally be written
as they are
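In parsed character data, a literal < or & has to be written as an entity reference, because the parser would otherwise read it as the start of markup. The sketch below shows this with the DESCRIPTION element. The sentence inside it is made up for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE DESCRIPTION [
  <!ELEMENT DESCRIPTION (#PCDATA)>
]>
<!-- &lt; and &amp; stand in for the literal < and & characters -->
<DESCRIPTION>Tom &amp; Jerry ask: is 3 &lt; 5? The answer is "yes", and 5 > 3 too.</DESCRIPTION>
```

The quotes and the greater-than sign are left as they are, and the document is still well-formed.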
#PCDATA
 <DATE>July 3, 2003</DATE>
 <!ELEMENT DATE (#PCDATA)>
 This declaration indicates that DATE contains text only
 No child elements are allowed
#PCDATA cont…
<DATE>
<MONTH>July</MONTH>
<DAY>3</DAY>
<YEAR>2003</YEAR>
</DATE>
However, this DATE element is invalid because it contains
child elements
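If the DATE element really should be split into parts, the declaration has to change instead. A possible sketch, assuming MONTH, DAY and YEAR each hold plain text, uses a sequence content model (covered under Child Elements):

```xml
<?xml version="1.0"?>
<!DOCTYPE DATE [
  <!-- DATE now takes three children in a fixed order instead of text -->
  <!ELEMENT DATE (MONTH, DAY, YEAR)>
  <!ELEMENT MONTH (#PCDATA)>
  <!ELEMENT DAY (#PCDATA)>
  <!ELEMENT YEAR (#PCDATA)>
]>
<DATE>
  <MONTH>July</MONTH>
  <DAY>3</DAY>
  <YEAR>2003</YEAR>
</DATE>
```

With this declaration the plain-text form, <DATE>July 3, 2003</DATE>, would be the invalid one.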
Child Elements
 The first child element of SCHEDULE is DATE
 To declare that a SCHEDULE must have a DATE, we
list the child's name inside a pair of parentheses
 <!ELEMENT SCHEDULE (DATE)>
 This means that each SCHEDULE element must
contain exactly one DATE child element and nothing
else
Child Elements cont..
 <!ELEMENT SCHEDULE (DATE, STATION,
STATION, STATION)>
 This kind of declaration is called a sequence
 It means that each SCHEDULE element must
contain exactly one DATE child element, followed by
exactly three STATION elements, in that order
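To see the sequence in action, here is a small self-contained sketch. It assumes, for brevity only, that STATION holds plain text. The call letters other than WCBS are illustrative values.

```xml
<?xml version="1.0"?>
<!DOCTYPE SCHEDULE [
  <!ELEMENT SCHEDULE (DATE, STATION, STATION, STATION)>
  <!ELEMENT DATE (#PCDATA)>
  <!-- Simplification for this sketch: STATION holds only text here -->
  <!ELEMENT STATION (#PCDATA)>
]>
<SCHEDULE>
  <DATE>July 3, 2003</DATE>
  <STATION>WCBS</STATION>
  <STATION>WNBC</STATION>
  <STATION>WABC</STATION>
</SCHEDULE>
```

Removing one STATION, adding a fourth, or moving DATE after a STATION would each make the document invalid.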
Child Elements cont..
 Each element must be declared in its own
<!ELEMENT> declaration, exactly once
Child Elements cont..
 + One or More Children
<!ELEMENT SCHEDULE (DATE, STATION+)>
 Use a plus sign (+) after the element name in the
child list
 The above example means that a SCHEDULE element
has one DATE followed by one or more STATION elements
Child Elements cont..
 ? Zero or One Child
 We can indicate that a child is optional in
a sequence, that is, it can appear once or not at all, by
suffixing its name with a ?
<!ELEMENT ACTOR (GIVEN_NAME?,
MIDDLE_NAME?,
MIDDLE_INITIAL?, SURNAME?)>
<!ELEMENT WRITER (GIVEN_NAME?,
MIDDLE_NAME?,
MIDDLE_INITIAL?, SURNAME?)>
<!ELEMENT PRODUCER (GIVEN_NAME?,
MIDDLE_NAME?,
MIDDLE_INITIAL?, SURNAME?)>
Child Elements cont..
<!ELEMENT STATION (NETWORK?,
CALL_LETTERS?, CHANNEL, SHOW+)>
 This indicates that a STATION contains, in this order:
 An optional NETWORK and an optional CALL_LETTERS
 Exactly one CHANNEL
 One or more SHOW elements
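A short sketch of a STATION that satisfies this declaration follows. It assumes that SHOW holds plain text, which is a simplification made only to keep the example self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE STATION [
  <!ELEMENT STATION (NETWORK?, CALL_LETTERS?, CHANNEL, SHOW+)>
  <!ELEMENT NETWORK (#PCDATA)>
  <!ELEMENT CALL_LETTERS (#PCDATA)>
  <!ELEMENT CHANNEL (#PCDATA)>
  <!-- Simplification for this sketch: SHOW holds only text here -->
  <!ELEMENT SHOW (#PCDATA)>
]>
<!-- NETWORK is omitted, which is allowed because it is marked with ? -->
<STATION>
  <CALL_LETTERS>WCBS</CALL_LETTERS>
  <CHANNEL>2</CHANNEL>
  <SHOW>Hollywood Squares</SHOW>
</STATION>
```

Leaving out CHANNEL or every SHOW would make the element invalid, since neither is marked optional.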
Child Elements cont..
 * Zero or More Children
 This means that a child can appear zero or more
times
 Can be used for middle names or middle initials
<!ELEMENT ACTOR (GIVEN_NAME?, MIDDLE_NAME*,
MIDDLE_INITIAL*, SURNAME?)>
<!ELEMENT WRITER (GIVEN_NAME?, MIDDLE_NAME*,
MIDDLE_INITIAL*, SURNAME?)>
<!ELEMENT PRODUCER (GIVEN_NAME?, MIDDLE_NAME*,
MIDDLE_INITIAL*, SURNAME?)>
<!ELEMENT DIRECTOR (GIVEN_NAME?, MIDDLE_NAME*,
MIDDLE_INITIAL*, SURNAME?)>
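As an illustration of the asterisk, the sketch below gives an ACTOR two middle names and no middle initial. The name itself is made up.

```xml
<?xml version="1.0"?>
<!DOCTYPE ACTOR [
  <!ELEMENT ACTOR (GIVEN_NAME?, MIDDLE_NAME*, MIDDLE_INITIAL*, SURNAME?)>
  <!ELEMENT GIVEN_NAME (#PCDATA)>
  <!ELEMENT MIDDLE_NAME (#PCDATA)>
  <!ELEMENT MIDDLE_INITIAL (#PCDATA)>
  <!ELEMENT SURNAME (#PCDATA)>
]>
<!-- Two MIDDLE_NAME children and no MIDDLE_INITIAL: both are allowed by * -->
<ACTOR>
  <GIVEN_NAME>Mary</GIVEN_NAME>
  <MIDDLE_NAME>Anne</MIDDLE_NAME>
  <MIDDLE_NAME>Louise</MIDDLE_NAME>
  <SURNAME>Smith</SURNAME>
</ACTOR>
```

The order is still fixed by the sequence: every MIDDLE_NAME has to come before any MIDDLE_INITIAL.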
Child Elements cont..
<!ELEMENT SHOW (NAME, TYPE?,
EPISODE_NUMBER?, START_TIME+, LENGTH,
AIR_DATE, ORIGINAL_AIR_DATE?,
CLOSED_CAPTIONED?, REPEAT?, RATING?,
STARS?, DIRECTOR*, WRITER*, CAST?,
PRODUCER*, DESCRIPTION)>
 Some child elements of SHOW, including PRODUCER,
DIRECTOR, and WRITER, can appear once, several times,
or not at all
Child Elements cont..
 Choices
<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
 To indicate a choice, separate the child names with a
vertical bar (|) rather than a comma (,) in the parent
element declaration
 The PAYMENT element must have a single
child element, either CASH or CREDIT_CARD
 This sort of content specification is called a choice
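A minimal sketch of a document that uses this choice follows. It assumes that CASH and CREDIT_CARD hold plain text. The amount is made up.

```xml
<?xml version="1.0"?>
<!DOCTYPE PAYMENT [
  <!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
  <!ELEMENT CASH (#PCDATA)>
  <!ELEMENT CREDIT_CARD (#PCDATA)>
]>
<!-- Exactly one of the two alternatives appears -->
<PAYMENT>
  <CASH>25.00</CASH>
</PAYMENT>
```

A PAYMENT containing both children, or neither, would not match the content model.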
Child Elements cont..
 Parentheses
 Each set of parentheses combines several elements
so that the combination is treated as a single unit
when validating
 This parenthesized unit can then be nested inside
other parentheses in place of a single element
 You can then affix a plus sign, an asterisk, or a
question mark to it
Child Elements cont..
 Parentheses cont…
 Both choices and sequences appear in parentheses
 These parentheses can also have +, *, or ?
quantifiers suffixed to them
<!ELEMENT ACTOR (GIVEN_NAME |
MIDDLE_NAME | MIDDLE_INITIAL |
SURNAME)+>
 This declaration says that an ACTOR element must
have one or more GIVEN_NAME, MIDDLE_NAME,
MIDDLE_INITIAL, or SURNAME child elements, in any
order and any combination
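Parenthesized groups can also be nested inside a sequence. The declaration below is a hypothetical alternative for ACTOR, not one taken from the slides. It requires a given name and a surname, with any mix of middle names and initials in between.

```xml
<?xml version="1.0"?>
<!DOCTYPE ACTOR [
  <!-- Hypothetical content model: a choice group with * nested inside a sequence -->
  <!ELEMENT ACTOR (GIVEN_NAME, (MIDDLE_NAME | MIDDLE_INITIAL)*, SURNAME)>
  <!ELEMENT GIVEN_NAME (#PCDATA)>
  <!ELEMENT MIDDLE_NAME (#PCDATA)>
  <!ELEMENT MIDDLE_INITIAL (#PCDATA)>
  <!ELEMENT SURNAME (#PCDATA)>
]>
<ACTOR>
  <GIVEN_NAME>Mary</GIVEN_NAME>
  <MIDDLE_INITIAL>A</MIDDLE_INITIAL>
  <MIDDLE_NAME>Louise</MIDDLE_NAME>
  <SURNAME>Smith</SURNAME>
</ACTOR>
```

Here the whole group (MIDDLE_NAME | MIDDLE_INITIAL) is treated as a single unit that may repeat zero or more times.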
Mixed Content
 You can declare elements that contain both child
elements and character data. This is called mixed
content
<!ELEMENT CAST (#PCDATA | ACTOR)*>
 The above declaration allows each CAST to include
text as well as ACTOR child elements
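A small sketch of mixed content follows. It assumes that ACTOR holds plain text, which is a simplification for this example. In a mixed-content declaration, #PCDATA must come first in the list and the group must end with an asterisk.

```xml
<?xml version="1.0"?>
<!DOCTYPE CAST [
  <!ELEMENT CAST (#PCDATA | ACTOR)*>
  <!-- Simplification for this sketch: ACTOR holds only text here -->
  <!ELEMENT ACTOR (#PCDATA)>
]>
<!-- Text and ACTOR children are freely interleaved -->
<CAST>Featuring <ACTOR>Frank Oz</ACTOR> as Yoda.</CAST>
```

Mixed content cannot constrain the order or number of the child elements. It only lists which ones may appear.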
Empty Elements
 An empty element has no content; it is declared with the
keyword EMPTY
<?xml version="1.0"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (TITLE,
SIGNATURE)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COPYRIGHT (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT BR EMPTY>
<!ELEMENT HR EMPTY>
<!ELEMENT LAST_MODIFIED (#PCDATA)>
<!ELEMENT SIGNATURE (HR, COPYRIGHT,
BR, EMAIL,
BR, LAST_MODIFIED)>
]>
<DOCUMENT>
<TITLE>Empty-element Tags</TITLE>
<SIGNATURE>
<HR/>
<COPYRIGHT>2003 Elliotte Rusty
Harold</COPYRIGHT><BR/>
<EMAIL>[email protected]</EMAIL><BR/>
<LAST_MODIFIED>Wednesday, December 3,
2003</LAST_MODIFIED>
</SIGNATURE>
</DOCUMENT>
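An element declared EMPTY can be written in two equivalent ways: as an empty-element tag, or as a start-tag immediately followed by its end-tag. The sketch below shows both. The shortened SIGNATURE content model is an assumption made to keep the example small.

```xml
<?xml version="1.0"?>
<!DOCTYPE SIGNATURE [
  <!-- Shortened content model, assumed for this sketch only -->
  <!ELEMENT SIGNATURE (HR, BR)>
  <!ELEMENT HR EMPTY>
  <!ELEMENT BR EMPTY>
]>
<SIGNATURE>
  <!-- Empty-element tag -->
  <HR/>
  <!-- Start-tag immediately followed by end-tag: also valid for an EMPTY element -->
  <BR></BR>
</SIGNATURE>
```

Even a single space between <BR> and </BR> would count as content and make the element invalid.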
Comments in DTDs
 DTDs can contain comments, just like the rest of an
XML document
 These comments cannot appear inside a declaration,
but they can appear outside one
 Comments are often used to organize the DTD into
different parts
 Comments are only for the benefit of people reading
the source code
 XML processors will ignore them
Comments in DTDs cont…
<!-- A date in the form Month Day, Year
The year is always written with four digits. -->
<!ELEMENT DATE (#PCDATA)>
 DTDs often use comments to indicate
 Who wrote the DTD
 Copyright for the DTD
 Usage conditions
 Usage instructions
 Customary PUBLIC and SYSTEM identifiers
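A header comment covering these points might look like the sketch below. Every entry in parentheses is a placeholder rather than real information about this DTD. Note that a comment may not contain two hyphens in a row.

```xml
<?xml version="1.0"?>
<!DOCTYPE DATE [
  <!--
    Television schedule DTD (placeholder header)
    Author: (name of the person who wrote the DTD)
    Copyright: (copyright statement for the DTD)
    Usage: (conditions and instructions for using the DTD)
    Customary SYSTEM identifier: tvschedule.dtd
  -->

  <!-- A date in the form Month Day, Year -->
  <!ELEMENT DATE (#PCDATA)>
]>
<DATE>July 3, 2003</DATE>
```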
References
 http://stackoverflow.com/questions/918450/difference-between-pcdata-and-cdata-in-dtd