Extensible Markup Language (XML)

Hamid Zarrabi-Zadeh
Web Programming – Fall 2013
Outline
• Introduction
• XML Structure
• Document Type Definition (DTD)
• XHTML
• Formatting XML
  ▪ CSS Formatting
  ▪ XSLT Transformations
• JSON
What is XML?
• XML is a markup language for encoding
documents in a format that is both human-readable and machine-readable
• Is designed to transport and store data
• Emphasizes simplicity, generality, and usability
over the Internet
• Has strong support via Unicode for the languages
of the world
XML History
• XML is based on SGML, the Standard Generalized
Markup Language (ISO 8879:1986)
• Most of XML comes from SGML unchanged
• First XML specification draft published in 1996
• XML 1.0 became a W3C recommendation in
1998 (fifth edition published in 2008)
• XML 1.1 published in 2004 (revised in 2006), but is
not widely implemented and is rarely used
XML Example
• A simple XML example:
<?xml version="1.0"?>
<message>
  <from>Hassan</from>
  <to>Hossein</to>
  <body>Please give me a call!</body>
</message>
XML Example
• Another example:
<?xml version="1.0"?>
<books>
  <book>
    <title>Maktub</title>
    <author>Paulo Coelho</author>
  </book>
  <book>
    <title>Never Crashed!</title>
    <author>Microsoft</author>
  </book>
</books>
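Reading such a document from a program is straightforward. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree module; the helper name list_books and the constant BOOKS_XML are assumptions made for illustration, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The books document from the slide, embedded as a string for the sketch.
BOOKS_XML = """<?xml version="1.0"?>
<books>
  <book>
    <title>Maktub</title>
    <author>Paulo Coelho</author>
  </book>
  <book>
    <title>Never Crashed!</title>
    <author>Microsoft</author>
  </book>
</books>"""


def list_books(xml_text):
    """Return (title, author) pairs for every <book> directly under the root."""
    root = ET.fromstring(xml_text)
    return [(book.findtext("title"), book.findtext("author"))
            for book in root.findall("book")]


if __name__ == "__main__":
    for title, author in list_books(BOOKS_XML):
        print(f"{title} by {author}")
```

Because the element names are chosen by the document author, the code has to know them ("book", "title", "author") in advance; the parser itself attaches no meaning to them.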
XML versus HTML
• XML and HTML are both markup languages
• HTML is for displaying data, while XML is for
describing data
• Key differences in XML:
  ▪ New tags may be defined at will
  ▪ Tags may be nested to arbitrary depth
  ▪ A document may contain an optional description of its grammar
• XHTML is a reformulation of HTML as XML
XML Markup Languages
• Lots of new markup languages have been
created with XML, including:
  ▪ XHTML
  ▪ RSS for news feeds
  ▪ RDF for describing resources
  ▪ SVG for scalable vector graphics
  ▪ SMIL for describing multimedia for the web
  ▪ MathML for describing mathematical notation
  ▪ …
XML Pros and Cons
• Pros:
  ▪ Software- and hardware-independent
  ▪ Simplifies:
      ▪ sharing data between applications
      ▪ transporting data between different platforms
• Cons:
  ▪ Verbosity
  ▪ Relatively complex parsing and mapping to type systems
XML Structure
XML Tree
• Each XML document forms a tree structure that
starts at the root and branches to the leaves
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>
XML Tree Example
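The tree behind the bookstore document can be made visible with a short depth-first walk. This is a minimal Python sketch using xml.etree.ElementTree; the helper name outline and the output layout are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>"""


def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    attrs = "".join(f' {key}="{value}"' for key, value in element.attrib.items())
    lines = ["  " * depth + element.tag + attrs + (f": {text}" if text else "")]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(BOOKSTORE_XML))))
```

The root (bookstore) appears at depth 0, its single book child at depth 1, and the four leaves (title, author, year, price) at depth 2.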
XML Tags
• XML tags are similar to HTML tags, but
  ▪ They are case-sensitive
  ▪ All tags must be closed
• Like HTML tags, they must be properly nested
• All XML documents must have a single root
element that contains all other elements
  ▪ This root element can have any name
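A document that follows these rules is called well-formed, and any XML parser will reject one that does not. The sketch below, assuming Python's standard xml.etree.ElementTree parser, checks one sample per rule; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One sample per rule on the slide; only the first is well-formed.
SAMPLES = {
    "ok": "<message><to>Hossein</to></message>",
    "case mismatch": "<message><To>Hossein</to></message>",
    "unclosed tag": "<message><to>Hossein</message>",
    "bad nesting": "<message><b><i>hi</b></i></message>",
    "two roots": "<message/><message/>",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label}: {is_well_formed(text)}")
```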
XML Attributes
• XML elements can have attributes
• Attribute values must be quoted with either single
or double quotes
<book title="Let's party!"/>
<film name='The "Lost"'/>
• Attributes have limitations (use with care): they cannot
hold multiple values or nested structure
  – Child elements are a more flexible alternative
<book>
  <title>It's me</title>
  <author>Me who</author>
</book>
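The quoting rules and the main limitation of attributes can be tried out directly. A minimal Python sketch, assuming xml.etree.ElementTree; the helper parses and the author values "A" and "B" are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Either quote style works; the other quote character may appear in the value.
book = ET.fromstring("""<book title="Let's party!"/>""")
film = ET.fromstring("""<film name='The "Lost"'/>""")

# An attribute name may appear only once per element ...
two_author_attributes = '<book author="A" author="B"/>'

# ... whereas a child element can simply be repeated.
two_author_children = ET.fromstring(
    "<book><author>A</author><author>B</author></book>")
authors = [author.text for author in two_author_children.findall("author")]

if __name__ == "__main__":
    print(book.get("title"), "|", film.get("name"))
    print(parses(two_author_attributes), authors)
```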
Document Type Definitions (DTDs)
Document Type Definitions
• Most applications will not be able to deal with
general XML documents
• Instead, they expect documents that have a
specific structure
• This structure can be defined with an XML
Document Type Definition (DTD)
• A DTD specifies the root node's tag name and
what it contains
Valid XML
• A well-formed XML document that conforms to
the rules of a DTD is called a valid XML document
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
  <from>Hassan</from>
  <to>Hossein</to>
  <body>Please give me a call!</body>
</message>
DTD Example
• A simple DTD for our message example (with an added
subject element) would look like this
<!DOCTYPE message
[
<!ELEMENT message (from,to,subject,body)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
DTD Building Blocks
• In a DTD we can specify
  ▪ Elements – tags and the text between them
  ▪ Attributes – information about elements
  ▪ Entities – named references for special characters, e.g. &lt;, &gt;, &amp;
  ▪ PCDATA – parsed character data
      ▪ Parsed by the XML parser and examined for markup
  ▪ CDATA – (unparsed) character data
Elements
• There are different ways to declare an element
  ▪ Empty
  ▪ Parsed character data
  ▪ Anything
  ▪ With a specific sequence of children
<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA)>
<!ELEMENT x ANY>
<!ELEMENT message (from,to,subject,body)>
Elements with Children
• Child sequences can be specified using a syntax
similar to regular expressions
  ▪ <!ELEMENT picture (polygon*)>
  ▪ <!ELEMENT picture (polygon+)>
  ▪ <!ELEMENT picture (polygon?)>
  ▪ <!ELEMENT polygon (point,point,point+)>
  ▪ <!ELEMENT picture (polygon|image)>
  ▪ <!ELEMENT picture (polygon|image)*>
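The similarity to regular expressions is close enough that a content model can be translated into one almost token by token. The Python sketch below does this for child-element models only (it does not handle #PCDATA, EMPTY, or ANY) and is not a DTD validator; the names model_to_regex and children_match are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# A content model is split into element names and the operators ( ) , | ? * +
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[(),|?*+]")


def model_to_regex(model):
    """Translate a DTD child content model into a compiled regular expression.

    Each element name becomes a group matching 'name ' (name plus a space).
    The operators ( ) | ? * + mean the same in both notations, and ','
    (sequence) is plain concatenation in a regex, so it is dropped.
    """
    parts = []
    for token in TOKEN.findall(model):
        if token == ",":
            continue
        if token in "()|?*+":
            parts.append(token)
        else:
            parts.append(f"(?:{re.escape(token)} )")
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the sequence of child tags of an element against a content model."""
    child_tags = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_tags) is not None


if __name__ == "__main__":
    triangle = ET.fromstring("<polygon><point/><point/><point/></polygon>")
    segment = ET.fromstring("<polygon><point/><point/></polygon>")
    print(children_match(triangle, "(point,point,point+)"))
    print(children_match(segment, "(point,point,point+)"))
```

With the declaration (point,point,point+), a polygon needs at least three points, so the first check succeeds and the second fails.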
Element Attributes
• We can also specify which attributes an element
has
  ▪ <!ATTLIST element-name attribute-name attribute-type default-value>
<!ATTLIST polygon boundary CDATA "black">
<!ATTLIST polygon interior CDATA "white">
<!ATTLIST polygon fill (true|false) "true">
<!ATTLIST point x CDATA "0">
Attribute Value Types
• Attribute value types can be
  ▪ CDATA - The value is character data
  ▪ (en1|en2|..) - The value must be one of the values in an enumerated list
  ▪ ID - The value is a unique id
  ▪ IDREF - The value is the id of another element
  ▪ IDREFS - The value is a list of other ids
  ▪ NMTOKEN - The value is a valid XML name token
  ▪ NMTOKENS - The value is a list of valid XML name tokens
  ▪ ENTITY - The value is an entity
  ▪ ENTITIES - The value is a list of entities
  ▪ NOTATION - The value is a name of a notation
  ▪ xml: - The value is a predefined xml value
Default Attribute Values
• Default attribute values can be
  ▪ Value - The default value of the attribute
  ▪ #REQUIRED - The attribute value must be included in
the element (no default)
  ▪ #IMPLIED - The attribute does not have to be included
  ▪ #FIXED value - The attribute value is fixed
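The four kinds of default can be read as rules for what a validating parser does when an attribute is present or missing. The Python sketch below imitates those rules with a hand-written table instead of parsing a real DTD; the table layout, the function apply_defaults, and the report and note elements are all hypothetical, and only the polygon entries come from the slides.

```python
import xml.etree.ElementTree as ET

REQUIRED, IMPLIED, FIXED = "#REQUIRED", "#IMPLIED", "#FIXED"

# Hand-written stand-in for ATTLIST declarations (an assumption of this sketch).
# A plain string is a default value; (FIXED, value) marks a fixed value.
ATTLIST = {
    "polygon": {"boundary": "black", "interior": "white", "fill": "true"},
    "report": {"author": REQUIRED, "editor": IMPLIED},
    "note": {"version": (FIXED, "1.0")},
}


def apply_defaults(element, attlist=ATTLIST):
    """Fill in defaulted attributes in place and return a list of problems."""
    problems = []
    for name, default in attlist.get(element.tag, {}).items():
        present = name in element.attrib
        if default == REQUIRED:
            if not present:
                problems.append(f"{element.tag}: missing required attribute {name}")
        elif default == IMPLIED:
            continue  # optional, and no default is supplied
        elif isinstance(default, tuple):  # (FIXED, value)
            fixed_value = default[1]
            if not present:
                element.set(name, fixed_value)
            elif element.get(name) != fixed_value:
                problems.append(f"{element.tag}: {name} must be {fixed_value}")
        elif not present:
            element.set(name, default)  # ordinary default value
    return problems


if __name__ == "__main__":
    polygon = ET.fromstring('<polygon fill="false"/>')
    print(apply_defaults(polygon), polygon.attrib)
```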
Entities
• Entities are named shortcuts that stand for commonly used text
  ▪ <!ENTITY entity-name "entity-value">
<!ENTITY sut "Sharif University of Technology">
...
[in XML file:]
&sut;
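Entity references are expanded by the parser, so an application only ever sees the replacement text. A minimal Python sketch, assuming xml.etree.ElementTree and an entity declared in the document's internal DTD subset; the footer element and its sentence are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset and used in the content.
DOC = """<!DOCTYPE footer [
  <!ENTITY sut "Sharif University of Technology">
]>
<footer>Course offered at &sut;.</footer>"""

footer = ET.fromstring(DOC)

# The predefined entities need no declaration at all.
formula = ET.fromstring("<formula>1 &lt; 2 &amp; 3 &gt; 2</formula>")

if __name__ == "__main__":
    print(footer.text)
    print(formula.text)
```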
Example – Newspaper
<!DOCTYPE newspaper [
<!ELEMENT newspaper (article+)>
<!ELEMENT article (headline,byline,body,notes)>
<!ELEMENT headline (#PCDATA)>
<!ELEMENT byline (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT notes (#PCDATA)>
<!ATTLIST article author CDATA #REQUIRED>
<!ATTLIST article editor CDATA #IMPLIED>
<!ATTLIST article date CDATA #IMPLIED>
<!ATTLIST article edition CDATA #IMPLIED>
<!ENTITY publisher "Sample Press">
<!ENTITY copy "Copyright 2013 Sample Press">
]>
XML Schema
• XML Schema is an XML-based alternative to DTD
• Main differences from DTDs
  ▪ XML schemas use XML syntax
  ▪ XML schemas support data types
  ▪ XML schemas are extensible
Schema Example
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="subject" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XHTML
XHTML
• XHTML is a version of HTML that is proper XML
• XHTML 1.0 released in 2000
• Because it is XML, it is defined using a DTD
• The html tag must have an xmlns attribute
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
  </body>
</html>
XHTML versus HTML
• XHTML and HTML have mostly the same tags
• Main differences have to do with XML syntax
  ▪ All tags must be closed
  ▪ Empty tags must also be closed
  ▪ Elements must be properly nested
  ▪ Tag names must be lowercase
  ▪ Attribute values must be quoted
  ▪ Attributes must have values
      ▪ <input type="checkbox" checked="checked" />
      ▪ <input type="text" readonly="readonly" />
  ▪ The id attribute replaces the name attribute
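Most of these differences are just XML well-formedness, so an ordinary XML parser can tell the two spellings apart. The Python sketch below, assuming xml.etree.ElementTree, pairs common HTML habits with their XHTML form and also shows how the xmlns attribute surfaces in parsed tag names. The helper parses_as_xml and the sample fragments are illustrative, and the check does not cover the lowercase-name rule, which comes from the XHTML DTD rather than from XML itself.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML (a precondition for XHTML)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# (HTML-style spelling, XHTML spelling) for four of the rules on the slide.
PAIRS = [
    ("<p>line<br></p>", "<p>line<br /></p>"),                    # empty tag closed
    ("<input type=checkbox />", '<input type="checkbox" />'),    # value quoted
    ("<input checked />", '<input checked="checked" />'),        # value present
    ("<p><b><i>x</b></i></p>", "<p><b><i>x</i></b></p>"),        # proper nesting
]

# With xmlns set, every tag is reported with the namespace in braces.
PAGE = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hi</p></body></html>'
root = ET.fromstring(PAGE)
paragraph = root.find("x:body/x:p", {"x": XHTML_NS})

if __name__ == "__main__":
    for html_style, xhtml_style in PAIRS:
        print(parses_as_xml(html_style), parses_as_xml(xhtml_style))
    print(root.tag, paragraph.text)
```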
Formatting XML
CSS Formatting
• Formatting information can be added to XML
documents using CSS
• This works by adding an xml-stylesheet processing instruction
that references a CSS stylesheet to the XML document prolog
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="msg.css"?>
<message>
  <from>Hassan</from>
  <to>Hossein</to>
  <body>Please give me a call!</body>
</message>
CSS Example
• The :before and :after CSS pseudo-elements can
be very useful here
from {
  display: block;
  padding: 10px;
}
from:before {
  content: "From: ";
  font-weight: bold;
}
XSLT Transformations
• Formatting XML with CSS is not the most common
method
• W3C recommends using XSLT instead
• XSLT (Extensible Stylesheet Language
Transformations) is a language for transforming
XML documents into other XML documents
• To display XML on the web, we could use XSLT to
convert our XML document into an XHTML
document
XSLT Example
<?xml version="1.0"?>
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <xsl:for-each select="messages/message">
      <div style="padding:10px; margin:10px">
        <div><b>From</b>:
          <xsl:value-of select="from"/></div>
        <div><b>To</b>:
          <xsl:value-of select="to"/></div>
        <div><xsl:value-of select="body"/></div>
      </div>
    </xsl:for-each>
  </body>
</html>
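To make clear what the stylesheet computes, the same XML-to-XHTML mapping can be written out by hand. The Python sketch below is not XSLT and does not run the stylesheet; it rebuilds the equivalent output tree with xml.etree.ElementTree, assuming an input document with a messages root element. The function name messages_to_xhtml and the sample input are assumptions.

```python
import xml.etree.ElementTree as ET

MESSAGES_XML = """<messages>
  <message>
    <from>Hassan</from>
    <to>Hossein</to>
    <body>Please give me a call!</body>
  </message>
</messages>"""


def messages_to_xhtml(xml_text):
    """Build the XHTML that the stylesheet describes, one <div> per message."""
    source = ET.fromstring(xml_text)
    html = ET.Element("html", xmlns="http://www.w3.org/1999/xhtml")
    body = ET.SubElement(html, "body")
    for message in source.findall("message"):      # like xsl:for-each
        box = ET.SubElement(body, "div", style="padding:10px; margin:10px")
        for label, tag in (("From", "from"), ("To", "to")):
            row = ET.SubElement(box, "div")
            bold = ET.SubElement(row, "b")
            bold.text = label
            # Text after </b> is stored as the tail of the <b> element.
            bold.tail = ": " + (message.findtext(tag) or "")   # like xsl:value-of
        ET.SubElement(box, "div").text = message.findtext("body")
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(messages_to_xhtml(MESSAGES_XML))
```

The declarative XSLT version states only the shape of the result, while the hand-written version also has to spell out the traversal and tree construction.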
JSON
What is JSON?
• JSON stands for JavaScript Object Notation
• It is a lightweight text-data interchange format,
commonly used as an alternative to XML
• JSON is typically more compact than the equivalent XML
and is faster and easier to parse
• Although JSON is derived from JavaScript syntax, it is
language- and platform-independent
JSON Examples
{
  "message": {
    "from": "Hassan",
    "to": "Hossein",
    "body": "Please give me a call!"
  }
}
{
  "books": [
    {"title": "Maktub", "author": "Paulo Coelho"},
    {"title": "Never Crashed!", "author": "Microsoft"}
  ]
}
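A JSON text maps directly onto the lists and dictionaries of most languages, whereas the XML form needs a small amount of mapping code. The Python sketch below loads the book data from both notations with the standard json and xml.etree.ElementTree modules and compares the results; the function names and embedded strings are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

BOOKS_JSON = """{
  "books": [
    {"title": "Maktub", "author": "Paulo Coelho"},
    {"title": "Never Crashed!", "author": "Microsoft"}
  ]
}"""

BOOKS_XML = """<books>
  <book><title>Maktub</title><author>Paulo Coelho</author></book>
  <book><title>Never Crashed!</title><author>Microsoft</author></book>
</books>"""


def books_from_json(text):
    """json.loads already yields a list of dicts; no mapping code is needed."""
    return json.loads(text)["books"]


def books_from_xml(text):
    """Map each <book> element to a dict of child tag -> child text."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in book}
            for book in root.findall("book")]


if __name__ == "__main__":
    print(books_from_json(BOOKS_JSON) == books_from_xml(BOOKS_XML))
    print(len(BOOKS_JSON), len(BOOKS_XML))
```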
Summary
• XML is used to describe data
• DTDs and Schemas can be used to define valid
documents
• XML can be formatted with CSS and XSLT
• XHTML is a version of HTML which is proper XML
• JSON is a lightweight alternative to XML for data interchange