XML and XSL - Purple Technology

XML and XSL Overview
by Alex Chaffee
http://www.purpletech.com/
Purple Technology: Open source development
jGuru: Java online resource
FAQs and News and other cool stuff
XML
• eXtensible Markup Language
• Replacement for HTML
• Metalanguage - used to create other
languages
• Has become a universal data-exchange format
Advantages of XML
• Human-readable
• Machine-readable (easy to parse)
• Standard format for data interchange
• Possible to validate
• Extensible
– can represent any data
– can add new tags for new data formats
• Hierarchical structure (nesting)
Why not HTML?
• Browsers are too lenient
• Led to sloppy HTML code all over the
Web
– <imG src="foo.gif> is "legal" HTML
• Told HTML, "go to your room and don't
come out until it's clean"
– Out came XML
XML Searching and Agents
• An early motivation for XML
• Allows detailed queries of disparate data
sources
– Find best price for certain product
– Search for properties with different real estate
brokers
• HTML insufficient
– Good for humans, bad for computers
– Doesn't scale
XML Example
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
<meal name="breakfast">
<food>Scrambled Eggs</food>
<food>Hash Browns</food>
<drink>Orange Juice</drink>
</meal>
</menu>
XML Languages
• MML - musical scores
• CML - chemicals
• HRMML - Human Resource
Management (???)
• MathML - equations
• RSS - web syndication
Tag vs. Element
• A tag is a name, enclosed by angle
brackets, with optional attributes
– <foo id="123">
• An element is a tree, containing an
open tag, contents, and a close tag
– <foo id="123">This is <bar>an
element</bar></foo>
XML Syntax
• Tags properly nested
• Tag names case-sensitive
• All tags must be closed
– or self-closing
– <foo/> is the same as <foo></foo>
• Attributes enclosed in quotes
• Document consists of a single (root) element
• A few other details
Well-Formed vs. Valid
• Well-Formed:
– Structure follows XML syntax rules
• Valid:
– Structure conforms to a DTD
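A minimal sketch of the well-formedness half of this distinction, assuming Python's standard xml.etree.ElementTree parser. The helper name is_well_formed and the sample strings are made up for illustration. This parser checks syntax only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, one per syntax rule
good = '<menu><meal name="breakfast"><food>Scrambled Eggs</food></meal></menu>'
bad_nesting = '<menu><meal></menu></meal>'   # tags not properly nested
bad_case = '<menu></Menu>'                   # tag names are case-sensitive
bad_quotes = '<imG src="foo.gif>'            # attribute quote never closed
two_roots = '<a/><b/>'                       # more than one root element

for label, doc in [("good", good), ("bad_nesting", bad_nesting),
                   ("bad_case", bad_case), ("bad_quotes", bad_quotes),
                   ("two_roots", two_roots)]:
    print(label, is_well_formed(doc))
```

Only the first document should pass; each of the others breaks one of the syntax rules listed earlier.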
DTD
• Document Type Definition
• A grammar for XML documents
• Defines
– which elements can contain which other
elements
– which attributes are allowed or required
on which elements
DTD and Data Exchange
• Both sides must agree on DTD ahead
of time
• DTD can be part of document or
stored separately
DTD Example
<?xml encoding="US-ASCII"?>
<!ELEMENT menu (meal)*>
<!ATTLIST menu
name CDATA #IMPLIED>
<!ELEMENT meal (food|drink)*>
<!ATTLIST meal
name CDATA #REQUIRED>
<!ELEMENT food (#PCDATA)*>
<!ELEMENT drink (#PCDATA)*>
Why isn't a DTD in XML?
• It will be someday: XSchema
XML Namespaces
• A single document can use multiple
DTDs
• But! Two DTDs can use the same
element name with different rules
• Solution: Namespaces
• Must prefix tag name with a namespace
prefix (bound to a namespace URI by an xmlns:prefix attribute)
– e.g. <xsl:apply-templates select="."/>
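As a rough illustration of the name clash and its fix, here is a sketch using Python's standard xml.etree.ElementTree. The two namespace URIs and the prefixes a, b, and food are invented for the example. After parsing, each tag carries its namespace URI, so the two menu elements stay distinct.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define <menu>; the example.com URIs are hypothetical
doc = """
<root xmlns:a="http://example.com/menu" xmlns:b="http://example.com/gui">
  <a:menu>Lunch specials</a:menu>
  <b:menu>File Edit View</b:menu>
</root>
"""
root = ET.fromstring(doc)

# ElementTree expands each prefix into {namespace-uri}localname
tags = [child.tag for child in root]
print(tags)

# The prefix used in a query is local to the query; only the URI must match
ns = {"food": "http://example.com/menu"}
food_menu = root.find("food:menu", ns)
print(food_menu.text)
```

Note that the query uses a different prefix (food) than the document (a); what identifies the namespace is the URI, not the prefix.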
Entities
• Macros / constants
• Values defined once, used in
document
<!DOCTYPE foo SYSTEM "foo.dtd" [
<!ENTITY background "#99FFFF">
]>
<BODY BGCOLOR="&background;">
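A small sketch of entity expansion, assuming Python's standard xml.etree.ElementTree (which expands entities declared in the internal DTD subset). The document below is a simplified variant of the slide's example: it drops the external foo.dtd reference and names BODY as the root element so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# The entity is defined once in the internal subset and used twice below
doc = """<!DOCTYPE BODY [
  <!ENTITY background "#99FFFF">
]>
<BODY BGCOLOR="&background;">The color is &background;</BODY>"""

body = ET.fromstring(doc)
print(body.get("BGCOLOR"))   # entity expanded inside an attribute value
print(body.text)             # entity expanded inside element content
```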
SML / Minimal XML
• Simplified Markup Language
• Subset of XML, but stripped down
• Easier to understand, parse
• No
– DTDs
– Attributes
– Processing instructions
– etc.
XSL: XML Transformation
XSL
• The eXtensible Stylesheet Language
• Transforms XML into HTML
• Actually, transforms XML into a tree,
then turns that tree into another tree,
then outputs that tree as XML
XSL Architecture
[Diagram: XML Source and XSL Stylesheet feed into the XSL Processor, which produces HTML Output]
XML is a Tree
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
<meal name="breakfast">
<food>Scrambled Eggs</food>
<food>Hash Browns</food>
<drink>Orange Juice</drink>
</meal>
<meal name="snack">
<food>Chips</food>
</meal>
</menu>
[Diagram: the same document drawn as a tree. Root menu has two meal children; the first meal has attribute name = "breakfast" and children food ("Scrambled Eggs"), food ("Hash Browns"), and drink ("Orange Juice").]
XML Is A Tree
• Nodes
– Branch nodes contain children
– Leaf nodes contain content
• Attributes, Values, Entities, etc.
• DOM provides API-based access to
tree models
• XSL turns one tree into a different tree
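To make the tree view concrete, here is a sketch that parses the menu document and prints one indented line per node. It assumes Python's standard xml.etree.ElementTree, which is a lightweight tree API rather than the W3C DOM; the function name outline is made up for the example.

```python
import xml.etree.ElementTree as ET

MENU = """<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>"""

def outline(node, depth=0):
    """Return one indented line per element: tag, attributes, and leaf text."""
    line = "  " * depth + node.tag
    if node.attrib:
        line += " " + str(dict(node.attrib))
    text = (node.text or "").strip()
    if len(node) == 0 and text:          # leaf node: show its content
        line += ": " + text
    lines = [line]
    for child in node:                   # branch node: recurse into children
        lines.extend(outline(child, depth + 1))
    return lines

lines = outline(ET.fromstring(MENU))
print("\n".join(lines))
```

Branch nodes (menu, meal) contribute only their names and attributes; leaf nodes (food, drink) contribute their text content.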
Command Line Invocation
• Apache Xalan
java org.apache.xalan.xslt.Process
-IN faq.xml -XSL faq.xsl -OUT faq.html
• IBM LotusXSL
java com.lotus.xsl.xml4j.ProcessXSL
-in servletfaq.xml -xsl faq.xsl -out faq.html
• And so on…
Formatting Objects
• Forget about it for now
XSLT
• The meat of XSL
• Syntax for making XSL template files
• Pattern matching
• Output formatting
• Rules-based (like Prolog)
XPath
• The stuff inside the quotes in XSL
patterns
– "/person/name/firstname"
• A sensible way to locate content in an
XML document
• More straightforward than walking a
DOM tree or waiting for a SAX callback
XPath Syntax
• book/title
– title child of book child of current node
• /book/title
– title child of book child of document root
• @language
– language attribute of current node
• chapter/@language
– language attribute of chapter child of
current node
XPath Syntax (cont.)
• chapter[3]/para
– all the para children of the third chapter
• book/*/title
– all title children of all children of book (but
not of their children)
• chapter//para
– all para descendants of chapter,
at any depth
• ../../title
– title child of parent of parent
– parent::node()/parent::node()/child::title
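Several of these paths can be tried with Python's standard xml.etree.ElementTree, which implements only a subset of XPath. This is a sketch under that assumption: the library document is invented for the example, and because ElementTree does not accept @language as a path step, attribute values are read with get() instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
LIBRARY = """<library>
  <book>
    <title>XML Basics</title>
    <chapter language="en"><title>One</title><para>a</para></chapter>
    <chapter language="fr"><title>Deux</title><para>b</para></chapter>
    <chapter language="en"><title>Three</title>
      <section><para>c</para><para>d</para></section>
      <para>e</para>
    </chapter>
  </book>
</library>"""

library = ET.fromstring(LIBRARY)      # plays the role of the current node
book = library.find("book")

def texts(nodes):
    return [n.text for n in nodes]

book_titles = texts(library.findall("book/title"))       # book/title
chapter_titles = texts(library.findall("book/*/title"))  # book/*/title
third_paras = texts(book.findall("chapter[3]/para"))     # chapter[3]/para
all_paras = texts(book.findall("chapter//para"))         # chapter//para
# chapter/@language: read the attribute from each matched chapter
languages = [c.get("language") for c in book.findall("chapter")]

print(book_titles, chapter_titles, third_paras, all_paras, languages)
```

A full XPath 1.0 engine would also handle the absolute, parent, and attribute steps shown above; this subset is only meant to show how the relative paths select nodes.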
XPath Abbreviations
.     self::node()
..    parent::node()
//    /descendant-or-self::node()/
@     attribute::
XPath Functions
• para[1] or para[position()=1]
– the first para child of the current node
• para[last()]
– the last para child of the current node
• para[count(child::note)>0]
– all paragraphs with one or more notes
• id("abstract")
– selects the element whose unique ID is "abstract", like
<para id="abstract"> (id must be declared as type ID in the DTD)
• para[@type='secret'] or
para[attribute::type='secret']
– selects all child nodes like
<para type="secret">
XPath Functions (cont.)
• para[not(title)]
– selects all child paragraphs with no title elements
• para[position() >= 2 and position() < last()]
– selects all but the first and last paragraphs
• para[lang("en")]
– matches <para xml:lang="en-uk">…</para>
• note[contains(., "alex")]
– . is the string value of the note: its own text plus the
text of all its descendants
• note[starts-with(., "hello")]
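Python's standard ElementTree does not implement these XPath functions, so the sketch below spells out what a few of the predicates mean using plain Python filters. The sample document and the helper string_value are invented for the example; the point is the semantics, not a substitute for a real XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
DOC = """<doc>
  <para><title>Intro</title>first</para>
  <para>second</para>
  <para>third <note>thanks to <b>alex</b></note></para>
  <para>fourth</para>
</doc>"""

doc = ET.fromstring(DOC)
paras = doc.findall("para")

def string_value(elem):
    """Text of elem plus the text of all its descendants (what '.' compares)."""
    return "".join(elem.itertext())

# para[not(title)]: paragraphs with no title child
untitled = [p for p in paras if p.find("title") is None]

# para[position() >= 2 and position() < last()]: all but first and last
middle = paras[1:-1]

# para[count(child::note) > 0]: paragraphs with one or more notes
with_notes = [p for p in paras if len(p.findall("note")) > 0]

# note[contains(., "alex")], evaluated with the third para as current node
alex_notes = [n for n in paras[2].findall("note") if "alex" in string_value(n)]

print(len(untitled), len(middle), len(with_notes), len(alex_notes))
```

The last filter matches even though "alex" sits inside a nested b element, because the string value covers descendant text as well.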
XPath Disadvantages
• Not XML
– Not hierarchical
– New syntax rules
– Weird mix of /,[],(),*,:,::,.,..,@
• New function set
– Not Java
• Concepts like "axis" not always clear
XSLT Syntax
XSL Rules
• XSL is a series of rules or templates
• Each template matches an element
• Templates can contain XSL commands
XSL Commands: apply-templates
• Main rule: apply-templates
– looks for a template match
– applies it
• Usually the template calls apply-templates recursively on its children
• If not, then processing stops at that
node (but continues for its other
siblings that matched this template)
Default Rule
• For a leaf node, output its contents
• For a branch node, apply templates
(recursively) (including default rule)
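The interplay between apply-templates and the default rule can be sketched in a few lines of Python. This is a toy rule engine, not an XSLT processor: the names process, apply_templates, and the rule functions are invented, rules match on tag name only, and output text is not escaped.

```python
import xml.etree.ElementTree as ET

MENU = ('<menu><meal name="breakfast"><food>Scrambled Eggs</food>'
        '<food>Hash Browns</food><drink>Orange Juice</drink></meal></menu>')

def apply_templates(node, rules):
    """Process every child of node in document order and join the results."""
    parts = [node.text or ""]            # leaf content is output as-is
    for child in node:
        parts.append(process(child, rules))
        parts.append(child.tail or "")   # text that follows the child
    return "".join(parts)

def process(node, rules):
    """Run the rule matching node's tag, or fall back to the default rule."""
    rule = rules.get(node.tag)
    if rule is None:
        return apply_templates(node, rules)   # default rule: just recurse
    return rule(node, rules)

def meal_rule(node, rules):
    return "<H2>%s</H2><UL>%s</UL>" % (node.get("name"),
                                       apply_templates(node, rules))

def item_rule(node, rules):
    return "<LI>%s</LI>" % apply_templates(node, rules)

def silent_meal_rule(node, rules):
    return "[meal]"   # never calls apply_templates, so its children are skipped

root = ET.fromstring(MENU)
recursive = process(root, {"meal": meal_rule, "food": item_rule,
                           "drink": item_rule})
stopped = process(root, {"meal": silent_meal_rule, "food": item_rule})
defaults_only = process(root, {})
print(recursive)
print(stopped)
print(defaults_only)
```

With no rules at all, only the text survives, which is the default behavior described above. A rule that does not recurse cuts off processing below its node.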
Some XSL Commands
• value-of
– grabs raw value, good for text elements and
attributes
• if
– executes conditionally
• number
– counts position of element in group
– good for ordered list numbering, table of
contents, etc.
XSL Example
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY background "#99FFFF">
]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/REC-html40">
Example (cont.)
<xsl:template match="menu">
<HTML>
<HEAD>
<TITLE>Menu: <xsl:value-of select="@name"/>
</TITLE>
</HEAD>
<BODY BGCOLOR="&background;">
<H1> Menu <xsl:value-of select="@name"/> </H1>
[Note: Can reuse contents, unlike CSS]
Example (cont.)
<xsl:apply-templates />
</BODY>
</HTML>
</xsl:template>
Example (cont.)
<xsl:template match="meal">
<H2><xsl:value-of select="@name"/></H2><br />
<UL>
<xsl:apply-templates/>
</UL>
</xsl:template>
Example (cont.)
<xsl:template match="food">
<LI><xsl:apply-templates/></LI>
</xsl:template>
<xsl:template match="drink">
<LI><xsl:apply-templates/></LI>
</xsl:template>
</xsl:stylesheet>
Outputting Attributes
• From This:
– <link>
<name>Stinky</name>
<url>http://www.stinky.com/</url>
</link>
• We Want This:
– <A href="http://www.stinky.com/">Stinky</A>
Outputting Attributes
• The Hard Way:
– <xsl:element name="A">
<xsl:attribute name="href">
<xsl:value-of select="url" />
</xsl:attribute>
<xsl:value-of select="name" />
</xsl:element>
• The Easy Way:
– <A href="{url}">
<xsl:value-of select="name"/>
</A>
Copying Subtrees
• <xsl:template match="*|@*|text()">
<xsl:copy>
<xsl:apply-templates select="*|@*|text()"/>
</xsl:copy>
</xsl:template>
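• xsl:copy copies only the current node; the nested
apply-templates then recurses into its child elements,
attributes, and text, so the rule copies the whole subtree
• Without such a rule, the default rules output only text,
stripping all tags/attributes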
• Also copy-of
XSL conditionals: if
• <xsl:if test="author">
by
<xsl:apply-templates select="author" />
</xsl:if>
• Note: no else (?!?); use xsl:choose for if/else logic
XSL Conditionals: choose
• Case 1
– <link>
<name>Stinky</name>
<url>http://www.stinky.com/</url>
</link>
– <a href="http://www.stinky.com/">Stinky</a>
• Case 2
– <link>
<url>http://www.stinky.com/</url>
</link>
– <a href="http://www.stinky.com/">http://www.stinky.com/</a>
• Case 3
– <link>
<name>Stinky</name>
</link>
– Stinky
XSL Conditionals: choose
• <xsl:choose>
<xsl:when test="url">
<A href="{url}">
<xsl:choose>
<xsl:when test="name">
<xsl:value-of select="name" />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="url" />
</xsl:otherwise>
</xsl:choose>
</A>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="name" />
</xsl:otherwise>
</xsl:choose>
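For comparison, the same three-way decision can be written as ordinary branching. The sketch below uses Python's standard library; the function name render_link is invented, and the three inputs are the link elements from Cases 1 to 3.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def render_link(link):
    """Mirror the nested xsl:choose: prefer a hyperlink, fall back to the name."""
    name = link.findtext("name")   # None when there is no <name> child
    url = link.findtext("url")     # None when there is no <url> child
    if url is not None:            # outer xsl:when test="url"
        label = name if name is not None else url   # inner choose
        return "<A href=%s>%s</A>" % (quoteattr(url), escape(label))
    return escape(name or "")      # outer xsl:otherwise

case1 = render_link(ET.fromstring(
    "<link><name>Stinky</name><url>http://www.stinky.com/</url></link>"))
case2 = render_link(ET.fromstring(
    "<link><url>http://www.stinky.com/</url></link>"))
case3 = render_link(ET.fromstring("<link><name>Stinky</name></link>"))
print(case1)
print(case2)
print(case3)
```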
XSL Looping: for-each
• <xsl:for-each select="chapter">
<h2><xsl:value-of select="@title"/>
</h2>
</xsl:for-each>
• Functional overlap with apply-templates
– Difference in programming style
– Use it inside a given template rule
Template Modes
• Same element name, different context ->
different template, different output
• Can invoke apply-templates with a mode,
matches corresponding moded template
• <h1>Table of Contents</h1>
<ol>
<xsl:apply-templates select="chapter" mode="toc"/>
</ol>
• <xsl:template match="chapter" mode="toc">
<li><xsl:value-of select="@title"/></li>
</xsl:template>
• <xsl:template match="chapter">
<h1><xsl:value-of select="@title"/></h1>
<xsl:apply-templates/>
</xsl:template>
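One way to picture modes is as a second key in the rule lookup: the same element name selects a different rule depending on the mode passed in. The sketch below is a toy model in Python, not XSLT; the sample book, the RULES table, and the function names are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document
BOOK = '<book><chapter title="Trees"/><chapter title="Paths"/></book>'

def toc_entry(chapter):
    return "<li>%s</li>" % chapter.get("title")

def body_heading(chapter):
    return "<h1>%s</h1>" % chapter.get("title")

# Rules are keyed by (element name, mode); None stands for "no mode"
RULES = {
    ("chapter", "toc"): toc_entry,
    ("chapter", None): body_heading,
}

def apply_templates(nodes, mode=None):
    """Run the rule registered for each node's tag in the given mode."""
    return "".join(RULES[(node.tag, mode)](node) for node in nodes)

chapters = ET.fromstring(BOOK).findall("chapter")
page = "<h1>Table of Contents</h1><ol>%s</ol>%s" % (
    apply_templates(chapters, mode="toc"),   # first pass: list entries
    apply_templates(chapters),               # second pass: full headings
)
print(page)
```

The chapters are visited twice, once per mode, which is how a single source document can feed both a table of contents and the body.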
XSL vs. CSS
• Similar problem, different solutions
• CSS takes HTML and applies fonts,
styles, positions
• XSL takes any XML and turns it into
anything else
• XSL more powerful than CSS
– e.g. can use same content in multiple
places in result document
XSL Disadvantages
• Confusing syntax and semantics
– Like Prolog+C+XML
– It's really a programming language, but using
markup language syntax – yuck!
• Hard to debug
– XSL Trace helps a little
• Don't have full power of, say, Java inside
templates
– No database access, hashtables, methods, objects,
etc.
• Still need separate .xsl file for each client
device
Other XSL-Based Products
• LotusXSL
• Resin by Caucho
• Cocoon
• IBM XSL Trace
• Xalan (Apache)
• XT
• Lots more
Links: XML
• XML Spec
– http://www.w3.org/TR/REC-xml
• XML FAQ
– http://www.ucc.ie/xml/
• Café con Leche
– http://metalab.unc.edu/xml/
• XML.com
– http://www.xml.com/
• Servlet FAQ in XSL
– http://www.purpletech.com/servlet-faq/
References
• McLaughlin, "Java and XML", O'Reilly
• Eckstein, "XML Pocket Reference",
O'Reilly
• Harrold, "XML Bible"
• Bradley, "The XML Companion",
Addison-Wesley
Q&A