XML and XSL Overview
by Alex Chaffee
http://www.purpletech.com/
Purple Technology: Open source development
jGuru: Java online resource: FAQs, News, and other cool stuff

XML
• eXtensible Markup Language
• Replacement for HTML
• Metalanguage: used to create other languages
• Has become a universal data-exchange format

Advantages of XML
• Human-readable
• Machine-readable (easy to parse)
• Standard format for data interchange
• Possible to validate
• Extensible
  – can represent any data
  – can add new tags for new data formats
• Hierarchical structure (nesting)

Why not HTML?
• Browsers are too lenient
• Led to sloppy HTML code all over the Web
  – <imG src="foo.gif> is "legal" HTML
• Told HTML, "go to your room and don't come out until it's clean"
  – Out came XML

XML Searching and Agents
• An early motivation for XML
• Allows detailed queries of disparate data sources
  – Find the best price for a certain product
  – Search for properties with different real estate brokers
• HTML insufficient
  – Good for humans, bad for computers
  – Doesn't scale

XML Example
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
</menu>

XML Languages
• MML - musical scores
• CML - chemicals
• HRMML - Human Resource Management (???)
• MathML - equations
• RSS - web syndication

Tag vs. Element
• A tag is a name, enclosed by angle brackets, with optional attributes
  – <foo id="123">
• An element is a tree, containing an open tag, contents, and a close tag
  – <foo id="123">This is <bar>an element</bar></foo>

XML Syntax
• Tags properly nested
• Tag names case-sensitive
• All tags must be closed
  – or self-closing
  – <foo/> is the same as <foo></foo>
• Attribute values enclosed in quotes
• Document consists of a single (root) element
• A few other details

Well-Formed vs. Valid
• Well-Formed:
  – Structure follows XML syntax rules
• Valid:
  – Well-formed, and structure conforms to a DTD

DTD
• Document Type Definition
• A grammar for XML documents
• Defines
  – which elements can contain which other elements
  – which attributes are allowed or required on which elements

DTD and Data Exchange
• Both sides must agree on the DTD ahead of time
• DTD can be part of the document or stored separately

DTD Example
<?xml encoding="US-ASCII"?>
<!ELEMENT menu (meal)*>
<!ATTLIST menu name CDATA #IMPLIED>
<!ELEMENT meal (food|drink)*>
<!ATTLIST meal name CDATA #REQUIRED>
<!ELEMENT food (#PCDATA)*>
<!ELEMENT drink (#PCDATA)*>

Why isn't a DTD in XML?
• It will be someday: XSchema (standardized as W3C XML Schema)

XML Namespaces
• A single document can use multiple DTDs
• But! Two DTDs can use the same element name with different rules
• Solution: Namespaces
• Prefix each tag name with a namespace prefix, which is bound to a namespace URI
  – e.g. <xsl:apply-templates select="."/>

Entities
• Macros / constants
• Values defined once, used throughout the document
<!DOCTYPE foo SYSTEM "foo.dtd" [
  <!ENTITY background "#99FFFF">
]>
<BODY BGCOLOR="&background;">

SML / Minimal XML
• Simplified Markup Language
• Subset of XML, but stripped down
• Easier to understand, parse
• No
  – DTDs
  – Attributes
  – Processing instructions
  – etc.

XSL: XML Transformation

XSL
• The eXtensible Stylesheet Language
• Transforms XML into HTML
• Actually, transforms XML into a tree, then turns that tree into another tree, then outputs that tree (as XML, HTML, or text)

XSL Architecture
• XML Source + XSL Stylesheet → XSL Processor → HTML Output

XML is a Tree
<?xml version="1.0"?>
<!DOCTYPE menu SYSTEM "menu.dtd">
<menu>
  <meal name="breakfast">
    <food>Scrambled Eggs</food>
    <food>Hash Browns</food>
    <drink>Orange Juice</drink>
  </meal>
  <meal name="snack">
    <food>Chips</food>
  </meal>
</menu>
[Diagram: the same document drawn as a tree: menu at the root with two meal children; the breakfast meal carries name = "breakfast" and has food, food, drink children, whose text leaves are "Scrambled Eggs", "Hash Browns", "Orange Juice"]

XML Is A Tree
• Nodes
  – Branch nodes contain children
  – Leaf nodes contain content
• Attributes, Values, Entities, etc.
• DOM provides API-based access to tree models
• XSL turns one tree into a different tree

Command Line Invocation
• Apache Xalan
  – java org.apache.xalan.xslt.Process -IN faq.xml -XSL faq.xsl -OUT faq.html
• IBM LotusXSL
  – java com.lotus.xsl.xml4j.ProcessXSL -in servletfaq.xml -xsl faq.xsl -out faq.html
• And so on…

Formatting Objects
• Forget about it for now

XSLT
• The meat of XSL
• Syntax for making XSL template files
• Pattern matching
• Output formatting
• Rules-based (like Prolog)

XPath
• The stuff inside the quotes in XSL patterns
  – "/person/name/firstname"
• A sensible way to locate content in an XML document
• More straightforward than walking a DOM tree or waiting for a SAX callback

XPath Syntax
• book/title
  – title child of book child of current node
• /book/title
  – title child of book child of document root
• @language
  – language attribute of current node
• chapter/@language
  – language attribute of chapter child of current node

XPath Syntax (cont.)
• chapter[3]/para
  – all the para children of the third chapter
• book/*/title
  – all title children of all children of book (but not of their children)
• chapter//para
  – all para descendants of chapter, at any depth
• ../../title
  – title child of parent of parent
  – parent::node()/parent::node()/child::title

XPath Abbreviations
• .  → self::node()
• .. → parent::node()
• // → /descendant-or-self::node()/
• @  → attribute::

XPath Functions
• para[1] or para[position()=1]
  – the first para child of the current node
• para[last()]
  – the last para child of the current node
• para[count(child::note)>0]
  – all para children with one or more note children
• id("abstract")
  – selects the element whose ID-type attribute (declared in the DTD) is "abstract", e.g. <para id="abstract">
  – to match a plain attribute without a DTD, use para[@id='abstract']
• para[@type='secret'] or para[attribute::type='secret']
  – selects all para children like <para type="secret">

XPath Functions (cont.)
• para[not(title)]
  – selects all para children with no title children
• para[position() >= 2 and position() < last()]
  – selects all but the first and last para children
• para[lang("en")]
  – matches <para xml:lang="en-uk">…</para>
• note[contains(., "alex")]
  – "." is the context node; its string value includes all descendant text, so children's content is tested too, recursively
• note[starts-with(., "hello")]
  – note children whose text begins with "hello"

XPath Disadvantages
• Not XML
  – Not hierarchical
  – New syntax rules
  – Weird mix of /, [], (), *, :, ::, ., .., @
• New function set
  – Not Java
• Concepts like "axis" not always clear

XSLT Syntax

XSL Rules
• XSL is a series of rules or templates
• Each template matches an element
• Templates can contain XSL commands and literal output markup

XSL Commands: apply-templates
• Main rule: apply-templates
  – looks for a template match
  – applies it
• Usually the template calls apply-templates recursively on its children
• If not, then processing stops at that node (but continues for its other siblings that matched this template)

Default Rule
• For a leaf (text) node, output its contents
• For a branch (element) node, apply templates to its children, recursively (the default rule applies again wherever no other template matches)

Some XSL Commands
• value-of
  – grabs the raw value; good for text elements and attributes
• if
  – executes conditionally
• number
  – counts the position of an element in its group
  – good for ordered list numbering, table of contents, etc.

XSL Example
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY background "#99FFFF">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="menu">
    <HTML>
      <HEAD>
        <TITLE>Menu: <xsl:value-of select="@name"/></TITLE>
      </HEAD>
      <BODY BGCOLOR="&background;">
        <H1>Menu <xsl:value-of select="@name"/></H1>
        <!-- Note: can reuse contents (@name appears twice), unlike CSS -->
        <xsl:apply-templates/>
      </BODY>
    </HTML>
  </xsl:template>

  <xsl:template match="meal">
    <H2><xsl:value-of select="@name"/></H2><br/>
    <UL>
      <xsl:apply-templates/>
    </UL>
  </xsl:template>
  <xsl:template match="food">
    <LI><xsl:apply-templates/></LI>
  </xsl:template>

  <xsl:template match="drink">
    <LI><xsl:apply-templates/></LI>
  </xsl:template>
</xsl:stylesheet>

Outputting Attributes
• From This:
  – <link>
      <name>Stinky</name>
      <url>http://www.stinky.com/</url>
    </link>
• We Want This:
  – <A href="http://www.stinky.com/">Stinky</A>

Outputting Attributes (cont.)
• The Hard Way:
  – <xsl:element name="A">
      <xsl:attribute name="href">
        <xsl:value-of select="url"/>
      </xsl:attribute>
      <xsl:value-of select="name"/>
    </xsl:element>
• The Easy Way (attribute value template):
  – <A href="{url}"><xsl:value-of select="name"/></A>

Copying Subtrees
• <xsl:template match="*|@*|text()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>
• This is the identity transform: it matches every element, attribute, and text node, copies that node (without its contents), then applies itself to the node's attributes and children
• The default rules strip all tags/attributes and output only text
• Also copy-of: copies a whole subtree in one step

XSL Conditionals: if
• <xsl:if test="author">
    by <xsl:apply-templates select="author"/>
  </xsl:if>
• Note: no else (?!?); use xsl:choose for if/else logic

XSL Conditionals: choose
• Case 1
  – <link>
      <name>Stinky</name>
      <url>http://www.stinky.com/</url>
    </link>
  – <a href="http://www.stinky.com/">Stinky</a>
• Case 2
  – <link>
      <url>http://www.stinky.com/</url>
    </link>
  – <a href="http://www.stinky.com/">http://www.stinky.com/</a>
• Case 3
  – <link>
      <name>Stinky</name>
    </link>
  – Stinky

XSL Conditionals: choose (cont.)
• <xsl:choose>
    <xsl:when test="url">
      <A href="{url}">
        <xsl:choose>
          <xsl:when test="name">
            <xsl:value-of select="name"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="url"/>
          </xsl:otherwise>
        </xsl:choose>
      </A>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="name"/>
    </xsl:otherwise>
  </xsl:choose>

XSL Looping: for-each
• <xsl:for-each select="chapter">
    <h2><xsl:value-of select="@title"/></h2>
  </xsl:for-each>
• Functional overlap with apply-templates
  – Difference in programming style
  – Use it inside a given template rule

Template Modes
• Same element name, different context → different template, different output
• Can invoke apply-templates with a mode; it then matches only templates declared with that mode
• <h1>Table of Contents</h1>
  <ol>
    <xsl:apply-templates select="chapter" mode="toc"/>
  </ol>
• <xsl:template match="chapter" mode="toc">
    <li><xsl:value-of select="@title"/></li>
  </xsl:template>
• <xsl:template match="chapter">
    <h1><xsl:value-of select="@title"/></h1>
    <xsl:apply-templates/>
  </xsl:template>

XSL vs. CSS
• Similar problem, different solutions
• CSS takes HTML and applies fonts, styles, positions
• XSL takes any XML and turns it into anything else
• XSL more powerful than CSS
  – e.g. can use the same content in multiple places in the result document

XSL Disadvantages
• Confusing syntax and semantics
  – Like Prolog+C+XML
  – It's really a programming language, but using markup language syntax
  – yuck!
• Hard to debug
  – XSL Trace helps a little
• Don't have the full power of, say, Java inside templates
  – No database access, hashtables, methods, objects, etc.
• Still need a separate .xsl file for each client device

Other XSL-Based Products
• LotusXSL
• Resin by Caucho
• Cocoon
• IBM XSL Trace
• Xalan (Apache)
• XT
• Lots more

Links: XML
• XML Spec
  – http://www.w3.org/TR/REC-xml
• XML FAQ
  – http://www.ucc.ie/xml/
• Café con Leche
  – http://metalab.unc.edu/xml/
• XML.com
  – http://www.xml.com/
• Servlet FAQ in XSL
  – http://www.purpletech.com/servlet-faq/

References
• McLaughlin, "Java and XML", O'Reilly
• Eckstein, "XML Pocket Reference", O'Reilly
• Harold, "XML Bible"
• Bradley, "The XML Companion", Addison-Wesley

Q&A
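The "Well-Formed vs. Valid" and "DTD Example" slides can be tried with the JDK's built-in JAXP parser. This is a minimal sketch, not part of the original deck: the class MenuValidationDemo and its check method are hypothetical names, and the menu DTD is embedded as an internal subset (the optional name attribute on menu is declared #IMPLIED). A validating DocumentBuilder reports DTD violations as recoverable errors and syntax violations as fatal errors, which mirrors the valid / well-formed distinction.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: classifies a menu document as malformed, invalid, or valid.
class MenuValidationDemo {
    // The menu DTD as an internal subset; menu's optional name attribute is #IMPLIED.
    static final String DOCTYPE =
          "<!DOCTYPE menu [\n"
        + "  <!ELEMENT menu (meal)*>\n"
        + "  <!ATTLIST menu name CDATA #IMPLIED>\n"
        + "  <!ELEMENT meal (food|drink)*>\n"
        + "  <!ATTLIST meal name CDATA #REQUIRED>\n"
        + "  <!ELEMENT food (#PCDATA)>\n"
        + "  <!ELEMENT drink (#PCDATA)>\n"
        + "]>\n";

    /** Returns "malformed", "invalid", or "valid" for the given root element markup. */
    static String check(String body) {
        final boolean[] invalid = {false};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check against the DOCTYPE's DTD, not just syntax
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // DTD violations arrive as recoverable errors: note them and keep parsing.
                @Override public void error(SAXParseException e) { invalid[0] = true; }
                // XML syntax violations are fatal: abort the parse.
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (SAXException e) {
            return "malformed";
        } catch (Exception e) { // ParserConfigurationException, IOException
            throw new RuntimeException(e);
        }
        return invalid[0] ? "invalid" : "valid";
    }

    public static void main(String[] args) {
        System.out.println(check("<menu><meal name=\"breakfast\"><food>Eggs</food></meal></menu>"));
        System.out.println(check("<menu><meal><food>Eggs</food></meal></menu>"));          // no name
        System.out.println(check("<menu><meal name=\"snack\"><food>Chips</meal></menu>")); // bad nesting
    }
}
```

Running main should print valid, invalid, and malformed: the second document is well-formed but omits the #REQUIRED name attribute, while the third is not even well-formed.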
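The XPath slides describe path expressions, predicates, and functions. The JDK's javax.xml.xpath package evaluates XPath 1.0 expressions directly, without walking a DOM tree by hand. This is a minimal sketch, assuming the two-meal menu document from the "XML is a Tree" slide; the class MenuXPathDemo and its string and count helpers are hypothetical names for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates XPath 1.0 expressions against the two-meal menu.
class MenuXPathDemo {
    static final String MENU =
          "<menu>"
        + "<meal name=\"breakfast\">"
        + "<food>Scrambled Eggs</food><food>Hash Browns</food><drink>Orange Juice</drink>"
        + "</meal>"
        + "<meal name=\"snack\"><food>Chips</food></meal>"
        + "</menu>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** String value of an expression, e.g. an attribute value or an element's text. */
    static String string(String expr) {
        try {
            return XPATH.evaluate(expr, new InputSource(new StringReader(MENU)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expr, e);
        }
    }

    /** Number of nodes selected by an expression. */
    static int count(String expr) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(
                expr, new InputSource(new StringReader(MENU)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(string("/menu/meal[1]/@name"));                // attribute of first meal
        System.out.println(string("/menu/meal[1]/food[last()]"));         // last food of breakfast
        System.out.println(count("//food"));                              // all food descendants
        System.out.println(count("/menu/meal[count(child::drink) > 0]")); // meals with a drink
        System.out.println(count("//meal[contains(., 'Eggs')]"));         // "." includes child text
        System.out.println(string("/menu/meal[food = 'Chips']/@name"));   // meal that serves Chips
    }
}
```

Each call re-parses the small document for simplicity; real code would more likely parse once into a DOM Document and pass that as the evaluation context.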
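The command-line invocations shown earlier have a programmatic equivalent: the JDK's javax.xml.transform API wraps a built-in XSLT 1.0 processor. This sketch assumes a trimmed version of the menu stylesheet from the XSL Example, written against the XSLT 1.0 namespace (http://www.w3.org/1999/XSL/Transform), with the background color inlined instead of using an entity and with food and drink handled by one template; MenuTransformDemo, transform, and render are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: runs a trimmed copy of the menu stylesheet with the JDK's XSLT processor.
class MenuTransformDemo {
    static final String MENU =
          "<menu>"
        + "<meal name=\"breakfast\">"
        + "<food>Scrambled Eggs</food><food>Hash Browns</food><drink>Orange Juice</drink>"
        + "</meal>"
        + "<meal name=\"snack\"><food>Chips</food></meal>"
        + "</menu>";

    // Whitespace-only text between stylesheet elements is ignored by XSLT.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "  <xsl:output method=\"html\" indent=\"no\"/>"
        + "  <xsl:template match=\"menu\">"
        + "    <HTML><HEAD><TITLE>Menu</TITLE></HEAD>"
        + "    <BODY BGCOLOR=\"#99FFFF\"><xsl:apply-templates/></BODY></HTML>"
        + "  </xsl:template>"
        + "  <xsl:template match=\"meal\">"
        + "    <H2><xsl:value-of select=\"@name\"/></H2>"
        + "    <UL><xsl:apply-templates/></UL>"
        + "  </xsl:template>"
        // One template for both leaf element types (the slides use two identical ones).
        + "  <xsl:template match=\"food|drink\">"
        + "    <LI><xsl:apply-templates/></LI>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML document, both given as strings. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) { // includes stylesheet compile errors
            throw new IllegalStateException(e);
        }
    }

    /** The menu rendered as HTML by the stylesheet above. */
    static String render() {
        return transform(MENU, STYLESHEET);
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Setting indent="no" keeps the serialized HTML compact; with method="html" the processor may also add a META content-type tag inside HEAD, so the exact output text can vary slightly between JDK versions.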