A Practical Introduction to XML in Libraries

Marty Kurth
NYLA
October 22, 2004
What we’ll cover
A functional overview of XML
One use of XML at Cornell
How our MARC to XML converter works
What a Dublin Core XML record looks
like to our users
Concluding thoughts about XML
opportunities
A functional overview of XML
(many thanks to David Ruddy
for the source content of this section)
XML = Extensible Markup Language
A markup language gives meaning to special
characters or character sequences, a.k.a. markup
delimiters
In XML, markup delimiters form rules for content
designation (hold that thought!)
In XML, markup delimiters have no inherent meaning
(allowing them to serve as a flexible, extensible
metalanguage)
XML uses plain text, is non-proprietary, and is
platform and software independent
HTML versus XML
HTML
Procedural markup
Rules govern display (fonts, layout)
Doesn’t understand content
XML
Structural markup
Rules establish relationships among content components
Doesn’t control display
A brief detour into metadata:
Two ways to designate content
In MARC: 245 04 $a The Big heat
In XML:
<title>Big heat</title>
<name>value</name>
In XML the name-value pair
comprises an element
An element has these parts:



Start tag
Element content
End tag
<tag>content</tag>
<subject>Goldfinches</subject>
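As a minimal sketch of these three parts, the snippet below parses the goldfinch example with Python's standard-library xml.etree.ElementTree and reads back the name and the content. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# One element: start tag, element content, end tag.
element = ET.fromstring("<subject>Goldfinches</subject>")

name = element.tag    # the name shared by the start tag and end tag
value = element.text  # the element content between the two tags

print(f"{name} = {value}")
```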
Element rules and features
Elements can hold data
<pubPlace>Boston</pubPlace>
Elements can hold other elements ad infinitum
<sourceDesc>
<biblFull>
<titleStmt>
<title>A letter to Orestes A. Brownson</title>
<author>Hildreth, Richard, 1807-1865.</author>
</titleStmt>
</biblFull>
</sourceDesc>
Elements must be “properly” nested
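To illustrate what "properly" nested means, here is a small sketch that asks Python's standard-library parser whether a fragment is well-formed. The helper name is_well_formed and the two sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested: <title> closes before <titleStmt> closes.
nested = "<titleStmt><title>A letter</title></titleStmt>"

# Improperly nested: the two end tags overlap.
overlapping = "<titleStmt><title>A letter</titleStmt></title>"

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```

An element that opens inside another element has to close inside it too; the parser rejects the second string for that reason.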
A quick look at other XML components
Attributes qualify elements
<note type="500">Caption title.</note>
Document Type Definitions (DTDs) control the
structure of XML documents
<!ELEMENT note (#PCDATA)>
<!ATTLIST note type CDATA #IMPLIED>
XML Schemas give more control than DTDs
<xs:element ref="note" />
Extensible Stylesheet Language Transformations
(XSLT) stylesheets transform one XML document
into another (or into HTML)
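As a rough sketch of how an attribute qualifies an element, the snippet below reads the type attribute from the note example using Python's standard library. It only parses the markup; checking a document against a DTD or an XML Schema needs a validating parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

typed_note = ET.fromstring('<note type="500">Caption title.</note>')
untyped_note = ET.fromstring("<note>Caption title.</note>")

# Attributes are looked up by name on the element they qualify.
note_type = typed_note.get("type")

# The DTD line above declares type as #IMPLIED (optional),
# so a note without it is still acceptable; get() then returns None.
missing_type = untyped_note.get("type")

print(note_type, missing_type)
```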
What does XML allow us to do?
Structure data with a flexible and
extensible set of rules
Share data in a non-proprietary format,
especially among “incompatible” systems
Reuse data, e.g., in different presentation
formats for different purposes
One use of XML at Cornell
A local reason for moving
MARC data to XML
CUL decided to use ENCompass for
access to networked resources
ENCompass requires XML records
Our records for e-resources are in
MARC, so we needed to get them into
XML
Using MARCXML
MARCXML is lossless—it preserves the
richness of the MARC record in XML
LC offers a toolkit for converting MARC
to MARCXML at
http://www.loc.gov/standards/marcxml/
MARCXML can serve as a “bus”
between MARC and other XML formats
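To give a feel for "lossless", here is a hedged sketch of how the 245 field from the earlier slide might appear in MARCXML, parsed with Python's standard library. The record is hand-written for illustration, not output from LC's toolkit; it assumes the MARC21 slim namespace that the MARCXML schema uses.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written illustration of 245 04 $a The Big heat in MARCXML.
marcxml = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="0" ind2="4">
    <subfield code="a">The Big heat</subfield>
  </datafield>
</record>
""".strip()

record = ET.fromstring(marcxml)
field = record.find("marc:datafield[@tag='245']", MARC_NS)
subfield_a = field.find("marc:subfield[@code='a']", MARC_NS)

# Tag, indicators, subfield code, and data all survive the trip into XML.
marc_line = (
    f"{field.get('tag')} {field.get('ind1')}{field.get('ind2')} "
    f"${subfield_a.get('code')} {subfield_a.text}"
)
print(marc_line)
```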
The MARCXML “bus”
Adapting MARCXML tools
We implemented LC’s converter to
convert MARC to Dublin Core in XML
We created a Web interface for systemwide access
We extended LC’s Dublin Core XSLT
stylesheet
How our MARC to XML
converter works
Start with a MARC record
Import it into the converter
<xsl:for-each select="marc:datafield[@tag=245]">
  <!-- Join the title subfields, separated by single spaces -->
  <xsl:variable name="title">
    <xsl:value-of select="marc:subfield[@code='a']"/><xsl:text> </xsl:text>
    <xsl:value-of select="marc:subfield[@code='b']"/><xsl:text> </xsl:text>
    <xsl:value-of select="marc:subfield[@code='f']"/><xsl:text> </xsl:text>
    <xsl:value-of select="marc:subfield[@code='g']"/><xsl:text> </xsl:text>
    <xsl:value-of select="marc:subfield[@code='p']"/>
  </xsl:variable>
  <xsl:variable name="cleanTitle">
    <xsl:call-template name="clean">
      <xsl:with-param name="toClean" select="normalize-space($title)" />
    </xsl:call-template>
  </xsl:variable>
  <xsl:choose>
    <!-- The second indicator of field 245 counts the nonfiling characters to skip -->
    <xsl:when test="@ind2 &gt; 0">
      <title>
        <xsl:call-template name="uppercase">
          <xsl:with-param name="toUppercase" select="substring($cleanTitle, @ind2 + 1)" />
        </xsl:call-template>
      </title>
    </xsl:when>
    <xsl:otherwise>
      <title><xsl:value-of select="$cleanTitle" /></title>
    </xsl:otherwise>
  </xsl:choose>
</xsl:for-each>
The converter applies our
DC stylesheet
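For readers who do not read XSLT, the title logic of the stylesheet excerpt can be approximated in Python. This is a sketch under stated assumptions, not the converter itself. The "clean" and "uppercase" templates are not shown in the slides, so clean is a do-nothing placeholder here and uppercase_first is assumed to capitalize only the first character (which matches the "Big heat" example earlier).

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
TITLE_SUBFIELDS = ["a", "b", "f", "g", "p"]  # the codes the stylesheet selects

def clean(text):
    # Placeholder: the stylesheet's "clean" template is not shown in the slides.
    return text

def uppercase_first(text):
    # Assumption: the "uppercase" template capitalizes the first character only.
    return text[:1].upper() + text[1:]

def dc_title(field):
    """Build a Dublin Core title string from one MARC 245 datafield element."""
    parts = []
    for code in TITLE_SUBFIELDS:
        sub = field.find(f"marc:subfield[@code='{code}']", MARC_NS)
        parts.append((sub.text or "") if sub is not None else "")
    # Like normalize-space(): collapse runs of whitespace and trim the ends.
    title = clean(" ".join(" ".join(parts).split()))
    ind2 = (field.get("ind2") or "").strip()
    nonfiling = int(ind2) if ind2.isdigit() else 0
    if nonfiling > 0:
        # XPath substring($s, n + 1) drops the first n characters.
        return uppercase_first(title[nonfiling:])
    return title

field = ET.fromstring(
    '<datafield xmlns="http://www.loc.gov/MARC21/slim" tag="245" ind1="0" ind2="4">'
    '<subfield code="a">The Big heat</subfield></datafield>'
)
print(dc_title(field))
```

With a second indicator of 4, the first four characters ("The ") are dropped before the title is written out.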
And outputs a Dublin Core XML record
<QDCrecord>
<title>Harmonized tariff schedule of the United States </title>
<alternative>HTS</alternative>
<creator>United States.</creator>
<contributor>United States International Trade Commission. Office of Tariff
Affairs and Trade Agreements.</contributor>
<type>Full text</type>
<publisher>The Commission :</publisher>
<date>[1987-</date>
<description>HTSA provides the applicable tariff rates and statistical categories
for all merchandise imported into the United States; it is based on the international
Harmonized System, the global classification system that is used to describe most
world trade in goods.</description>
<subject type="LCSH">Tariff--Law and legislation--United States--Periodicals.</subject>
<subject>Education</subject>
</QDCrecord>
What a DC XML record looks
like to our users
The DC XML record is in our
Find Databases system
Users can view the DC record
in a labeled display
The DC XML is behind
the labeled display
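The relationship between the DC XML and a labeled display can be sketched in a few lines of Python. This is only an illustration: the LABELS mapping is hypothetical (the labels used in Find Databases are not given in these slides), and the record is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical element-to-label mapping, for illustration only.
LABELS = {
    "title": "Title",
    "alternative": "Other title",
    "creator": "Creator",
    "subject": "Subject",
}

dc_xml = """
<QDCrecord>
  <title>Harmonized tariff schedule of the United States</title>
  <alternative>HTS</alternative>
  <creator>United States.</creator>
  <subject>Education</subject>
</QDCrecord>
""".strip()

def labeled_display(xml_text):
    """Return one 'Label: value' line per element in the record."""
    record = ET.fromstring(xml_text)
    lines = []
    for element in record:
        # Fall back to the element name when no label is defined.
        label = LABELS.get(element.tag, element.tag)
        lines.append(f"{label}: {(element.text or '').strip()}")
    return lines

for line in labeled_display(dc_xml):
    print(line)
```

Because the labels live outside the record, the same XML could feed a different display simply by swapping the mapping.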
Concluding thoughts about
XML opportunities
When XML knocks on your door:



You can pick up XML encoding quickly
With a little up-front IT time and XSLT
skills, you can convert MARC to XML
With XSLT skills, you can modify user
displays in XML-based delivery systems