XML Transformation

Transforming XML Part I
Document Navigation with XPath
John Arnett, MSc
Standards Modeller
Information and Statistics Division
NHSScotland
Tel: 0131 551 8073 (x2073)
mailto:[email protected]
http://isdscotland.org/xml
Contents
• Document Processing
• Nodes and Trees
• XPath: Locating Nodes
• Summary
• Find Out More
Document Processing
• The XSL Family
– XPath (XML Path Language)
– XSLT (Extensible Stylesheet Language Transformations)
– XSL, also known as XSL-FO (Extensible Stylesheet Language Formatting Objects)
• XPath
– Used to locate specific parts of an XML source document
– XPath v1.0 is a W3C Recommendation
• XSLT
– Used to transform XML documents into another XML or non-XML form, especially HTML
– XSLT v1.0 is a W3C Recommendation
• v2.0 is a W3C Working Draft
– Uses XPath to locate document content
• XSL(-FO)
– Used to format XML documents into fixed-size folios for publication
– XSL v1.0 is a W3C Recommendation
– Uses XSLT for document transformation
• Transformation and formatting
– XPath is used to locate nodes for input
– XSLT is used to transform the input and generate a result tree
– XSL-FO is used to format the output document
[Diagram: Source Tree → Transform → Result Tree → Format → Formatted Output; the Stylesheet Processor applies the Style sheet]
Adapted from XSLT Basics slide presentation by Paul Spencer, alphaXML
Trees and Nodes
• An XML document is viewed as a source tree containing different node types
– root
– element
– text
– attribute
– namespace
– processing instruction
– comment
• XML Appointment example
<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>
• XPath view of Appointment source tree
root
  comment: radiology appointment
  element: Appointment
    attribute: deptCode = RADIO
    element: Patient
      attribute: upi = ABC-123-456
      text: John Smith
      element: PhoneNo
        text: 0141 662 2673
    element: Clinician
      text: Alison Young
    element: Slot
      attribute: attendDate = 2003-08-05
      element: StartTime
        text: 14:30:00
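The node tree above can be reproduced programmatically. The following is a minimal sketch, assuming Python's standard `xml.dom.minidom` parser as a stand-in for an XPath processor; the helper name `describe` is hypothetical, and whitespace-only text nodes are skipped so that the listing matches the tree shown above.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

APPOINTMENT_XML = """<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>"""


def describe(node, depth=0, lines=None):
    """Collect one indented line per node (hypothetical helper)."""
    if lines is None:
        lines = []
    pad = "  " * depth
    if node.nodeType == Node.DOCUMENT_NODE:
        lines.append(pad + "root")
    elif node.nodeType == Node.COMMENT_NODE:
        lines.append(f"{pad}comment: {node.data.strip()}")
    elif node.nodeType == Node.ELEMENT_NODE:
        lines.append(f"{pad}element: {node.tagName}")
        # Attributes belong to the element but are not its children.
        for name, value in sorted(node.attributes.items()):
            lines.append(f"{pad}  attribute: {name} = {value}")
    elif node.nodeType == Node.TEXT_NODE:
        if not node.data.strip():
            return lines  # skip whitespace-only text (indentation)
        lines.append(f"{pad}text: {node.data.strip()}")
    for child in node.childNodes:
        describe(child, depth + 1, lines)
    return lines


tree_lines = describe(minidom.parseString(APPOINTMENT_XML))
print("\n".join(tree_lines))
```

Note that the comment is a child of the root node, not of the Appointment element, which is why the root and the document element are distinct nodes.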
XPath: Locating Nodes
• XPath expressions (location paths)
– Used to navigate the source tree and locate nodes for input
– Composed of one or more location steps
• axis + node test + (optional) predicate
– May contain functions, e.g.
• position(), count(node-set), last()
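To make the anatomy of a location step concrete, here is a rough sketch that splits an unabbreviated location path into its steps and each step into axis, node test and optional predicate. The regular expression and the function name `split_step` are illustrative assumptions, not part of any XPath library, and the sketch does not handle predicates that themselves contain a slash.

```python
import re

# axis:: (optional), then the node test, then an optional [predicate]
STEP = re.compile(
    r"(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?:\[(?P<predicate>.+)\])?"
)


def split_step(step):
    """Return (axis, node_test, predicate) for one location step."""
    match = STEP.fullmatch(step)
    if match is None:
        raise ValueError(f"not a simple location step: {step!r}")
    # When no axis is written, XPath assumes the child axis.
    axis = match.group("axis") or "child"
    return axis, match.group("test"), match.group("predicate")


path = "child::Appointment/child::Slot[position()=1]/attribute::attendDate"
steps = [split_step(step) for step in path.split("/")]
for axis, node_test, predicate in steps:
    print(axis, node_test, predicate)
```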
• Axes
– Specify node locations relative to the context (current) node, which is itself selected by the self axis
– May traverse the tree forwards or backwards
• Axes for forward traversal
– child
– attribute
– descendant-or-self
– descendant
– following
– following-sibling
– namespace
• Axes for reverse traversal
– parent
– ancestor
– ancestor-or-self
– preceding
– preceding-sibling
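Several of these axes can be emulated in a few lines. This is a sketch only, assuming Python's standard `xml.etree.ElementTree`, which models elements but not the root, comment or attribute nodes; the helper names (`ancestors`, `descendants`, `following_siblings`, `preceding_siblings`) are hypothetical. Reverse axes return the nearest node first, mirroring XPath's reverse document order.

```python
import xml.etree.ElementTree as ET

APPOINTMENT_XML = """<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>"""

appointment = ET.fromstring(APPOINTMENT_XML)

# ElementTree elements do not know their parent, so build a lookup once.
parent_of = {child: parent for parent in appointment.iter() for child in parent}


def ancestors(elem):
    """ancestor axis (reverse): parent first, then grandparent, and so on."""
    found = []
    while elem in parent_of:
        elem = parent_of[elem]
        found.append(elem)
    return found


def descendants(elem):
    """descendant axis (forward): all elements below elem, document order."""
    return [node for node in elem.iter() if node is not elem]


def following_siblings(elem):
    """following-sibling axis (forward)."""
    siblings = list(parent_of[elem])
    return siblings[siblings.index(elem) + 1:]


def preceding_siblings(elem):
    """preceding-sibling axis (reverse): nearest sibling first."""
    siblings = list(parent_of[elem])
    return siblings[:siblings.index(elem)][::-1]


clinician = appointment.find("Clinician")
start_time = appointment.find("Slot/StartTime")
print([e.tag for e in ancestors(start_time)])
print([e.tag for e in descendants(appointment)])
print([e.tag for e in following_siblings(clinician)])
print([e.tag for e in preceding_siblings(clinician)])
```

In full XPath the ancestor axis would also reach the root node above Appointment; that node has no counterpart in ElementTree.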
• Node Tests
– Refine node set selection
• * = select all nodes of the axis's principal node type (elements on most axes, attributes on the attribute axis)
• node() = select all nodes of any type
• text(), comment() or processing-instruction() = select all nodes of that type
• a name = select all nodes with the specified name
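The effect of each node test on the child axis can be sketched by filtering a node's children by type and name. The example assumes Python's standard `xml.dom.minidom`; the helper `children` is hypothetical and stands in for a real XPath engine.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

APPOINTMENT_XML = """<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>"""

doc = minidom.parseString(APPOINTMENT_XML)
patient = doc.getElementsByTagName("Patient")[0]


def children(node, node_type=None, name=None):
    """Child axis plus a node test; node_type=None behaves like node()."""
    return [
        child
        for child in node.childNodes
        if (node_type is None or child.nodeType == node_type)
        and (name is None or child.nodeName == name)
    ]


any_nodes = children(patient)                                # child::node()
elements = children(patient, Node.ELEMENT_NODE)              # child::*
texts = children(patient, Node.TEXT_NODE)                    # child::text()
phone = children(patient, Node.ELEMENT_NODE, "PhoneNo")      # child::PhoneNo
comments = children(doc, Node.COMMENT_NODE)                  # child::comment() from the root

print(len(any_nodes), len(elements), len(texts), len(phone), len(comments))
```

The text() result includes the whitespace-only text node after PhoneNo, because indentation in the source document is text as well.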
• Select Clinician element node (context node: root)
child::Appointment/child::Clinician or
Appointment/Clinician
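The Clinician selection can be tried out with Python's standard `xml.etree.ElementTree`, which implements a subset of abbreviated XPath. As an assumption of this sketch, the path is evaluated from the Appointment document element rather than from the root node, so the leading `Appointment` step is implicit.

```python
import xml.etree.ElementTree as ET

APPOINTMENT_XML = """<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>"""

# fromstring() returns the document element (Appointment), not the root node.
appointment = ET.fromstring(APPOINTMENT_XML)

# Equivalent of Appointment/Clinician evaluated from the root node.
clinician = appointment.find("Clinician")
print(clinician.tag, clinician.text)
```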
• Select upi attribute node (context node: Appointment element)
child::Patient/attribute::upi or
Patient/@upi
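A rough Python equivalent of the attribute selection, assuming the standard `xml.etree.ElementTree` module. ElementTree does not implement the attribute axis as a path step, so the sketch selects the Patient element and then reads the attribute value with `get()`.

```python
import xml.etree.ElementTree as ET

APPOINTMENT_XML = """<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>"""

appointment = ET.fromstring(APPOINTMENT_XML)  # context node: Appointment

# Patient/@upi: step to the Patient child, then read its upi attribute.
patient = appointment.find("Patient")
upi = patient.get("upi")
print(upi)

# Attributes can still be used in predicates, e.g. Patient[@upi].
patients_with_upi = appointment.findall("Patient[@upi]")
```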
• Select all text nodes in the document
/descendant-or-self::node()/child::text() or
//text()
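Python's standard `xml.etree.ElementTree` does not support the `text()` node test in paths, but `itertext()` yields the same text content in document order. This is a sketch under that assumption; whitespace-only strings are filtered out to match the four text nodes shown in the source tree.

```python
import xml.etree.ElementTree as ET

APPOINTMENT_XML = """<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>"""

appointment = ET.fromstring(APPOINTMENT_XML)

# Every piece of text in document order, including indentation whitespace.
all_text = list(appointment.itertext())

# Keep only the text with visible content, trimmed of surrounding whitespace.
visible_text = [text.strip() for text in all_text if text.strip()]
print(visible_text)
```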
• Select parent of StartTime element (context node: StartTime)
parent::node() or ..
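The parent step can be sketched with Python's standard `xml.etree.ElementTree`, which accepts `..` inside a path. Because ElementTree elements hold no reference to their parent, the example assumes the search starts from the Appointment element and steps down to StartTime before stepping back up.

```python
import xml.etree.ElementTree as ET

APPOINTMENT_XML = """<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>"""

appointment = ET.fromstring(APPOINTMENT_XML)

# Locate StartTime anywhere below Appointment, then take its parent.
parent = appointment.find(".//StartTime/..")
print(parent.tag, parent.get("attendDate"))
```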
• Using wildcards to select all child elements of the context node
child::* or *
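As an illustration of the wildcard, the sketch below assumes Python's standard `xml.etree.ElementTree` and takes the Appointment element as the context node. The wildcard matches child elements only, so the text inside Patient and the attributes are not returned.

```python
import xml.etree.ElementTree as ET

APPOINTMENT_XML = """<!-- radiology appointment -->
<Appointment deptCode="RADIO">
  <Patient upi="ABC-123-456">John Smith
    <PhoneNo>0141 662 2673</PhoneNo>
  </Patient>
  <Clinician>Alison Young</Clinician>
  <Slot attendDate="2003-08-05">
    <StartTime>14:30:00</StartTime>
  </Slot>
</Appointment>"""

appointment = ET.fromstring(APPOINTMENT_XML)  # context node: Appointment

# child::* (abbreviated *): every child element, whatever its name.
child_tags = [child.tag for child in appointment.findall("*")]
print(child_tags)

# The same wildcard applied one level down, with Patient as context node.
patient_child_tags = [child.tag for child in appointment.find("Patient").findall("*")]
```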
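• Abbreviating location steps
– child:: can be omitted, because child is the default axis
child::Appointment/child::Clinician/child::text() = Appointment/Clinician/text()
– A trailing text() step can often be dropped when only the string value is needed, since the string-value of an element is its text content
– attribute:: = @
Patient/@upi
– self::node() = .
– parent::node() = ..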
• Abbreviating Location Paths
/descendant-or-self::node()/ = //
/descendant-or-self::node()/child::text() = //text()
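The abbreviation rules can be summarised as a small rewriting function. This is only a sketch: `expand_abbreviations` is a hypothetical helper, not part of any XPath library, and it handles simple paths only (no predicates containing a slash, no function calls wrapping a path).

```python
def expand_abbreviations(path):
    """Rewrite an abbreviated location path in unabbreviated syntax."""
    # // is short for /descendant-or-self::node()/
    path = path.replace("//", "/descendant-or-self::node()/")
    expanded = []
    for step in path.split("/"):
        if step == "":
            expanded.append("")                       # leading slash (root)
        elif step == "..":
            expanded.append("parent::node()")
        elif step == ".":
            expanded.append("self::node()")
        elif step.startswith("@"):
            expanded.append("attribute::" + step[1:])
        elif "::" in step:
            expanded.append(step)                     # axis already explicit
        else:
            expanded.append("child::" + step)         # child is the default axis
    return "/".join(expanded)


for short in ["//text()", "Patient/@upi", "Appointment/Clinician", "..", "."]:
    print(short, "=", expand_abbreviations(short))
```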
In Summary…
• XML document processing:
– XPath: node location
– XSLT: transformation
– XSL(-FO): document formatting
• XPath is used to navigate the source tree
• Transformation is performed by applying a style sheet to the source document
Find Out More
• The Extensible Stylesheet Language Family (XSL)
– www.w3.org/Style/XSL/
• W3C XML Path Language v1.0 Specification
– www.w3.org/TR/xpath
• TopXML XSLT & XPath Tutorial
– www.vbxml.com/xsl/tutorials/intro/default.asp