
Representing data with XML
SE-2030
Dr. Mark L. Hornick
XML: eXtensible Markup Language

XML is a technology for defining markup languages to represent data.

It is designed to define structured data descriptions that are:

  Extensible/customizable
  Portable across languages
  Portable across operating system platforms
XML allows you to define your own markup language

You can define your own tags.

XML can use an optional DTD (Document Type Definition) or XSD (XML Schema Definition) to formally describe the data:

  The tag vocabulary you create
  How tags can nest
  What attributes a tag can or must have
  (i.e. the tag grammar)

Data that is described by a specified vocabulary and grammar is called an XML application.
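HTML is (nearly) an XML application

HTML is a description of the data in web page documents and of how that data is structured. Strictly speaking, only XHTML is an XML application; HTML5 relaxes some XML rules, for example by allowing the <meta> tag below to stay unclosed.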
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
<title>My web page</title>
</head>
<body>
<h1>HTML syntax summary</h1>
<h2>or, all you need to know about HTML</h2>
<p>This is how you write
an HTML document.</p>
<p>The end.</p>
</body>
</html>
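XML schemas define how HTML can be structured
(for instance, a <title> can appear within a <head>, but not within a <body>)

[Figure: tree of the elements in an HTML document. html is the root, with head and body below it; the remaining nodes are title, h1, three p elements, and nested strong and em elements.]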
What else is XML good for?

XML can be used as a format for storing data in files in a structured manner.

It separates content from presentation, so that data created by an application written in Java can be read by an application written in C (or JavaScript, or any other language).
Scenario

Consider a Java collection of Students:

    List<Student> students;

where

    public class Student {
        String firstname;
        String lastname;
        int id;
        String program;
    }
In Java, we can use serialization to write/store students to a file, to be read (later) by another Java application

This works provided the other Java application knows the definitions of Student and List.

It would be much more difficult for a program written in C or JavaScript to read the file.
Further complications arise when the file is read by an application running on another hardware or OS platform.
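
The serialization round trip described above can be sketched as follows. This is a minimal illustration, not the course's demonstration code: the Serializable marker, the constructor, and the class name SerializationSketch are assumptions added here, and an in-memory byte array stands in for the file (a FileOutputStream and FileInputStream could be substituted).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SerializationSketch {
    // Student from the scenario, marked Serializable so ObjectOutputStream accepts it.
    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        String firstname;
        String lastname;
        int id;
        String program;

        Student(String firstname, String lastname, int id, String program) {
            this.firstname = firstname;
            this.lastname = lastname;
            this.id = id;
            this.program = program;
        }
    }

    // Writes the list in Java's binary serialization format.
    static byte[] save(List<Student> students) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(students));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the list back; this only works if the Student class is available to the reader.
    @SuppressWarnings("unchecked")
    static List<Student> load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<Student>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Student class is not on the classpath", e);
        }
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
                new Student("Bill", "Bored", 1111, "SE"),
                new Student("Bob", "Sledd", 1112, "CE"));
        for (Student s : load(save(students))) {
            System.out.println(s.id + " " + s.firstname + " " + s.lastname + " " + s.program);
        }
    }
}
```

The bytes produced by save() are in a Java-specific binary format, which is why a C or JavaScript program cannot easily read them.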

XML allows us to create a document that can be used to represent a portable collection of Students:

<?xml version="1.0" encoding="ISO-8859-1"?>
<student_list>
  <student>
    <firstname>Bill</firstname>
    <lastname>Bored</lastname>
    <id>1111</id>
    <program>SE</program>
  </student>
  <student>
    <firstname>Bob</firstname>
    <lastname>Sledd</lastname>
    <id>1112</id>
    <program>CE</program>
  </student>
</student_list>

XML grammars (like the one here) represent all data in plain text, which is most easily interpreted across platforms.
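
One way a Java application might produce such a document is with the standard StAX XMLStreamWriter. The sketch below is illustrative only: the class name StudentXmlWriter, the writeField helper, and the Student constructor are hypothetical names introduced here, and the output is written without indentation.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StudentXmlWriter {
    // Student from the scenario; the constructor is added for this sketch.
    static class Student {
        final String firstname;
        final String lastname;
        final int id;
        final String program;

        Student(String firstname, String lastname, int id, String program) {
            this.firstname = firstname;
            this.lastname = lastname;
            this.id = id;
            this.program = program;
        }
    }

    // Hypothetical helper: writes <tag>text</tag>, escaping characters such as & and <.
    private static void writeField(XMLStreamWriter xml, String tag, String text)
            throws XMLStreamException {
        xml.writeStartElement(tag);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }

    // Builds the <student_list> document as a string (no indentation is added).
    static String toXml(List<Student> students) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument("1.0");
            xml.writeStartElement("student_list");
            for (Student s : students) {
                xml.writeStartElement("student");
                writeField(xml, "firstname", s.firstname);
                writeField(xml, "lastname", s.lastname);
                writeField(xml, "id", Integer.toString(s.id));
                writeField(xml, "program", s.program);
                xml.writeEndElement(); // </student>
            }
            xml.writeEndElement(); // </student_list>
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(List.of(
                new Student("Bill", "Bored", 1111, "SE"),
                new Student("Bob", "Sledd", 1112, "CE"))));
    }
}
```

Because the result is plain text, any program that can read text and parse XML can consume it, regardless of the language it is written in.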
The XML grammar itself may be defined by an optional XML Schema Definition (or a Document Type Definition/DTD)

<xs:schema targetNamespace="urn:StudentFileSchema" xmlns="urn:StudentFileSchema"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="student_list">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="student" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="firstname" type="xs:string" minOccurs="1" maxOccurs="1"/>
              <xs:element name="lastname" type="xs:string" minOccurs="1" maxOccurs="1"/>
              <xs:element name="id" type="xs:integer" minOccurs="1" maxOccurs="1"/>
              <xs:element name="program" type="xs:string" minOccurs="1" maxOccurs="1"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

XML Schema Definitions (like the one here) define the valid format of the XML data. For instance, the <xs:sequence> tag specifies that the firstname…program tags MUST appear in exactly that sequence.
The XML Schema Definition for the alternate grammar (student data held in attributes rather than child elements):

<xs:schema targetNamespace="urn:StudentFileSchema" xmlns="urn:StudentFileSchema"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="student_list">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="student" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="firstname" type="xs:string" />
            <xs:attribute name="lastname" type="xs:string" />
            <xs:attribute name="id" type="xs:integer" />
            <xs:attribute name="program" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
How do you read the data from an XML document?

Most languages implement XML parsers that can interpret an XML file and extract the data.
XML parsers can be "told" to use the optional XML Schema to ensure that the XML file being parsed is in a valid format.

Validation is optional; you can create XML files without creating an XML Schema, but then you have no way of enforcing the grammar (which tags and attributes may appear, and where).
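
As a rough sketch of "telling" a parser to use a schema, the standard JAXP API lets a Schema object be attached to a SAXParserFactory. The class name ValidatingParseSketch and the embedded schema string are assumptions made for this example; the schema is a simplified version of the element-based student grammar with no target namespace, so that an un-namespaced student_list document can be checked against it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidatingParseSketch {
    // Simplified schema (an assumption for this sketch): no target namespace.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='student_list'><xs:complexType><xs:sequence>"
        + "<xs:element name='student' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
        + "<xs:element name='firstname' type='xs:string'/>"
        + "<xs:element name='lastname' type='xs:string'/>"
        + "<xs:element name='id' type='xs:integer'/>"
        + "<xs:element name='program' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Parses xml with validation turned on and returns the validation
    // error messages; an empty list means the document is valid.
    static List<String> validationErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for XML Schema validation
            factory.setSchema(schema);       // "tell" the parser to validate
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // schema violations arrive here
                }
            });
        } catch (Exception e) {
            // Malformed XML (a fatal error) or a setup problem ends up here.
            throw new IllegalStateException("Could not set up or run the parser", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String swapped = "<student_list><student><lastname>Bored</lastname>"
                + "<firstname>Bill</firstname><id>1111</id><program>SE</program>"
                + "</student></student_list>";
        validationErrors(swapped).forEach(System.out::println);
    }
}
```

If the setSchema call is left out, the same parser still rejects malformed XML but accepts any well-formed arrangement of tags.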
How does a SAX Parser work?

The parser reads an XML file sequentially, from beginning to end.
As the input is read, the parser performs a syntactic analysis based on the rules of XML.
The parser generates various events depending on what type of element it encounters during analysis.
Event handling methods are called, and element information is passed via the parameters of each method.

<?xml version="1.0" encoding="ISO-8859-1"?>
<student_list>
  <student firstname="Bill" lastname="Bored"
      id="1111" program="SE">
  </student>
  <student firstname="Bob" lastname="Sledd"
      id="1114" program="CE">
  </student>
</student_list>

Events generated for this document, in order:

startDocument()
startElement(name, attrs)   (name="student_list", attrs=none)
startElement(name, attrs)   (name="student", attrs=…)
endElement(name)            (name="student")
startElement/endElement is called again for the 2nd student
endElement(name)            (name="student_list")
endDocument()

Note: there are several other events and event handling methods.
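
The event sequence for a SAX parser can be observed with a small handler built on the standard org.xml.sax API. This is a minimal sketch, not the lecture's demonstration program: the class names SaxEventSketch and EventLogger are made up for illustration, and the handler only records the four events discussed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventSketch {
    // Records each event the parser reports, in the order it is reported.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument()");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            events.add("startElement(" + qName + ", " + attrs.getLength() + " attrs)");
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement(" + qName + ")");
        }

        @Override
        public void endDocument() {
            events.add("endDocument()");
        }
    }

    // Parses the given XML text and returns the recorded events.
    static List<String> eventsFor(String xml) {
        EventLogger logger = new EventLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), logger);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return logger.events;
    }

    public static void main(String[] args) {
        String xml = "<student_list>"
                + "<student firstname=\"Bill\" lastname=\"Bored\" id=\"1111\" program=\"SE\"></student>"
                + "<student firstname=\"Bob\" lastname=\"Sledd\" id=\"1114\" program=\"CE\"></student>"
                + "</student_list>";
        eventsFor(xml).forEach(System.out::println);
    }
}
```

Attribute values are available inside startElement through calls such as attrs.getValue("firstname"), which is where an application would typically build its Student objects.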
XML SAX Parser Demonstration