O'Reilly Network: Writing RSS 1.0
http://www.oreillynet.com/lpt/a/network/2000/08/25/magazine/rss_tut.html
Published on O'Reilly Network (http://www.oreillynet.com/)
http://www.oreillynet.com/pub/a/network/2000/08/25/magazine/rss_tut.html
by Rael Dornfest
08/25/2000
A step-by-step guide to building an RSS 1.0 document by hand. (Updated for RSS 1.0 RC1)
(This article assumes a certain familiarity with the basics of XML markup (the "pointies") and perhaps even a
little fiddling with RSS itself. The introductory material is brief, focusing on the distinguishing characteristics
of the recently proposed RSS 1.0.)
Introductions
RSS ("RDF Site Summary") is a lightweight multipurpose extensible metadata description and syndication
format. Whew, that was a mouthful! Let's take that bit by bit, shall we?
Lightweight
Much of the reason RSS has been successful stems from the fact that it is simply an XML document.
You can write an RSS document by hand. With minimal effort, you can have your content-management
system write it for you. Or, if you're a programmer at heart, you can utilize one of the abundant XML
libraries available for your programming language of choice.
Multipurpose
While originally conceived as a portal language, RSS has been repurposed again and again for
aggregation, discussion threads, home and job listings, sports scores, and more. It's not just for breakfast
-- or headline syndication -- anymore.
Extensible
The recently proposed RSS 1.0 supports extension via XML namespaces. I won't go into detail here on
namespaces themselves, but we'll revisit the topic throughout the tutorial.
Metadata
Metadata is data about data, answering questions like "Who wrote this?", "When was this published?",
and "What is/are the topic(s) of discussion?" While the proposed RSS version 1.0 sports a rich metadata
framework through RDF ("Resource Description Framework"), we'll only touch those bits of RDF that
are mechanically necessary to include and not wander off beyond the scope of the task at hand.
Syndication
Now here's the fun bit. ... RSS is a snapshot-in-a-document of what you consider most
interesting/important about your site at the moment. That could be your latest couple of Weblogs,
up-to-the-minute sports commentary ... anything. And you make this available for the world to grab,
pass on, aggregate, or publish online -- with links right back to your site for each item.
That's about all I'll say about the overall picture of RSS. I do realize that this was a rather brief overview, but
since our intention is to actually create an RSS document, I'll leave further introduction to the many
wonderful RSS articles already in existence; visit the Resources section below for a list.
RSS Document Structure
A basic RSS document (or "channel") is structurally rather simple:
XML Declaration
Container
    Channel Description
    Image Description (optional)
    Item Description
    Item Description
    ...
    Item Description
    Text Input Description (optional)
Let's start at the top and work our way down, shall we? And since the proposed
RSS 1.0 builds on the foundation of RSS 0.9, we'll start by building a 0.9
document and then cover the few basic mechanical changes necessary to bring
it into compliance; if you're already familiar with RSS 0.9, feel free to breeze
through the first part of the tutorial.
Since we want to focus on the markup, let's keep our example as simple as pie.
Mmmm ... pie. Our online pie shoppe, pie-r-squared.com, features a
continuously changing lineup of delicious pies for download (alright, online
ordering). We'll create an RSS feed to syndicate the choices du jour.
XML Declaration: <?xml version="1.0"?>
While XML documents are not required to begin with an XML declaration, it is
generally good practice to do so. The declaration says "This is an XML
document" and specifies the version thereof -- the current version of XML itself
is 1.0.
Now the XML declaration does also afford you the opportunity to specify your preferred encoding type -- the
way you'll be dealing with special characters. Unless specified otherwise, RSS 1.0 assumes UTF-8; let's go
ahead and add it for pedantic/illustrative purposes. So the first line of our document (make sure it's the first
line!) looks a lot like this:
<?xml version="1.0" encoding="utf-8"?>
(By the way, I'll be calling out changes in our evolving document as we go along by highlighting new bits in
orange.)
The Container: <rdf:RDF>
Every XML structure can have one and only one outer container -- the "root element." RSS 1.0's root element
is borrowed from the earlier 0.9 version. The root element also affords us the opportunity to declare the
namespaces we'll be using in our document.
Let's take a pit stop and see what we mean by namespaces.
In my sphere, there exist two Tims, two Jons, and a number of Daves (or variations thereof). To avoid
confusion (never mind embarrassment), I have to be sure to clarify which Tim or Jon or Dave I'm referring to.
Thank goodness they all have different last names, making Tim O'Reilly distinct from Tim Berners-Lee.
Now, since XML elements and attributes don't have last names, it can be difficult to differentiate between
<title> as in the title of a Web page and <title> as in the title of a book. The distinction, using XML
namespaces, may be expressed so:
html:title
book:title
Now these namespace prefixes (the bit before the colon) are not particularly useful if you don't have a decent
definition for what html and book are. They are, therefore, associated with a URL.
xmlns:html="http://www.w3.org/TR/REC-html40"
xmlns:book="http://www.oreilly.com/book"
This scheme effectively identifies the former as "title as defined by the HTML 4.0 specification" and the
latter as "title as in O'Reilly book." URLs are used because they're a convenient way for everyone to invent
unique names under their own control. The URLs don't have to point to anything useful, but it's nice if they
do (documentation, for instance).
Now, since I work for O'Reilly & Associates, a book company, it's fair to assume that when I say the word
"title" in the office I'm referring to a book title. I would always qualify when talking about an HTML
document title by saying, well, "Web page title" or the like. So my "default namespace," then, in the book
world, is declared in XML like so:
xmlns="http://www.oreilly.com/book"
xmlns:html="http://www.w3.org/TR/REC-html40"
You'll notice a lack of prefix associated with the http://www.oreilly.com/book realm, allowing me to
refer to the two types of titles as:
title
html:title
Mind you, the namespace doesn't refer to the "title" itself, but to the vocabulary which defines it. This rather
simplistic example should hopefully provide enough on namespaces to get you going; for more information,
be sure to visit the Namespaces in XML W3C recommendation and Tim Bray's "XML Namespaces by
Example."
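To see the same idea from a parser's point of view, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The <catalog> wrapper element and the two title strings are made up for illustration; only the two namespace URLs come from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing the two vocabularies from the text:
# the book vocabulary is the default namespace, "html" is a prefix.
DOC = """<catalog
    xmlns="http://www.oreilly.com/book"
    xmlns:html="http://www.w3.org/TR/REC-html40">
  <title>Programming Pie</title>
  <html:title>Pie-R-Squared Home Page</html:title>
</catalog>"""

root = ET.fromstring(DOC)

# ElementTree expands every element name to "{namespace-URL}local-name",
# so the two kinds of title can no longer be confused.
tags = [child.tag for child in root]
print(tags)

BOOK = "{http://www.oreilly.com/book}"
HTML = "{http://www.w3.org/TR/REC-html40}"
book_title = root.find(BOOK + "title").text
html_title = root.find(HTML + "title").text
print(book_title, "/", html_title)
```

Note that the prefix itself (html) never reaches the program; only the URL it stands for does, which is why the choice of prefix is up to you.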
We'll add the default namespace for RSS 0.9 and one for RDF itself to our outer rdf:RDF element and drop
the opening and closing tags into our document:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/"
>
</rdf:RDF>
Channel
Welcome to the channel element, a place to describe a few aspects of our RSS channel. We're required to fill
in a title, link, and description. How about:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/"
>
<channel>
<title>Pie-R-Squared</title>
<description>
Download a delicious pie from Pie-R-Squared!
</description>
<link>http://www.pie-r-squared.com</link>
</channel>
</rdf:RDF>
The channel is titled "Pie-R-Squared," and its link element suggests that whatever renders the channel
display the title as a link to our (imaginary) home page.
Image
We can optionally associate a little image (usually 88x33 pixels) with our channel to be used in a
My.Netscape-style newsbox rendering. A title element provides text for the image's alt attribute, a link
element specifies where the image should hyperlink to, and the url element is the location of the image file
itself. We'll use values similar to the channel definition above and add the URL of our imaginary logo.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/"
>
<channel>
<title>Pie-R-Squared</title>
<description>
Download a delicious pie from Pie-R-Squared!
</description>
<link>http://www.pie-r-squared.com</link>
</channel>
<image>
<title>Pie-R-Squared du Jour</title>
<url>http://www.pie-r-squared.com/images/logo88x33.gif</url>
<link>http://www.pie-r-squared.com</link>
</image>
</rdf:RDF>
Item(s)
We finally get to the meat (or tofu) of our RSS channel -- the items meant for syndication. There's not much
room here for detail -- just a simple title and link. While we're allowed up to 15 items (1 at a minimum), we'll
just add a couple.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/"
>
<channel>
<title>Pie-R-Squared</title>
<description>
Download a delicious pie from Pie-R-Squared!
</description>
<link>http://www.pie-r-squared.com</link>
</channel>
<image>
<title>Pie-R-Squared du Jour</title>
<url>http://www.pie-r-squared.com/images/logo88x33.gif</url>
<link>http://www.pie-r-squared.com</link>
</image>
<item>
<title>Pecan Plenty</title>
<link>http://www.pie-r-squared.com/pies/pecan.html</link>
</item>
<item>
<title>Key Lime</title>
<link>http://www.pie-r-squared.com/pies/key_lime.html</link>
</item>
</rdf:RDF>
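For a feel of what a consumer of this channel does with it, here is a rough Python sketch that pulls the title and link out of each item. It assumes Python's standard xml.etree.ElementTree module; the function name read_items is invented for this example, and the image element is left out of the embedded copy of the feed to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace URL of RSS 0.9, as declared on the rdf:RDF root element.
RSS09 = "http://my.netscape.com/rdf/simple/0.9/"

FEED = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://my.netscape.com/rdf/simple/0.9/">
  <channel>
    <title>Pie-R-Squared</title>
    <description>Download a delicious pie from Pie-R-Squared!</description>
    <link>http://www.pie-r-squared.com</link>
  </channel>
  <item>
    <title>Pecan Plenty</title>
    <link>http://www.pie-r-squared.com/pies/pecan.html</link>
  </item>
  <item>
    <title>Key Lime</title>
    <link>http://www.pie-r-squared.com/pies/key_lime.html</link>
  </item>
</rdf:RDF>"""


def read_items(xml_text):
    """Return a list of (title, link) pairs, one per item, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"rss": RSS09}  # "rss" is a local prefix chosen for the lookups below
    return [
        (item.findtext("rss:title", namespaces=ns),
         item.findtext("rss:link", namespaces=ns))
        for item in root.findall("rss:item", ns)
    ]


for title, link in read_items(FEED):
    print(title, "->", link)
```

The lookups have to name the RSS 0.9 namespace explicitly; asking for a bare "item" would find nothing, because every element in the document lives in the default namespace declared on the root.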
Textinput
Finally we arrive at the textinput element, affording a method for submitting form data to an arbitrary URL
-- a script handling the GET method. While I'm not a big fan of the textinput element, we'll throw it in as a
searchbox for laughs. Title and description are self-explanatory; link is the URL of the receiving script, and
name is the variable to which anything entered into the box is assigned.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://my.netscape.com/rdf/simple/0.9/"
>
<channel>
<title>Pie-R-Squared</title>
<description>
Download a delicious pie from Pie-R-Squared!
</description>
<link>http://www.pie-r-squared.com</link>
</channel>
<image>
<title>Pie-R-Squared du Jour</title>
<url>http://www.pie-r-squared.com/images/logo88x33.gif</url>
<link>http://www.pie-r-squared.com</link>
</image>
<item>
<title>Pecan Plenty</title>
<link>http://www.pie-r-squared.com/pies/pecan.html</link>
</item>
<item>
<title>Key Lime</title>
<link>http://www.pie-r-squared.com/pies/key_lime.html</link>
</item>
<textinput>
<title>Search Pie-R-Squared</title>
<description>Search our pie catalog...</description>
<name>keyword</name>
<link>http://www.pie-r-squared.com/search.pl</link>
</textinput>
</rdf:RDF>
So entering "chocolate" into the above-specified searchbox would result in an HTTP GET of
http://www.pie-r-squared.com/search.pl?keyword=chocolate
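Here is how a reader application might assemble that URL, sketched in Python with the standard urllib.parse module. The helper name textinput_url is made up for this example, and the sketch assumes the link has no query string of its own.

```python
from urllib.parse import urlencode


def textinput_url(link, name, value):
    """Build the GET URL for a textinput: <link>?<name>=<value>, URL-encoded."""
    return link + "?" + urlencode({name: value})


url = textinput_url("http://www.pie-r-squared.com/search.pl", "keyword", "chocolate")
print(url)

# Values with spaces or other special characters are escaped by urlencode.
spaced = textinput_url("http://www.pie-r-squared.com/search.pl", "keyword", "key lime")
print(spaced)
```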
Onward and upward to 1.0
While getting from our RSS 0.9 compliant document to RSS 1.0 takes only three simple mechanical changes,
it opens up a whole new dimension of RSS extensibility and rich metadata relationships which we'll get to in a
bit. Let's do the easy mechanical pieces first. ...
As I mentioned at the beginning of this tutorial, the proposed RSS 1.0 builds on the foundation of RSS 0.9. It's
just a 0.9 core with a little "syntactic sugar" mixed in for extensibility's sake.
"Syntactic sugar?" While RSS 0.9 had fledgling hooks for extensibility in its rdf:RDF root element and
rudimentary support for namespaces, a spot more syntax is in order for RDF-enabled software to grok (read:
understand) the structure of an RSS document. Parsers, not being as smart as you or I, don't know the
significance of an item's <link> element, for example, and must be explicitly told: "The <link>'s the URL of
the item we're talking about."
"I don't know a thing about RDF." No worries! The RDF markup is a simple mechanical transformation and
doesn't require any particular understanding of RDF principles or serialization. That's not to say I don't
encourage you to look into RDF, just that it's not necessary. Tim Bray provides a wonderful layperson's guide
to RDF and Metadata, and it's a quick read if you have a sec.
"Why should I care about RDF support?" RDF will allow the computers that currently throw information at
our feet at an alarming rate to instead work with us to make sense of it all and point us at the bits and pieces
we are seeking. Some folks are working very hard to make this happen; with just a few simple additions, you
can aid in this effort with no skin off your nose.
New default namespace
RSS 1.0 has its very own namespace, distinct from that of 0.9. We'll change the root rdf:RDF element's
default namespace declaration to reflect this difference. (I've left off the rest of the document for brevity.)
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
>
...
</rdf:RDF>
What's it all rdf:about?
Each resource (channel, image, item(s), textinput) we describe must have an associated URI to specify
canonically what it is we're describing. This is accomplished in RDF by giving it an about attribute -- as in
"we're talking about this URI."
Since the <channel> element is talking about the RSS feed itself, we'll use the RSS document's URL.
The <image> element describes a particular URL-retrievable image, so we'll use the image's <url>.
Each <item> is a short description of something URL-retrievable (story, discussion thread, job listing,
whatever), so we'll use the value of the item's <link> element.
In the case of <textinput>, we're talking about the <link> URL to which the GET is sent.
This leaves us with:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
>
<channel rdf:about="http://www.pie-r-squared.com/rss.rdf">
<title>Pie-R-Squared</title>
<description>
Download a delicious pie from Pie-R-Squared!
</description>
<link>http://www.pie-r-squared.com</link>
</channel>
<image rdf:about="http://www.pie-r-squared.com/images/logo88x33.gif">
<title>Pie-R-Squared du Jour</title>
<url>http://www.pie-r-squared.com/images/logo88x33.gif</url>
<link>http://www.pie-r-squared.com</link>
</image>
<item rdf:about="http://www.pie-r-squared.com/pies/pecan.html">
<title>Pecan Plenty</title>
<link>http://www.pie-r-squared.com/pies/pecan.html</link>
</item>
<item rdf:about="http://www.pie-r-squared.com/pies/key_lime.html">
<title>Key Lime</title>
<link>http://www.pie-r-squared.com/pies/key_lime.html</link>
</item>
<textinput rdf:about="http://www.pie-r-squared.com/search.pl">
<title>Search Pie-R-Squared</title>
<description>Search our pie catalog...</description>
<name>keyword</name>
<link>http://www.pie-r-squared.com/search.pl</link>
</textinput>
</rdf:RDF>
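Because the first two changes are purely mechanical, they are easy to script. The following Python sketch (standard library only) moves a 0.9 document into the RSS 1.0 namespace and adds rdf:about using the rules listed above. The function name add_about is invented here, the embedded feed is a trimmed copy of our example, and this is an illustration rather than a complete converter.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS09 = "http://my.netscape.com/rdf/simple/0.9/"
RSS10 = "http://purl.org/rss/1.0/"

FEED = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://my.netscape.com/rdf/simple/0.9/">
  <channel>
    <title>Pie-R-Squared</title>
    <description>Download a delicious pie from Pie-R-Squared!</description>
    <link>http://www.pie-r-squared.com</link>
  </channel>
  <image>
    <title>Pie-R-Squared du Jour</title>
    <url>http://www.pie-r-squared.com/images/logo88x33.gif</url>
    <link>http://www.pie-r-squared.com</link>
  </image>
  <item>
    <title>Pecan Plenty</title>
    <link>http://www.pie-r-squared.com/pies/pecan.html</link>
  </item>
  <textinput>
    <title>Search Pie-R-Squared</title>
    <description>Search our pie catalog...</description>
    <name>keyword</name>
    <link>http://www.pie-r-squared.com/search.pl</link>
  </textinput>
</rdf:RDF>"""


def add_about(xml_text, feed_url):
    """Return the parsed tree, renamed to RSS 1.0 and with rdf:about added."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Change 1: move every RSS 0.9 element into the RSS 1.0 namespace.
    for el in root.iter():
        if el.tag.startswith("{" + RSS09 + "}"):
            el.tag = "{" + RSS10 + "}" + el.tag.split("}", 1)[1]
    # Change 2: give each top-level resource an rdf:about URI.
    about = "{" + RDF + "}about"
    for el in root:
        local = el.tag.split("}", 1)[1]
        if local == "channel":
            el.set(about, feed_url)  # the URL of the RSS document itself
        elif local == "image":
            el.set(about, el.findtext("{%s}url" % RSS10))
        elif local in ("item", "textinput"):
            el.set(about, el.findtext("{%s}link" % RSS10))
    return root


root = add_about(FEED, "http://www.pie-r-squared.com/rss.rdf")
abouts = {el.tag.split("}", 1)[1]: el.get("{%s}about" % RDF) for el in root}
print(abouts)
```

The feed URL has to be passed in by hand because nothing inside a 0.9 document records where the document itself lives.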
Pulling it all together
Now the last thing to do to make your RSS document RDF happy is to tie together the various elements to the
RSS channel itself.
"But aren't they tied together by virtue of being in the same RSS document?" Good question! While
everything's syndicated/published in one document, one can't assume it'll all stay together as it travels the Net.
Aggregators decouple items from their parent channels, stirring them in various combinations to cook up new
RSS feeds for resyndication, incorporation into a Website, commentary, etc. RDFers are really gathering data
("stuff said about a particular URI") and merging them into large data structures for the purposes of poking
and prodding to extract some meaningful relationships.
What I like to call the channel's "table of contents" allows for this decoupling and munging while retaining
some memory of an RSS item's parentage. The idea is similar to referencing sources of quotes you've used in
your term paper; that [14] after the quote serves to preserve the association between the words you quoted
and the source thereof.
Let's start with the optional elements, image and textinput; this step is only necessary for the optional
element(s) you do use. We simply copy each element's opening tag, replace the rdf:about with
rdf:resource, make it an empty element by adding a / just before the closing angle-bracket, and paste it
inside the <channel> element. A couple of copies, edits, and pastes later:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
>
<channel rdf:about="http://www.pie-r-squared.com/rss.rdf">
<title>Pie-R-Squared</title>
<description>
Download a delicious pie from Pie-R-Squared!
</description>
<link>http://www.pie-r-squared.com</link>
<image rdf:resource="http://www.pie-r-squared.com/images/logo88x33.gif" />
<textinput rdf:resource="http://www.pie-r-squared.com/search.pl" />
</channel>
<image rdf:about="http://www.pie-r-squared.com/images/logo88x33.gif">
<title>Pie-R-Squared du Jour</title>
<url>http://www.pie-r-squared.com/images/logo88x33.gif</url>
<link>http://www.pie-r-squared.com</link>
</image>
<item rdf:about="http://www.pie-r-squared.com/pies/pecan.html">
<title>Pecan Plenty</title>
<link>http://www.pie-r-squared.com/pies/pecan.html</link>
</item>
<item rdf:about="http://www.pie-r-squared.com/pies/key_lime.html">
<title>Key Lime</title>
<link>http://www.pie-r-squared.com/pies/key_lime.html</link>
</item>
<textinput rdf:about="http://www.pie-r-squared.com/search.pl">
<title>Search Pie-R-Squared</title>
<description>Search our pie catalog...</description>
<name>keyword</name>
<link>http://www.pie-r-squared.com/search.pl</link>
</textinput>
</rdf:RDF>
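The copy, edit, and paste routine for the optional elements can also be sketched in a few lines of Python (standard library only). The function name add_references is made up for this example, and the embedded document is a trimmed copy of ours without any items.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.pie-r-squared.com/rss.rdf">
    <title>Pie-R-Squared</title>
    <description>Download a delicious pie from Pie-R-Squared!</description>
    <link>http://www.pie-r-squared.com</link>
  </channel>
  <image rdf:about="http://www.pie-r-squared.com/images/logo88x33.gif">
    <title>Pie-R-Squared du Jour</title>
    <url>http://www.pie-r-squared.com/images/logo88x33.gif</url>
    <link>http://www.pie-r-squared.com</link>
  </image>
  <textinput rdf:about="http://www.pie-r-squared.com/search.pl">
    <title>Search Pie-R-Squared</title>
    <description>Search our pie catalog...</description>
    <name>keyword</name>
    <link>http://www.pie-r-squared.com/search.pl</link>
  </textinput>
</rdf:RDF>"""


def add_references(root):
    """Add an empty <image>/<textinput rdf:resource="..."/> to the channel
    for each optional element that is actually present in the document."""
    channel = root.find("{%s}channel" % RSS)
    for name in ("image", "textinput"):
        described = root.find("{%s}%s" % (RSS, name))  # top-level element only
        if described is None:
            continue  # optional element not used, so nothing to reference
        ref = ET.SubElement(channel, "{%s}%s" % (RSS, name))
        ref.set("{%s}resource" % RDF, described.get("{%s}about" % RDF))
    return channel


root = ET.fromstring(DOC)
channel = add_references(root)
refs = {
    name: channel.find("{%s}%s" % (RSS, name)).get("{%s}resource" % RDF)
    for name in ("image", "textinput")
}
print(refs)
```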
The items piece is only marginally more complicated, especially if you've ever used HTML's ordered lists: <ol>
<li>something</li> ... </ol>
We create a new element called items which will hold our list of, well..., items. Since this will be an ordered
list, we'll also wrap them in RDF's concept of a sequence, rdf:Seq. We then do precisely the same thing for
each item that we did for image and textinput, except that we replace "item" with "li" (read: list item) and
place these items in our preferred order inside our sequence list.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
>
<channel rdf:about="http://www.pie-r-squared.com/rss.rdf">
<title>Pie-R-Squared</title>
<description>
Download a delicious pie from Pie-R-Squared!
</description>
<link>http://www.pie-r-squared.com</link>
<image rdf:resource="http://www.pie-r-squared.com/images/logo88x33.gif" />
<textinput rdf:resource="http://www.pie-r-squared.com/search.pl" />
<items>
<rdf:Seq>
<li rdf:resource="http://www.pie-r-squared.com/pies/pecan.html" />
<li rdf:resource="http://www.pie-r-squared.com/pies/key_lime.html" />
</rdf:Seq>
</items>
</channel>
<image rdf:about="http://www.pie-r-squared.com/images/logo88x33.gif">
<title>Pie-R-Squared du Jour</title>
<url>http://www.pie-r-squared.com/images/logo88x33.gif</url>
<link>http://www.pie-r-squared.com</link>
</image>
<item rdf:about="http://www.pie-r-squared.com/pies/pecan.html">
<title>Pecan Plenty</title>
<link>http://www.pie-r-squared.com/pies/pecan.html</link>
</item>
<item rdf:about="http://www.pie-r-squared.com/pies/key_lime.html">
<title>Key Lime</title>
<link>http://www.pie-r-squared.com/pies/key_lime.html</link>
</item>
<textinput rdf:about="http://www.pie-r-squared.com/search.pl">
<title>Search Pie-R-Squared</title>
<description>Search our pie catalog...</description>
<name>keyword</name>
<link>http://www.pie-r-squared.com/search.pl</link>
</textinput>
</rdf:RDF>
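Since the table of contents and the items are written separately, it's easy for a hand-edited file to drift out of sync. Here is a small Python sanity check (standard library only) that compares the two; the function name toc_and_items is invented for this sketch, and the embedded document is trimmed to the parts being checked. The tutorial's listing writes the list entries as li, while the RSS 1.0 specification writes them as rdf:li, so the check accepts either form.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.pie-r-squared.com/rss.rdf">
    <title>Pie-R-Squared</title>
    <description>Download a delicious pie from Pie-R-Squared!</description>
    <link>http://www.pie-r-squared.com</link>
    <items>
      <rdf:Seq>
        <li rdf:resource="http://www.pie-r-squared.com/pies/pecan.html" />
        <li rdf:resource="http://www.pie-r-squared.com/pies/key_lime.html" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.pie-r-squared.com/pies/pecan.html">
    <title>Pecan Plenty</title>
    <link>http://www.pie-r-squared.com/pies/pecan.html</link>
  </item>
  <item rdf:about="http://www.pie-r-squared.com/pies/key_lime.html">
    <title>Key Lime</title>
    <link>http://www.pie-r-squared.com/pies/key_lime.html</link>
  </item>
</rdf:RDF>"""


def toc_and_items(xml_text):
    """Return (URIs listed in the channel's sequence, rdf:about of each item)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    seq = root.find("{%s}channel/{%s}items/{%s}Seq" % (RSS, RSS, RDF))
    toc = []
    if seq is not None:
        for li in seq:
            # Accept both the unprefixed li used in this tutorial and rdf:li.
            if li.tag in ("{%s}li" % RSS, "{%s}li" % RDF):
                toc.append(li.get("{%s}resource" % RDF))
    abouts = [item.get("{%s}about" % RDF)
              for item in root.findall("{%s}item" % RSS)]
    return toc, abouts


toc, abouts = toc_and_items(DOC)
print("in sync" if toc == abouts else "table of contents and items differ")
```

Comparing the two lists with == also checks the order, which is a stricter test than RDF itself requires; compare them as sets if you only care that every item is listed.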
That's all, folks!
And we're done! Now, that wasn't painful, was it? If you have any questions or recommendations as to how to
make this tutorial even easier, please feel free to drop me a line. If you'd like to participate in the further
development and fine-tuning of RSS 1.0, point your browser at the RSS-DEV Working Group mailing list and
dive on in!
Conversion without being a convert
Want 1.0-compliance without becoming a convert? Then one of these Web-based RSS conversion tools just
may be for you, my friend.
RSS2RDF
R.V. Guha's lightning-fast RSS2RDF takes the URL of an RSS 0.9 or 0.91 file as an argument and
returns an RSS 1.0 version thereof. Check out this live example using the RSS 0.9 document we created
in the first part of this tutorial. RSS2RDF is also available for download as a Linux binary or MPL'd C
source.
rss_converter
Enjoy the copy-n-paste-n-copy ease of Jonathan Eisenzopf's form-based RSS converter. It slices, it
dices, and it certainly does RSS -- from and to 0.9, 0.91, and 1.0. Simply paste in your RSS document,
choose an output format and click "Convert." Jonathan's converter uses his popular XML::RSS Perl
module.
For more RSS 1.0-compliant tools, libraries, XSLT stylesheets, code snippets, and more, be sure to visit the
RSS-DEV tool shed.
Resources
RSS
"RSS 1.0: The New Syndication Format" by Jonathan Eisenzopf, WebReference
"If Only I Had Known" by Derrick Story, Web Review
"RSS Delivers the XML Promise" by Peter Wiggin, Web Review
"Why Would You Use RSS?" by Peter Wiggin, Web Review
"RSS and You" by Chris Nandor, Perl.com
"Making Headlines with RSS: Using Rich Site Summaries To Draw New Visitors" by Jonathan
Eisenzopf, Web Techniques
"Using RSS News Feeds" by Jonathan Eisenzopf, WebReference
XML Namespaces
"XML Namespaces by Example" by Tim Bray, XML.com
XML Namespaces, W3C
RDF
"RDF and Metadata" by Tim Bray, XML.com
RDF, W3C
For many more resources, visit the Resources section of the RSS 1.0 Specification proposal.
Suggestions, bug reports, and other feedback
We welcome any constructive criticism you might offer. Please post your suggestions, bug reports, praise, and
other feedback to the O'Reilly Network RSS Forum.
Rael Dornfest is Founder and CEO of Portland, Oregon-based Values of n. Rael leads the Values of n
charge with passion, unearthly creativity, and a repertoire of puns and jokes — some of which are actually
good. Prior to founding Values of n, he was O'Reilly's Chief Technical Officer, program chair for the
O'Reilly Emerging Technology Conference (which he continues to chair), series editor of the bestselling
Hacks book series, and instigator of O'Reilly's Rough Cuts early access program. He built Meerkat, the first
web-based feed aggregator, was champion and co-author of the RSS 1.0 specification, and has written and
contributed to six O'Reilly books. Rael's programmatic pride and joy is the nimble, open source blogging
application Blosxom, the principles of which you'll find in the Values of n philosophy and embodied in
Stikkit: Little yellow notes that think.
Copyright © 2007 O'Reilly Media, Inc.