Introduction to TEI

Magdalena Turska
Workshop outline
• Editing and markup
• TEI Universe
• XML foundations
• TEI Core
– TEI documents
– teiHeader
– text structure: divisions, paragraphs, speeches or ab
– highlights: hi, emph, foreign, title etc
– regularization, abbreviations and corrections
– people and places
– dates
– and other funky stuff
• Querying TEI
• TEI Processing Model and TEI Publisher
What are we doing when we are editing?
What is text? I am not so naïve as to imagine that question could ever be finally
settled. Asking such a question is like asking "How long is the coast of England?".
J. McGann
Text is what you look at. And how you look at it.
P. Sahle
In getting my books, I have always been solicitous of an ample margin; this is
not so much through any love of the thing in itself, however agreeable, as for the
facility it affords me of penciling in suggested thoughts, agreements, and differences
of opinion, or brief critical comments in general.
Edgar Allan Poe
What is an edition?
Edition ist die erschließende Wiedergabe historischer Dokumente.
A scholarly edition is the critical representation of historical documents.
"historical documents": editing is concerned with documents that already exist.
To create a new document (which doesn’t refer to something preexisting) is not
scholarly editing.
"representation": covers (abstract) representation as well as presentation (reproduction). Publishing descriptive data (e.g. metadata) without reproduction is not
critical editing. A catalogue, a database, a calendar is not an edition.
"critical / scholarly": reproduction of documents without critical examination is
not scholarly editing. A facsimile is not a scholarly edition.
Is this the same text?
[Figure: duBellay-1.png]
Markup
A text is more than a sequence of encoded glyphs or lexical tokens
• it has a structure and a communicative function
• it also has multiple possible readings
• its meaning usually requires some level of decoding: deciphering the script,
expanding abbreviations, identifying spatial and temporal context...
• it can be enriched by annotation
The process of inserting explicit markers for implicit textual features is often
called markup, or equivalently encoding; the term tagging is also used informally.
Only that which is explicit can be reliably found and displayed, counted, analysed...
Why would we bother with marking up all that?
• To allow machine-processing and querying
• To facilitate re-use of the same material
– in different formats
– in different contexts
– by different users
• Not to be limited to the singular view of one editor or consumer
• To add value by supplying multiple annotations reflecting the editor’s knowledge
How to make explicit (to a machine) what is implicit (to a person)?
Procedural markup
In the beginning there was procedural markup
RED INK ON; print balance; RED INK OFF
Problem: Conventions give some clues about the meaning but are often ambiguous and may not be universally understood.
Isn’t it more useful to mark up what we think things are than what they
look like?
• Presentational markup cares more about fonts and layout than meaning
• Descriptive markup says what things are, which makes interpretation straightforward, and leaves their rendition for a separate step
• Separating the form of something from its content makes it much easier to
re-use
• It also allows easy changes of presentation across a large number of documents
Again: is this the same text?
[Figure: duBellay-1.png]
Compare two markup approaches
<pb n="4"/>A MONSEI- <lb/>GNEUR LE REVE- <lb/>rendissime Cardinal <lb/>du
Bellay. <lb/>S <lb/> <c rend="lettrine">V</c>EU le Personnaige, <lb/>que tu joues
au Spec- <lb/>tacle de toute l’Europe...

<div type="dedicace">
  <head>A MONSEIGNEUR LE REVERENDISSIME CARDINAL DU BELLAY</head>
  <salute>S<ex>ire</ex></salute>
  <p><c rend="lettrine">V</c>EU le Personnaige, que tu joues au Spectacle de toute l’Europe...</p>
  ...
</div>
Markup is a scholarly activity
• The application of markup to a document is not an automatic process
• In deciding what markup to apply, and how this represents the original, one
is undertaking the task of an editor
• There is (almost) no such thing as neutral markup – all of it involves interpretation
• Markup can assist in answering research questions, and deciding what
markup is needed to enable such questions to be answered can be a research
activity in itself
• Good textual encoding is never as easy or quick as people would believe
• Detailed document analysis is needed before encoding for the resulting markup
to be useful
The long haul
Presentation by its very nature is ephemeral: bound to be modified over time or
multiplied for other use scenarios
Computer stuff becomes obsolete pretty quickly: in a decade or two, or maybe in 2 years
if you’re unlucky
The need to refurbish the interfaces and migrate the underlying software, and perhaps
the data as well, is one constant in long-term preservation
It’s the data source, the encoding, that is of long-lasting value, not the presentation
A good, formal, well-documented data model will lend itself to fairly easy conversion into any new solution that comes along
What are our hopes and promises?
(not only when applying for funding)
• scholarly use
• and RE-use
• interchange
• long-term preservation
What ensures the best chance of long-term preservation and reuse is the concentration on openly licensed, quality encoding conforming as best as possible to standards.
The markup language we use must be able to ...
• specify all the characters found
• make explicit the structures perceived
• represent that structure in a linear processable form
• additionally supply a variety of metadata or annotations
XML is a good fit (for the most part...)
Community standard: TEI
The Text Encoding Initiative (TEI) is a consortium which collectively develops and
maintains a standard for the representation of texts in digital form.
TEI Guidelines: encoding methods for machine-readable texts, chiefly in the
humanities, social sciences and linguistics.
Widely used by libraries, museums, publishers, and individual scholars to present
texts for online research, teaching, and preservation.
The TEI Consortium is a nonprofit membership organization composed of academic institutions, research projects, and individual scholars from around the world.
Members contribute financially to the Consortium and elect representatives to its
Council and Board of Directors. In commemoration of the TEI community’s 30th
anniversary, it will be awarded the 2017 Antonio Zampolli Prize from the Alliance
of Digital Humanities Organizations.
[Figure: tei-community.png]
Join the TEI, they said...
TEI is often called a de facto standard, likely because it’s a community of users,
developing from the bottom up.
[Figure: soldier.jpg]
If you have questions about anything broadly connected with TEI, ask at [email protected]; you get answers not only from us, but also from TEI experts around
the world. Don’t be shy: questions from people of all levels of ability stop the list
becoming too technical, everyone benefits from having the answers be public, and
you benefit by reading (and sometimes answering!) others’ problems.
TEI encoding scheme is expressed in XML
The rules and recommendations made in TEI Guidelines are expressed in terms of
what is currently the most widely-used markup language for digital resources of all
kinds: the Extensible Markup Language (XML), as defined by the World Wide Web
Consortium’s XML Recommendation. However, the TEI encoding scheme itself does
not depend on this language; it was originally formulated in terms of SGML (the
ISO Standard Generalized Markup Language), a predecessor of XML, and may in
future years be re-expressed in other ways as the field of markup develops.
XML
Extensible Markup Language is a markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable.
Defined by the W3C’s XML 1.0 Specification and by several other related specifications, all of which are free open standards.
Strictly speaking, XML is a metalanguage, that is, a language used to describe
other languages (or, as we often say: vocabularies). TEI is one of the existing
vocabularies expressed in XML.
UNICODE
An XML document is a string of characters.
Almost every legal Unicode character may appear in an XML document.
Markup and content
The characters making up an XML document are divided into markup and content,
which may be distinguished by the application of simple syntactic rules.
• markup either begins with < and ends with >
• or begins with the character & and ends with a ;
• strings of characters that are not markup are content
Tag
A markup construct that begins with < and ends with >.
Tags come in three flavors:
• start-tags; for example: <div>
• end-tags; for example: </div>
• empty-element tags; for example: <lb/>
Element
A logical document component which either begins with a start-tag and ends with
a matching end-tag or consists only of an empty-element tag.
The characters between the start- and end-tags, if any, are the element’s content,
and may contain markup, including other elements, which are called child elements.
An example of an element is <salute>Hello, world.</salute>. Another is <lb/>.
Attribute
A markup construct consisting of a name/value pair that exists within a start-tag
or empty-element tag. In the following example the element img has two attributes, src and
alt: <img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
Another example would be <step number="3">Connect A to B.</step>,
where the name of the attribute is number and the value is 3.
Attributes
An XML attribute can only have a single value and each attribute can appear at
most once on each element.
Where a list of multiple values is desired, this must be done by encoding the
list into a well-formed XML attribute with some format beyond what XML defines
itself. Usually this is either a comma or semi-colon delimited list or, if the individual
values are known not to contain spaces, a space-delimited list can be used.
<div class="inner greeting-box">Hello!</div>
where the attribute class has the single value inner greeting-box, which also indicates
the two CSS class names inner and greeting-box.
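The element and attribute examples above can be inspected with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are chosen for illustration only. It shows that the parser hands back one attribute value, and that splitting it into a list is a convention applied by the application, not by XML.

```python
import xml.etree.ElementTree as ET

# An element with a start-tag, one attribute, character content and an end-tag.
step = ET.fromstring('<step number="3">Connect A to B.</step>')
step_name = step.tag             # the element name
step_number = step.get("number")  # attribute values are always strings
step_content = step.text         # the character content

# XML sees a single attribute value; interpreting it as a space-delimited
# list is up to the application that reads it.
box = ET.fromstring('<div class="inner greeting-box">Hello!</div>')
raw_value = box.get("class")
class_names = raw_value.split()
```

Note that the attribute value "3" is a string: XML itself has no notion of numbers, so any conversion is again the application's job.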
XML declaration
XML documents may begin by declaring some information about themselves, as in
the following example:
<?xml version="1.0" encoding="UTF-8"?>
Well-formedness
An XML document must be well-formed: it needs to satisfy a list of syntax rules
provided in the specification.
• The document contains only properly encoded legal Unicode characters
• None of the special syntax characters such as < and & appear except when
performing their markup-delineation roles
• The begin, end, and empty-element tags that delimit the elements are correctly
nested, with none missing and none overlapping
• The element tags are case-sensitive; the beginning and end tags must match
exactly.
• A single root element contains all the other elements.
• Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~,
nor a space character, and cannot start with -, ., or a numeric digit.
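Several of the rules above can be tried out by handing small strings to a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are made up for illustration. It only reports whether the parser accepts the input and says nothing about validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True

# Hypothetical samples, one per rule from the list above.
samples = {
    "properly nested": "<div><p>Hello</p></div>",
    "overlapping tags": "<div><p>Hello</div></p>",
    "case mismatch": "<div>Hello</Div>",
    "two root elements": "<p>one</p><p>two</p>",
    "bare ampersand": "<p>Fish & chips</p>",
    "escaped ampersand": "<p>Fish &amp; chips</p>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
```

Only the first and last samples pass; each of the others breaks exactly one rule.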
DOM
The W3C Document Object Model (DOM) is a platform and language-neutral interface that allows programs and scripts to dynamically access and update the content,
structure, and style of a document. The XML DOM defines the objects and properties of all XML elements, and the methods (interface) to access them.
The DOM says:
• The entire document is a document node
• Every XML element is an element node
• The text in an XML element is stored in text nodes
• Every attribute is an attribute node
• Comments are comment nodes
Text is always stored in text nodes
A common error is to expect an element node to contain text. However, the text of
an element node is stored in a text node. In this example: <year>2005</year>, the
element node <year></year> holds a text node with the value "2005". "2005" is not
the value of the <year></year> element!
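The distinction between an element node and its text node can be seen in any DOM implementation. The sketch below uses Python's standard xml.dom.minidom, one such implementation; the variable names are illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString("<year>2005</year>")
year = doc.documentElement

# The element node itself has no value...
element_value = year.nodeValue

# ...the text lives in a child node of type TEXT_NODE.
text_node = year.firstChild
text_value = text_node.nodeValue
is_text_node = text_node.nodeType == text_node.TEXT_NODE
```

Here element_value is None, while text_value holds the string "2005".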
Node Parents, Children, and Siblings
The nodes in the node tree have a hierarchical relationship to each other. The terms
parent, child, and sibling are used to describe the relationships.
• Parent nodes have children.
• Children on the same level are called siblings
• In a node tree, the top node is called the root
• Every node, except the root, has exactly one parent node
• A node can have any number of children
• A leaf is a node with no children
• Siblings are nodes with the same parent
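These relationships map directly onto DOM properties. A minimal sketch with Python's standard xml.dom.minidom, on a made-up three-child document written without whitespace between elements (whitespace would otherwise show up as additional text nodes):

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<bibl><author>A</author><title>T</title><date>2014</date></bibl>"
)
root = doc.documentElement

# The three children of <bibl> are siblings of each other.
author, title, date = root.childNodes

parent_of_author = author.parentNode   # the <bibl> element
next_after_author = author.nextSibling  # the <title> element
next_after_date = date.nextSibling      # None: <date> is the last child

# A text node has no children, so it is a leaf.
leaf = title.firstChild
leaf_has_children = leaf.hasChildNodes()

# In the DOM, the top of the tree is the document node itself.
top = root.parentNode
```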
Node Types
W3C (World Wide Web Consortium) DOM node types and their descriptions:
• Document represents the entire document (the root-node of the DOM tree)
• DocumentFragment represents a "lightweight" Document object, which can
hold a portion of a document
• DocumentType provides an interface to the entities defined for the document
• ProcessingInstruction represents a processing instruction
• EntityReference represents an entity reference
• Element represents an element
• Attr represents an attribute
• Text represents textual content in an element or attribute
• CDATASection represents a CDATA section in a document (text that will
NOT be parsed by a parser)
• Comment represents a comment
• Entity represents an entity
• Notation represents a notation declared in the DTD
Namespaces
• A namespace name is an identifier given to an XML vocabulary
• It looks a lot like a URI, e.g. the TEI namespace name is http://www.tei-c.org/ns/1.0
• This doesn’t mean that the URI actually points to something (like a schema);
it is just a name!
XML namespaces are used for providing uniquely named elements and attributes
in an XML document. An XML instance may contain element or attribute names
from more than one XML vocabulary. If each vocabulary is given a namespace, the
ambiguity between identically named elements or attributes can be resolved.
Mental shortcut to namespaces
If you assume that each XML vocabulary is basically a language, then a namespace
is the name of that language.
So saying that a particular element comes from a given namespace resolves the
ambiguity, just like with natural languages, e.g. polish:fart means "luck" and is not
smelly at all, contrary to english:fart; spanish:hola, meaning "hi!", is not the same as
polish:hola, meaning "hey, man, now wait a minute!"
Back to XML: html:div is not quite the same as tei:div, but they could be used
in the same document
How to declare a namespace for a document?
Use the xmlns attribute on the root element, e.g.:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
All descendants inherit the namespace from their parent, so there is no need to repeat it
on all other elements
How to mix elements from different namespaces?
Either use the xmlns attribute on elements from 'embedded' namespaces, e.g.:
<include xmlns="http://www…" href="elementsummary.xml"/>
Or declare the namespace up top, give it a prefix and use prefixed element names, e.g.:
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:skos="http://www.w3.org/2004/02/skos/co…" xml:lang="en">
.... somewhere down in the document ....
<skos:symbol>blah</skos:symbol>
If you don’t put an xmlns attribute anywhere, all elements are in no namespace
(the default, empty namespace)!
<?xml version="1.0" encoding="UTF-8"?>
<tei:bibl xmlns:tei="http://www.tei-c.org/ns/1.0">
  <title>eXist</title>
  <x:author xmlns:x="http://www.tei-c.org/ns/1.0">Adam Retter</x:author>
  <author>Erik Siegel</author>
  <publisher>O’Reilly</publisher>
  <date>2014</date>
  <idno type="ISBN">9781449337100</idno>
</tei:bibl>
The prefixed elements above (tei:bibl and x:author) are in the TEI namespace; the unprefixed elements are in no namespace, because no default namespace is declared
Different prefixes do not matter! tei:author and x:author are the same
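A namespace-aware parser makes these rules visible. The sketch below uses Python's standard xml.etree.ElementTree, which reports each element name in the form {namespace}localname; the small sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Two different prefixes bound to the same namespace name, plus one
# unprefixed element in a document that declares no default namespace.
doc = ET.fromstring(
    '<tei:bibl xmlns:tei="http://www.tei-c.org/ns/1.0">'
    '<tei:author>Erik Siegel</tei:author>'
    '<x:author xmlns:x="http://www.tei-c.org/ns/1.0">Adam Retter</x:author>'
    '<author>no namespace</author>'
    '</tei:bibl>'
)

tags = [child.tag for child in doc]

# Searching by namespace finds both prefixed authors, whatever prefix was used.
tei_authors = [a.text for a in doc.findall(f"{{{TEI_NS}}}author")]

# A default namespace declared on the root is inherited by all descendants.
inherited = ET.fromstring(f'<TEI xmlns="{TEI_NS}"><text><body/></text></TEI>')
inherited_tags = [el.tag for el in inherited.iter()]
all_inherited = all(tag.startswith(f"{{{TEI_NS}}}") for tag in inherited_tags)
```

The prefix never appears in the parsed names: only the namespace name and the local name matter.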
TEI document structure
Structure of a TEI Document
There are two basic structures of a TEI Document:
• TEI (TEI document) contains a single TEI-conformant document, comprising
a TEI header and a text, either in isolation or as part of a teiCorpus element.
• teiCorpus contains the whole of a TEI encoded corpus, comprising a single
corpus header and one or more TEI elements, each containing a single text
header and a text.
TEI basic structures (1)
<teiCorpus>
  <teiHeader><!-- required --></teiHeader>
  <TEI><!-- required --></TEI>
  <!-- More <TEI> elements -->
</teiCorpus>
TEI basic structures (2)
<TEI>
  <teiHeader><!-- required --></teiHeader>
  <facsimile><!-- optional --></facsimile>
  <sourceDoc><!-- optional --></sourceDoc>
  <text><!-- required if no facsimile or sourceDoc --></text>
</TEI>
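The outline above can be turned into a rough structural check. This is only a sketch, assuming Python's standard xml.etree.ElementTree; the function name has_basic_tei_structure is hypothetical, and the check is no substitute for validating against a TEI schema (it ignores element order, for instance).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

def has_basic_tei_structure(root):
    """Rough outline check: one teiHeader plus text, facsimile or sourceDoc."""
    prefix = "{" + TEI_NS + "}"
    if root.tag != prefix + "TEI":
        return False
    # Local names of the direct children that are in the TEI namespace.
    names = [child.tag[len(prefix):] for child in root
             if isinstance(child.tag, str) and child.tag.startswith(prefix)]
    has_content = any(name in names for name in ("text", "facsimile", "sourceDoc"))
    return names.count("teiHeader") == 1 and has_content

complete = ET.fromstring(
    f'<TEI xmlns="{TEI_NS}"><teiHeader/><text><body><p>Hello</p></body></text></TEI>'
)
header_only = ET.fromstring(f'<TEI xmlns="{TEI_NS}"><teiHeader/></TEI>')
no_namespace = ET.fromstring("<TEI><teiHeader/><text/></TEI>")
```

The third sample fails because, without the namespace declaration, its root element is not the TEI element of the TEI vocabulary at all.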
text
What is a text?
• A text may be unitary or composite
– unitary: forming an organic whole
– composite: consisting of several components which are in some important
sense independent of each other
• a unitary text contains
– optional front matter
– body (required)
– optional back matter
Composite texts
A composite text contains
• optional front matter
• group with text inside (required)
• optional back matter
A corpus is a collection of text and header pairs that also has its own header.
group tags may self-nest.
TEI text structure (1)
<text>
  <front><!-- optional --></front>
  <body><!-- required --></body>
  <back><!-- optional --></back>
</text>
TEI text structure (2)
<text>
  <front><!-- front matter to an anthology --></front>
  <group>
    <text>
      <body>
        <p><!-- content of first text --></p>
      </body>
    </text>
    <!-- more texts here -->
  </group>
  <back><!-- back matter to an anthology --></back>
</text>
Another Grouped Text Example
<group>
  <text>
    <!-- optional front matter -->
    <body><!-- First Body --></body>
    <!-- optional back matter -->
  </text>
  <text>
    <!-- optional front matter -->
    <body></body>
    <!-- optional back matter -->
  </text>
</group>
’Core’ elements
The so-called ’Core’ module groups together elements which may appear in any kind
of text and the tags used to mark them in all TEI documents. This includes:
• paragraphs
• highlighting, emphasis and quotation
• simple editorial changes
• basic names, numbers, dates, addresses
• simple links and cross-references
• lists, notes, annotation, indexing
• graphics
• reference systems, bibliographic citations
• simple verse and drama
Paragraphs
p (paragraph) marks paragraphs in prose
• Fundamental unit for prose texts
• p can contain all the phrase-level elements in the core
• p can appear directly inside body or inside div (divisions)
<p>Thanks for yours of this morning. I hope <lb/>you have had my card posted
last Monday. <lb/>On Mond. next I lecture the <orgName ref="Fieldclub">Field
Club</orgName> - <lb/>a Nat. Hist. Association, in the lines of our <lb/>old Society Geological, (you + me) + Botanical <lb/>(New) Do you remember: you<supplied>r</supplied>
old <lb/>Black Molt?</p>
Highlighting
By highlighting we mean the use of any combination of typographic features (font,
size, hue, etc.) in a printed or written text in order to distinguish some passage of
a text from its surroundings. For words and phrases which are:
• distinct in some way (e.g. foreign, archaic, technical)
• emphatic or stressed when spoken
• not really part of the text (e.g. cross references, titles, headings)
• a distinct narrative stream (e.g. an internal monologue, commentary)
• attributed to some other agency inside or outside the text (e.g. direct speech,
quotation)
• set apart in another way (e.g. proverbial phrases, words mentioned but not
used)
Highlighting Examples
• hi (general purpose highlighting); distinct (linguistically distinct)
<p>Last week I wrote (to order) a strong <lb/>bit of Blank: on <hi rend="ul">Antaeus
v. Heracles</hi>. <lb/>These are the best lines, methinks: <lb/>(N.B. Antaeus deriving strength from his Mother Earth <lb/>nearly licked old <distinct>Herk</distinct>.) </p>
• Other similar elements include: emph, mentioned, soCalled, term and gloss
Quotation
Quotation marks can be used to set off text for many reasons, so the TEI has the
following elements:
• q (separated from the surrounding text with quotation marks)
• said (speech or thought)
• quote (passage attributed to an external source)
• cit (groups a quotation and citation)
<cit>
  <quote>
    <l>... How Earth herself empowered him with her trick,</l>
    <l>Gave him the grip and stringency of Winter,</l>
    <l>And all the ardour of th’ invincible Spring;</l>
  </quote>
  <bibl>
    <author>Wilfred Owen</author>
    <title ref="works.xmlWO123">Letter to Leslie Gunston / The Wrestler</title>
    <date when="1917-07">July 1917</date>
  </bibl>
</cit>
Simple Editorial Changes: choice and Friends
• choice (groups alternative editorial encodings)
• Errors:
– sic (apparent error)
– corr (corrected error)
• Regularization:
– orig (original form)
– reg (regularized form)
• Abbreviation:
– abbr (abbreviated form)
– expan (expanded form)
Choice Example
<p>...any might, <unclear reason="scribbled">majesty</unclear>, <choice> <abbr>domin</abbr>
<expan>domin<ex>ion</ex></expan> </choice> or power...</p>
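Because choice keeps both alternatives side by side, one encoded source can yield several reading texts. The following is a minimal sketch in Python (standard xml.etree.ElementTree) of how a program might pick one alternative; the function name render and its prefer parameter are hypothetical, and the sample omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

def render(elem, prefer):
    """Flatten an element to text, keeping only the preferred child of each choice."""
    if elem.tag == "choice":
        # Render only the preferred alternative; whitespace between the
        # alternatives is not part of the text and is ignored.
        return "".join(render(child, prefer) for child in elem if child.tag == prefer)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, prefer))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

p = ET.fromstring(
    '<p>...any might, <unclear reason="scribbled">majesty</unclear>, '
    '<choice> <abbr>domin</abbr> <expan>domin<ex>ion</ex></expan> </choice>'
    ' or power...</p>'
)

diplomatic = render(p, "abbr")    # keeps the abbreviation as written
expanded = render(p, "expan")     # uses the editor's expansion
```

The same function would work for sic/corr or orig/reg pairs by passing a different preferred element name.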
Additions, Deletions, and Omissions
• add (addition to the text, e.g. marginal gloss)
• del (phrase marked as deleted in the text)
• gap (indicates point where material is omitted)
• unclear (contains text unable to be transcribed clearly)
Example of add, del, gap, and unclear
<p> <add place="left">My </add> <del rend="stroked">It’s </del> <add place="above">
<del rend="stroked">The </del> </add> subject <del rend="stroked">of</del> is War,
and the <unclear>pity </unclear>of <del rend="stroked">it</del> War. <lb/> The Poetry is in the pity. </p>
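Once additions and deletions are explicit, the final state of the passage can be derived mechanically. A minimal sketch in Python (standard xml.etree.ElementTree), assuming that "final reading" simply means dropping everything inside del and keeping everything else; the function name final_reading is hypothetical and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

def final_reading(elem):
    """Flatten an element to text, leaving out the content of del elements."""
    if elem.tag == "del":
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(final_reading(child))
        parts.append(child.tail or "")  # text after the child is always kept
    return "".join(parts)

p = ET.fromstring(
    """<p> <add place="left">My </add> <del rend="stroked">It's </del> <add place="above"> <del rend="stroked">The </del> </add> subject <del rend="stroked">of</del> is War, and the <unclear>pity </unclear>of <del rend="stroked">it</del> War. <lb/> The Poetry is in the pity. </p>"""
)

# Collapse the runs of whitespace left behind by the removed deletions.
reading = " ".join(final_reading(p).split())
```

Swapping the roles of add and del in the function would give an approximation of the text before revision.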
Basic Names
• name (a name in the text, contains a proper noun or noun phrase)
• rs (a general-purpose name or referencing string)
The @type attribute is useful for categorizing these, and they both also have
@key, @ref, and @nymRef attributes.
Basic Numbers and Measures
• num (marks a number of any sort)
• measure (marks a quantity or commodity)
• measureGrp (groups specifications relating to a single object)
• While num has simple @type and @value attributes, measure has @type,
@quantity, @unit and @commodity attributes
Number and Measure examples
<l>With a <num value="1000">thousand</num> pains that vision’s face was grained;</l>
... only <measure type="distance" unit="m" quantity="3218.69">two miles</measure>
from the front....
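The normalized attribute values are what make such passages computable. A minimal sketch in Python (standard xml.etree.ElementTree), with the two fragments wrapped as small standalone elements and the TEI namespace omitted for brevity; the p wrapper around the measure is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

verse = ET.fromstring(
    """<l>With a <num value="1000">thousand</num> pains that vision's face was grained;</l>"""
)
prose = ET.fromstring(
    """<p>... only <measure type="distance" unit="m" quantity="3218.69">two miles</measure> from the front....</p>"""
)

num = verse.find("num")
number_as_written = num.text            # the transcription
number_value = int(num.get("value"))    # the machine-readable value

measure = prose.find("measure")
distance_m = float(measure.get("quantity"))
distance_km = round(distance_m / 1000, 2)
```

The transcription ("thousand", "two miles") stays untouched, while calculations use the attribute values.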
Dates
• date (contains a date in any format and includes a @when attribute for a
regularised form and a @calendar attribute to specify what calendar system)
• time (contains a time in any format and includes a @when attribute for a
regularised form)
<date when="1917-07">July 1917.<lb/> Wednesday</date>
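The regularised @when value can be parsed by a program regardless of how the date is written in the source. A minimal sketch in Python's standard library, assuming the year-month form used in the example; other forms allowed in @when (a full date, a year alone) would need their own format strings.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

date = ET.fromstring('<date when="1917-07">July 1917.<lb/> Wednesday</date>')

# The text as transcribed, including what follows the line break.
as_written = "".join(date.itertext())

# The regularised form, parsed as a year-month value.
when = date.get("when")
parsed = datetime.strptime(when, "%Y-%m")
year, month = parsed.year, parsed.month
```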
Simple Linking
• ptr (defines a pointer to another location)
• ref (defines a reference to another location, with optional linking text)
• Both elements have:
– @target attribute taking a URI reference
– @cRef attribute for canonical referencing schemes
• If the linking text can be generated automatically, ptr and ref can be used in the
same place.
Simple Linking Example
See <ref target="Section12">section 12 on page 34</ref>.
See <ptr target="Section12"/>.
Lists
• list (a sequence of items forming a list)
• item (one component of a list)
• label (label associated with an item)
• headLabel (heading for column of labels)
• headItem (heading for column of items)
Simple List Example
The previous slide contained only:
<div>
  <head>Lists</head>
  <list>
    <item>list (a sequence of items forming a list)</item>
    <item>item (one component of a list)</item>
    <item>label (label associated with an item)</item>
    <item>headLabel (heading for column of labels)</item>
    <item>headItem (heading for column of items)</item>
  </list>
</div>
Notes
• note (contains a note or annotation)
• Notes can be those existing in the text, or provided by the editor of the electronic text
• A @place attribute can be used to indicate the physical location of the note
• A note should usually be encoded where its identifier/mark first appears; notes
can also be kept separately and point back to their location with a @target
attribute
<note>Painted by <persName>John Singer Sargent</persName>, 1918</note>
Graphics
• graphic (indicates the location of an inline graphic, illustration, or figure)
• binaryObject (encoded binary data embedding a graphic or other object)
• The figures module provides figure and figDesc for more complex graphics
<figure>
  <graphic url="materials/postcard-front.jpg"/>
  <figDesc>A postcard image of two men relaxing at a table, smoking pipes and drinking. A dog and potted
  fruit tree are nearby with a house over the wall in the distance.</figDesc>
</figure>
Simple Verse
<lg type="stanza">
  <l>It seemed that out of battle I escaped</l>
  <l>Down some profound dull tunnel, long since scooped</l>
  <l>Through granites which titanic wars had groined.</l>
</lg>
Simple Drama
<sp>
  <speaker>The reverend Doctor Opimian</speaker>
  <p>I do not think I have named a single unpresentable fish.</p>
</sp>
<sp>
  <speaker>Mr Gryll</speaker>
  <p>Bream, Doctor: there is not much to be said for bream.</p>
</sp>
The ambivalence continues
• every project is different and we delight in flexibility and customization
• yet we aim at standardization & interoperability
(...) documents worth encoding in TEI are very different from customer letters. But
not that different, and eight out of ten probably will benefit from staying within the
confines of a well thought-out standard schema and its surrounding processing rules.
And even the two that don’t may benefit from staying within that standard schema
as far as possible. - M. Mueller
TEI Processing Model
The TEI Processing Model (PM) extends the TEI ODD specification format with a
processing model for documents. That way intended processing for all elements can
be expressed within the TEI vocabulary itself. It aims at the XML-savvy editor who
is familiar with TEI but is not necessarily a developer. The editor knows the logical
structure of the text and maps it to a small set of abstract transformation functions, called "behaviours". Predefined behaviours are, for example, "paragraph",
"heading" or "note".
Basic styling features can be set directly within the ODD using CSS. The processing model is media-agnostic: rendition styles are transparently translated into
the different output media types like HTML, XSL-FO, LaTeX, or ePUB.
At the same time, the processing model implements a clean separation of concerns to improve the workflow between editors, designers and developers. The editor
defines how elements are mapped to behavior functions and specifies basic styling
rules, the designer controls the overall presentation of the material, and the developer concentrates on the general application framework and, if necessary, supports the
editor by providing custom behavior functions as extension modules.
[Figure: rahtzrationale.png]
A word of warning though: while the processing model was developed as part
of the TEI Simple initiative, it is not necessarily trivial to comprehend nor "simple" regarding the knowledge and skill it requires. The goal rather was to improve
workflows and interoperability by providing a minimal abstraction for the processing
rules applied to a document. Using the processing model definitely simplifies the life
of the developer, who often has to write a few thousand lines of code just to render
a particular TEI document into HTML, only to repeat the same tedious process for
PDF output. Last but not least the processing model empowers the scholarly editor
to create high-quality prototype websites from a given data set without even relying
on a developer.
TEI Publisher
TEI Publisher does a lot more than just implement the TEI processing model. It
consists of the following components:
1. A set of library modules to render a TEI document into various output formats,
which can be imported into other applications. This corresponds to the TEI
Publisher library package.
2. The core application to work and experiment with various source documents
and processing model instances. This is the main TEI Publisher app.
3. An application generator, which takes a processing model instance and creates
a user interface around it, resulting in a standalone web application for a
certain corpus of documents. The generated application shares most of its
XQuery code and interface with the main TEI Publisher app, but can be
customized to meet your needs.
In a strict sense, only the first component implements the processing model specification. The larger part of the toolbox deals with graphical user interface components,
adding pagination, navigation and search features.
TEI Publisher
TEI Publisher is distributed as two eXist application packages, making it easy to
install on any local or remote eXist database instance. Just go to the dashboard,
open the package manager and install the TEI Publisher application package from
the public repository. The TEI Publisher library package will be installed automatically as a dependency.
Note: You need at least eXist 3.0 or a nightly build.
Once installed, play around with the provided documents or upload your own
via the file upload panel in the right sidebar. You may modify any of the supplied
ODD files and see how the rendering changes.
To create your own custom ODD, tailored to the data set you are working on,
the general procedure is as follows:
• Create a new customization
• Overwrite the standard processing model rules for selected TEI elements or
add model rules for elements not handled by the standard ODD.
• Test your modifications by applying the ODD and rendering a few documents.
The steps will be described in detail in the following sections.
Create a new customization
Create a new customization by entering a name and clicking "Create" at the bottom
of the right sidebar panel listing the available ODDs. Reload the page to see the
new ODD appear in the list.
Open the created ODD using the "source" button. This will create a new browser
tab showing the editor which is part of eXist, called "eXide". The ODD will be
loaded into it.
As you can see, the ODD is nearly empty, containing just boilerplate code.
However, it imports the standard teipublisher.odd within the schemaSpec element:
<schemaSpec ident="myteisimple" start="TEI teiCorpus" source="teipublisher.odd">
This means your new ODD extends the standard teipublisher.odd, which provides useful defaults for the TEI core elements. In general, most of those defaults
should be OK and you only need to overwrite a handful of selected mappings within
your own ODD.
Modify the ODD
Now that we have created a custom ODD, we can make a first modification. We’ll use
Letter #6 from Robert Graves to William Graves as an example. Rendered through
the standard teipublisher.odd, it is definitely missing some styling to look more like
a letter. For example, we may want to move the dateline in the opener to the right
and get rid of the pb label which currently sits there.
We thus need to overwrite the processing model rules for dateline and opener.
The easiest way to do this is to:
• open the standard teipublisher.odd and search for an existing elementSpec
with @ident="dateline".
• if there is an elementSpec already, just copy it
• paste the elementSpec into the new ODD and modify it
If there’s no elementSpec for the element in teipublisher.odd, you have to create a
new elementSpec in your ODD, using @mode="add" instead of @mode="change".
The elementSpec copied from teipublisher.odd looks like this:
<elementSpec mode="change" ident="dateline">
  <model behaviour="block"/>
</elementSpec>
To make the text right-aligned, we add an outputRendition:
<elementSpec mode="change" ident="dateline">
  <model behaviour="block">
    <outputRendition>text-align: right;</outputRendition>
  </model>
</elementSpec>
We also want to hide the pb label, so we add another elementSpec:
<elementSpec mode="change" ident="pb">
  <model behaviour="omit"/>
</elementSpec>
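Since an ODD is itself XML, the overrides it contains can be listed with a few lines of code, which is handy for reviewing a customization. The sketch below uses Python's standard xml.etree.ElementTree and is an illustration only: it is not part of TEI Publisher, the overrides dictionary is a made-up structure, and the fragment omits the TEI namespace that a real ODD declares.

```python
import xml.etree.ElementTree as ET

# A namespace-free fragment modelled on the two overrides discussed above.
odd = ET.fromstring("""
<schemaSpec ident="myteisimple" start="TEI teiCorpus" source="teipublisher.odd">
  <elementSpec mode="change" ident="dateline">
    <model behaviour="block">
      <outputRendition>text-align: right;</outputRendition>
    </model>
  </elementSpec>
  <elementSpec mode="change" ident="pb">
    <model behaviour="omit"/>
  </elementSpec>
</schemaSpec>
""")

# Map each overridden element to its behaviour and optional CSS rendition.
overrides = {}
for spec in odd.findall("elementSpec"):
    model = spec.find("model")
    rendition = model.findtext("outputRendition")  # None if there is no rendition
    overrides[spec.get("ident")] = (model.get("behaviour"), rendition)
```

For simplicity the loop reads only the first model of each elementSpec; a real ODD may contain several.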
To test the changes, go back to the main page of the app, select the new ODD
and click on "Regenerate", then browse to the letter and view the result. In fact, the
"regenerate" step is not absolutely necessary, as the library will usually recognize the
change to the ODD when you try to view the letter and regenerate automatically.
However, if you made mistakes in the ODD, the automatic regeneration will just fail silently. So
during development, it is advisable to regenerate manually.
Processing Model syntax
model element
The model element is primarily used to document the intended processing for a given element. One or more of these elements may appear directly within an elementSpec
element specification to define the processing anticipated for that element. Where
multiple model elements appear, they are understood to document mutually exclusive processing scenarios, possibly for different outputs or applicable in different
contexts.
A processing model defines on an abstract level how a given element may be
transformed to produce one or more outputs. The model is expressed in terms
of behaviours and their parameters, using high-level formatting concepts, such as
block, inline, note or heading. A processing model is thus a template description,
used to generate the code needed by the publishing application to process the source
document into the required output.
The example below shows a single model defined for the app element. As neither @predicate nor @output is specified, this model applies in all contexts
in which app may appear and to all possible outputs. Every app element is thus
transformed into an inline chunk of text containing only the contents of its
lem child, omitting any rdg children.
    <elementSpec mode="change" ident="app">
        <model behaviour="inline">
            <param name="content">lem</param>
        </model>
    </elementSpec>
model children and attributes:
• @predicate: the condition under which this model applies, given as an XPath
Predicate Expression
• @behaviour: names the function which this processing model uses in order
to produce output; possible values include: alternate, block, figure, heading,
inline, link, list, note, paragraph
• @output: identifier of the intended output for which this model applies; the model applies
to all outputs if no @output is present
• @useSourceRendition: whether to obey any rendition attribute which is present
in the source document
• @cssClass: one or more CSS class names which should be added to the resulting output element where applicable
• param: passes a parameter to the @behaviour function; the parameters available depend on the behaviour in question; when parameters are not explicitly
passed, default values are assumed; all behaviour functions use the current element as default content
• outputRendition: supplies information about the desired output rendition in
CSS; its attribute @scope provides a way of defining pseudo-elements, e.g. first-line, first-letter, before, after
Simple model explicitly specifying the content parameter: for app entries, only the content of the lem child is to be displayed (as an inline chunk of text):
    <elementSpec mode="change" ident="app">
        <model behaviour="inline">
            <param name="content">lem</param>
        </model>
    </elementSpec>
Model specifying output rendition: the contents of ex elements are to be displayed in
italic and wrapped in parentheses:
    <elementSpec mode="change" ident="ex">
        <model behaviour="inline">
            <outputRendition>font-style: italic;</outputRendition>
            <outputRendition scope="before">content:"(";</outputRendition>
            <outputRendition scope="after">content:")";</outputRendition>
        </model>
    </elementSpec>
Sometimes different processing models are required for the same element in different contexts. For example, we may wish to process the quote element as an inline
italic element when it appears inside a p element, but as an indented block when it
appears elsewhere. To achieve this, we need to change the specification for the quote
element to include two model elements as follows:
    <elementSpec mode="change" ident="quote">
        <model predicate="ancestor::p" behaviour="inline">
            <outputRendition>font-style: italic;</outputRendition>
        </model>
        <model behaviour="block">
            <outputRendition>margin-left: 2em;</outputRendition>
        </model>
    </elementSpec>
The first processing model will be used only for quote elements which match
the XPath expression given as value for the @predicate attribute. Other element
occurrences will use the second processing model. A set of multiple model elements
is regarded as an alternation: only the first model whose @predicate matches the
current context is applied.
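Because only the first matching model is applied, the order of the model elements matters: more specific predicates should come before general fallbacks. The following sketch illustrates this for hi. The @rend values tested here are assumptions about the source encoding, not something the processing model prescribes.

```xml
<elementSpec mode="change" ident="hi">
    <!-- Tried first: hi elements explicitly marked as bold -->
    <model predicate="@rend='bold'" behaviour="inline">
        <outputRendition>font-weight: bold;</outputRendition>
    </model>
    <!-- Tried second: hi elements marked as small caps -->
    <model predicate="@rend='smallcaps'" behaviour="inline">
        <outputRendition>font-variant: small-caps;</outputRendition>
    </model>
    <!-- Fallback without @predicate: applies to every remaining hi -->
    <model behaviour="inline">
        <outputRendition>font-style: italic;</outputRendition>
    </model>
</elementSpec>
```

If the fallback model were listed first, it would match every hi and the two more specific models would never be reached.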
modelSequence and modelGrp
Summary of elements that can be used to document one or more processing models
for a given element:
• model describes the processing intended for a specific context
• modelSequence (sequence of processing models) a group of model elements
documenting intended processing models for this element, to be acted upon in
sequence
• modelGrp (processing model group) a group of model elements documenting
intended processing models for this element
The modelGrp element may be used to group alternative model elements intended for a single kind of output. The modelSequence element is provided for
the case where a sequence of models is to be processed, functioning as a single unit.
A common use case is to use modelSequence to generate a table of contents along
with the reading text, as shown in the example below:
    <elementSpec mode="change" ident="body">
        <modelSequence>
            <model behaviour="index">
                <param name="type">'toc'</param>
            </model>
            <model behaviour="block"/>
        </modelSequence>
    </elementSpec>
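By contrast, modelGrp keeps the alternatives for one kind of output together. The following sketch assumes that @output may be placed on modelGrp (as the TEI definition of modelGrp provides) and that the implementation honours it; within each group the usual first-match rule applies. The mapping chosen for head is illustrative only.

```xml
<elementSpec mode="change" ident="head">
    <!-- Alternatives for the web output -->
    <modelGrp output="web">
        <!-- Captions of figures are plain italic blocks -->
        <model predicate="parent::figure" behaviour="block">
            <outputRendition>font-style: italic;</outputRendition>
        </model>
        <model behaviour="heading"/>
    </modelGrp>
    <!-- Alternatives for the print output -->
    <modelGrp output="print">
        <model predicate="parent::figure" behaviour="block"/>
        <model behaviour="heading"/>
    </modelGrp>
</elementSpec>
```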
Behaviours
Most of the processing complexity is hidden behind behaviour functions, which were
designed to cover the majority of commonly occurring processing tasks. Function names
are wherever possible based on commonly used terms such as 'inline', 'block', 'note'
or 'link'.
Behaviour functions accept a range of parameters, depending on the function in
question. Where these parameters are left unspecified in the model, default values
are used.
Available Behaviours
All behaviour functions take at least one parameter: content. It is added by default unless specified and contains the content of the
currently processed node. You may change this by explicitly setting a content
parameter inside the model.
In the parameter lists below we skip the content parameter as it is available for
every behaviour.
alternate
Display alternating elements for displaying the preferred version and an alternative, both at once or by some method of toggling between the two. The
concrete implementation depends on the output format.
Parameters:
• default: the content to display by default
• alternate: alternate content
anchor
Create an anchor to which you can link, identified by the given id.
Parameters:
• id: the id
block
Create a block structure, usually a div in HTML or fo:block in FO.
body
Create the body of a document. In HTML this will result in a <body> tag.
break
Create a line, column, or page break according to type.
Parameters:
• type: e.g. "page", "column", "line"
cell
Create a table cell. If the @cols or @rows attribute is specified, the cell may
span several columns/rows.
cit
Show a citation, with an indication of the source.
Parameters:
• source: the citation source
document
Start a new output document.
figure
Make a figure with the provided title argument as caption.
Parameters:
• title: a caption
graphic
Display the graphic retrieved from the given url.
Parameters:
• url: the url to load the graphic from
• width: the width of the graphic, e.g. "300px", "50%" ...
• height: the height of the graphic, e.g. "300px", "50%" ...
• scale: a scaling factor to apply. If specified, width and height will be output as percentage based o[...]
• title: a title for the graphics element. Usually not shown directly.
heading
Creates a heading.
Parameters:
• level: the structural level of this heading. In HTML mode, this translates to <h1>, <h2> etc.
inline
Outputs an inline element.
link
Create a hyperlink.
Parameters:
• uri: the link url
list
Creates an ordered or unordered list, depending on the type attribute (e.g.
type="ordered"). If a label is present before each item, a description list is
output instead, using the label as definition term.
listItem
Outputs an item in a list.
metadata
Outputs a metadata section, e.g. a <head> in HTML.
note
Create a note, often out of line, depending on the value of place; could be
"margin", "footnote", "endnote", "inline".
Parameters:
• place: defines the placement of the note, e.g. "margin", "footnote" ...
• label: the label to use for the footnote reference, usually a number.
omit
Do nothing: skip this element and do not process its children.
paragraph
Create a paragraph.
row
Create a table row.
section
Create a new section in the output document. In HTML mode, this translates
to a <section> element being output.
table
Create a table.
text
Output literal text.
title
Output the document title. In HTML mode, this creates a <title> element. In
LaTeX, it adds the title to the document metadata.
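To illustrate how the parameters listed above are passed, the following fragment (meant to sit inside the schemaSpec of an ODD) maps a few TEI elements to behaviours. Parameter values are XPath expressions evaluated against the current element. The particular element-to-behaviour mappings are illustrative assumptions and not necessarily those found in teipublisher.odd.

```xml
<!-- choice: show the correction by default, keep the original as alternate -->
<elementSpec mode="change" ident="choice">
    <model predicate="sic and corr" behaviour="alternate">
        <param name="default">corr[1]</param>
        <param name="alternate">sic[1]</param>
    </model>
</elementSpec>

<!-- note: placement and label are read from attributes in the source -->
<elementSpec mode="change" ident="note">
    <model behaviour="note">
        <param name="place">@place</param>
        <param name="label">@n</param>
    </model>
</elementSpec>

<!-- ref: the link target comes from @target -->
<elementSpec mode="change" ident="ref">
    <model behaviour="link">
        <param name="uri">@target</param>
    </model>
</elementSpec>

<!-- graphic: url and dimensions are taken from the TEI attributes -->
<elementSpec mode="change" ident="graphic">
    <model behaviour="graphic">
        <param name="url">@url</param>
        <param name="width">@width</param>
        <param name="height">@height</param>
    </model>
</elementSpec>
```

In each case the content parameter is left out, so the behaviour falls back to the content of the current element.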
Output formatting options
The intended rendering for a particular behaviour of a processing model may be
documented in one or all of the three following ways. Firstly, the @cssClass attribute
may be used to specify the name of a CSS style in some associated CSS stylesheet
which is to be applied to each occurrence of a specified element found (in a given
context, for a specified output). Secondly, the attribute @useSourceRendition may
be used to indicate that the rendition specified in the source document should be
applied. Thirdly, the styling to be applied may be specified explicitly as content of
a child outputRendition element.
When more than one of these options is used, they are understood to be combined
in accordance with the rules for multiple declaration of the styling language used.
It is strongly recommended that the use of outputRendition be limited to strictly
editorial decisions, such as 'conjectures are to be displayed in square brackets', and
that it not be used as a means to record all typesetting and layout-specific design choices.
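The three mechanisms can be combined on a single model. The sketch below assumes that conjectures are encoded as supplied and that a class named conjecture exists in the application's stylesheet; both are illustrative assumptions.

```xml
<elementSpec mode="change" ident="supplied">
    <!-- cssClass: hook for the external stylesheet;
         useSourceRendition: also honour rendition given in the source -->
    <model behaviour="inline" cssClass="conjecture" useSourceRendition="true">
        <!-- Editorial decision expressed in the ODD: square brackets -->
        <outputRendition scope="before">content: "[";</outputRendition>
        <outputRendition scope="after">content: "]";</outputRendition>
    </model>
</elementSpec>
```

Fonts, colours and spacing for the conjecture class would then be left to the stylesheet rather than to the ODD.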
The processing model library translates the CSS styles into the target media
format. Restrictions apply due to differences between the output formats. Not all
CSS properties are supported for every format. Please refer to the section on Output
media settings for further information.
Extensions to the Processing Model Specification
XQuery Instead of XPath
The implementation directly translates processing
model instructions into an XQuery 3.1 module by generating executable XQuery
code. This is straightforward as the resulting XQuery will closely resemble the
specification in the ODD, thus being easy to debug. It also leads to very efficient
code, which is as fast as, or even faster than, a hand-written, optimized transformation.
As a welcome side effect, any valid XQuery expression might be used wherever
the spec expects an XPath expression, e.g. in predicates or parameters. For example,
one can define variables inside a parameter using a standard XQuery let $x := ...
return ....
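As a minimal sketch of this, the parameter below binds a variable with let before returning a value. It assumes that head elements inside a div are rendered as headings, and it caps the level at 6, the deepest heading level HTML provides. The mapping itself is an illustrative assumption.

```xml
<elementSpec mode="change" ident="head">
    <model predicate="parent::div" behaviour="heading">
        <!-- An XQuery let/return expression in place of a plain XPath -->
        <param name="level">let $depth := count(ancestor::div) return min(($depth, 6))</param>
    </model>
    <!-- Fallback for head elements outside a div -->
    <model behaviour="block"/>
</elementSpec>
```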
External Parameters
The script calling the processing model may pass external
parameters into the ODD. They will be available in the variable $parameters, which
is an XQuery map. Access parameters using the XQuery lookup operator.
For example, one can use this feature to control how specific parts of the document are output, without having to define a separate output mode, which would
result in much more code. Below we display a short header for the document, but
only if the parameter "header" is set to "short":
    <elementSpec mode="change" ident="fileDesc">
        <modelSequence predicate="$parameters?header='short'">
            <model behaviour="block" cssClass="header-short">
                <param name="content">titleStmt</param>
            </model>
            <model behaviour="block" cssClass="header-short">
                <param name="content">editionStmt</param>
            </model>
        </modelSequence>
        ...
    </elementSpec>
Define Additional CSS Classes
The processing model automatically generates
class names for all elements. However, they just use the name of the TEI element
suffixed with a number corresponding to the position of the generating model within
the elementSpec. The class names are thus subject to change whenever one modifies
the ODD. Also, in most projects, a web designer will create a design before the
developer writes the ODD. The designer thus expects the class names generated from the
ODD to correspond to those in a design template, not vice versa.
Our implementation thus adds a possibility to explicitly specify class names
inside the model. Those class names will be added to the automatically generated
ones. There are two ways to define CSS classes:
• via the @cssClass attribute: specifies a whitespace-separated list of static class
names to be appended to the output element
• via the cssClass child element of a model: the text content of this element
is interpreted as an XPath/XQuery expression. The evaluation result of this
expression will be cast to a sequence of strings which will be added to the
output element. This is useful if the CSS class depends on the dynamic context,
which is often the case in CSS frameworks.
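The two options can be sketched as follows. The class names and the use of @rend are illustrative assumptions, and the exact position of the cssClass child inside model is not specified above, so treat the second example as a sketch rather than a verified configuration.

```xml
<!-- Static classes: a whitespace-separated list in the @cssClass attribute -->
<elementSpec mode="change" ident="p">
    <model behaviour="paragraph" cssClass="lead text-justify"/>
</elementSpec>

<!-- Dynamic classes: the cssClass child is evaluated as XPath/XQuery.
     Here each token of @rend yields one class, e.g. rend-bold -->
<elementSpec mode="change" ident="hi">
    <model behaviour="inline">
        <cssClass>for $r in tokenize(normalize-space(@rend), ' ') return concat('rend-', $r)</cssClass>
    </model>
</elementSpec>
```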
Nested Models
According to the current specs, the processing model only allows
mapping one TEI input element to exactly one output element. Sometimes this is not
sufficient as there might not be a direct one-to-one correspondence. For example,
one may want to wrap a block around a sequence of elements or a link around a
heading.
The implementation thus supports nesting of model, modelSequence or modelGrp elements inside another model. If no parameters are specified for the wrapping
model, the nested elements may directly appear as child elements and will be applied to the content of the outer behaviour before it is evaluated (replacing the
$content parameter). Otherwise, parameters in param elements may also contain
nested elements, whose evaluation result will become the value of the parameter.
Note that the context node doesn’t change for nested model and modelSequence
elements. It will always contain the node processed by the outer model.
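The two forms of nesting described above can be sketched as follows. The two elementSpec elements are alternatives, not meant to be used together. The class name and the way the link target is built are illustrative assumptions; the sketch follows the description in this section and has not been checked against a particular release.

```xml
<!-- Alternative A: no parameters on the wrapping model. The nested model is
     applied to the content before the outer block is produced -->
<elementSpec mode="change" ident="head">
    <model behaviour="block" cssClass="heading-wrapper">
        <model behaviour="heading"/>
    </model>
</elementSpec>

<!-- Alternative B: the wrapping model has parameters, so the nested model
     is placed inside the param whose value it should supply -->
<elementSpec mode="change" ident="head">
    <model behaviour="link">
        <param name="uri">concat('#', parent::div/@xml:id)</param>
        <param name="content">
            <model behaviour="heading"/>
        </param>
    </model>
</elementSpec>
```

In both alternatives the context node of the nested model is still the head element processed by the outer model.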
Best Practice Recommendations
• While the ODD may describe the rendition of an element using CSS, this
should be used with care: styling imposed by the ODD should be generic and
not interfere with application-specific design choices.
For example, defining a font family for a certain element in the ODD makes
it difficult for web designers to set the font via an external stylesheet.
The HTML as well as the FO output function libraries provide ways to customize the styling through additional, user-supplied CSS.
Configure the ODD via processing instructions
Sometimes you want to use a different ODD for a specific TEI document. Before
displaying a document, TEI Publisher will check if a processing instruction exists
at the start of the document, telling it which ODD to use (along with other configuration parameters). For example, the following processing instruction associates
the document with the dta.odd ODD and switches to a page-by-page display (along
TEI page break boundaries):
    <?teipublisher view="page" odd="dta.odd"?>
If your document does not have page breaks or you want to display entire divisions instead, there are two additional settings which control the amount of content
displayed at a time:
    <?teipublisher depth="2" fill="6" odd="dta.odd"?>
odd
The ODD file to use for rendering the document
view
Default view to show when browsing the document. Supported values are div
or page.
depth
When viewing entire divisions, the software tries to determine if it should show
child divisions on separate pages or include them with the current div. depth
indicates the nesting level up to which divisions should be shown separately.
So setting it to "2" will result in divisions on level 3 or greater being shown
together with their enclosing div.
fill
If child divisions appear on separate pages, it may happen that the enclosing
div contains just a heading or a single line of text. In this case, the algorithm
will try to fill the page by showing the first child division as well. The fill
parameter defines the minimum number of elements which should be present on
a page. If there are fewer, the software tries to fill it up.
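Put together, a document using these settings might start as in the following sketch. The combination of values is only an example, and the header and text of the document are omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The processing instruction sits before the root element -->
<?teipublisher view="div" depth="2" fill="6" odd="dta.odd"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <!-- teiHeader and text omitted for brevity -->
</TEI>
```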
Using the App Generator
The App Generator takes an ODD file and generates a complete, standalone application out of it, including features like a simple search facility. Click on "App
Generator" in the menu bar and fill out the form. The following form fields are
important:
Name
This is the main identifier for your app and should be a globally unique URI.
It does not need to correspond to any existing web site.
Abbreviation
The abbreviation will be used as the name of the root collection of your app.
It should be unique within one database instance.
Data Collection
Only specify something here if you have existing data inside the database or
if you want to ship the data set as part of a second, separate app. In all other
cases, leave this field empty.
User/Password
The user account which will own all application files. For security reasons, it
is advisable to create a new account for every app.
Once you have created the new application, log into it using the account details you
provided. You can then upload XML documents using the upload panel in the right
sidebar.
Modifying the App
When you are logged in, the ”Admin” menu in the top navbar provides various links
to further extend your app:
Recompile ODD
After changing the application’s ODD, click here to update the app.
Update Document Metadata Index
The application maintains a separate index of titles and authors, which is
used to filter the list of documents shown on the landing page. This index is
not refreshed automatically, so you need to click this menu entry once after
uploading data.
Edit ODD
Opens the application’s ODD in eXide for editing.
Edit Configuration
Edit the main configuration parameters of the app. Configuration is done via
a set of variables in an XQuery module.
Edit Styles
Opens the main LESS stylesheet where you can change colors, fonts etc.
Generated Code Overview
XQuery Code
The generated app shares most of its XQuery libraries with
the main TEI Publisher app. A copy of those is included in the lib/ collection of
the generated app and should not be modified! This way you can later update the
libraries to a newer TEI Publisher release without breaking your app.
Including the libraries in the generated app creates some redundancy, but we
chose to accept this trade-off to make it easier to view and modify everything relevant
to the app.
The following core XQuery modules in every app are safe to be modified:
config.xqm
The main configuration file for the app. It contains a number of parameters
which control things like: how should the TEI content be split into viewable
chunks?
pm-config.xql
This file defines the functions to be called for rendering TEI content via the
processing model. It imports the modules generated from your ODD and
assigns them to variables as function pointers. This approach is much more
efficient than the dynamic lookups done by the main TEI Publisher app. It
has been production tested on large web sites. The downside is that the
connection to the ODD is hard-coded. If you need to switch between different
ODDs, you would need to change pm-config.xql and insert a proper switch
there depending on external parameters.
index.xql
Used to generate indexes on document metadata like author, title etc. It is
called when you click "Update Document Metadata Index" in the Admin menu.
autocomplete.xql
Contains the queries used to provide autocomplete suggestions when the user
types into a search box. If you need different indexes for your app, you may
also have to modify those queries.
app.xql
Add your own HTML templating functions here. This is basically where the
application logic of your app should go.
Styling
All of the app styling is done via a set of modularized LESS stylesheets,
residing in resources/css. The main file is style.less, which defines a number of core
parameters. Ideally this should be the only file you ever need to modify. Limiting
yourself to style.less will make later updates much easier.
Exporting the Finished App
To save your finished application or exchange it with other people, you need to save
it as an application archive. Application archives use a standardized format: the
resulting .xar file can be uploaded to any eXist instance via the dashboard and the
package manager will take care of the deployment.
There are two ways to create a .xar file from your application:
1. Use the "Application"/"Download App" menu entry in eXide to directly download a .xar
2. Synchronize the application to a directory on disk via "Application"/"Synchronize"
in eXide
The second approach is the recommended one. It requires that you have access to
the file system of the server running eXist though, so it’s usually only an option if
you run your own eXist. The synchronize steps in detail:
• Prerequisite: you need to have the Apache Ant build tool installed.
• Open one resource belonging to your application in eXide. It doesn’t matter
which one. The only important thing is that the name of your app is displayed
next to "Current app:" on the top right of the eXide window. If this is not
the case, stop and check again!
• Click "Application"/"Synchronize" in the menu. It opens up a dialog with
two fields: "Start time" and "Target directory". When you synchronize for the
first time, empty the "Start time" field. Enter a valid, absolute directory path
on your server machine into "Target directory".
• Click the "Synchronize" button. This may take a moment, but you should see
a list of written files at the bottom of the dialog afterwards.
• Change to the directory you specified for synchronize.
• Calling "ant" inside the directory should create a fresh .xar file in the build/
subdirectory.
Note: for security reasons, the password you entered when creating the app is not
stored in the database, so it cannot be synced to disk. To restore a password for your
app, you thus need to edit the repo.xml file in the directory and add a @password
attribute to the permissions element.
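As a sketch, the relevant part of repo.xml might then look as follows. The user, group and mode values are placeholders for whatever your generated repo.xml already contains, and the password shown is a hypothetical value to be replaced with your own.

```xml
<meta xmlns="http://exist-db.org/xquery/repo">
    <!-- other repo.xml entries (description, target etc.) omitted -->
    <!-- Placeholder account: use the user name chosen in the App Generator -->
    <permissions user="myapp" password="change-me" group="tei" mode="rw-rw-r--"/>
</meta>
```

Keep in mind that the password is then stored in plain text in the directory and in every .xar built from it.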
Troubleshooting
Sometimes changes you make to the ODD might cause errors in the app, e.g. if a
predicate expression is wrong. The recompile step tries to check this and if it reports
an error, you should fix it immediately.
If you still happen to reload the page, you might run into an error page. In
this case, go to eXide and open the XQuery file modules/regenerate.xql. This is the
XQuery which would be called by clicking on the ”Recompile ODD” menu entry.
Execute the query once by clicking the ”Eval” button in eXide.
The same applies if you updated the core TEI Publisher library package: your
application may suddenly fail with XQuery errors. Just follow the steps above to
recover.
Output Media Settings
The library supports various output media formats and translates styles into the
corresponding format. Currently the following output modes are supported and can
be used in the @output attribute:
web
Produces HTML output
fo
Generates a PDF via XSL:FO
latex
Creates a PDF via LaTeX
print
An alias which applies to both the fo and latex modes.
epub
Similar to web concerning features, but targeted at EPUB documents
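A minimal sketch of how these identifiers are used in @output: the same element is given a different note placement per output mode. The placements chosen are illustrative assumptions; string values are quoted because parameters are evaluated as XPath expressions.

```xml
<elementSpec mode="change" ident="note">
    <!-- HTML output: notes go into the margin -->
    <model output="web" behaviour="note">
        <param name="place">'margin'</param>
    </model>
    <!-- fo and latex output, addressed together via the print alias -->
    <model output="print" behaviour="note">
        <param name="place">'footnote'</param>
    </model>
    <!-- No @output: applies to all remaining outputs, e.g. epub -->
    <model behaviour="note">
        <param name="place">'endnote'</param>
    </model>
</elementSpec>
```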
The quality of the generated output may vary a lot for the fo and latex modes,
depending on the type of input document. The following section provides more
details on the configuration of the FO output option:
FO Output
When generating XSL:FO output, the implementation tries to translate the CSS
rules specified for renditions into the corresponding XSL:FO formatting properties.
Not all CSS properties are recognized or can be mapped to FO properties. Unknown
properties defined in a rendition will be ignored.
The default rendering for headings, paragraphs and the like is defined by a separate CSS file. The implementation merges those defaults with the custom renditions
given in the ODD.
The library searches for default CSS styles in a file named <odd-name>.fo.css inside the specified output collection (in which the generated XQuery files are stored).
The style definitions are copied literally into attributes on the output XSL:FO elements, so any property which is a valid attribute for the corresponding element may
be used. For example, teipublisher.fo.css contains:
    .tei-text {
        font-family: "Junicode";
        hyphenate: true;
    }
    .tei-floatingText {
        padding: 6pt;
    }
    .tei-p {
        text-align: justify;
    }
Every XSL:FO document needs a master layout and a page sequence definition.
Because those tend to be rather verbose as they include things like page margins
etc., they are read from two XML files:
master.fo.xml
Contains the layout master set
page-sequence.fo.xml
Defines the main page sequence
The mechanisms for configuring FO output are still very much under development
and we welcome suggestions by users.
LaTeX Output
The latex output mode produces good results for longer texts which fit well into
the pre-defined LaTeX environments. The number of supported CSS properties is
limited though:
• font-weight
• font-style
• font-variant
• font-size
• color
• text-decoration
• text-align
• text-indent
We’re looking for contributors familiar with TeX to support more properties.
Extension Modules
Where possible, developers should stick to the standard processing model functions
for defining behaviours. However, there might be situations in which one has to
generate a specific type of output, which is not handled by the default function
module. To facilitate this, the implementation allows additional extension modules
to be configured:
Configuration is done via an XML file which should reside in the same collection
as the source ODD files. It contains a series of <output> elements, each listing
the extension modules to be loaded for the given output mode. Each definition
may optionally be limited to a specific ODD, whose name is specified in the @odd
attribute.
    <modules>
        <!-- General fo extension functions -->
        <output mode="print">
            <module mode="print" uri="http://www.tei-c.org/tei-simple/xquery/ext-fo" prefix="ext-fo" at="../modules/ext-fo.xql"/>
        </output>
        <!-- Special web configuration for the documentation (to handle <code>) -->
        <output mode="web" odd="documentation">
            <module mode="html" uri="http://www.tei-c.org/tei-simple/xquery/ext-html" prefix="ext-html" at="../modules/ext-html.xql"/>
        </output>
    </modules>
Whenever the library tries to locate a processing model function for a given
behaviour, it will first check any extension module it knows to see if it contains a
matching function. One can thus override the default functions as well as define
new ones.
To be recognized by the library, an extension function needs to accept at least
4 default arguments, plus any number of custom parameters (to be passed in the
behaviour attribute). The processing model implementation tries to fill each custom
parameter with a corresponding value by looking through the <param> tags in the
ODD to find one with a name matching the variable name. If no matching parameter
can be found, the function argument will be set to the empty sequence [1]. The default
$content will always be filled, except for empty elements.
For example, our extension module ext-html.xql may look as follows:
    xquery version "3.1";

    (:~
     : Non-standard extension functions, mainly used for the documentation.
     :)
    module namespace pmf="http://www.tei-c.org/tei-simple/xquery/ext-html";

    declare namespace tei="http://www.tei-c.org/ns/1.0";

    declare function pmf:code($config as map(*), $node as element(), $class as xs:string,
            $content as node()*, $lang as item()?) {
        <pre class="sourcecode" data-language="{if ($lang) then $lang else 'xquery'}">
        {$config?apply($config, $content/node())}
        </pre>
    };
It defines one function, pmf:code, which can be called from the ODD as follows:
    <model behaviour="code">
        <param name="lang">@lang</param>
    </model>
[1] You should not enforce a type or cardinality for any of the custom parameters as this may lead
to unexpected errors. The parameters may always be empty or contain more than one item.