Islandora and Linked Data in Colonial Architecture

ISSUES RELATED TO LINKED DATA
Frits van Latum
Delft University of Technology Library
Islandora Camp 2015, Madrid
Colonial Architecture
• Content types:
  • Buildings, Maps, Images, Films, Documents, Persons, Organisations
  • + Concepts (from AAT) and Geographical concepts (from Geonames)
• Extra RELS-EXT triples between these objects
  • triple = subject + predicate + object
  • the generation of triples was more or less “hard coded” by DG, based on the content types in Colonial Architecture
  • (as was the assembly of metadata records for indexing and display from several linked fedoraObjects)
Search for Berlage
Book by Berlage
• metadata of the document (MODS)
• RELS-EXT triple to a person object
Berlage
• metadata of the person (SFOAF)
• RELS-EXT triples to other Fedora objects
Building by Berlage
• metadata of the building (MODS)
• RELS-EXT triples to other Fedora objects
Book by Berlage
• this is redundant, because names, variants of names, and other data about Berlage are in the metadata of this object
• generated by our MODS XML form
• …however… MODS 3.6 finally has an identifier element as a subfield of name
Algorithm: extract a triple from XML
1. the subject is:
• info:fedora/uuid:f0cb49e2-ddb9-4c2c-ab3f-8adbd7e7cef7
2. find the predicate
• is the value of attribute xlink:arcrole in the element name
• gives http://id.loc.gov/vocabulary/relators/arc
3. find the object
• is the value of attribute xlink:href in the element name
• gives po:13
4. and add this triple to RELS-EXT
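The four steps above can be sketched in a few lines of Python. This is only an illustration: the MODS fragment is a made-up minimal record, the function name triples_from_names is hypothetical, and Python stands in for the PHP that Islandora itself would use.

```python
import xml.etree.ElementTree as ET

# Hypothetical MODS fragment; only the two xlink attributes matter here.
MODS = """
<mods xmlns="http://www.loc.gov/mods/v3"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <name xlink:arcrole="http://id.loc.gov/vocabulary/relators/arc"
        xlink:href="po:13"/>
</mods>
"""

NS = {"mods": "http://www.loc.gov/mods/v3"}
XLINK = "{http://www.w3.org/1999/xlink}"


def triples_from_names(subject, mods_xml):
    """Return one (subject, predicate, object) tuple per linked name element."""
    root = ET.fromstring(mods_xml)
    triples = []
    for name in root.findall("mods:name", NS):
        predicate = name.get(XLINK + "arcrole")  # step 2: the predicate
        obj = name.get(XLINK + "href")           # step 3: the object
        if predicate and obj:                    # skip names without a link
            triples.append((subject, predicate, obj))
    return triples


# Step 1: the subject is the fedoraObject that owns the MODS datastream.
subject = "info:fedora/uuid:f0cb49e2-ddb9-4c2c-ab3f-8adbd7e7cef7"
triples = triples_from_names(subject, MODS)
```

Step 4, writing the tuples into RELS-EXT, is left out because it depends on the repository API.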
Result
• arc is confusing LoC-speak for what should be called hasArchitect (see http://id.loc.gov/vocabulary/relators)
• and it is not at all related to the “arc” in xlink:arcrole
Berlage
Algorithm: find more triples from XML
1. the subject of all the triples is info:fedora/po:13
2. foreach xpath /tud:person/tud:education/tud:organization
• make a triple with predicate: educatedAt
• and object: the content of the xpath
3. and (analogously) foreach xpath /tud:person/tud:altName
• make triples with predicate name
4. and the same for more predicate-object combinations
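A minimal sketch of this second algorithm, assuming a person record shaped like the XPaths above. The namespace URI, the sample values, and the names RULES and person_triples are placeholders for illustration, not TU Delft's actual data or code.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI and sample values (assumptions).
NS = {"tud": "http://example.org/tud"}
PERSON = """
<tud:person xmlns:tud="http://example.org/tud">
  <tud:altName>H.P. Berlage</tud:altName>
  <tud:altName>Hendrik Petrus Berlage</tud:altName>
  <tud:education>
    <tud:organization>example-org-1</tud:organization>
  </tud:education>
</tud:person>
"""

# (path relative to the root tud:person element, predicate)
RULES = [
    ("tud:education/tud:organization", "educatedAt"),
    ("tud:altName", "name"),
]


def person_triples(subject, xml_text):
    """Make one triple per element matched by each rule's path."""
    root = ET.fromstring(xml_text)
    triples = []
    for path, predicate in RULES:
        for element in root.findall(path, NS):
            triples.append((subject, predicate, (element.text or "").strip()))
    return triples


triples = person_triples("info:fedora/po:13", PERSON)
```

Adding another predicate-object combination only means adding a row to RULES, which is the property the following slides build on.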
Result
• literals
• URIs outside of Fedora
• fedoraObjects
Remarks
• so think of derivation rules for getting triples out of XML
• or, better said: for picking triples from XML
• these are about predicate-object combinations
• and of course this could be done by XSLT transformations …
• but the idea is:
  • provide a form somewhere in /admin/islandora in which the repository administrator can set up these rules
  • and let Islandora do the derivation work using the provided rules as parameters
  • PHP and XPath (possibly with the help of XSLT or XQuery)
• inspiration for derivation rules: RDF Mapping Language (RML)
RML “standard”
• RML works with sets of rules
• Each set of rules consists of
1. a source definition
2. an iterator: an XPath expression that results in a list of XML elements to apply the rules to
3. (a rule for finding the subject)
4. several predicate-object rules to find predicates and objects
• => each set of rules “defines” a list of triples with the same subject
• It is possible (and often necessary) to apply several sets of rules to one source
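The four parts of a rule set can be pictured as a small data structure. The sketch below mirrors the list above and is not RML's own syntax (RML mappings are themselves written in RDF); the class names, field names, and example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PredicateObjectRule:
    predicate: str  # expression that yields the predicate(s)
    object: str     # expression that yields the object(s)


@dataclass
class RuleSet:
    source: str                    # 1. where the XML comes from
    iterator: str                  # 2. XPath selecting the elements to process
    subject: Optional[str] = None  # 3. optional rule for finding the subject
    predicate_object_rules: List[PredicateObjectRule] = field(default_factory=list)  # 4.


# Hypothetical rule set for a person record.
person_rules = RuleSet(
    source="person.xml",
    iterator="/tud:person",
    predicate_object_rules=[
        PredicateObjectRule("http://xmlns.com/foaf/0.1/name", "tud:altName"),
        PredicateObjectRule("educatedAt", "tud:education/tud:organization"),
    ],
)
```

All triples produced from person_rules share one subject, which is why several rule sets may be needed for a single source.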
In the case of Islandora
1: source definition:
• should be a content model and a datastream name
• (and optionally a collection, to restrict the application of the set of rules to the fedoraObjects of that collection)
2: iterator XPath:
• in most cases just the root element of the XML datastream
• but sometimes more than that: example follows
(3: a rule for finding the subject)
• can be omitted: the subject is always the actual fedoraObject
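One way to read the source definition is as a filter that decides whether a rule set applies to a given fedoraObject. The sketch below assumes that reading; the Source class, the applies function, and all identifiers in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Source:
    content_model: str
    datastream: str
    collection: Optional[str] = None  # None means: no collection restriction


def applies(source: Source, models: Sequence[str], datastreams: Sequence[str],
            collections: Sequence[str]) -> bool:
    """True if an object with these models, datastreams and collections matches."""
    return (
        source.content_model in models
        and source.datastream in datastreams
        and (source.collection is None or source.collection in collections)
    )


# Hypothetical identifiers, for illustration only.
unrestricted = Source("example:personCModel", "PERSON")
restricted = Source("example:personCModel", "PERSON", collection="example:collectionA")

in_a = applies(restricted, ["example:personCModel"], ["PERSON", "RELS-EXT"], ["example:collectionA"])
in_b = applies(restricted, ["example:personCModel"], ["PERSON", "RELS-EXT"], ["example:collectionB"])
any_collection = applies(unrestricted, ["example:personCModel"], ["PERSON"], ["example:collectionB"])
```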
In the case of Islandora
4: several predicate-object combination rules:
• a predicate rule can be a constant, e.g.: http://xmlns.com/foaf/0.1/name
• or a reference: an XPath, e.g. find the predicate in: name/@xlink:arcrole
• or a template with one or more XPaths
• an object rule can also be a constant, e.g. just use a fixed URI or literal (a use case may be hard to find)
• or a reference: an XPath, e.g. find the object in attribute: tud:link
• or a template when something has to be added, as in: info:fedora/{tud:education/tud:organization}
• the XPaths are relative to the iterator
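The three kinds of rule (constant, reference, template) could be evaluated roughly as follows. This is a sketch under assumptions: rules are written as (kind, expression) pairs, the helper names select and evaluate are hypothetical, the namespace URI and sample record are invented, and a trailing @attribute step is resolved by hand because Python's ElementTree supports only a subset of XPath that cannot return attribute values.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "tud": "http://example.org/tud",  # placeholder namespace URI
    "xlink": "http://www.w3.org/1999/xlink",
}


def select(element, path):
    """Return the string values found at path, relative to element."""
    if "@" in path:
        # Split "some/path/@prefix:attr" into element path and attribute name.
        elem_path, _, attr = path.rpartition("@")
        elem_path = elem_path.rstrip("/") or "."
        prefix, _, local = attr.rpartition(":")
        key = "{%s}%s" % (NS[prefix], local) if prefix else local
        values = [e.get(key) for e in element.findall(elem_path, NS)]
        return [v for v in values if v is not None]
    return [(e.text or "").strip() for e in element.findall(path, NS)]


def evaluate(rule, element):
    """Evaluate a (kind, expression) rule against one iterator element."""
    kind, expression = rule
    if kind == "constant":
        return [expression]
    if kind == "reference":
        return select(element, expression)
    if kind == "template":
        results = [""]
        # Alternate between literal text and {xpath} placeholders.
        for part in re.split(r"(\{[^}]*\})", expression):
            if part.startswith("{"):
                values = select(element, part[1:-1])
                results = [r + v for r in results for v in values]
            else:
                results = [r + part for r in results]
        return results
    raise ValueError("unknown rule kind: %s" % kind)


# Invented sample record.
person = ET.fromstring(
    '<tud:person xmlns:tud="http://example.org/tud" tud:link="http://example.org/berlage">'
    "<tud:education><tud:organization>org:1</tud:organization></tud:education>"
    "</tud:person>"
)

constant_result = evaluate(("constant", "http://xmlns.com/foaf/0.1/name"), person)
reference_result = evaluate(("reference", "@tud:link"), person)
template_result = evaluate(("template", "info:fedora/{tud:education/tud:organization}"), person)
```

A template whose XPath matches nothing yields an empty list, so no triple is generated for it.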
Example: Berlage
Example: Book by Berlage
General algorithm
• given a fedoraObject, find the applicable sets of rules
• foreach applicable rule set:
  • find the elements given by the iterator XPath
  • foreach of these elements:
    • and foreach predicate-object combination in the rule set:
      • “generate the triples”
• add or replace them in RELS-EXT
• add or replace them in the triplestore (Fedora 4)
• “generate the triples”:
  • generate the predicates from the expression in the field predicate (in most cases just one)
  • generate the objects from the expression in the field object (might be more than one)
  • and generate a triple for each combination
• the correct choice of the iterator prevents the generation of unwanted triples
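The nested loops of the general algorithm can be sketched as below. The object layout (a dict with pid, models, and datastreams), the rule helpers constant and text_of, the content model and datastream names, and the namespace URI are all assumptions for illustration; RELS-EXT is stood in for by a plain dictionary.

```python
import itertools
import xml.etree.ElementTree as ET

NS = {"tud": "http://example.org/tud"}  # placeholder namespace URI


def constant(value):
    """Rule that always yields the same value."""
    return lambda element: [value]


def text_of(path):
    """Rule that yields the text of every element matched by a relative path."""
    return lambda element: [(e.text or "").strip() for e in element.findall(path, NS)]


RULE_SETS = [{
    "content_model": "example:personCModel",  # hypothetical content model
    "datastream": "PERSON",                   # hypothetical datastream name
    "iterator": ".",                          # the root element itself
    "rules": [
        (constant("http://xmlns.com/foaf/0.1/name"), text_of("tud:altName")),
        (constant("educatedAt"), text_of("tud:education/tud:organization")),
    ],
}]


def derive_triples(obj, rule_sets):
    """Apply every applicable rule set to obj and collect the triples."""
    subject = "info:fedora/" + obj["pid"]
    triples = []
    for rule_set in rule_sets:
        xml_text = obj["datastreams"].get(rule_set["datastream"])
        if rule_set["content_model"] not in obj["models"] or xml_text is None:
            continue  # rule set does not apply to this object
        root = ET.fromstring(xml_text)
        for element in root.findall(rule_set["iterator"], NS):
            for predicate_rule, object_rule in rule_set["rules"]:
                # One triple for each predicate/object combination.
                combos = itertools.product(predicate_rule(element), object_rule(element))
                triples.extend((subject, p, o) for p, o in combos)
    return triples


berlage = {
    "pid": "po:13",
    "models": ["example:personCModel"],
    "datastreams": {"PERSON": (
        '<tud:person xmlns:tud="http://example.org/tud">'
        "<tud:altName>H.P. Berlage</tud:altName>"
        "<tud:altName>Hendrik Petrus Berlage</tud:altName>"
        "<tud:education><tud:organization>org:1</tud:organization></tud:education>"
        "</tud:person>"
    )},
}

rels_ext = {}  # stand-in for the RELS-EXT datastreams, keyed by PID
rels_ext[berlage["pid"]] = derive_triples(berlage, RULE_SETS)  # "add or replace"
```

Replacing the whole entry per object keeps the derived triples in step with the XML after every edit, at the cost of regenerating triples that did not change.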
Remarks
• it is probably convenient to add namespace definitions to the form and use them in the rules
• the triple derivation could be an extra and independent derivation step
  • that works on the standard RELS-EXT that is derived earlier in the process
• maybe the rule sets should be stored in a datastream belonging to the content model on which they work
Our use case
• content types in the diagram: PDF, Paged Content, Collection, Publication, Event, Agent, Geographical Place, Concept, Image, Museum Object, Video
Our use case
• Publications
  • object with one or more “members”: PDF, PPT, Video, etc.
• PagedContent
  • with Page objects
• Agents: Persons and Organisations
  • Publications => Agents: hasAuthor, hasContributor, etc.
• Events, e.g. Conferences
  • Publications => Conferences: isConferencePaperOf
  • Conferences => Agents: organizedBy, heldIn
• Multimedia: Image, Video, Audio
  • maker, imageOf, etc.
• MuseumObjects
  • maker
• Geographical places and Concepts
Demo
• TU Delft Library Development environment
Ingest and forms
• For linking purposes, forms should have autocomplete dropdown fields for adding IDs of existing fedoraObjects
  • supported by some sort of query (Fedora 4: triple store?)
  • or else some sort of pop-up search form to find a fedoraObject
  • read-only information from the linked object, presented alongside an ID, is a helpful service for repository editors
• A fedoraObject should exist before it can be linked to
  • could be supported by some sort of pop-up form
• Batch import
  • first stage: import the objects
  • second stage: pick the triples from the XML datastreams
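The two-stage batch import can be illustrated with an in-memory stand-in for the repository. Everything here is an assumption made for the sketch: the record layout, the batch_import function, and the sample PIDs. The point is only that doing the linking in a second pass means every target already exists, whatever the order of the input.

```python
def batch_import(records):
    """Import records in two passes and report links whose target is missing."""
    repository = {}

    # Stage 1: create all objects, without any links.
    for record in records:
        repository[record["pid"]] = {"links": record["links"], "triples": []}

    # Stage 2: derive the triples now that every object exists.
    dangling = []
    for pid, obj in repository.items():
        for predicate, target in obj["links"]:
            if target in repository:
                obj["triples"].append(
                    ("info:fedora/" + pid, predicate, "info:fedora/" + target)
                )
            else:
                dangling.append((pid, predicate, target))
    return repository, dangling


# The publication comes before the person it links to; the order does not matter.
records = [
    {"pid": "pub:1", "links": [("hasAuthor", "po:13")]},
    {"pid": "po:13", "links": []},
]
repository, dangling = batch_import(records)
```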
Indexing and presentation
• With linked objects, information is distributed over several XML datastreams of several objects
  • the name of an author is in a Person object
  • the Publication object only knows the ID of the Person object
• The XML that is sent to Solr must therefore be aggregated from datastreams of several objects
• And likewise the HTML that is shown to the end user must be aggregated from several objects
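A rough sketch of that aggregation, using a dictionary in place of the repository. The field names (hasAuthor_id, hasAuthor_name), the record layout, and the function build_index_document are hypothetical; a real setup would read the linked objects' datastreams and send the result to Solr.

```python
# In-memory stand-in for two linked fedoraObjects (illustrative values).
REPOSITORY = {
    "po:13": {"type": "Person", "name": "Berlage", "links": []},
    "pub:1": {
        "type": "Publication",
        "title": "Book by Berlage",
        "links": [("hasAuthor", "po:13")],
    },
}


def build_index_document(pid, repository):
    """Combine an object's own fields with the names of the objects it links to."""
    obj = repository[pid]
    doc = {"id": pid, "title": obj.get("title")}
    for predicate, target_pid in obj["links"]:
        target = repository.get(target_pid)
        if target is None:
            continue  # linked object missing: nothing to aggregate
        doc.setdefault(predicate + "_id", []).append(target_pid)
        doc.setdefault(predicate + "_name", []).append(target["name"])
    return doc


doc = build_index_document("pub:1", REPOSITORY)
```

One consequence of this design is that a change to the Person object makes the index entries of every linked Publication stale, so those would need reindexing as well.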
Thanks!
