Final Report of Working Group 5: Interoperation

G. Simons (chair), H. Aristar-Dry,
D. Iannucci, E. Richter, H. Sicard,
N. Thieberger, P. Wittenburg
ELIIP Workshop, Salt Lake City, 12-14 Nov 2009
Interoperation

• What is it?
– Interoperability is the ability for two or more systems to exchange information or services and to make satisfactory use of what is exchanged.

• What does it take for this to happen?
– The systems agree on standardized definitions of the concepts about which they want to share information.
– The systems use a standardized format and protocol for information interchange.
Why interoperate?

• It prevents a centralized service from duplicating the efforts of others.
• It maximizes data freshness, since updates are propagated when the owner makes them.
• It makes a centralized service more sustainable, since others bear the cost of providing data.
• It allows multiple centralized services to add value to the same basic information.
Ways to build a web information service

• Centralized database curation
– The service is self-contained: the service defines the database, users edit the data directly, and the service curates the information.

• Centralized database aggregation
– The service has no data of its own: it uses an interoperation protocol to populate the database from other sources that curate the desired information.
The hybrid approach

• The service uses an interoperation protocol to aggregate all the information it can get from elsewhere.
• The service develops a database to handle new information it will curate (whether missing data or alternative values). As a “good citizen,” the service shares its unique data with others via the same protocol.
• End users see a combination of the aggregated and the curated data.
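The hybrid approach can be sketched in a few lines. This is a minimal illustration, not ELIIP's design: it assumes each record is a dictionary of field values keyed by ISO 639-3 code, and that a curated value is shown alongside an aggregated one rather than replacing it. The function name `combine`, the field names, and the placeholder code `qaa` (from the ISO 639-3 range reserved for local use) are hypothetical.

```python
from typing import Dict, List

# Hypothetical record shape: field name -> value.
Record = Dict[str, str]


def combine(aggregated: Dict[str, Record],
            curated: Dict[str, Record]) -> Dict[str, Dict[str, List[str]]]:
    """Build the end-user view from aggregated and curated records.

    Both inputs are keyed by ISO 639-3 code. For each field the aggregated
    value (if any) is listed first; a curated value is appended when it
    fills a gap or offers an alternative.
    """
    view: Dict[str, Dict[str, List[str]]] = {}
    for code in sorted(set(aggregated) | set(curated)):
        fields: Dict[str, List[str]] = {}
        for source in (aggregated.get(code, {}), curated.get(code, {})):
            for name, value in source.items():
                values = fields.setdefault(name, [])
                if value not in values:  # identical values are shown once
                    values.append(value)
        view[code] = fields
    return view


# Placeholder data for illustration only.
aggregated = {"qaa": {"name": "Example language", "population": "1000"}}
curated = {"qaa": {"population": "1200", "status": "shifting"}}
print(combine(aggregated, curated))
# {'qaa': {'name': ['Example language'], 'population': ['1000', '1200'], 'status': ['shifting']}}
```

Keeping both values, rather than letting the curated one silently win, leaves the judgment visible to the end user; a service could equally rank or label them by source.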
What does this mean for ELIIP?

• For each kind of information that the centralized ELIIP service wants to offer, it must decide whether to:
– Aggregate it,
– Curate it, or
– Do both

• The answer can be different for different kinds of information.
What kinds of information?
1. Web pages about a language
2. Existing language documentation
3. Summary index of documentation level
4. Projects and people
5. Training and revitalization programs
6. The language situation
7. The genetic classification

• OUT OF SCOPE: Interoperation over language data (like dictionaries and interlinear texts)
1. Web pages on languages

• Two low-bar approaches to interoperation:
– Microformats: Harvestable metadata is embedded in the HTML coding of a page.
– Predictable URL: A web site that offers information about many languages has a main page for each language, with a base URL parameterized by the ISO 639-3 code.
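Both low-bar approaches are simple enough to sketch with the Python standard library. The microformat here is hypothetical (no ELIIP microformat has been defined): it assumes a page marks its subject language with an element of class `iso639-3` whose text is the code. The names `LanguageCodeParser` and `language_page_url`, the `{code}` template placeholder, and the `example.org` host are likewise illustrative.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical microformat class marking the ISO 639-3 code of a page's subject.
MICROFORMAT_CLASS = "iso639-3"


class LanguageCodeParser(HTMLParser):
    """Collect the text of elements that carry the microformat class.

    Assumes the marked element contains plain text only, e.g.
    <span class="iso639-3">nav</span>.
    """

    def __init__(self) -> None:
        super().__init__()
        self.codes: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._inside = MICROFORMAT_CLASS in classes

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.codes.append(data.strip())


def language_page_url(template: str, code: str) -> str:
    """Expand a predictable-URL template such as 'https://example.org/lang/{code}'."""
    if len(code) != 3 or not (code.isascii() and code.isalpha() and code.islower()):
        raise ValueError(f"not a well-formed ISO 639-3 code: {code!r}")
    return template.format(code=code)


parser = LanguageCodeParser()
parser.feed('<p>A grammar sketch of <span class="lang iso639-3">nav</span>.</p>')
print(parser.codes)  # ['nav']
print(language_page_url("https://example.org/lang/{code}", "nav"))
```

The microformat needs the crawler to visit the page first; the predictable URL lets a service build the link without visiting anything, which is why it pairs naturally with generated metadata records.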
ELIIP could …

• Define microformats and provide a service for crawling pages on sites that use them
• Identify web sites that should implement predictable URLs and provide funding to incentivize the needed changes on those sites
• Provide a service for registering base URLs and boilerplate metadata so that OLAC records are generated for all language codes that yield a page
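The registration service in the last item could generate records roughly as follows. This sketch assumes a site has registered a URL template containing `{code}` plus boilerplate Dublin Core values, and that the list of codes that actually yield a page has already been determined (for example by a link checker). The function names are hypothetical, and the element and attribute conventions follow our understanding of OLAC 1.1 metadata (`dc:subject` with `xsi:type="olac:language"`, `dc:identifier` with `xsi:type="dcterms:URI"`); check them against the OLAC metadata specification before relying on them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
DCTERMS = "http://purl.org/dc/terms/"
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, name: str) -> str:
    """Return a qualified name in ElementTree's {uri}local form."""
    return f"{{{NS[prefix]}}}{name}"


def olac_record(template: str, code: str, boilerplate: Dict[str, str]) -> ET.Element:
    """Build one OLAC-style record for the page about `code`."""
    root = ET.Element(q("olac", "olac"))
    # dcterms appears only inside an xsi:type value, so declare it explicitly.
    root.set("xmlns:dcterms", DCTERMS)
    for element, text in boilerplate.items():  # e.g. {"title": "About {code}"}
        ET.SubElement(root, q("dc", element)).text = text.format(code=code)
    subject = ET.SubElement(root, q("dc", "subject"))
    subject.set(q("xsi", "type"), "olac:language")
    subject.set(q("olac", "code"), code)
    identifier = ET.SubElement(root, q("dc", "identifier"))
    identifier.set(q("xsi", "type"), "dcterms:URI")
    identifier.text = template.format(code=code)
    return root


def records_for_site(template: str, boilerplate: Dict[str, str],
                     codes_with_pages: Iterable[str]) -> List[ET.Element]:
    """One record per language code that is known to yield a page."""
    return [olac_record(template, code, boilerplate) for code in codes_with_pages]


records = records_for_site("https://example.org/lang/{code}",
                           {"title": "Language profile: {code}",
                            "publisher": "Example Language Site"},
                           ["nav", "qaa"])
print(ET.tostring(records[0], encoding="unicode"))
```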
2. Existing documentation

• A working interoperation infrastructure already exists in OLAC.
• ELIIP should aggregate from OLAC to avoid duplicating work.
• But there are huge gaps in OLAC coverage.
• Thus ELIIP needs a hybrid approach: acting as an OLAC data provider to fill the gaps and as an OLAC service provider to aggregate.
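Aggregating from OLAC means harvesting over the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): request the `olac` metadata format with the `ListRecords` verb and follow resumption tokens until the list is exhausted. Below is a minimal sketch using only the standard library. The base URL is a placeholder, OAI-PMH error responses (such as `noRecordsMatch`) are not handled, and the `fetch` parameter exists only so that the paging logic can be exercised without a network.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple, Union

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, token: Optional[str]) -> str:
    """First request names the metadata format; later ones send only the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "olac"}
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def parse_page(payload: Union[str, bytes]) -> Tuple[List[ET.Element], Optional[str]]:
    """Return the records on one page and the token for the next (None at the end)."""
    root = ET.fromstring(payload)
    records = root.findall(f".//{OAI}record")
    token_el = root.find(f".//{OAI}resumptionToken")
    # An absent or empty resumptionToken marks the last page.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, token or None


def fetch_url(url: str) -> bytes:
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url: str,
            fetch: Callable[[str], Union[str, bytes]] = fetch_url) -> Iterator[ET.Element]:
    """Yield every <record> from an OAI-PMH ListRecords sequence."""
    token: Optional[str] = None
    while True:
        records, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            break


# Usage (placeholder endpoint):
# for record in harvest("https://example.org/oai"):
#     print(record.find(f"{OAI}header/{OAI}identifier").text)
```

Acting as a data provider is the mirror image: the same requests, answered rather than sent.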
Filling the gaps

• Since many language archives don’t participate in OLAC, ELIIP could help those archives become OLAC data providers.
• Since many resources are being put in generic OAI-based institutional repositories, ELIIP could run a service that harvests those resources and assigns linguistic metadata to them.
• Since many resources are conventionally published or posted directly to the web, ELIIP could curate a database in which linguists can enter metadata for those resources.
• Since many linguists don’t have a place to deposit their work, ELIIP could curate a digital repository of language documentation.
3. Documentation index

• A numerical index that summarizes the level of language documentation (as at AUSTLANG) is desirable.
• The OLAC aggregator (especially after ELIIP fills the gaps) lists all the resources for a language by linguistic data type.
• What’s needed is a way to convert those lists into a measure of extent.
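Given harvested OLAC records, the list of resources per linguistic data type is a simple tally. This sketch assumes OLAC 1.1 conventions as we understand them (language codes on `dc:subject` or `dc:language` with `xsi:type="olac:language"`, data types on `dc:type` with `xsi:type="olac:linguistic-type"`). It compares the `xsi:type` prefix literally instead of resolving it, which a production harvester should do. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, Set

DC = "{http://purl.org/dc/elements/1.1/}"
OLAC_CODE = "{http://www.language-archives.org/OLAC/1.1/}code"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def coded_values(record: ET.Element, element: str, vocabulary: str) -> Set[str]:
    """Codes on dc:<element> children whose xsi:type names the given vocabulary."""
    return {
        el.get(OLAC_CODE)
        for el in record.iter(f"{DC}{element}")
        if el.get(XSI_TYPE) == vocabulary and el.get(OLAC_CODE)
    }


def types_for_language(records: Iterable[ET.Element], language: str) -> Counter:
    """Count resources per linguistic data type for one ISO 639-3 code.

    A resource counts if the language is its subject or the language of its content.
    """
    counts: Counter = Counter()
    for record in records:
        languages = (coded_values(record, "subject", "olac:language")
                     | coded_values(record, "language", "olac:language"))
        if language in languages:
            counts.update(coded_values(record, "type", "olac:linguistic-type"))
    return counts


SAMPLE = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:subject xsi:type="olac:language" olac:code="qaa"/>
  <dc:type xsi:type="olac:linguistic-type" olac:code="lexicon"/>
</olac:olac>"""
print(types_for_language([ET.fromstring(SAMPLE)], "qaa"))  # Counter({'lexicon': 1})
```

Counts alone say how many resources exist, not how extensive they are; that is the gap the extent recommendation addresses.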
ELIIP could …

• Participate in the OLAC process to refine the linguistic data type vocabulary as needed
– E.g. add “language instruction”

• Participate in the OLAC process to add a new recommendation for <dc:extent>
– E.g. lexicon/0, lexicon/1, lexicon/2, lexicon/3

• Promote its adoption by all OLAC participants and add curated judgments where that fails
• Develop an overall numerical index that combines results over all the data types
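One way the overall index might be computed is sketched below. It assumes extent values of the form `type/level` on a 0 to 3 scale, as in the `lexicon/0` … `lexicon/3` example above, restricted to the three OLAC linguistic data types; it takes the best level reported per type as the measure for a language and weights the types equally by default. The scale, the combination rule, and the function names are assumptions for illustration, since the proposal leaves them open.

```python
from typing import Dict, Iterable, Optional, Tuple

# The three OLAC linguistic data types; levels follow the hypothetical 0..3 scale.
DATA_TYPES = ("lexicon", "primary_text", "language_description")
MAX_LEVEL = 3


def parse_extent(value: str) -> Tuple[str, int]:
    """Split a value such as 'lexicon/2' into ('lexicon', 2)."""
    data_type, _, level = value.partition("/")
    n = int(level)  # raises ValueError if the level is missing or not a number
    if data_type not in DATA_TYPES or not 0 <= n <= MAX_LEVEL:
        raise ValueError(f"unexpected extent value: {value!r}")
    return data_type, n


def best_levels(extents: Iterable[str]) -> Dict[str, int]:
    """Highest level reported for each data type (0 if none)."""
    levels = {t: 0 for t in DATA_TYPES}
    for value in extents:
        data_type, n = parse_extent(value)
        levels[data_type] = max(levels[data_type], n)
    return levels


def documentation_index(extents: Iterable[str],
                        weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-type levels, scaled to 0..100."""
    levels = best_levels(extents)
    weights = weights or {t: 1.0 for t in DATA_TYPES}
    total = sum(weights[t] for t in DATA_TYPES)
    score = sum(weights[t] * levels[t] / MAX_LEVEL for t in DATA_TYPES)
    return 100 * score / total


print(round(documentation_index(["lexicon/2", "lexicon/1", "primary_text/3"]), 1))  # 55.6
```

Taking the maximum per type avoids rewarding many small resources over one substantial one; summing levels instead would be an equally defensible choice.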
4. People and projects

• The OLAC infrastructure can support this.

• DCMI Type vocabulary:
– Event: A time-bounded occurrence

• A project can be described in an OLAC record using elements like Contributor, Language, Linguistic data type, and Description.
• An advantage of this approach is that projects appear with all other resources in any OLAC-based service.
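A project record along these lines might be generated as below. This is a sketch under assumptions: it follows our reading of OLAC 1.1 conventions (`dc:type` with `xsi:type="dcterms:DCMIType"` for `Event`, `olac:role` for contributor roles, `olac:linguistic-type` for data types), and the `Project` class, its fields, and `project_record` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

OLAC = "http://www.language-archives.org/OLAC/1.1/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
DCTERMS = "http://purl.org/dc/terms/"
for prefix, uri in (("olac", OLAC), ("dc", DC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


@dataclass
class Project:  # hypothetical input shape
    title: str
    description: str
    languages: List[str]                                  # ISO 639-3 codes
    researchers: List[str]
    data_types: List[str] = field(default_factory=list)   # e.g. "lexicon"


def project_record(p: Project) -> ET.Element:
    """Describe a project as an OLAC-style record typed as a DCMI Event."""
    root = ET.Element(f"{{{OLAC}}}olac")
    root.set("xmlns:dcterms", DCTERMS)  # used only inside xsi:type values

    def add(name: str, text: Optional[str] = None,
            xsi_type: Optional[str] = None, code: Optional[str] = None) -> None:
        el = ET.SubElement(root, f"{{{DC}}}{name}")
        if xsi_type:
            el.set(f"{{{XSI}}}type", xsi_type)
        if code:
            el.set(f"{{{OLAC}}}code", code)
        if text:
            el.text = text

    add("title", p.title)
    add("type", "Event", xsi_type="dcterms:DCMIType")  # a project is a kind of event
    add("description", p.description)
    for person in p.researchers:
        add("contributor", person, xsi_type="olac:role", code="researcher")
    for code in p.languages:
        add("subject", xsi_type="olac:language", code=code)
    for data_type in p.data_types:
        add("type", xsi_type="olac:linguistic-type", code=data_type)
    return root


record = project_record(Project("Example documentation project",
                                "Placeholder description.",
                                ["qaa"], ["A. Researcher"], ["lexicon"]))
print(ET.tostring(record, encoding="unicode"))
```

A refinement that distinguishes projects from other events would affect only the line that types the record as `Event`.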
ELIIP could …

• Propose a metadata refinement to distinguish a project from other kinds of “events”
• Curate records that allow linguists to describe their own projects
• Help players that hold databases of relevant projects, such as funding agencies, to become OLAC data providers
5. Training and revitalization

• The OLAC infrastructure can support this.
• A training course or revitalization program can be described in an OLAC record with DCMI Type = “Event” + OLAC resource type = “language instruction” + Language, Description, and Identifier (for a URL).
• This approach allows these programs to appear with all other resources for the language in any OLAC-based service.
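Any OLAC-based service could then surface these programs for a language with a filter like the following. The vocabulary value `language_instruction` is hypothetical (the slides only propose adding “language instruction”), the function names are illustrative, and the sketch assumes records in OLAC 1.1 form with the program's URL in `dc:identifier`.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

DC = "{http://purl.org/dc/elements/1.1/}"
OLAC_CODE = "{http://www.language-archives.org/OLAC/1.1/}code"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
INSTRUCTION_CODE = "language_instruction"  # hypothetical vocabulary value


def _has(record: ET.Element, element: str, xsi_type: str, value: str) -> bool:
    """True if some dc:<element> has the xsi:type and the value (as code or text)."""
    for el in record.iter(f"{DC}{element}"):
        if el.get(XSI_TYPE) == xsi_type and value in (el.get(OLAC_CODE), (el.text or "").strip()):
            return True
    return False


def training_programs(records: Iterable[ET.Element], language: str) -> List[str]:
    """URLs of training or revitalization events for one ISO 639-3 code."""
    urls: List[str] = []
    for record in records:
        if (_has(record, "type", "dcterms:DCMIType", "Event")
                and _has(record, "type", "olac:linguistic-type", INSTRUCTION_CODE)
                and _has(record, "subject", "olac:language", language)):
            urls.extend((el.text or "").strip() for el in record.iter(f"{DC}identifier")
                        if (el.text or "").strip())
    return urls


SAMPLE = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:type xsi:type="dcterms:DCMIType">Event</dc:type>
  <dc:type xsi:type="olac:linguistic-type" olac:code="language_instruction"/>
  <dc:subject xsi:type="olac:language" olac:code="qaa"/>
  <dc:identifier>https://example.org/course</dc:identifier>
</olac:olac>"""
print(training_programs([ET.fromstring(SAMPLE)], "qaa"))  # ['https://example.org/course']
```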
ELIIP could …

• Curate records that allow these programs to describe themselves
• Help players who are curating databases of training events to become OLAC data providers
6. Language situation

• No suitable interoperation standard yet exists for population data, etc.
• Are there other projects already curating this kind of information such that interoperation is desirable?
– E.g. UNESCO Atlas, Ethnologue, AUSTLANG

• But interoperation will only work if all the players agree to do it.
ELIIP could …

• During the proposal phase, identify the projects that should interoperate and secure agreement in principle to participate
• During the project phase, foster the process among those players to agree on standard definitions, format, and protocol
• Could use the OAI protocol
– “olac” payload for the metadata
– “eliip” payload for the language information
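To make the two-payload idea concrete: an OAI-PMH repository can disseminate the same item in several metadata formats, selected with the `metadataPrefix` argument. The sketch below invents a minimal `eliip` format for language-situation data. The namespace URI, element names, `Situation` fields, and helper functions are all hypothetical placeholders pending the agreement among players described above, and the OAI identifier and endpoint are illustrative.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Placeholder namespace: no "eliip" format has been defined; illustrative only.
ELIIP = "http://example.org/eliip/0.1/"
ET.register_namespace("eliip", ELIIP)


@dataclass
class Situation:  # hypothetical fields pending an agreed standard
    language: str               # ISO 639-3 code
    population: Optional[int]   # speaker population, if known
    year: Optional[int]         # year the figure refers to
    source: str                 # who curates the figure


def to_xml(s: Situation) -> str:
    """Serialize one record as an "eliip" metadata payload."""
    root = ET.Element(f"{{{ELIIP}}}situation", {"language": s.language})
    for name in ("population", "year", "source"):
        value = getattr(s, name)
        if value is not None:
            ET.SubElement(root, f"{{{ELIIP}}}{name}").text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Situation:
    """Parse an "eliip" payload back into a Situation."""
    root = ET.fromstring(text)

    def get(name: str) -> Optional[str]:
        el = root.find(f"{{{ELIIP}}}{name}")
        return el.text if el is not None else None

    population, year = get("population"), get("year")
    return Situation(
        language=root.get("language", ""),
        population=int(population) if population else None,
        year=int(year) if year else None,
        source=get("source") or "",
    )


def get_record_url(base_url: str, oai_identifier: str, prefix: str = "eliip") -> str:
    """OAI-PMH GetRecord request; prefix="olac" would fetch the same item's metadata."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "identifier": oai_identifier, "metadataPrefix": prefix})
    return f"{base_url}?{query}"


record = Situation("qaa", 1200, 2009, "Example source")  # placeholder values
payload = to_xml(record)
print(payload)
print(get_record_url("https://example.org/oai", "oai:example.org:qaa"))
```

Recording the year and source with each figure is what would let an aggregator show competing values side by side rather than pick one.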
ELIIP could also …

• Provide a feedback mechanism that allows a user to report an error back to the provider of the aggregated data
• Provide a publicly viewable tracking mechanism to ensure accountability of the data providers, e.g.
– Is a population in Ethnologue or UNESCO wrong because they won’t fix it when someone reports the right data, or because the person who knows won’t tell them?
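A publicly viewable tracking mechanism needs little more than a status per report and a per-provider summary. In this sketch the status values, field names, class and function names, and the use of the age of the oldest open report as an accountability signal are all assumptions. A separate “declined” count distinguishes a provider that reviews and rejects a report from one that never responds.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, Iterable, Optional


class Status(Enum):
    OPEN = "open"          # reported; provider has not responded
    FIXED = "fixed"        # provider corrected its data
    DECLINED = "declined"  # provider reviewed the report and kept its value


@dataclass
class ErrorReport:
    provider: str          # the aggregated source the value came from
    language: str          # ISO 639-3 code
    field_name: str        # e.g. "population"
    proposed_value: str
    opened: date
    status: Status = Status.OPEN
    closed: Optional[date] = None

    def resolve(self, status: Status, on: date) -> None:
        if status is Status.OPEN:
            raise ValueError("a report can only be resolved as FIXED or DECLINED")
        self.status, self.closed = status, on


def provider_summary(reports: Iterable[ErrorReport],
                     today: date) -> Dict[str, Dict[str, int]]:
    """Public per-provider counts plus the age in days of the oldest open report."""
    summary: Dict[str, Dict[str, int]] = {}
    for r in reports:
        row = summary.setdefault(
            r.provider, {"open": 0, "fixed": 0, "declined": 0, "oldest_open_days": 0})
        row[r.status.value] += 1
        if r.status is Status.OPEN:
            row["oldest_open_days"] = max(row["oldest_open_days"], (today - r.opened).days)
    return summary


reports = [  # placeholder data
    ErrorReport("Provider A", "qaa", "population", "1200", date(2009, 1, 5)),
    ErrorReport("Provider A", "qab", "population", "300", date(2009, 6, 1)),
]
reports[1].resolve(Status.FIXED, date(2009, 7, 1))
print(provider_summary(reports, today=date(2009, 11, 12)))
```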
Nota Bene

• None of the “ELIIP could” proposals up to this point would require the overhead of a governing body or regional captains to vet individual data points (though a governing body or regional captains would still have a role in recommending and vetting aggregation sources).
• That threshold is crossed if ELIIP chooses to:
– Curate its own version of language situation data that it judges to be the most correct
7. Genetic classification

• Same story as for “language situation” information
• If the set of data providers is the same as for the situation information, then this could be included in the interoperation standard as a kind of situation information.
• If there is a different set of players, ELIIP could foster the same process to develop an interoperation standard for classification.
Thought for the day

• The aggregator lies at the sweet spot in the value chain of today’s web economy.
– E.g. Google, Amazon, iTunes, Netflix
– Cf. Chris Anderson, The Long Tail (2006)
Conclusion

• There are many things that ELIIP could do:
– To exploit the power of interoperation
– To mobilize our community to share information about endangered languages
– While minimizing what it must centrally curate

• The task for the ELIIP planners is to decide which of these things they want to do.