SDC Update - Coalition for Networked Information

Citation Linking for Electronic Journal Articles
CNI Fall Task Force Meeting
Phoenix AZ
December 1999
What we’re going to talk about
• General model for reference linking (me)
– NISO / DLF / SSP / NFAIS workshops
– February 1999, Washington DC
– June 1999, Boston MA
• “Appropriate copy” problem (Dale)
– DLF Architecture Committee
– SFX
Why talk about it?
• Publicize the activity so far
• Seeking interested parties
– How can we move this effort forward?
– Who can/should participate?
What are we talking about?
• Reference (or citation) linking
– providing an actionable link from a reference to
an object
– focus on electronic journal articles
• References
– from index databases (A&I services, search
services, citation databases)
– from article “references” section (bibliography)
What are we talking about (cont)
• Links
– maybe URL
– maybe some other link-key (identifier)
• Objects
– works / manifestations / items => creations
– content vs. surrogates / substitutes
“Puddles”
• Closed systems where single agency
controls both citations and content
• Publisher(s)
– Elsevier’s ScienceDirect, Wiley’s InterScience
• Aggregator service
– OCLC ECO
• Discipline
– NASA Astrophysics Data System, PubMed
Puddles (cont.)
• User Community
– OhioLink, University of Toronto
• Problems with Puddles:
– ok when everything a user wants is inside the
puddle
– not ok when content is limited, arbitrary, or
incongruent with user needs
Open Reference Linking
• Any link to any object, regardless of which
system the link, the object, or the user is in.
• Assume multiplicity
• Require interoperability
WHAT WE ARE TRYING TO ACCOMPLISH
[Diagram. Labels: Any old system, Citation, LINK, CLICK, MAGIC, Cited Article.]
Model for open reference linking
[Diagram. Components: Publisher, Reference Database, Location Database, Client, Content. Labeled flows: Citation, Identifiers, Identifier, URLs, URL.]
Pieces of the problem
• Get a link for a reference
• Resolve the link to one or more locations of
the target document
• Identify the most appropriate copy or copies
of the target document for the user
URL or Identifier
• Multiple locations
• Persistence
• Data management
• Nearly all implementations find identifier
necessary
• Identifier = “name based”
How to get a link: derived vs. dumb
• Derived: Construct it from data in the
reference
– shared within a discipline (ADS)
– national standard (SICI)
– cope with multiplicity (S-Link-S)
• Dumb: Look it up from data in the reference
– e.g. DOI-X
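A minimal Python sketch of the derived vs. dumb distinction. The record fields, the key layout, and the opaque identifier are all invented for illustration; in particular the derived key is not SICI syntax, and the lookup table only stands in for a lookup service.

```python
from typing import Dict, Optional, Tuple

# Hypothetical reference record; the field names are assumptions.
reference = {
    "issn": "1234-5678",
    "year": 1999,
    "volume": 5,
    "issue": 2,
    "start_page": 101,
}


def derived_key(ref: dict) -> str:
    """Derived: build the link key purely from data in the reference.

    The layout is made up for this sketch; it is not SICI syntax.
    """
    return (
        f"{ref['issn']}:{ref['year']}:{ref['volume']}:"
        f"{ref['issue']}:{ref['start_page']}"
    )


# Dumb: the identifier carries no meaning, so it must be looked up.
# This table stands in for a lookup service; the identifier is invented.
LOOKUP_TABLE: Dict[Tuple[str, int, int], str] = {
    ("1234-5678", 5, 101): "10.9999/opaque-000042",
}


def dumb_key(ref: dict) -> Optional[str]:
    """Dumb: find the identifier by matching data in the reference."""
    return LOOKUP_TABLE.get((ref["issn"], ref["volume"], ref["start_page"]))
```

The derived key needs no registry but depends on clean metadata in the reference; the dumb key needs a lookup service but places no meaning in the identifier itself.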
How to get a link: static vs. dynamic
• Static: Pre-constructed
– embedded in the source document
– stored in a table associated with the source
– Advantage: opportunity to review and correct
• Dynamic: Supplied on-the-fly
– looked up or calculated when citation or
reference displayed
– Advantage: currency and flexibility
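The trade-off between static and dynamic links can be illustrated with a small Python sketch. The citation strings, the identifiers, and the `linker` function are hypothetical stand-ins for whatever mechanism actually supplies links.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical link source: citation text -> identifier.
known_links: Dict[str, str] = {"Smith 1998": "id:001"}


def linker(citation: str) -> Optional[str]:
    """Return the link currently known for a citation, if any."""
    return known_links.get(citation)


def build_static_links(
    citations: List[str], link_for: Callable[[str], Optional[str]]
) -> Dict[str, Optional[str]]:
    """Static: compute links once and store them in a table with the source."""
    return {c: link_for(c) for c in citations}


citations = ["Smith 1998", "Jones 1999"]
static_table = build_static_links(citations, linker)

# A link for "Jones 1999" becomes available only after the table was built.
known_links["Jones 1999"] = "id:002"

static_result = static_table["Jones 1999"]  # still None: the table is stale
dynamic_result = linker("Jones 1999")       # looked up at display time

# The static table can still be reviewed and corrected by hand.
static_table["Jones 1999"] = "id:002"
```

The stale entry shows the currency advantage of dynamic linking; the final manual fix shows the review-and-correct advantage of static linking.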
Static and Dynamic Linking
            Static    Dynamic
Index       ISI       OhioLink
Article     ADS       OpenJournal
Model: how to get a link
[Diagram. Components: Publisher, Reference Database, Client. Labeled flows: citation, Identifier(s).]
Resolve the link to location(s)
• For given identifier
– look up in database mapping identifier to
location(s)
– return list of locations where items may be
found
– return additional information to distinguish
between items (e.g. format)
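A minimal Python sketch of this resolution step, assuming an in-memory table in place of a real location database. The identifier, URLs, and the `Location` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    url: str
    fmt: str  # additional information to distinguish items, e.g. format


# Hypothetical location database: identifier -> known locations.
LOCATION_DB: Dict[str, List[Location]] = {
    "id:001": [
        Location("http://publisher.example/articles/001.pdf", "PDF"),
        Location("http://mirror.example/articles/001.html", "HTML"),
    ],
}


def resolve(identifier: str) -> List[Location]:
    """Return every known location for the identifier (empty if unknown)."""
    return list(LOCATION_DB.get(identifier, []))
```

Returning the full list, rather than a single URL, leaves the choice of the most appropriate copy to a later step.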
Model: how to resolve a link
[Diagram. Components: Publisher, Location Database, Client. Labeled flows: Identifier, URL(s).]
How to resolve the link
• In puddles
– may be single type of link
– may be handled by system software
• In open reference linking
– will be multiple types of links
– need to find appropriate resolution service(s)
– need protocol for communicating with
resolution service
How to find appropriate resolver
• Currently
– Browser plug-in
– Proxy server
– Tunnel identifier in URL
• Future ?
– URN model of distributed resolution
– web browser support for user configuration of a
hierarchy of identifier resolution services
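One of the current approaches, tunneling an identifier in a URL, can be sketched in Python as follows. The resolver addresses, the scheme prefixes, and the `id` query parameter are assumptions made for illustration, not the conventions of any real resolution service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical registry: identifier scheme -> resolution service.
RESOLVERS = {
    "doi": "http://resolver-a.example/resolve",
    "sici": "http://resolver-b.example/resolve",
}


def tunnel(identifier: str) -> str:
    """Wrap an identifier in an ordinary URL aimed at a suitable resolver."""
    scheme, _, _ = identifier.partition(":")
    base = RESOLVERS.get(scheme.lower())
    if base is None:
        raise ValueError(f"no resolver known for scheme {scheme!r}")
    return base + "?" + urlencode({"id": identifier})


def untunnel(url: str) -> str:
    """Recover the identifier on the resolver side."""
    return parse_qs(urlsplit(url).query)["id"][0]
```

Because the result is an ordinary URL, an unmodified web browser can follow it; the cost is that the choice of resolver is fixed when the URL is built.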
WHAT IF MORE THAN 1 COPY EXISTS?
• Elsevier journals, for example, are available
from
– Elsevier ScienceDirect
– University of Michigan PEAK
– OhioLink
– University of Toronto
– Florida Center for Library Automation
WHICH URL?
[Diagram: a NAME goes to the Name Resolver. Candidate URLs: Sciencedirect.com? Ohiolink.edu? Utoronto.ca? Umich.edu? FCLA.edu?]
IT SOMETIMES DEPENDS ON WHO THE USER IS...
SOURCES OF MULTIPLE COPIES
• Aggregators
– OCLC, EBSCO, Bell & Howell, Lexis/Nexis,
IAC…
• “Local loading”
– OhioLink, University of Toronto, University of
Florida…
• E-print
– xxx (LANL), Cogprints, RePEc…
WHY MULTIPLE COPIES
• Performance -- may want highly used
objects “closer” to the user in network terms
• Different players can provide different
service models using same content
– e.g., gathering topically related materials into
knowledge bases (Ovid)
– published and unpublished articles in a single e-print service
WHY MULTIPLE COPIES (continued)
• Competition in repository services
– Encourages functional innovation
– Rationalizes prices for services
• Archiving
– Institutional failure is as great a danger as
technological failure, particularly when dealing
with commercial players
CURRENT STATE
• Few working solutions (Linkout @ NIH,
SFX prototype @ UGhent and LANL)
• DLF/CNRI discussion of the following 3
models
– All intervene in the name resolution process to
select the appropriate URL to return
LOCAL CACHE
[Diagram. Components: Local Name Resolver, Universal Name Resolver.]
• Labeled flow:
– 1. Name Resolution Request
– 2. Address (if found locally), OR
– 2. Name Resolution Request (if address not found locally)
– 3. Address
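The local cache flow can be sketched in Python as follows. The class names, the names being resolved, and the addresses are hypothetical; the sketch only shows a local resolver answering when it can and otherwise passing the request to a universal resolver.

```python
from typing import Dict, Optional


class UniversalResolver:
    """Stands in for the universal name resolver."""

    def __init__(self, table: Dict[str, str]) -> None:
        self.table = dict(table)
        self.requests = 0  # how often the universal resolver was consulted

    def resolve(self, name: str) -> Optional[str]:
        self.requests += 1
        return self.table.get(name)


class LocalResolver:
    """Answers from its own table first, then falls back to the universal one."""

    def __init__(self, local_copies: Dict[str, str], universal: UniversalResolver) -> None:
        self.local_copies = dict(local_copies)
        self.universal = universal

    def resolve(self, name: str) -> Optional[str]:
        address = self.local_copies.get(name)
        if address is not None:
            return address  # address found locally
        return self.universal.resolve(name)  # not found locally


universal = UniversalResolver({
    "name:001": "http://publisher.example/001",
    "name:002": "http://publisher.example/002",
})
local = LocalResolver({"name:001": "http://library.example/local/001"}, universal)
```

Here `local.resolve("name:001")` returns the locally held copy without contacting the universal resolver, while `local.resolve("name:002")` is passed through.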
PROFILE-BASED FILTER
[Diagram. Components: Filter Server, Reference Server.]
• Labeled flow:
– 1. Name Resolution Request
– 2. Name Resolution Request
– 3. Addresses (URL1, URL2, URL3…)
– 4. Request Bibliographic Data (if appropriate source is ambiguous)
– 5. Bibliographic Data
– 6. Address
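A rough Python sketch of a profile-based filter, under stated assumptions: the profile is taken to map a journal title to a preferred host, bibliographic data is fetched only when more than one address comes back, and the first address is used when the profile has no preference. None of these choices comes from the DLF/CNRI discussion itself; all names and URLs are hypothetical.

```python
from typing import Callable, Dict, List, Optional
from urllib.parse import urlsplit

# Hypothetical data standing in for the resolver and the reference server.
ADDRESSES: Dict[str, List[str]] = {
    "name:001": ["http://publisher.example/001", "http://consortium.example/001"],
}
BIBLIOGRAPHIC_DATA: Dict[str, dict] = {
    "name:001": {"journal": "Journal of Examples", "year": 1999},
}


def resolve_name(name: str) -> List[str]:
    return list(ADDRESSES.get(name, []))


def fetch_bib(name: str) -> dict:
    return BIBLIOGRAPHIC_DATA.get(name, {})


def profile_filter(
    name: str,
    resolve: Callable[[str], List[str]],
    fetch_bibliographic_data: Callable[[str], dict],
    profile: Dict[str, str],
) -> Optional[str]:
    """Pick one address for a name using a journal-to-host profile."""
    addresses = resolve(name)  # steps 2-3
    if not addresses:
        return None
    if len(addresses) == 1:
        return addresses[0]  # nothing ambiguous, no bibliographic data needed
    bib = fetch_bibliographic_data(name)  # steps 4-5
    preferred_host = profile.get(bib.get("journal", ""))
    for url in addresses:
        if urlsplit(url).hostname == preferred_host:
            return url  # step 6
    return addresses[0]  # assumed fallback when the profile is silent


profile = {"Journal of Examples": "consortium.example"}
chosen = profile_filter("name:001", resolve_name, fetch_bib, profile)
```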
BROADCAST-RESULT-BASED FILTER
[Diagram. Components: Filter Server, Universal Name Resolver, Content Service 1, Content Service 2, Content Service 3.]
• Labeled flow:
– 1. Name Resolution Request
– 2. Name Resolution Request
– 3. Addresses (URL1, URL2, URL3…)
– 4. Availability Query (one to each content service)
– 5., 6., 7. Availability (one from each content service)
– 8. Address
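A minimal Python sketch of a broadcast-result-based filter. The subscriber table stands in for the content services' answers to an availability query, and returning the first available address is an assumed tie-breaking rule; hosts, users, and names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set
from urllib.parse import urlsplit

# Hypothetical data: addresses per name, and who may use each content service.
ADDRESSES: Dict[str, List[str]] = {
    "name:001": [
        "http://service1.example/001",
        "http://service2.example/001",
        "http://service3.example/001",
    ],
}
SUBSCRIBERS: Dict[str, Set[str]] = {
    "service1.example": {"alice"},
    "service2.example": {"bob"},
    "service3.example": {"alice", "bob"},
}


def resolve_name(name: str) -> List[str]:
    return list(ADDRESSES.get(name, []))


def is_available(url: str, user: str) -> bool:
    """Stands in for one availability query to one content service."""
    host = urlsplit(url).hostname or ""
    return user in SUBSCRIBERS.get(host, set())


def broadcast_filter(
    name: str,
    user: str,
    resolve: Callable[[str], List[str]],
    query: Callable[[str, str], bool],
) -> Optional[str]:
    """Ask every candidate service, then return one available address."""
    addresses = resolve(name)  # steps 2-3
    # Query all candidates before choosing (steps 4-7).
    available = [url for url in addresses if query(url, user)]
    return available[0] if available else None  # step 8
```

Unlike the profile-based sketch, nothing about the user is stored at the filter; the cost is one query per candidate service for every resolution.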
SOME ISSUES
• Ugly, ugly, ugly
– In part because linking is to articles, while most
access is based on serial title and year
• All solutions require a lot of coordination
• Users who are members of multiple “rights
communities” are a major complication
SFX LINKING SYSTEM
[Diagram. Components: Portal, “Cookie Pusher”, Service (external “SFX aware” service), SFX Server.]
• Labeled flow:
– 1. Service cookie-pusher URL
– 2. Cookie info
– 3. Cookie + redirect to service
– 4. Service request + cookie
– 5. Article or citation + SFX links based on cookie
– 6. Request for links
– 7. Page of links
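A simplified Python sketch of the cookie-based flow. It assumes the cookie records the address of the user's SFX server and that an "SFX aware" service builds its link from that address plus citation metadata; the cookie name, query parameters, and URLs are hypothetical and are not taken from the SFX prototype.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def cookie_pusher(sfx_server_url: str, service_url: str) -> Tuple[Dict[str, str], str]:
    """Steps 1-3: produce a cookie and the service URL to redirect to."""
    cookie = {"sfx_server": sfx_server_url}  # assumed cookie content
    return cookie, service_url


def sfx_link(citation: Dict[str, str], cookie: Optional[Dict[str, str]]) -> Optional[str]:
    """Step 5: add an SFX link only when the request carried the cookie."""
    if not cookie or "sfx_server" not in cookie:
        return None
    return cookie["sfx_server"] + "?" + urlencode(citation)


cookie, redirect_to = cookie_pusher(
    "http://sfx.library.example/links", "http://service.example/search"
)
citation = {"issn": "1234-5678", "volume": "5", "spage": "101"}
link = sfx_link(citation, cookie)
```

Following `link` corresponds to step 6, the request for links, which the SFX server would answer with a page of links (step 7). A user arriving without the cookie simply sees no SFX link.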
SFX vs Name Based Linking
• SFX
– generalized for many kinds of links (including
to paper copies…)
– requires explicit cooperation of citation source
• SFX does not simplify providing
appropriate link
– but can work with both algorithmic and name-based links
– and methodology provides bibliographic
context for link derivation
So…
• Different approaches have different
strengths
– mix and match possible
• The big issue: who has the motivation to
address this seriously?
• Interested? Contact us!
The requisite URLs:
Report on NISO/DLF/SSP/NFAIS meetings:
http://www.dlib.org/dlib/july99/caplan/07caplan.html
Paper on DLF/CNRI “appropriate copy” discussion:
http://www.niso.org/DLFarch.html
and contact information:
[email protected]
[email protected]