
Evolving Digital Libraries to Support
Geographically Distributed Scientific Research
Rick Luce
Research Library Director
Library Without Walls Project Leader
Los Alamos National Laboratory
Symposium on Knowledge Environments for Science
NSF, October 22, 2002
Some Puzzle Pieces for Digital Libraries
Content:
• Access
• Retrieval
Financial Models
• Funding
• Content licensing
Standards & Interoperable Frameworks
User Behavior
• User needs
• Collaboration
• Scholarly communication changes
• Adoption curves
Enabling Technologies & Infrastructure
DL Models: Delivery
Delivery of Content & Services
Trend
• Libraries replicating one another
• Requires integrated framework
• Lack of interoperability
• Tough work
• Publisher pricing flip for e-content
• Old model of libraries facing decline or aggregation
DL Models: Capture
Content Capture
Ingest repositories
• Easy entry in network environment
• Digitization of old stuff
• E-collections distributed but archiving is unknown
• Largely publisher controlled today
Trend
New players emerging
Low barrier entry
eprint systems
Capture Systems
Eprint Systems:
 xxx or arXiv e-print archive - (Physics - LANL - Ginsparg, 1991)
 RePEc - (Economics - Surrey U - Krichel)
 NCSTRL - (Computer Science - Cornell U - Lagoze)
 NDLTD - (Theses - Virginia Tech - Fox)
 CogPrints - (Cognitive Sciences - Southampton - Harnad)
Harvesters
 ARC & ARCHON - Computer Science Dep’t, ODU
 SCIRUS – Elsevier
 even at the individual level … Kepler - ODU
Digital Library Hybrid
Delivery of Content & Services
Authentication – Shibboleth
Content Capture
NSF
Normalization
NSDL
DLESE
OAI protocols
Standards
Share usage logs between nodes
Share citations & digital archives
New collaboration opportunities
DRM
29 Institutional Customers in the U.S.
Albany Research Cntr.
 Brooks AFB
 Brookhaven Nat’l Lab
 Eglin AFB
 Enviro Measurem’t Lab
 DOE HQ Energy library
 Fed. Technology Center
 Griffiss AFB
 Oak Ridge Nat’l Lab
 Savannah River Co.
 Tyndall AFB
 Hanscom AFB
 Wright Patterson AFB
 Montana State Univ

 Stanford Univ
 Pacific Northwest Nat’l Lab
 Edwards AFB
 Univ Nevada
 Idaho Nat’l Eng. & Enviro Lab

4 New Mexico Universities
 Sandia National Labs
 Air Force Research Lab
 Nat’l Renewable Energy Lab
 Santa Fe Institute

Who has access to 80%+ of e-content

Sandia National Labs
Large fraction of scholarly content has significant access restrictions & cost barriers
~8M full text articles: Open Access 3%, Copyright restrictions 97%
~60M metadata records: Open Access 6%, Copyright restrictions 94%
Challenges
FALLOUT: WITH PUBSCIENCE GONE, SIIA SEEKS OTHER
CLOSURES -- With PubSCIENCE now history, the trade association
that lobbied for its dismantling is reportedly set to focus its energies on
other freely accessible government information resources. According to
FEDERAL COMPUTER WEEK, Software and Information Industry
Association (SIIA) public policy director David Le Duc said the group was
"looking into a couple of other databases and agencies," in particular one
"law-related" and one that "has to do with agriculture." After more than a
year of intense lobbying by the SIIA, a major trade association for the
software and digital content industry, the federal government discontinued
PubSCIENCE in early November …They argue, that it is unfair for
taxpayer dollars to fund databases that compete with commercial products.
Library Journal Academic News Wire: November 19, 2002
Repository Models
• Distributed – MIT: individual faculty upload and manage their own scholarly output.
• Semi-distributed – UC eScholarship: management responsibility is assigned to organizational units (research units, departments), which then assist faculty with uploading their papers.
• Semi-centralized – Caltech: repository sites are set up for any university unit, but the library uploads the papers on the faculty's behalf. Its digital collections range from computer science technical reports to theses and dissertations.
Institutional Repositories: Roy Tennant, 9/15/02
OAI’s Role
So far: harvesting of descriptive metadata ...
but coming, harvesting of:
 references
 usage logs
 certification metadata
 metadata rights
 citation mapping
 co-citation visualization
 personalization
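Sketch: what harvesting descriptive metadata over OAI looks like today. This is a minimal illustration, not any particular harvester's implementation. It assumes the OAI-PMH 2.0 request form (verb=ListRecords, metadataPrefix=oai_dc) and unqualified Dublin Core records. The base URL, the sample record, and the function names list_records_url and parse_records are hypothetical, and the response is an embedded string so nothing is fetched from the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying oai_dc metadata.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository (base_url is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Made-up response fragment, standing in for what a repository would return.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:0001</identifier>
        <datestamp>2002-10-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Made-Up Eprint Title</dc:title>
          <dc:creator>Doe, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract identifier, titles and creators from each harvested record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=OAI_NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", OAI_NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", OAI_NS)],
        })
    return records


request_url = list_records_url("http://example.org/oai")
harvested = parse_records(SAMPLE_RESPONSE)
```

The items listed as "coming" (references, usage logs, certification metadata) would presumably travel as additional metadata formats alongside oai_dc. That is a guess about direction, not something the sketch implements.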
OpenURL
Information resources allow open linking by including a hook along with
each metadata description... which presents itself as an actionable
OpenURL
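Sketch: the "hook" as a link built from the metadata description. A minimal illustration only. The resolver address, the citation values, and the function name make_openurl are hypothetical. The key names (genre, issn, volume, issue, spage, date) follow the early key/value style of OpenURL and should be checked against the specification before use.

```python
from urllib.parse import urlencode


def make_openurl(resolver_base, **citation):
    """Append citation metadata to a resolver address as a query string.

    resolver_base is the user's institutional link resolver (hypothetical here).
    Empty fields are dropped so the link only carries what is known.
    """
    query = urlencode({key: value for key, value in citation.items() if value})
    return f"{resolver_base}?{query}"


# Made-up citation; the values are for illustration only.
link = make_openurl(
    "http://resolver.example.org/menu",
    genre="article",
    issn="1234-5678",
    volume="12",
    issue="3",
    spage="45",
    date="2002",
)
```

The point of the design: the information resource supplies only the metadata, while the resolver at the user's own institution decides where the link leads (licensed full text, document delivery, catalog).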
Create Shared User Group in MyLibrary
LANL Active Recommendation System
Adaptation of Structure and Semantics – Using Collective Behavior of Users
1. Knowledge contexts categorized
– Keywords & keyword semantic proximity
– Citations and citation proximity
– Semantic proximity
– Traversal proximity
2. Recommendation(s) calculated
3. Traversal proximity analyzed
4. Adaptation in system
Users + Profiles = learning community
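Sketch: recommendations from collective user behavior. This is not the LANL system's algorithm. It is a simple stand-in in which "traversal proximity" is approximated by how often two documents are visited in the same user session. The session data and the function names traversal_proximity and recommend are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def traversal_proximity(sessions):
    """Count, for each pair of documents, the sessions in which both were visited."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Each distinct pair counts once per session, regardless of visit order.
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, doc, k=3):
    """Return up to k documents most often co-visited with doc (ties broken by name)."""
    neighbours = counts.get(doc, {})
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Made-up usage log: each inner list is one user's session of document ids.
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "D"],
    ["A", "C"],
]

proximity = traversal_proximity(sessions)
suggestions_for_a = recommend(proximity, "A", k=2)
```

A fuller system would combine this signal with the keyword and citation proximities listed above and feed the results back into the structure, which is the adaptation step.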
Finding the Balance Point
Community specific tools
Encourage/support transdisciplinary research
Small teams
Deployable across Lab or multiple institutions
New technologies, new tools
Legacy data & systems
Knowledge is represented by articles, books, etc.
Knowledge characterized by relationships among objects, documents & resources
Known path, existing infrastructure (people, buildings)
institutional pride
Hub/spoke model for DL’s: balance resources and focused efforts
Higher Order Thinking* …
• is nonalgorithmic (path cannot be fully specified in advance)
• tends to be complex (total path not visible from one vantage point)
• often yields multiple solutions (each with costs/benefits rather than
unique solutions)
• involves nuanced judgment and interpretation
• involves the application of multiple criteria (which sometimes conflict
with one another)
• often involves uncertainty (not everything bearing on the task is known)
• involves self-regulation of the thinking process (someone else does not
‘call the plays’ at every step)
• involves imposing meaning, finding structure in apparent disorder
• is effortful (considerable mental work involved in the kinds of
elaborations and judgments required)
*Resnick (’87)
Visualization
• Scientific visualization – use of interactive visual
representations of scientific data, typically physically
based, to amplify cognition
• Information visualization – use of interactive visual
representations of abstract, nonphysically based data to
amplify cognition
Successes
 Culture of measurement – long-term focus on user-driven requirements and corresponding satisfaction levels
 Open Archives Initiative – small, quick, right players
 Eprint arXiv – communities of common interest, timeliness, passionate people, didn’t take a lot of $$
 OpenURL – small, quick, right players, passionate people (standards efforts take too long)
 MyLibrary – personalized, ad hoc collaboration
? Recommendation systems with shared
knowledge models – uses available logs, complex,
privacy concerns
Challenges
 IP, copyright limitations
 Post 9/11 pressure to close government access
 Integrating formal and informal systems – need new mechanisms for peer review and rewards
 Archiving – not glamorous but a research problem
 Problem space is larger than NSF domain – requires cross-organizational collaboration (DOE, NIH, etc.) and international connections