Preliminary Lessons from the Use of JSTOR

Electronic Archiving & JSTOR
Kevin Guthrie
E-ICOLC, Thessaloniki, Greece
October 2002
www.jstor.org
Overview
• E-Archiving – Defining the Problem
• The Mellon Foundation Grant Program
– Explanation
– Lessons Learned
• E-Archiving Economics
• JSTOR E-Archiving Approach
E-Archiving – Defining the Problem
The Print Archive
• It occurs as a by-product of access. Journals or books are
purchased because of need and then retained.
• The content must be held locally.
• It is a system of countless local decisions and there is no
system-wide planning effort.
• The buildings housing the libraries lend themselves nicely
to fund raising.
• Library volume counts impact competitive standing.
• There is a significant but relatively stable and predictable
cost stream for maintenance.
How is an E-Archive Different?
• Challenges
– The dynamic nature of the formats
– We cannot predict the course of future software
developments – there can be no black box technological
solution
– We must establish new relationships between
preservation and use so that usage leads to preservation.
Benign neglect is not effective
• Opportunities
– Freedom from time and space
– Economies of scale in distribution
E-Archiving is a Growing Problem
• We are in a time of transition
• Users find the electronic version more
convenient – “copy of record”
• Many libraries bearing double costs, but
increasingly they are cancelling print
subscriptions, taking only electronic
• There is no systematic archiving solution in
place
E-Archiving:
Assumptions and Basic Premise
• The academic community needs a system of
trusted archives of “born digital” journal
content
• The trusted archives must have a
sustainable economic model and be able to
preserve the content for the very long-term
E-Archiving – The Mellon Foundation
Grant Program
E-Archiving:
Mellon Foundation Program
• There were seven grant recipients
– Cornell
– Harvard
– MIT
– NYPL
– Stanford (LOCKSS)
– University of Pennsylvania
– Yale
• The goal was to find a workable and sustainable
model for an ongoing e-archiving effort.
• To explore “presentation file” and “source file”
options.
Working Assumptions of the Source
File Archives
• Archive should be independent of publishers
– responsibility of institutions for whom archiving is a
core mission
• Archiving requires active publisher partnership
• Address long timeframes
• Archive design based on Open Archival
Information System (OAIS) model
Working Assumptions of the Source
File Archives
• Archive negotiates relationship with publisher
• Publisher deposits content regularly
• Content accompanied by metadata to support
discovery and preservation
• Archived content only accessible under specific
conditions
• Archive assumes responsibility for long-term
preservation
Questions that Arose
• What is archived?
• In what format?
• When is archive accessible?
• Who can access archived content?
• What does the archive “preserve”?
• Who does archiving?
• How is the archive paid for?
• How is the archive governed?
Content of e-journals: not just full-length articles
• Journal description
• Editorial board
• Instructions to authors
• Rights and usage terms
• Copyright statement
• Ordering information
• Reprint information
• Indexes, membership lists, errata, etc.
Challenging Content
• Masthead, “front matter” stored as web pages, not
in content management systems
• No control over the format of supplementary
materials (datasets, images, tables, etc.)
• Advertising very complex
– dynamic, frequently from third party, can involve
country-specific complexities
• Links frequently separate from articles
– regularly updated, sometimes dynamic
File Formats?
• PDF? SGML/XML? HTML? All or none?
• PDF ubiquitous but there are concerns
– Proprietary
– Emphasizes presentation, not meaning
– Is it preservable?
• Sometimes the only choice
File Formats?
• PDF? SGML/XML? HTML? All or none?
• XML increasingly common
• Migration path seems clearer – more flexible
• Many different DTDs. Can we develop a
standard archival exchange DTD?
• NLM/Mulberry/Inera/Harvard effort
Interchange DTD
• How low is the common denominator?
• What gets lost?
– inevitably sacrifices some functionality and original
appearance
• Transformation from publisher’s “native” DTD
involves risks
• Some technically difficult areas
– extended character sets, mathematical and chemical
formulae, tables, “generated text”
Access Terms
• Publishers prefer “dark” archives
– does not compete with publisher’s service
• If “dark”, what “trigger events” make it
accessible?
– after a given period of time (“moving wall”)?
– when content is not otherwise accessible
(“failsafe”)?
– only when content enters the public domain?
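The trigger events above can be sketched as a simple access rule. This is only an illustration under stated assumptions: the slide poses the three triggers as open questions, while the sketch treats them as alternatives that may each open a dark archive. The names ArchivedIssue and is_open, the whole-year granularity, and the 5-year wall in the usage lines are hypothetical and do not describe any actual archive policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchivedIssue:
    """Hypothetical record for one archived journal issue."""
    publication_year: int
    available_from_publisher: bool = True  # False would fire the "failsafe" trigger
    in_public_domain: bool = False


def is_open(issue: ArchivedIssue, current_year: int,
            moving_wall_years: Optional[int] = None) -> bool:
    """Return True if any trigger event makes the dark archive accessible."""
    # Trigger: the content has entered the public domain.
    if issue.in_public_domain:
        return True
    # Trigger ("failsafe"): the content is not otherwise accessible.
    if not issue.available_from_publisher:
        return True
    # Trigger ("moving wall"): a given period of time has passed.
    if moving_wall_years is not None:
        return current_year - issue.publication_year >= moving_wall_years
    # No trigger applies: the archive stays dark.
    return False


# Usage with an assumed 5-year moving wall, evaluated in 2002.
old_issue = ArchivedIssue(publication_year=1995)
new_issue = ArchivedIssue(publication_year=2000)
print(is_open(old_issue, 2002, moving_wall_years=5))
print(is_open(new_issue, 2002, moving_wall_years=5))
```

Under these assumptions the 1995 issue is behind the wall and therefore open, while the 2000 issue stays dark until one of the triggers fires.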
Why is Access Important?
• How do you verify that what you are
preserving is accessible? Users are good
auditors
• Where do you find the resources to
underwrite the costs associated with
archiving?
• Archiving is a public good, like clean air or a
public park. What is the mechanism for payment?
Who Should Pay for the Archive?
• Who benefits?
– Publishers, libraries, authors, scholarly societies…
– Is there a way to share costs?
• Cost categories include
– Preparation of “archivable” objects
– Ingestion and quality control
– Long-term storage
– Preservation activity
Mellon Foundation Program Findings
• Archiving seems technically feasible
• Publishers indicate that archiving is important to
them
• Progress on developing a common archival
exchange DTD
• Shared understanding that archives are necessary
to establish e-versions as the publications of
record and to make it possible to let go of paper
subscriptions
E-Archiving:
Mellon Foundation Program Findings
• Challenges in the Organizational Model
– Follow-on grant proposals required substantial funding
but were potentially duplicative and in some ways
overlapping. There appear to be economies of scale.
– Difficult to effect coordinated activity by distinct
universities.
▪ Individual universities found it difficult to develop a business
model that would distribute fairly the costs, benefits, and
incentives associated with e-journal archiving.
▪ It was difficult to organize and justify a process for any one
university to take on the archival responsibility for others at the
scale required.
E-Archiving:
Mellon Foundation Program Findings
• Mellon concluded that an organizational entity
which is separate from the universities and which
is dedicated to the task of e-archiving journal
literature is needed.
The largest issue is: How to create a sustainable
economic model in support of an e-archive?
E-Archiving Economics
Archiving Economics:
What About an E-Archive?
• Presently, publishers are offering access to the content.
• What we are truly talking about is the long-term preservation
of this content, unbundled from access.
• The content and the archive are valued, but is there a
willingness to pay?
• Are institutions willing to pay insurance premiums for
archival protection?
• The lack of an economy associated with electronic
archiving is a huge challenge facing the community,
because we have no model in place.
JSTOR’s Mission
• To help the scholarly community take advantage
of advances in information technologies.
• To develop a trusted archive of core scholarly
journal literature, emphasizing conversion of
entire journal backfiles and preservation of future
e-versions.
• To enhance the accessibility of older journal
literature
In pursuing its mission, JSTOR takes a system-wide perspective, seeking
benefits for libraries, publishers and scholars & students.
Why Is JSTOR an Archive?
Mission is critical.
An archive must consider things such as:
– Technological Choices
– Data Backup and Redundancy
– Publisher Relationships – Perpetual Rights to
the Source Content
– Financial Strategies and Economics
– “Moving Wall” to Preserve Future e-Versions
Archiving Economics:
The JSTOR Example
• A&S I – approximately 7,700 volumes
• Building, storage & maintenance:
– Prime space: $125,000
– Remote storage: $31,000
• Circulation
– Prime space: $1 per use
– Remote Storage: $3 per use
• JSTOR Fees
– Archive Capital Fee: $10,000 - $45,000
– Annual Access Fee: $2,000 - $5,000
Archiving Economics:
JSTOR purchase as an example
• Even research librarians do not focus on the JSTOR
archival mission; they often just see JSTOR as a useful
database. Therefore, JSTOR is typically purchased from
the acquisitions budget.
• This budgeting recognizes neither the overall value nor the
overall savings to the institution.
• The capital part of the value is not fungible and not
recognized, but it exists.
• Who is the archiving czar? Is there an archive budget?
Archiving Economics:
How To Pay For Complex E-Archive
• There are no organizational and accounting systems set up
to underwrite the “archiving” function.
• No one is used to paying someone else for central
archiving.
• There is no building to name and no volume count to
promote.
• There is no budgetary line item.
• Despite the rhetoric, will institutions be willing to
underwrite a centrally held archive to preserve little-used
materials?
Article in Educause Review:
http://www.educause.edu/ir/library/pdf/erm00164.pdf
Archiving Economics:
Conclusion
• E-archiving requires some level of central
planning and coordination.
• Institutions will have to establish a mechanism to
provide funds to support such an effort.
• Governments may need to subsidize archives in
order to build them on a massive scale.
• The financial systems must be consistent with the
principles being pursued.
• Some form of access may need to be bundled with
the archiving/preservation function
JSTOR’s Approach to E-Archiving
JSTOR E-Archive
• Focused on planning for the receipt of
electronic data in accordance with moving
walls
• First lessons in data ingest connected to
Current Issues Linking effort
• Internal organizational approach has been to
use existing staff working as part of an
e-archiving working group
JSTOR E-Archive
• Establishing new unit dedicated to e-archiving.
• Have been granted $1.3M in start-up funding from
the Mellon Foundation.
• Additional funding for the unit will come from
JSTOR, a “paying customer” of the new unit.
• We will have the same principles as with print,
but a new business model is needed.
Why JSTOR as Organizational Home
• Not-for-profit status
• Mission
• Non-competitive with publishers
• Dedicated to long-term preservation
• Relationships with over 1,400 libraries in 70
countries
• Relationships with nearly 180 publishers
Why JSTOR as Organizational Home
• Appreciation for IP issues and evolving law
• Experience converting over 10M journal
pages and providing continual access to the
archive
• Experience developing sustainable access
and business models
• Positive and strong relationships with
various granting agencies
JSTOR E-Archive Anticipated
Activities
Mellon Grant: 18-month grant period.
The Goal: To establish a credible and
sustainable operation for e-archiving that
includes all the key components required for
an ongoing archiving enterprise.
JSTOR E-Archive Anticipated
Activities
• Establish the parameters of content.
– Determine what content will be preserved.
• Establish an access model which balances the
needs of publishers, librarians & scholars.
– Address the “public good” problem.
• Secure agreements with publishers.
– Begin with current JSTOR publishers.
– Explore other publisher relationships.
JSTOR E-Archive Anticipated
Activities
• Establish a production operation.
– Apply quality control lessons gained through
experience digitizing print.
– Build on progress made by Mellon program
participants.
• Build a technical infrastructure
– Compatible with the OAIS Reference Model
Where we’re starting
Exciting Challenges:
– Continue with the print.
– Begin archiving the electronic version of the
titles currently within JSTOR.
Kevin Guthrie
President, JSTOR
www.jstor.org