Digital Repository - openbook

Running head: DIGITAL REPOSITORY
Introduction
Technology has advanced, making obsolete the dreaded walk across campus to the library
to scour books and periodicals in the hope of finding relevant information for a research topic.
Now a trip to the computer lab or logging on from home can give you access to digital
repositories with digital information from around the world. Many repositories are
“open,” enabling nearly unrestricted access to information; others require membership.
Either way, digital repositories have enabled libraries and historical societies to preserve the past
in a format that will not deteriorate from exposure. Rare artifacts can now be stored safely in
vaults while the information they hold remains available for use through digital media. In fact,
that information is now available at the global level without stepping on a plane. Digital
repositories are not without potential faults, however, as they try to keep up with demand and
technology.
In this chapter, we give an overview of digital repositories associated with open
education environments. The chapter covers basic descriptions of digital repositories, current uses,
benefits, challenges and issues, and solutions. We also discuss how repositories are employed in
open access environments.
Background Information
Institutions of higher education produce numerous intellectual
outputs, and the volume of these outputs is increasing much more rapidly than
in the past (Heery & Anderson, 2005; McCord, 2003). Large collections of deposited
content are stored in information environments called 'repositories'. Repositories
are therefore important for institutions that need to collect and manage intellectual assets as part of
their information strategy (Heery & Anderson, 2005; Mark, 2008).
Digital and Nondigital Repository
Traditionally, libraries have been the typical solution for data preservation (Mark, 2008).
This type of repository contains diverse types of data such as books, publications, journals,
audio/visual materials, and other instructional materials. Libraries store these materials and
share them with people who can physically visit the library (Duncan, 2003; Hayes, 2005).
Libraries have historically offered repositories of content in paper form. Physical repositories,
however, have limitations: they need space to store large amounts of material, they spend much of
their budget on maintenance, and they allow only limited access (Mark, 2008; Saleh, Adly, & Nagi, 2005).
One solution to these limitations is digitalization, which preserves data more efficiently
and gives users easier access.
Basic Description of Digital Repository
Digitalized data require new storage space, so institutions must take responsibility
for maintaining digital repositories. A digital repository can be defined as a digital storage space
where digital contents and resources are saved (Hayes, 2005). Beyond storing digital materials, a
repository is used for searching and retrieving data for further use, because it offers a
convenient database environment in which digital assets can be stored, searched, recalled, and
reused (Hayes, 2005; Zainab, 2010). Thus, a digital repository
can be used by many different communities, such as institutions and organizations, wherever there
is a need to store data in a digital infrastructure for diverse purposes. Stored contents vary
depending on each community’s purposes. In academic institutions, a repository typically includes
digital textbooks, journal articles, research reports, theses/dissertations, and instructional
materials (see Figure 1 for a representative structure of a digital repository and for how digital
resources are accessed and managed).
Figure 1. The structure of digital repository
Characteristics and Examples of a Repository
The advent of digital technology and high-speed networks has led to widespread use of
different types of data storage (Saleh, Adly, & Nagi, 2005). What distinguishes repositories
from other collections, such as databases, digital libraries, institutional repositories, and digital archives?
A digital repository can be differentiated from other digital collections by the following
significant characteristics (Heery & Anderson, 2005):

Digitalization—The contents must be digitalized within the arranged data format.

Comprehensive—The repository contains different types of digitalized contents and
metadata.

Manageable—The repository must offer a set of basic services: access, search, use, store,
and manage.

Preservation—The repository must be sustainable for preservation.

Open access—The repository must provide its contents to the public as open access.
However, not every digital repository satisfies all of these conditions. Each repository
has different goals, purposes, functions, and software, as shown by the examples of repositories
in Table 1 below (Green, 2005).
Table 1. Digital repository examples
Massachusetts Institute of Technology
Features: Contents of the repository are identified and stored by communities such
  as departments, laboratories, and research centers. Primary goal is to
  “capture, distribute and preserve the intellectual output of MIT and to offer
  the opportunity to provide access to all the research of the institution through
  one interface.”
Software/Application used: DSpace
Contents: Materials must be education-oriented, in digital format, and produced
  by an MIT faculty member.
Access URL: http://dspace.mit.edu/

California Digital Library: eScholarship Repository
Features: Faculty, staff, and students at the University of California can deposit
  contents. People who have permission from the publisher can also put data
  in the repository. The guideline explains the strategy to “influence scholarly
  communication and provide a publishing platform for electronic journals,
  and leverage library buying power.”
Software/Application used: Berkeley Electronic Press software
Contents: Research or scholarly output such as journals, books, working papers,
  conference proceedings, and seminar/paper series
Access URL: http://escholarship.org/

Georgia Institute of Technology: SMARTech Repository
Features: Primary purpose of the SMARTech repository is to preserve and provide
  access to intellectual output in order to support the research and
  educational endeavors of the Georgia Tech community and of other scholars
  all around the world. Contents in SMARTech are open to the public, but
  requests for items are reviewed in some cases.
Software/Application used: DSpace
Contents: The SMARTech repository contains more than 30,000 items, including
  theses and dissertations from Georgia Tech.
Access URL: http://smartech.gatech.edu/

University of Michigan: Deep Blue
Features: Faculty, staff, and students at the University of Michigan can deposit
  their works and scholarly outputs. Guests who want to access data need to
  request a “Friend” account and must be approved before accessing.
Software/Application used: DSpace
Contents: Contents in Deep Blue must be educational, artistic, or research-oriented.
  The items consist of formal and informal publications, books,
  theses/dissertations, data sets, computer programs, audio visual materials,
  and learning objects.
Access URL: http://deepblue.lib.umich.edu/about/index.html

Harvard University: Digital Repository Service
Features: Any Harvard organizational unit, and individuals in the Harvard
  community with an organizational sponsor, are able to deposit objects. The
  repository is not intended “to function as a record management system or an
  institutional repository” (i.e., the repository does not capture all of the
  research outputs).
Software/Application used: The DRS batch loader and Batch Builder
  application
Contents: The DRS allows deposit of “Library-like” objects (intended to support
  research and pedagogy, or materials with persistent value)
Access URL: http://hul.harvard.edu/ois/systems/drs/
Digital Repository and Open Access
Benefits of Digital Repositories
Digital repositories offer various benefits to researchers, instructors, students, and
institutional communities, and they have great potential for data preservation. Their
practical benefits include the following:

Retaining and managing intellectual assets: The most important function of digital
repositories is to retain and manage digital resources efficiently.

Ease of access: Digital resources in a repository are available to access via client
application/software as well as web browsers which provide quick, easy and remote
access to digital resources.

Open access: Digital repositories offer access for everyone who may be interested in the
repositories’ resources. Since the resources are open to service providers such as Google
Scholar, data access is allowed to all Web users.

Use, re-create, and reuse: Data in the repositories can be used, modified, and reposted. This
facilitates knowledge construction and contributes to new research and education.

Wide range of content: Digital repositories contain various types of materials (e.g., audio,
video, and images) as well as text-based files (journals, theses, dissertations, magazines,
etc.).

Permanent data: Digital repositories offer permanent URL naming which minimizes
broken link problems.
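
As a rough illustration of how permanent URL naming can minimize broken links, the following Python sketch keeps a public identifier fixed while the storage location behind it changes. The class name, the example base URL, and the storage paths are all hypothetical and do not describe any particular repository software.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentIdResolver:
    """Maps stable identifiers to the current location of each item (hypothetical)."""

    base_url: str
    _locations: dict[str, str] = field(default_factory=dict)

    def register(self, item_id: str, location: str) -> str:
        """Record where an item currently lives and return its permanent URL."""
        self._locations[item_id] = location
        return f"{self.base_url}/{item_id}"

    def move(self, item_id: str, new_location: str) -> None:
        """Update the storage location; the permanent URL does not change."""
        if item_id not in self._locations:
            raise KeyError(f"unknown item: {item_id}")
        self._locations[item_id] = new_location

    def resolve(self, permanent_url: str) -> str:
        """Translate a permanent URL into the item's current location."""
        prefix = self.base_url + "/"
        if not permanent_url.startswith(prefix):
            raise ValueError("URL does not belong to this repository")
        return self._locations[permanent_url[len(prefix):]]
```

A citation that uses the permanent URL keeps working after the file is moved, because only the resolver's internal mapping is updated.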
Digital Repository in Open Access
Higher-education institutions are employing digital repositories in order to manage and
share their educational resources for teaching and learning (Margaryan & Littlejohn, 2008). The
advantage of using repositories is that they help institutions store intellectual
assets and share them with others. One of the more important reasons for employing a digital
repository in an institution is to share intellectual products with the public in support of the
current worldwide open-access movement. Institutional repositories may increase
opportunities for efficient use of such data and also encourage collaboration with other groups.
Thus, repositories can extend the possibilities for content use in open education. Here are some
possible uses of digital repositories.
Example 1. Assume that a teacher is looking for a geology lesson plan about how the
Richter scale is determined, how it is classified, and how frequently earthquakes happen.
The teacher found a lesson plan in the digital repository and skimmed through it. She then
recognized that the lesson plan should include some recent data about how frequently
earthquakes happen. She obtained recent data from resources in another repository that
she could access without permission, added the data to the lesson plan, and uploaded the file to
the digital repository for further use.
Example 2. Assume that a student is looking for a simulation illustrating the different
sidereal periods of the planets in the solar system.
The student found that BBB science university has abundant materials about astronomy,
and he tried to access the digital repository at that university. Although he is not a student at
BBB university, he could reach the information without permission because the repository at the
science university is shared with the university where he is enrolled.
These examples of using digital repositories in teaching and learning
illustrate only a small portion of their potential. Repositories can be used for
sharing information and encouraging collaboration in diverse communities, and in doing so
they enable open-education environments for teaching and learning.
Users of digital repositories include academic institutions, researchers, the scientific
community, governments, businesses, educators, historians, and any other group that wants to
preserve information, new or old, from around the globe. In the terms of the
OAIS Reference Model, these groups are the “designated community” for a digital repository: the identified
group of potential Consumers who should be able to understand a particular set of information
(Consultative Committee for Space Data Systems, 2002). Each community can
bring together individuals from a global audience who share the same interests, skills, or even hobbies.
Because each community’s needs are unique, the motivation for development and the
governing policies for their repositories can differ. The key services provided by a repository
vary and may include: “Enhanced access to resources; New modes of publication
and peer review; Corporate information management (records management and content
management systems); Data sharing (re-use of research data, re-use of learning objects);
Preservation of digital resources” (Heery & Anderson, 2005).
Repositories vary widely in their main purpose and mission. The two websites
below provide extensive searchable listings of open repositories.

Registry of Open Access Repositories (http://roar.eprints.org). This website provides a
searchable list of open repositories. Search filters include content, country, type, and
software. “The aim of ROAR is to promote the development of open access by providing
timely information about the growth and status of repositories throughout the world.
Open access to research maximizes research access and thereby also research impact,
making research more productive and effective” (eprints, 2011).

SPARC Collected Repositories. The SPARC collection lists several sites that
provide directories of repositories. One is Repository 66 (Repository66.org): “This site is a
mashup of data from ROAR and OpenDOAR overlayed onto Google maps”
(Lewis, 2007). The site displays a Google map showing the location of each repository, which
gives a useful global view.
Key Issues
Institutional repositories have emerged over the last decade as a new strategy that allows
universities to expand their capabilities and to increase their value to the educational community.
Today’s challenges include advancing digital repositories to a point where needs are recognized
and defined and technical approaches are at least superficially mapped out (Lynch, 2003).
As institutions’ interest in forming digital libraries has grown, so has the interest
in defining best practices and guidelines for digital repositories. A review of The National
Archives’ Standard for Record Repositories; the Consultative Committee for Space Data
Systems’ Reference Model for an Open Archival Information System; and the Research Libraries
Group’s Trusted Digital Repositories: Attributes and Responsibilities shows some
common elements that suggest uniform criteria for developing a successful
digital repository: maintaining the integrity of the repository, controlling who has access and
promoting circulation of its materials, remaining current with technology and digital formats, and
navigating copyright law.
Integrity of Digital Repositories
A successful digital repository must have integrity; its users must be able to rely upon it
to retain the information submitted and to provide usable material when queried. The academic
institutions, national archives, and museums whose mission is to hold our
history, art, scientific data, and much more for preservation, and the educators, students, and
casual information seekers who deposit, use, and resubmit information, all count on the digital
information remaining uncorrupted over time. As books and artifacts are placed in deep storage and
new digital information and history are created and held in digital repositories, how do managers
ensure that the information is not corrupted? Even more critical are the “born-digital” materials,
whose value rests solely on their existence as information artifacts, because being digital is not only
their method of access but also their identity (Research Libraries Group (RLG), 2002). For these
“born-digital” materials, their existence lies solely in the storage of some server. If the original data is
lost or corrupted, that piece of history or data is essentially gone. In addition to protecting data
from corruption, digital repositories have the responsibility to ensure that the data is retrievable
in some usable format. When information is stored in a repository, it can be saved in any of a
multitude of formats. Each format has different characteristics that can affect
how it can be used and how much information can be retrieved by users of the repository. This
may vary depending on the community’s needs. For example, a repository may house a report on
climate trends and also store the data that accompanies the report. However, depending
on the format chosen to store the report and data, only the report may be retrievable by users.
Decisions about the level of preservation must meet the needs of both the
repository and the user.
Digital repositories use metadata, which is usually defined as “data about data”.
Repository managers must decide the level of access users will have to the metadata and the original
source information. They must also ensure that proper metadata is used to promote circulation of the
digital materials. There are many different standards for metadata, and most standards and best
practices include a placeholder for authentication information in the metadata; unfortunately,
not all of them require its use (Research Libraries Group (RLG), 2002). Digital information can
be fragile, and because of that, many fears about digital material focus on how easily the
information can be changed, duplicated, or corrupted. The use of any kind of authentication
information in digital material would enable such changes to be identified. In addition to
metadata fields, it is possible to digitally fingerprint an item to verify its origin, or
to embed check numbers that trigger notification of changes. LeFurgy (2002) from the U.S.
National Archives and Records Administration noted the following in his article on levels of
service for digital repositories:
Current research indicates that digital materials can be managed independent of specific
technology. “Persistence” is the term used to indicate the degree to which this is possible.
For complete persistence, materials must adhere to strict conditions regarding their
construction and description. These conditions make it possible to use technology to
dynamically recreate a digital object based on explicit and consistent rules defining the
object's content, context, and structure. But in a world where few standards govern the
technical construction of a digital item (a report can exist in any one of a dozen common
file formats) and fewer still govern how an item is described (the report may or may not
identify an author, date of issue, or other descriptors), it is realistic to expect that many
materials will not fully meet the rigorous conditions for persistence (LeFurgy, 2002).
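
The authentication information and digital fingerprinting discussed in this section can be illustrated with a short Python sketch. It assumes a SHA-256 digest as the fingerprint and a small, made-up set of metadata fields; real metadata standards define their own fields, and none of the names below come from the sources cited here.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that serves as the item's digital fingerprint."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class MetadataRecord:
    """Illustrative 'data about data'; the field names are assumptions."""

    title: str
    author: str
    file_format: str
    checksum: str  # authentication information recorded at deposit time


def deposit(title: str, author: str, file_format: str, content: bytes) -> MetadataRecord:
    """Create the metadata record for a newly deposited item."""
    return MetadataRecord(title, author, file_format, fingerprint(content))


def is_unchanged(record: MetadataRecord, content: bytes) -> bool:
    """Compare the stored checksum with a fresh one to detect changes or corruption."""
    return fingerprint(content) == record.checksum
```

A repository could run a check like `is_unchanged` on a schedule and notify its managers whenever a stored item no longer matches the checksum recorded in its metadata.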
Keeping up with Technology
Technology is continually evolving. PDF is a commonly used document format, but will
that change? In the early days of PDF there were several competing formats. What if a new format
replaces it, such as DjVu, which is promoted as an open alternative to PDF? DjVu
was created in 1996 at AT&T Labs to be a file format that would allow for capturing of printed
material into high quality digital copies and was targeted at libraries and archives for preserving
their books and documents (DjVu). “This standard has evolved into the .djv/.djvu format, which
has had growing success and penetration in the online world for eBooks, catalogs, and image-sharing”
(DjVu, 2011). A search on Wikipedia for electronic book formats shows tables of 18 file
extensions and just as many readers. Which one will win the battle, and which will become obsolete?
The OAIS model recognizes that no matter how well an OAIS maintains its current holdings, it
will eventually need to migrate much of its holdings to different media and/or to a different
hardware or software environment to keep them accessible (Consultative Committee for Space
Data Systems, 2002). The challenge is deciding which digital formats will
have the longest life, so as to reduce the need for migrations. In the design of the repository, and as the
community manages and collects materials, file formats that allow for the persistence of the
repository need to be used. One of the benefits of bringing digital assets into a managed
repository framework is the promise of future-proofing against technology obsolescence.
However, current repository software does not fully support the preservation process, and staff
with the necessary skills to undertake this work are few and far between (Heery & Anderson,
2005). Even with strict rules and guidelines, as technology advances the repository needs to
incorporate a “technology watch” to manage risk as technology evolves, to protect preserved
materials, and to provide continuing access and updated methods of access (Research Libraries
Group (RLG), 2002). Thus, at some point in the life of a digital repository, its managers will
need to consider migrating the information in their repository to keep pace
with technology and remain a viable tool for users. At these points they must also keep in mind
that migration puts their information at risk of corruption. Additionally, what information do they migrate?
All of it? Is it all still used? Do they migrate only the parts in use and retain the rest in some format
for preservation?
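
One way to picture the migration questions raised above is a simple inventory pass over a repository's holdings. The Python sketch below is illustrative only: the watch list of at-risk formats and their targets is a hypothetical policy decision, not a recommendation, and the functions only count formats and propose new file names without converting anything.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical watch list: formats this imaginary repository has decided to
# migrate away from, mapped to the format each would be migrated to.
AT_RISK_FORMATS = {".wpd": ".pdf", ".lit": ".epub"}


def format_census(holdings: list[str]) -> Counter:
    """Count holdings by file extension to show which formats dominate."""
    return Counter(PurePosixPath(name).suffix.lower() for name in holdings)


def migration_plan(holdings: list[str]) -> dict[str, str]:
    """Map each at-risk file to a proposed target path; other files are left alone."""
    plan = {}
    for name in holdings:
        path = PurePosixPath(name)
        target_suffix = AT_RISK_FORMATS.get(path.suffix.lower())
        if target_suffix is not None:
            plan[name] = str(path.with_suffix(target_suffix))
    return plan
```

Keeping the census separate from the plan reflects the two questions in the text: what the repository holds, and which of those holdings managers choose to migrate.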
Ownership
Ownership and the rights connected to intellectual property can become an issue of
concern in the long-term preservation of material. Digital content can be put at risk by both
the legal restrictions on the information and the restrictions on the software itself.
Both the information and the software are intellectual property covered under copyright.
Copyright can limit access to information and the ability to change or reuse the information.
Additionally, the ease of access to digital information makes it easier to copy and possibly
infringe on copyright restrictions. Access is provided through licensing arrangements for many
digital materials that are considered “integral” to research collections and archives. An
organization may own the right to access material or use the software for a specified period, but there
is often no guarantee of rights beyond the terms of the license (Research Libraries Group (RLG),
2002). If a repository loses the rights to use the software needed for digital content in its holdings, the
longevity of that content is unclear. Gathering rights metadata and
including it in an institutional information system or database will allow users with some basic
copyright understanding to make thoughtful judgments about how the law may affect use of the
work in accordance with a legal exception (Whalen, 2008).
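
As a minimal sketch of how rights metadata might be gathered and queried, the Python example below records who holds the rights to an item and when a license lapses, then flags items whose license needs review. The record fields and the one-year review window are assumptions made for illustration; they are not taken from Whalen (2008).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RightsRecord:
    """Hypothetical rights metadata kept alongside an item."""

    item_id: str
    rights_holder: str
    license_expires: Optional[date]  # None means no expiry date is recorded


def licenses_needing_review(
    records: list[RightsRecord], today: date, within_days: int = 365
) -> list[str]:
    """Return ids of items whose license has lapsed or lapses within the window."""
    flagged = []
    for record in records:
        if record.license_expires is None:
            continue  # nothing recorded, so nothing to flag in this sketch
        if (record.license_expires - today).days <= within_days:
            flagged.append(record.item_id)
    return flagged
```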
Solutions – Governing policies, cloud computing and cooperation
Repositories are amazing resources, literally placing a world of information and
knowledge at one’s fingertips. In spite of the innovation and progress institutions have made in the
development of digital repositories, new hurdles seem to be developing on the horizon as
technology advances and these digital collections grow. As the repository defines
policies about its “designated community,” it should also define the materials to be
collected. Knowing that the repository is going to grow from its inaugural deposit, managers
need to define the levels at which information will be stored. To provide effective resource
discovery and preservation across distributed repositories, there must be agreement on an overall
technical architecture: metadata standards, an agreed method of linking to digital resources, and
common resource discovery protocols (Heery & Anderson, 2005). Institutions will need to
decide where their actual digital holdings will be. Some may choose to use on-site servers;
however, those are limited in data storage space and exposed to the risk that fire or
disaster poses to the space holding the physical server. As another option, with the further development
of “cloud” computing, repositories can have nearly unlimited digital storage space, as well as
the peace of mind of knowing that their information is secure. There are now third-party vendors such
as DuraCloud, which released its service in 2011. “The service builds on the pure storage from
expert storage providers by overlaying the access functionality and preservation support tools
that are essential to ensuring long-term access and durability” (DuraSpace). The service
uses other cloud storage services and maintains the information on each of
those clouds, which gives the customer layers of backup for their information. When
one item is updated, the update is shared with all the other storage locations. Since every cloud storage
service at some point has a physical location, that physical location is vulnerable. By using
multiple cloud storage services, which DuraSpace states are both commercial and noncommercial, the service
duplicates the information in such a way that a catastrophic loss of one storage location would not
wipe out the information. This is, of course, based on the unlikelihood of a single event taking out all
servers. LeFurgy (2002) sums up the future needs of digital repositories: “Archivists, librarians,
and others with an interest in preserving and making available digital information face an
impending paradox.” This stems from the prospect of developing solutions for long-term
management of digital records, publications, and other objects (LeFurgy, 2002). A time will
come when not all digital information can or will be saved in easily accessible formats.
The space required and the organization of the information need to be considered, and
schemes need to be developed and put in place to prevent redundancy of information.
Collaboration among the institutions managing these repositories to share information and share
the workload will be an aid to continued success and development (Heery & Anderson, 2005).
This coordination along with the interoperability of systems will facilitate innovative work and
the future sharing of information.
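
The idea of duplicating content across several storage services can be sketched in a few lines of Python. The classes below are in-memory stand-ins invented for illustration; they are not DuraCloud's interface, and a real service would also have to handle network failures, versioning, and integrity checks.

```python
from typing import Optional


class StorageProvider:
    """In-memory stand-in for one cloud storage service (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, content: bytes) -> None:
        self._objects[key] = content

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)

    def lose_everything(self) -> None:
        """Simulate a catastrophic loss at this provider's physical location."""
        self._objects.clear()


class ReplicatedStore:
    """Writes every item to all providers and reads from the first that has it."""

    def __init__(self, providers: list[StorageProvider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self.providers = list(providers)

    def put(self, key: str, content: bytes) -> None:
        # An update is shared with every storage location.
        for provider in self.providers:
            provider.put(key, content)

    def get(self, key: str) -> bytes:
        for provider in self.providers:
            content = provider.get(key)
            if content is not None:
                return content
        raise KeyError(key)

    def repair(self, key: str) -> None:
        """Copy a surviving replica back to any provider that lacks it."""
        content = self.get(key)
        for provider in self.providers:
            if provider.get(key) != content:
                provider.put(key, content)
```

With two or more providers, losing one leaves the item readable from another, and `repair` restores the missing copy; only an event that removes every replica loses the item.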
Summary
As more information is digitized and collections grow, digital repositories will need to
develop further to meet increased needs. Cloud computing is taking digital repositories to another
level, providing effective backup of information and enabling larger collections of data files. When will
those systems be bogged down, and what will tomorrow’s needs be for continued success? Digital
repositories have enabled global sharing of knowledge.
References
Consultative Committee for Space Data Systems. (2002, January). Reference Model for an Open
Archival Information System (OAIS). Retrieved June 1, 2011, from CCSDS.org:
http://public.ccsds.org/publications/archive/650x0b1.PDF
Crow, R. (2002). The case for institutional repositories: A SPARC position paper. Retrieved from
http://www.arl.org/sparc/bm~doc/ir_final_release_102.pdf
Duncan, C. (2003). Digital repositories: E-Learning for everyone. Paper presented at the Elearn
International Conference, Edinburgh, Scotland, UK.
DjVu. (n.d.). What is DjVu. Retrieved June 8, 2011, from http://djvu.org/resources/whatisdjvu.php
DjVu. (2011, April 27). In Wikipedia, The Free Encyclopedia. Retrieved June 10, 2011, from
http://en.wikipedia.org/w/index.php?title=DjVu&oldid=426261999
DuraSpace. (n.d.). DuraCloud. Retrieved June 6, 2011, from DuraSpace.org:
http://www.duraspace.org/duracloud.php
eprints. (2011). Registry of Open Access Repositories. Retrieved June 2, 2011, from
http://roar.eprints.org/
Green, A. (2005). Review of digital repositories (Project Report). Retrieved from Yale University,
Integrated Library Technology Services, Research and Planning:
http://www.library.yale.edu/iac/documents/DR_Review_final_27Sept05.pdf
Hayes, H. (2005, August). Digital repositories: Helping universities and colleges. Community
Contributions. Retrieved from http://www.jisc.ac.uk/uploaded_documents/JISC-BPRepository%28HE%29-v1-final.pdf
Heery, R. & Anderson, S. (2005). Digital repositories review (Project Report). Retrieved from
Joint Information Systems Committee on JISC website:
http://www.jisc.ac.uk/uploaded_documents/digital-repositories-review-2005.pdf
LeFurgy, W. G. (2002, May). Levels of Service for Digital Repositories. D-Lib Magazine, 8(5).
Retrieved from http://www.dlib.org/dlib/may02/lefurgy/05lefurgy.html
Lewis, S. (2007). Repository66.org About. Retrieved June 3, 2011, from Repository66:
http://maps.repository66.org/blog/about/
Lowry, C. B. (2006). ETDs and digital repositories: A disciplinary challenge to open access?
Libraries and the Academy, 6(4), 387-393.
Lynch, C. A. (2003). Institutional Repositories: Essential Infrastructure for Scholarship in the
Digital Age. ARL (226), 1-7. Retrieved from http://www.arl.org/bm~doc/br226ir.pdf
Margaryan, A. & Littlejohn, A. (2008). Repositories and communities at cross-purposes: Issues
in sharing and reuse of digital learning resources. Journal of Computer Assisted Learning,
24, 333-347.
Mark, J. (2008, May 7). Everybody’s repositories. Retrieved from
http://everybodyslibraries.com/2008/05/07/everybodys-repositories-first-of-a-series/
McCord, A. (2003). Institutional repositories: Enhancing teaching, learning, and research.
Retrieved from Lawrence Technological University and University of Michigan,
EDUCAUSE Evolving Technologies Committee:
http://net.educause.edu/ir/library/pdf/DEC0303.pdf
Research Libraries Group (RLG). (2002). Trusted Digital Repositories: Attributes and
Responsibilities. Mountain View, CA: RLG. Retrieved from
http://www.oclc.org/research/activities/past/rlg/trustedrep/repositories.pdf
Saleh, I., Adly, N., & Nagi, M. (2005). DAR: A digital assets repository for library collections.
Research and Advanced Technology for Digital Libraries, 116-127.
The National Archives. (2004). Standard for Record Repositories, First edition, 2004. Retrieved
from http://www.nationalarchives.gov.uk/documents/information-management/standard2005.pdf
Whalen, M. (2008). Rights Metadata Made Simple. In T. Gill, A. J. Gilliland, M. Whalen, & M.
S. Woodley, Introduction to Metadata (Online edition, Version 3.0). Los Angeles: Getty
Publications. Retrieved from
http://www.getty.edu/research/publications/electronic_publications/index.html
Zainab, A. N. (2010). Open access repositories and journals for visibility: Implications for
Malaysian libraries. Malaysian Journal of Library & Information Science, 15(3), 97-119.