Born-Digital Cultural Content Preservation and Permanent Access

Ben Fino-Radin

It can no longer be said that our world is becoming digital. In an era where the names of websites are vernacular verbs, we have arrived. We are already swimming in a vast sea of information and cultural content that has never existed in any form but digital. Everything – from journalism, books, and music, to visual art – is becoming digital in creation and consumption. We are entering the era of a born-digital cultural legacy.

The term born-digital [1] has been used to describe the current generation of youth. These future adults are growing up in a world completely saturated with digital devices, and cannot imagine a time before YouTube existed. While this is not the sense in which we will be using the term, its use is evocative of our current transitional point in history. When we talk about born-digital cultural content, we mean content that was created digitally and is experienced within the digital environment. Specific examples include emails, web pages, digital photographs, text messages, text files, and this paper itself. Some strictly define born-digital material as having no analog counterpart, but this is too narrow a definition. Rather, we can say that this paper is born-digital in the sense that its primary form is a file on a computer – its physical form is simply an analog duplication of that primary form. This paper will explore the biases of the digital medium and the unique challenges it presents to the field of preservation.

Let us first investigate the born-digital manuscript and the process of writing. Except for dwindling traditionalists, most writers compose their works digitally. Early adopters of the personal computer and word processing, such as Salman Rushdie, have been using the digital environment as their creative medium since the early 1980s. As is the case with any medium, the computer carries its own biases and tendencies, and it affects everyone from creators to consumers in myriad ways that we are only beginning to understand [2]. The manuscript and the process of writing are heavily affected by what are possibly the medium's greatest strengths and greatest challenges: the malleability and impermanence of editing [3].

All physical forms of writing allow an evocative glimpse into the writer's process. With handwritten text we can observe erasures; with typewritten text, whiteouts or strikethroughs. With the digital medium, the difference (some see this as its beauty, some as a problem) is its lack of evidence. I just erased an entire sentence. Currently, if writers wish to preserve the process behind their works, they must save a new copy of their file every time they wish to preserve its current state. Whereas the physical medium provided a window into the writer's process (edits, or lack thereof) by default, the digital medium does not. This imposes on the creative individual a process that requires a self-conscious form of preservation. It need not be this way; it is simply the default established by software developers.

The digital written word is one of the simplest forms of data, and its files are very small. It is one format that has not significantly bloated in bytes over time. Simultaneously, the cost of storage is constantly decreasing as drive capacities steadily increase. Why not take advantage of this discrepancy and offer writers an authoring tool that archives the evolution of their work? There are a few word processors targeted toward writers [4], but most focus on "features" such as organizing the document into chapters and offering a place for notes and character development. Rather than mediating the creator's environment, we should focus on helping them preserve their process and legacy. There would be no understanding or appreciation of the flawless perfection of Mozart's edit-free manuscripts if they were digital. It would not enter our minds to consider whether they had been edited heavily or written flawlessly.
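As a rough illustration of what such an archiving authoring tool might do behind the scenes, the following minimal sketch saves a timestamped copy of a manuscript whenever its contents change. The file names, snapshot directory, and polling interval are assumptions made for the example, not features of any existing word processor.

```python
import hashlib
import shutil
import time
from pathlib import Path

def snapshot_if_changed(manuscript, archive_dir, last_hash):
    """Copy the manuscript into the archive whenever its contents have changed."""
    digest = hashlib.sha256(manuscript.read_bytes()).hexdigest()
    if digest != last_hash:
        archive_dir.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d-%H%M%S")
        shutil.copy2(manuscript, archive_dir / f"{manuscript.stem}-{stamp}{manuscript.suffix}")
    return digest

if __name__ == "__main__":
    manuscript = Path("novel.txt")       # hypothetical manuscript file
    archive = Path("novel_versions")     # hypothetical snapshot directory
    last = None
    while True:                          # check for changes once a minute
        last = snapshot_if_changed(manuscript, archive, last)
        time.sleep(60)
```

Each distinct state of the text becomes its own timestamped copy, trading a few kilobytes of inexpensive storage for a durable record of the writer's process.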
A common misconception is that digital content is more stable than traditional media. This is patently false. Consider the amount of ancient sculpture, pottery, painting, and manuscripts contained in the world's museums. What success might you have opening ten-year-old text documents on your current computer? With traditional media there is essentially one form of degradation: physical. Paintings fade, paper acidifies, sculptures fall apart. Born-digital content is affected by four forms of degradation: physical obsolescence, physical deterioration, data obsolescence, and data deterioration. To illustrate these, let us say that you are a born-digital preservation specialist working in the archives of a major library. A renowned writer donates their papers to your institution – wonderful. However, their papers are born-digital, taking the form of a spectrum of disks, hard drives, and computers.

First consideration would be given to physical obsolescence. This refers to the fact that storage media are in constant evolution; without a way of interfacing the media with a computer, data retrieval is impossible. Try reading a floppy disk with your current computer – chances are there is no place to insert one, and if there is, it is for a 3½" disk. If the hypothetical materials in question included 5¼" floppy disks, you would need to find hardware that could read them – either a computer with a built-in drive or an external floppy drive.

This assumes, of course, that the storage media have not suffered from physical deterioration. Most forms of storage are quite delicate [5]. In fact, the most common form of data storage – the hard drive – is the most fragile. The inside of a hard drive closely resembles a record player: metal platters that contain data, and a small arm that reads it. These delicate mechanical devices are highly prone to failure. Similarly, if a portion of our hypothetical born-digital collection included a CD, that media may be unreadable due to any number of physical factors (writable optical media can become unreadable in as few as five to ten years due to exposure to light).

If neither physical obsolescence nor deterioration befall the collection, there are two remaining hurdles. Data obsolescence refers to the fact that all data is encoded in a particular format, structured so that it can be read by the software it was created with. As software evolves, support for older formats is discontinued, and files become reliant on the software that created them – software that, in turn, works only within the framework of the particular hardware and operating system it was intended to run on. Since software and hardware exist in a perpetual symbiotic relationship of development, there will inevitably be formats in our hypothetical collection that require significant digital archaeology to recover successfully.

Our fourth and final challenge, data deterioration, presents the threat that even if we successfully elude the three previous forms of degradation, the data in question may have become corrupt at some point. Any time a file is moved, edited, or transferred to different storage media, there is a very real risk of one bit being put out of its proper place or written incorrectly. In some cases this creates minor but permanent glitches; in the worst cases, it renders the data completely useless – unreadable by the computer. This illustrates the main difference between the degradation of traditional materials and the degradation of born-digital materials: in most cases, born-digital materials are rendered completely inaccessible by degradation of any scale. This is why born-digital preservation is referred to as permanent access. Without proper preservation, we simply lose access to the data. Forever.
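A single flipped bit is enough to change a file's checksum, which is why preservation workflows record fixity information at ingest and re-verify it after every move or migration. The short sketch below (the file names are hypothetical) simulates such corruption by flipping one bit in a copy of a file and shows that the damage is immediately detectable.

```python
import hashlib
from pathlib import Path

def sha256(path):
    """Return the SHA-256 checksum of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

original = Path("manuscript.doc")        # hypothetical archived file
copy = Path("manuscript_copy.doc")       # simulated damaged copy

fixity_at_ingest = sha256(original)      # recorded when the file enters the archive

# Simulate data deterioration: flip a single bit in the copied file.
data = bytearray(original.read_bytes())
data[0] ^= 0b00000001
copy.write_bytes(data)

# A later audit compares the recorded fixity against the current checksum.
if sha256(copy) != fixity_at_ingest:
    print("Fixity check failed: the file no longer matches its recorded checksum.")
```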
Imagine a point in time when the English language has been forgotten. Books containing the language – immense cultural legacies, masterpieces – would still exist in physical form; people would be able to see English words on the written page, but they would not be able to read, understand, or derive any knowledge from the dead language. Not only has history witnessed near occurrences of this hypothetical, but most people who have used computers for at least a few years have experienced it on a personal level. As time passes, software is rewritten. People decide they need a faster computer for a new piece of software, only to find that an older piece of software they still rely on does not work on the new machine. Incompatibilities are simply a fact of life, fed by an endless cycle of hardware and software upgrades driven by a combination of innovation and commerce-driven planned obsolescence [6].

The Internet is arguably the most complicated of all born-digital materials, because all of the aspects of degradation apply and, in addition, materials simply disappear. Content creators may discontinue their websites for any number of reasons. However, there is something with a far more severe impact than an individual taking down their site: large content publishing systems being discontinued. For example, imagine if WordPress or Tumblr suddenly announced that they were closing down – all content would be gone as soon as they took down the site and disconnected their servers. This does happen, and has happened. It usually occurs after the site has fallen into obscurity and its inhabitants have moved on to the next platform. Without initiatives focused on the preservation of the Internet, a vast and diverse legacy would slowly but surely disappear. One recent example occurred on October 26, 2009, the day Yahoo discontinued the web hosting service GeoCities. During the mid-1990s GeoCities was a vibrant community of DIY websites. This was an important period in the evolution of the Internet, as it was one of the largest early examples of like-minded people gathering and forming virtual relationships, communities, and discussions online. Thankfully, word of the discontinuation of GeoCities spread, and a few key organizations effectively preserved its contents.
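At its simplest, preserving a website means retrieving its pages while they still exist and recording, alongside the content, when and from where each capture was made. The sketch below shows that basic idea; the URL and output directory are placeholders, and production web archiving relies on dedicated crawlers and the standardized WARC container format rather than ad-hoc files like these.

```python
import json
import time
import urllib.request
from pathlib import Path

def capture(url, out_dir):
    """Fetch a page and store its content alongside minimal capture metadata."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        body = response.read()
        content_type = response.headers.get("Content-Type", "unknown")

    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    (out_dir / f"capture-{stamp}.html").write_bytes(body)
    (out_dir / f"capture-{stamp}.json").write_text(json.dumps({
        "url": url,
        "captured_at": stamp,
        "content_type": content_type,
        "size_bytes": len(body),
    }, indent=2))

# Example: snapshot a single page before its host disappears.
capture("http://example.org/", Path("web_archive"))   # placeholder URL and directory
```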
Despite these seemingly endless challenges, there are also great and unique opportunities offered to the conservator. As institutions face these challenges and best practices develop, incredible innovations have begun to emerge. One major contributor to the field has been the born-digital team at the Manuscript, Archives, and Rare Book Library (MARBL) at Emory University. In 2007 the renowned author and critic Salman Rushdie donated his papers. Included in this collection were four computers and an external hard drive. These computers contained manuscripts, unfinished projects, and correspondence – all of great interest to researchers. This situation raises not only the question of how to recover, stabilize, and preserve the data (much of it over twenty years old), but also how to provide it to researchers in a practical and useful way that respects the privacy and wishes of a living writer. One of the solutions to emerge was a searchable and browsable database of all of the files in a modern format useful to scholars, including file descriptions and full-text access. Not only can Rushdie scholars read drafts of his written works, character studies, and emails to his publisher, but all of this material is available instantly. A search for "Vena", a character from Rushdie's novel The Ground Beneath Her Feet, yields over 100 full-text document results.

By far the greatest and most distinctive innovation achieved by the team at Emory is an emulation of Rushdie's Performa 5400. MARBL's team of technologists was able to create an intact, bootable disk image of the computer. This, in combination with custom-designed software running on a contemporary computer, allows researchers to browse Rushdie's computer essentially as he left it. It offers an unprecedented human understanding of the way the author used the tool of his trade. This level of perspective on the author's creative process is like being able to see how Camus used a typewriter, or how he organized his desk.
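A disk image of this kind is a bit-for-bit copy of a storage device written into a single file, so that the fragile original media can be retired while its exact contents remain available for emulation. The sketch below illustrates the general idea only – the device path and file names are placeholders, and real acquisitions are performed with write-blocking hardware and dedicated forensic tools rather than a script like this – but it shows why a checksum is recorded at the moment of imaging: it lets the image's fixity be verified for as long as it is kept.

```python
import hashlib
from pathlib import Path

def image_device(device, image, block_size=1024 * 1024):
    """Copy a device (or any file) block by block and return the image's SHA-256."""
    digest = hashlib.sha256()
    with device.open("rb") as src, image.open("wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()

# Example: image an attached drive and store its checksum alongside the image.
checksum = image_device(Path("/dev/disk2"), Path("donor_drive.img"))   # placeholder device path
Path("donor_drive.img.sha256").write_text(checksum + "\n")
```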
Now that we are in the midst of an era where our global cultural legacy is born-digital, we must plan for the future of our data footprint. If we do not, we run the risk of the partial and permanent loss of a moment in history [7]. Planning and implementation must occur on two levels: personal and public. Personal, in the sense that creators must be custodians of their digital content, with redundant backups and a thorough understanding of the medium. Otherwise, their born-digital materials may not last long enough to reach the hands of an institution with specialists on hand. Beyond the personal scale, it must be recognized that such materials are very much at the mercy of the corporations that develop the hardware and software with which the data is created and preserved. Consortia must be developed in order to establish standards – even legislation – for the future of data. There must be a global commitment to the standard of permanent access.

This has been recognized by the U.S. federal government and the Library of Congress, which in December 2000 founded the National Digital Information Infrastructure and Preservation Program (NDIIPP). "NDIIPP is based on an understanding that digital stewardship on a national scale depends on public and private communities working together. The Library has built a preservation network of over 130 partners from across the nation to tackle the challenge…" [8] The legislation appropriated $100 million, which has funded a broad range of projects devoted to preserving our cultural legacy. Archive-It [9] is a subscription service that allows institutions to build customized archives of web-based resources; all of the data is hosted by the Internet Archive [10] and is searchable like any other database subscription a library would use. The Web Archiving Service (WAS) is a "Web-based curatorial tool that enables libraries and archivists to capture, curate, analyze, and preserve Web-based government and political information" [11]. This is significantly different from the Internet Archive and Archive-It, as it gives more control to the institution, which can host the archive on its own servers if it so desires. The Audit Control Environment (ACE) is a project that developed a tool for validating the integrity of digital files during migration, and it helps institutions perform audits of their collections [12].
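The core of such an audit is simple: record a manifest of known-good checksums when files enter the collection, then periodically recompute each file's checksum and flag anything missing or changed. The sketch below shows that logic; the manifest format and paths are assumptions made for illustration, not ACE's actual implementation.

```python
import hashlib
import json
from pathlib import Path

def sha256(path):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()

def audit(collection, manifest_file):
    """Compare every file listed in the manifest against its recorded checksum."""
    manifest = json.loads(manifest_file.read_text())   # {"relative/path": "checksum", ...}
    problems = []
    for relative_path, recorded in manifest.items():
        target = collection / relative_path
        if not target.exists():
            problems.append(f"MISSING   {relative_path}")
        elif sha256(target) != recorded:
            problems.append(f"CORRUPTED {relative_path}")
    return problems

# Example run over a hypothetical collection directory and its manifest.
for problem in audit(Path("collection"), Path("manifest.json")):
    print(problem)
```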
If there is anything certain about born-digital preservation, it is that the field itself is in constant evolution. The very tools we use to preserve born-digital materials are born-digital materials themselves, so the software, systems, theories, methods, and best practices of born-digital preservation will perpetually remain in a state of flux. Libraries, museums, archives, and private collections will encounter more and more born-digital materials. The key for any organization that finds itself dealing with such materials is to approach their immediate preservation with care and expediency. An institution with any type of born-digital collection must seek the expertise of a specialist – whether a small collection engaging a part-time consultant or a large institution employing a diverse team. The tenuous stability of the digital medium is perhaps the conservator's worst enemy: without continuous development and stewardship, there is great risk of losing a vast cultural legacy, and with it a moment in history.

Notes

[1] Palfrey, John G., and Urs Gasser. Born Digital: Understanding the First Generation of Digital Natives. New York: Basic, 2008. Print.
[2] A topic of much research and writing. Some recent publications include The Shallows by Nicholas Carr, Cognitive Surplus by Clay Shirky, and Program or Be Programmed by Douglas Rushkoff.
[3] Schmitz, Dawn. "The Born-Digital Manuscript as Cultural Form and Intellectual Record." Proc. of Time Will Tell, But Epistemology Won't: A Conference on Richard Rorty's Archive, University of California, Irvine. Web. <http://virtualpolitik.org/rorty/Schmitz_Rorty_paper.pdf>
[4] yWriter, PageFour, and Manuscript, to name a few.
[5] Storage is, however, becoming less physically delicate. Flash-based memory has no mechanical parts. Although currently rather expensive per gigabyte, it will eventually replace hard drives and other forms of portable storage.
[6] Laforet, Anne, Aymeric Mansoux, and Marloes De Valk. "Rock, Paper, Scissors and Floppy Disk." Pi.kuri.mu. July 2010. Web. 30 Oct. 2010. <http://pi.kuri.mu/rock/>
[7] Kuny, Terry. "A Digital Dark Ages? Challenges in the Preservation of Electronic Information." Proc. of the 63rd IFLA Council and General Conference. 1997.
[8] "About the Program." Digital Preservation (Library of Congress). Web. Oct. 2010. <http://www.digitalpreservation.gov/library/>
[9] "About Archive-It." Archive-It.org. Web. Oct. 2010. <http://www.archive-it.org/public/about-us.html>
[10] "The Internet Archive is a nonprofit organization founded in 1996 to build an Internet library, with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format." <http://www.digitalpreservation.gov/partners/ia/ia.html>
[11] "Web Archives: Yesterday's Web; Today's Archives." Web Archiving Service. Web. Oct. 2010. <http://webarchives.cdlib.org/>
[12] "An Approach to Digital Archiving and Preservation Technology." ACE. Web. Oct. 2010. <https://wiki.umiacs.umd.edu/adapt/index.php/Ace>