PATCO Digitization Procedures - Georgia State University Library

Digitization of the PATCO Records funded in part by the NHPRC
Planning
Project timeline
0-3 months
o Project Manager submits the position announcements to Library Administration and initiates the
search for the Scanning Tech Assistant, Archival Assistant, and Student Assistant(s) positions
o Project Manager sets up regular project meetings and meets with the project team (all Project
Staff listed) to finalize project dependencies and establish a Gantt chart of tasks and milestones
o 2 review meetings (6-week interval) by project team with review of milestones
o LTA and Project Archivist set up process for statistical records, metadata template, and workflow
o Project Archivist begins preparing Series folders for workflow
o Project Manager sets up spreadsheet for metadata
o Library Technical Assistant (LTA) begins test scanning and testing project workflow
o Project Archivist trains newly hired Archival Assistant on folder preparation, digitization
workflow, and image/metadata management
o Digital Projects Librarian trains newly hired LTA and student assistants on scanning procedures
and image/metadata management
o LTA meets with student assistants and establishes a scanning schedule for them
o Student assistants begin scanning
o Near end of the period: 3-month project review meeting by project team
3-6 months
o 2 review meetings (6-week interval) by project team with review of milestones
o Ongoing scanning of documents by student assistants, LTA
o Ongoing folder preparation and management by archival assistant
o LTA and archival assistant begin checking images, correcting skew and cropping, and migrating
images and metadata into the spreadsheet
o Continuing project review meetings by project team
o LTA begins final checking and uploading images and metadata from spreadsheets into CONTENTdm
o Coordinator for Arrangement and Description begins approval and publishing of records in
CONTENTdm
o Near end of the period: 6-month project review meeting by project team
6-9 months
o 2 review meetings (6-week intervals) by project team with review of milestones
o Ongoing scanning of documents by student assistants, LTA
o Ongoing final checking and uploading of images/metadata from spreadsheets into CONTENTdm
o Near end of the period: 9-month project review meeting by project team
9-12 months
o 2 review meetings (6-week intervals) by project team with review of milestones
o Ongoing scanning of documents by student assistants, LTA
o Ongoing final checking and uploading of images/metadata from spreadsheets into CONTENTdm
o Near end of the period: 12-month project review meetings by project team
12-15 months
o Completed scanning of documents by student assistants, LTA
o LTA meets with student assistants about finalizing project work
o 2 review meetings (6-week intervals) by project team with review of milestones
o Ongoing final checking and uploading of images/metadata from spreadsheets into CONTENTdm
o Ongoing approval and publishing of records in CONTENTdm
o Near end of the period: 15-month project review meetings by project team
15-18 months
o Web Development Librarian customizes home page/landing page in CONTENTdm for project collection
o Testing home/landing page for project
o Near end of the period: 18-month project review meetings by project team
18-20 months
o Final checking and uploading and approval of records
o Final project review meetings by project team
o Launch of home/landing page
Ongoing scanning of documents by student assistants, LTA
Document Preparation
Document preparation
• Before handling any documents, make sure that hands are clean and dry, free of dirt,
food, lotion, and moisture. If you need to handle photographs, use gloves.
• Go through each page of each file separately, keeping all pages in order.
• Inspect for mold, damage, and fragility.
• Using a microspatula, remove all metal (paperclips, staples) extremely carefully. There should
be no damage done to any document during this process. If something seems stuck or a
document seems too fragile, consult the archivist for advice.
• For all pages organized into groups (by paperclip, staple, etc.), either place the group inside a
folded sheet of bond paper or group the pages using a small slip of bond paper and a stainless
steel paperclip.
Scanning
Guidelines for Scanning
• Before handling any documents, make sure that hands are clean and dry, free of dirt,
food, lotion, and moisture. If you need to handle photographs, use gloves.
• Remove one folder at a time from the box you’re working on, and place an out card or
placeholder where you have removed the folder.
• Each folder in each box will be given a name using the OPUS software with the following naming
convention:
o The project name is PATCO [Series number].
o The object name is PATCO_[Series number]_[Box number]_[Folder number], so folder
one in box one of the first series for PATCO is PATCO_01_01_01.
• Keep all pages in order; turn pages as if you are reading a book. At all times, documents should
be on the scanner or in the folder. Do not place them in other areas around the scanning area; if
they become misplaced, it will be very difficult to determine to which folder they should be
returned.
• Scan the front of the file folder (with as much of the collection, box, and folder information
visible as possible) as the first page of each new object.
• Be sure to scan the complete file, including any pages with even a small amount of writing; skip
all blank pages, however.
• Do not scan extra copies of items in the folder. Skip only exact copies; if a document contains
handwritten notes, a slightly different layout, or other small changes, it must be scanned.
• Scan each page separately in the center of the bed. For books, booklets, or pamphlets, open the
document and place the spine on the seam of the bed. For pages or booklets which do not lie
flat, adjust one bed or both beds and close the top cover of the scanner. Be sure to adjust scanner
settings as you go to optimize results.
• Books, booklets, pamphlets, or other publications with corporate or government publishers
and/or that are under copyright should not be scanned in their entireties; only
PATCO-originated documents, reports, and publications should be scanned.
o The covers of books, booklets, and pamphlets will be scanned so that their placement in
the folder can be noted.
o News clippings and magazine articles will not be scanned in their entireties; only
PATCO-originated documents, reports, and publications will be. The titles and publication
information of news clippings and magazine articles will be scanned, and the body of the
article will be covered or redacted. Each will be accompanied by this note: “This item was
not scanned in its entirety because Georgia State University does not hold copyright. You
may be able to obtain a copy of this document at your local library. PATCO Records
Digitization Project, Georgia State University Library.”
• Personal, sensitive information is skipped, including social security numbers, credit card
numbers, medical evaluations, etc. When encountering these, either cover them with the
statement sheet explaining that GSU is not scanning such material or mark them for redaction.
PATCO Scanning Instructions Using OPUS FreeFlow Software with a Bookeye 3 Scanner
1. After logging into the computer station, turn on the Bookeye scanner by pressing and holding
the Start button. Release when start up begins. Scanner must be on before the software can
open.
2. Open the OPUS FreeFlow software by clicking the icon.
3. Maximize window and select “New” to create a new item.
4. In this window you assign a Project and Object Name to the new item. The Object Name will
become the name of the future derivative file, hence it is important to follow the correct file
naming convention. The ID number is assigned by OPUS and is how the original TIFF scans are
named.
5. Click ok.
6. Raise book cradle to top position for scanning paper or adjust accordingly for a bound volume
using the corresponding arrow buttons on the scanner.
7. To scan, use the “Scan Now” button in the software (when it is green), the Start button on the
scanner, or the foot pedal. Allow each scan time to process so that the Scan Status says Ready.
The scanner automatically crops items, but this is often imperfect as seen in the example below.
8. If necessary, use the Setup menu to adjust the color or black/white preferences and specify
whether you are using the glass, among other things. This can also be done using the scanner
directly.
9. If you missed something or something scanned incorrectly, you can DELETE, REPLACE, or INSERT
(before or after) the currently selected image (noted by red border around the thumbnail at the
bottom when selected).
10. When scanning is complete, select the Image Treatment tab. The software automatically
performs “two up” image splitting, book-fold correction, cropping, de-skewing, and border
adjustment, and eliminates fan and gutter. Click on the small page noted above to allow you to
further manipulate the clips for each scan. Here you can crop, skew and rotate your image as
needed. Book curvature produced from scanning a tightly bound book may be realigned using
the OPUS book curve correction tool.
11. Click “Perform IT on All Images” when you are finished. A green progress bar will let you know
when this stage is complete. Even if you place no clips on an image (for example, because the
scan requires no cropping), you must still “Perform IT” to send the unedited image to the
Export stage.
12. When complete, review your images to ensure proper image treatment before exporting. You
can go back and forth between the Image Treatment and Export tabs as much as needed,
performing IT on any image that requires adjustment until you are satisfied with the end result.
Use the “Perform IT on Current Image” if you need to change only one. Click “Export Images”
button when ready.
13. Here you specify your file type and location. “Choose Base Output Directory” is where your
derivative file will be sent. You can change this setting each time as needed down to a specific
folder level. The file will end up in a new folder named with its OPUS ID#. Choose options here
to create derivative PDF-A files (for upload to CONTENTdm) at 300 DPI resolution with 24-bit
color depth (RGB), using JPEG compression at 80 quality. OCR and intelligent interpretation are
performed during derivative creation, so that the PDF is full-text searchable.
14. When finished, you must click back to the SCAN tab in order to close the item. The newly
created item will remain in the MANAGE queue on the local hard drive for 7 days before it is
deleted.
File Management and Metadata
File management
• It is best to copy all necessary files to the dark archive drive immediately after creation.
Derivative files can be found in the chosen output location. Keep all tiff files in one folder and all
PDF files in a separate folder.
o The original tiff scans can be found organized by the OPUS ID number in the Active
Object Hive folder of the Working Data folder in the OPUS program folder. Move full
scans to a new file folder for each folder of material. Include a copy of the metadata file
created when scanning with the files.
o For PDF derivatives, move the file from your chosen output location into a new file
folder for each box of material.
File checking protocol
• For PDF files (to be uploaded to CONTENTdm):
o Check file to ensure agreement with folder name on the first scanned image of
file (file folder cover).
o Skim each page for personal information (evaluations, private information,
medical information), social security numbers, credit card numbers, copyrighted
works and duplicates.
o Check the number of scanned images to be sure it agrees with the number of pages
listed on the spreadsheet.
o Mark any items on the spreadsheet for filename corrections or for redaction
(for redaction, provide page/scan number).
• For tiff files (for backup/archival storage):
o Check file to ensure agreement with folder name on the first scanned image of
file (file folder cover).
o Check to be sure the file name is correctly configured (make sure the file name
corresponds to the individual tiff images).
o Indicate if the tiff has more than one copy of the scan.
o Mark any items on the spreadsheets for discrepancies.
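The image-count check can be partly automated. The following is a minimal sketch, not the project's actual tooling: it assumes (hypothetically) that each object's tiff pages sit in a folder named after the object, that spreadsheet rows are available as dictionaries, and that the Source Dimensions field holds a bare page count. Any object it reports still needs a manual look before it is marked on the spreadsheet.

```python
from pathlib import Path


def count_tiffs(folder: Path) -> int:
    """Count the tiff page images stored directly inside one object's folder."""
    return sum(1 for p in folder.iterdir() if p.suffix.lower() in {".tif", ".tiff"})


def find_discrepancies(rows, tiff_root: Path):
    """Return (identifier, pages listed, images found) wherever the two counts differ.

    Assumptions (hypothetical layout): tiff_root/<Identifier>/ holds the pages,
    and the "Source Dimensions" value is a plain integer page count.
    """
    problems = []
    for row in rows:
        identifier = row["Identifier"]
        listed = int(row["Source Dimensions"])
        folder = tiff_root / identifier
        found = count_tiffs(folder) if folder.is_dir() else 0
        if found != listed:
            problems.append((identifier, listed, found))
    return problems
```

An empty result means every listed object has as many tiff images as the spreadsheet says; it does not replace the visual checks for personal information or duplicates.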
Metadata
• Fill out a spreadsheet containing the following metadata fields for each object:
o Identifier – [object name]
o Title – [folder title]
o Date of Original – [folder date]
o Decade – [from folder date]
o Description – [modified description from series scope and content note]
o Creator – Professional Air Traffic Controllers Organization
o Contributors – [blank]
o Digital Publisher – Georgia State University
o Curatorial Area – Southern Labor Archives
o Collection – Professional Air Traffic Controllers Organization
o Series – [series number and description]
o Rights Information – [as listed on finding aid]
o Citation – [as listed on finding aid]
o Ordering – [blank]
o Language – English
o WAV – [blank except for audio files]
o MP3 – [blank except for audio files]
o Location Depicted – United States
o Subject – [subject terms]
o Subject (Depicted) – [blank]
o Subject (Name) – Professional Air Traffic Controllers Organization (Washington, D.C.)
o Source Format – Files (document groups)
o Source Type – text
o Source Dimensions – [page count]
o Note – [blank]
o Relation – [shortened, modified description from series scope and content note]
o Publication History – [blank]
o Format – text/pdf
o Full Text – [from OCR performed on image during scanning process]
o Other Format – tiff
o Object File Name – [object name].pdf
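Many of these fields hold the same value for every object, so a row can be pre-filled and only the folder-specific values typed in. The sketch below illustrates one way to do that; the function names are hypothetical, and the decade format ("1980s", with multiple decades joined by "; ") is an assumption, since the procedures say only that Decade is derived from the folder date.

```python
import re

# Values the field list above fixes for every object in the collection.
CONSTANT_FIELDS = {
    "Creator": "Professional Air Traffic Controllers Organization",
    "Digital Publisher": "Georgia State University",
    "Curatorial Area": "Southern Labor Archives",
    "Collection": "Professional Air Traffic Controllers Organization",
    "Language": "English",
    "Location Depicted": "United States",
    "Subject (Name)": "Professional Air Traffic Controllers Organization (Washington, D.C.)",
    "Source Format": "Files (document groups)",
    "Source Type": "text",
    "Format": "text/pdf",
    "Other Format": "tiff",
}


def decades_from_date(folder_date: str) -> str:
    """Derive decade labels from the four-digit years found in a folder date.

    The "1980s" style and the "; " separator are assumptions, not project rules.
    """
    years = [int(y) for y in re.findall(r"\b(?:1[89]|20)\d\d\b", folder_date)]
    if not years:
        return ""
    first, last = min(years) // 10 * 10, max(years) // 10 * 10
    return "; ".join(f"{decade}s" for decade in range(first, last + 10, 10))


def build_row(object_name: str, title: str, folder_date: str, page_count: int) -> dict:
    """Combine the constant values with the folder-specific ones."""
    row = dict(CONSTANT_FIELDS)
    row.update({
        "Identifier": object_name,
        "Title": title,
        "Date of Original": folder_date,
        "Decade": decades_from_date(folder_date),
        "Source Dimensions": str(page_count),
        "Object File Name": f"{object_name}.pdf",
    })
    return row
```

Fields that depend on the finding aid or on cataloging judgment (Description, Series, Rights Information, Citation, Subject, Relation, Full Text) are deliberately left out and would still be filled in by hand.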
Upload
Upload to CONTENTdm
1. Copy and paste a small set of PDF files (8-10) into an empty folder on your desktop.
2. Prepare the metadata:
a. Copy and paste the metadata corresponding to the copied PDF files (with the header row)
into a new spreadsheet.
b. Save/export the spreadsheet as a tab delimited text file.
c. Open the text file and remove all quotation marks; the fastest way to do this is to
perform a “Replace All” search to replace quotation marks with nothing.
d. Remove any additional spaces that may be at the end of the document.
e. Save the document.
3. Open the CONTENTdm Project Client, and open the Professional Air Traffic Control Organization
(PATCO) project.
4. Select the “Add Multiple Items” option on the right menu.
5. Choose the “Import using a tab-delimited text file” option and select the correct file.
6. Choose the “Import files from a directory” option and select the appropriate folder from your
desktop.
7. When asked, “Do you want CONTENTdm to generate display images from items you import?”,
select yes.
8. Ensure that all metadata fields correlate correctly on the Map Metadata Fields screen.
9. Click the “Add Items” button.
10. The progress of the uploads will display as a moving bar.
11. After the upload is displayed in the client, right click on the automatically generated
thumbnail, click the “Replace Thumbnail” option, and choose the PATCO thumbnail jpeg file.
12. Check the information for each uploaded file separately using the metadata spreadsheets.
13. Select all files and click the Upload for Approval button.
14. The coordinator for the project will log into the CONTENTdm Administration site. Under the
“Items” tab are the options to “Approve” and “Index”, which must be performed in that
order. Recently uploaded content can be checked at the Approval stage and edited if
necessary before it is approved. Content will be present and searchable in the collection after
successful indexing.
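The metadata preparation in step 2 (tab-delimited export, removal of quotation marks, no stray characters at the end of the file) can also be done in one pass by a short script. This is a hedged sketch rather than part of the documented workflow: the function names are hypothetical, rows are assumed to be dictionaries keyed by field name, and replacing tabs and line breaks inside a value with spaces is an added assumption needed to keep each record on one line.

```python
import re


def clean(value) -> str:
    """Remove quotation marks and any tab or line break that would split a row."""
    text = str(value).replace('"', "")
    return re.sub(r"[\t\r\n]+", " ", text).strip()


def to_tab_delimited(rows, fieldnames) -> str:
    """Build tab-delimited text: a header row, then one line per object."""
    lines = ["\t".join(fieldnames)]
    for row in rows:
        lines.append("\t".join(clean(row.get(name, "")) for name in fieldnames))
    # Joining without a final newline leaves nothing extra at the end of the document.
    return "\n".join(lines)
```

The returned text can be saved as a .txt file and selected in the “Import using a tab-delimited text file” step; the field mapping should still be verified on the Map Metadata Fields screen.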
Long-term Digital Preservation
Digital Preservation Plan
Written by Melanie L. Maxwell, PATCO Library Tech, Digital Projects, January 30, 2013
Updated by Jeremy T. Bright, PATCO Library Tech, Digital Projects, October 15, 2013
In keeping with best practices for digital file permanence, Digital Projects has been aware, since being
awarded the PATCO grant, of the need for redundancy and bit checking of our dark archive, adequate
storage to maintain and expand our digital collections, consistency in our digital file naming
conventions and formats, and eventual format migration to avoid file obsolescence.
Development of a Comprehensive Preservation Plan
The GSU Library is currently working to develop a library-wide, long-term digital preservation plan. The
need to manage the significant amount of digital content generated by the PATCO project served as a
catalyst for creating a plan to manage our ever-increasing digital content.
Redundancy
In December 2012, our network systems administrator (Trevor Sookdeo) organized official contracts for
off-site cloud storage by Peachnet and a more reliable local backup in the basement of the Library South
building for the network drive containing our archived digital files (T:). At the time of this report, library
staff are in the process of investigating geographically distributed storage options.
Bit Check
The local backup runs an MD5 cryptographic hashing function to create a 128-bit hash value when
performing file comparisons for sync verifications on all files.
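To illustrate the kind of comparison described, here is a minimal sketch using Python's standard hashlib module. It is not the backup system's own code, and the function names are hypothetical; it shows only how an MD5 digest (128 bits, written as 32 hexadecimal characters) can confirm that an archived file and its copy are bit-for-bit identical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024) -> str:
    """Return the MD5 digest of a file as 32 hex characters (128 bits)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large tiff files are never loaded whole into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, backup) -> bool:
    """True when both files produce the same MD5 digest."""
    return md5_of_file(original) == md5_of_file(backup)
```

MD5 is adequate for detecting accidental corruption during a sync; it is not considered safe against deliberate tampering.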
Storage
Currently the network drive (T:) has a holding capacity of 19 TB. At the time of this report, the T: drive
has 8.2 TB of used space, leaving 10.8 TB free for additional projects.
Naming Conventions
Digital Projects applies its own naming conventions to its mass digital projects. The PATCO files are
named according to archival series/box/folder (PATCO_01_01_01), while periodicals (The Great Speckled
Bird and The Signal) are named according to date published (GSB_1968_3.15 and GSUS1979-04-13,
respectively).
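The PATCO pattern is regular enough to generate and validate by script. The sketch below assumes, from the single example PATCO_01_01_01, that each number is zero-padded to at least two digits; the function names are hypothetical.

```python
import re

# Series, box, and folder numbers, each assumed to be padded to at least two digits.
OBJECT_NAME = re.compile(r"^PATCO_(\d{2,})_(\d{2,})_(\d{2,})$")


def object_name(series: int, box: int, folder: int) -> str:
    """Build an object name such as PATCO_01_01_01."""
    return f"PATCO_{series:02d}_{box:02d}_{folder:02d}"


def parse_object_name(name: str):
    """Return (series, box, folder) as integers, or raise ValueError if malformed."""
    match = OBJECT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a PATCO object name: {name!r}")
    return tuple(int(part) for part in match.groups())
```

For example, `object_name(1, 1, 1)` yields `PATCO_01_01_01`, and `parse_object_name` rejects a periodical name such as `GSB_1968_3.15`, which follows a different convention.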
File Formats
Digital Projects keeps the original, uncompressed tiff scan and a PDF derivative of its digital files. These
files are organized into mirrored folders by format type. Individual tiff pages are renamed to reflect the
pdf file name as our scanning software isn’t able to intelligently name the individual tiff pages.
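Renaming the individual tiff pages can be scripted. The following is only a sketch under stated assumptions: the page suffix pattern (object name plus a four-digit page number) is hypothetical, since the actual pattern is not recorded here, and it assumes the scanner's own file names sort in scan order. The plan is built first so it can be reviewed before any file is touched.

```python
from pathlib import Path


def plan_renames(tiff_folder: Path, object_name: str):
    """Pair each tiff page with a new name based on the object (PDF) name.

    Assumptions: existing names sort in scan order, and the suffix
    _0001, _0002, ... is an illustrative pattern, not the project's rule.
    """
    pages = sorted(
        p for p in tiff_folder.iterdir() if p.suffix.lower() in {".tif", ".tiff"}
    )
    return [
        (page, page.with_name(f"{object_name}_{number:04d}{page.suffix.lower()}"))
        for number, page in enumerate(pages, start=1)
    ]


def apply_renames(plan) -> None:
    """Carry out a reviewed plan, refusing to overwrite an existing file."""
    for old, new in plan:
        if old == new:
            continue  # already carries the target name
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

Running `plan_renames` alone and printing the pairs gives a dry run; `apply_renames` would be called only once the pairs look right.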
Format Migration
The Digital Projects Coordinator will keep current on which file formats are reaching obsolescence and
migrate as appropriate. A spreadsheet of each digital collection and its archived formats is actively
managed. Levels of support for certain formats we commit to preserving, or decide to let go, are
documented.