Ingest and Presentation of Serial Publications

Diane Boehr
John Doyle
Doron Shalvi
DC Fedora Users Group Meeting
October 6, 2014
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services
The NLM Digital Repository
• Digital Library
– Search, browse and access
resources held in NLM’s collection
• Preservation Repository
– Safeguards for long-term viability
of digital content
Winky the Watchman. 1945
Anatomy, Descriptive and Surgical. Gray, 1859
Back-end: redundant storage & services
[Diagram: browser traffic is distributed by 3DNS (20% / 80%) across two data centers, each behind a BIG-IP]
• Primary Data Center: Fedora Primary #1, Fedora Primary #2
• Backup Data Center: Fedora Backup #1, Fedora Backup #2
• Each Fedora instance has its own Managed Storage, Solr, Resource Index and Fedora DB
• External Storage at each data center (one labeled Read Only)
Front-end: “Digital Collections”
• Browse & Search (Blacklight)
– Open source OPAC application, widely used
– Rich faceting, responsive, lightweight design
• Book Viewer (NWU)
– Open source software from Northwestern University
– Open source JPEG2000 server (Djatoka)
• Video Player with Search (NLM)
– Features video transcript search & play-ahead jump
• Web Service for third-party programmatic search
Descriptive Metadata
• Voyager to Fedora Repository: MARC bib records to OAIDC, DMDIndex
• Fedora Repository to Voyager: Voyager bibs updated with resource permalinks, date of ingest
Content
• Audiovisuals
– 106 digitized films
• Working off DVD access copies
• Higher quality digitization coming soon
• Texts
– 12,000 digitized volumes
• Single & multi-vol monographs
• Mostly scanned onsite
• And now, the first serial titles
What is a serial?
• A resource issued in successive parts, usually
having numbering, that has no predetermined
conclusion (e.g., a periodical, a monographic
series, a newspaper)
Challenges presented by serials
• Because they are ongoing, information about
them may change
– Title
– Issuing body
– Frequency, etc.
• Numbering schemes vary among (and within)
titles; the number of possible levels is
unpredictable
Challenges presented by serials
• They may have supplements, indexes, special
issues or reprints
• They often have relationships to other serials
– Earlier/later titles
– Companions
– Language editions or translations
Serials metadata
• A catalog record for a serial is created at the
title level, and the information presented
applies to the serial as a whole
• Indexing records are created at the article or
issue level
Common serial display
NLM’s display goals
• Provide the user with a unified title-level
display that gives complete information
about all available issues, in chronological
order, with easy navigation to any issue
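As a rough illustration of the chronological-order goal, the sketch below sorts issue records by parsing human-readable date strings of the form "July 25, 1918". The record layout and the function names are assumptions made for this example, not NLM's implementation.

```python
from datetime import datetime

# Hypothetical issue records; the dates use a "Month D, YYYY" form.
issues = [
    {"label": "vol. 1, no. 4", "date": "August 10, 1918"},
    {"label": "vol. 1, no. 2", "date": "July 25, 1918"},
    {"label": "vol. 1, no. 3", "date": "August 3, 1918"},
]

def issue_date(issue):
    """Parse the date string into a sortable datetime."""
    return datetime.strptime(issue["date"], "%B %d, %Y")

def chronological(records):
    """Return the records ordered from earliest to latest issue date."""
    return sorted(records, key=issue_date)

ordered_labels = [i["label"] for i in chronological(issues)]
```

Sorting on a parsed date rather than on the label avoids the string-ordering problem where "no. 10" would sort before "no. 2".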
Display issues
• How do we work with a single catalog record,
yet provide a user display that uniquely
identifies each issue that has been digitized?
• How do we best allow the user to see the full
structure of the contents of the serial?
• To what level of detail should we drill down?
Abstract Serial Hierarchy
• Descriptive Metadata
• Root node (serial)
• Intermediate nodes (series, volume, part)
• Leaf nodes (issue, number)
• Page images
Leaf nodes with missing hierarchy
• Descriptive Metadata
• Root node (serial)
• Intermediate nodes (series, volume, part)
• Leaf nodes (issue, number)
• Page images
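One way to picture a hierarchy in which some leaves lack intermediate levels is a tree where the list of intermediate labels may be empty. The following is a minimal sketch under that assumption; the `Node` class and `add_issue` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    kind: str  # "root", "intermediate", or "leaf"
    children: list = field(default_factory=list)

    def child(self, label, kind):
        """Return the existing child with this label, or create it."""
        for c in self.children:
            if c.label == label:
                return c
        node = Node(label, kind)
        self.children.append(node)
        return node

def add_issue(root, levels, leaf_label):
    """Attach a leaf under the given intermediate levels (outermost first).
    An empty `levels` list attaches the leaf directly to the root."""
    parent = root
    for level in levels:
        parent = parent.child(level, "intermediate")
    return parent.child(leaf_label, "leaf")

root = Node("Example serial", "root")
add_issue(root, ["Volume 1"], "Number 2")
add_issue(root, ["Volume 1"], "Number 3")
add_issue(root, [], "Special issue")  # leaf with missing hierarchy
```

Because the number of levels is just the length of a list, the same code handles titles with one, several, or no intermediate levels.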
Encoding of serial hierarchy
• Scanning Spreadsheet (Excel)
– Tabular listing of each scanned issue (leaf node), with hierarchy for that issue (e.g., Part II, Volume 3)
• STRUCT (XML)
– Normalized hierarchical structure of serial
• Display STRUCT (HTML)
– HTML display representation of serial computed in advance
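To make the "computed in advance" idea concrete, here is a small sketch that renders a nested hierarchy into static HTML once, so the page does not need to rebuild it on every request. The tuple-based structure and the `<ul>`/`<li>` markup are assumptions for this example; the actual display markup is not given in the slides.

```python
from html import escape

# Hypothetical nested structure: (label, children) tuples.
serial = ("Bombproof", [
    ("Volume 1", [
        ("Number 2", []),
        ("Number 3", []),
    ]),
])

def render(node):
    """Render one node and its descendants as nested <ul>/<li> markup."""
    label, children = node
    html = "<li>" + escape(label)
    if children:
        html += "<ul>" + "".join(render(c) for c in children) + "</ul>"
    return html + "</li>"

# The stored display representation, built once from the hierarchy.
display_struct = "<ul>" + render(serial) + "</ul>"
```

Storing the rendered string alongside the object trades a little storage for a simpler and faster display path.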
Scanning Spreadsheet
Display Call No, Copy Number | Title | Bib ID | Date String | Start Year | Child Level | level1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 2 | 376418 | July 25, 1918 | 1918 | Number 2 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 3 | 376418 | August 3, 1918 | 1918 | Number 3 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 4 | 376418 | August 10, 1918 | 1918 | Number 4 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 5 | 376418 | August 17, 1918 | 1918 | Number 5 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 6 | 376418 | August 24, 1918 | 1918 | Number 6 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 7 | 376418 | August 31, 1918 | 1918 | Number 7 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 8 | 376418 | September 7, 1918 | 1918 | Number 8 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 9 | 376418 | September 14, 1918 | 1918 | Number 9 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 10 | 376418 | September 21, 1918 | 1918 | Number 10 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 11 | 376418 | September 28, 1918 | 1918 | Number 11 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 12 | 376418 | October 5, 1918 | 1918 | Number 12 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 13 | 376418 | October 12, 1918 | 1918 | Number 13 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 14 | 376418 | October 19, 1918 | 1918 | Number 14 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 15 | 376418 | October 26, 1918 | 1918 | Number 15 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 16 | 376418 | November 2, 1918 | 1918 | Number 16 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 17 | 376418 | November 9, 1918 | 1918 | Number 17 | Volume 1
WX 2 A2N8 18 1 | Bombproof. / vol. 1, no. 18 | 376418 | November 16, 1918 | 1918 | Number 18 | Volume 1
The Author, End Year, level2, level3, level4 and level5 columns are empty for these rows.
Note: NLM does not hold all the issues in this title.
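The step from tabular rows to a normalized hierarchy could look roughly like the sketch below, which groups leaf rows under their level1 parent and serializes the result as XML. The element and attribute names (`serial`, `node`, `leaf`, `label`, `date`) are invented for illustration and are not the actual STRUCT schema; only the row values come from the spreadsheet.

```python
import xml.etree.ElementTree as ET

# Rows modeled on the spreadsheet: (level1, child level, date string).
rows = [
    ("Volume 1", "Number 2", "July 25, 1918"),
    ("Volume 1", "Number 3", "August 3, 1918"),
]

def build_struct(title, rows):
    """Group leaf rows under their level1 parent and return an XML tree.
    Element and attribute names are illustrative assumptions."""
    root = ET.Element("serial", {"title": title})
    parents = {}
    for level1, child, date in rows:
        # Create each intermediate node once, on first use.
        if level1 not in parents:
            parents[level1] = ET.SubElement(root, "node", {"label": level1})
        ET.SubElement(parents[level1], "leaf", {"label": child, "date": date})
    return root

struct = build_struct("Bombproof", rows)
xml_text = ET.tostring(struct, encoding="unicode")
```

A fuller version would walk level1 through level5 for each row instead of a single parent level, since the number of levels varies by title.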
STRUCT XML
Serials data models
• Root: PID, DC, RELS-EXT, DMDINDEX, THUMB, Preview
• Intermediate: PID, DC, RELS-EXT, DMDINDEX, THUMB, Preview
• Leaf: PID, DC, RELS-EXT, DMDINDEX, THUMB, Preview
• Additional datastreams shown: STRUCT, METS, DisplaySTRUCT, OCR, PDF
Demonstration
http://collections.nlm.nih.gov