Using Scalable and Secure Web Technologies to Design Global Format Registry
Muluwork Geremew, Sangchul Song and Joseph JaJa
Institute for Advanced Computer Studies
Department of ECE, University of Maryland
Sponsored by the Library of Congress and NSF

Motivation
• Handling of digital formats is an essential part of long-term preservation.
• Format obsolescence
  – Technology evolution and the obsolescence of systems and application software may leave users unable to access their old files.
  – Software developers may go out of business and stop supporting their applications.
• Digital preservation requires
  – An understanding of the different essential aspects of digital objects.
  – Tools for capturing the essential format characteristics of information stored as a digital object, and for processing that information.

Existing Methodologies
• Standardizing digital content on a few common formats.
  – JPEG2000, OMF, and PDF/A are among the few selected open standard formats.
• Migration
  – Transforms files in older formats or format versions into newer formats.
  – Tends to be costly and error-prone.
• Emulation
  – The original bit streams are executed using an emulator.
  – Implementing such a strategy is extremely challenging, and it can itself be viewed as a transformation.

Our Goal
• A flexible framework for incorporating advances achieved through the existing approaches.
• Development of an efficient, scalable, and platform-independent prototype to enable the tracking and handling of format obsolescence.
  – Development of a Global Digital Format Registry (GDFR)
  – FOrmat CUration Service (FOCUS)
  – Development of enabler modules that can interface between GDFR and end-user applications.

FOCUS Architecture
[Figure: FOCUS architecture diagram]

FOCUS on LDAP and SOAP
• Interoperability
  – The protocols are platform independent.
• Performance
  – Most operations are read-only queries, a workload for which LDAP gives high performance.
• Extensibility
  – The LDAP schema can be easily extended.
• Scalability
  – Achieved through distributed LDAP.
• Security
  – SOAP can run on top of SSL (HTTPS).
  – An LDAP-based format registry can be easily integrated with other LDAP-based authentication/authorization mechanisms.

Global Digital Format Registry
• GDFR provides detailed information about formats.
• Existing format registries:
  – UPenn's FRED (http://tom.library.upenn.edu/fred)
  – PRONOM (http://www.nationalarchives.gov.uk/pronom/)
  – Wotsit's Format (http://www.wotsit.org)
• It is not clear how extensible or scalable these registries are, or how they can be interfaced with existing preservation systems.

[Figure: FOCUS software, the Web Service Agent, and the Global Digital Format Registry]
• The registry contains information on
  – File formats
  – Software tools
• FOCUS provides multiple ways to access GDFR:
  – Directly, through the LDAP interface
  – Indirectly, through the SOAP interface

GDFR Internal Structure
dc=umiacs, dc=umd, dc=edu
  ou=Format-Registry
    ou=Formats: Adobe PDF v1.4, CompuServe GIF 1989a, JPEG Image Format 2000
    ou=Applications: Adobe Acrobat v6.0, Adobe Photoshop v7.0, Jhove 1.0
• Entries carry general descriptive properties and processing information (formats taken as input and/or output; rendering, editing, conversion, and validation services/systems).

Web-Service Agent
[Figure: a client sends a format inquiry to the Web Service Agent, which queries the Global Digital Format Registry]
• Mediator between the user and the registry
• Serviced via SOAP
• Contains a file format identifier module, FIDER
  – Java module for format identification
  – Uses the file's magic number
  – Tests signatures sequentially, from the most restrictive to the most general

Web-Service Agent
• Tailorability
  – The specific needs of an existing preservation system can be met by custom-tailoring the Web Service.
• Interoperability
  – Independent of operating system and programming language
• Convenience
  – Multiple LDAP queries can be reduced to one Web Service function call.
  – Updates can be made in a single place, without distributing new modules to end users.

FOCUS Supplementary Tools
• Validation Software
  – Verifies and validates the file format of a given file.
• Rendering Software
  – Interprets the bit stream of a file into a human-friendly representation on screen.
• Editing Software
  – Adds, deletes, or modifies the contents of a given file while keeping its file format correct.
• Conversion Software
  – Converts a file to current or emerging formats.

FOCUS Service Model
[Figure: the Web Service Agent mediates between the user, the Format Registry, and the four services below]
• Identification Service: identifies the format of a specific digital object (DO) using its internal signature.
• Validation Software: determines a verification service to verify the format of a specific DO.
• Rendering Software: identifies current rendering conditions for a specific digital format.
• Conversion Software: locates transformation services to convert a DO from its source format to the format of interest.

Example Scenario: Digital Object Format Verification
[Figure: message flow among the user, the Web Service Agent, the Format Registry, and the Validation, Conversion, and Rendering Services, with the labels "Format?", "Format ID / Format Info", "Verifier?", "App ID / App Info", "Conversion service ID", "Verify this?", and "Valid/Well-formed"]
Step 1: The user requests identification of a file's format via the Web Service.
Step 2: The registry returns the format ID and information.
Step 3: The user requests a verifier (validation service) available for this format.
Step 4: The registry returns information on the validation service, such as its location.
Step 5: The user connects to the validation service and verifies the format.
Step 6: The validation service returns the verification result.

Demo

Conclusion
• The FOCUS design offers
  – Flexibility: the Web Service Agent can be easily tailored to meet the various needs of different preservation institutions.
  – Scalability: the format registry can also be distributed.
• FOCUS integrates current format preservation techniques and makes them available through a SOAP-based web interface.
• In summary, we believe that the FOCUS prototype represents a significant advance towards the development of a secure and scalable digital format registry.
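The directory tree on the GDFR internal structure slide maps naturally onto LDAP distinguished names. The Python sketch below builds DNs for entries under ou=Formats and ou=Applications, plus a search filter for applications that accept a given format, escaping values as RFC 4514 (DNs) and RFC 4515 (filters) require. The `cn` naming attribute, the `gdfrApplication` object class, and the `inputFormat` attribute are hypothetical; the slides do not give the registry's schema.

```python
# Minimal sketch of LDAP names for the GDFR tree.
# HYPOTHETICAL schema names: the "cn" naming attribute, the "gdfrApplication"
# object class and the "inputFormat" attribute are not given in the slides.

REGISTRY_BASE = "ou=Format-Registry,dc=umiacs,dc=umd,dc=edu"
BRANCHES = ("Formats", "Applications")


def escape_dn_value(value: str) -> str:
    """Escape an attribute value for use inside a DN (RFC 4514)."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in ',+"\\<>;':
            out.append("\\" + ch)           # always-special characters
        elif ch == "\0":
            out.append("\\00")              # NUL written as a hex pair
        elif (ch == " " and i in (0, last)) or (ch == "#" and i == 0):
            out.append("\\" + ch)           # leading/trailing space, leading '#'
        else:
            out.append(ch)
    return "".join(out)


def escape_filter_value(value: str) -> str:
    """Escape an assertion value for a search filter (RFC 4515)."""
    table = {"\\": "\\5c", "*": "\\2a", "(": "\\28", ")": "\\29", "\0": "\\00"}
    return "".join(table.get(ch, ch) for ch in value)


def entry_dn(name: str, branch: str) -> str:
    """DN of an entry in ou=Formats or ou=Applications."""
    if branch not in BRANCHES:
        raise ValueError(f"unknown branch: {branch}")
    return f"cn={escape_dn_value(name)},ou={branch},{REGISTRY_BASE}"


def applications_taking_input(format_name: str) -> str:
    """Filter for application entries that accept the given format as input."""
    value = escape_filter_value(format_name)
    return f"(&(objectClass=gdfrApplication)(inputFormat={value}))"


if __name__ == "__main__":
    print(entry_dn("Adobe PDF v1.4", "Formats"))
    print(applications_taking_input("Adobe PDF v1.4"))
```

A client that accesses GDFR directly through the LDAP interface would hand such a filter, with the registry base as the search base, to its LDAP library's search call. Escaping matters because format and product names may contain characters such as commas, parentheses, or asterisks.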
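The Web-Service Agent slide describes FIDER as identifying formats by their magic numbers, testing signatures sequentially from the most restrictive to the most general. A minimal Python sketch of that ordering follows. FIDER itself is a Java module and its signature table is not given in the slides, so the table here is an illustrative assumption; the byte signatures used for JPEG 2000 (JP2), PDF, GIF, and JPEG are the standard file headers.

```python
# Minimal sketch of FIDER-style identification by magic number.
# ASSUMPTION: the signature table is illustrative, not FIDER's real rule set.
from typing import Optional

# (label, byte offset, signature), ordered from most restrictive to most general.
SIGNATURES = [
    ("JPEG 2000 (JP2)", 0, bytes.fromhex("0000000C6A5020200D0A870A")),
    ("Adobe PDF v1.4", 0, b"%PDF-1.4"),
    ("Adobe PDF (other version)", 0, b"%PDF-"),
    ("GIF 89a", 0, b"GIF89a"),
    ("GIF 87a", 0, b"GIF87a"),
    ("JPEG", 0, b"\xff\xd8\xff"),
]

# Number of leading bytes needed to test every signature.
HEADER_LEN = max(offset + len(sig) for _, offset, sig in SIGNATURES)


def identify(header: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    for label, offset, sig in SIGNATURES:
        if header[offset:offset + len(sig)] == sig:
            return label
    return None


def identify_file(path: str) -> Optional[str]:
    """Read only the file header and identify it."""
    with open(path, "rb") as f:
        return identify(f.read(HEADER_LEN))
```

Placing `%PDF-1.4` ahead of the bare `%PDF-` prefix is what "restrictive to general" buys: a PDF 1.4 file is labelled with its version, while any other PDF still falls through to the generic entry instead of going unidentified.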
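The Web-Service Agent is serviced via SOAP, and one agent call can stand in for several LDAP queries. The sketch below builds a SOAP 1.1 request envelope with Python's standard `xml.etree.ElementTree`. The operation name `findValidator`, the parameter `formatId`, and the namespace `urn:example:focus` are hypothetical placeholders, since the slides do not publish the agent's WSDL.

```python
# Minimal sketch: one SOAP 1.1 request to the Web Service Agent.
# HYPOTHETICAL: operation "findValidator", parameter "formatId" and the
# namespace "urn:example:focus" stand in for the agent's unpublished WSDL.
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
FOCUS_NS = "urn:example:focus"                          # hypothetical

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("focus", FOCUS_NS)


def build_request(operation: str, **params: str) -> bytes:
    """Wrap one operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{FOCUS_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{FOCUS_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_request("findValidator", formatId="Adobe PDF v1.4").decode("utf-8"))
```

Under the SOAP 1.1 HTTP binding, such a body is sent as an HTTP POST with `Content-Type: text/xml` and a `SOAPAction` header; sending it over HTTPS gives the SSL protection mentioned on the LDAP and SOAP slide.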
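The six steps of the Digital Object Format Verification scenario can be traced end to end in a small in-memory Python sketch. The registry contents, the `validator` field, and the toy PDF check (a PDF header plus a trailing `%%EOF` marker) are hypothetical stand-ins: a real deployment would query the LDAP registry through the Web Service Agent and call an actual validation service such as Jhove rather than this simplistic test.

```python
# Minimal sketch of the six-step verification scenario with an in-memory registry.
# HYPOTHETICAL: registry contents, the validator name and the toy PDF check.
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FormatInfo:
    format_id: str
    magic: bytes              # internal signature used for identification
    validator: Optional[str]  # name of a registered validation service


def toy_pdf_check(data: bytes) -> str:
    """Toy well-formedness test: PDF header plus an %%EOF marker near the end."""
    if data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]:
        return "well-formed"
    return "not well-formed"


REGISTRY: Dict[str, FormatInfo] = {
    "Adobe PDF v1.4": FormatInfo("Adobe PDF v1.4", b"%PDF-1.4", "toy-pdf-validator"),
}
VALIDATION_SERVICES: Dict[str, Callable[[bytes], str]] = {
    "toy-pdf-validator": toy_pdf_check,
}


def verify(data: bytes) -> Dict[str, str]:
    # Steps 1-2: identify the format; the registry supplies its ID and information.
    info = next((f for f in REGISTRY.values() if data.startswith(f.magic)), None)
    if info is None:
        return {"format": "unknown", "result": "not identified"}
    # Steps 3-4: ask the registry which validation service handles this format.
    if info.validator is None:
        return {"format": info.format_id, "result": "no validator registered"}
    service = VALIDATION_SERVICES[info.validator]
    # Steps 5-6: invoke the validation service and return its verdict.
    return {"format": info.format_id, "validator": info.validator, "result": service(data)}
```

For brevity the six steps are collapsed into one `verify` function here; in the slide's scenario the user drives each request separately, whereas folding them behind a single call is the kind of convenience the Web-Service Agent slide attributes to the agent.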