tranScriptorium: computer-aided, crowd-sourced transcription of handwritten text, for repositories?

Rory McNicholl, University of London, [email protected]
Dr Tim Miles-Board, University of London, [email protected]

Session Type (select one)
Panel Presentation

Abstract

Over the past 10 or so years, various cultural heritage organisations across Europe have invested significantly in digitising historical collections of handwritten documents. If well planned, the output of these digitisation projects may end up in a repository or similar, improving access to the document images. Can this improvement in access be enhanced further?

The tranScriptorium project is a European Commission FP7-funded project (2013-2015) that brings together a suite of tools for computer-aided transcription and enhancement of digitised handwritten material. These software tools cover document image analysis (DIA), developed by the National Centre for Scientific Research (Greece); handwritten text recognition (HTR), developed by the Universitat Politecnica de Valencia (Spain); and natural language models (NLM), developed by the Institute of Dutch Lexicology, Universiteit Leiden (Netherlands). Because the project required that these tools be available to other systems, they have been developed to operate as software services. The project also included the development of a desktop application (University of Innsbruck, Austria) and a crowd-sourcing platform (University College London and University of London Computer Centre, UK). Both use the DIA, HTR and NLM outputs to arrive at computer-aided transcription solutions designed to improve the efficiency and reduce the cost of transcribing handwritten documents.
Conference Themes

Select the conference theme(s) your proposal best addresses:
- Supporting Open Scholarship, Open Science, and Cultural Heritage
- Managing Research (and Open) Data
- Integrating with External Systems
- Re-using Repository Content
- Exploring Metrics and Assessment
- Managing Rights
- Developing and Training Staff
- Building the Perfect Repository

Keywords

Handwritten text recognition, transcription, crowd-sourcing, cultural heritage

Audience

Librarians, archivists, repository managers, historians, digital humanists, philologists and linguisticians.

Background

“Looking back” aligns with the target material of this technology: digitised handwritten manuscripts, papers, letters and similar documents. Transcription of cultural heritage material enhances discovery and enables new avenues of research. “Looking forward”, the technologies developed by the project partners are cutting edge and have the potential to significantly enhance the discovery, reuse and interoperability of digitised historical (and other) handwritten texts held in repositories.

Presentation content

Some context: a huge cultural heritage resource remains “hidden” even after digitisation, because automatic transcription is often not possible and manual transcription is too costly. Many repositories contain significant amounts of digitised historical documents with either no transcription or only patchy transcription.

- Outline of the tranScriptorium project, the partners involved and some of their previous projects.
- The individual technologies involved in the project and how they are combined to form a transcription workflow.
- Demonstration of the tranScriptorium platform(s).
- Document management for the transcription platforms and how repository platforms may play a part (providing the source materials, managing review of crowd-sourced transcription, etc.).

A precursor to tranScriptorium, the Transcribe Bentham project, involves no automation yet achieves a rate of 100 submitted transcripts per week from volunteers.
The combination of automation with a manual crowd-sourcing element can make transcription of large collections an affordable reality.

What can we do with transcriptions? Enhanced discoverability (via indexed handwritten documents), searching within documents, TEI (Text Encoding Initiative) encoding, readability and accessibility.

Conclusion

Looking back: there has been much effort to digitise, describe, store and publish historical written material, and repositories of various kinds have played an important role in this effort. Looking further back: there is still a vast amount of human knowledge inside such documents that remains hidden, to some degree, from recent communication revolutions.

Looking forwards: although repositories have already played a role in safeguarding historical documents and in enhancing their description and cataloguing, the next step for such repositories is to interact with new tools that have the potential to unlock the whole document. They can do so both by providing a platform from which transcription tools can access resources and by capturing and disseminating the enhancements that such tools provide.