Rochester Institute of Technology
RIT Scholar Works
Theses, Thesis/Dissertation Collections
5-1-1998

Graphic arts requirements for electronic image management systems for the library and corporate information center
Paul Butterfield

Recommended Citation
Butterfield, Paul, "Graphic arts requirements for electronic image management systems for the library and corporate information center" (1998). Thesis. Rochester Institute of Technology.

Graphic Arts Requirements for Electronic Image Management Systems for the Library and Corporate Information Center

by Paul M. Butterfield

A thesis project submitted in partial fulfillment of the requirements for the degree of Master of Science in the School of Printing Management and Sciences in the College of Imaging Arts and Sciences of the Rochester Institute of Technology

May 1998

Thesis advisor: Professor Frank Cost

School of Printing Management and Sciences
Rochester Institute of Technology
Rochester, New York

Certificate of Approval

Master's Thesis

This is to certify that the Master's Thesis of Paul Marcius Butterfield with a major in Graphic Arts Publishing has been approved by the Thesis Committee as satisfactory for the thesis requirement for the Master of Science degree at the convocation of May 1998.

Thesis Advisor: Frank Cost
Graduate Program Coordinator: Marie Freckleton
Director or Designate: Brian Bartlett

I, Paul Butterfield, give Wallace Memorial Library of the Rochester Institute of Technology permission to reproduce in part, or in full, all parts of the submitted thesis project.

Table of Contents

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1. Introduction
   Legacy documents - Scope of research
   Legacy document content - What attributes need to be captured?
   Legacy document format - Form follows function
   Legacy document transformation - What tools are available? What is the work process?
2. Review of the Literature
   Scanners
   Computer Platform and Software
   Output Devices
   Endnotes
3. Project Goal
   Define requirements for an electronic image management system:
   Library/corporate information center market
   Graphic Arts quality focus
   Legacy document capture
   Process documents
   Republish in required form
4. Methodology
   Quality Function Deployment
   QFD as a requirement gathering tool
   Market definition
   Interview process
   Affinity grouping
   Customer requirements
   Technical response
   House of Quality
   Endnotes
5. Results
   QFD Requirements
   QFD Technical Responses
   QFD House of Quality
   Detailed discussion of requirements
6. Summary and Conclusions
Bibliography
Appendix A Interview Transcripts

LIST OF FIGURES

1. Typical Electronic Image Management System
2. QFD House of Quality

LIST OF TABLES

1. QFD Requirement Categories and Descriptions
2. QFD Technical Responses and Descriptions
3. QFD House of Quality (Part 1 of 2)
4. QFD House of Quality (Part 2 of 2)

ABSTRACT

The value of important documents that exist only in printed form, legacy documents, is increased when those documents are transformed into digital form by an Electronic Image Management (EIM) System. The image quality observed from many of these systems is much poorer than that which is typical in the Graphic Arts field. This research sought to understand whether the poor quality was due to the past constraints of slow computing power, high storage costs, and narrow network bandwidths, or to fundamental requirements.

Users of EIM systems in Libraries and Corporate Information Centers were interviewed to assess their requirements for quality relative to competing requirements for cost, turnaround time and speed. The tool, Quality Function Deployment (QFD), was used to gather and process user requirements. QFD was chosen because of its methodical structure for the interview process and subsequent analysis. The resulting requirements were organized into a QFD "House of Quality", arraying customer requirements against technical responses.

Subsequent analysis of the House of Quality and transcripts of the customer interviews suggests that requirements for high speed and low cost predominate over Graphic Arts quality for most users. The focus on speed and cost was most obvious for those applying EIM to commercial purposes in Corporate Information Centers. While they have a shared interest in speed and cost, Library users had a specialty application of EIM: preservation and conservation. In this application, EIM is used to preserve and save printed documents that are deteriorating. For this specialty application, quality is paramount. While cost and speed are still important, quality cannot be sacrificed for cost or speed.

CHAPTER 1
INTRODUCTION

The way we share documents and information is rapidly changing. The pervasive growth of the Internet, including the World Wide Web, is revolutionizing communication. The increasing availability of device-independent formats like Adobe PostScript and Portable Document Format (PDF) is making documents more portable and more readily accessible. Changes in the way images are printed, including direct digital printing and color, are making it possible to print small quantities of customized information at the point of need.

The weakest link in this new information-sharing paradigm is the vast quantity of information that exists only in printed form. When documents are only available as "hardcopy", they are not viable in the context of our electronic information infrastructure. This research will explore the options for transforming legacy documents into electronic form so that we can more fully realize their benefits in the information age.

Electronic Image Management (EIM) systems are the means by which legacy documents become accessible in electronic form and can be reused. These systems are composed of three fundamental elements: scanner, computer, and output device. Figure 1 diagrams such a system:

Figure 1. Typical Electronic Image Management System (diagram elements: Scanner, Computer, Printing system, CD-ROM writer)

Note from the example one of the advantages of an EIM system: the document can be transformed into a different form, in this case, to CD-ROM.
Legacy documents - Scope of research

The term "document" as applied to printed material can be used to describe a staggering variety of things. Printed information of any sort (book, magazine, manual, brochure, flyer, memo, letter, etc.) might be considered a document. The scope of this research will be limited to electronic capture of legacy documents in libraries and corporate information centers. This research will determine, for this venue, what kinds of documents would have added value if they were available in electronic form. This research will explore system requirements and workflow to capture these documents electronically.

Legacy document content - What attributes need to be captured?

Before choosing a method for transforming a legacy document into electronic form, it is important to consider which attributes of the document need to be captured. There are many attributes of any legacy document. The most basic level is the raw information content of the document. In most documents, this is conveyed largely by the text. However, there is much more to a document than just its text. The physical page layout, font, and pictorial content are also important. Some attributes are more esoteric. In some cases, the document's feel, weight, condition, color, and the paper on which it was printed may be important. Even its beauty may be important. This research will examine the feasibility of capturing these attributes. However, quality, cost, and speed will be important factors in making decisions about which document attributes must be captured.

Legacy document format - Form follows function

Decisions about the attributes that must be captured will be important factors in determining the electronic format in which the document is captured. The new uses to which the electronic form of the document will be put are also fundamental to decisions about format. Is it important that users be able to search the document electronically? Will subsequent transformations of the document be necessary? Are there limitations in the types of languages, fonts, formats, or media readable by the target audience? Are there constraints of information bandwidth that must be considered? This research will examine each of these questions in detail.

Legacy document transformation - What tools are available? What is the work process?

Once decisions about attributes and formats have been made, tools must be selected and processes defined for capturing documents. There are many types of scanners available. They differ in their quality and in their ability to capture the absolute color characteristics of a document. Scanners vary in their ability to handle the sizes, forms, and finishing of legacy documents. There are software packages of different types: for capturing the image of the page, for capturing the image elements, for transforming the image of text shapes into an editable form, for ensuring that formatting is captured, and for manual or automated fixing of errors of transformation. There are new systems that combine scanners and software packages to simplify the process of document capture.

To effect transformation of legacy documents into electronic documents, a work process must be identified. This research will define work processes to achieve different forms of electronic documents, and will describe how these tools can be combined and used.
CHAPTER 2
Review of the Literature

As this research seeks to define the customer requirements for a system to handle legacy documents, prudence suggests a review of research on the subject. Because this work also seeks to define real technical responses to the customer requirements, the availability and capability of system components must also be researched and understood. The literature review which follows is organized by the component elements of a legacy document system: Scanner, Computing Platform/Software, and Output Devices.

Scanner

Scanning requirements of Electronic Image Management (EIM) systems are most fully collected by the Association for Information and Image Management (AIIM). The Association has a variety of publications for implementors of EIM systems. Previously gathered requirements for document capture in publications by AIIM and others will help to establish the baseline requirements. Although this work will attempt to define the future requirements for EIM, information available on current systems may improve the quality of the work.

An overview of scanning for EIM is written in a general guide on EIM systems.[1] In the EIM market, 200 spots per inch (spi) is the typical resolution used today for capturing legacy information. Other common resolutions cited for EIM work are: 100 spi for "marginal" information quality, 200 spi for "satisfactory" quality, and 400 spi for "excellent" quality, with "good" quality falling between the last two. Also noted is that the disadvantages of increasing resolution include slower scanning speed and greater storage requirements. Storage requirements increase with the square of the resolution, though compression schemes usually minimize this effect.

The advantages and disadvantages of automation of scanning over manual data entry are also noted.[2] The general finding is that scanning systems designed to capture information may not be able, with current technologies, to replace manual data entry if human recognition of document type is needed.
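The square-law relationship between resolution and storage noted above can be illustrated with a short calculation. This is only a sketch under stated assumptions: the 8.5 x 11 inch page size, the function name, and the choice of an uncompressed one-bit raster are illustrative and do not come from the cited sources.

```python
def uncompressed_bytes(width_in, height_in, spi, bits_per_pixel=1):
    """Uncompressed raster size in bytes for a page scanned at `spi` spots per inch.

    Assumes square spots and no compression (both are simplifying assumptions).
    """
    pixels = round(width_in * spi) * round(height_in * spi)
    return pixels * bits_per_pixel / 8


# Hypothetical letter-size page at the resolutions cited for EIM work.
for spi in (100, 200, 400):
    size = uncompressed_bytes(8.5, 11, spi)
    print(f"{spi} spi: {size / 1e6:.2f} MB uncompressed (1 bit per pixel)")
```

Each doubling of resolution quadruples the uncompressed size, which is why compression matters more as resolution rises.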
More detailed information has been written on the specific requirements for EIM capture, with attention to scanner illumination, reflection, focus, filtering, digitization, gray scale conversion, etc.[3]

Illumination must be of sufficient intensity to convey the darkest image detail to the scanner's sensors. As most EIM scanners use Charged Coupled Device (CCD) arrays, a high level of illumination is required. Illumination must be held constant. Compensation for, or minimization of, lamp intensity changes with aging must be done. Uniformity of illumination across the platen must be held. Illumination angle and direction, along with depth of field, may also be important when scanning an object that is not perfectly flat, for example, "paste-ups" or bound books. Illumination angle may affect the appearance of shadows or produce unpredictable results. It can also cause unwanted reflections from shiny inks, pencil, or from documents printed on coated stocks.

Filtering may also be important. In monochrome applications, if the originals have color content, a panchromatic response will require attention to the filters used, allowing the scanner to more faithfully emulate the properties of the human eye. Filtering may also be important for providing suppression, or "dropout", of specific colors that are not to be scanned. Proper filtering may permit minimizing metameric color errors when color is captured.

Digitization becomes important because CCD scanners work on the principle of capture of analog intensity information, followed by analog-to-digital (A-D) conversion. The analog design of a scanner has its effect on the range of intensity information that may be captured. The "bit depth" of a scanner governs the number of levels of lightness that may be captured. For example, a scanner with an 8-bit A-D converter can capture 256 levels of intensity. A scanner with a 10-bit A-D can capture 1024 levels. Both analog design and "bit depth" must be matched to the originals to be scanned.
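The relationship between A-D converter bit depth and the number of intensity levels is simply a power of two. The following is a minimal sketch; the function name is illustrative only.

```python
def intensity_levels(bits):
    """Number of distinct intensity levels an A-D converter of `bits` bits can report."""
    return 2 ** bits


# Bit depths mentioned in this review: binary, 8-bit, 10-bit and 12-bit capture.
for bits in (1, 8, 10, 12):
    print(f"{bits:2d}-bit A-D: {intensity_levels(bits)} levels")
```

The 8-bit and 10-bit cases reproduce the 256 and 1024 levels quoted in the text.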
"Gray scale conversion" is the term EIM references use to describe the process of transforming an image from a large bit depth to a small one. Although it is possible to save an image in its "gray scale" form, it is not common in EIM systems. Currently popular EIM systems usually transform images from high bit depth to a smaller bit depth. Usually this is a bit depth of one, or a binary image. If, and when, this is done, seven-eighths or more of the image information is discarded. The high bit depth image information is not stored, even temporarily, but is converted in a real time process. The process used in this transformation might be thresholding, adaptive thresholding, dithering, halftoning, stochastic screening, or some combination of these or other methods.[4] Most of these processes are well known in graphic arts, and need not be treated in depth here.

There is little information in the literature about saving high-bit-depth "gray" information. It is dismissed as unnecessarily costly, both in terms of storage costs and communication time. Whether these assumptions are still valid in the face of technology improvements in storage and communications will be tested by this research.
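To make the gray scale conversion step concrete, the sketch below applies the simplest of the listed methods, fixed thresholding, to a tiny 8-bit image and computes the fraction of stored bits that is discarded. The cutoff of 128, the sample values, and the function names are assumptions for illustration; real EIM scanners use proprietary and usually adaptive methods.

```python
def threshold(gray_rows, cutoff=128):
    """Convert 8-bit gray values (0-255) to binary: 1 for light, 0 for dark.

    A fixed cutoff is assumed here; adaptive thresholding would vary it locally.
    """
    return [[1 if value >= cutoff else 0 for value in row] for row in gray_rows]


def fraction_discarded(source_bits, target_bits):
    """Fraction of stored bits per pixel lost when reducing bit depth."""
    return 1 - target_bits / source_bits


sample = [[12, 200, 130],
          [127, 128, 255]]  # hypothetical 2 x 3 gray scale patch

print(threshold(sample))
print(fraction_discarded(8, 1))
```

Reducing 8 bits per pixel to 1 discards seven of every eight bits, which matches the "seven-eighths" figure in the text; deeper scans (10 or 12 bits) lose proportionally more.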
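Currently popular scanners for EIM are predominantly black and white and binary. Bell & Howell is the leader[5] for EIM scanners, and their product line is exclusively binary. Principal competitor Eastman Kodak has similar scanners with full monochrome and color capability.[7]

A typical scanner for the EIM market is the Bell & Howell Copiscan 8080S. This scanner is capable of bit depths up to eight, though its principal output is binary. It has built-in hardware for real-time gray scale conversion, with proprietary image processing for contrast enhancement and small angle skew correction. It allows user selection of a single color dropout, either red, green or blue. It captures 400 spi in monochrome.[6]

This is dramatically different from the Graphic Arts market, where manufacturers like Agfa and Microtek advertise scanners with bit depths of 10 and 12 bits per channel.[8] A resolution of 400 spi, a common high quality resolution for the EIM market, is off the low end of the scale for graphic arts scanners. For example, a typical graphic arts scanner is the Agfa DuoScan. This scanner has red, green and blue channels, each of which has a bit depth of 12 bits per pixel. It has an optical resolution of 1000 x 2000 spi. It has no advertised real time image processing, as this is typically done via powerful software packages like Adobe PhotoShop[9] in the graphic arts process.

A comparison of the collected image information for the two scanners shows the difference. At its best resolution, the Bell & Howell EIM scanner captures 0.16 megabytes of image information per square inch. The Agfa Graphic Arts scanner captures 18 megabytes of color image information per square inch. Because the Graphic Arts scanner is capturing 100 times the amount of information, it should be capable of much higher quality than the EIM scanner.

Another element of comparison between EIM scanners and Graphic Arts scanners is scanning speed and productivity. The Bell & Howell scanner referenced above is capable of scanning 80 pages per minute (ppm) in portrait mode, and 125 ppm in landscape mode, independent of resolution. An integral document handler is described in terms of its reliability, throughput and durability. The Agfa scanner lists its scanning speed as 10 milliseconds per scan line (ms/line) for monochrome and 13 ms/line for color. If we make the most favorable selections of a monochrome image and a landscape scan of 8.5 inches, at an optical resolution of 1000 spi, it translates to a rate of 0.7 ppm. This is an enormous difference in speed; the EIM scanner is more than 100 times faster than the Graphic Arts scanner.

No further information was uncovered about the balance between quality and speed for EIM systems. Both seem to be emphasized in the scanner promotional literature. This tradeoff will be explored further as part of this research.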
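The 0.7 ppm figure for the graphic arts scanner follows from the quoted line rate. The sketch below reproduces the arithmetic; it assumes the scan time is purely lines multiplied by time per line, ignoring carriage return, preview and page handling, and the function name is illustrative.

```python
def pages_per_minute(ms_per_line, scan_length_in, spi):
    """Pages per minute for a line-by-line scanner, counting scan time only."""
    lines = scan_length_in * spi               # scan lines along the page
    seconds_per_page = lines * ms_per_line / 1000
    return 60 / seconds_per_page


# Most favorable case from the text: 10 ms/line, 8.5 inch scan, 1000 spi.
flatbed_ppm = pages_per_minute(10, 8.5, 1000)
print(f"Graphic arts scanner: {flatbed_ppm:.2f} ppm")

# Compare with the EIM scanner's quoted portrait and landscape rates.
for eim_ppm in (80, 125):
    print(f"{eim_ppm} ppm EIM scanner is about {eim_ppm / flatbed_ppm:.0f} times faster")
```

At 8,500 lines and 10 ms per line a page takes 85 seconds, so even the slower portrait rate of 80 ppm is more than 100 times faster, consistent with the comparison above.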
It has built-in hardware for dramatically different from full line and white and capability.7 monochrome and image processing for predominantly black similar at an optical resolution of principal output are scanners and their product Eastman Kodak have competitor scanning for EIM scanners lists its make the most more than speed as favorable 10 milliseconds selections of 1000 spi, it translates to 10 a rate of per scan line (ms/line) for monochrome, 13 ms/line monochrome 0.7 ppm. 100 times faster than the Graphic Arts This is scanner. image, landscape an enormous ms/line scan of for 8.5 inches, difference in speed; the EIM and No further information seem to part of be emphasized was uncovered about in the the balance between quality and speed for EIM systems. Both literature. Further scanner promotional be exploration of this tradeoff will explored as this research. Computing Platform/Software Though the computing is the heart platform of a outcome of system and software requirements. law doubling predicts a is the fundamental selection of a hardware periodic Software of computer upgrades are meets customer requirements classifies software Because software. by users, spent in interface on hence is the of a on intensive is often proprietary, as documents to electronic While interface image processing software section on It include EIM systems. form in Scanners, at an environment where interface, user work will be must chosen which Avedon11 etc. and specialized focus is one of on Applications and scanning the primary requirements accounted for 43% application.12 Littie a typical but references to in terms of additional of the time information TWAIN13 and other scanner compatibility some amount of automated adjustments packages systems is somewhat with limited. 
Current relatively low resolution, in monochrome, for image processing is done in contrast and the systems binary raster real-time as an integral for EIM images.14 part of the skew, and algorithms that promise improvements in software. available on several software packages Popular capability Utility software, speed shows standards are relevant by Optical Character Recognition Information is The literature for scanning in the recognition rates hardware are not germane to this research. As can scanning as a standalone offering, relatively simple, scanning documents scanner. particular Moore's operations are performed. are often noted Application software, competitive advantage. standards are available. Information legacy document system. Software Operating System, low-level scanning software, computing platform, they continuity in for document processing speed, capability, format, imaging conversion of paper found was and as an likely. Scanning software for EIM systems cited More important than the the focus on graphic arts elements of EIM systems, this specialized software where hardware months.10 architecture that will allow the user to maintain into categories, of most references treat Computing hardware has become essentially a commodity. capability every 18-24 life to the hardware gives legacy document system, include Adobe Acrobat for Optical Character Capture15 and Recognition, Xerox TextBridge an element of many Professional.16 The TextBridge including word Text product claims to transform printed processing, spreadsheet, Markup Language Acrobat Capture font fidelity in Image only, and bitmaps the and database applications, are electronic 41 output form formats while which preserving page layout include ASCII, many PostScript, Portable Document Format (PDF) and popular Hyper (HTML). for makes similar claims Image and The format-preserving transformation, sole output format Text. 
Output Devices

Output devices are the final element of a legacy document system. They translate the document from electronic form into a form suitable for end use. Requirements for this portion of a system are not well defined. Because the scanning portion of an electronic document management system is often slower and more prone to quality loss, emphasis is placed on input, not output.

In EIM literature, the term "hardcopy output" is always applied to printing. The specifications of current printer products are readily available via manufacturers' web pages from major manufacturers like Xerox,[17] Xeikon,[18] Hewlett-Packard,[19] Canon,[20] and others. A review of these specifications suggests that the speed, resolution, color capability, and quality of currently available printing systems are at or above the level required to maintain the fidelity of these systems. Resolution is at or above 400 spi for most systems.

In eleven case studies of EIM, no reference was made to printing as a limiting element.[21] This does not imply that printing is unimportant, but the research suggests that printing is rarely the constraining element in an EIM system. This may change with trends toward color and color fidelity, and with increases in quality in the other elements: scanning, processing, and file transfer. If increases in quality or speed are required by customers, the high-speed, direct digital systems like the Xerox 6180 for monochrome or the Xeikon DCP-50 for color can produce 600 spi output with excellent quality.

Alternate media like CD-ROM are now in use. Information on the common formats is available in summary references.[22] These media include CD-ROM (Compact Disk-Read Only Memory), CD-R (Compact Disc-Recordable), WORM (Write Once Read Many), and Rewritable optical disks.

CD-ROM's use a constant linear velocity system for data access and are a widely distributed format for electronic publication. They are a read-only system, and so are produced using a master-making process.

CD-R's are similar in format to CD-ROM, but are writable, usually in a dedicated drive. They allow incremental writing of new information to disk, or a single-batch write process. They are more cost-effective for short runs or individual publications, as they do not require the master-making process.

WORM disks are common in EIM systems and are available in several sizes and formats. Unlike CD-ROM's, they use a constant angular velocity system for data access.

Rewritable formats are optical disks that typically use a magneto optical technology to permit rewriting of information. These are less popular with EIM systems, as the information is usually archival, so the rewrite feature is not necessary.

The up-to-date specifications of recorders and similar devices are available from manufacturers like Ricoh,[23] Sony,[24] and others, but they will not be critical to this study's focus on the graphic arts quality requirements. All alternate media described here are different means of capturing numerical digital information. As all provide error checking and preserve data integrity, there is no image quality aspect to selection of an alternate media format. Within this study, any reference to alternate media will mean CD systems, to accommodate customer needs for commonality with current systems.
Endnotes

1 Avedon, Don M., Introduction to Electronic Imaging, 3d ed., (Silver Spring, Association for Information and Image Management, 1996), 67.
2 Head, Robert, Document Management: The Essentials, (Silver Spring, Association for Information and Image Management, 1997), 3.
3 Black, David B., Document Capture for Document Imaging Systems, (Silver Spring, Association for Information and Image Management, 1996), 12.
4 Stofel, James, C., Graphical and Binary Image Processing and Applications, (Dedham, Artech House, 1981), 289.
5 WROC TV8 Evening News - May 31, 1998.
6 http://www.bhscanners.com/opening.html - April 21, 1998.
7 http://www.kodak.com/daiHome/scanners/scanners.shtml - April 21, 1998.
8 http://www.microtekusa.com/, http://www.agfahome.com - April 21, 1998.
9 http://www.adobe.com - April 21, 1998.
10 "In 1965, Gordon Moore was preparing a speech and made a memorable observation. When he started to graph data about the growth in memory chip performance, he realized there was a striking trend. Each new chip contained roughly twice as much capacity as its predecessor, and each chip was released within 18-24 months of the previous chip." (http://www.intel.com/intel/museum/25ANNTV/hof/moore.htm - April 21, 1998)
11 Avedon, 92.
12 Thornton, May A., Electronic Image Management, Case Studies, (Silver Spring, Association for Information and Image Management, 1993), 41.
13 "Unusual for computer acronyms, TWAIN has no real meaning-it simply stands for Tool Without An Interesting Name. Initially named SAPI-for Scanner Application Programming Interface, TWAIN is the industry standard for scanning and acquiring graphics from software applications. The idea behind TWAIN is to allow any TWAIN-compliant software to talk to any TWAIN-compliant hardware. TWAIN is an API standard for input devices such as scanners, framegrabbers and digital cameras, which provides standard protocol across-the-board and hardware device independence and compatibility between scanners and software. The specification's open industry interface allows developers and applications programmers to support a wide range of devices by writing one standard device driver. The TWAIN specification was developed by the Working Group for TWAIN, a consortium of hardware and software vendors that includes Aldus, Caere, Eastman Kodak, Hewlett-Packard, and Logitech. The specification was released in spring 1992 and lets scanner manufacturers write a single driver that can work with any Windows application that supports the TWAIN standard." (http://www.spco.com/Techsupp/HM/1902.htm - April 21, 1998)
14 Avedon, 15.
15 "Adobe Acrobat Capture Software turns everyday business and 'legacy' documents into accurate, searchable electronic files that look exactly like the printed page"; (http://www.adobe.com/)
16 "TextBridge Pro 98 is a full-featured, highly accurate and easy to use document recognition package" (http://www.xerox.com/scansoft/tbpro98win/ - April 21, 1998)
17 http://www.xerox.com - April 21, 1998.
18 http://www.xeikon.be/ - April 21, 1998.
19 http://www.hp.com/peripherals/main.html - April 21, 1998.
20 http://www.usa.canon.com/corpoffice/printers/index.html - April 21, 1998.
21 May, 6.
22 Avedon, 23.
23 "Ricoh's MP6200 external ATAPI CD-RW drive is the best way to preserve, archive and retrieve data of every type, and its superior design and performance ensure reliable writing and reading of your critical information for years to come." (http://www.ricohcpg.com/ - April 21, 1998)
24 "For many applications in finance, medicine, government, and business, permanent, secure data storage and long archival life are critical factors in choosing a data storage medium." (http://www.ita.sel.sony.com/support/storage/faqs/worm.html - April 21, 1998)
CHAPTER 3
PROJECT GOAL

This research will use the Quality Function Deployment method to better define the requirements for an electronic image management system. The research will be limited to the market segment of libraries and corporate information centers. Though general requirements will be gathered, emphasis will be placed on graphic arts quality. This research will attempt to determine whether the low quality of many current electronic image management systems is due to minimal quality requirements or to a limited capability of popular systems.

Requirements will be allocated to each stage of electronic image management: document capture, document processing, and republication in the required form. A House of Quality will be constructed which will array the customer requirements against a technical response to those requirements. A feasibility analysis will be completed, which will project the ability of image processing technology to meet the high quality requirements expected of customers.

CHAPTER 4
Methodology

This research seeks to analyze the requirements of individuals who wish to capture legacy documents, with the intent of defining a system that will meet their requirements. The method used in this research is Quality Function Deployment (QFD).

Quality Function Deployment

Originating in Japanese industry around 1972,[1] QFD has grown to be used as a tool for more than product development, with applications that include education and strategy.[2] QFD is nothing more than a structured method for gathering the wants and needs of individuals, allowing them to be clearly specified.

The first step in the QFD methodology is identification of the target market. This research will focus on the requirements of libraries and corporate information centers. Personal contacts have been established with the managers of these facilities in the Rochester area. Telephone interviews will be conducted with customers in other parts of the United States.

Most fundamental to the QFD method of gathering requirements is personal customer visits. Practitioners place great emphasis on the richness of insights that can be gained by first-hand communication with customers.[3] The face-to-face communication, the detailed dialog, and the observation of customers in their workplace are real advantages. Open-ended questioning techniques will be used to induce customers to share requirements in their own words and prevent undue influence by the interviewer's preconceptions. In order to capture customer requirements most literally, interviews will be tape recorded and transcribed. This will minimize note taking and permit an open dialog.

After collecting requirements from about ten customers, the resulting interview transcripts will be analyzed for the root customer requirements. Customer verbatim quotes of their needs will be extracted from the transcripts.[4] These quotes will be sorted by a process called affinity grouping, which collects like requirements from many customers together. These can then be summarized and organized into a list of fundamental customer wants and needs.

The next step in the QFD method is the construction of "quality tables" or matrices, the first of which is called the "House of Quality". In this matrix, the list of customer wants and needs is arrayed against a technical response to those wants and needs.[5] Figure 2 below diagrams this:

Figure 2. QFD House of Quality (diagram elements: Technical Correlations, Technical Response, Customer Needs and Benefits, Relationship Matrix, Planning Matrix, Technical Matrix)

QFD as a Requirement Gathering Tool

QFD is useful as a requirement gathering tool because of the structure it imposes on gathering the "Voice of the Customer". The QFD interview process includes a personal interview based on probing, open-ended questions designed to discern fundamental customer requirements. QFD also imposes a useful structure on organizing requirements via "affinity grouping", which will cluster related customer requirements together to allow a more coherent technical response. Lastly, QFD provides a useful means of conveying requirement information in intuitive form, a "House of Quality". As seen in Figure 2, this matrix can display Customer Needs and Benefits, the technical response to those requirements, and the relationship between them. An applications software package, QualiSoft, will be used to facilitate creation of the matrix.

Also apparent in Figure 2 is the Technical Correlations "roof" of the house, which allows the user to depict tradeoffs that will exist between technical responses, e.g. system cost vs. resolution. The Planning Matrix allows selection of the important sales points relative to customer competitive benchmarking. The Technical Matrix allows selection of specification levels relative to engineering competitive benchmarking. While these elements are useful for product development, they will not be used in this research, the limited goals of which are definition of requirements.

The House of Quality created in the above exercise will become the basis for the definition of the Graphic Arts requirements for EIM users, which is the focus of this study. The analysis and conclusions about the requirements of EIM systems will constitute the principal output of this research.

Endnotes

1 Shillito, Larry M., Advanced QFD: linking technology to market and company needs, (New York, John Wiley and Sons, 1994), 1.
2 Cohen, 21.
3 McQuarrie, Edward F., Customer Visits: building a better market focus, (Newbury Park, Sage Publications, 1993), 10.
4 McQuarrie, Edward F., Customer Visits: building a better market focus, (Newbury Park, Sage Publications, 1993), 140.
5 Cohen, Lou, Quality Function Deployment: how to make QFD work for you, (New York, Addison-Wesley, 1995), 11.
16 work for you, (New York, Addison-Wesley, 1995), CHAPTER 5 Results Per the QFD method, interview transcripts found in Appendix A requirements of the As interviewees were interviewees. The individual part of this first As these logical categories. shown in Table 1 below: The logically in further, it became requirements were sorted Types Sizes sizes content content were repeated by sorting" the process called "affinity description of possible to group them into eleven the customer requirements in each category is Archival Cost to EIM Users Speed Types of text content to Types of pictorial or graphical content to be captured Cost requirements of Ease of use and Scan rate and throughput requirements Turnaround time Time Utility Ease to end legacy documents to be captured legacy documents to be captured of of be captured Quality requirements of the EIM process Requirements for longevity Quality user requirements of use and EIM feature users requirements of for job feature EIM users completion requirements of the users of Table 1. QFD Requirement Categories of The Description: Document One requirements. Many of the requirements collection. requirements were then grouped Document types Utility large a eleven categories and a simple Category: Picture into for fundamental process, repeated requirements were removed, so that only a single instance remained, yielding 108 unique requirements. Text gathered were analyzed and EIM-sourced information Descriptions the properties of QFD is that it allows requirements to be arrayed against potential technical responses to those requirements. In this case, the technical responses are essentially the specification attributes of an EIM system. By reviewing was created. are more Because the list of the detailed than those description of customer requirements, a focus of list of twenty technical responses to those requirements this research on quality, the technical responses related to quality requirements of other attributes. 
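The House of Quality relationship matrix described in Chapter 4 can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions: the numeric weights 9/3/1 for High/Medium/Low are a common QFD convention that this thesis does not specify, and the three rows and their ratings are hypothetical sample values, not data taken from Tables 3 and 4.

```python
from collections import defaultdict

# ASSUMPTION: 9/3/1 weights for H/M/L are a common QFD convention,
# not something specified in this research.
WEIGHTS = {"H": 9, "M": 3, "L": 1}

# HYPOTHETICAL excerpt of a relationship matrix:
# customer requirement -> {technical response: rating}.
# Blank cells (no correlation) are simply left out.
relationships = {
    "Capture bound volumes": {
        "Scanner/Camera Type": "H",
        "Scanning Area": "H",
        "Optical resolution": "L",
    },
    "Fonts as small as eight point": {
        "Optical resolution": "H",
        "Sample depth (gray)": "M",
    },
    "No skew": {
        "Doc't Handler Robustness": "M",
        "Image Processing": "H",
    },
}


def response_scores(matrix, weights=WEIGHTS):
    """Sum the weighted ratings in each technical-response column."""
    scores = defaultdict(int)
    for ratings in matrix.values():
        for response, rating in ratings.items():
            scores[response] += weights[rating]
    return dict(scores)


scores = response_scores(relationships)
# Rank responses by total score, breaking ties alphabetically.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))

for response, score in ranked:
    print(f"{response}: {score}")
```

Column totals of this kind are one way to see which technical responses support the most customer requirements; this research itself uses the matrix only for the definition of requirements.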
Table 2 lists the technical responses and provides a brief description of each:

Technical Response:         Description:
Scanner/Camera Type         Imaging device used for EIM capture, e.g. reflection scanner, transmission scanner, CCD camera
Optical resolution          Highest spatial frequency sampled by the scanner, indicator of the ability to capture fine image detail
Sample depth (gray)         Monochrome bit depth, or bits/pixel, indicator of the ability to capture gray information
Sample depth (color)        Color bit depth, indicator of the ability to capture color information
Dynamic range               Range of lightness information over which the scanner can capture information, related to the ability to capture highlight and shadow detail
Calibration/Stability       Scanner capability to achieve a target response and maintain it
Scanner Flare               Scanner control of stray light and unwanted degradation to isolated image detail
Scanning Area               Maximum area that can be imaged
Scanning Speed              Rate of scan
Doc't Handler Speed         Rate of speed of an automatic document handler
Doc't Handler Robustness    Reliability of an automatic document handler, inversely related to failure rate
Metadata parser/editor      Capability to extract keyword information from scans or to allow entry or editing
Image Processing            Capability to algorithmically enhance quality, usually via digital imaging
OCR / ICR capability        Optical Character Recognition of printed text or Intelligent Character Recognition of handwriting
Viewer / editor             Capability to view or edit images for validation or clean up
File conversion utility     Facility for converting between file formats for either input or output
Electronic distribution     Capability to transfer digital information
Searching capability        Tools provided to facilitate finding desired information from an EIM system
Storage format / media      File format and storage medium used to save digital information
Printing / finishing        Capabilities for transforming digital information back into tangible forms

Table 2. QFD Technical Responses and Descriptions

Per the QFD process, the collected requirements and technical responses were arrayed against each other in a House of Quality. QFD allows relationships between requirements and technical responses to be shown in the central portion of the House of Quality, called the Relationship matrix. Correlation values of High, Medium, and Low were provided based on the degree to which a given technical response could support the achievement of a customer requirement. If there was no correlation, no value was provided. The resulting QFD House of Quality is shown in Table 3 and Table 4.

Table 3. QFD House of Quality (Part 1 of 2)
Relationship Matrix Key: H=High M=Medium L=Low Blank=No correlation
[Relationship matrix cells omitted. Customer requirement rows, by category:]

Doc't types: Capture office documents, forms. Capture different forms of microfilm. Capture bound volumes. Capture newspaper clippings. Capture manuscripts. Capture black and white photographs. Capture color photographs. Capture postcards. Capture maps. Capture works of art on paper. Capture fax. Capture engineering documents.
Doc't sizes: Ability to handle 35mm aperture cards. Ability to scan A0 through A4 document sizes.
Text content: Fonts as small as eight point. Reproduce Bodoni italic four point. Reproduce Asian fonts. Reproduction of light pencil. Capture carbon copies. Capture dot matrix. Smeared pencil input. Ability to handle fourth generation copies as input.
Picture content: Capture book illustrations. Capture detail of relief, planographic and intaglio processes. Capture specific hues, brightness and darkness, level of saturation. Capture structures as small as 0.02 mm. Capture gray information on structures as small as 0.04 mm. Capture terrain lines in topographic maps.
Quality: Scans are legible/readable. "Recognizable" quality images. Image quality is as good as a photocopy. Correct lightness and darkness. Quality for good video projection. No bent corners. No moire. No skew. No spots. Proper cropping. Suppression of unwanted background. "Dropout" colors not captured. Representation does not alter the nature of the original content. Capture "highlight color" quality. "Good enough" color representation. Consistency of color to the original. Color quality better than a color photocopier. Reproduce difficult colors. Capture detail observable at normal viewing distances, unaided eye. Capture detail evident on close examination, perhaps with 5X loupe. Capture finest detail which evidences original means of production. Save electronic image which fully represents original content.

Table 4. QFD House of Quality (Part 2 of 2)
Relationship Matrix Key: H=High M=Medium L=Low Blank=No correlation
[Relationship matrix cells omitted. Customer requirement rows, by category:]

Archival: Digital format is permanent, like paper. Electronic form that won't become obsolete. Maintain information for 7-10 years. Maintain information for 20-30 years. Maintain information forever.
Low Cost: Capture large numbers of archived documents inexpensively. Scanning costs at 10 to 20 cents per page. Flexible costs linked to use. Minimal capital costs. Very low global distribution costs.
Utility to EIM User: Ability to easily capture and store hundreds of thousands of images. Capture only as much information as needed to replicate content. Automatic document handling. Auto feed without jamming. Low level of operator intervention. Don't damage fragile originals. Ability to correct and insert documents easily. Ability to do image clean up if needed. Need modular components that can be easily integrated. Accept documents already in electronic form. Ease of entering and adding metadata. Extract metadata from form fields. Deskew. Despeckle. Crop. Drop out unwanted color. Recognize printed text. Recognize handwriting. Ability to print documents larger than 11x17.
Speed: High throughput, high speed. Scan a newspaper clipping in 30 seconds or less. Ability to scan at least 300 engineering drawings per day. Scan 10,000 documents per day.
Turnaround time: Provide 24 hour turnaround. Provide two-to-three day turnaround. Turnaround of 6 months or less. Provide timely information in the form needed by the user.
Utility to end user: "Instant" access (less than 30 sec, 10 sec, 2 sec, eye blink). Global distribution of electronic files. Searchable keywords/metadata. Searchable text. Link searching from different sources. Provide indexing down to the article level for serial literature. Format is digital equivalent of paper, easy to read, permanent. Universally readable electronic form. Output compatible with user's system. File formats that people can easily handle. Output TIFF. Output PDF. File formats that print quickly. Ability to reprint and bind electronic files. Ability to print and assemble a bid package.

The QFD House of Quality provides a useful framework for analysis. Its organized categories of customer requirements and corresponding technical responses allow for an ordered discussion of requirements. While quality requirements were the focus of this study, the quality requirements must be understood in the context of the entire requirement set. The stated needs for each of the eleven categories of customer requirements are considered below:

Document types

EIM users described over a dozen categories of document input types to be captured. Document types included forms, engineering documents, faxes, books, newspapers, manuscripts and photographs. The diversity of these document types requires a wide variety of scanner types. Conversion of microfilm to electronic form requires use of a transmission scanner. Engineering documents require large format scanner capability. Manuscripts and works of art may be fragile and require use of an overhead CCD camera, which does not require direct contact with the object being scanned.

Document sizes

Closely related to document types are document sizes, which also spanned a wide range. At the small end were aperture cards, a single frame of 35 mm microfilm mounted in a Hollerith punched card, and 35 mm color transparencies. The image size for each is 24x36 mm. At the large end is the A0 size, 841x1189 mm, used at one site for engineering drawings. The largest scan volumes were office paper documents and forms, which were typically 8½ x 11 inches.

Text content

As one approach to defining quality requirements, interviewees were asked to define the kinds of image content they needed to capture. The category of text content contains all those image content elements that were text related, and included eight point fonts. Asian fonts, light or smeared pencil, dot matrix printing, carbon copies, and fourth-generation photocopies were cited as especially difficult, as was Bodoni italic four point, which one customer used as an acid test of text scan quality.

Text content is one of the quality related categories where differences in requirements among groups of EIM users became evident. Corporate Information Center users of EIM systems were primarily focused on collecting "informational quality" scans. The text content they cited was relatively simple, and their quality requirement for that text was legibility. While this goal was shared by Library EIM users, some Library users wanted to capture the character and nature of the text. The requirement to capture Bodoni italic four point was one such requirement. The quality goal for this difficult content was not mere legibility, but to capture the "exaggeration between the thickness and the thinness of the strokes" according to Anne Kenney, Associate Director for the Department of Preservation at Cornell University.

Picture content

All image content that was not text related was grouped under the category of picture content. Customers were broad and varied in their descriptions. Some customers offered rather simple descriptions: book illustrations, and terrain lines in topographic maps. Others gave more specific and demanding requirements, citing, for example, the need to capture specific hues, brightness and darkness, and level of saturation. By way of example, Rodney Perry, of Rochester Public Library, cited a desire to capture the yellowed paper and the pale purple ink of an old manuscript. Cornell's Department of Preservation and Conservation wanted to capture the detail of relief, planographic and intaglio processes, preserving the evidence of the structure used to create illustration content. As part of a project for the Library of Congress, they had measured structures as small as 0.2 mm that they wanted to capture.

Quality

All users of EIM systems shared elements of basic quality requirements. Everyone wanted, at a minimum, legible, recognizable images with correct lightness and darkness. A few customers described general quality by comparison to a photocopier. Nearly all said they wanted to avoid quality problems with bent corners, skew, spots, background or cropping. Several users expressed sensitivity that the digital representation not alter the nature of the original content. Reasons included the desire for faithful representation, as well as concerns with altering copyrighted material.

Some users cited application-specific quality needs: Patricia Pitkin, of RIT's Wallace Library, needed quality capable of providing good video projection for use in college classrooms and distance learning applications. Suzanne Keenan, in Xerox's Electronic Document Management group, wanted "highlight color" quality; perfection was not required as the color was for emphasis only.

A far more stringent set of quality requirements came from the application of EIM to preservation. A current Cornell project is to capture the principal book illustration types from the 19th and early 20th centuries.
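The structure sizes listed under picture content in Table 3 (0.02 mm, and 0.04 mm for gray information) can be related to scanner sampling resolution with simple arithmetic. The sketch below is illustrative only. It assumes two samples across the smallest structure, a Nyquist-style rule of thumb that is not taken from the interviews, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4


def required_samples_per_inch(structure_mm, samples_per_structure=2.0):
    """Sampling resolution needed to resolve a structure of a given size.

    ASSUMPTION: samples_per_structure=2.0 is a rule-of-thumb default,
    not a figure stated by any interviewee.
    """
    if structure_mm <= 0:
        raise ValueError("structure size must be positive")
    return samples_per_structure * MM_PER_INCH / structure_mm


for size_mm in (0.02, 0.04):
    spi = required_samples_per_inch(size_mm)
    print(f"{size_mm} mm structure: about {spi:.0f} samples per inch")
```

Under that assumption a 0.02 mm structure calls for roughly 2540 samples per inch, and a 0.04 mm structure for roughly 1270. With only one sample per structure the figures halve.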
Because Cornell wants electronic images which fully represent the original content, their quality requirements can be quite high. Cornell's Anne Kenney defined three levels of quality:

1. Capture detail observable at normal viewing distances with the unaided eye.
2. Capture detail evident on close examination, perhaps with a 5X loupe.
3. Capture finest detail which evidences the original means of production.

At the third level of quality, there is an expectation to capture, in the scan, information that would distinguish whether an illustration was created via Calotype or Aquatint, via etching or steel engraving. Clearly this is a level of quality well beyond the "good enough" level requested by many users.

Integrity

An important attribute to all users of scan systems was that all pages of the input document were captured without omissions and in their proper order. Missing pages are a common problem, and several interviewees described procedures for validating the EIM job's integrity. Users wanted to maintain 100% job quality, but wanted to do so without having to check each item of the job. Those users doing OCR wanted integrity in the form of faithful recognition of text.

Archival

When users spoke about the permanence of information, they cited the amount of time they needed to retain access to the information. The interval varied between 7 years and forever. EIM users worried about both the media and the format of the digital information becoming obsolete.

Cost

Cost was important to everyone. For Libraries, EIM efforts were funded with grants. Cost governed the amount of work that could be done with that fixed amount of money. To Corporate Information Centers, cost translated directly to profits. Costs of commercial scans ranged from six cents to ten dollars per page. Quality levels demanded by customers were cited as the driver for this wide range of cost.

Costs cited for EIM systems were important too. Two users surveyed had scanners that cost $100,000, and costs in the low hundred thousands are common. The capital expense is high, and some users were looking for a more flexible cost structure which could accommodate variable rates of use. Outsourcing was one means of achieving this goal. Some customers, who had been scanning locally, had contracted scanning and were paying on a per hour or per piece basis.

Utility to EIM user

A number of requirements related to features, functionality, and ease of use surfaced during the interview process. All were organized together under the category of utility to the user of the EIM system. At the most basic level, users wanted the ability to capture and store hundreds of thousands of images. Because of the cost of storing and managing information, they wanted to capture only the information that was necessary and sufficient to represent the document content. Users wanted to be able to scan documents automatically and reliably without operator intervention.

Users, especially Library users, wanted systems to be able to handle fragile originals without damaging them. Users wanted to scan documents like old books and maps which were fragile or rare and were easily damaged. An imaging system that accommodated these requirements was desired.

Users wanted to be able to easily correct problems with job integrity or job quality. This implies the ability to add a missing page or edit content for image clean up of spots or recognition errors.

They wanted components of EIM systems to be modular and easily integrated into an existing information infrastructure. Because the systems are expensive, an upgrade must not require replacement of the entire system. Similarly, they wanted to be able to integrate documents already in electronic form into their existing information infrastructure.

They also wanted systems to help them manage metadata, allowing them to extract metadata from key document fields, and to add and edit metadata.

They wanted systems to have utilities for deskew, despeckle, crop, color "drop out" and OCR. Color "drop out" determines which information gets captured in the scan, and is sometimes used to "drop out" the constant information in a form, allowing only the variable data to be saved. This is especially important if OCR is to be used. Similarly, problems with skew, spots and cropping can cause problems for OCR in addition to their obvious appearance problems.

Speed

All users were concerned, to one extent or another, with speed of capture. The speed with which documents could be captured governed how cost effective it was to capture a large volume of documents. One user wanted to be able to capture a newspaper clipping in 30 seconds. Another site was scanning in 300 engineering documents per day. Yet another was scanning in 10,000 office documents or forms per day. Obviously each of these requirements imposes different demands on the architecture of the scanning device.

Very high rates, such as 10,000 per day, require automated document feeding and very high speed scanning. Currently, this means lower quality levels, as high speed scanners are usually limited in resolution, bit depth and color capability. However, as discussed previously, those using high speed scanners were more concerned with "informational quality" scans, so quality limits of current scanners were not an issue.

Turnaround time

Closely related to speed was turnaround time. Fundamentally, EIM users wanted information in electronic form at the speed which was required to meet the demands of their customers. In some cases this meant 24 hours or less, in others, 6 months or less. It varied widely with the application of EIM. For the user scanning medical forms for on-line access, overnight turnaround was essential. For the user scanning old books, quality was the primary requirement, and if the book or a remanufactured replacement was back on the shelf in six months, that was good enough.
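The daily volumes cited under Speed can be converted to sustained scan rates to show how different the three requirements are. This is a rough sketch, assuming a single scanner running an 8-hour working day; that shift length is an illustrative assumption, not a figure reported by any interviewee, and the function names are hypothetical.

```python
def pages_per_minute(pages_per_day, hours_per_day=8.0):
    """Average sustained rate needed to reach a daily volume.

    ASSUMPTION: one scanner and an 8-hour day by default.
    """
    return pages_per_day / (hours_per_day * 60.0)


def items_per_day(seconds_per_item, hours_per_day=8.0):
    """Daily volume reached if every item takes a fixed number of seconds."""
    return hours_per_day * 3600.0 / seconds_per_item


print(f"10,000 documents/day: {pages_per_minute(10_000):.1f} per minute")
print(f"300 drawings/day: {pages_per_minute(300):.3f} per minute")
print(f"one clipping per 30 s: {items_per_day(30):.0f} per day")
```

Averaged this way, the 10,000 per day site needs about 21 documents per minute sustained over the whole shift, while the engineering drawing site needs well under one per minute, which is consistent with the observation that each requirement places different demands on the scanning device.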
Utility to end user

A number of requirements were related to the timeliness and utility of EIM output to the end user. Speed of access was very important. This is not surprising given that one of the advantages of transforming hardcopy to digital is rapid access and display. Users asked for "instant" access to digital information, which they translated to a time of 30 seconds to an eye blink.

Similarly, EIM users wanted to provide for electronic distribution and electronic searching of content. Most users wanted searchable metadata, the keywords that identify jobs. But a few wanted to be able to search the entire text of captured textual documents, which demands that the entire document be translated, via OCR, into a searchable form.

EIM users were also concerned with the format and media used to store their information. Users described the desired form as akin to paper, universally easy to read and permanent. The format had to be compatible with the end-users' systems and able to display and print easily and quickly. The TIFF and PDF formats were cited as current examples, but users were concerned about the long-term utility of these formats.

Some EIM users wanted to maintain the ability to do printing and finishing of previously digitized material. A user specializing in Engineering drawings noted the need for centralized large format printing or binding, especially when large bid packages were being prepared. Another cited the problem that some of their customers have problems printing large image files, so they maintained a local high-speed printer for those that needed it.

CHAPTER 6
Summary and Conclusions

The central goal of this research was to assess the graphic arts quality requirements of EIM users to determine if an unrequited desire for higher quality existed. The question has two answers, one for each of two different classes of EIM users.
The first class of EIM users was more interested in high speed and low costs than higher quality. They had established a business or service based on a historically achievable level of quality, usually "informational". Often they had established a huge archive of images using a standard process. Quality expectations were well established and their customers were satisfied.

The second class of users placed quality first. They were still defining and refining their requirements and the means of capture, processing, and storage to be used. While they had an interest in speed and cost, they were not willing to compromise quality.

Effectively, these two classes of users define two different markets for EIM. The first class of users is a common market served by currently available EIM products. Monochrome scanning systems with 150 page per minute document handlers and 200 dpi maximum resolutions meet the quality requirements of the market: legible text and recognizable picture quality. Corporate Information Centers were always in this market. Library users always had applications served by this market. This is the current EIM market.

The second class of users defines a new market for the technology. The new market for EIM springs from a very different application. The established market is focused on expediting access to basic information archived on paper. The goal in this new market is to use EIM to help preserve and share documents which are fragile, rare, or disintegrating. Achievement of that goal means that the EIM system must capture all the essential elements of that document.

These users want to capture more than just the raw information content of a document. They want to save an electronic image which fully captures the nature of the original. They want faithful color information saved. They want nuances of pictorials captured. They want the original font to be reproduced.
They also want to do this at production speeds without a lot of operator intervention. The users with these goals were exclusively those in libraries concerned with preservation.

To meet these requirements, an EIM system will need technical capabilities which blend those of current EIM systems with those currently available in the graphic arts. Resolutions, bit depths and color capability will need to be close to those currently offered in Graphic Arts scanners if small structures of 0.02 mm are to be captured. Yet scan rates will need to be closer to those of typical EIM scanners to keep scanning costs at a reasonable level.

This research focused on the requirements of the Library and Corporate Information Center market. While these two groups had some requirement elements in common, a niche market for higher quality became evident for library users with a preservation function. It would be interesting to see future research focus more narrowly on the application of EIM to preserving and sharing rare or fragile documents.

Bibliography

Avedon, Don M., Introduction to Electronic Imaging, 3d ed., (Silver Spring, Association for Information and Image Management, 1996).
Black, David B., Document Capture for Document Imaging Systems, (Silver Spring, Association for Information and Image Management, 1996).
Cohen, Lou, Quality Function Deployment: how to make QFD work for you, (New York, Addison-Wesley, 1995).
Eureka, W. E. and Ryan, N. E., The Customer-Driven Company, 2d ed., (Dearborn, American Supplier Institute, 1994).
Head, Robert, Document Management: The Essentials, (Silver Spring, Association for Information and Image Management, 1997).
Kenney, Anne R., Digital to Microfilm Conversion: A Demonstration Project 1994-1996, (Ithaca, Cornell University Library, 1996).
Marsh, S. et al, Facilitating and Training in Quality Function Deployment, (Methuen, GOAL/QPC, 1991).
May, Thornton A., Electronic Image Management Cases, (Silver Spring, Association for Information and Image Management, 1993).
McQuarrie, Edward F., Customer Visits: building a better market focus, (Newbury Park, Sage Publications, 1993).
Shillito, Larry M., Advanced QFD: linking technology to market and company needs, (New York, John Wiley and Sons, 1994).
Stofel, James C, Graphical and Binary Image Processing and Applications, (Dedham, Artech House, 1981).
Turabian, Kate L. et al, A Manual for Writers of Term Papers, Theses, and Dissertations (Chicago Guides to Writing, Editing, and Publishing), (Chicago, University of Chicago Press, 1996).

APPENDIX A
Interview transcripts

The interviews conducted as part of this research were recorded on audio tape. They are transcribed and included here for reference.

Michael Majcher
Manager, Xerox Technical Information Center
Webster, NY
March 11, 1998

Butterfield: My research surrounds defining the requirement for a system that looks roughly like this (shows a block diagram) with some sort of scanner, some computer system for gathering that information, and then some means of storing and for providing output of some sort.
Majcher: We have everything except the CD-ROM writer.
Butterfield: OK. What I wanted to talk to you about was, overall what are your requirements for that sort of system. What things do you look for?
Majcher: Are you focusing specifically on the scanner system or across the board.
Butterfield: Across the board. What are the qualities you are looking for.
Majcher: In terms of scanning, obviously throughput is the primary issue. The faster the better. Some of the nits are automatic skew detection, ability to correct and insert documents easily. Resolution is an issue. Ease of indexing or putting metadata on the document itself is important.
Butterfield: What does metadata mean?
when we scan internal reports Majcher: For example, if you. there is hybrid systems you can use of course, but. material for example, we have a data sheet with all the appropriate metadata for retrieval: author, tide, organization, . . keywords, etc. We can marry. . . so when we scan the . or the accession number and we can proposal link that document, all we really need is the unique identifier number the metadata to do the search and pull it up. In an invention situation, we don't already have that metadata already in existence. We need to be able to say, this is IP number such and such and give input Butterfield: OK. The is search. with . And invention metadata proposals on the fly. describing all is the characteristics of the document that somebody would one of the uses to which you are putting this use to system. Majcher: Yes. Invention proposals, translations, internal reports, some newsletter data, some journal data. Butterfield: Translations are translations of reports from Fuji Xerox? Majcher: Yes they could be the Fuji Xerox or else documents, journal articles, proceedings papers. Butterfield: In terms of . . . You important. What does that said throughput was Majcher: Number of pages, pages per minute that are enabled through the Butterfield: When you think of throughput, it is the throughput of input, mean? scanning system. not the throughput of output that's the bottleneck? Majcher: Yes. Actually, pushing paper through the scanner fast enough is a bottieneck. Butterfield: OK. How do you do that now? Majcher: We have a WD40 which is actually the scanner of the DocuTech. We put the documents through that. Butterfield: Is there a document feeder of some sort. ? . Majcher: Yes. Butterfield: What about the effect on Is that a problem? Majcher: Well obviously in a . the input. With a document feeder, you're not able to process a book for example. 
book or a monograph situation, you typically want to maintain that integrity, so that is a very manual operation on a flatbed scanner one page at a time. Now there's... We don't have the problem but there are situations where you have older material, archival type material, where you can't spread the binding and you have to do some optical adjustments and do a top-down camera capture.
Butterfield: OK. A CCD camera or something?
Majcher: Yes.
Butterfield: OK. What's the level of operator participation, of operator involvement right now? In terms of throughput...
Majcher: We've tried to keep that as low level as possible. Basically they take a pile of documents which has been identified by number and feed them through to validate whether... You know, they'll do a QC on it to validate skewed pages, folded pages, things of that nature... rescan and insert. It is fairly low level. I mean you don't need a computer programmer to do that.
Butterfield: Let me move now to some of your requirements for the quality of this system. What sorts of things do you look for?
Majcher: Readability. Bent pages, corners, skew.
Butterfield: OK. The standard QC stuff.
Majcher: What's the name of the organization, it used to be NMA, National Micrographics Association, but now it's AIIM, American Image and Information Management or something. We use the same criteria that are used or was used for producing microform as a very robust set of standards as to what is the acceptable level of quality in terms of page orientation, density of the image, resolution of the image, archivability, etc. We try to maintain those same quality standards in the electronic image area since there are no real good set standards out there.
Butterfield: So you are using existing standards that are already in place for microform? Microform?
Majcher: Yes, microform is microfilm, microfiche.
Butterfield: OK. Could you point me to those later? Could you point me to where I might find those? What about the kinds of image content. Do you have to handle photographs. Do you have to handle halftones? What do you look for in terms of the quality of those.
Majcher: Readability. Printability. How does it look on the screen. How does it look on the print? Our system right now, we use um... the... uh well on the electronic side, we use PDF image format but that's not scan... I was going to say but that's not going to impact scanning.
Butterfield: Ah. What kinds of things can go wrong for picture input. When it isn't readable, and when it isn't usable.
Majcher: Too dark, too light. Get moiré patterns sometimes, it is pretty obvious to the eye. We also use the original...
Butterfield: What about color? Is that a requirement?
Majcher: Yes, you got to do color.
Butterfield: What are the quality requirements for color?
Majcher: Same standards as black and white. Reproducibility, legibility, consistency of color to the original.
Butterfield: What do you think are the most difficult kinds of images to handle?
Majcher: Well, color is tough, no question. That's probably the hardest. Photographs would be next. The rest is sort of easy.
Butterfield: Do you have to ever scan in kanji fonts or Japanese or Chinese characters?
Majcher: Yes.
Butterfield: Do those impose special requirements on the system for readability?
Majcher: It depends on your resolution. If your resolution is high enough, it is not that much of a problem. Sometimes, trying to scan a fax or a fourth generation xerographic copy is a little bit more difficult. I think the real mitigating factor is the quality of the original.
Butterfield: OK. The nature of this system. What kinds of output do you need from this right now?
Majcher: We had two outputs. All the images are maintained in a central database system and can be retrieved by way of a web browser right to the desktop. So legibility to the desktop is important. The second output is that some people still want paper so we use the electronic image as a print master. And again legibility there is important.
Butterfield: OK. Are any of your customers asking for searchable content right now? The... what you described so far is an image representation.
Majcher: Yes.
Butterfield: Does anybody want searchable text.
Majcher: Yes. We do that on some documents. On some documents, we do OCR but we do not do extensive editing on the OCR because it's a little too time consuming and you know, if you're in the 99% accuracy range and the word "the" doesn't get recognized, who cares. This is a throughput production type issue. If phenol picket fence chemical structure doesn't get recognized once, but gets recognized the second time, it is still searchable so we don't worry about that too much.
Butterfield: What kind of tools do you use to do that.
Majcher: Xerox TextBridge primarily (laughs), mostly Xerox products in OCR are pretty good.
Butterfield: Do you ever need the same output, or the same content accessible in different forms? Do you need both image outputs and searchable outputs at the same time?
Majcher: It depends on the document type, but we try to keep an image of the document in PDF as the least common denominator. For example, a Word document written in Word 3.0 won't necessarily be readable in Word 7.0.
Butterfield: Yes.
Majcher: So rather than go through and reformat all the documents, we maintain an image database in PDF that also takes care of the photos, halftones, image type problems that don't necessarily show up in a flat ASCII database.
Butterfield: OK.
Majcher: So we keep the flat ASCII database or a meta database and then the image itself.
Butterfield: How long do you need to keep this information around?
Majcher: Depends on the document type. Technical internal reports are kept forever. Invention proposals have a seven-to-ten year life span. There's... Almost all corporations have what they call retention schedules. That lists all of the documents within the corporation that are produced and minimum legal requirements for retention. Ours is about three inches thick and depending on the document type, uh some things you keep forever, some things you keep for a maximum of a year and then destroy.
Butterfield: And that holds true of the things that you'd be scanning in on this system.
Majcher: No. We typically don't scan in anything that has less than say seven year requirement. It is not cost effective.
Butterfield: OK.
Majcher: Unless there's other requirements over and above legal retention. Say you have a... it's hard to think right now of a document that you would probably keep less than a year, that you would scan so it would be available electronically in a workflow environment, that's not already created electronically. There's a tradeoff between its inherent format, initial creation format, retention period, value of use and number of people.
Butterfield: I guess that brings us... Two things pop into my mind. One is the kinds of costs, but let me come back to that. The other is do you... this system beyond the present day. How much of your input now, how much of the things that you are now scanning, that are legacy documents, do you expect to add in the years to come?
Majcher: Again, let's take internal reports for example. That is our technical memory. For long term, it is kept on a permanent basis. In the research environment, even in the engineering and development environment when people are tasked for a new subsystem or something, they may go back ten, twenty, thirty years to see what's already been designed. An engineering design doesn't necessarily lose value over time. That's kept over a long term. The reason we use PDF as an image type is that we don't have to worry about version control for various packages and that sort of thing. PDF is, in a sense, the electronic equivalent to paper that can always be recalled.
Butterfield: And that's what you need.
Majcher: Yes.
Butterfield: I guess what I was trying to get at was how much of the inputs that you are scanning are you going to get in electronic form. Are you even going to need to do scanning in five years or ten years? Or are you always going to have paper documents that you are going to have to get into electronic form.
Majcher: That's an interesting question, and it can be debated in a lot of ways. Ah, 99.9% of all the world's information, right now, currently, today, is in a paper format. 1/10th of 1% is electronic. Now, people say there's hundreds of millions or billions of pages on the web. Yes, that's true, but that accounts for less than 1% of the world's knowledge. On the other hand, at least here, in a practical sense, we are seeing a significant decrease, 80% or more, in the number of paper documents actually generated for archival purposes, they're mostly composed in word processing packages, or spreadsheet packages, data collection packages and are available electronically, created electronically, are stored electronically, and we distribute them in an electronic version. That is an explosively growing aspect to the documents we acquire. So on the one hand I'd say there will probably continue to be a scanning requirement in the foreseeable future for a declining percentage of documents created in hardcopy. I come from an old micrographics background, and one of the things you do is you start it today and you decide whether it is worthwhile or cost effective to scan in a retrospective collection, because that's very expensive, if you start today scanning only what goes forward from today on. When we set up our system for internal reports, we started in June of '89. In July of '89, it wasn't a very robust system because we only had a month's data on it. But in March of '98, we've got ten years worth of retrospective data that was captured as part of the ongoing process to deal with it. As far as the retrospective, you can either start at day one and work your way back from that point, or you can capture on the fly as required. The old Bradford-Ziff law which says that your usage declines over time, 80% of your usage is in the first three years and by year ten, you are looking at a probability of less than 1/10th of a percent that a document will ever be used. Well if a document of that time period is requested, we will take it and scan it and put it in the system so that we probably capture most of the documents that would be in demand over that period of time without having that whole cost/time expense of capturing the entire retrospective collection. That you just box up and put over in Records Retention and pull it out when you need it.
Butterfield: Costs. What kind of costs do you associate with this kind of a system?
Majcher: Well, it's a lot cheaper than microfilm. Basically, it's a scanner, labor, any software, it's an up front cost, one time cost. In essence all the storage, hardware and retrieval software are ongoing sunk costs that you are going to use for the retrieval system for the capture portion of it, so it is really smeared on that side. Scanner, some labor, some disk space and that's about it.
Butterfield: So you don't associate any sort of click charge with this sort of system.
Majcher: Well, it depends on whether I'm charging the customer for a service or whether I'm doing it as part of an internal process.
Butterfield: But in terms of your costs, for example, are there any circumstances under which you'd consider paying click charges to a vendor for a system like this?
Majcher: Oh, yes. If it is cheaper than buying the hardware and having the labor done. As a matter of fact, we do outsource some of that.
Butterfield: In rough terms, can you tell me what you'd expect to pay for a system like this?
Majcher: Depends on the volume of the documents, quantity required, depends on whether you need multiple formats or not, depends on whether you have it indexed with metadata or whether you already have that as part of your process to generate this information.
Butterfield: Well, given the requirements that you've described to me, what is it worth, or what have you spent on this equipment.
Majcher: Well, right now we're outsourcing most of our work. When we were doing it internally, we were working a couple of hours a day, based on our volumes, the scanner was a couple grand. The service bureaus charge anywhere from ten to fifteen cents to five dollars per page to have it scanned. We're probably down in the less than a dollar a page area. I can give you specific prices if you want.
Butterfield: When the service bureaus are charging that wide range, fifteen cents to five bucks, are there associated quality levels? What accounts for the spread?
Majcher: Greed, whatever the market will bear, the documents themselves, is there a lot of prep work involved with it. Do you have to remove staples, there is color so you have to adjust image density, things of that nature.
Butterfield: I think that's gotten most of my questions... Let me look here. Print finishing requirements. You said that sometimes customers need hardcopy.
Majcher: We use it as a print master. Yes.
Butterfield: Do they need bound documents...?
Majcher: Sometimes. Sometimes.
Butterfield: What kinds of bindings.
Majcher: Spiral, perfect, you know, depends on the customer. We're finding now, as long as the stuff is available on the desktop, a lot of people are printing selected pages at their desktop printers. When they do a formal full document, depending on how much the budget center is willing to pay, or how much they are willing to pay, they can get everything from a large staple in the corner, to spiral binding, to perfect binding, to three-ring binding, and those all have associated costs to them.
Butterfield: OK. What level of integration do you expect from a system like this? Can it be modular, an integrated system.
Majcher: No. I prefer the modules. What I need is a scanning operation that can acquire and feed files to our system. I'm not going to change my retrieval, I'm not going to change my database environment, ongoing requirement to accommodate a scanning module. All I need out of a scanning module is the throughput and the quality and the ability to take those files, and either in batch mode or individually, into the existing system.
Butterfield: The scanning system would have to be compatible with your existing database and setup.
Majcher: Yes, but that's pretty much a given. If you are doing a product design, you know, you are not going to design an entire database, storage, retrieval, dissemination system around one component, a scanning component for example. If you try doing that, you'll end up like Wang did and not sell a whole hell of a lot of them. The components really need to be modular and they need to be standardized so they can be integrated fairly quickly and easily.
Butterfield: OK. Ah. I think that gets most of my questions. Some other things. I wondered if I could see... Would you mind showing me what you are doing today and show me the hardware that you've got. I'm hoping you can maybe point me to a couple of your folks and ask them some of these questions. If I could talk to some other folks.
Majcher: OK. Who do you want?
Butterfield: Somebody geographically nearby, other companies maybe, Syracuse, Buffalo. I may be able to do phone interviews as well.
Majcher: Here's a list of my counterparts in this corporation. I can't give you that document. Put a check next to the people you want to talk to and I'll give them a call and make the introduction.
Butterfield: OK, I understand. I'll go through that. The other thing I wanted to ask. When I eventually synthesize all of this information, and that from others, can I come back and just test to make sure I got it right?
Majcher: Sure, sure.
Butterfield: Thank you.
[End of interview]

Frank Belli
Xerox Technical Information Center
Webster, NY
March 11, 1998

Butterfield: I'm speaking with Frank Belli. Frank, what is your title?
Belli: I basically work for the Technical Information center. I prepare information to... and make available on line.
Butterfield: I think I've seen some of that information on line.
Belli: I use the database manager Basis system of IDI of Columbus, Ohio, which used to be Battelle.
Butterfield: Oh, Battelle Research Institute?
Belli: Basis is a relational database, like Oracle. The reason I selected Basis over Oracle is, at that time, Oracle did not have the ability to search for text longer than x number of characters. That was a big limitation for us at that time. Now, maybe it would be Oracle.
Butterfield: OK. So searchable text is a big requirement?
Belli: Yes. Some of the reports are pretty long. If it was [unintelligible] if you have three or four pages of report, each page can be a minimum of two to three thousand bytes.
Butterfield: This system that I showed you a rough picture of, scanning and capture of documents. What are the most important requirements in your mind for that system?
Belli: Most of the documents are already in electronic form so we don't have to scan. Uh. The documents are in electronic form, 80-90%. The ones that would be hardcopy... So those are the ones that right now we are scanning. And uh, we want to make those documents available electronically so people can see them without having to... Now scanning of course of images are hard to do. So those are the problems. What we have done is convert to PDF so that the format... be printed... sent.
Butterfield: Why is PDF better?
Belli: There is lot
of printers that can print PDF. TIFF images tend to require special software on the printer.
Butterfield: Do PDF files take up less space, are they smaller?
Belli: No, not necessarily. TIFF files are compressed, but we convert to PostScript and PDF so people can print them.
Butterfield: The quality requirements of scanning systems. What is involved there?
Belli: Quality is as good as making a copy of the original. So if you have an original, nothing special, eight and a half by 11, quality is, you know, good.
Butterfield: So the quality of the system you have right now, it is as good as a good copy.
Belli: It is as good as a good copy.
Butterfield: That is good enough? Does it need to be better? Would you like it to be better.
Belli: Simple text, simple drawing, something like that, it's OK. Problems come with pictures. Some are in color. Then the quality is not good. Now you can ask, why don't you save color. The question is what do you do with it. The image is big.
Butterfield: Why is it bad if the image is big. Why is that a problem.
Belli: [Shaking his head, doesn't understand]
Butterfield: I know some of these questions sound dumb, but I want to make sure I understand. Is it that it takes too much time to transfer the files over the net?
Belli: Yeah. Right now it takes too much time. We used to do our own scanning, but now we send it out. That's it there. It's about five months work.
Butterfield: We're looking at a box.
Belli: That's about five months work.
Butterfield: Five months work is fitting in a box that's about one foot by one foot, by two feet, full of paper. What will it cost to get this scanned?
Belli: Because I don't have too many, it is cheaper to send it out than to maintain the system now.
Butterfield: So how much will you spend to have those scanned?
Belli: That depends on the amount of money, time, but you are talking about dimes per page. Ten to twenty cents per page, something like that.
Butterfield: And that is black and white only.
Belli: I used to be charged so much per page. Now I'm being charged so much per hour. Because I don't have that many documents. I don't have enough to make it worthwhile. My vendor does a good job, they do good quality.
Butterfield: What does it mean to have good quality?
Belli: It means to make sure that all the images are there. No pages like distorted, there's no pages missing. Because that becomes the document.
Butterfield: When you say distorted, you move your hand like that, you mean skewed.
Belli: Yes, instead of coming like this [demonstrates straight with hands], it comes like that [turns document]. Because of the value of the document, you don't mind spending a little bit more to get it right.
Butterfield: What about the font size? The size of the character? How small can it be?
Belli: Eight point. With resolution, we don't have any problem. I think we are 300. That is... It has been so long I forget. The other problem we have is colored region, [unintelligible]
Butterfield: What is the nature of the output of this system? Always electronic?
Belli: It serves two purposes, electronic or copy material.
Butterfield: You're providing that electronic material via the internet on a web page?
Belli: Yeah right. We have [unintelligible] a document search, on a server. At the [unintelligible]
Butterfield: The scans that you are getting back, they are images, not text? I can't search for text.
Belli: The only way would be to OCR but that is not necessary. There are enough keywords for people, on line to provide the keywords.
Butterfield: So the keywords, which are searchable, let people find the document, and once they've found the document, looking at the image is good enough.
Belli: It is good enough. [Nods.]
Butterfield: OK.
Belli: Customer now sends documents electronically. People can scan the documents electronically.
Butterfield: You used to do scanning in house.
Belli: [unintelligible]
Butterfield: Since you used to do that, could I ask you some questions about how automated would you want the scanning process to be? How much operator intervention do you want in that process?
Belli: Uh. If you answered that question two or three years ago, I would have answered what I wanted. Now, as I say, [unintelligible]
Butterfield: You provide this output electronically, but if the customer calls and says, I want a print, do you provide hardcopy?
Belli: Now the reports that are scanned, generally we provide the hardcopy because our printer can make faster copy for them. We have special board in it. Some people cannot get this. A 400-page report can take a long time.
Butterfield: Roughly, what kind of turn around time do you need for documents you scan in.
Belli: For scan?
Butterfield: You.
Belli: Normally, we send out and we get in two-to-three days. Provide once a copy, it is scattered. I mean, I really, you want to talk... [unintelligible].
Butterfield: Now right now, you are collecting this stuff for a month...
Belli: Four months... four months.
Butterfield: Right, four months.
Belli: So right now, it doesn't make that much difference.
Butterfield: I think that answers most of my questions. What I am doing with this information, I am talking to many people, I talked to Mike, I talked to people at Kodak.
Belli: Have you talked to people at Records Center?
Butterfield: No is there someone I should talk to there? Do you have a name?
Belli: Yes. They do quite a bit of scanning. The Contract House they use, the same one that I am using, now. Might want to talk there, because they use... they used to use the... somewhere in the Midwest. IBSI (Scanning Service used by Xerox TIC), Todd Baitsholts, 274-9125.
Butterfield: Thanks. Anyway, I am putting all this together, into a matrix... into a QFD House of Quality... I'd like to come back if I could when I put all this information together. Did I get it right, etc?
Belli: Sure. You want to see our scanning station now?
Butterfield: Yes, great!
[End of Interview]

Carl Herrgesell
Manager of System Development
XSERV/VMS/Electronic Document Management
Webster, NY
April 13, 1998

Butterfield: Are you using a system like this example?
Herrgesell: We are using a number of systems like this [the example]. Our business is divided into two halves, Engineering documents and Office documents and I can give you better definitions of those. And we're attempting to use different platforms and technologies in those two areas. There is not a lot of merger across them. Historically, two groups merged January a year ago. There has not been a lot of platform merger.
Butterfield: The image quality of these systems. How do you evaluate the quality of these systems.
Herrgesell: On the Engineering side, which is the side I can speak to best, we are scanning to black and white images only, no gray scale and color. Originally we were scanning hardcopy documents or aperture cards at 400 dpi. We are finding that the quality of information, however, is good enough at 200 dpi which is 1/4 the storage, I'm not sure it's a square function or not, but close to it. The reason we were scanning at 400 dpi was to make sure we could capture readable kanji characters, and it's my understanding that has not been a problem. So there's no color. There's no color dimension, and even gray scale is actually avoided. The scanner, some of the scanners we use will scan gray scale and then thresholding determines where black starts.
Butterfield: So you're only capturing binary images. Desire to capture color? Is there any need to?
Herrgesell: Not on the Engineering side. On the office side, I'm not sure and I would refer you to one of the people in my group to talk about that. I'm not even sure what resolution we're scanning office documents at. Probably, let's see, are you familiar with Kofax Ascent?
That is the equipment that has been used at times on the office side. They could be at 600 dpi which is fairly standard, but I'm not sure.
Butterfield: But as far as you're concerned, 200 is good enough even for kanji...
Herrgesell: Yeah. On the Engineering side. Which reduces our storage requirements. The database now, I'm trying to remember how many images we have. This might be a document count, not an image count, but about 810,000. I can check on whether that is documents or images.
Butterfield: 810,000?
Herrgesell: 810,000 single sheets or documents that we have in a live database.
Butterfield: I should tell you right now, if anything you plan to tell me is proprietary, I don't want to hear it because I need to be able to publish this outside [Xerox].
Herrgesell: I've done this in demos. In benchmarking, it looks like Xerox is ahead of our benchmarks. It is good information. Basically, all our active build-authorized engineering drawings are in this database, and they are distributed globally.
Butterfield: What kinds of things are quality problems for you now?
Herrgesell: We have a historical quality problem of not being able to scan old aperture cards well. Even with multiple settings on the scanner, changing the threshold, despeckling, whatever the scanner offers. Often, we end up with a residue of images, parts of which are unreadable. I don't know what the percentage is currently. We do, in our metadata about an image, have a flag that we set that says, somewhere in this image is a bad image, yes or no. So it is important for us that we do that. Now because our historical database, we scanned in the history that we need to put in, we have taken care of the... decreasing quality problems, and currently, we are getting most of our information directly from CAD via HPGL, a neutral format, you are going to have quality that is excellent. Quality couldn't be better. Quality problems we have tend to revolve around process issues. Particular field-authorized engineer or configuration person fencing the right area of the drawing before sending it over to us. It is more procedural than anything to do with image quality.
Butterfield: Could you describe for the record what an aperture card is?
Herrgesell: An aperture card is a microfilm version of a drawing.
Butterfield: So it is just another type of microform.
Herrgesell: Right. It is a Hollerith card, usually with punch data on it that identifies the drawing with some of its metadata, and then there is a window that has a piece of film. We will put on aperture cards drawings of up to E-size, actually A0 in the ISO standard and possibly beyond that. The quality is hopefully good. Nonetheless, we are getting entirely away from aperture cards. We are investigating what the best method of long-term storage will be.
Butterfield: You mentioned the drawing sizes that you have on aperture cards. What hardcopy document sizes do you need to handle?
Herrgesell: The range is pretty much from A through E in ANSI sizes, or A4 through A0 in the ISO sizes. Those are almost identical. We have standardized on the ISO sizes for what is kept in our database.
Butterfield: What kinds of output do you need from the system. What form does this information take when it is accessed later.
Herrgesell: Basically searching and viewing. You may not have thought of searching as an output, but it is certainly a critical function for us. People have to find drawings first and then make their choices. There are some additional needs for downloading files in electronic form for use with other software or to ship in packages to vendors that might be bidding.
Butterfield: When you say searching, is that just searching based on the metadata or do customers want to search on the text content of the document.
Herrgesell: Currently, it is metadata only. Now the answer to a number of these questions are different over in the Office Document Center.
Butterfield: So you are speaking for the Engineering documents.
Herrgesell: For a variety of reasons, basically looking at the costs and benefits, access needs, we can get along with just searches on the metadata. And in fact, in practice here, most of the users just use part number and sometimes revision. That's basically kept it to engineering documents within the scope of our system, which by the way is XVP (Xerox Virtual Print).
Butterfield: The form of the output product, does it need to be bound?
Herrgesell: Single sheets are acceptable. I'm not sure we're using a plotter that rolls drawings.
Butterfield: Back to scanning for a moment. What level of automation do you need from your scanning systems.
Herrgesell: With engineering documents, particularly, there are likely to be a variety of sizes we are scanning, so we prefer scanning manually. In other words, we don't put a stack of sheets and put them in. We use individual scans and do quality checks on a screen before they are saved.
Butterfield: So you are viewing them on a CRT before they are stored.
Herrgesell: Yeah.
Butterfield: Are you comfortable with that work process? Are you looking for improvement?
Herrgesell: We are comfortable though we are in the midst of an improvement right now. We are beginning to go live with Xerox Engineering Systems ES8180 in our Doc Center which has a much faster scan per inch rate than the old scanner we were using. That machine will scan and plot, it will funnel off drawings to a database. It has everything on the diagram you showed me except the CD ROM production.
Butterfield: OK. Do you worry about the effect of scanning on the input? For example, with book scanning, some customers are finding that they can't preserve the document in its original form. They have to slice the binding.
Herrgesell: It's not a problem with us.
Butterfield: How about scanning speed. How long does it take. How fast can you scan and what are your requirements for speed.
Herrgesell: I'm not sure how fast the operation will be with the ES8180, but with the... The one set of numbers I know is with the aperture cards next door. On a good day we can scan 300 images. Now that might sound terrifically low, especially since we have the aperture card scanner with a very high rate of speed that cost us over a hundred thousand dollars, but the issue is checking for quality and adding metadata to the images and collecting the images into documents. Once all that work is done, the throughput drops considerably. And there is even a further overnight check on some of the documents through automation. Not all of them will make it, the operators have to correct some drawings.
Butterfield: What is the overnight check?
Herrgesell: The overnight check is checking again all the metadata that has been entered, and cross checks between different metadata items.
Butterfield: A second shift operator does that?
Herrgesell: It is done automatically by software during the night. The regular operator the next day would read the error messages. We are aiming for 100% quality.
Butterfield: And by that you mean?
Herrgesell: No image errors, no data errors, no document structure errors.
Butterfield: What is a document structure error?
Herrgesell: For example, a document in which there should be seven sheets, two are missing or could have been misplaced into another document.
Butterfield: What kinds of costs do you associate with this sort of system?
Herrgesell: I don't have precise numbers. We have gotten numbers on some of these operations with the use of activity based costing, but I don't know what the numbers are. I could send you to someone else for that. For aperture card scanning, basically we have a full time person and a Unix workstation and a Photomatrix scanner that I mentioned earlier. Fully utilized, one shift, every day.
And I think there will, at least in the short run, be a similar person, workstation, and hard copy scanner operating throughout, one shift a day.

Butterfield: Earlier you mentioned the cost of the aperture scanner at around $100K. Would you expect to pay roughly the same price for the hardcopy scanner.

Herrgesell: I'd probably want to ask would I pay the same for such a machine today. No, an aperture card scanner would be cheaper today, significantly cheaper, more reliable and smaller, and so on.

Butterfield: Hardcopy scanner?

Herrgesell: Hardcopy scanner, the... Well there are scanners and scanners. If we were going to get a replacement for our old one which scans at something like half an inch per second, the price may have dropped moderately. I don't remember how much it originally cost, but it was expensive. Now it would not be the price of a flatbed... scanner that would do 8.5x11, it would be an engineering scanner which is relatively expensive. I would say four figures, something like that. The ES8180 on the other hand, fills a significant part of the room next door. It is a very different niche and it is in the very low six figures.

Butterfield: So one to two million dollars?

Herrgesell: The low hundreds of thousands. Between one and two hundred thousand.

Butterfield: In this shop, what is the volume of documents that are captured.

Herrgesell: Let me look up some of my figures.

Butterfield: OK.

Herrgesell: First of all, the volume, about 810,000 images. The average kilobytes per image is 112KB. And the average images per document is 2.4. In terms of the number that we are actually scanning per month and distributing, I'm going to say it may be around 8,000, which may sound low, but this is Engineering. For the Office document business that number would be... I'm not sure about that. I could look that up too, but it would take me a while.

Butterfield: The customers for the documents you are capturing here are primarily engineers.

Herrgesell: Engineering, manufacturing, procurement.

Butterfield: You've described quality attributes, costs... deliverables. What are the time requirements for document capture. What are the turnaround time requirements?

Herrgesell: All users of computer systems would like everything happening within an eye blink. What we have managed to do is give 24 hour turnaround once the document reaches us. That would be... our average for getting something in electronic form. That is including the overnight check. Which is good, obviously we don't have a large backlog. Now, for various reasons, it may take a while for documents to reach us. So sometimes our users out there will perceive a much longer time frame. In addition, we have a system which delivers drawings around the world to sites in Brazil, California, Europe, Mexico. And once a drawing, once the first 24 hours pass us, our median delivery time to the rest of the world is less than 24 hours. The name of the system that does that is WIMS.

Butterfield: What does that stand for?

Herrgesell: Worldwide Information Management System.

Butterfield: Could you justify incremental quality requirements, perhaps at the tradeoff of cost or deliverables? At this point do you see a need for higher quality? For color? For searchable text?

Herrgesell: I'm not aware that there has been a demand for any of those. There have been requests for better search capability but it is not necessary honorable by OCR'ing the drawings themselves.

Butterfield: Why won't OCR do that for you?

Herrgesell: Most of the words on an engineering drawing are words like "dia" or blocks of text that appear on every drawing. Theoretically, and probably practically, there could be some demand for this, but there hasn't been enough to make us look seriously at it during the lifetime of the project so far which is five years. There is more of an interest in faster delivery.
Butterfield: So if you had to improve one of the three attributes, quality, cost or delivery, it would be delivery?

Herrgesell: If delivery means speed, yes. And I'm not sure what category this falls under, but we are also chronically asked for connecting our information to configuration information. Remember earlier I said people would go to a configuration system to get a list of the drawings there and to look at a sub-assembly? The users would like that system and our system to be the same system.

Butterfield: So they'd like to see a connection between the image information in drawings and the configuration information that links those drawings together.

Herrgesell: Correct. Nobody really has problems with the images. It is the access... how to access them and how fast they can get them.

Butterfield: Let me ask you something about the future for systems like this. Do you think that scanning drawings and archiving them is something that this group will do for some time, or will that eventually get phased out with drawings already coming to the repository in electronic form.

Herrgesell: There will be an asymptotic curve. It will phase out, but it will never go away entirely. Also, there are bodies of drawings lurking in the company that need scanning that have never been caught up in an electronic system yet. For example, we are currently scanning all of our facilities drawings in as a separate project from our engineering drawings. And there may be more such items out there. Now there are people in my group that have a better answer to that question than I do, but that is my impression. We are headed toward, basically a lights out electronic operation, but we will probably never get there.

Butterfield: What do you mean by a lights out?

Herrgesell: There would not have to be a staff running manually. There will be an even longer term need for staff to do output preparation.

Butterfield: And by output preparation, that means?
Herrgesell: For example, putting together a bid package. If, let's say 400 or more drawings are needed by a purchasing person, to be put in a form expected by a vendor to bid on, it is more cost effective for that person to ask our Doc Center to put that package together for them. So that's part of our business. We make a substantial part of our revenue that way.

Butterfield: I forgot to ask you about output formats. These are image file formats. They are binary images. Are they TIFF files, GIF files...

Herrgesell: None of the above. They are stored in MMR format, Modified Modified Read. Which is nearly TIFF but with a proprietary internal header. The reason for that is that's what XVP which is a fairly old system could handle.

Butterfield: Is there a need for an externally standard format?

Herrgesell: Yup. And we're going to that. We are upgrading XVP. As part of that upgrade, we will migrate all our images to TIFF. Which will solve a number of problems for us.

Butterfield: I think that gets me to the end of my questions. Can I come back, after transcribing and processing your comments to bounce the result off you and make sure I got it right?

Herrgesell: Yes.

Butterfield: Let me thank you again. I have taken more time than I said.

Herrgesell: Has this been helpful.

Butterfield: Yes it has. Thank you very much.

[End of Interview]

Suzanne Keenan
Manager of XSERVVVMS/Electronic Document Management
Webster, NY
April 16, 1998

Butterfield: What kinds of things do you do with Electronic Image Management in this shop?

Keenan: The Electronic Image Management Group has responsibility for two areas. One is in engineering drawing image management and we use the DocuPlex System, XVP as it is known now. We've spent the last couple of years converting all of the engineering drawings that were being designed, built or remanufactured and scanned those and put them in DocuPlex. And now what we've done is to implement DocuPlex out to all of the engineering design teams and manufacturing worldwide and they can access the engineering drawings right at their workstation.

Butterfield: How do they gain that access?

Keenan: There is a piece of client software called EDVP and we can download that to their system. That lets them view it or print it. We are also in the process of putting up another product that is a web access which is called IDOCS. And in the engineering world, we also have the engineering drawings and we have CRLV's which are the life variance changes and then English translations if it happens to be a Fuji [Xerox] drawing in kanji.

Butterfield: I should say this up front that because I am doing this as a research project for RIT, please don't tell me anything that is proprietary.

Keenan: OK.

Butterfield: You said there are two areas and you've just described the Engineering side.

Keenan: And the other area which we've just picked up responsibility for late last year, that's office documents, really general office documents. And there are internal customers throughout Xerox that have many, many files, four-drawer files, filled with documents and one of the things that we are trying to do is take active drawings that they are retrieving frequently and scanning those in two systems. One is the Xerox DocuShare product and the other is a third party product that is called Excalibur and that's developed by a company named Excalibur Technologies. Then there is a next generation product that just came out that is called RetrievalWare. And we are going to migrate to that. Now in the Engineering side, we have a thousand customers around the world that are always accessing those using DocuPlex, in Europe, in Brazil, China, Japan, Toronto, and then different parts of the US. We have almost a million images on the system.

Butterfield: You've given me an overview, now if I could ask you some questions about what are the high level requirements of the systems that you have doing these jobs for you now.

Keenan: What are the high level requirements...?

Butterfield: If you had to pick, what are the top ten most important things about the systems you use to capture these documents.

Keenan: Well, that it is high quality. Now, it is interesting from a scanning perspective, one of the things in the Engineering world, it was very important that we had 400 dpi; 200 dpi really did not meet what we need because of kanji, so that for us was a particular requirement.

Butterfield: What about speed and cost and those kinds of things?

Keenan: Well definitely. Speed is very important and that was one of the reasons we went into it because we were trying to improve the product delivery process. We were trying to take a number of weeks out of the cycle, so we needed to have a very high speed from a retrieval standpoint. I mean, people did not want to sit at their workstation and wait fifty seconds or something to take a look at an image. It used to take them four hours before they could see it. From a cost standpoint, that's been probably one of the biggest issues for us. Because even though there is tremendous value add for the customer, it is very hard to capture what the benefits are to getting a product to market more quickly. They see it as much more expensive, which in fact it isn't. It seems to be a struggle, so cost is really important. And one of the reasons we are going to the web is that that will help us reduce the cost significantly. So that has been a real requirement.

Butterfield: Could you prioritize for me how you would prioritize those three things for...

Keenan: Cost first!

Butterfield: Cost is most important.

Keenan: Yes. Then I would say speed.
This is kind of odd, because quality, you would think of as normally first. But because of the scan quality, many times that improves the quality of the engineering drawings anyway. Quality was not a particular issue in this case which was interesting.

Butterfield: Costs. How do you evaluate costs for systems like this? How do you rank one system as more or less costly than another? What are the factors that you look at?

Keenan: When we looked at it, this is kind of a unique situation because Xerox had an image system, it was kind of a moot point for us. We looked at a couple of different image management systems, and when we started, really on the lead edge, there were not a lot of systems available. And we looked at the quality, we looked at the cost, we looked at the speed, we looked at all the capabilities. DocuPlex in the end was the least expensive, but part of it was because it was the Xerox product.

Butterfield: How is that sort of a system paid for?

Keenan: What we did was activity-based costing and went through and identified all of the expenses associated with the engineering side and the office side and we charge, we have a pricing methodology that we developed and what we tried to do was establish fixed costs which was what was going to keep the system going. So it's the server and the software and the systems administration people, everything to keep the doors open.

Butterfield: Basically the infrastructure.

Keenan: Exactly. And then we developed methodology that helps the customer control their cost. One of the things that they were real concerned about... What we did was, the first year, we just said, OK, this is what the total cost is. We divided it by the number of customers that were on the system and it just happened that it turned out to $119 a month per customer and that recovered all of our costs, but they did not like that because it was one single cost and they had no control. So what we did was we went to this fixed and variable that said, Alright, here's [...] a variable cost. We get our funding through what we call transfer agreements, then individual journal entries. The fixed [...] to them in a transfer agreement. It'll be a portion and it is based on the number of customers that a given particular group has. So if a group has a hundred customers and they have 50% of the images on the system, then they get a percentage of that fixed cost. For the variable, they can control that. We give them a certain number of image retrievals a month, and then they pay a price for every other one. It is really not a click charge exactly. And that is an area that we're struggling with, but what we want to do is to give them a sense of feeling that they can control their costs, but we recover our cost.

Butterfield: You're not looking for that same sort of relationship from the suppliers of EIM systems?

Keenan: We'd like that. Because that's been one of the problems. There's really not the capability to do click charges, and that would be the best solution.

Butterfield: How do you assess speed and what does speed mean to you?

Keenan: Speed in terms of looking for a particular document and having it come up instantly. Now we haven't really defined what instantly is. Is that 10 seconds or 30 seconds or 2 seconds? I don't know if Carl [Herrgesell] had that number when you talked to him. But basically if you went into Microsoft and opened up a document, it comes right up, that is the expectation.

Butterfield: Any concerns in terms of the speed and turnaround time that it takes to capture a document?

Keenan: Now from the scan side, I'm not very conversant on the Office side. But if you wanted to talk to someone who does a lot of the scan, I could connect you to someone. The individual who is doing [...] I don't know what the exact numbers are.

Butterfield: And you were using what?

Keenan: On the Office side, office documents, we are using XDDS as the scanner, so, a copier, a Xerox [...]. Engineering documents, we are using primarily a 7336 PhotoMatrix aperture card scanner.

Butterfield: If I [...] looking for document handling capability on the scanner?

Keenan: I'm not really sure what the requirement or what the document handling capability is in terms of those particular products, but the thing that we are looking for is a scanner which improves productivity of scan. So as fast as possible. Now with the paper scanner, it depends on the sizes of the documents, 8.5"x11", I mean on the Office side I could drop documents into an automatic document...

Butterfield: Are you doing document handling on the Engineering side?

Keenan: Not in the Engineering side because the documents are so large.

Butterfield: [...] yes. Any other thoughts about speed or throughput?

Keenan: One thing from a handling side, there are times when the capability to reduce a document and sometimes they are oversized like a 9x12 and we're trying to scan, be 8.5x11. We have some problems like that. It wouldn't come real high on my priority list, but if you are trying to capture all of the requirements.

Butterfield: About quality. What are the characteristics of a high-quality job?

Keenan: From an image quality standpoint?

Butterfield: Just generally.

Keenan: Well, readability. On the Engineering side, because of the quality of the drawings and the age of the drawings, it is important, from a background standpoint, it is important that we control background color. For instance, we have got a lot of sepia and a lot of drawing that are old and were hand drawn with pencil that has smeared. It is important that we be able to do some cleanup. Quality for us too is being able to have a drawing that is not speckled. We have the capability to despeckle and that is important to us. Deskewing so that we have it aligned properly. Those are probably the biggest ones.

Butterfield: Is color a requirement?

Keenan: Yes. It is becoming more of a requirement. We are getting more and more customers giving us drawings that are black and white but what they are wanting to do is highlight color. Not necessarily color, you know, all different colors, but just a highlight color is particularly being asked for.

Butterfield: And do they want to see this color reproduced from their original documents or do they want to add it to them?

Keenan: They want to reproduce it. Now in the office documents, at this time, there is not a big requirement for seeing color. But I would expect that we will as time goes on, but we are not right now. But when I look at everyone using Microsoft and doing color presentations, those documents eventually will get stored in a Document Management System so you had better capture the color and print the color.

Butterfield: What is the nature of color in jobs, is it color text, color line art, colored pictures?

Keenan: Well, on the Engineering side it is more highlighting areas of change. So if a drawing is changed, they might circle it or they might do the text in color that changed for that particular revision. On the Office document area, that is really all areas, some is text and...

Butterfield: The systems you have right now, they don't have the capability to do color right now?

Keenan: Right now, no, but that is what we are getting asked for now.

Butterfield: If you had the ability to capture color and provide that for your customers, how accurate would the color rendition have to be. Would a highlighted region that was red have to match exactly?

Keenan: No. No. In the Engineering side, I would say that would not be a real high priority. In the Office side, it would have to be more like-for-like color.

Butterfield: What are the most difficult kinds of image content to handle.

Keenan: For us it is kanji data.

Butterfield: Any other types?

Keenan: We've had quite a bit of difficulty with the manually drawn engineering drawings because of smear, and the best highest quality is a CAD drawing whether we take it in electronically or whether we scan it.

Butterfield: What portion of your inputs do come to you now electronically and how do you expect that to change?

Keenan: Right now, we are getting probably 80% electronically in the Engineering world. In the Office world, 100% right now that we scan. And I see in the CAD world going to about 100% and I see in the Office world, I see we'll start ramping up and eventually get to all electronic submission, but I think that's a couple years away. Our focus right now is going after the files to reduce facility space. Once we get those customers out, then they'll start moving their other documents to us electronically. So I see that ramping up each year toward more electronic data.

Butterfield: Do you ever see a point when you would get rid of your scanning capability altogether?

Keenan: Totally? No. I bet we'd get to 80% though. Over time. But you know not for quite a long time. Probably there's a lot of contract documents, different legal documents and job ticket type information that would stay hardcopy.

Butterfield: The nature of the output that these systems provide. Right now it is in image form? It is not searchable text?

Keenan: In the Engineering side, that is correct. In the Office side, we have optical character recognition, so we have content retrieval, attribute retrieval. Excalibur has absolutely phenomenal retrieval ability. It retrieves synonyms. It is powerful. You want to look at it, go on the web under Excalibur.

Butterfield: Excalibur is the retrieval side.

Keenan: Exactly.

Butterfield: I think you said the output is via their electronic workstation. Are there print finishing operations here?

Keenan: We do some printing here. It is getting less and less. The way that we configured the system is that we put print capability in the work areas so that hopefully they will just have to view it, but if they do need a print, they will be able to print it in their work area. Now in the Engineering side, we do a lot of finishing here, bid sets, that type of thing. And many times they want full size or larger sizes than what they have in their work areas. They can only get 11x17 in their work areas. Anything larger than that comes over here for finishing. And there is still a fair amount of that. More than I thought we would have at this stage. They view that as very clerical. In the Office document side, we do get a fair amount of requests for finishing, particularly when it's like large folders of information that they want. They do ask us to run those. We're trying to get them away from that. You know, if you create a Word document and you want a print of it, you send it to a printer. It is the same metaphor you would think would be used, but it is not.

Butterfield: Tell me a little about the level of preparation of input documents? Do you ever scan books?

Keenan: We are not doing that in this particular area, but another group that I have is doing a project for the Xerox Museum. There are all kinds of books and historical documents that we are trying to preserve and that is in the process to be cataloged and will be added into an imaging system. There will be a requirement for that. Another point though. When we did this engineering project, preparation was the most costly, laborious, difficult part of the whole project. The lessons learned from that were phenomenal. Just take for example, the primary input was from aperture card, 35mm. Xerox had purchased a number of companies throughout the course of 30 to 40 years which all used different attributes, different microfilm standards, different drawing formats, and to pull those all together and put them into the DocuPlex system was a challenge at best [laughs]. Very difficult. It is also very difficult in the Office documents. Take as an example... the customers are so diverse. They have had so many different types of documents with so many requirements. We have HR [Human Resource] groups that have a set of attributes, then we have Legal, Environmental Health and Safety. Every time you set up a new document management collection for a new group, capturing all of the attribute information the way that they want to do it is challenging. It is very labor intensive, very costly. Every time we put a new customer up that we are going to scan paper... $5000 just to set up the scan is minimum.

Butterfield: You are talking about the metadata?

Keenan: Yes. Primarily.

Butterfield: For example?

Keenan: In the HR department, it might be an individual name, their Social Security number, their employee number. In the Engineering world, it might be a drawing or part number, a revision... those are a couple examples.

Butterfield: I think that answers all of my questions. Thank you very much.

[End of Interview]

Patricia Pitkin
Director of Library Services
Wallace Memorial Library
Rochester Institute of Technology
Rochester, NY

Butterfield: Could you tell me what kinds of things you are doing right now relative to a system like this [example].

Pitkin: Oh sure. We do... we capture documents and put them on the web. Students tend to do a lot of that in some public areas that we have, but what they're looking at is really more representational than on quality. We are also doing some capture of material for an electronic review service that we have available where we capture faculty members notes and digitize that, put it up, make it available through a web interface off of our catalog and then students have access to it anytime, anywhere. And we are also doing some digital capture of images and making those images available and that's a large issue right now. Say a faculty member takes some photographs and they want to integrate those photographs into a class, we will make slides of them, but we are also starting to digitize them as well, so that they can be stored and either put into study collections on our reserve environment so that students can refer back to those images, either through the web interface or they can print them out, generally they are not high output though, and refer to them later. So that's the kind of environment that we're working with. We also do some transmission that is somewhat related with regard to distributing digital articles through document delivery and we do that through a scanner and a system developed by the Research Library Group called Ariel. And it is primarily for digital transmission among libraries of documents to satisfy interlibrary loan requirements.

Butterfield: How do you evaluate the quality of the systems you use? What kinds of things are you looking for?

Pitkin: Actually, we are struggling with that issue right now. Particularly in the distribution of photographic images and how we capture those and make them available. The way that we have been doing that is by capturing at least three resolutions of the image, a thumbnail, a medium quality screen size image, and then at least one higher resolution. And then in some cases, a fourth level resolution. We have just recently done a focus group with faculty on those images being transmitted through the campus network and projected out through the classroom to see if the quality of the image would be satisfactory from their perspective and we were concerned that the resolution of the screen size image would not be adequate. However, we were pleasantly surprised, because when they were projected back out, driving a Barco/Sharp projector, the reaction was that they were fine, for their purposes. Now we are in the teaching/learning kind of environment, not necessarily a quality reproduction environment, and so their reaction was this is better than slides. Which was surprising. This was not a large sampling, but I was at a conference last week, and this might be something you want to follow up on, but MESL, I'm not sure if you're very familiar with them, it is a museum... MESL. I do have a URL, you might want to check them out. They are a museum project, I think funded by NSF for the last two or three years, and they have done some user analysis on this and they have the same results spread over a variety of focus groups and it validated what we had seen here with our small sampling. That the faculty would initially say they want the highest resolution possible, however when it came down to an actual use, they seemed quite happy with what we had available which is not a very high resolution. The fellow who was doing the presentation was Howard Besser, and I believe he is at Stanford or Berkeley. And they took focus groups across six participating universities, so the focus groups were pretty large.

Butterfield: What are those... the resolutions that you are...

Pitkin: I knew you were going to ask that and I don't remember them. We are following the Amico standard. Amico is a project that we have just been selected to be part of, and they have laid out the four resolution standards. I don't have that, but I can get that for you. Are you a member of AIIM? Because we are doing a presentation of this tomorrow. They are having a regional meeting here at RIT tomorrow. We'll be doing a presentation on what we are doing with digital collections. I believe our portion is in the afternoon around noon. The person who could tell you about that is Milt Cofield. His number is 475-2751.

Butterfield: Difficult kinds of image content. What are the hardest kinds to capture.

Pitkin: I don't think I have a good answer for that. We have really been taking flat images that are already there, and some of the images that we've been concerned about have been what level of image manipulation and enhancement do we have to do, for example, the quality of capture of the image itself, how close does it represent what it actually shows, and that is really a question of the process of capturing that. Do we try to enhance that to make it close to what the image was supposed to be or do we just capture what either the film or the digital image captured and make that available. That is one of the questions we haven't really resolved. And then the question also becomes, if you start manipulating and worrying about color balance and altering the color, you really run into the question of what you have done to the original, and what you've done to the capturing item. You've added yet another layer of interpretation that is sort of humanly subjective as opposed to mechanically interpreted, so [laughs] that's kind of a question that we've been wrestling with, and we've tried to stay away from altering images. Because it is only adding another layer of interpretation. That is the place that we're at at the moment.

Butterfield: So the goal is reality then.

Pitkin: Yes, I think that is probably true. Because some of the situations where we might get copyright clearance to digitize it, if you take a copy of an image that is in a book and [...] that slide. We are now multiple levels removed from the original object. Do we try to enhance that to try to get closer to the original or not? And I think that none of us feel comfortable with being able to add any value by doing that, but rather only adding another level of interpretation. So hopefully, one of the things we are hoping to do, at least, is that we minimize the amount that we digitize and rather, acquire images from sources that have already dealt with those issues so that we can be more of the transport, distribution, and organization method and sort of the filter, as opposed to the creator, because we are not close enough to the source to make those quality decisions.

Butterfield: What kinds of color metadata requirements, if any, do you have?

Pitkin: We don't have any. We have not really addressed that at all. It is more of: What you see is what you get. What we get is what we reproduce. So maybe I don't understand your question.

Butterfield: Do you have to handle color input?

Pitkin: Absolutely. And what we'll do is we'll digitize that. The process that we have been using is of taking slides and making Photo CD's and then going from Photo CD's and dumping those over into digital files and catalog them and make them available. And most of our work has been in the areas of interface design and in the area of description of content and appropriate searching, the issues of how do you search an image. We've done less investigation on the technical end of image capture.

Butterfield: How would you arrange the priorities of the things you've just described, searching, quality, resolution, color...

Pitkin: We take a really functional approach to the output so for us I think we would say if the color is good enough, in some cases it may only be representational, that the color itself in a teaching environment, say, think of a Survey of Art class, is as important as recognition of the image. I think that's more where we're coming from. And generally, so being able to have the image, find the image, and display the image that is recognizable is more important to us than does the quality really match the original. I think we have a lot of latitude. Sound like low standards to me.

Butterfield: Well, I need to understand what your real requirements are. I'm going to be asking you about quality, costs and speed. So you really don't have any quality problems right now.
So you really don't have any quality problems right now. Pitkin: Well, part of that is that we don't have a large enough base. As we capture more images ourselves and create more smaller. digital be to going to . . collections ourselves, that will someone else's. available will or things Pitkin: I think functional, is more of a concern. distributing But I do believe so that the question of our approach quality will be is a do capture, my guess is the ratio of what we capture to what is be things that acquire, digitize, and make available. problems with things like small fonts or moire or not capturing shadow And then those that you anticipate become someone else probably be less than 10% Butterfield: Would detail images that acquire more we would having like that? you are talking and the aesthetic Butterfield: The to the wrong person about that. As I said, I think my orientation is more the I really look to the faculty to drive that as the user of the technology. more. detail is . . nature of the output of these systems that we are talking about. Could you describe them in greater depth? classroom or on screen viewing. going to be display and projection, for for students who want to take away a color for requirement output, some be There may be some requirement may will want to produce physical study guides whether the be remains to I don't it faculty physical output. seen, know, Pitkin: Probably the primary output is from the looking collections. that curriculum, and fair I don't think at these as a vehicle use of to worry about for we know that. And I think publication in the sense of a developing a text book that would use the images that we have. 
So it, I don't see unless that as one of the of course one of the questions will be, we're not faculty member developing a curriculum and then from these images because one of the huge issues is the copyright they are out of copyright, natural outcomes of they are old enough that we don't have the project, and primarily because of that. Because getting the buying we were from. But clearance for the images that may be available will be difficult. Now it would be less difficult if using just collections, because we would have one organizing unit that we could get copyright collections and in the image arena, often the source would be widely distributed. huge problem for Now us. with color output of have done we images to a text some of Sometimes the images come as individuals you don't some work, now form for even have know or very small segments, so holder is. So that is a who the copyright you talked to Dave Panko. He has done He creation of exhibition catalogs. would be more work a good source in terms of quality side. And he'll have a much higher sense of what the output requirements would be because he is looking for print. And our primary output is looking towards display and projection. Butterfield: How about accuracy. Is that something you look for in terms of these systems. the Pitkin: Again, that we have the not not the original least close to the little bit, a have. It is the accuracy of that in the image arena. It is not likely making a digital representation of. It may be that, the source itself is facsimile of it. So the question becomes, are you trying to replicate to the perspective that we source, but only a facsimile. How at from the source next to us that we are we've facsimile? And done some we digitization haven't really lot worried a about that. of a poster collection, when in fact we I should probably describe do have the source material, the quality should match, and we do have the original. 
But we have not really for that, I think it has been subjective to the person who was. the image was and we are more concerned then with put together a set of principles captured, the image was put lot of energy into that. Butterfield: These systems, . up on the and the check to screen, . the original was cursory at best. So we have not put a required? Could what level of you tell me automation, about that and what level of operator participation, skill, preparation that is the requirements for those things? something Pitkin: Right now, the digitization process is handled through our Media Resource Center. also be a good person to talk to. They would be more concerned with the quality and more reproduction process They would probably concerned about the issues. But right now, the process goes from film to digitized and we are trying to move into a digital skill level of the people who do the capturing varies from students just setting up a from the beginning. The scanning a series of slides that have been taken and then there will be some quality connection to location of the image or perhaps a little work on the color. But again, that is pretty subjective and based on the person who is doing it. The process for that is, as I understand it, pretty manual and not very automated. We were slide scanner and very happy when we Butterfield: Is that a got a slide scanner. collections than a digitizing Butterfield: You for concern Pitkin: Absolutely. It is huge you? concern. And again, it is a concern and it is why we would be better off acquiring them ourselves. said you are trying to move into the digital from the beginning instead of the film to digital. Why is that? It is the capital cost of that is something that is being looked at right now and I think the is getting to be a little more affordable. The capital equipment side was a little bit more than digital cameras, digital scanners. And this did not have a real high we were willing to absorb initially. The cost. 
whole process. this last until this year, priority process and the effect on the input. Do you need to maintain the input in their Butterfield: How about the Pitkin: It is really the price of cost. equipment . . . . scanning form? Are original you able to do that with the processes you are using now? Pitkin: I believe right now, keeping the slides. The goal is to move away from that the slide, but it is captured digitally. But right now we are keeping the slides. we are Butterfield: Some folks Pitkin: about [Coughs] How who are painful scanning books actually Pitkin: we off not keep books. don't do that. We haven't been that serious you even consider that? Only if we had duplicate copies Butterfield: You told me something Pitkin: A full digital system? I think of the book. about costs. points as least binding No, do it. Butterfield: Would we've slice the that is! Don't say that to David. so that we it relates to cost, had a two we are looking drop at you expect probably to pay for a system like this? $50,000 system. One of the other about a key The timing is right for us to explore this because Ethernet to all the classrooms. I think there is at we are right now at a convergence point. a network upgrade on campus which Ethernet What do has been to each classroom. It is bringing fairly high speed Ethernet, it is 10 meg to the wall. In some high speeds which means we can start to think about situations, it is 100 megabit to the wall, so we are getting bigger bandwidth applications to the classroom, and also, get can pushing larger packets through that network so we that campus upgrade is their environment, bringing access to Ethernet to the the images change, and that will be completed classrooms, so the image, it is all the Less value. learning We infrastructure faculty now have the opportunity to use, in big the end of this year. The other factor here is we've been upgrading all the talk about a capital. 
when you offices, so the going, again, to be more bandwidth intensive. That is one at .. be least in our environment, it is not just creating a digital process for that capability in our primary business, and without the in the network, and without an upgrade in the classrooms, without classrooms that have projection, it has upgrade less by faculty which are able to use to be accepted and adopted because those are obstacles. As well as use of these in distance likely applications. support to We have developed a couple of different image collections that support distance learning. using these collections as a base for classes that are being taught at a distance, so that is really our focus. And we have actually probably five or six different image collections for people to use. There is a poster collection which is integrated into our catalog so that if you are searching you can get the image display and the description of are it. We have two design have the integration integrated archive projects which stand alone image based collections; these are all web of some of the material that we are within the catalog. David in Cary has some capturing now, some of the locally image representation within his site produced based. We images, of manuscripts. again We have some representations of the artwork that was student produced available off of our web site so that people can use Those that as part of their resumes. Butterfield: You talked optical character recognition, Pitkin: Yes. All Most all of our is captured text material. A at So we have about 15% documents into in text and put common reader at application. is image collections, have of what you talked about printed text OCR making it searchable and we OCR part. The processing time the doing We did look for the translating of our reserve material baseline format for we are not are the ones that come to mind right now. about searching. opted is done any you electronically searchable form? 
in the Adobe Capture and use PDF as the an Relatively uniform, out there. away from that. So on that was really. . . we are universal access. taking just page images and willing to accept faculty using was not what we were of our reserves are now electronic. We have about just text, but text is a big part of it. Butterfield: What might have tipped the balance on the decision to save images versus saving searchable text? Pitkin: It is really the processing time. That was a decision we made about a year and a half ago and one was the And that electronic reserves. quality of capture on the reliable, and the uses more than OCR. It processing was was not as reliable, at least I remember horrendous. This is a fairly quick having a discussion about turnaround environment for us. it, it wasn't that One of the technical folks thought there was a way to do textual searching without the overhead, and we didn't explore that. But in the latest version of Acrobat Suite, there was the capability of doing that without overhead but I don't know the details of that. Butterfield: You said the hundred times too processing was horrendous. How bad was it? Two times, an order of magnitude, a slow? Pitkin: As I recall, it added, my impression, was that it was at least four or five times more than just capturing it. And that was far more than we could absorb. The threshold was significantly more, and I think we also had some it available in a form other than concerns in the preservation of the format itself of the original document in making original. And significant our requirement getting it with the is a 24 hour turnaround time increase. We've had something about the transfer. The time, and the delay time caused some problems conversion of was problem for for material that is presented some problems with output of with PDF to PostScript to the printer pretty severe. timing out. get PDF to it to to us. That could be a shared printer. 
There output on the printer adds a So for large documents there delay was a users that was Butterfield: I think you have answered most of my questions. Are there any other thoughts you'd like to share? Pitkin: No, I think I've answered all I can. This is really exciting. I really enjoy working with this. Butterfield: Well, I really appreciate you taking the time. Pitkin: Good. Really not a problem. [End of Interview]

Rodney Perry
Associate Director for Central Library Services
Rochester Public Library
Rochester, NY
April 27, 1998

Butterfield: This library, the Rochester Public Library, is doing some scanning. Can you tell me about it? Perry: We have done a project where we took some of our clipping files. We use a lot of newspaper clippings, those, doing until . this point where we now have a grant to sending them Butterfield: What other kinds, photographs and Perry: We are starting themselves to this. in We don't. But Chapter are We are AIIM on other because it easy, we had good collaborating partners, which City's [Rochester, NY] archives. I think that photos lend seemed with the of the an some fairly significant collection resources, the material that I put the Custer letter and the Lincoln letter are 4/23/98], field, require a manuscripts, of postcards. the second decade of the I was digital treatment some interesting looking at some 20th of in the package [in fairly remarkable rather than a things there of those East Avenue down century East. It is a postcard, the intersection of Main and . for scanning photographs, or electronically capturing getting in the production business. clippings, photographs, that need to be captured? collaborating have we also think, from my knowledge image treatment. So we have teens, some way. you mentioned that I nineteen develop a plan Everybody understands a photograph. They add meaning and viewpoint and so forth that words of around with photographs satisfied grant conditions. a scan to set up an index. 
That sort of fell apart because of the amount of time it took and we weren't ready it as a separate project that fit into the normal workflow, so it sort of fell down of its own weight. So that's. try and presentation in a way to Western examples of manuscripts text treatment, by that I mean putting out there. Also have day. Scenes probably in the worth the other by where they are now building that Law have many of those, and that is probably worth digitizing. Another significant area is our clipping files. We have about 60 file cabinets full of clippings on all sorts of topics of multiple interest and they serve probably well over half of the reference function in the local history building, division. And it is just would of like to digitize a series of and file folders over think partly, they are value. image, the who access to be indexed information. So Photographs between visual Photographic downtown, to make sure that the important words are clear and that the OCR program has read them by key word. So that is another resource that I think would lend itself to digitizing. digitizing is a route to access, for example, this clipping file, they are not interesting themselves, but can they correctly, mostly organized around topic, and I know this field better than I tell me that you can do, on top text file in some way, and as long as you spend enough time of old newspaper clippings, distribute those. People the scanning, do OCR software which turns it into a going and we they have information and postcards, they're information and textual just interesting out on the web. I think the textual not quite sure what for information. And I'm and manuscripts are sort of gee whiz information. I'm value. kinds at of things. here, but I they another So I show. not sure which information, like I'm getting Photographs have what is guess more kind of I'd draw information a distinction important in the Gosh, here is this picture I long run. 
library of the central clippings, probably have a larger proportion of think text has a different information value than a digital image. Butterfield: You've described several that are important. How do you know know that you had different input types, a good scan from perhaps we could talk about the a good scan versus a bad scan. quality characteristics What would you look for to a good scan? I'd probably look at light and dark. Is it accurate from that point of view. A woman who is our local expert on scanning, Sue Shippey, would be a good one to talk to. She showed me the Associated Press photo file I had her look up Willie Mays, found a picture of product, it gives access to pictures which are AP photographs. Perry: I guess Willie Mays. This is the famous picture of Mays catching the ball over his head and kind of saving the game in the World Series in 1954. And that is good enough to bring back memories and bring back the event. I'm not sure it is important that it is a crisp and clear photo. There is a way to order the actual photograph, so for library purposes, You've got Willie Mays, 1954 catch. I think libraries, and something like that printed out is sort of good enough. run-of-the-mill for the uses, don't need a lot of definition in the picture. We libraries I'm speaking of Public mainly, have a number of users who turn this material we get from the library into other products, and probably Sue is the how the user uses the output. Our thinking at this point about our imaging project is that we will use we don't really have thumbnails and then we'll have second images that are bigger and of some better quality, but final not the product it is first it is line, because we are. major issues with what we are one to speak to quality doing .. Butterfield: What do you mean by that first line, not final product? Perry: Well, you are showing the picture Willie Mays. There is a system, if you want to order it you can order it. 
If you want to publish it you can publish it. If you are looking for a better image of photographic quality, this photo is on books on the subject you can go further with it. So this is first line of several usages of This requirements. standards are would be demand to business. reproduction for like. Here's a picture of the The Mauritania. You input type second In shows what varying quality happened. So I think our out of photo reproduction a couple of examples that of computer They look something like see. business come and they'll get some Essentially it is a catalog. cartoon need you I've put business. in my . . Film-on-paper handout, my printouts, and they've lost something, but certainly characters, it is really the situation. This is can see what an old don't have tight definition, but I'm not sure you level of quality that isn't always appreciated. Butterfield: Let's It demanding. If fact, with our collaborating partners, the City Archives and Photo the photograph to be a substitute for the real thing. They hope that this information will presentation, these happen to be photocopies recognize the people. a tenth grader's report. not that probably Lab, they really don't want create people's good enough tug boat looks like. So it. I don't think described you need you it looked what You you get a sense. to spend a lot of money creating a were the manuscripts and you showed us examples of the Custer and Lincoln manuscripts on Friday. What are the quality characteristics you want to capture there. Perry: I think there, you want to get the color of the paper, it is off white, maybe it is brown, maybe it is folded and worn at the edges. The ink was maybe pale purple or something. So there, I'm speaking about a higher standard than the Willie Mays photograph, because you are really recreating the item electronically. You're not going to for example. 
With a photographic database, you can always send for a produce a photograph on request conventional photograph, but the recreation it is straight is you can't get a conventional manuscript of the one of the object black and white. But itself and a letter from George does it look like. With this photocopy, what digital image should capture all of electronically stands for itself, [unintelligible] Butterfield: The clippings, you talked about capturing the text, but for Custer, so I think don't really achieve it; the stains, characteristics because there the you object manuscripts you didn't. That wasn't an oversight Perry: Yes. If you a piece of information, incidental. In the about story can't read But that's the way it is. A clipping is in itself and the information, in a sense, in a to have the OCR program turn the word "management George Custer's writing, that is not an object in itself. A manuscript is your problem. an object reorganization" clippings you need Kodak other similar word. have it turn and So out you want that "management to be user "mammography" organization" in the index friendly in an OCR program and not or some don't have to so you spend a lot of staff time going through the clippings and verifying that the OCR worked. This is not every word, but it is the important words that you are going to organize and classify by. I guess in the clipping file, we begin to meld the image system with the text recognition systems. I don't really understand it very well. Butterfield: I'm only really interested in the. The technology piece is a response, what I'm really trying to understand is the requirements, and I think you are giving me great information on that. You said merging those . . two; is that because newspapers have both text and pictures? 
Perry: Well, yes, I hadn't even thought of that. Yes, sometimes a newspaper story will have a photograph with it. My comment had to do with, I think, here is where I don't know technically, I think capture is a scanning/image technology and the sorting the text is a text technology. That is my simple view of it. It begins to take the shift. We capture the image then turn it into the equivalent of a text file. So I don't understand the technology and why you can't pretend it is file [unintelligible]. a text in clippings, what would you like to see. here yesterday working, and I try to stay away from reference questions, but I was helping a guy find. he was writing. he was a new English speaker. he had to do a report on Greenpeace. He got some should be able to put into a computer sources, some internet sources. But taking him as an example, he "Greenpeace" sources and also access to local information news internet and articles and get books and magazine Butterfield: As Perry: I a user of a system that could scan was . . . . . . locally. This is the global picture, the European clipping file about activities of Greenpeace on Lake Ontario chasing freighters that were their boats of one have and American picture, picture, the locally they would be about. Sometimes that would show up in a it what that's it but the That's not oil. theory, case, dumping newspaper but newspapers are not particularly well indexed. That is one of the reasons we do all this out of our newspaper index, clipping go of local to a separate, papers. Probably the Science division either automated or manual file would clip that Greenpeace of newspaper clippings, story. newspaper Users should not clipping information have to ought to show equivalent status of a up the results of searches. Ontario had a photograph Greenpeace motor files and different book catalog Taking boat, or a periodical in the Right now, this is that would show up too. resources to index. 
So it was just Amalgamating part of the search. theory a little further, if that Greenpeace chasing a freighter dumping oil in Lake City Archives or City Photo Lab was over in Charlotte taking pictures of the the look But libraries at things. would fantasy land, all but like to integrate the we send people to different results of searches of images files. Butterfield: You talked You about the qualities. said that some of the inputs you described For were colored. example, the postcard of East and Main. The manuscripts had yellow paper and purple ink. What color reproduction. . . how Perry: Well the idea does that good model I bring have to be. color to bear is. . . we have It does a color photo copier. a pretty good job of giving you an the original colors were. But it is only pretty good. And I think we'd probably look for better standards than a photocopier because I think Robin's Egg blue is a lot different than light blue, and things look of what pretty ugly pretty fast if the color is not quite right so I think the accuracy of color reproduction is important. And this accuracy may be required on the terminal. Printing it out is another level, [unintelligible] But I think you could draw a a distinction between lesser standard of particularly accurate color representation on accuracy than Library style accurate. printers that are capable of matching colors. Butterfield: I think free, you could at getting film the terminal and actual printing it out. You could accept the terminal. The reason I say this is that color copiers are not copies could, I'm sure, copy it good color reproductions, you made some assumptions about much high quality better. And I think the expense of that could be faithful to color printers the costs and capabilities of technologies there. If it was all color rendition on the terminal and color rendition on the print, is there still a difference in importance of what you see on the screen and what you see on the print? 
Perry: No, that was a pragmatic. Butterfield: What kinds of inputs do Achilles' heels of systems you think are most difficult? Where have you seen poor quality? What are the like this? Perry: Frankly, in terms of the image, I'm so amazed that it happens, that the images themselves don't really. Nothing strikes me. I think the important thing from the access point of view is describing the content so you can find it. In other words, this Willie Mays catch, you had to find this through Willie Mays, and I didn't search through . World Series. There searches to get you are other ways it might be This descriptive field. How searched. to this. I don't know whether this is done well or receptive poorly because I don't is this to understand . web it particularly, but I'm not familiar with the areas of difficulty in terms of capturing. Butterfield: Do you ever envision having to scan documents that are so faded that even the originals are difficult to see? Do colors or you have requirements for capturing tiny fonts or extremely detailed line drawings? Exotic fluorescent of those sorts of things. any heels. I think maybe Perry: As a public library, probably not. Let me back up to the question of newspapers are heels, I'm not sure. Because of their size and their condition. Those that you want to scan of doing it poorly. The to capture before it disintegrates, are in really difficult shape. That is not really an issue Achilles' Achilles' issue is how do you do it and get the size, [unintelligible] handle. quality newspapers are maybe difficult to quite a few of to digitize is maps. They carry some have we on that mention to I forgot Perry: The other thing early also very heavily used. What was the Town of of same problems as photographs and newspapers. They are Butterfield: So large, poor the Webster like in 1920? 
[unintelligible] And we have quite a few maps that could away from that because of the size mainly, [unintelligible] s a new kind of input Are the quality requirements Butterfield: So that' of maps be digitized, but we are different from the other staying types of inputs? show up or a topographic map, the Perry: The detail in the map is important. A cross that marks a cemetery must would have a high degree of a information, of map of capturing terrain lines, so I'd say in terms accuracy requirement. I'm not sure color per se, but detail, fine print. quality. What about the costs of Butterfield: If I could, before I go, see some of these. We've talked all about systems like this. What are the cost concerns. other programs. So I think many of these Perry: It is hard to do a lot of in the digitizing area and take money from an issue except to find out of much that are externally funded. As developmental projects, cost is not [unintelligible]. Then, I think as you understand ongoing costs, you can at least make choices of deciding whether to pursue, to proceed in a dramatic way with an extensive effort to get new money or transition what you are doing now. An example, probably the clearest example is the clippings. If we find that we can effectively get access to the clippings, that it does not cost too much to do that, then instead of clipping, we scan the clipping and go directly to electronic. I really can't comment on the degree of cost, as developmental projects you spend what you get and do It is hard to not as much as you can. the print book resources in order to scan postcards or convert microfilm. by I think they tend to be externally funded. To some degree, you build them into your routine, Butterfield: Related to that what about the level of automation to do this sort of thing. How intervention, ends - would be acceptable. of production? 
Butterfield: In terms Tape intervention operator Perry: In terms [unintelligible] much user of scanning lot a manual notes (not not suitable. Photographs of documents. verbatim) from this point forward. Perry: Automation System production mode. be runs will be itself. If we individually handled. Depends on what photographs and were to it is and be doing be scanned from resulting negative. Allows for a have odd sizes. Manuscripts have to the clippings, we how many we have to do. Butterfield: What is to be captured: volume of stuff Perry: We know we that because have 25,000 it is photographs. With city archives and photo labs have about 375,000. We won't do all of interesting. not all Manuscripts, we have some, but it is not a major part of our collection. Clippings, we have 60 filing cabinets, 200 file drawers. I couldn't guess how many actual items there are, hundreds of thousands certainly. The postcards, is in the several thousands. Not Maps, we have, let's say, 1500. Picture file books, would not calendars. dramatic resource be scanned because sure of the quantity. of copyright issues. It contains Magazine clippings, clippings from Quality varies, color in some, some line drawings, wide variety of uses. To me, that is the most arranged by topic. Copyright process is such an obstacle that we would not be involved for sometime or ever. Butterfield: How important is speed? Perry: Speed Speed of capture of captured is important. Project basis, clipping is pretty important. it in text searchable form. speed Probably is important to need to scan a understand clipping in 30 how much it costs in the future. seconds to scan and know you've Butterfield: How long do source, for Perry: you expect to example for Don't know how long be in the business of scanning? Will you ever get electronic input directly from the current newspapers? be in the business of capturing documents electronically. Marketplace has not made information clear. 
In the print world, at least you own it in a printed form. In the electronic is the right to look at it and leave. Not clear what electronic resource is. CD-ROM's wear we'll durability of electronic world, all you're buying out. Not clear yet what to do. Not making decisions about this for now. Butterfield: Thank you for this interview. Perry gives Butterfield a tour of pertinent areas of library including postcard collection, picture file, newspaper clipping files, maps. [End of Interview]

Robert Gerlach
Sales and Marketing
Holsons Technologies
Rochester, NY
May 1, 1998

Butterfield: I about a system that printing, CD-ROM means of Electronic Image Management Systems and as I do with all my looks roughly like this [diagram] scanner, computer, storage, and some some sort of online access. Is this the sort of system you are using? want to ask you some questions about interviews, I'm talking Gerlach: Yes. writer or do mainly the scanning Butterfield: OK. That's the piece I'm focusing on. Actually we Gerlach: There Then you go. Butterfield: Could in the we're little right conversion part of it location. do here and the kinds of systems you are using? Gerlach: What Holsons does. Our primary purpose is a document conversion center. We're going to take anybody's hardcopy paper and convert it over to electronic image and ship it out to what their system is going to you tell me a about the sorts of work you handle. That could grown system. be IBM image It doesn't matter If they don't have The giving image. them in, a system, we go OpenText Live Links package. plus. or Excalibur be other one would back their Butterfield: What hardcopy sort of It what the could have, be a file It net system. we are able could be a simple we work with a couple manufacturers of electronic and to go ahead and Alchemy, it is CD Home writer system. to export images out to that system one way or another. implement those systems a straight conversion shot. 
imaging for them to That just take in systems, such as give them a complete their documentation, paper, and back on a CD and the CD would contain the text the software and an scanning systems are you using. Gerlach: The bread and butter we using the Kodak 900 series scanners. And then we go through Holsons design workflow process. Most of their stuff they try to automate with machinery versus human beings, so they've implemented bar code, sort of OCR their own OCR package process to basically enter electronically and use minimal labor to make the error corrections off the Butterfield: The bar coding you are talking about OCR. is for entering metadata? Gerlach: Yes, metadata, extracting, if it is a form, certain fields, key fields. Butterfield: How do you evaluate the quality of the scanning systems that you use? Gerlach: Hopefully, we evaluate our scanning is going to be as good as anybody else's. Butterfield: What are the attributes of Gerlach: Attributes back at an image And that is because nice. of quality that we quality that you look at? look at is basically, despeckle, deskew, decrop, on your screen, that you are part of the process that we handle you can utilize your scanner at Worst get a clear case image scenario, you try to off of it. When Butterfield: What Gerlach: Quality and enhance do that it to slows going to get work and effort platforms go at image if you can read versus you know, 200 dpi, the have the look when you something you can't read. reason being that right process will come out 300 dpi, that is because the images are so bad that you need it down to its knees, and the production value, as far as it quality problems that you might have now? if it is going to be forms or like a HCFA form ready based on the document itself, ve have triplicates of carbon paper outs and you don't want the red dropouts, or are some of those problems are you' theory holds, Garbage In, Garbage Out. 
So still and what we everything that people perceive you are, if they need it and that is the key point. Do it? Butterfield: You said HTFAF form? Gerlach: A HTFAF form is a medical you really try to do need that you are it and does take some information and how do is enhance clear you need insurance company. to to get the job done. that going to have the red drop and people think you are going to be able to see that. The not it. Scan speed and also the bring it down you long takes us twice or three times as full going to see something you I claim form that most insurer's use for the doctoral file form to hand to the could show you one. Butterfield: The quality piece of that is the color dropout. for a long time. It all Gerlach: The color dropout It is a standard form. It has been in the insurance industry want you to extract the data, just people Some OCR. system or how people want to do depends on what imaging 55 they don't care about the form itself. Some other people want to see the plus, red is our most difficult color to handle in the document imaging form, so it makes it a litde interesting. And world. Butterfield: You talked about color dropout. Do you have any requirements to be able to capture color, as color? Gerlach: We don't like to do it. It is not our core. Our core is pretty much black and white documentation. If to incorporate color such as photographs or color flow charts, we can do people need We try to separate scanner. avoid it it, and for that we use a at all costs. Butterfield: You try to avoid it. Gerlach: It is just not our core. We are in the large document conversion world which means we get a million We're going to take those midion pieces of paper and put them in electronic form. And to do that to do it as fast as humanly possible. More people are in the graphic world where you need a photograph pieces of paper. 
you want that is a specialized Butterfield: Do item and a slower process and the cost to you think the work Gerlach: I think the work equipment you need the cropping. In the business is the information back is there high as and I think color do that is much greater. input? people that specialize in that area, die expertise, of documentation, they don't really care resolution scanner, you need world fast as Butterfield: Are there image is there for you need the specialized you are going to do a lot of fine tuning, fine detail, what they want to get about the possible. in documents that are especially tough, tiny fonts. Gerlach: Well, the smaller the fonts, yeah. We always like to see laser printing inputs because that is the easiest especially in OCR applications. The harder ones are actuady dot matrix world and dot matrix carbon copy papers, just because you lose elements generations . . [unintelligible] Backfile conversions, can you tell me what that is? Gerlach: Backfile conversion is simple. What happens is before Butterfield: electronic conversion, before computers were around, paper was your normal mode of document and recording information. Now you have two media, you have files like Word or WordPerfect like now, you're typing into your computer, that's an electronic file. your talking, like Xerox that has a long history. They keep all their records in hardcopy paper. That why this budding is here. What they want to do sometimes is take that history in hardcopy format and convert it into electronic format. That is what they call a backfile conversion. It is any old documentation that they want to bring into the electronic arena that hasn't been in the electronic arena. Butterfield: The backfile is paper archive and the conversion is taking that paper to electronic form. electronic What happens is that Gerlach: Right. That is pretty much what we try to specialize in. 
There is also what they call Day 1 forward process, not everything is electronic of course, people have paper coming in and that is what they want to incorporate.
Butterfield: The kinds of output you get from the systems you are using. For the most part it is image output or do you sometimes produce searchable text?
Gerlach: It is mostly, we're going to output a TIFF image, and that TIFF image, it all depends on how you want to retrieve it. If it went through the OCR process or, what happens is there is really two types of retrieval. There is an index scenario where it could take a document. We have to give that document a name or a retrieval name, name, Social Security number, and ID number to associate with the paper, or you use an OCR field, where we scan it and I don't know what I am looking for, if we do ad hoc imaging, but if I want to see everything with Holsons Technologies, I'm going to key into my database Holsons Technologies and it is going to give me a hit on any document that contains Holsons Technologies.
Butterfield: Do you provide any sort of printed output to your customers or is it mostly electronic?
Gerlach: We try to avoid printed output because we are in the business of getting rid of paper, not creating it.
Butterfield: So your output basically becomes an on-line archive someplace, on disk, RAID or optical or whatever it is going to be?
Gerlach: Yes. Usually what we do is give back to the customer the hardcopy, and electronic format is usually a CD. We're going to give them back the CD and they can go ahead and import it into their own imaging system. And maybe store it to. . .
Butterfield: Holsons basically acts as a service provider, taking in paper and providing electronic, but you don't host the electronic archive.
Gerlach: We don't host it, no. We may keep a copy for their backup security requirement, but only if they request it. We try to pass it through.
Butterfield: How do you evaluate the accuracy of the systems that you use?
Gerlach: We go through a process. As far as accuracy, there is two things. There is an accurate count. Accurate means if you give us a million pieces of paper, you are going to have a correct image or a count in electronic format. To do that, it is, it is just, we go through. . . Actually, Holsons has designed a system called MIC/VIC and what it is, sort of a way, when most people do data entry, they do double key. That is the only way to verify if [unintelligible]. I'm going to ask someone else, have them verify and match it up. If we get a million pieces of paper, we're going to run them through a counter, we're going to run them through a scanner, then we're going to run them through a counter again. If all three things match up, then we know we're correct, we've got a million pieces of paper. If one of those don't match up, then we go through the process again until it matches up. That is one of the parts that we call accuracy. The other part is OCR accuracy, which means that if I scan in a piece of paper, I want to make sure all the O's are O's and all the A's are A's. To do that, what we use is our OCR process and we go through a lot of checks and balances.
Butterfield: How do you determine the OCR?
Gerlach: Well the OCR engine, that is what OCR is all about. OCR stands for Optical Character Recognition, and it depends on the software, you throw software at it, and if you have three search engines looking at the same letter, it is going to go and try to determine what it is, if it is a zero or an O. If it doesn't get a vote then it is going to kick it back and flag that image or that letter. And you're going to go with the paper and verify what they perceive the images would be. So they will give you, "I think it's an O, I'm not sure it's an O", but it could be an A, it could be a zero. If not then a human is overriding the OCR.
Butterfield: The software flags suspect letters and a human actually looks at that and makes the [unintelligible] and says is this or that, and that goes into the output.
Gerlach: Yes. You've got a zero. And you are going to go and try to fill it in. And usually based on the content of the word and the letters to the left and the right of it you should know exactly what the letter is going to be. So OCR is key in HTFAF forms. Most people are using OCR and ICR. And what it is, it is going to take this piece of paper, I'd scan it through and I'd get 95% accuracy. Well, it means on 100 letters, then I've got five missing. I don't know whether my question mark is an O or an A. There is a wide variety of ways that they do it. They break it down just to that letter. Some people break it down to a word and they'll flag that letter. This O would be in yellow, say, and some of them show underneath, OK it is O, it is 1, it is a P or whatever. And then other ones will show you the paper and you go and double check that it is going to be an O. So that's the different flag processes. And that usually makes a big difference in the costs of how people can do it. You don't go to the paper if you don't have to because it is time consuming.
Butterfield: And the OCR and ICR?
Gerlach: ICR is just Intelligent Character Recognition. The recognition is going to be handwriting, and right now nobody has perfected that because everybody's handwriting is totally different. Also handwriting usually goes into cursive world and with cursive all the letters are joined together so you can't see a true separation. So someone invents an ICR package, they'll be very rich.
Butterfield: What are you looking for in terms of the level of automation in the systems you are using?
Gerlach: Really depends on what the customer wants. Most backfile conversions, the reason why they put it in, are in a large warehouse or warehouses, is that they don't need that information in ten seconds or less. For them to get a simple retrieve, I can find it in 30 seconds or less by filing through 100 pieces of paper. Some people, it is a true total mission-critical documentation and I need to find it in nano-seconds. To them, they are going to spend the extra money and either OCR it or have multiple ways of indexing that document such as name, social security number, what kind of document is this, what does this document contain, is this a PO, invoice, application or request for whatever it is going to be, you name it and specify exactly what it is. For them, it is a matter of cost to get it in the system, and how much they want to spend depends on how bad they need that information back.
Butterfield: What about the level of automation of scanning these documents?
Gerlach: The input process such as automation? It really comes into the speed of your scanner and speed of the software to go into OCR or what we call scan fix, which is despeckle, deskew, decrop. .
Butterfield: What is decrop?
Gerlach: Crop is what is going to happen. Deskew is if sometimes your image is cockeyed. Crop is going to be. . . people want a part of the document blocked out. When you scan in, you are going to have a scan field of an 8½ by 11 field. What happens if you put an 8½ by 14 piece of paper into it, the bottom is going to get cropped out. So you've got to make sure of these parameters. So if you have a mix of 8½ by 11's and 8½ by 14's, what will take place is you set your crop for an 8½ by 14. For 8½ by 11, you are going to see a black band there that will take up a lot of memory for people, so what you want to do is what they call decropping. We're going to take that black border out of there and then make the image the 8½ by 11 and make the image clean so it's not taking as much memory on the CD or on the hard drive. So that's what they mean by decropping.
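The decropping step Gerlach describes can be sketched in a few lines. This is only an illustration under stated assumptions, not the software Holsons uses: the page is assumed to be a bitonal image held as a list of rows, with 1 for a black pixel and 0 for white, and the function name `decrop` is hypothetical. A production scan-fix tool would also have to tolerate speckle inside the black band, which this sketch does not.

```python
def decrop(page):
    """Trim solid-black rows and columns from the edges of a bitonal page.

    `page` is a list of equal-length rows; 1 is black, 0 is white.
    (Hypothetical representation chosen for this sketch.)
    """
    def all_black(pixels):
        return all(px == 1 for px in pixels)

    # Drop solid-black rows from the top and bottom,
    # e.g. the band left when an 11-inch sheet is scanned in a 14-inch field.
    top, bottom = 0, len(page)
    while top < bottom and all_black(page[top]):
        top += 1
    while bottom > top and all_black(page[bottom - 1]):
        bottom -= 1
    rows = page[top:bottom]
    if not rows:
        return []  # the whole scan was black

    # Drop solid-black columns from the left and right edges.
    left, right = 0, len(rows[0])
    while left < right and all_black(row[left] for row in rows):
        left += 1
    while right > left and all_black(row[right - 1] for row in rows):
        right -= 1
    return [row[left:right] for row in rows]


# A 3-pixel-wide "page" with two rows of content and a black band below it.
scan = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 1],
]
trimmed = decrop(scan)
print(len(scan), "rows in,", len(trimmed), "rows out")
```

Removing the band shrinks the stored image, which is the memory saving on the CD or hard drive that Gerlach points to.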
Butterfield: Do you routinely drop documents into a hopper like you'd do on a copier and have them run through automatically?
Gerlach: We usually have a person watching. The scanning world is different than the copying world, like I'm going to put a piece of paper on and it is going to take one after another. The copying world is reams of nicely generated stacks of paper and, say, I want 500 copies? We're on the other end. And after that paper has been around 50,000 times, it has crumpled corners and wrinkles and folded over and everything else. We're going to go stuff it back through a scanner and we're going to have people watching it because there are what we call document jams or things do get skewed or crinkled and you want to watch that. So there's always an operator involved in handling pieces of paper. You have to make sure.
Butterfield: And the output of that process.
Gerlach: Right. That's just. . . documentation. We have the capability of one scanner in an eight hour shift doing about 10,000. And that's, for most people, fast. . That's for image capture, not OCR'ing. Then it goes through other phases. Then you go through the electronic process of despeckle, deskew, decrop, OCR, . . and that's basically the stages. In fact, over there in the other room I can show you the stages.
Butterfield: And just so I get the numbers right, that is for what resolution scan?
Gerlach: 200 dpi.
Butterfield: Binary output?
Gerlach: Yes.
Butterfield: Do you have an issue with having to preserve the original form of the document? Do bound documents come to you? Is that a problem?
Gerlach: Bound documents, there are scanners, if they need them bound and can't break the binding there are basically what you call flat bed scanners or what they call a book scanner. That just slows our production time, so it just comes down to an individual case of hey, I can't break it so we'll hand scan it. Personal preference. You just throw an extra layer on and it means more cost to the person. It is up to them which way they want to do it.
Butterfield: What kinds of costs do you associate with the input system?
Gerlach: As far as costs associated with. . .
Butterfield: What are the equipment costs? Do you have to pay for all equipment up front? Or do we put it in capital and equipment errr? Are there click charges or capital costs?
Gerlach: Well, it is just like a copy machine, it is actually a scanner and in the scanner world, it all depends. Most people, if they are a growing company, they are probably going to lease out the scanner. The one we have which can do 10,000 pieces of paper a day costs about $100,000. Most document conversion places are going to do one of two things, they are going to buy lower end scanners which are slower and throw more people at them. Or you are going to buy higher scanners for bigger prices and throw less people at them to get the job done. So it is really our overhead, it is going to be our equipment, scanners, some PC's and software, and it's going to be people and some programmers and technology to get the job done.
Butterfield: The volume of scanning in this shop, is that something you can share publicly?
Gerlach: Volume, with our two sites, right now we are probably pumping out about 30,000 pieces of paper a day. It ebbs and flows.
Butterfield: I talked to you about quality issues. I talked to you about accuracy and about your requirements for throughput. If you could improve any one of those elements, which things would you improve?
Gerlach: Throughput, of course, versus image. The more documents we can put through in a shorter period of time, the better off we are, which just leads to more profits. Most people came from the world of microfilming, and right now, in the microfilm world you can capture information a lot faster, taking a photographic image. When the speeds of those scanners come up to meet those of the microfilm world, then a lot of things will happen. Better profit margins and actually a lower cost for the customer, which is definitely. . .
Butterfield: Any of your customers tell you that they wish the quality could be better or site specific image quality?
Gerlach: It depends on the application and basically it really varies in price. We have jobs that come out to $10 per page because they expect perfect exact duplication of what's coming back, mirror the fonts, mirror everything else. Then we have people we do it for six cents a page. All they care about is get me an image. Find me close to an image and I'm happy. It depends on what they want. We're doing what their requirements are.
Butterfield: I think you answered most of my questions. You've been really helpful. If you've got just another minute and wouldn't mind showing me around your scanner. . .
Gerlach: Yes. Sure.
Butterfield: Thank you.
[End of Interview]

Anne R. Kenney
Associate Director for the Department of Preservation
Allen Quirk
Scanning Technician
Cornell University
Ithaca, NY
May 1, 1998

Butterfield: Could you tell me about what sort of systems are being used here at Cornell?
Kenney: In terms of image capture, we do some upstairs using XDOD's [Xerox Documents On Demand scanner]. We have moved to outsourcing a good chunk of our material, we basically define our requirements and whoever can meet them, we don't care what kind of system they are using. We are beginning to do more investigation into capturing beyond bitonal and looking at requirements for halftone reproduction, graphical content and what's the kind of resolution, bit depth, enhancement.
Butterfield: Capture beyond bitonal, which means capture beyond one bit per pixel.
Kenney: Yes.
Butterfield: Why go beyond bitonal?
Kenney: Well, a good chunk of material has either gray scale or color information, or subtleties of production that require gray scale reproduction.
Butterfield: Could you describe in more depth what kinds of inputs you're requiring now.
Kenney: Oh. Well, principally photographic materials, works of art on paper, and book illustrations, which is the focus of one of our current investigations which we are doing under contract for the Library of Congress. We are focusing on the most prevalent book illustration types from the 19th and early 20th centuries, for relief, intaglio and planographic processes. And there was quite a range of them before the introduction of the halftone in the 1880's.
Butterfield: How do you judge the quality of reproduction of those works?
Kenney: The quality of the original or the quality of the digital surrogate?
Butterfield: The quality of the digital. How do you determine what is a good scan and what is a bad scan and what are you looking for?
Kenney: We do judgements both on screen and via printouts, and we are looking for levels of information, depending on the requirements we've established. We've identified three levels of information. One is essence, which would be what the normal eye would see at normal reading distances. Thinking about the illustration as part of a larger text, "Where is the content?" Then the detail, what you would see at a close up examination or under slight magnification, say up to maybe 5X, such things as halftones, or similar to the detail of an aquatint. And then evidence of the structure or process used to create the illustration [unintelligible], such as the telltale, you'll get this very fine reticulation of certain processes. It can be exceedingly fine. It can be black lace. This [example] happens to be Calotype. Aquatints have, it is like almost a cracked egg surface that is caused by the gelatin process used. We discern Calotypes from Aquatints based on the character of the reticulation. Does the image file allow for positive identification of the process used. Does the stroke and acid bite of an etching, is it evident in the master file? Does the scoop of the stipple engraving come out?
Butterfield: Are you actually looking to make those judgements from the scan of the image instead of the source document?
Kenney: We look at the source document itself and define those levels of information. If we need to represent that information in the digital surrogate, we know what we need to capture. We can obviously scale from that. And in most cases we probably would. Because you are talking 1000, 1500, 2000 dpi for some of these.
Butterfield: What kinds of color requirements do you have? You mentioned color a minute ago.
Kenney: We're very interested in converting museum objects, works of art on paper, color photographs, and color appearance is important to us. We are following, with interest, the work to define ICC compliant color space for [unintelligible] and the emerging file formats, particularly FlashPix, that have ICC conforming color management capabilities associated with them.
Butterfield: What are the important attributes of color that you want to capture?
Kenney: Well, I'm interested in being able to capture the specific hues, the brightness and darkness, the level of saturation. All of that is important. For many color documents, the color appearance doesn't have to be a total, exact match. Color maps, that sort of thing. It is basically representational color rather than true, kind of appearance color. But for works of art, it is critical for us to be able to get as close to the Chagall blue as possible in the digital file, that kind of thing.
So our color requirements will range across a fairly good spectrum. We are interested in color control from the point of either creating a photo intermediate or direct scan via use of emerging color management systems, the use of targets.
Butterfield: By targets you mean?
Kenney: Gray scale, color targets.
Butterfield: Standards.
Kenney: Yes.
Butterfield: Most of what you described so far has been mostly graphics elements. Is that what is important to you or is text a large part. . . proportion.
Kenney: Oh, text is a good chunk of what we've imaged here. We've converted close to three million pages of text. Our belief. . . at the center of our imaging approach has been, you put the document there, you know your document, you love it, you define carefully what is the significant information being conveyed by that document. And then you create a digital surrogate that represents, fully, that informational content. Because that digital surrogate has the best shot for longevity and utility if it is rich enough to reflect those attributes and is rich enough for processing. While we may be interested in creating a good, rich digital map, digital image of that document, we'll want to be able to process it or create alternative formats or alternative views to meet user needs. We may want to move to a PDF file. We may want to OCR the [unintelligible]. We may want to encode that information. We may want to investigate new technologies for searching across. . .
Butterfield: Your focus then is on capturing the image faithfully so that you can subsequently do those post processing operations?
Kenney: Yes. We don't think they all have to be tied in at once. The most stable of the technologies from our perspective has been in the capture of things. That changes more quickly as technology itself develops. So what we. . . Understanding what it takes to create a good digital image, we created a good rich database of images and then derive alternative views or formats on the fly as users become more sophisticated or as we understand enough what our user wants. I'm interested in the post-processing of raw gray scale information with sufficient resolution so it is not tied to a particular piece of machinery, a particular kind of scanning approach, and also to meet specific needs.
Butterfield: What about the integrity of the original document. Maintaining the physical integrity of the original document. Is that a large issue here?
Kenney: Yes it is. It is a really large issue. As we move beyond basic, brittle printed text you get a good deal of resistance on the part of faculty to destroy the document in the name of preserving its content. So we are very interested in development of solutions for high quality, bound volume scanning. Non-destructive.
Butterfield: You have talked a lot about quality. What about storage requirements of documents?
Kenney: We have developed an approach that puts our digital masters at the heart of the system. So we need a storage system that is capable of providing timely access and for providing timely processing of those files to meet user needs. We are currently investigating alternatives for amalgamating all of our image collections into one storage and serving capability. We'll go out to bid sometime, maybe, next year. I'm interested in RAID, I'm interested in what improvements have been made in HSM, hierarchical storage management. I'm interested in what. . . to take advantage of declining costs in magnetic storage.
Butterfield: Is storage space a big consideration now? Are you making any concessions in terms of the amount of data you are capturing. . .
Kenney: No.
Butterfield: Lossy compression?
Kenney: No.
Butterfield: Quality is more important to you than. . .
Kenney: Yes. Quality and migration are both important.
Butterfield: By migration you mean?
Kenney: That if we were to use a lossy compression scheme, we'd have to not only worry about the file format but the compression process used in migrating to new formats. It is just one more step.
Butterfield: So it would just be another attribute of the persistence of that file format.
Kenney: Right. We're not, you know. . . storage costs are becoming less of a concern, so the need to compromise is reduced.
Butterfield: Tell me about your requirements for a level of automation or the level of operator involvement in the scanning process. How important is that.
Kenney: Well, to this point, it has been very important. Not something. . . so. I mean I would like to see the shift away from item-by-item review, either at the scanner or at the inspection, heavy quality control load, to automated processes. For defining requirements for a rich enough bit stream that may be post processed according to guidelines based on the kind of content that we are dealing with. . . more control by the end user and less human intervention. We'll see more of that as we develop automated methods for image evaluation.
Butterfield: What are the sorts of check that are occurring now?
Kenney: We scan targets to insure that the system is operating correctly. We do 100% quality control on printed facsimiles for text based material. And then we'll check the files if there seems to be a concern with the printouts. We also do a check at the same time to make sure that the files are structured appropriately and stored in the proper fashion. . .
Butterfield: By structure, you mean the ordering of pages. . .
Kenney: The ordering of pages, segments, things like that.
Butterfield: Metadata? What kinds of metadata are you capturing? . .
Kenney: We have more work to do in that area. We have defined requirements for particularly TIFF header information. For basic structuring information for serials and monographs. At the very base level, we can retrieve by page number, variants of page numbers. For serial literature, we want to make sure we provide access down to the article level. That kind of stuff. We do a combination at the point of scan to create references to tables of contents, indexes, and then we provide access to the article level through keying, the beginning and ending of issues, that title information. And beyond that we will. . . we will move to OCR and tight encoding to provide searching capability for text.
Butterfield: What kinds of costs do you associate with this whole process and how do you pay for it. Are the costs capital. . .
Kenney: We primarily have received outside funds for conversion efforts. This most recent project, which was to convert about two million images, both at Cornell and the University of Michigan, the capture costs of. . . the selection, review, preparation, scanning, quality control, media, and base indexing involved . . . which was about fifty cents an image. .
Butterfield: And the level of quality. . .
Kenney: These were 600 dpi bitonals that we were doing for text [unintelligible].
Butterfield: The forms of output that you provide, how do you provide it to end users.
Kenney: Web access.
Butterfield: Turnaround time for scanning? How fast does scanning have to happen.
Kenney: In this last project we did, 100,000 images in about 13 months. We don't like to keep the books off the shelf too long, so I'd probably say that outside of about maybe six months from when you pull it from the shelf to when you either put it back on the shelf or put a replacement on the shelf. We work with outside vendors, we send out a shipment, it has to be returned in a month, and then there is this elaborate negotiation of. . . is the quality fine or if it isn't, rescan.
Butterfield: You said put the replacement back on the shelf, you are scanning and reprinting books. What is the process there?
Kenney: We print on acid-free paper on [Xerox] DocuTech.
Butterfield: So the replacement book is a remanufactured book based on the digital scan.
Kenney: Yes. We also have done some computer output microfilm.
Butterfield: We talked about quality, we talked about costs, we talked about turnaround time, if I had to ask you which was most important to you. . .
Kenney: Quality. We are a preservation program here! [laughs]
Butterfield: And the specific attributes of quality that you'd like to see improved?
Kenney: I'd like better capture of halftones. I indicated earlier, I'd like to see a base level of gray scale with sufficient resolution, and then post processing based on parameters specific to document types. I think that's a real nice way to go so that you are not tied. . . so that your capture is tied to the physical constraints of the item rather than its intellectual requirements. So for instance. . .
Butterfield: The physical image constraints.
Kenney: No, the physical object's constraints. You know, if I have a bound volume I have to use a volume scanner, I don't want to have to disbind it. If I have an oversize map, I have to use a particular type of scanning device or I create a photo intermediate. I want to be able to have a system that has a capability for processing, regardless of how I captured it, in a way that could be clearly defined, that is not so dependent on a machine's native capabilities.
Butterfield: These are all the questions I have here. I'd like to take you up on your offer to see your lab.
Kenney: Yes. Let me walk you up and I'll show you what we're doing. Let me tell you a little about the project that we're doing. As I said, it is with. . . it is under contract with the Library of Congress. And what we did was to work with faculty and curator's advisory committee here to select 22 different samples from commercially printed books and serials from 1850 to about 1920 that were the most prevalent printing during this period. So we have halftones and we have Calotypes and we have simple line. . . we have etchings and we have engravings and photogravure and we have line art and a couple of other. And then we worked with the Advisory Committee to define their requirements at different levels of view, and I've been having a lot of fun working with my ophthalmologist in equating what, you know, when you take that test for reading at 16 inches, you know, how many minutes of arc are over that baby "e", somewhere around. . . what that translates into in terms of the digital equivalency. To define an essence level and then we'll verify that against what our audience, what our viewers would say in a printed facsimile produced from the digital file. I suspect our printing requirements will be higher than our capture requirements for replicating that information. We may capture something at 300 dpi, but print it at 400 or 600 or something to give the appearance of what the human eye sees. So once we've established those requirements, we move it down as low as we can get it and still retain those basic attributes. If we are seeing it in the production process at 1000 dpi, are we seeing it at 800 dpi? I want to see not how high a quality we can get but how low a quality we can get away with and still meet our customer requirements.
Butterfield: So you're looking for necessary and sufficient.
Kenney: Necessary and sufficient. Quality and cost and resolution are not linear. At some point it levels off, adding more dpi or bit depth isn't doing anything more than increasing your costs. So we want to get that curve as. . . we want to hit it right where the curve is level.
Butterfield: So when you said quality was most important, it is not most important in a vacuum.
Kenney: It is quality consonant with what the content of the original is.
When we worked with Xerox on the joint study, we were very interested in 600 dpi capability because we did an assessment of production typography and printing techniques of the last century and a half, looking at the use of metal type and the limitations of that metal type with large print runs. And so printers had to be very careful about how small and closely spaced they made those letters. And I looked at all the common typefaces and they were produced for mass publications at five point type and above. And below that was hand lettered or wonderful evidence of skill, but not made for mass consumption. And so that you look at the size of the smallest material in five point type, it is one millimeter high or greater. And as you look at your requirements for capture, the dpi necessary for that... 24 dots across a 1 millimeter high character gives it. And that translated to a 600 [dpi] capture requirement. That led us to... In the current effort with LC [Library of Congress], the 600 dpi bitonal is not enough for the kinds of illustrations that are present in those books. As we defined those objective... as we objectify the subjective requirements, what is in the analog, then we map to a digital equivalency. And if we can define those, we can begin to define our requirements for conversion, and we are taking it one step further, working with a small high-end imaging company called Picture Elements. We will take one instance of that, which is halftones, and create software for automatically detecting halftone content region, applying the appropriate settings and... to eliminate moire and weirdnesses that will come out of...
Butterfield: Autosegmentation?
Kenney: Exactly. We were impressed with what Xerox had done with the Autosegmentation, but it doesn't work very well on older halftone types. So we're looking at something that would be more flexibly responsive to the kinds of materials that we need to convert. And that would then become a free software that we would put up on some site and then continue to roll out that capability. And then build that suite of software to be able to, from any kind of... What it is going to take to do the Calotype. You know, that very fine reticulation... that kind of stuff. It is probably something like 400 dpi eight-bit stream that is going to do most of what we're interested in. There will be some compromises associated with some of those things, but you know, in the big scheme of things, that is probably going to do it.
Butterfield: The graphic elements that you said are the most challenging content... you talked about five point fonts...
Kenney: Photographic stuff is... [feigns choking] One, it is hard to objectify what is the detail. And two, they are so information rich. In particular, when you start moving into color.
Butterfield: Are Asian fonts a requirement?
Kenney: Yes, we've looked into kanji and we've looked at non-Roman alphabets, and actually the Japanese weren't interested in scanning until it got up to 400 [dpi] because of the... it is very subtle distinctions. Let me show you... I'm doing a little show and tell. This is our favorite font. It is the Bodoni italic four point. And what we use... It was produced by this guy, Mr. Bodoni, in the late 18th century, and was used to print Dante's Divine Comedy in 1796 and was used in the Wild West for Wanted posters. What is nice about Bodoni is, one, the italic is on an angle, which is typical, and two, it is characterized by the exaggeration between the thickness and the thinness of the strokes that are associated with it. If we can capture this, we are real sure we are capturing 99% of the typographical information in printed materials. Most pages don't take a 600 dpi. But we don't want to do page-by-page review. So... we capture it with that, with good assurance that we are getting it.
Butterfield: So if we can capture Bodoni...
Kenney: If we can capture Bodoni italic four point, we are capturing whatever is there.
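As a rough check on the arithmetic behind the 600 dpi figure, the 24 dots across a 1 millimeter high character quoted in the interview can be converted to dots per inch. The sketch below is only a unit conversion. The function name `dots_per_inch` is made up for illustration and is not part of the Cornell or Xerox study.

```python
MM_PER_INCH = 25.4  # definition of the inch in millimeters


def dots_per_inch(dots_across: float, feature_height_mm: float) -> float:
    """Resolution that places `dots_across` samples over a feature that is
    `feature_height_mm` millimeters tall (hypothetical helper)."""
    if feature_height_mm <= 0:
        raise ValueError("feature height must be positive")
    return dots_across / feature_height_mm * MM_PER_INCH


# Figures quoted in the interview: 24 dots across a 1 mm high character.
benchmark_dpi = dots_per_inch(24, 1.0)
print(f"{benchmark_dpi:.1f} dpi")  # close to the 600 dpi figure Kenney cites
```

The result lands just above 600 dpi, which is consistent with the rounded capture requirement Kenney describes.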
It is a bar set higher than the standard printing.
Butterfield: It is a pretty challenging target.
Kenney: It is! Let me take you up and show you our baby lab.
Butterfield: What is the name of your scanning technician?
Kenney: Allen Quirk. Allen, can you talk with Paul about what you've got in terms of the 20 or so different illustrations. Allen may seek your advice. I've been asking him to measure, as best as possible, the finest feature that evidences the process used. So it could be the squiggle of the Calotype, the halftone. This is what he's doing. This is easy because this is line art. It is easy, but as you move beyond, like this is a photogravure, it is very difficult. Those tone processes are very difficult to put your hand on where your resolution requirement is. We're no fools, we start with the text, it is easy to measure. And so we are pressing the limits of that. All we really need is not exactly the squiggles, but to represent in the file the evidence that the squiggles convey that this is a Calotype. That's what we'll be looking for.
Butterfield: So you don't actually need to be at the Nyquist of the image to replicate [the] information, something lower than that.
Kenney: That's right. So they are looking right now at something like 0.04, 0.02 millimeters.
Quirk: That is what we are looking at right now. As we are working through the various etching types and relief printing, we are... going to be 0.08 millimeters to 0.02 millimeters. But we have not gotten to gravure. Photogravure in particular... finer than that.
Butterfield: By the way, do you mind if I record?
Quirk: Oh, that's ok. So that's what we're finding as we are looking at... right now. And then as we get into more difficult ones, I'm sure that we are going to have to use higher magnification and finer measurements.
Kenney: And you are using a 50X.
Quirk: I'm using a 50X [loupe] right now and I'm reaching my limits.
Kenney: I'm going to go, as I'm missing this meeting. Listen, send me e-mail if there are any questions.
Butterfield: Thank you. Thank you very much, I really appreciate it.
Butterfield: The finest structures [you']ve measured so far with a 50X loupe...
Quirk: 50X loupe... measuring that [in] increments down to 0.02 mm. And in etchings I'm finding that. Mezzotint, Calotype, it is going to exceed that. That is beyond that. And photogravure I'm sure will go way beyond that, both in terms of measurement and definition, you know, defining what that element is that we are going to have to capture. Or maybe it is going to be defining a block that will show them the squiggles of the Calotype or the grain of the photogravure. And right now, I don't know if we are going to try 24 bit, Anne may have told you already. I don't believe...
Butterfield: Well, she told me about... both monochrome and color, and I suppose color presumes 24 bit. Not 24 bit per pixel, right?
Quirk: Well, that is what I'm asking, I don't know what Anne has in mind for this.
Butterfield: The types of images you are looking at?
Quirk: Halftones, steel engravings, copper engravings, etchings, photogravure, mezzotint, Calotype and Anastatic print.
Butterfield: Anastatic print?
Quirk: I'm not familiar with it, and given the implication of the name, I'm surprised to read it was used in 1861. But I'm still in my education process in this.
Butterfield: What else can you tell me about your role in this?
Quirk: Well, as I said, I have not gotten to the limit of what I am able to do at this exact point. We are just getting to the limits of what we are going to be able to measure with this. We're having someone who will be in on Monday that is going to be able to explain the processes so that between the two of us we will be able to define what is that element that determines that this is a steel engraving as opposed to an engraving. Or what is the element that [determines] it was an etching instead of a steel engraving? Does one taper off in a different way and therefore we need something much finer, that type of thing. And then there is going to be the problems with things such as this.
Butterfield: Are you worrying about the angles at which these structures appear as well?
Quirk: I think that we'll be finding out if we have to. I know that some of our halftones are not going to be on a 45 degree angle, which is usually what we find now. We've been finding halftones on other angles.
Butterfield: All of what I see here is monochrome.
Quirk: That is what we are looking at right now. Which suggests to me... it was intended to [be] monochrome, it was printed with one color ink, but as... and actually, you can tell, it is not monochrome. As you start looking closer and closer, there is definite depth. I think the plan right now is to consider these gray scale and capture monochrome. And one of the things we are running into is finding, for example here, we measure a line width at 0.02 mm. At 0.04 mm we will have a light line, and then a dark line, so at that resolution, we are not into bitonal, even though this is printed, presumably with one color ink, and this one here is a steel engraving. A very fine steel engraving. And the other thing in terms of a scanning challenge, which I think of as a technician, is that I know that I can pick up 0.04 mm even if it is fairly light, as long as I don't have to pick up the same area of lightness in a dark area. In other words, I can have that come out and we can start to get this adequately, but then we are not going to be getting the detail in the black... what lights on dark...
Butterfield: So you need both the resolution and the dynamic range to get the...
Quirk: Well, in this particular case, what I am suggesting is that the resolution is cheated towards picking up darkness. In other words, if there is something there, it wants to make sure it gets it at the expense of picking up very small white area.
If this were bitonal, you wouldn't care about dynamic range.
Butterfield: Because you just captured the structure wherever it existed.
Quirk: If it was just black or white, right. You could just capture the structure at whatever level of detail you needed. You wouldn't have to worry about dynamic range. When we measure something at 0.04 mm, we are not measuring down to its bitonal structure yet. We're measuring down to maybe the steel plate structure, but a thinner scratch on the steel plate, it will lay less ink on the paper, and as a result it will look gray or black, light or dark. So either we're going to have to measure further down or we are going to have to accept that that is all we can have. If we can resolve your 0.02 mm, which means two line pairs across it... And that is just not going to happen. But if we can resolve that in gray scale, that would be adequate. That is what I'm assuming that we are going to determine, that you don't need to go the next step down to say, this is made up of little hatched black lines, and this is made up of heavier black lines, within that very fine line, they each have different structural characteristics.
[Tape ends] [End of interview]
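To put the loupe measurements quoted above (0.08, 0.04 and 0.02 mm) in scanner terms, the sketch below converts a feature width into a sampling resolution. It assumes at least two samples across the smallest feature, a Nyquist-style rule of thumb that is an assumption made here for illustration and not a requirement stated by the project. The function name `sampling_dpi` is likewise hypothetical.

```python
MM_PER_INCH = 25.4  # definition of the inch in millimeters


def sampling_dpi(feature_mm: float, samples_per_feature: float = 2.0) -> float:
    """Resolution needed to place `samples_per_feature` samples across a
    feature `feature_mm` millimeters wide. The default of two samples per
    feature is an illustrative assumption, not a figure from the interview."""
    if feature_mm <= 0:
        raise ValueError("feature width must be positive")
    return samples_per_feature / feature_mm * MM_PER_INCH


# Feature widths mentioned in the interview, measured with a 50X loupe.
for width_mm in (0.08, 0.04, 0.02):
    print(f"{width_mm:.2f} mm feature -> about {sampling_dpi(width_mm):.0f} dpi")
```

Under that assumption the 0.02 mm structures would call for roughly 2540 dpi, which may help explain why Quirk expects to settle for representing such detail in gray scale at a lower resolution.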