Graphic arts requirements for electronic image management systems for the library and corporate information center

Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
5-1-1998
Paul Butterfield
Recommended Citation
Butterfield, Paul, "Graphic arts requirements for electronic image management systems for the library and corporate information center" (1998). Thesis. Rochester Institute of Technology. Accessed from
Graphic Arts Requirements for Electronic Image Management Systems
for the Library and Corporate Information Center
by
Paul M. Butterfield
A thesis project submitted
in partial fulfillment of the
requirements for the degree of
Master of Science in the School of
Printing Management and Sciences
in the College of Imaging Arts and Sciences
of the Rochester Institute of Technology
May 1998
Thesis advisor: Professor Frank Cost
School of Printing Management and Sciences
Rochester Institute of Technology
Rochester, New York
Certificate of Approval
Master's Thesis
This is to certify that the Master's Thesis of
Paul Marcius Butterfield
with a major in Graphic Arts Publishing
has been approved by the Thesis Committee as satisfactory
for the thesis requirement for the Master of Science degree
at the convocation of
May 1998
Thesis Advisor
Frank Cost
Graduate Program Coordinator
Marie Freckleton
Director or Designate
Brian Bartlett
I, Paul Butterfield, give Wallace Memorial Library
of the Rochester Institute of Technology
permission to reproduce in part, or in full, all parts
of the submitted thesis project.
Table of Contents
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1. Introduction
   Legacy documents - Scope of research
   Legacy document content - What attributes need to be captured?
   Legacy document format - Form follows function
   Legacy document transformation - What tools are available? What is the work process?
2. Review of the Literature
   Scanners
   Computer Platform and Software
   Output Devices
   Endnotes
3. Project Goal
   Define requirements for an electronic image management system:
   Library/corporate information center market
   Graphic Arts quality focus
   Legacy document capture
   Process documents
   Republish in required form
4. Methodology
   Quality Function Deployment (QFD) as a requirement gathering tool
   Market definition
   Interview process
   Affinity grouping
   Customer requirements
   Technical response
   House of Quality
   Endnotes
5. Results
   QFD Requirements
   QFD Technical Responses
   QFD House of Quality
   Detailed discussion of requirements
6. Summary and Conclusions
Bibliography
Appendix A - Interview Transcripts

LIST OF FIGURES
1. Typical Electronic Image Management System
2. QFD House of Quality

LIST OF TABLES
1. QFD Requirement Categories and Descriptions
2. QFD Technical Responses and Descriptions
3. QFD House of Quality (Part 1 of 2)
4. QFD House of Quality (Part 2 of 2)
ABSTRACT
The value of printed documents is increased when they are transformed into digital form. The image quality of those documents which are transformed by an Electronic Image Management (EIM) system is much poorer than is typical in the Graphic Arts field. This research sought to understand whether the poor quality observed was due to the past constraints of slow computing power, high storage costs, and narrow network bandwidths, or whether fundamental quality requirements exist only in the Graphic Arts. Users of EIM systems in Libraries and Corporate Information Centers were interviewed to assess their quality requirements relative to competing requirements for cost, turnaround time and speed. The Quality Function Deployment (QFD) tool was used to gather and process user requirements. QFD was chosen because of the methodical structure it provides for the interview process and subsequent analysis. The resulting requirements were organized into a QFD "House of Quality", arraying customer requirements against technical responses. Subsequent analysis of the House of Quality and transcripts of the customer interviews suggests that requirements for high speed and low cost predominate over Graphic Arts quality for most users. The focus on speed and cost was most obvious for those applying EIM to commercial purposes in Corporate Information Centers. While Library users had a shared interest in speed and cost, they have a specialty application of EIM: preservation and conservation. In this application, EIM is used to preserve and save printed documents that are deteriorating. For this specialty application, quality is paramount. While cost and speed are still important, quality cannot be sacrificed for cost or speed.
CHAPTER 1
INTRODUCTION
The way we share information is rapidly changing. The pervasive growth of the Internet, the World Wide Web, and network applications is revolutionizing information-sharing, making information more portable and more readily accessible. The weakest link in this new information-sharing paradigm is that vast quantities of documents are available only as printed "hardcopy"; in that form they are not viable in the context of our electronic information infrastructure, and documents that exist only in printed form cannot readily be reused. Changes in the way documents are printed are also being driven by technology: direct digital printing is making it possible to print small quantities of customized documents at the point of need.
This research will explore the options for transforming legacy documents into device-independent electronic forms, including Adobe PostScript and Portable Document Format (PDF), so that we can more fully realize their benefits in the information age. Electronic Image Management (EIM) systems are the means by which legacy documents become electronic. These systems are composed of three fundamental elements: scanner, computer, and output device. Figure 1 diagrams such a system:
[Figure 1. Typical Electronic Image Management System: scanner, computer, printing system, and CD-ROM writer]
Note from the example one of the fundamental advantages of an EIM system: the document can be transformed into a different form, in this case, to CD-ROM.
Legacy documents - Scope of research
The term "document" as applied to printed material can be used to describe a staggering variety of things. Printed information of any sort (book, magazine, manual, brochure, flyer, memo, letter, etc.) might be considered a document. The scope of this research will be limited to electronic capture of legacy documents in libraries and corporate information centers. This research will determine, for this venue, what kinds of documents would have added value if they were available in electronic form, and will explore system requirements and workflow to capture these documents electronically.
Legacy document content - What attributes need to be captured?
Before choosing a method for transforming a legacy document into electronic form, it is important to consider which attributes of the document need to be captured. There are many attributes of any legacy document, some basic and some esoteric. The most basic level is the information content of the document; in most cases, this is conveyed largely by the raw text. However, there is much more to a document than just its text. The physical page layout, fonts, pictorial quality, color, and the paper on which it was printed may also be important. In some cases, even the document's feel, weight, condition, and beauty may be important. This research will examine the feasibility of capturing these attributes.
Legacy document format - Form follows function.
Decisions about the quality, cost, speed and format in which the document is captured will be fundamental to any subsequent transformations of the document. The new uses to which the electronic form of the document will be put are also important factors in making decisions about format. Is it important that users be able to search the document electronically? Will the document be readable by the target audience? Are there limitations in the types of languages, fonts, formats, or media that must be considered? Are there constraints of information bandwidth that must be considered? This research will examine each of these questions in detail.
Legacy document transformation - What tools are available? What is the work process?
Once decisions about attributes and formats have been made, tools must be selected and processes defined for capturing documents. There are many types of scanners available. Scanners vary in their ability to capture the quality of page images, and they differ in their ability to handle the sizes, shapes and formats of legacy documents. There are software packages for transforming the image of a document into editable electronic form, for ensuring that absolute color characteristics of the document are captured, for capturing document printing and finishing requirements, and for manual or automated fixing of errors of transformation. There are also new systems that combine scanners and software packages to simplify the process of capture. To effect transformation of legacy documents into different forms of electronic documents, a work process must be defined. This research will define work processes which can be used to achieve this, and describe how these tools can be combined and used.
CHAPTER 2
Review of the Literature
As this research seeks to define the customer requirements for a system to handle legacy documents, prudence suggests a review of research on the subject. Because this work also seeks to define real technical responses to the customer requirements, the availability and capability of system components must also be researched and understood. The literature review which follows is organized by the component elements of a legacy document system: Scanner, Computing Platform/Software, and Output Devices.
Scanner
Scanning requirements of Electronic Image Management (EIM) systems are most fully collected by the Association for Information and Image Management (AIIM). The Association has a variety of publications available on scanning for EIM; these and other publications will be cited. Although this work will attempt to define the future requirements for EIM systems, previously gathered requirements for EIM may help to establish the baseline.
In the EIM market, 200 spots per inch (spi) is the typical resolution used today for document capture.1 More general information is that common resolutions are 100 spi for "marginal" quality, 200 spi for "satisfactory" to "good" quality, and 400 spi for "excellent" quality. Quality improves with increasing resolution, but scanning storage requirements increase with the square of the resolution, though compression schemes usually minimize this effect. Also noted are the advantages and disadvantages of automated scanning over manual data entry.2 With current technologies, EIM scanning systems designed to replace manual data entry cannot do so if human recognition of document type is needed.
More detailed information has been written on the specific requirements for EIM scanning, including illumination, reflection, focus, filtering, digitization, gray scale conversion, etc.3
Illumination must be of sufficient intensity for the scanner's sensors to capture the darkest image detail. As most scanners used for EIM are based on Charged Coupled Device (CCD) arrays, a high level of illumination is required. Uniformity of illumination across the platen must be held constant, and compensation for lamp intensity changes due to aging may also be important. Illumination angle and direction affect the appearance of shadows when scanning originals that are not perfectly flat, for example, bound books or "paste-ups". They can also cause unwanted reflections from shiny pencil or inks, or from documents printed on coated stocks. Suppression or minimization of reflections from the field outside the object being scanned may also be important.
Filtering may be used to permit "dropout" of specific colors or of color content that is not desired. Proper filtering may also allow the scanner to emulate more faithfully the properties of the human eye, minimizing metameric color errors; improper filtering may produce unpredictable results when specific color responses are desired. Filters also have an effect on the range and levels of lightness that may be captured. In monochrome applications, if the originals have color content, providing a panchromatic response will require attention to the filtering of a scanner.
Digitization becomes important because CCD scanners work on the principle of analog-to-digital (A-D) conversion. The "bit depth" of the conversion governs the number of levels of intensity information that may be captured. For example, a scanner with an 8-bit A-D converter can capture 256 levels of intensity. A scanner with a 10-bit A-D can capture 1024 levels. Both analog design and "bit depth" must be matched to the originals to be scanned.
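The relationship between A-D bit depth and the number of intensity levels can be sketched in a few lines. This is an illustration under simple assumptions (a linear converter and an analog reading already normalized to the range 0 to 1); the function names are hypothetical and do not describe any particular scanner.

```python
def intensity_levels(bits: int) -> int:
    """Distinct intensity levels an A-D converter with `bits` of output can encode."""
    return 2 ** bits

def quantize(reading: float, bits: int) -> int:
    """Map a normalized analog reading (0.0 = black, 1.0 = white) to a digital code.

    Assumes a linear converter; real scanners may apply tone curves first.
    """
    if not 0.0 <= reading <= 1.0:
        raise ValueError("reading must be normalized to the range [0, 1]")
    return round(reading * (intensity_levels(bits) - 1))

for bits in (1, 8, 10, 12):
    print(f"{bits:2d}-bit A-D: {intensity_levels(bits)} levels")
```

Each additional bit doubles the number of levels, so a 10-bit converter divides the same analog range four times more finely than an 8-bit one.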
"Gray scale conversion" is the term EIM references use to describe the process of transforming an image from a large bit depth to a small one. Although it is possible to save an image in its high bit depth form, this is not common in EIM systems. Currently popular EIM systems usually transform "gray scale" images to binary, or one-bit, form. Usually the high bit depth information is not stored, even temporarily, but is converted in real time; when this is done, seven-eighths or more of the image information is discarded. The process used for this transformation might be thresholding, adaptive thresholding, dithering, halftoning, stochastic screening, or some combination of these or other methods.4 Most of these processes are well known in the graphic arts and need not be treated in depth here.
There is little information in the literature about saving high-bit-depth "gray" information. It is dismissed as unnecessarily costly, both in terms of storage costs and communication time. Whether these assumptions are still valid in the face of technology improvements in storage and communications will be tested by this research.
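Two of the conversion methods named above, thresholding and adaptive thresholding, can be sketched as follows. This is a minimal illustration assuming an 8-bit image held as a list of rows, with 0 as black and 255 as white; the neighbourhood-mean rule used for adaptive thresholding is one common variant, chosen here as an assumption, and dithering, halftoning and stochastic screening are not shown. The last function shows where the "seven-eighths" figure comes from when 8-bit data is reduced to one bit.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit gray values: 0 = black, 255 = white

def threshold(img: Image, t: int = 128) -> Image:
    """Global thresholding: 1 (white) where the pixel is >= t, else 0 (black)."""
    return [[1 if p >= t else 0 for p in row] for row in img]

def adaptive_threshold(img: Image, radius: int = 1) -> Image:
    """Adaptive thresholding against the mean of each pixel's neighbourhood.

    The window is (2 * radius + 1) pixels square, clipped at the image edges.
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            row.append(1 if img[y][x] >= local_mean else 0)
        out.append(row)
    return out

def fraction_discarded(src_bits: int, dst_bits: int = 1) -> float:
    """Share of each pixel's bits thrown away when the bit depth is reduced."""
    return 1 - dst_bits / src_bits

# A dark detail (50) on a dim, evenly lit background (100)
page = [[100, 100, 100],
        [100,  50, 100],
        [100, 100, 100]]
print(threshold(page))           # fixed threshold: every pixel goes black
print(adaptive_threshold(page))  # only the dark detail stays black
print(fraction_discarded(8))     # 8-bit to binary
```

In this small test image a fixed threshold of 128 renders the whole area black, while the adaptive rule keeps the dim background white and the detail black; either way, seven of every eight bits per pixel are discarded.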
Currently popular scanners for the EIM market are predominantly black and white, with binary output. Bell & Howell is the market leader5 for EIM scanners. A typical EIM scanner is the Bell & Howell Copiscan 8080S. This scanner is capable of scanning 80 pages per minute (ppm) in portrait mode and 125 ppm in landscape mode, independent of resolution. It has an optical resolution of 400 spi. Though it can output bit depths up to eight, its principal output is binary. It has built-in hardware for real-time image processing, including contrast enhancement, color dropout, and small angle skew correction. An integral document handler is described in terms of its reliability, throughput and durability.6 Competitors like Eastman Kodak have similar capability.7
This is dramatically different from the Graphic Arts market, where manufacturers like Agfa and Microtek advertise full color scanners with bit depths of 10 or 12 bits per channel for red, green and blue.8 A resolution of 400 spi, a common high quality resolution for the EIM market, is the low end of the scale for graphic arts scanners. A typical Graphic Arts scanner is the Agfa DuoScan. This scanner has an optical resolution of 1000 x 2000 spi. It captures red, green and blue channels, each with a bit depth of 12 bits per pixel. It has no advertised real-time image processing, as this is typically done in the graphic arts process via powerful software packages like Adobe PhotoShop.9
A comparison of the image information collected by the two scanners is revealing. At its best resolution, the Bell & Howell EIM scanner can output 0.16 megabytes of image information per square inch, while the Agfa Graphic Arts scanner can output 18 megabytes per square inch. Because the Graphic Arts scanner is capturing 100 times the image information, it is capable of much higher quality than the EIM scanner.
Another element of comparison is speed. The Agfa Graphic Arts scanner lists its speed as 10 milliseconds per scan line (ms/line) for monochrome and 13 ms/line for color. For a landscape scan of 8.5 inches at 1000 spi, 10 ms/line translates to a rate of 0.7 ppm. This is an enormous difference in speed; the EIM scanner is more than 100 times faster than the Graphic Arts scanner.
No further information was uncovered in the literature about the balance between quality and speed for EIM systems. Both seem to be emphasized in scanner promotional literature. This tradeoff will be explored further as part of this research.
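The storage and speed figures in this comparison can be checked with simple arithmetic. The sketch assumes uncompressed data, equal resolution in both directions, and no paper-handling overhead in the Graphic Arts scanner's throughput; treating the Bell & Howell figure as 400 spi at its maximum bit depth of eight is an assumption, and the function names are illustrative.

```python
def megabytes_per_square_inch(spi: int, bits_per_pixel: int) -> float:
    """Uncompressed image data for one square inch, in megabytes (10**6 bytes)."""
    pixels = spi * spi  # storage grows with the square of the resolution
    return pixels * bits_per_pixel / 8 / 1e6

def pages_per_minute(ms_per_line: float, spi: int, scan_length_in: float) -> float:
    """Throughput of a line-at-a-time scanner, ignoring paper handling."""
    lines = spi * scan_length_in                   # scan lines for one page
    seconds_per_page = lines * ms_per_line / 1000  # milliseconds -> seconds
    return 60 / seconds_per_page

# Assumption: EIM scanner at 400 spi and eight bits per pixel
eim_mb = megabytes_per_square_inch(400, 8)
# Doubling the resolution quadruples the storage
growth = megabytes_per_square_inch(400, 1) / megabytes_per_square_inch(200, 1)

# Graphic Arts scanner: 10 ms/line monochrome, 1000 spi, 8.5 inch landscape scan
ga_ppm = pages_per_minute(ms_per_line=10, spi=1000, scan_length_in=8.5)
eim_ppm = 80  # Bell & Howell Copiscan 8080S, portrait mode

print(f"EIM scanner, 400 spi x 8 bits: {eim_mb:.2f} MB per square inch")
print(f"200 -> 400 spi storage factor: {growth:.0f}x")
print(f"Graphic Arts scanner: {ga_ppm:.1f} ppm; "
      f"EIM scanner is {eim_ppm / ga_ppm:.0f} times faster")
```

At 1000 spi the 8.5 inch scan takes 8,500 lines, or 85 seconds at 10 ms/line, which is about 0.7 ppm; 80 ppm is therefore more than 100 times faster.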
Computing Platform/Software
Though the computing platform is the heart of a legacy document system, the selection of hardware is a fundamental outcome of system and software requirements. Computing hardware has become essentially a commodity: Moore's law predicts a doubling of computer hardware capability every 18-24 months,10 so periodic hardware upgrades are likely. More important than the hardware, then, is the selection of software which meets customer requirements and gives life to the hardware, in an architecture that will allow the user to maintain continuity in an environment where hardware upgrades are likely.
Avedon11 classifies software into categories, including Operating System, low-level scanning software, Application software, Utility software, etc. Because document processing speed, capability, format, and user interface are the primary requirements cited by users, application software is the focus of this section. Specialized software where intensive image processing operations are performed is often proprietary, as it is an element of competitive advantage. Scanning, the conversion of paper documents to electronic form, accounted for 43% of the effort spent on a typical imaging application.12
Little information was found on scanning software for EIM systems as a standalone offering, but references to TWAIN13 and other scanner interface standards are available. Scanning software for EIM systems is relatively simple: documents are scanned at relatively low resolution, in monochrome, with some amount of automated adjustment for contrast and skew. Image processing is done in real time as an integral part of the scanner, and the output is stored as binary raster images.14 Utility software with additional image processing capability is available in several software packages. The literature also shows algorithms that promise improvements in Optical Character Recognition rates and speed, but because this work focuses on the graphic arts elements of EIM systems, these are not germane to this research.
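The effect of an 18-24 month doubling period over the service life of a legacy document system can be sketched as below. The constant doubling rate and the five-year horizon are illustrative assumptions, not figures from the literature reviewed here.

```python
def capability_multiple(months: float, doubling_months: float) -> float:
    """Relative hardware capability after `months`, assuming a constant doubling period."""
    return 2 ** (months / doubling_months)

# Illustrative five-year (60-month) system life at both ends of the quoted range
for period in (18, 24):
    print(f"doubling every {period} months: "
          f"about {capability_multiple(60, period):.1f}x after 5 years")
```

Under either assumption, hardware bought at installation is several times less capable than what is available a few years later, which is why continuity of the software architecture matters more than the initial hardware choice.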
Popular application software packages for Optical Character Recognition include Adobe Acrobat Capture15 and Xerox TextBridge Professional.16 The TextBridge product claims to transform printed documents into electronic form in 41 output formats while preserving page layout, including pictures and tables. Output formats include ASCII, many popular word processing, spreadsheet, and database applications, PostScript, Portable Document Format (PDF) and Hyper Text Markup Language (HTML). Acrobat Capture makes similar claims for format-preserving transformation, but a greater emphasis is placed on font fidelity in its advertisement. The sole output format for Acrobat Capture is PDF, in three different forms: Normal, Image only, and Image and Text. In the Normal format, images within the document are captured as bitmaps and text as formatted electronic text, permitting searching of the original document with a small file size, but with the risk of recognition errors. In the Image only mode, only the full-page bitmap is saved, preserving the image content without risk of recognition errors, but providing no searching capability and creating large file sizes. In the Image and Text mode, both the formatted electronic text and the full-page bitmap are saved, allowing for searching without visible recognition errors, but with very large file sizes. The TextBridge product supports the first and third of these options.
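The tradeoffs among the three Acrobat Capture forms can be summarized as a small decision table. The sketch below is not part of Acrobat Capture or TextBridge; it is a hypothetical helper that encodes only the properties stated above (searchability, exposure to recognition errors, and relative file size) and picks the smallest file that satisfies a user's needs.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class CaptureMode:
    name: str
    searchable: bool
    shows_ocr_errors: bool  # can recognition errors appear in what the reader sees?
    size_rank: int          # relative file size from the text: 1 = small, 3 = very large

# Properties of the three Acrobat Capture PDF forms, as described in the text
MODES: List[CaptureMode] = [
    CaptureMode("Normal", searchable=True, shows_ocr_errors=True, size_rank=1),
    CaptureMode("Image only", searchable=False, shows_ocr_errors=False, size_rank=2),
    CaptureMode("Image and Text", searchable=True, shows_ocr_errors=False, size_rank=3),
]

def choose_mode(need_search: bool, tolerate_ocr_errors: bool) -> CaptureMode:
    """Return the smallest-file mode that meets the stated needs (hypothetical helper)."""
    candidates = [
        m for m in MODES
        if (m.searchable or not need_search)
        and (tolerate_ocr_errors or not m.shows_ocr_errors)
    ]
    return min(candidates, key=lambda m: m.size_rank)

print(choose_mode(need_search=True, tolerate_ocr_errors=False).name)
```

For example, a user who needs searching but cannot accept recognition errors in the displayed page is directed to the largest form, Image and Text, so a quality requirement translates directly into storage cost.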
Output Devices
Output devices are the final element of a legacy document system. They translate the document from electronic form into a form suitable for end use. Hardcopy output systems covering a wide range of speed, resolution, color capability, and quality are available from major manufacturers like Xerox,17 Xeikon,18 Hewlett-Packard,19 Canon,20 and others. Up-to-date specifications of current printer products are readily available via manufacturers' web pages, so no review of these specifications is necessary here.
Requirements for this portion of an electronic image management system are not well defined. Because the scanning portion of a legacy document system is often slower and more prone to quality loss, emphasis is placed on input, not output. In eleven case studies of EIM systems, no reference was made to printing technology.21 In EIM literature, the term "output" is always applied to scanning, processing, or file transfer. This does not imply that printing is unimportant, but that other elements are limiting; printing is rarely the constraining element of an EIM system today. This may change with trends toward color and color fidelity. Resolution requirements are 400 spi for most systems, and if this research suggests that increases in quality or speed are required by customers, high-speed, direct digital systems like the Xerox 6180 for monochrome or the Xeikon DCP-50 for color can produce 600 spi output with excellent quality.
Alternate media like CD-ROM are now in common use. Information on the common formats is available in summary references.22 These media include CD-ROM (Compact Disk-Read Only Memory), CD-R (Compact Disc-Recordable) and WORM (Write Once Read Many) disks. CD-ROM's are a widely distributed format for electronic publication. They are a read-only system, produced from a master in a single-batch write process, and use a constant linear velocity system for data access. CD-R's are similar in form to CD-ROM's, but are writable and allow incremental writing of new information to disk. They are more cost-effective for short runs or individual publications, as they do not require the mastering process. WORM disks are available in several sizes, usually in a dedicated drive, and use a constant angular velocity system for data access. Rewritable optical formats are less popular with EIM systems. Rewritable optical disks typically use a magneto optical technology to permit rewriting of information, but EIM information is usually archival, so the rewrite feature is not necessary. Up-to-date specifications of CD recorders and similar devices are available from manufacturers like Ricoh,23 Sony,24 and others.
All of the alternate media described here provide error checking and preserve data integrity, and all are means of capturing numerical digital information. As such, there is no image quality aspect critical to selection of an alternate media format, given this study's focus on graphic arts quality requirements. Within this study, any reference to alternate media will accommodate customer needs for commonality with current systems.
Endnotes
1 Avedon, Don M., Introduction to Electronic Imaging, 3d ed., (Silver Spring, Association for Information and Image Management, 1996), 67.
2 Head, Robert, Document Management: The Essentials, (Silver Spring, Association for Information and Image Management, 1997), 3.
3 Black, David B., Document Capture for Document Imaging Systems, (Silver Spring, Association for Information and Image Management, 1996), 12.
4 Stofel, James C., Graphical and Binary Image Processing and Applications, (Dedham, Artech House, 1981), 289.
5 WROC TV8 Evening News - May 31, 1998.
6 http://www.bhscanners.com/opening.html - April 21, 1998.
7 http://www.kodak.com/daiHome/scanners/scanners.shtml - April 21, 1998.
8 http://www.microtekusa.com/, http://www.agfahome.com - April 21, 1998.
9 http://www.adobe.com - April 21, 1998.
10 "In 1965, Gordon Moore was preparing a speech and made a memorable observation. When he started to graph data about the growth in memory chip performance, he realized there was a striking trend. Each new chip contained roughly twice as much capacity as its predecessor, and each chip was released within 18-24 months of the previous chip." (http://www.intel.com/intel/museum/25ANNTV/hof/moore.htm - April 21, 1998)
11 Avedon, 92.
12 Thornton, May A., Electronic Image Management, Case Studies, (Silver Spring, Association for Information and Image Management, 1993), 41.
13 "Interesting Name. Unusual for computer acronyms, TWAIN has no real meaning - it simply stands for Tool Without An Interesting Name. Initially named SAPI - for Scanner Application Programming Interface, TWAIN is the industry standard protocol for scanning and acquiring graphics from software applications. The idea behind TWAIN is to allow any TWAIN-compliant software to talk to any TWAIN-compliant hardware. TWAIN is an API standard for input devices such as scanners, framegrabbers and digital cameras, which provides across-the-board device independence and compatibility between scanners and software. The specification's open industry interface allows hardware developers and applications programmers to support a wide range of devices by writing one standard device driver. The TWAIN specification was developed by the Working Group for TWAIN, a consortium of hardware and software vendors that includes Aldus, Caere, Eastman Kodak, Hewlett-Packard, and Logitech. The specification was released in spring 1992 and lets scanner manufacturers write a single driver that can work with any Windows application that supports the TWAIN standard." (http://www.spco.com/Techsupp/HM/1902.htm - April 21, 1998)
14 Avedon, 15.
15 "Adobe Acrobat Capture Software turns everyday business and 'legacy' printed documents into accurate, searchable electronic files that look exactly like the printed page" (http://www.adobe.com/ - April 21, 1998)
16 "TextBridge Pro 98 is a full-featured, highly accurate and easy to use document recognition package" (http://www.xerox.com/scansoft/tbpro98win/ - April 21, 1998)
17 http://www.xerox.com - April 21, 1998.
18 http://www.xeikon.be/ - April 21, 1998.
19 http://www.hp.com/peripherals/main.html - April 21, 1998.
20 http://www.usa.canon.com/corpoffice/printers/index.html - April 21, 1998.
21 May, 6.
22 Avedon, 23.
23 "Ricoh's MP6200 external ATAPI CD-RW drive is the best way to preserve, archive and retrieve data of every type, and its superior design and performance ensure reliable writing and reading of your critical information for years to come." (http://www.ricohcpg.com/ - April 21, 1998)
24 "For many applications in finance, medicine, government, and business, permanent, secure data storage and long archival life are critical factors in choosing a data storage medium." (http://www.ita.sel.sony.com/support/storage/faqs/worm.html - April 21, 1998)
CHAPTER 3
PROJECT GOAL
This research will use the Quality Function Deployment method to better define the requirements for an electronic image management system. This research will be limited to the market segment of libraries and corporate information centers. Though general requirements will be gathered, emphasis will be placed on graphic arts quality. This research will attempt to determine whether the low quality of many current electronic image management systems is due to minimal quality requirements or to the limited processing capability of popular systems. Requirements will be allocated to each stage of electronic image management: document capture, document processing, and republication in the required form. A House of Quality will be constructed which will array customer requirements against a technical response to those requirements. A technology feasibility analysis will be completed, which will project the ability of image management systems to meet the high quality requirements expected by customers.
CHAPTER 4
Methodology
This research seeks to analyze the requirements of individuals who wish to capture legacy documents in libraries and corporate information centers. Quality Function Deployment (QFD) will be used as a tool for defining a system that will meet their requirements, and the QFD methodology for gathering requirements will be specified in this chapter.
QFD is nothing more than a structured method of gathering the wants and needs of customers and applying them in product development. Originating in Japanese industry around 1972,1 QFD has grown to include applications in other parts of the world, including the United States. Most fundamental to the QFD method is identification of the target market, which shapes the entire requirement gathering strategy.2 This research will focus on libraries and corporate information centers, and contacts have been established with the managers of these facilities in the Rochester area.
Practitioners of QFD place great emphasis on personal visits with customers.3 Personal interviews, rather than telephone interviews, will be conducted with customers in their workplace. With face-to-face communication, the richness of dialog, the detailed insights that can be gained from observation of the customer's workplace, and the insights gained by first-hand communication are real advantages. Open-ended questioning techniques will be used to induce customers to share requirements in their own words and to prevent undue influence by the interviewer's preconceptions. In order to capture customer requirements most clearly and literally, interviews will be tape recorded and transcribed. This will minimize note taking and permit an open dialog.
After collecting
for
the requirements
root customer requirements.
These
quotes will
be
customers together.
sorted
These
from
Customer
by a process
can then
be
about ten
customers, the resulting interview transcripts
verbatim quotes of
called
their needs will be extracted from the
affinity grouping,
which collects
summarized and organized
into
a
list
tables"
The
the
next
"House
of
step in the QFD method is the
construction of
Quality". In this matrix, this list
of
"quality
fundamental
13
like
requirements
be
analyzed
transcripts.4
from many
of customer wants and needs.
or matrices, the
customer wants and needs
Figure 2-1 below diagrams this:
needs.5
technical response to those wants and
will
is
first
of which
is
called
arrayed against a
[Figure 2. QFD House of Quality: Customer Needs and Benefits; Technical Response; Relationship Matrix; Planning Matrix; Technical Matrix]
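The affinity grouping step can be pictured as a simple data transformation. The sketch below is a hypothetical illustration, not the procedure used in this study, where grouping was done by hand. It assumes each verbatim quote has already been tagged with a category by the analyst, normalizes the wording, keeps a single instance of each repeated requirement, and collects like requirements together. The names `normalize`, `affinity_group`, and the sample quotes are illustrative only.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case and collapse runs of whitespace so trivially different wordings match."""
    return " ".join(text.lower().split())


def affinity_group(tagged_quotes):
    """Cluster (category, requirement) pairs, keeping one instance of each repeat.

    The category label on each pair is assumed to have been assigned by a
    human analyst; this code only removes repeats and collects like items.
    Returns {category: [unique requirements in first-seen order]}.
    """
    groups = defaultdict(list)
    seen = set()
    for category, requirement in tagged_quotes:
        key = (category, normalize(requirement))
        if key in seen:
            continue  # repeated requirement: keep only the first instance
        seen.add(key)
        groups[category].append(requirement.strip())
    return dict(groups)


# Hypothetical tagged quotes, loosely modeled on requirements in Tables 3 and 4.
quotes = [
    ("Speed", "Scan 10,000 documents per day"),
    ("Quality", "No skew"),
    ("Quality", "no  skew"),  # same requirement, different case and spacing
    ("Speed", "Capture a newspaper clipping in 30 seconds or less"),
    ("Quality", "Correct lightness and darkness"),
]

grouped = affinity_group(quotes)
for category, items in grouped.items():
    print(category, items)
print("unique requirements:", sum(len(items) for items in grouped.values()))
```

Exact-match de-duplication after normalization only catches trivially repeated wording; judging that two differently phrased quotes express the same need remains the analyst's task.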
QFD as a Requirement Gathering Tool

QFD is useful as a requirement gathering tool because of the structure it imposes on the process. QFD includes a fundamental interview process based on gathering the "Voice of the Customer": a personal interview with probing, open-ended questions designed to discern customer requirements. QFD also imposes a useful structure on organizing requirements via "affinity grouping", which will cluster like or related customer requirements together to allow a more coherent technical response. Lastly, QFD provides a useful means of conveying requirement information in an intuitive form. As seen in Figure 2-1, the "House of Quality" matrix can display Customer Needs and Benefits, the technical response to those requirements, and the relationship between them. An applications software package, QualiSoft, will be used to facilitate creation of the matrix.
Also apparent in Figure 2-1 is the Technical Correlations "roof" of the house, which allows the user to depict tradeoffs that will exist between technical responses, e.g. system cost vs. resolution. The Planning Matrix allows selection of important sales points relative to customer competitive benchmarking. The Technical Matrix allows selection of specification levels relative to engineering competitive benchmarking. While these elements are useful for product development, they will not be used in this research, the limited goals of which are the definition of requirements.
The House of Quality created in the above exercise will become the basis for the definition of the Graphic Arts requirements for EIM systems, which is the focus of this study. The analysis and conclusions about the requirements of EIM users will constitute the principal output of this research.
Endnotes
1 Shillito, Larry M., Advanced QFD: linking technology to market and company needs, (New York, John Wiley and Sons, 1994), 1.
2 Cohen, 21.
3 McQuarrie, Edward F., Customer Visits: building a better market focus, (Newbury Park, Sage Publications, 1993), 10.
4 McQuarrie, Edward F., Customer Visits: building a better market focus, (Newbury Park, Sage Publications, 1993), 140.
5 Cohen, Lou, Quality Function Deployment: how to make QFD work for you, (New York, Addison-Wesley, 1995), 11.
CHAPTER 5
Results

Per the QFD method, interview transcripts found in Appendix A were analyzed for the fundamental requirements of the interviewees. The individual requirements gathered were first sorted by the process called "affinity sorting". Many of the requirements were repeated by interviewees. As part of this process, repeated requirements were removed, so that only a single instance remained, yielding 108 unique requirements. As these requirements were sorted further, it became possible to group them logically into eleven categories, and a simple description of each category was created. The description of the customer requirements in each category is shown in Table 1 below:

Category                  Description
Document types            Types of legacy documents to be captured
Document sizes            Sizes of legacy documents to be captured
Text content              Types of text content to be captured
Picture content           Types of pictorial or graphical content to be captured
Quality                   Quality requirements of the EIM process
Archival                  Requirements for longevity
Cost                      Cost requirements of EIM users
Utility to EIM users      Ease of use and feature requirements of EIM users
Speed                     Scan rate and throughput requirements
Turnaround time           Time requirements for job completion
Utility to end user       Ease of use and feature requirements of the users of EIM-sourced information
Table 1. QFD Requirement Categories

One of the properties of QFD is that it allows requirements to be arrayed against potential technical responses to those requirements. In this case, the technical responses are essentially the specification attributes of an EIM system.
By reviewing the list of customer requirements, a list of twenty technical responses to those requirements was created. Because of the focus of this research on quality, the technical responses related to quality requirements are more detailed than those of other attributes. Table 2 lists the technical responses and provides a brief description of each:

Technical Response            Description
Scanner/Camera type           Imaging device used for EIM capture, e.g. reflection scanner, transmission scanner, CCD camera
Optical resolution            Highest spatial frequency sampled by the scanner; indicator of the ability to capture fine image detail
Sample depth (gray)           Monochrome bit depth, or bits/pixel; indicator of the ability to capture gray information
Sample depth (color)          Color bit depth; indicator of the ability to capture color information
Dynamic range                 Range of lightness over which the scanner can capture information; related to the ability to capture highlight and shadow detail
Calibration/Stability         Scanner capability to achieve a target response and maintain it
Scanner flare                 Scanner control of stray light and unwanted degradation to isolated image detail
Scanning area                 Maximum area that can be imaged
Scanning speed                Rate of scan
Document handler speed        Speed of an automatic document handler
Document handler reliability  Robustness of an automatic document handler; inversely related to failure rate
Metadata parser/editor        Capability to extract keyword information from scans or to allow entry or editing
Image processing              Capability to algorithmically enhance quality, usually via digital imaging
OCR / ICR capability          Optical Character Recognition of printed text or Intelligent Character Recognition of handwriting
Viewer / editor               Capability to view or edit images for validation or clean up
File conversion utility       Facility for converting between file formats for either input or output
Electronic distribution       Capability to transfer digital information
Searching capability          Tools provided to facilitate finding desired information from an EIM system
Storage format / media        File format and storage medium used to save digital information
Printing / finishing          Capabilities for transforming digital information back into tangible forms
Table 2. QFD Technical Responses

Per the QFD process, the collected requirements and technical responses were arrayed against each other in the central portion of the House of Quality, called the Relationship matrix, which shows the relationships between requirements and technical responses. Correlation values of High, Medium, and Low were assigned based on the degree to which a given technical response could support the achievement of a corresponding customer requirement. If there was no correlation, no value was provided. The resulting QFD House of Quality is shown in Table 3 and Table 4.
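In effect, the Relationship matrix is a sparse grid. As a minimal sketch, it can be held as a dictionary keyed by (requirement, technical response) pairs, with a missing key standing for a blank, uncorrelated cell. The rows, columns, and cell values below are hypothetical and are not read from Tables 3 and 4. The column tally uses the 9-3-1 weighting commonly applied to High/Medium/Low in QFD practice. That weighting is an assumption and was not part of this research.

```python
# Relationship Matrix Key from Tables 3 and 4: H=High, M=Medium, L=Low;
# a missing (requirement, response) key stands for a blank cell (no correlation).
# The 9-3-1 weights are an assumed scoring convention, not part of this study.
WEIGHTS = {"H": 9, "M": 3, "L": 1}

# Illustrative rows, columns, and cells (hypothetical, not read from the tables).
requirements = [
    "No skew",
    "Capture structures as small as 0.02 mm",
    "Scan 10,000 documents per day",
]
responses = ["Optical resolution", "Image Processing", "Scanning Speed"]
relationships = {
    ("No skew", "Image Processing"): "H",
    ("Capture structures as small as 0.02 mm", "Optical resolution"): "H",
    ("Capture structures as small as 0.02 mm", "Scanning Speed"): "L",
    ("Scan 10,000 documents per day", "Scanning Speed"): "H",
    ("Scan 10,000 documents per day", "Optical resolution"): "M",
}


def render(reqs, resps, cells):
    """Return one text row per requirement, leaving uncorrelated cells blank."""
    width = max(len(r) for r in reqs)
    rows = []
    for req in reqs:
        marks = [cells.get((req, resp), " ") for resp in resps]
        rows.append(f"{req:<{width}} | " + " ".join(marks))
    return rows


def column_scores(reqs, resps, cells, weights=WEIGHTS):
    """Sum the weighted correlations down each technical-response column."""
    return {
        resp: sum(weights[cells[(req, resp)]] for req in reqs if (req, resp) in cells)
        for resp in resps
    }


for row in render(requirements, responses, relationships):
    print(row)
print(column_scores(requirements, responses, relationships))
```

The dictionary form keeps blank cells implicit, which suits a matrix where most of the twenty technical responses do not bear on a given requirement.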
Table 3. QFD House of Quality (Part 1 of 2)
[Relationship matrix: customer requirements (rows) against the technical responses of Table 2 (columns).]
Document types: Capture bound volumes; Capture newspaper clippings; Capture manuscripts; Capture black and white photographs; Capture color photographs; Capture postcards; Capture works of art on paper; Capture fax; Capture maps; Capture engineering documents; Capture office documents, forms; Capture different forms of microfilm
Document sizes: Ability to handle 35mm aperture cards; Ability to scan A0 through A4 document sizes
Text content: Fonts as small as eight point; Reproduce Bodoni italic four point; Reproduce Asian fonts; Reproduction of light pencil; Capture carbon copies; Capture dot matrix; Smeared pencil input; Ability to handle fourth generation copies as input
Picture content: Capture book illustrations; Capture terrain lines in topographic maps; Capture specific hues, brightness and darkness, level of saturation; Capture detail of relief, planographic and intaglio processes; Capture structures as small as 0.02 mm; Capture gray information on structures as small as 0.04 mm
Quality: Scans are legible/readable; "Recognizable" quality images; Image quality is as good as a photocopy; Correct lightness and darkness; Quality for good video projection; No bent corners; No skew; No spots; No moire; Proper cropping; Suppression of unwanted background; "Dropout" colors not captured; Representation does not alter the nature of the original content; Capture "highlight" color; Consistency of color to the original; Color quality better than a color photocopier; "Good enough" color representation; Reproduce difficult colors; Capture detail observable at normal viewing distances, unaided eye; Capture detail evident on close examination, perhaps with 5X loupe; Capture finest detail which evidences original means of production; Save electronic image which fully represents original content
Relationship Matrix Key: H=High M=Medium L=Low Blank=No correlation
Table 4. QFD House of Quality (Part 2 of 2)
[Relationship matrix: customer requirements (rows) against the technical responses of Table 2 (columns).]
Archival: Maintain information 7-10 years; Maintain information for 20-30 years; Maintain information forever; Electronic form that won't become obsolete; Digital format for information permanent, like paper
Cost: Capture large numbers of archived documents inexpensively; Low costs for scanning; Cost at 10 to 20 cents per page; Flexible costs linked to use; Minimal capital costs; Very low global distribution costs
Utility to EIM user: Automatic document handling; Auto feed without jamming; Low level of operator intervention; Ability to easily capture and store hundreds of thousands of images; Capture only as much information as needed to replicate content; Accept documents already in electronic form; Ease of entering and adding metadata; Extract metadata from form fields; Recognize handwriting; Recognize printed text; Drop out unwanted color; Crop; Deskew; Despeckle; Ability to correct and insert documents easily; Ability to do image clean up if needed; Don't damage fragile originals; Need modular components
Speed: Capture a newspaper clipping in 30 seconds or less; Ability to scan at least 300 engineering drawings per day; Scan 10,000 documents per day; High throughput, high speed
Turnaround time: Provide 24 hour turnaround; Provide two-to-three day turnaround; Turnaround of 6 months or less; Provide timely turnaround
Utility to end user: "Instant" access (less than 30 sec, 10 sec, 2 sec, eye blink); Searchable keywords/metadata; Searchable text; Provide indexing down to the article level for serial literature; Link searching from different sources; Global distribution of electronic files; Format is digital equivalent of paper, easy to read, permanent; Universally readable electronic form compatible with user's system; File formats that people can easily handle; Output TIFF; Output PDF; File formats that print quickly; Ability to print documents larger than 11x17; Ability to reprint and bind electronic files; Ability to print and assemble a bid package; Output information in the form needed by the user
Relationship Matrix Key: H=High M=Medium L=Low Blank=No correlation
The QFD House of Quality provides a useful framework for analysis. The organized categories of customer requirements and the corresponding technical responses allow an ordered discussion of requirements. While quality requirements were the focus of this study, they must be understood in the context of the entire requirement set. The stated needs for each of the eleven categories of customer requirements are considered below:
Document types
EIM users described over a dozen categories of document input types to be captured. Document types included forms, engineering documents, faxes, books, newspapers, manuscripts and photographs. The diversity of these document types requires a wide variety of scanner types. Engineering documents require large format scanner capability. Conversion of microfilm to electronic form requires use of a transmission scanner. Manuscripts and works of art may be fragile and require use of an overhead CCD camera, which does not require direct contact with the object being scanned.
Document sizes
Closely related to document types are document sizes, which also spanned a wide range. At the small end were aperture cards (a single frame of 35 mm microfilm mounted in a Hollerith punched card) and 35 mm color transparencies; the image size for each is 24x36 mm. At the large end is the A0 paper size, 841x1189 mm, used for engineering drawings. The largest scan volumes at one site were for office documents and forms, which were typically 8½x11 inches.

Text content
As one approach to defining quality requirements, interviewees were asked to define the kinds of image content they needed to capture. The category of text content contains all those image elements that were text related, and included eight point fonts, light or smeared pencil, dot matrix printing, carbon copies, and fourth-generation photocopies. Asian fonts were cited as especially difficult, as was Bodoni italic four point, which one customer used as an acid test of text scan quality.
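The range of document sizes described above, from a 24x36 mm aperture card frame to an 841x1189 mm A0 drawing, translates directly into image size. As a rough sketch, the arithmetic below estimates pixel dimensions and uncompressed storage at a single sampling resolution. The 200 dpi value is an illustrative assumption, the helper names are hypothetical, and real systems compress images, so the byte counts describe raw data only.

```python
MM_PER_INCH = 25.4


def pixel_dims(width_mm, height_mm, dpi):
    """Pixel width and height of an original sampled at `dpi` samples per inch."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Raw image size in bytes before any compression (partial bytes dropped)."""
    return width_px * height_px * bits_per_pixel // 8


# Sizes quoted in the text, in millimetres.
formats_mm = {
    "aperture card frame": (24, 36),
    "letter, 8.5x11 in": (8.5 * MM_PER_INCH, 11 * MM_PER_INCH),
    "A0 drawing": (841, 1189),
}

DPI = 200  # illustrative sampling resolution, an assumption
for name, (w, h) in formats_mm.items():
    w_px, h_px = pixel_dims(w, h, DPI)
    print(f"{name}: {w_px} x {h_px} px, "
          f"{uncompressed_bytes(w_px, h_px, 1):,} bytes at 1 bit/pixel, "
          f"{uncompressed_bytes(w_px, h_px, 24):,} bytes at 24 bits/pixel")
```

At the same resolution, an A0 sheet holds roughly 1,100 times as many pixels as an aperture card frame. This spread is one reason a single scanner rarely serves the whole range.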
Text content is one of the quality related categories where differences in requirements among groups of EIM users became evident. Corporate Information Center users of EIM systems were primarily focused on collecting "informational quality" scans. The text content they cited was relatively simple, and their quality requirement for that text was legibility. While this goal was shared by Library EIM users, some Library users wanted to capture the character and nature of the text. The requirement to capture Bodoni four point italic was one such requirement. The quality goal for this difficult content was not mere legibility, but to capture the "exaggeration between the thickness and the thinness of the strokes," according to Anne Kenney, Associate Director for the Department of Preservation at Cornell University.
Picture content
All image content that was not text related was grouped under the category of picture content. Customers were broad and varied in their descriptions of picture content. Some customers offered rather simple descriptions: book illustrations, terrain lines in topographic maps. Others gave more specific and demanding requirements, citing, for example, the need to capture specific hues, brightness and darkness, and level of saturation. By way of example, Rodney Perry, of the Rochester Public Library, cited a desire to capture the yellowed paper and the pale purple ink of an old manuscript. Cornell's Department of Preservation and Conservation wanted to capture the detail of illustrations created by relief, planographic and intaglio processes, preserving the evidence of the structure used to create the content. As part of a project for the Library of Congress, they had measured structures as small as 0.02 mm that they wanted to capture.
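The 0.02 mm figure can be converted into a sampling requirement. The calculation assumes a common rule of thumb, not a figure from the interviews: at least two samples must fall across the smallest structure to be captured. The required resolution is then 25.4 mm per inch divided by the sample spacing. The function name is hypothetical.

```python
MM_PER_INCH = 25.4


def required_ppi(feature_mm, samples_per_feature=2):
    """Sampling resolution (samples per inch) that places `samples_per_feature`
    samples across a structure `feature_mm` wide.

    The default of two samples per feature is an assumed, Nyquist-style rule
    of thumb; one sample per feature gives the absolute lower bound.
    """
    sample_spacing_mm = feature_mm / samples_per_feature
    return MM_PER_INCH / sample_spacing_mm


for feature in (0.02, 0.04):
    print(f"{feature} mm: {required_ppi(feature):.0f} ppi (2 samples), "
          f"{required_ppi(feature, 1):.0f} ppi (1 sample)")
```

Under this assumption, 0.02 mm structures call for roughly 2,500 samples per inch. That is far above the 200 dpi typical of the monochrome EIM scanners described in the conclusions.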
Quality
All users of EIM systems shared some basic quality requirements. Everyone wanted, at a minimum, legible, recognizable images. Several customers described general quality by comparison to a photocopier. Nearly all said they wanted correct lightness and darkness, and many wanted to avoid quality problems with bent corners, skew, spots, background or cropping. Several users expressed sensitivity that the digital representation not alter the nature of the original content. Reasons included the desire for faithful representation, as well as concerns with altering copyrighted material.

A few users cited application-specific needs. Patricia Pitkin, of RIT's Wallace Library, needed quality capable of providing good video projection for use in college classrooms and distance learning applications. Suzanne Keenan, in Xerox's Electronic Document Management group, wanted "highlight color" quality; perfection was not required, as the color was for emphasis only.

A far more stringent set of quality requirements came from the application of EIM to preservation. A current Cornell project is to capture the principal book illustration types from the 19th and early 20th centuries. Because Cornell wants electronic images which fully represent the original content, their quality requirements can be quite high. Cornell's Anne Kenney defined three levels of quality:
1. Capture quality observable at normal viewing distances with the unaided eye.
2. Capture detail evident on close examination, perhaps with a 5X loupe.
3. Capture finest detail which evidences the original means of production.
At the third level of quality, there is an expectation to capture, in the scan, information that would distinguish whether an illustration was created via Calotype or via Aquatint, etching or steel engraving. Clearly this is a level of quality well beyond the "good enough" level requested by many users.

Integrity
An important attribute to all users of EIM systems was that all pages of the input document were captured without omissions and in their proper order. Missing pages are a common problem, and several interviewees described procedures for validating the job's integrity. Users wanted to maintain 100% job quality, but wanted to do so without having to check each item of the job. Those users doing OCR wanted integrity in the form of faithful recognition of text.
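Part of the integrity check described above can be automated when the expected page count is known, for example from metadata entered at capture. The sketch below is hypothetical, and the function and field names are assumptions. It reports missing pages, duplicated pages, and out-of-order capture in a scan job, so that an operator need inspect only the flagged items rather than every page.

```python
def check_job_integrity(scanned_pages, expected_count):
    """Compare the captured page sequence with the expected pages 1..expected_count.

    scanned_pages: page numbers in the order they were captured.
    Returns a dict listing missing pages, duplicate pages, and whether the
    captured order is ascending.
    """
    expected = set(range(1, expected_count + 1))
    seen = set()
    duplicates = []
    for page in scanned_pages:
        if page in seen:
            duplicates.append(page)  # page captured more than once
        seen.add(page)
    return {
        "missing": sorted(expected - seen),
        "duplicates": duplicates,
        "in_order": scanned_pages == sorted(scanned_pages),
    }


# A hypothetical six-page job where page 6 was skipped, page 5 was fed twice,
# and pages 3 and 4 were swapped.
report = check_job_integrity([1, 2, 4, 3, 5, 5], expected_count=6)
print(report)
```

A check of this kind catches only sequence errors. It cannot tell whether a captured page is legible, so it complements rather than replaces visual QC.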
Archival
When users spoke about the permanence of information, they cited the amount of time they needed to retain access to the information. The interval varied between 7 years and forever. EIM users worried about both the media and the format of the digital information becoming obsolete.
Cost
Cost was important to everyone. For Libraries, EIM efforts were funded with grants, and cost governed the amount of work that could be done with that fixed amount of money. To commercial Corporate Information Centers, cost translated directly to profits. Costs cited for EIM scans ranged from six cents to ten dollars per page, and the quality levels demanded by customers were cited as the driver for this wide range of cost. The capital expense of systems was important too. Two users surveyed had scanners that cost $100,000, and capital costs in the low hundred thousands are common. Because capital expense is high, some users were looking for a more flexible cost structure which could accommodate variable rates of use. Outsourcing was one means of achieving this goal. Some customers, who had been scanning locally, had contracted out their scanning and were paying on a per hour or per piece basis.
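Library scanning was funded from fixed grants, so the per-page cost determines how much of a collection can be converted. The arithmetic below uses the range of six cents to ten dollars per page cited by interviewees. The $50,000 budget is a hypothetical figure for illustration only. Amounts are held in integer cents to avoid floating-point rounding.

```python
def pages_for_budget(budget_cents, cost_per_page_cents):
    """Whole pages a fixed budget can pay for at a given per-page cost."""
    return budget_cents // cost_per_page_cents


BUDGET_CENTS = 50_000 * 100  # hypothetical $50,000 grant, held in cents

# Per-page costs at the two ends of the range cited by interviewees.
for label, cost_cents in (("$0.06 per page", 6), ("$10.00 per page", 1000)):
    print(f"{label}: {pages_for_budget(BUDGET_CENTS, cost_cents):,} pages")
```

The same grant covers roughly 167 times as many pages at the low end of the range as at the high end. This illustrates why quality level, the cited driver of cost, matters so much to grant-funded projects.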
Utility to EIM user
A number of requirements related to features, functionality, and ease of use surfaced during the interview process. All were organized together under the category of utility to the user of the EIM system. At the most basic level, users wanted the ability to capture and store hundreds of thousands of images. Because of the cost of storing and managing information, they wanted to capture only the information that was necessary and sufficient to represent the document content. Users wanted to be able to scan documents automatically and reliably without operator intervention.

Users, especially Library users, wanted systems to be able to handle fragile originals without damaging them. Users wanted to scan documents like old books and maps, which were fragile or rare, without their being damaged. An imaging system that accommodated these requirements was desired.

Users wanted to be able to easily correct problems with job integrity or clean up job quality problems. This implies the ability to add a missing page or to edit content for image spots or recognition errors. They wanted components of EIM systems to be modular and easily integrated into an existing information infrastructure. Because the systems are expensive, an upgrade must not require replacement of the entire system. Similarly, they wanted to be able to integrate documents already in electronic form into their existing information infrastructure. They also wanted systems to help them manage metadata, allowing them to extract key document fields and to add and edit metadata.

In addition, users wanted utilities for deskew, despeckle, crop, and color "drop out". Skew, spots, and poor cropping can cause problems for OCR in addition to their obvious appearance problems. Color "drop out" determines which information gets captured in the scan; it is sometimes used to "drop out" the constant form, allowing only the variable data to be saved. This is especially important if OCR is to be used.
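Color "drop out" can be sketched as a per-pixel test. Any pixel whose color is close to the known color of the preprinted form is replaced with white, leaving only the variable, filled-in data for OCR. This is a simplified illustration under stated assumptions, not how any particular scanner implements the feature. The image is held as nested lists of 8-bit RGB tuples, there is a single known dropout color, and the Euclidean distance threshold is a hypothetical value.

```python
import math

WHITE = (255, 255, 255)


def drop_out_color(image, dropout_rgb, threshold=60.0):
    """Return a copy of `image` with pixels near `dropout_rgb` set to white.

    image: list of rows, each a list of (r, g, b) tuples with 0-255 values.
    threshold: assumed maximum Euclidean RGB distance treated as "form color".
    """
    return [
        [WHITE if math.dist(pixel, dropout_rgb) <= threshold else pixel for pixel in row]
        for row in image
    ]


form_red = (220, 40, 40)   # hypothetical color of the preprinted form lines
ink_black = (20, 20, 20)   # handwritten or typed entries (variable data)
paper = (250, 250, 245)

scan = [
    [paper, form_red, ink_black],
    [(200, 60, 50), ink_black, paper],  # a slightly off-shade form pixel
]
cleaned = drop_out_color(scan, form_red)
for row in cleaned:
    print(row)
```

A single distance threshold is the simplest possible rule. Form colors that are close to the ink color would need a more careful color model to avoid dropping real data.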
Speed
All users were concerned, to one extent or another, with speed of capture. The speed with which documents could be captured governed how cost effective it was to capture a large volume of documents. One user wanted to be able to capture a newspaper clipping in 30 seconds. Another site was scanning in 300 engineering documents per day. Yet another was scanning in 10,000 office documents or forms per day. Obviously each of these requirements imposes different demands on the architecture of the scanning device. Very high rates of 10,000 documents per day require automated document feeding and very high speed scanning. Currently, this means lower quality levels, as high speed scanners are usually limited in resolution, bit depth and color capability. However, as discussed previously, those using high speed scanners were more concerned with "informational quality" scans, so the quality limits of current scanners were not an issue.
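The daily volumes quoted above can be converted to the sustained rate a scanner must deliver. The sketch assumes one page per document and a single eight-hour shift of uninterrupted scanning. Both are simplifying assumptions, and the function name is hypothetical.

```python
def pages_per_minute(pages_per_day, hours_per_day=8.0):
    """Sustained rate needed to scan a daily volume within the working hours."""
    return pages_per_day / (hours_per_day * 60)


# Volumes quoted by interviewees; one page per document and an
# eight-hour shift are simplifying assumptions.
print(f"10,000 forms/day: {pages_per_minute(10_000):.1f} ppm")
print(f"300 drawings/day: {pages_per_minute(300):.3f} ppm")
print(f"one clipping per 30 s: {60 / 30:.0f} per minute")
```

Under these assumptions, even the highest volume cited works out to about 21 pages per minute over one shift. That is well below the 150 page per minute document handlers mentioned in the conclusions. Multi-page documents or a shorter working day raise the figure accordingly.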
Turnaround time
Closely related to speed was turnaround time. Fundamentally, EIM users wanted information in electronic form to meet the demands of their customers, and the speed with which the information was required varied widely with the application of EIM. In some cases this meant 24 hours or less; in others, 6 months or less. For the user scanning medical forms for on-line access, overnight turnaround was essential. For the user scanning old books, quality was the primary requirement, and if the book or a remanufactured replacement was back on the shelf in six months, that was good enough.
Utility to end user
A number of requirements were related to the timeliness and utility of EIM output to the end user. Speed of access was very important. This is not surprising, given that one of the advantages of transforming hardcopy to digital information is rapid access and display. Users asked for "instant" access to digital information, which they translated to a time of anywhere from 30 seconds to an eye blink. Similarly, EIM end users wanted electronic distribution and electronic searching of content. Most users wanted searchable metadata, the keywords which identify jobs. But a few wanted to be able to search the entire text of captured documents, which demands that the entire document be translated via OCR into a searchable textual form.
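The difference between metadata search and full-text search can be illustrated with a small inverted index. In this hypothetical sketch, each captured document has a few keywords (its metadata) and, optionally, OCR text. Indexing only the keywords finds a document through its identifying terms, while indexing the OCR text makes every recognized word searchable. Document IDs, keywords, and text are invented for illustration.

```python
import re
from collections import defaultdict


def build_index(documents, use_full_text):
    """Map each lower-cased term to the set of document IDs containing it.

    documents: dict of doc_id -> {"keywords": [...], "ocr_text": str}
    use_full_text: if True, also index words from the OCR text.
    """
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        terms = [k.lower() for k in doc["keywords"]]
        if use_full_text:
            terms += re.findall(r"[a-z0-9]+", doc["ocr_text"].lower())
        for term in terms:
            index[term].add(doc_id)
    return index


docs = {
    "IP-0001": {"keywords": ["toner", "fuser"], "ocr_text": "A fuser roll with improved release"},
    "TR-0042": {"keywords": ["scanner"], "ocr_text": "Calibration of the document scanner lamp"},
}
metadata_index = build_index(docs, use_full_text=False)
full_text_index = build_index(docs, use_full_text=True)
print(sorted(metadata_index.get("calibration", set())))
print(sorted(full_text_index.get("calibration", set())))
```

The full-text index finds "calibration" only because the OCR text was indexed. This is why full-text search depends on translating the entire document via OCR, with recognition errors carried directly into the index.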
EIM users were also concerned with the format and media used to store their information. Users described the desired form as akin to paper: universally easy to read and permanent. The format had to be compatible with the end-users' systems and had to display and print easily and quickly. The TIFF and PDF formats were cited as current examples, but users were concerned about the long-term utility of these formats.

Some EIM users wanted the ability to do printing and binding, especially when large bid packages were being prepared. A user specializing in engineering drawings noted the need for large format printing or finishing of previously digitized material. Another cited the problem that some of their customers have trouble printing large image files on local printers, so they maintained a centralized high-speed printer for those that needed it.
CHAPTER 6
Summary and Conclusions

The central goal of this research was to assess the graphic arts quality requirements of EIM users to determine if an unrequited desire for higher quality existed. The question has two different answers, one for each of two classes of EIM users.

The first class of EIM users was more interested in high speed and low costs than in higher quality. Their quality expectations were usually "informational". Often they had established a huge archive of images using a standard process. Their requirements were well established, and their customers were satisfied. They had built a business or service based on a historically achievable level of quality.

The second class of users placed quality first. They were still defining their requirements and refining the means of capture, processing, and storage to be used. While they had an interest in speed and cost, they were not willing to compromise quality.

Effectively, these two classes of users define two different markets for EIM. The first class of users is the current EIM market. The Library and Corporate Information Center users in this class had a common goal: expediting access to basic information archived on paper. Monochrome scanning with 200 dpi maximum resolutions and 150 page per minute document handlers meets the requirements of this market: legible text and recognizable picture quality. This market is served by currently available EIM products.

The second class of users defines a new market for EIM, which springs from a very different application: the use of EIM to help preserve and share documents which are rare, fragile, or disintegrating. Achievement of that goal means that the EIM system must capture all that is essential about a document. These users want to capture more than just the raw, basic information content of a document. They want an electronic image which fully captures the nature of the original. They want nuances of color saved. They want the original font to be faithfully reproduced. They want the information content of pictorials captured. They also want to do this at production speeds without a lot of operator intervention. The users with these goals were exclusively those in libraries concerned with preservation.

To meet these requirements, an EIM system will need technical capabilities which blend those of current EIM systems with those of currently available graphic arts scanners. Resolutions, bit depths and color capability will need to be close to those currently offered in graphic arts scanners if small structures of 0.02 mm are to be captured. Yet scan rates will need to be closer to those of typical EIM scanners to keep scanning costs at a reasonable level.

This research focused on the requirements of the Library and Corporate Information Center market. While these two groups had some requirement elements in common, the need for higher quality became evident for a niche market of library users with a preservation function. It would be interesting to see future research focus more narrowly on the application of EIM to preserving and sharing rare or fragile documents.
Bibliography
Avedon, Don M., Introduction to Electronic Imaging, 3d ed., (Silver Spring, Association for Information and Image Management, 1996).
Black, David B., Document Capture for Document Imaging Systems, (Silver Spring, Association for Information and Image Management, 1996).
Cohen, Lou, Quality Function Deployment: how to make QFD work for you, (New York, Addison-Wesley, 1995).
Eureka, W. E. and Ryan, N. E., The Customer-Driven Company, 2d ed., (Dearborn, American Supplier Institute, 1994).
Head, Robert, Document Management: The Essentials, (Silver Spring, Association for Information and Image Management, 1997).
Kenney, Anne R., Digital to Microfilm Conversion: A Demonstration Project 1994-1996, (Ithaca, Cornell University Library, 1996).
Marsh, S. et al., Facilitating and Training in Quality Function Deployment, (Methuen, GOAL/QPC, 1991).
May, Thornton A., Electronic Image Management Cases, (Silver Spring, Association for Information and Image Management, 1993).
McQuarrie, Edward F., Customer Visits: building a better market focus, (Newbury Park, Sage Publications, 1993).
Shillito, Larry M., Advanced QFD: linking technology to market and company needs, (New York, John Wiley and Sons, 1994).
Stoffel, James C., Graphical and Binary Image Processing and Applications, (Dedham, Artech House, 1981).
Turabian, Kate L. et al., A Manual for Writers of Term Papers, Theses, and Dissertations (Chicago Guides to Writing, Editing, and Publishing), (Chicago, University of Chicago Press, 1996).
APPENDIX A
Interview transcripts
31
The interviews
here for
conducted as part of this research were recorded on audio tape.
They are
transcribed and included
reference.
Michael Majcher
Manager, Xerox Technical Information Center
Webster, NY
March 11, 1998
Butterfield:
My research
defining
surrounds
diagram) a block diagram
with some sort of
the requirement for a system that looks roughly like this (shows
scanner, some computer
system
for gathering that information,
and then
storing and for providing output of some sort.
Majcher: We have everything except the CD-ROM writer.
some means of
Butterfield: OK. What I
What things do
you
wanted to talk to you about was, overall what are your requirements
for that
sort of system.
look for?
Majcher: Are
you focusing specifically on the scanner system or across the board.
Butterfield: Across the board. What are the qualities you are looking for.
Majcher: In terms
are automatic skew
of
scanning, obviously throughput is the primary issue. The faster the better. Some of the nits
to correct and insert documents easily. Resolution is an issue. Ease of
detection, ability
indexing or putting metadata on the document itself is important.
Butterfield: What does metadata mean?
when we scan internal reports
Majcher: For example, if you. there is hybrid systems you can use of course, but.
material for example, we have a data sheet with all the appropriate metadata for retrieval: author, tide, organization,
.
.
keywords,
etc.
We
can marry.
.
.
so when we scan the
.
or the accession number and we can
proposal
link that
document,
all we
really
need
is the
unique
identifier number
the metadata to do the search and pull it up. In an invention
situation, we don't already have that metadata already in existence. We need to be able to say, this is IP
number such and such and give
input
Butterfield: OK. The
is
search.
with
.
And invention
metadata
proposals
on
the fly.
describing all
is
the characteristics of the document that somebody would
one of the uses to which you are
putting this
use
to
system.
Majcher: Yes. Invention proposals, translations, internal reports, some newsletter data, some journal data.
Butterfield: Translations are translations of reports from Fuji Xerox?
Majcher: Yes they could be the Fuji Xerox or else documents, journal articles, proceedings papers.
Butterfield: In terms of... You said throughput was important. What does that mean?
Majcher: Number of pages, pages per minute that are enabled through the scanning system.
Butterfield: When you think of throughput, it is the throughput of input, not the throughput of output, that's the bottleneck?
Majcher: Yes. Actually, pushing paper through the scanner fast enough is a bottleneck.
Butterfield: OK. How do you do that now?
Majcher: We have a WD40 which is actually the scanner of the DocuTech. We put the documents through that.
Butterfield: Is there a document feeder of some sort?
Majcher: Yes.
Butterfield: What about the effect on the input? With a document feeder, you're not able to process a book, for example. Is that a problem?
Majcher: Well, obviously in a book or a monograph situation, you typically want to maintain that integrity, so that is a very manual operation on a flatbed scanner, one page at a time. Now there's... We don't have the problem, but there are situations where you have older archival type material, material where you can't spread the binding, and you have to do some optical adjustments and do a top-down camera capture.
Butterfield: OK. A CCD camera or something?
Majcher: Yes. OK.
Butterfield: What's the level of operator participation, of operator involvement, right now? In terms of throughput.
Majcher: We've tried to keep that as low level as possible. Basically they take a pile of documents which has been identified by number and feed them through to validate whether... You know, they'll do a QC on it to validate skewed pages, folded pages, things of that nature, rescan and insert. It is fairly low level. I mean, you don't need a computer programmer to do that.
Butterfield: Let me move now to some of your requirements for the quality of this system. What sorts of things do you look for?
Majcher: Readability. Bent pages, corners, skew. The standard QC stuff.
Butterfield: OK.
Majcher: What's the name of the organization? It used to be NMA, National Micrographics Association, but now it's AIIM, American Image and Information Management or something. We use the same criteria that are used, or was used, for producing microform, a very robust set of standards as to what is the acceptable level of quality in terms of page orientation, density of the image, resolution of the image, archivability, etc. We try to maintain those same quality standards in the electronic image area, since there are no real good set standards out there.
Butterfield: So you are using existing standards that are already in place for microform? Microform?
Majcher: Yes, microform is microfilm, microfiche.
Butterfield: OK. Could you point me to where I might find those standards later? What about the kinds of image content? Do you have to handle halftones? Photographs? What do you look for in terms of the quality of those?
Majcher: Readability. Printability. How does it look on the screen? How does it look on the print? Our system right now, we use, um, the... uh, well, on the electronic side, we use PDF. We also use the original image format, but I was going to say, that's not going to impact scanning.
Butterfield: Ah. What kinds of things can go wrong for picture input, when it isn't readable and when it isn't usable?
Majcher: Too dark, too light. Get moiré patterns sometimes; it is pretty obvious to the eye.
Butterfield: What about color? Is that a requirement?
Majcher: Yes, you got to do color.
Butterfield: What are the quality requirements for color?
Majcher: Same as black and white. Reproducibility, legibility, consistency of color to the original.
Butterfield: What do you think are the most difficult kinds of images to handle?
Majcher: Well, color is tough, no question. That's probably the hardest. Photographs would be next. The rest is sort of easy.
Butterfield: Do you ever have to scan in kanji fonts or Japanese or Chinese characters?
Majcher: Yes.
Butterfield: Do those impose special requirements on the system for readability?
Majcher: It depends on your resolution. If your resolution is high enough, it is not that much of a problem. The mitigating factor is the quality of the original. Sometimes, trying to scan a fax or a fourth generation xerographic copy is a little bit more difficult.
Butterfield: OK. The nature of this system... What kinds of output do you need from this system?
Majcher: We had two outputs. All the images are maintained in a central database and can be retrieved by way of a web browser right to the desktop. So legibility to the desktop is important. The second output: people still want paper, so we use the electronic image as a print master. And again, legibility there is important.
Butterfield: OK. What you've described so far is an image representation. Are any of your customers asking for some searchable content right now?
Majcher: Yes.
Butterfield: Does anybody want searchable text?
Majcher: Yes. We do that on some documents. On some documents, we do OCR, but we do not do extensive editing on the OCR because it's a little too time consuming, and you know, if you're in the 99% accuracy range and the word "the" doesn't get recognized, who cares. This is a throughput production type issue. If phenol, or some chemical structure, a picket fence, doesn't get recognized once but gets recognized the second time, it is still searchable, so we don't worry about that too much.
Butterfield: What kind of tools do you use to do that?
Majcher: Xerox Text Bridge primarily (laughs), mostly Xerox products in OCR are pretty good.
Butterfield: Do you ever need the same output, or the same content, accessible in different forms? Do you need both image outputs and searchable outputs at the same time?
Majcher: It depends on the document type, but we try to keep an image of the document in PDF as the least common denominator. For example, a Word document written in Word 3.0 won't necessarily be readable in Word 7.0.
Butterfield: Yes.
Majcher: So rather than go through and reformat all the documents, we maintain an image database in PDF that also takes care of the photos, halftones, image type problems that don't necessarily show up in a flat ASCII database.
Butterfield: OK.
Majcher: So we keep the flat ASCII database or a meta database, and then the image itself.
Butterfield: How long do you need to keep this information around?
Majcher: Depends on the document type. Technical internal reports are kept forever. Invention proposals have a seven-to-ten year life span. There's... Almost all corporations have what they call retention schedules. That lists all of the documents within the corporation that are produced and minimum legal requirements for retention. Ours is about three inches thick and, depending on the document type, uh, some things you keep for a maximum of a year and then destroy, some things you keep forever.
Butterfield: And that holds true for the things that you'd be scanning in on this system?
Majcher: No. We typically don't scan in anything that has less than, say, a seven year requirement. It is not cost effective.
Butterfield: OK.
Majcher: Unless there's other requirements over and above legal retention.
Butterfield: I guess that... it's hard to think right now of a document that you would scan that you would probably keep less than a year. Say you have a document that's not created electronically... In a workflow environment, a document already created electronically would be available electronically. There's a tradeoff between its inherent format, initial creation format, retention period, value of use and number of people. That brings us... Two things pop into my mind. One is the kinds of costs, but let me come back to that. The other is, how do you [see] this system beyond the present day? How much of your input now, how much of the things that you are now scanning, that are legacy documents, do you expect to add in the years to come?
Majcher: Again, let's take internal reports, for example. That is our technical memory. For the long term, it is kept on a permanent basis. In the research environment, even in the engineering and development environment, when people are tasked for a new subsystem or something, they may go back ten, twenty, thirty years to see what's already been designed. An engineering design doesn't necessarily lose value over time. That's kept over a long term. The reason we use PDF as an image type is that we don't have to worry about version control for various packages and that sort of thing. PDF is, in a sense, the electronic equivalent to paper that can always be recalled.
Butterfield: And that's what you need.
Majcher: Yes.
Butterfield: I guess what I was trying to get at was how much of the inputs that you are scanning are you going to have to get into electronic form. Are you even going to need to do scanning in five years or ten years? Or are you always going to have paper documents that you are going to get in electronic form?
Majcher: That's an interesting question, and it can be debated in a lot of ways. Ah, 99.9% of all the world's information, right now, currently, today, is in a paper format. [Only a tenth] of 1% is electronic. Now, people say there's hundreds of millions or billions of pages on the web. Yes, that's true, but that accounts for less than 1/10th of 1% of the world's knowledge. On the other hand, at least here, in a practical sense, we are seeing a significant decrease, 80% or more, in the number of paper documents actually generated for archival purposes. They're mostly composed in word processing packages, or spreadsheet packages, data collection packages, and are available electronically. That is an explosively growing aspect to the documents we acquire: they are created electronically, are stored electronically, and we distribute them in an electronic version. So on the one hand, I'd say there will probably continue to be a scanning requirement in the foreseeable future for a declining percentage of documents created in hardcopy. I come from an old micrographics background, and one of the things you do is you start it today and you decide whether it is worthwhile or cost effective to scan in a retrospective collection, because that's very expensive, or if you start today and scan only what goes forward from today on. When we set up our system for scanning internal reports, we started in June of '89. In July of '89, it wasn't a very robust system because we only had a month's data on it. But in March of '98, we've got ten years worth of retrospective data that was captured as part of the ongoing process. As far as the retrospective, to deal with it, you can either start at day one and work your way back from that point, or you can capture on the fly as required. The Bradford-Zipf law says that your usage declines over time: 80% of your usage is in the first three years, and by year ten, you are looking at a probability of less than 1/10th of a percent that a document will ever be used. Well, if a document of that time period is requested, we will take it and scan it and put it in the system, so that we probably capture most of the documents that would be in demand over that period of time without having that whole cost/time expense of capturing the entire retrospective collection. That old [material], you just box up and put over in Records Retention and pull it out when you need it.
Butterfield: Costs. What kind of costs do you associate with this kind of a system?
Majcher: Well, it's a lot cheaper than microfilm. Basically, it's a one time cost. In essence, all the storage, hardware and retrieval software are ongoing sunk costs that you are going to use for the retrieval system [and] for the capture portion of it, so it is really smeared on that side. Scanner, some labor, any software: it's an up front cost. Scanner, some labor, some disk space and that's about it.
Butterfield: So you don't associate any sort of click charge with this sort of system?
Majcher: Well, it depends on whether I'm charging the customer for a service or whether I'm doing it as part of an internal process.
Butterfield: But in terms of your costs, for example, are there any circumstances under which you'd consider paying click charges to a vendor for a system like this?
Majcher: Oh, yes. If it is cheaper than buying the hardware and having the labor done. As a matter of fact, we do outsource some of that.
Butterfield: In rough terms, can you tell me what you'd expect to pay for a system like this?
Majcher: Depends on the volume of documents, depends on the quantity required, depends on whether you need multiple formats or not, depends on whether you have it indexed with metadata or whether you already have that as part of your process to generate this information.
Butterfield: Well, given the requirements that you've described to me, what is it worth, or what have you spent on this equipment?
Majcher: Well, right now we're outsourcing most of our work. When we were doing it internally, we were working a couple of hours a day, based on our volumes. The scanner was a couple grand. The service bureau's charge per page to have it scanned is anywhere from fifteen cents to five dollars. We're probably down in the less than a dollar a page area. I can [get] you specific prices if you want.
Butterfield: When the service bureaus are charging that wide range, fifteen cents to five bucks, are there associated quality levels? What accounts for the spread?
Majcher: Greed, whatever the market will bear, the documents themselves. Is there a lot of prep work involved with it? Do you have to remove staples? There is color, so you have to adjust image density, things of that nature.
Butterfield: I think that's gotten most of my questions... Let me look here. Print finishing requirements. You said that sometimes customers need hardcopy.
Majcher: We use it as a print master. Yes.
Butterfield: Do they need bound documents?
Majcher: Sometimes. Sometimes...
Butterfield: What kinds of bindings?
Majcher: Spiral, perfect, you know, depends on the customer. We're finding now, as long as the stuff is available on the desktop, a lot of people are printing selected pages at their desktop printers. When they do a formal full document, depending on how much the budget center is willing to pay, or how much they are willing to pay, they can get everything from a large staple in the corner, to spiral binding, to perfect binding, to three-ring binding, and those all have associated costs to them.
Butterfield: OK. What level of integration do you expect from a system like this? Can it be modular, or an integrated system?
Majcher: No. I prefer the modules. What I need is a scanning operation that can acquire and feed files to our ongoing system. I'm not going to change my retrieval system, I'm not going to change my database environment requirement to accommodate a scanning module. All I need out of a scanning module is the throughput and the quality and the ability to take those files, either in batch mode or individually, into the existing system.
Butterfield: The scanning system would have to be compatible with your existing database and setup.
Majcher: Yes, but that's pretty much a given. If you are doing a product design, you know, you are not going to design an entire database, storage, retrieval, dissemination system around one component, a scanning component for example. If you try doing that, you'll end up like Wang did and not sell a whole hell of a lot of them. The components really need to be modular, and they need to be standardized so they can be integrated fairly quickly and easily.
Butterfield: OK. Ah. I think that gets most of my questions. Would you mind showing me what you are doing today, and show me the hardware that you've got? I wondered if I could see... Some other things. I'm hoping you can maybe point me to some other folks. If I could talk to a couple of your folks and ask them some of these questions.
Majcher: OK. Who do you want?
Butterfield: Somebody geographically nearby, other companies maybe, Syracuse, Buffalo. I may be able to do phone interviews as well.
Majcher: Here's a list of my counterparts in this corporation. I can't give you [names] from others. Put a check next to the people you want to talk to and I'll give them a call and make the introduction.
Butterfield: OK, I understand. I'll go through that. The other thing I wanted to ask: when I eventually synthesize all of this information into that document, can I come back and just test to make sure I got it right?
Majcher: Sure, sure.
Butterfield: Thank you.
[End of interview]
Frank Belli
Xerox Technical Information Center
Webster, NY
March 11, 1998
Butterfield: I'm speaking with Frank Belli. Frank, what is your title?
Belli: I basically work for the Technical Information Center. I prepare information to make available on line.
Butterfield: I think I've seen some of that information on line.
Belli: I use the database manager Basis system of IDI of Columbus, Ohio, which used to be Battelle.
Butterfield: Oh, Battelle Research Institute?
Belli: Basis is a relational database, like Oracle. The reason I selected Basis [was that] at that time, Oracle did not have the ability to search for text longer than x number of characters. That was a big limitation for us at that time. Now, maybe it would be Oracle.
Butterfield: OK. So searchable text is a big requirement?
Belli: Yes. Some of the reports are pretty long. If it was in electronic form, each page can be a minimum of two to three thousand bytes, if you have three or four pages... [unintelligible]
Butterfield: This system that I showed you a rough picture of, scanning and capture of documents in hardcopy... What are the most important requirements in your mind for that system?
Belli: Most of the documents, 80-90%, are already in electronic form, so we don't have to scan them. Uh. The ones that would be hardcopy, those are the ones that right now we are scanning. What we have done is convert [them] to PDF so that [they can be] sent. And uh, we want to make those documents available electronically, so people can see them without having to [have them] printed. Now scanning of course of images are hard to do. So those are the problems.
Butterfield: Why is PDF better?
Belli: There is [a] lot of printers that can print PDF. TIFF images tend to require special softwares on the printer.
Butterfield: Do PDF files take up less space, are they smaller?
Belli: No, not necessarily. TIFF files are compressed, but we convert to PostScript and PDF so people can print them.
Butterfield: The quality requirements of scanning systems. What is involved there?
Belli: Quality is as good as making a copy of the original. So if you have an original, nothing special, eight and a half by 11, you know, quality is good.
Butterfield: So the quality of the system...
Belli: It is as good as a good copy.
Butterfield: The system you have right now, it is as good as a good copy. Is it good enough? Does it need to be better? Would you like it to be better?
Belli: Simple text, simple drawing, something like that, it's OK. Problems come with pictures. Some are in color. Then the quality is not good. Now you can ask, why don't you save color? The question is what do you do with it. The image is big.
Butterfield: That is... Why is it bad if the image is big? Why is that a problem?
Belli: [Shaking his head, doesn't understand]
Butterfield: I know some of these questions sound dumb, but I want to make sure I understand. Is it that it takes too much time to transfer the files over the net?
Belli: Yeah. Right now it takes too much time. We used to do our own scanning, but now we send it out. That's it there. It's about five months work.
Butterfield: We're looking at a box.
Belli: That's about five months work.
Butterfield: Five months work is fitting in a box that's about one foot by one foot by two feet, full of paper. So how much will it cost to get this scanned?
Belli: Because I don't have too many, it is cheaper to send it out than to maintain the system now.
Butterfield: What will you spend to have those scanned?
Belli: That depends on the amount of time, but you are talking about dimes per page. Ten to twenty cents per page, something like that.
Butterfield: And that is black and white only.
Belli: I used to be charged so much per page. Now I'm being charged so much per hour. Because I don't have that many documents, I don't have enough to make it worthwhile. My vendor does a good job; they do good quality.
Butterfield: What does it mean to have good quality?
Belli: It means to make sure that all the images are there, no pages missing. No pages like distorted, there's... Because that becomes the document.
Butterfield: When you say distorted, you move your hand like that; you mean skewed?
Belli: Yes, instead of coming like this [demonstrates straight with hands], it comes like that [turns document].
Because of the value of the document, you don't mind spending a little bit more to get it right.
Butterfield: What about the font size? The size of the character? How small can it be?
Belli: Eight point. With resolution, we don't have any problem. I think we are [at] 300. That is... [it's] been so long I forget. The other problem we have is colored region, [unintelligible]
Butterfield: What is the nature of the output of this system? Is it always electronic?
Belli: It serves two purposes, electronic or copy material.
Butterfield: And you're providing that electronic material via the internet on a web page?
Belli: Yeah, right. We have [unintelligible] a document on a server. At the search, [unintelligible]
Butterfield: The scans that you are getting back, they are images, not text? I can't search for text.
Belli: The only way would be to OCR, but that is not necessary. There are enough keywords for people, to provide on line the keywords.
Butterfield: So the keywords, which are searchable, let people find the document, and once they've found the document, looking at the image is good enough.
Belli: [Nods.] It has... good enough.
Butterfield: OK.
Belli: Customer now sends documents electronically. People can scan the documents electronically.
Butterfield: You...
Belli: [We] used to [do] scanning in house. [unintelligible]
Butterfield: Since you used to do that, could I ask you some questions about how automated would you want the scanning process to be? How much operator intervention do you want in that process?
Belli: Uh. If you [had asked] that question two or three years ago, I would have answered what I wanted. Now, as I say, [unintelligible]
Butterfield: You provide this output electronically, but if the customer calls and says, I want a print, do you provide hardcopy?
Belli: For reports that are scanned, generally we provide the hardcopy because our printer can make faster copy for them. We have [a] special board in it. Some people cannot get this. A 400-page report can take a long time.
Butterfield: Roughly, what kind of turn around time do you need for documents you scan in?
Belli: Now, the scan?
Butterfield: You...
Belli: Normally, we send out and we get [it back] in two-to-three days. Provide... once a... copy, it is scattered. I mean, I really... you want to talk... [unintelligible].
Butterfield: Now, right now, you are collecting this stuff for a month...
Belli: Four months... four months.
Butterfield: Right, four months.
Belli: So right now, it doesn't make that much difference.
Butterfield: I think that answers most of my questions. What I am doing with this information, I am putting all this together into a matrix, a QFD House of Quality. I am talking to many people; I talked to Mike, I talked to people at Kodak.
Belli: Have you talked to people at Records Center?
Butterfield: No, is there someone I should talk to there? Do you have a name?
Belli: Yes. They do quite a bit of scanning. They have a Contract House; they use the same one that I am using now... there, because they use... they used to use the [one] somewhere in the Midwest. Might want to talk [to] IBSI (Scanning Service used by Xerox TIC), Todd Baitsholts, 274-9125.
Butterfield: Thanks. Anyway, I'd like to come back if I could when I put all this information together. Did I get it right, etc.?
Belli: Sure. You want to see our scanning station now?
Butterfield: Yes, great!
[End of Interview]
Carl Herrgesell
Manager of System Development
XSERV/VMS/Electronic Document Management
Webster, NY
April 13, 1998
Butterfield: Are you using a system like this example?
Herrgesell: We are using a number of systems like this [the example]. Our business is divided into two halves, Engineering documents and Office documents, and I can give you better definitions of those. And we're attempting to use different platforms and technologies in those two areas. There is not a lot of merger across them. Historically, two groups merged January a year ago. There has not been a lot of platform merger.
Butterfield: The image quality of these systems. How do you evaluate the quality of these systems?
Herrgesell: On the Engineering side, which is the side I can speak to best, we are scanning to black and white images only. Originally we were scanning hardcopy documents or aperture cards at 400 dpi. We are finding that the quality of information, however, is good enough at 200 dpi. I'm not sure it's a square function or not, so [it is not exactly] 1/4 the storage, but close to it. The reason we were scanning at 400 dpi was to make sure we could capture readable kanji characters, and it's my understanding that has not been a problem. So there's no gray [scale], there's no color.
Butterfield: So you're only capturing binary images. Desire to capture color? Is there any need to?
Herrgesell: Not on the Engineering side. Some of the scanners we use will scan color and gray scale, and then... and even [then], the scanner's thresholding determines where black starts. Gray scale is actually avoided. On the office side, I'm not sure, and I would refer you to one of the people in my group to talk about that. I'm not even sure what resolution we're scanning office documents at. They could be at 600 dpi, but I'm not sure. Probably, let's see, are you familiar with Kofax Ascent? That is the equipment that has been used at times [on the] office side.
Butterfield: But as far as you're concerned, 200 is fairly standard, good enough even for kanji?
Herrgesell: Yeah. On the Engineering side, images are [at 200 dpi], which reduces our storage requirements. On the database now, I'm trying to remember how many... about 810,000. That is [a] document count, not an image count, but I can check on whether that is documents or images.
Butterfield: 810,000?
Herrgesell: 810,000 single sheets or documents that we have in a live database.
Butterfield: I should tell you right now, if anything you plan to tell me is proprietary, I don't want to hear it, because I need to be able to publish this outside [Xerox].
Herrgesell: I've done this in demos. In benchmarking, it looks like Xerox is ahead of our benchmarks. It is good information. Basically, all our active build-authorized engineering drawings are in this database, and they are distributed globally.
Butterfield: What kinds of things are quality problems for you now?
Herrgesell: We have a historical database, parts of which are unreadable. Even with multiple settings on the scanner, changing the threshold, despeckling, whatever the scanner offers, we end up with a residue of images, because of not being able to scan old aperture cards well. So it is important for us that we have a flag in our metadata about an image, yes or no, that we set that says, somewhere in this image is a bad [area]. I don't know what the percentage is currently. Often, we do that. Now, because we have taken care of the history that we need to put in our database, and currently we are getting most of our information directly from CAD via HPGL, a neutral format, we have a decreasing historical quality problem. Quality couldn't be better. Quality problems we have tend to revolve around particular process issues, more procedural than anything to do with image quality. It is [the] engineer or field-authorized configuration person fencing the right area of the drawing before sending it over to us; you are going to have quality that is excellent.
Butterfield: Could you describe for the record what an aperture card is?
Herrgesell: An aperture card is a microfilm version of a drawing.
Butterfield: So it is just another type of microform.
Herrgesell: Right. It is a Hollerith card, usually with punch data on it that identifies the drawing with some of its metadata, and then there is a window that has a piece of film. We will put on aperture cards up to E-size drawings, actually A0 in the ISO standard and possibly beyond that. The quality is hopefully good. Nonetheless, we are getting entirely away from aperture cards. We are investigating what the best method of long-term storage [is].
Butterfield: You mentioned the drawing sizes that you have on aperture cards. What hardcopy document sizes do you need to handle?
Herrgesell: The range is pretty much from A through E in ANSI sizes, or A4 through A0 in the ISO sizes. Those are almost identical. We have standardized on the ISO sizes for what is kept in our database.
Butterfield: What kinds of output do you need from the system? What form does this information take when it is accessed later?
Herrgesell: Basically searching and viewing. You may not have thought of searching as an output, but it is certainly a critical function for us. People have to find drawings first and then make their choices. There are some additional needs for downloading files in electronic form for use with other software or to ship in packages to vendors that might be bidding.
Butterfield: When you say searching, is that just searching based on the metadata, or do customers want to search on the text content of the document?
Herrgesell: Currently, it is metadata only. Now, the answer to a number of these questions is different over in the Office Document Center.
Butterfield: So you are speaking for the Engineering documents.
Herrgesell: For a variety of reasons, basically looking at the costs and benefits and access needs, engineering documents are basically kept with just the metadata. And in fact, in practice here, most of the users just do searches on part number and sometimes revision. That's within the scope of our system, which, by the way, is XVP (Xerox Virtual Print).
Butterfield: The form of the output product, does it need to be bound?
Herrgesell: Single sheets are acceptable. I'm not sure we're using a plotter that rolls drawings.
Butterfield: Back to scanning for a moment. What level of automation do you need from your scanning systems?
Herrgesell: With engineering documents, particularly, there are likely to be a variety of sizes we are scanning, so we prefer scanning manually. In other words, we don't put a stack of sheets in. We use individual scans and do quality checks on a screen before they are saved.
Butterfield: So you are viewing them on a CRT before they are stored.
Herrgesell: Yeah.
Butterfield: Are you comfortable with that work process? Are you looking for improvement?
Herrgesell: We are comfortable, though we are in the midst of an improvement right now. We are beginning to go live with our Xerox Engineering Systems ES8180 in Doc Center. That machine, which will scan and plot, has a much faster scan per inch rate than the old scanner we were using. It will funnel off drawings to a database. It has everything on the diagram you showed me except the CD ROM production.
Butterfield: OK. Do you worry about the effect of scanning on the input? For example, with book scanning, some customers are finding that they can't preserve the document in its original form. They have to slice the binding.
Herrgesell: It is not a problem with us.
Butterfield: How about scanning speed? How long does it take? How fast can you scan, and what are your requirements for speed?
Herrgesell: I'm not sure how fast the operation will be with the ES8180, but with the... The one set of numbers I know is with the aperture cards next door. On a good day we can scan 300 images. Now that might sound terrifically low, especially since we have the aperture card scanner with a very high rate of speed that cost us over a hundred thousand dollars, but the issue is checking for quality and adding metadata to the images and collecting the images into documents. Once all that work is done, the throughput drops considerably. And there is even a further overnight check on some of the documents through automation. Not all of them will make it; the operators have to correct some drawings.
Butterfield: What is the overnight check?
Herrgesell: The overnight check is checking again all the metadata that has been entered, and cross checks between different metadata items.
Butterfield: A second shift operator does that?
Herrgesell: It is done automatically by software during the night. The regular operator the next day would read the error messages. We are aiming for 100% quality.
Butterfield: And by that you mean?
Herrgesell: No image errors, no data errors, no document structure errors.
Butterfield: What is a document structure error?
Herrgesell: For example, a document in which there should be seven sheets, two are missing or could have been misplaced into another document.
Butterfield: What kinds of costs do you associate with this sort of system?
Herrgesell: I don't have precise numbers. We have gotten numbers on some of these operations with the use of activity based costing, but I don't know what the numbers are. I could send you to someone else for that. For aperture card scanning, basically we have a full time person and a Unix workstation and a Photomatrix scanner that I mentioned earlier. Fully utilized, one shift, every day. And I think there will, at least in the short run, be a similar person, workstation, and hard copy scanner operating throughout, one shift a day.
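The overnight check Herrgesell describes (software that re-checks the entered metadata, cross-checks metadata items against each other, and leaves error messages for the next-day operator) can be illustrated with a small validator. This is a minimal sketch under stated assumptions, not the XVP implementation: the record fields (`part_number`, `revision`, `expected_sheets`, `sheet_numbers`) and the rules are hypothetical. It flags blank required fields as data errors and reports a document structure error when the captured sheets do not match the expected sheet count, as in his seven-sheet example.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    """Hypothetical metadata for one engineering document (not XVP's schema)."""
    part_number: str
    revision: str
    expected_sheets: int                               # sheet count recorded in the metadata
    sheet_numbers: list = field(default_factory=list)  # sheets actually captured


def overnight_check(doc: DocumentRecord) -> list:
    """Return error messages for the next-day operator to review."""
    errors = []
    # Data errors: required metadata fields must not be blank.
    if not doc.part_number.strip():
        errors.append("missing part number")
    if not doc.revision.strip():
        errors.append("missing revision")
    # Cross-check between metadata items: captured sheets vs. expected count.
    expected = set(range(1, doc.expected_sheets + 1))
    captured = set(doc.sheet_numbers)
    missing = sorted(expected - captured)
    extra = sorted(captured - expected)
    if missing:
        errors.append(f"document structure error: missing sheets {missing}")
    if extra:
        errors.append(f"document structure error: unexpected sheets {extra}")
    return errors


if __name__ == "__main__":
    # A seven-sheet drawing where sheets 4 and 6 were misplaced elsewhere.
    doc = DocumentRecord("123A4567", "B", expected_sheets=7, sheet_numbers=[1, 2, 3, 5, 7])
    for message in overnight_check(doc):
        print(message)  # prints: document structure error: missing sheets [4, 6]
```

Running the checks in a batch overnight, rather than at capture time, keeps the operator's daytime throughput up and defers review of the error list to the next shift, which matches the workflow described in the interview.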
Butterfield: Earlier you mentioned the cost of the aperture scanner at around $100K. Would you expect to pay roughly the same price for the hardcopy scanner?
Herrgesell: I'd probably want to ask, would I pay the same for such a machine today? No, an aperture card scanner would be cheaper today, significantly cheaper, more reliable and smaller, and so on.
Butterfield: Hardcopy scanner?
Herrgesell: Hardcopy scanner, well, there are scanners and scanners. If we were going to get a replacement for our old one, which scans at something like half an inch per second, the price may have dropped moderately. I don't remember how much it originally cost, but it was expensive. Now it would not be the price of a flatbed scanner that would do 8.5x11; it would be an engineering scanner, which is relatively expensive. I would say four figures, something like that. The ES8180, on the other hand, fills a significant part of the room next door. It is a very different niche and it is in the very low six figures.
Butterfield: So one to two million dollars?
Herrgesell: The low hundreds of thousands. Between one and two hundred thousand.
Butterfield: In this shop, what is the volume of documents that are captured?
Herrgesell: Let me look up some of my figures.
Butterfield: OK.
Herrgesell: First of all, the volume, about 810,000 images. The average kilobytes per image is 112KB. And the average images per document is 2.4. In terms of the number that we are actually scanning per month and distributing, I'm not sure about that. I'm going to say it may be around 8,000, which for the Office document business would be low, but this is Engineering. I could look that up too, but it would take me a while.
Butterfield: The customers for the documents you are capturing here are primarily engineers?
Herrgesell: Engineering, manufacturing, procurement.
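The volume figures Herrgesell quotes (about 810,000 images, an average of 112KB per image, 2.4 images per document) imply the approximate size of the image store and the number of documents it holds. The arithmetic is sketched below; the inputs are his approximate figures, and because the interview does not say whether "KB" means 1,000 or 1,024 bytes, both readings are shown.

```python
# Figures quoted in the interview (all approximate).
images = 810_000        # images on the system
kb_per_image = 112      # average image size in KB
images_per_doc = 2.4    # average images per document

total_kb = images * kb_per_image      # 90,720,000 KB
documents = images / images_per_doc   # about 337,500 documents

# Assumption: the interview does not define KB, so both conventions are printed.
print(f"storage:   {total_kb / 1e6:.1f} GB (1 KB = 1,000 bytes)")
print(f"storage:   {total_kb / 1024**2:.1f} GiB (1 KB = 1,024 bytes)")
print(f"documents: {documents:,.0f}")
```

Under either convention the repository is on the order of 90 GB of compressed binary images, spread over roughly a third of a million documents; index and metadata overhead are not included.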
Butterfield: You've described quality attributes, costs, deliverables. What are the time requirements for document capture? What are the turnaround time requirements?
Herrgesell: All users of computer systems would like everything happening within an eye blink. What we have managed to do is give 24 hour turnaround once the document reaches us. That would be our average for getting something in electronic form. That is including the overnight check. Which is good; obviously we don't have a large backlog. Now, for various reasons, it may take a while for documents to reach us. So sometimes our users out there will perceive a much longer time frame. In addition, we have a system which delivers drawings around the world to sites in Brazil, California, Europe, Mexico. And once a drawing, once the first 24 hours pass us, our median delivery time to the rest of the world is less than 24 hours. The name of the system that does that is WIMS.
Butterfield: What does that stand for?
Herrgesell: Worldwide Information Management System.
Butterfield: Could you justify incremental quality requirements, perhaps at the tradeoff of cost or deliverables? At this point do you see a need for higher quality? For color? For searchable text?
Herrgesell: I'm not aware that there has been a demand for any of those. There have been requests for better search capability, but it is not necessarily achievable by OCR'ing the drawings.
Butterfield: Why won't OCR do that for you?
Herrgesell: Most of the words on an engineering drawing are words like "dia," or blocks of text that appear on every drawing. Theoretically, and probably practically, there could be some demand for this, but there hasn't been enough to make us look seriously at it during the lifetime of the project so far, which is five years. There is more of an interest in faster delivery.
Butterfield: So if you had to improve one of the three attributes, quality, cost or delivery, it would be delivery?
Herrgesell: If delivery means speed, yes. And I'm not sure what category this falls under, but we are also chronically asked for connecting our configuration information to our system. Remember earlier I said people would go to a configuration system to get a list of the drawings and to look at a sub-assembly? The users would like that system and our system to be the same system.
Butterfield: So they'd like to see a connection between the image information in drawings and the configuration information that links those drawings together.
Herrgesell: Correct. Nobody really has problems with the images. It is the access... how to access them and how fast they can get them.
Butterfield: Let me ask you something about the future for systems like this. Do you think that scanning drawings and archiving them is something that this group will do for some time, or will that get phased out with drawings eventually already coming to the repository in electronic form?
Herrgesell: There will be an asymptotic curve. It will phase out, but it will never go away entirely. Also, there are bodies of drawings lurking in the company that need scanning that have never been caught up in an electronic system yet. For example, we are currently scanning all of our facilities drawings in as a separate project from our engineering drawings. And there may be more such items out there. Now there are people in my group that have a better answer to that question than I do, but that is my impression. We are headed toward, basically, a lights out electronic operation, but we will probably never get there.
Butterfield: What do you mean by a lights out?
Herrgesell: There would not have to be a staff running manually. There will be an even longer term need for staff to do output preparation.
Butterfield: And by output preparation, that means?
Herrgesell: For example, putting together a bid package. If, let's say, 400 or more drawings are needed by a purchasing person, to be put in a form expected by a vendor to bid on, it is more cost effective for that person to ask our Doc Center to put that package together for them. So that's part of our business. We make a substantial part of our revenue that way.
Butterfield: I forgot to ask you about output formats. These are image file formats. Are they TIFF files, GIF files, ...?
Herrgesell: None of the above. They are binary images. They are stored in MMR format, Modified Modified Read, which is nearly TIFF but with a proprietary internal header. The reason for that is that's what XVP, which is a fairly old system, could handle.
Butterfield: Is there a need for an externally standard format?
Herrgesell: Yup. And we're going to that. We are upgrading XVP. As part of that upgrade, we will migrate all our images to TIFF, which will solve a number of problems for us.
Butterfield: I think that gets me to the end of my questions. Can I come back, after transcribing and processing your comments, to bounce the result off you and make sure I got it right?
Herrgesell: Yes.
Butterfield: Let me thank you again. I have taken more time than I said.
Herrgesell: Has this been helpful?
Butterfield: Yes it has. Thank you very much.
[End of Interview]
Suzanne Keenan
Manager
XSERVVVMS/Electronic Document Management
Webster, NY
April 16, 1998
Butterfield: What kinds of things do you do with Electronic Image Management in this shop?
Keenan: The Electronic Image Management Group has responsibility for two areas. One is in engineering drawing image management, and we use the DocuPlex System, XVP as it is known now. We've spent the last couple of years converting all of the engineering drawings that were being designed, built or remanufactured, and scanned those and put them in DocuPlex. And now what we've done is to implement DocuPlex out to all of the engineering design teams and manufacturing worldwide, and they can access the engineering drawings right at their workstation.
Butterfield: How do they gain that access?
Keenan: There is a piece of client software called EDVP, and we can download that to their system. That lets them view it or print it. We are also in the process of putting up another product that is a web access, which is called IDOCS. And in the engineering world, we also have the engineering drawings and we have CRLV's, which are the life variance changes, and then English translations if it happens to be a Fuji [Xerox] drawing in kanji.
Butterfield: I should say this up front: because I am doing this as a research project for RIT, please don't tell me anything that is proprietary.
Keenan: OK.
Butterfield: You said there are two areas and you've just described the Engineering side.
Keenan: And the other area, which we've picked up responsibility for late last year, that's office documents, really general office documents. And there are internal customers throughout Xerox that have many, many files, four-drawer files, filled with documents, and are frequently accessing those, and one of the things that we are trying to do is take those and scan them. We are scanning those in two systems. One is the Xerox DocuShare product, and the other is a third party product that is called Excalibur, and that's developed by a company named Excalibur Technologies. Then there is a next generation product that just came out that is called RetrievalWare, and we are going to migrate to that. Now in the Engineering side, we have a thousand customers around the world that are using DocuPlex, in Europe, in Brazil, China, Japan, different parts of the US, Toronto. We have almost a million images on the system, active drawings that they are always retrieving.
Butterfield: You've given me an overview, now if I could ask you some questions about what are the high level requirements of the systems that you have doing these jobs for you now.
Keenan: What are the high level requirements?
Butterfield: If you had to pick, what are the top ten most important things about the systems you use to capture these documents?
Keenan: Well, that it is high quality. Now, it is interesting from a scanning perspective: one of the things in the Engineering world, it was very important that we had 400 dpi; 200 dpi really did not meet what we need because of kanji, so that for us was a particular requirement.
Butterfield: What about speed and cost and those kinds of things?
Keenan: Well definitely. Speed is very important, and that was one of the reasons we went into it, because we were trying to improve the product delivery process. We were trying to take a number of weeks out of the cycle, so we needed to have a very high speed from a retrieval standpoint. I mean, people did not want to sit at their workstation and wait fifty seconds or something to take a look at an image. It used to take them four hours before they could see it. From a cost standpoint, that's been probably one of the biggest issues for us. Because even though there is tremendous value add for the customer, it is very hard to capture what the benefits are to getting a product to market more quickly. They see it as much more expensive, which in fact it isn't. It seems to be a struggle, so cost is really important. And one of the reasons we are going to the web is that that will help us reduce the cost significantly. So that has been a real requirement.
Butterfield: Could you prioritize those three things for me?
Keenan: Cost first!
Butterfield: Cost is most important.
Keenan: Yes. Then I would say speed. This is kind of odd, because quality, you would think of as normally first. But because of the scan quality, many times that improves the quality of the engineering drawings anyway. Quality was not a particular issue in this case, which was interesting.
Butterfield: Costs. How do you evaluate costs for systems like this? How do you rank one system as more or less costly than another? What are the factors that you look at?
Keenan: When we looked at it, this is kind of a unique situation because Xerox had an image management system; it was kind of a moot point for us. We looked at a couple of different image management systems, and when we started, we were really on the lead edge; there were not a lot of systems available. And we looked at the quality, we looked at the cost, we looked at the speed, we looked at all the capabilities. DocuPlex in the end was the least expensive, but part of it was because it was the Xerox product.
Butterfield: How is that sort of a system paid for?
Keenan: What we did was activity-based costing, and went through and identified all of the expenses associated with the engineering side and the office side, and we charge; we have a pricing methodology that we developed, and what we tried to do was establish fixed costs, which was what was going to keep the system going. So it's the server and the software and the systems administration people, everything to keep the doors open.
Butterfield: Basically the infrastructure.
Keenan: Exactly. And then we developed a variable cost methodology that helps the customer control their cost. The first year, we just said, OK, this is what the total cost is. We divided it by the number of customers that were on the system, and it just happened that it turned out to be $119 a month per customer, and that recovered all of our costs, but they did not like that because it was one single cost and they had no control. That was one of the things they were real concerned about. So what we did was we went to this fixed and variable that said, Alright, here's the fixed cost. The fixed cost is given to them in a transfer agreement. It'll be a portion, and it is based on the number of customers that a particular group has. So if a group has a hundred customers and they have 50% of the images on the system, then they get a percentage of that fixed cost. For the variable, they can control that. We give them a certain number of image retrievals a month, and then they pay a price for every other one. It is really not a click charge exactly. And that is an area that we're struggling with, but we do want to give them a sense of feeling that they can control their costs, but we recover our cost. We get our funding through what we call transfer agreements, then individual journal entries.
Butterfield: You're not looking for that same sort of relationship from the suppliers of EIM systems?
Keenan: We'd like that, because that's been one of the problems. There's really not the capability to do click charges, and that would be the best solution.
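Keenan contrasts two chargeback schemes: a first-year flat rate (total cost divided by the number of customers) and a later fixed-plus-variable model, in which each group carries a share of the fixed infrastructure cost and pays per image retrieval beyond a monthly allowance. The sketch below expresses both as functions. All numbers are hypothetical, and because the interview mentions both a group's customers and its share of images, the choice to allocate the fixed cost by share of customers is an assumption.

```python
def flat_rate(total_monthly_cost: float, customers: int) -> float:
    """First-year model: total cost divided evenly, one single cost per customer."""
    return total_monthly_cost / customers


def group_monthly_charge(fixed_monthly_cost: float,
                         group_customers: int,
                         total_customers: int,
                         retrievals: int,
                         included_retrievals: int,
                         price_per_extra: float) -> float:
    """Fixed-plus-variable model; every parameter value is hypothetical.

    Assumption: the fixed share is allocated by the group's share of customers.
    """
    fixed_share = fixed_monthly_cost * group_customers / total_customers
    # Only retrievals beyond the monthly allowance are billed individually.
    extra_retrievals = max(0, retrievals - included_retrievals)
    return fixed_share + extra_retrievals * price_per_extra


if __name__ == "__main__":
    # Hypothetical: $119,000/month over 1,000 customers gives a $119 flat rate.
    print(flat_rate(119_000, 1_000))                                       # 119.0
    # Hypothetical group: 100 of 1,000 customers, 6,000 retrievals, 5,000 included at $0.50.
    print(group_monthly_charge(100_000, 100, 1_000, 6_000, 5_000, 0.50))  # 10500.0
```

The allowance is what makes the variable part controllable by the customer: a group that stays under its included retrievals pays only its fixed share, while the service still recovers its infrastructure cost through the fixed portion.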
Butterfield: How do you assess speed and what does speed mean to you?
Keenan: Speed in terms of the office side?
Butterfield: Yes.
Keenan: Now, what we are looking for is instantly. Is that 10 seconds or 30 seconds or 2 seconds? I don't know what the exact numbers are. But basically if you went into Microsoft and opened up a document, it comes right up; that is the expectation. So I'm looking for having it come up instantly. Now we haven't defined a particular ...
Butterfield: Any concerns in terms of the speed and turnaround time that it takes to capture a document?
Keenan: I'm not really sure what the requirement or what the capability is in terms of the scan. I don't know if Carl [Herrgesell] really had that number when you talked to him. But if you wanted to talk to someone who does a lot of the scanning, I could connect you to someone. The individual who is doing that is very conversant on the office side.
Butterfield: Are you using particular products for the scan?
Keenan: On the Office side, we are using XDDS as the scanner, a Xerox copier, primarily for office documents. On the Engineering side, we are using a 7336 PhotoMatrix aperture card scanner.
Butterfield: And what document handling capability are you looking for on the scanner?
Keenan: Not in the Engineering side, because the documents are so large. On the office side, for 8.5"x11" documents, I could drop documents into an automatic document handler, but the thing that we are looking for is a scanner which improves productivity of the scan. So as fast as possible. Now with the paper scanner, it depends on the sizes, so...
Butterfield: Any other thoughts about speed or throughput?
Keenan: One thing from a handling side: there are times when we need the capability to reduce a document. Sometimes they are oversized, like a 9x12, and we're trying to scan it to be 8.5x11. We have some problems like that. It wouldn't come real high on my priority list, but if you are trying to capture all of the requirements...
Butterfield: About quality. What are the characteristics of a high-quality job?
Keenan: From an image quality standpoint?
Butterfield: Just generally.
Keenan: Well, readability. On the Engineering side, because of the quality of the drawings and the age of the drawings, it is important, from a background standpoint, that we control background color. For instance, we have got a lot of sepia and a lot of drawings that are old and were hand drawn with pencil that has smeared. It is important that we be able to do some cleanup. Quality for us too is being able to have a drawing that is not speckled. We have the capability to despeckle and that is important to us. Deskewing, so that we have it aligned properly. Those are probably the biggest ones.
Butterfield: Is color a requirement?
Keenan: Yes. It is becoming more of a requirement. We are getting more and more customers giving us drawings that are black and white, but what they are wanting to do is highlight color. Not necessarily color, you know, all different colors, but just a highlight color is particularly being asked for.
Butterfield: And do they want to see this color reproduced from their original documents, or do they want to add it to them?
Keenan: They want to reproduce it. Now in the office documents, at this time, we are not seeing color. But I would expect that we will as time goes on, but there is not a big requirement for that right now. But when I look at everyone using Microsoft and doing color presentations, those documents eventually will get stored in a Document Management System, so you had better capture the color and print the color.
Butterfield: What is the nature of color in jobs: is it color text, color line art, colored pictures?
Keenan: Well, on the Engineering side it is more highlighting areas of change. So if a drawing is changed, they might circle it or they might do the text in color that changed for that particular revision. On the Office document area, that is really all areas, some is text and...
Butterfield: The systems you have right now, they don't have the capability to do color right now?
Keenan: Right now, no, but that is what we are getting asked for now.
Butterfield: If you had the ability to capture color and provide that for your customers, how accurate would the color rendition have to be? Would a highlighted region that was red have to match exactly?
Keenan: No. No. In the Engineering side, I would say that would not be a real high priority. In the Office side, it would have to be more like-for-like color.
Butterfield: What are the most difficult kinds of image content to handle?
Keenan: For us it is kanji data.
Butterfield: Any other types?
Keenan: We've had quite a bit of difficulty with the manually drawn engineering drawings because of smear. The best highest quality is a CAD drawing, whether we take it in electronically or whether we scan it.
Butterfield: What portion of your inputs do come to you now electronically, and how do you expect that to change?
Keenan: Right now, we are getting probably about 80% electronically in the Engineering world. In the Office world, it is 100% right now that we scan. And I see in the CAD world going to 100%, and I see in the Office world, I see us ramping up and getting to eventually all electronic submission, but I think that's a couple years away. Our focus right now is going after the files to reduce facility space. Once we get those customers out, then they'll start moving their other documents to us electronically. So I see that ramping up each year toward more electronic data.
Butterfield: Do you ever see a point when you would get rid of your scanning capability altogether?
Keenan: Probably not for quite a long time. Totally? No. I bet we'd get to 80% though, over time. But you know, there's a lot of contract documents, different legal documents and job ticket type information that would stay hardcopy.
Butterfield: The nature of the output that these systems provide. Right now it is in image form? It is not searchable text?
Keenan: In the Engineering side, that is correct. In the Office side, we have optical character recognition, so we have content retrieval, attribute retrieval. Excalibur has absolutely phenomenal retrieval ability. It retrieves synonyms. It is powerful. If you want to look at it, go on the web under Excalibur.
Butterfield: Excalibur is the retrieval side.
Keenan: Exactly.
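Keenan credits Excalibur with retrieving synonyms on OCR'd office documents. Excalibur's actual method is proprietary and is not described in this interview, so the sketch below shows only the generic idea of thesaurus-based query expansion over an inverted index: each query word is widened to its synonyms, results for the variants are unioned, and results across query words are intersected. The thesaurus entries and document texts are hypothetical.

```python
from collections import defaultdict

# Hypothetical thesaurus; a real system would use a much larger one.
SYNONYMS = {
    "drawing": {"diagram", "blueprint"},
    "employee": {"staff", "worker"},
}


def build_index(docs: dict) -> dict:
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?\"'()")].add(doc_id)
    return index


def expand(term: str) -> set:
    """Return the term plus its synonyms, looked up in both directions."""
    term = term.lower()
    related = {term} | SYNONYMS.get(term, set())
    for head, syns in SYNONYMS.items():
        if term in syns:
            related |= {head} | syns
    return related


def search(index: dict, query: str) -> set:
    """AND across query words; OR across each word's synonyms."""
    results = None
    for word in query.lower().split():
        hits = set()
        for variant in expand(word):
            hits |= index.get(variant, set())
        results = hits if results is None else results & hits
    return results or set()


if __name__ == "__main__":
    docs = {"d1": "Blueprint for fuser assembly", "d2": "Staff handbook", "d3": "Drawing index"}
    index = build_index(docs)
    print(sorted(search(index, "drawing")))  # ['d1', 'd3']
    print(sorted(search(index, "worker")))   # ['d2']
```

Expanding the query rather than the index keeps the index small and lets the thesaurus change without re-indexing, at the cost of slower queries as the synonym sets grow.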
Butterfield: I think you said the output is via their electronic workstation. Are there print
finishing operations here?
Keenan: We do some printing here. It is getting less and less. The way that we configured the system is that we put print capability in the work areas so that hopefully they will just have to view it, but if they do need a print, they will be able to print it in their work area. Now in the Engineering side, we do a lot of finishing here, bid sets, that type of thing. And many times they want full size or larger sizes than what they have in their work areas. They can only get 11x17 in their work areas. Anything larger than that comes over here for finishing. And there is still a fair amount of that. More than I thought we would have at this stage. They view that as very clerical. In the Office document side, we do get a fair amount of requests for finishing, particularly when it's like large folders of information that they want. They do ask us to run those. We're trying to get them away from that. You know, if you create a Word document and you want a print of it, you send it to a printer. It is the same metaphor you would think would be used, but it is not.
Butterfield: Tell me a little about the level of preparation of input documents. Do you ever scan books?
Keenan: We are not doing that in this particular area, but another group that I have is doing a project for the Xerox Museum. There are all kinds of books and historical documents that we are trying to preserve, and that is in the process to be cataloged and will be added into an imaging system. There will be a requirement for that. Another point, though: when we did this engineering project, preparation was the most costly, laborious, difficult part of the whole project. The lessons learned from that were phenomenal. Just take for example, the primary input was from aperture card, 35mm. Xerox had purchased a number of companies throughout the course of 30 to 40 years, which all used different attributes, different microfilm standards, different drawing formats, and to pull those all together and put them into the DocuPlex system was a challenge at best [laughs]. Very difficult. It is also very difficult in the Office documents. Take as an example... the customers are so diverse. They have had so many different types of documents with so many requirements. We have HR [Human Resource] groups that have a set of attributes, then we have Legal, Environmental Health and Safety. Every time you set up a new document management collection for a new group, capturing all of the attribute information the way that they want to do it is challenging. It is very labor intensive, very costly. Every time we put a new customer up that we are going to scan paper... $5000 just to set up the scan is minimum.
Butterfield: You are talking about the metadata?
Keenan: Yes. Primarily.
Butterfield: For example?
Keenan: In the HR department, it might be an individual name, their Social Security number, their employee number. In the Engineering world, it might be a drawing or part number, a revision. Those are a couple examples.
Butterfield: I think that answers all of my questions. Thank you very much.
[End of Interview]
Patricia Pitkin
Director
Library Services
Wallace Memorial Library
Rochester Institute of Technology
Rochester, NY
Butterfield: Could you tell me what kinds of things you are doing right now relative to a system like this [example]?
Pitkin: Oh sure. We do. We capture documents and put them on the web. Students tend to do a lot of that in some public areas that we have, but what they're looking at is really more representational than on quality. We are also doing some capture of material for an electronic review service that we have available, where we capture faculty members' notes and digitize that, put it up, make it available through a web interface off of our catalog, and then students have access to it anytime, anywhere. And we are also doing some digital capture of images and making those images available, and that's a large issue right now. Say a faculty member takes some photographs and they want to integrate those into a class; we will make slides of them, but we are also starting to digitize those photographs as well, so that they can be stored and either put into study collections on our reserve environment so that students can refer back to those images, either through the web interface, or they can print them out, generally not high output though, and refer to them later. So that's the kind of environment that we're working with. We also do some transmission that is somewhat related with regard to distributing digital articles through document delivery, and we do that through a scanner and a system developed by the Research Libraries Group called Ariel. And it is primarily for digital transmission among libraries of documents to satisfy interlibrary loan requirements.
Butterfield: How do you evaluate the quality of the systems you use? What kinds of things are you looking for?
Pitkin: Actually, we are struggling with that issue right now, particularly in the distribution of photographic images and how we capture those and make them available. The way that we have been doing that is by capturing at least three resolutions of the image: a thumbnail, a medium quality screen size image, and then at least one higher resolution. And then in some cases, a fourth level resolution. We have just recently done a focus group with faculty on those images being transmitted through the campus network and projected out through the Barco/Sharp projector in the classroom to see if the quality of the image would be satisfactory from their perspective, and we were very pleasantly surprised, because we were concerned that the resolution of the screen size image would not be adequate. However, the reaction was that they were fine, for their purposes. Now we are a teaching/learning kind of environment, not necessarily a quality reproduction environment, and so their reaction was, this is better than slides. Which was surprising. This was not a large sampling, but I was at a conference last week, and this might be something you want to follow up on, but MESL, I'm not sure if you're familiar with MESL. I do have a URL, you might want to check them out. They are a museum project, I think funded by NSF for the last two or three years, and they have done some user analysis on this, and they have the same results spread over a variety of focus groups, and it validated what we had seen here with our small sampling. That the faculty would initially say they want the highest resolution possible; however, when it came down to an actual use, they seemed quite happy with what we had available, which is not a very high resolution. The fellow who was doing the presentation was Howard Besser, and I believe he is at Stanford or Berkeley. And they took focus groups across six universities, so the focus groups were pretty large.
Butterfield: What are the resolutions that you are ...?
Pitkin: I knew you were going to ask that, and I don't remember them. We are following the Amico standard. Amico is a project that we have just been selected to be part of, and they have laid out the four resolution standards. I don't have that, but I can get that for you. Are you a member of AIIM? Because we are doing a presentation of this tomorrow. They are having a regional meeting here at RIT tomorrow. We'll be doing a presentation on what we are doing with digital collections. I believe our portion is in the afternoon around noon. The person who could tell you about that is Milt Cofield. His number is 475-2751.
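Pitkin describes capturing each image at three or four resolutions: a thumbnail, a screen-size image, at least one higher resolution, and sometimes a fourth level. The sketch below computes the pixel dimensions of each derivative from a master scan while preserving its aspect ratio and never upscaling. The long-edge limits in `LEVELS` are hypothetical placeholders; they are not the Amico figures, which Pitkin did not have to hand.

```python
# Hypothetical long-edge limits in pixels for each derivative level.
LEVELS = {"thumbnail": 150, "screen": 640, "high": 1536, "fourth": 3072}


def derivative_size(width: int, height: int, max_edge: int) -> tuple:
    """Scale (width, height) so the longer edge is at most max_edge.

    Never upscales: a master smaller than the limit keeps its own size.
    """
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height
    return (max(1, round(width * max_edge / long_edge)),
            max(1, round(height * max_edge / long_edge)))


def plan_derivatives(width: int, height: int) -> dict:
    """Return the target size of every derivative level for one master image."""
    return {name: derivative_size(width, height, edge) for name, edge in LEVELS.items()}


if __name__ == "__main__":
    for name, size in plan_derivatives(3000, 2000).items():
        print(name, size)
    # thumbnail (150, 100)
    # screen (640, 427)
    # high (1536, 1024)
    # fourth (3000, 2000)
```

Deriving every level from a single master, rather than rescanning, keeps the levels consistent with one another; the "fourth" level here simply stays at master size because the hypothetical limit exceeds the scan.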
Butterfield: Difficult kinds of content. What are the hardest kinds to capture?
Pitkin: I don't think I have a good answer for that. We have really been taking flat images that are already there, and some of the images that we've been concerned about have been what level of image manipulation and enhancement do we have to do: for example, the quality of capture of the image itself, how close does it represent what it actually shows, and that is really a question of the process of capturing that. Do we try to enhance that to make it close to what the image was supposed to be of, or do we just capture what either the film or the digital image captured and make that available? That is one of the questions we haven't really resolved. And then the question also becomes, if you start manipulating and worrying about color balance and altering the color, you really run into the question of what you have done to the original, and what you've done to the capturing item. You've added yet another layer of interpretation that is sort of humanly subjective as opposed to mechanically interpreted, so [laughs] that's kind of a question that we've been wrestling with, and we've tried to stay away from altering images, because it is only adding another layer of interpretation. That is the place that we're at at the moment.
Butterfield: So the goal is reality then.
Pitkin: Yes, I think that is probably true. Because in some of the situations where we might get copyright clearance to take a copy of an image that is in a book and digitize that slide, we are now multiple levels removed from the original object. Do we try to enhance that to get closer to the original or not? And I think that none of us feel comfortable enough with being able to add any value by doing that, but rather, at least, it is only adding another level of interpretation. So hopefully, one of the things we are hoping to do is that we minimize the amount that we digitize and, rather, acquire images from sources that have already dealt with those issues, so that we can be more of the transport, distribution, and organization method and sort of the filter, as opposed to the creator.
Butterfield: What kinds
of color
Pitkin: We don't have
We have not really addressed that at all. It is more of: What
reproduce. So maybe I don't understand your question.
What
we get
is
have to handle
you
Pitkin: Absolutely. And
making Photo CD's
description
you
have?
is
you see
what you get.
input?
color
we'll digitize that. The process that we have been using of taking slides
from
Photo CD's and dumping those over into digital files and them catalog
going
And most of our work has been in the areas of interface design and in the area of
what we'll
do is
and then
them and make them available.
metadata
requirements, if any, do
any.
what we
Butterfield: Do
and
we are not close
to the source to make those quality decisions.
of content and appropriate
searching, the issues
of
how do
you search an
image. We've done
technical end of image capture.
less investigation
on the
Butterfield: How would you arrange the priorities of the things you've just described: searching, quality, resolution, color...?
Pitkin: We take a really functional approach to the output, so for us I think we would say if the color is good enough, generally, so being able to have the image, find the image, and display the image that is recognizable is more important to us than does the quality really match the original. In some cases it may only be representational, that the color itself in a teaching environment, say, think of a Survey of Art class, is as important as recognition of the image. I think that's more where we're coming from. And I think we have a lot of latitude. Sounds like low standards to me.
Butterfield: Well, I need to understand what your real requirements are. I'm going to be asking you about quality,
costs and speed. So you really don't have any quality problems right now.
Pitkin: Well, part of that is that we don't have a large enough base. As we capture more images ourselves and create more digital collections ourselves, that will become more of a concern. But I do believe our approach is going to be to acquire more images that someone else is distributing, or things that [others] acquire, digitize, and make available, so that the question of quality is someone else's. And then those that we do capture, my guess is the ratio of what we capture to what is available will be smaller. [It will] probably be less than 10%.
Butterfield: Would you anticipate having problems with things like small fonts or moire or not capturing shadow detail in images that you acquire, things like that?
Pitkin: I think you are talking to the wrong person about that. As I said, I think my orientation is more the functional, and the aesthetic and the detail is more [the faculty's concern]. I really look to the faculty to drive that as the user of the technology.
Butterfield: The nature of the output of these systems that we are talking about. Could you describe them in greater depth?
Pitkin: Probably the primary output is going to be display and projection, for classroom or on screen viewing. There may be some requirement for color output; there may be some requirement for physical output. I don't know, it remains to be seen whether the faculty will want to produce physical study guides for students who want to take away a [copy] from the collections. I don't know that. And I think of course one of the questions will be fair use of these images, because one of the huge issues is the copyright, unless they are out of copyright, [if] they are old enough that we don't have to worry about it. I don't think we're looking at these as a vehicle for publication in the sense of a faculty member developing a curriculum and then, from that curriculum, developing a text book that would use the images that we have. So I don't see that as one of the natural outcomes of the project, and primarily because of that. Because getting the copyright clearance for the images that may be available will be difficult. Now it would be less difficult if we were buying collections and using just collections, because we would have one organizing unit that we could get [clearance] from. But in the image arena, often the source would be widely distributed. Sometimes the images come as individuals or very small segments, so you don't even know who the copyright holder is. So that is a huge problem for us. Now, we have done some work with color output of some of the images to a text form. Have you talked to Dave Panko? He has done more work in terms of creation of exhibition catalogs. He would be a good source on the quality side. And he'll have a much higher sense of what the output requirements would be because he is looking for print. And our primary output is looking towards display and projection.
Butterfield: How about accuracy? Is that something you look for in terms of these systems?
Pitkin: Again, from the perspective that we have, it is the accuracy of that in the image arena. It is not likely that we have the original source next to us that we are making a digital representation of. It may be that the source itself is, at least a little bit, a facsimile of it. So the question becomes, are you trying to replicate to the source, [which is] not the original, but only a facsimile? How close to the facsimile? And we haven't really worried a lot about that. I should probably describe [this]: we've done some digitization of a poster collection, when in fact we do have the source material, and we are more concerned then with [whether] the quality should match, and we do have the original. But we have not really put together a set of principles for that; I think it has been subjective to the person who was [doing it]. The image was captured, the image was put up on the screen, and the check to the original was cursory at best. So we have not put a lot of energy into that.
Butterfield: These systems: what level of automation, what level of operator participation, skill, preparation is required? Could you tell me about that and the requirements for those things?
Pitkin: Right now, the digitization process is handled through our Media Resource Center. They would probably also be a good person to talk to. They would be more concerned with the quality and more concerned about the reproduction process issues. But right now, the process goes from film to digitized, and we are trying to move into something digital from the beginning. The skill level of the people who do the capturing varies, from students just setting up a slide scanner and scanning a series of slides that have been taken, and then there will be some quality correction to location of the image or perhaps a little work on the color. But again, that is pretty subjective and based on the person who is doing it. The process for that is, as I understand it, pretty manual and not very automated. We were very happy when we got a slide scanner.
Butterfield: Is that a concern for you?
Pitkin: Absolutely. It is a huge concern. And again, it is a concern and it is why we would be better off acquiring collections than digitizing them ourselves.
Butterfield: You said you are trying to move into the digital from the beginning instead of the film to digital. Why is that?
Pitkin: It is really the cost. It is the capital cost of that [which] is something that is being looked at right now, and I think the price of equipment, digital cameras, digital scanners, is getting to be a little more affordable. The capital equipment side was a little bit more than we were willing to absorb initially. The cost. And this did not have a real high priority until this last year, this whole process.
Butterfield: How about the scanning process and the effect on the input? Do you need to maintain the input in their original form? Are you able to do that with the processes you are using now?
Pitkin: I believe right now, we are keeping the slides. The goal is to move away from that, the slide, but it is captured digitally. But right now we are keeping the slides.
Butterfield: Some folks who are scanning books actually slice the binding off the book.
Pitkin: [Coughs] How painful that is! Don't say that to David.
Butterfield: Would you even consider that?
Pitkin: No, we don't do that. We haven't been that serious about it. Only if we had duplicate copies, so that we [would] not [need to] keep [both] books.
Butterfield: You told me something about costs. What do you expect to pay for a system like this?
Pitkin: A full digital system? I think we are looking at probably about a $50,000 system. One of the other key points as it relates to cost: the timing is right for us to explore this because we are right now at a convergence point. We've had a network upgrade on campus which has been bringing fairly high speed Ethernet to each classroom. I think there is at least a two drop Ethernet to all the classrooms. It is 10 meg Ethernet to the wall. In some situations, it is 100 megabit to the wall, so we are getting high speeds, which means we can start to think about pushing larger packets through that network so we can get bigger bandwidth applications to the classroom. And also, that campus upgrade is bringing Ethernet to the classrooms, so the faculty now have the opportunity to use, in their environment, access to the images. That is one big change, and that will be completed [by] the end of this year. The other factor here is we've been upgrading all the faculty offices, so the learning infrastructure is going, again, to be more bandwidth intensive. [So] when you talk about a capital [cost], at least in our environment, it is not just creating a digital process for the image; it is [having] that capability in our primary business. And without the upgrade in the network, and without an upgrade in the classrooms, without classrooms that have projection, it has less value. [It is] less likely to be accepted and adopted by faculty, because those are obstacles. As well as use of these in distance learning applications: we have developed a couple of different image collections that support distance learning. We are using these collections as a base for classes that are being taught at a distance, so that is really our focus.
And we have actually probably five or six different image collections for people to use. There is a poster collection which is integrated into our catalog, so that if you are searching you can get the image display and the description of it. We have two design archive projects which stand alone [as] image based collections; these are all web based. We have the integration of some of the material that we are capturing now, some of the locally produced images, again integrated within the catalog. David in Cary has some image representation within his site of manuscripts. We have some representations of the artwork that was student produced available off of our web site so that people can use that as part of their resumes. Those are the ones that come to mind right now.
Butterfield: You talked about searching. Have you done any optical character recognition, translating printed text documents into electronically searchable form?
Pitkin: Yes. Most of what you talked about is image collections, [but] all of our reserve material is captured text material. We did look at OCR for the reserves, making it searchable, and we opted away from that. The processing time on that was really... So we are not doing the OCR part. We are taking just page images and put[ting them] in the Adobe Capture and use PDF as the baseline format for universal access, [with] a common reader application out there, relatively uniform. [OCR] was not what we were willing to accept. So we have about 15% of our reserves [that] are now electronic. We have about 50 faculty using electronic reserves. And that uses more than just text, but text is a big part of it.
Butterfield: What might have tipped the balance on the decision to save images versus saving searchable text?
Pitkin: It is really the processing time. That was a decision we made about a year and a half ago, and one was the quality of capture on the OCR. It was not as reliable; at least I remember having a discussion about it, it wasn't that reliable. And the processing was horrendous. This is a fairly quick turnaround environment for us. One of the technical folks thought there was a way to do textual searching without the overhead, and we didn't explore that. But in the latest version of Acrobat Suite, there was the capability of doing that without overhead, but I don't know the details of that.
Butterfield: You said the processing was horrendous. How bad was it? Two times, an order of magnitude, a hundred times too slow?
Pitkin: As I recall, it added, my impression was that it was at least four or five times more than just capturing it. And that was far more than we could absorb. The threshold was significantly more, and I think we also had some concerns in the preservation of the format itself of the original document, in making it available in a form other than [the] original. And our requirement is a 24 hour turnaround time for material that is presented to us. That could be a significant increase. We've had some problems with output of PDF to the printer. There was something about the transfer. The conversion of PDF to PostScript to get it to the printer adds a delay time, and the delay time caused some problems with timing out on a shared printer. So for large documents there was a problem for users that was pretty severe.
Butterfield: I think you have answered most of my questions. Are there any other thoughts you'd like to share?
Pitkin: No, I think I've answered all I can. This is really exciting. I really enjoy working with this.
Butterfield: Well, I really appreciate you taking the time.
Pitkin: Good. Really not a problem.
[End of Interview]
Rodney Perry
Associate Director for Central Library Services
Rochester Public Library
Rochester, NY
April 27, 1998
Butterfield: This library, the Rochester Public Library, is doing some scanning. Can you tell me about it?
Perry: We have done a project where we took some of our clipping files. We use a lot of newspaper clippings, and we tried to scan those, sending them to Western ... to set up an index. That sort of fell apart because of the amount of time it took, and we weren't really doing it as a separate project that fit into the normal workflow, so it sort of fell down of its own weight. So that's [where things stood] until this point, where we now have a grant to develop a plan for scanning photographs, or electronically capturing photographs, and to try presentation in some way. We are not getting in the production business; [we have] satisfied grant conditions.
Butterfield: What other kinds, besides the photographs and clippings you mentioned, need to be captured?
Perry: We are starting with photographs because it seemed easy; we had good collaborating partners, which [are] the City's [Rochester, NY] archives. I think that photos lend themselves to this. Everybody understands a photograph. They add meaning and viewpoint and so forth that words don't. But we also have other collaborating [partners]; we are in [the] AIIM Chapter. We have a fairly significant collection of postcards. I was looking at some of those the other day: scenes probably in the second decade of the 20th century, [the] nineteen teens, of the intersection of Main and East. It is a postcard ... East Avenue down by where they are now building [the] Law building, downtown. We have many of those, and that is probably worth digitizing. We also have some manuscripts; the Custer letter and the Lincoln letter are fairly remarkable things there, in the package I put [together] [in 4/23/98], that I think, from my knowledge of the field, require a digital treatment, by that I mean an image treatment rather than a text treatment. So we have some interesting examples of manuscripts [that are] worth putting out there. Another significant area is our clipping files. We have about 60 file cabinets full of clippings on all sorts of topics of multiple interest, and they serve probably well over half of the reference function in the local history division of the central library building. And it is just a series of file folders of old newspaper clippings, mostly organized around topic, and I would like to digitize [them] and distribute those out on the web. People who know this field better than I tell me that you can do, on top of the scanning, OCR software which turns it into a text file in some way, and as long as you spend enough time to make sure that the important words are clear and that the OCR program has read them correctly, they can be indexed by key word. So that is another resource that I think would lend itself to digitizing. I think partly, digitizing is a route to access; for example, this clipping file, they are not interesting [in] themselves, but they have information value. Photographs and postcards, they're visual information, and manuscripts are sort of gee whiz kinds of things: "Gosh, here is this picture." Photographs have another kind of information, like what it looked like. So I guess I'd draw a distinction between visual information and textual information. I'm not sure which is more important in the long run. I think the textual information, [the] clippings, probably have a larger proportion of information value. I think text has a different information value than a digital image.
Butterfield: You've described several different input types that are important. Perhaps we could talk about the quality characteristics. How do you know you had a good scan versus a bad scan? What would you look for to know that [you had] a good scan?
Perry: I guess I'd probably look at light and dark. Is it accurate from that point of view? A woman who is our local expert on scanning, Sue Shippey, would be a good one to talk to. She showed me the Associated Press photo file product; it gives access to pictures which are AP photographs. I had her look up Willie Mays, [and she] found a picture of Willie Mays. This is the famous picture of Mays catching the ball over his head and kind of saving the game in the World Series in 1954. And that is good enough to bring back memories and bring back the event. I'm not sure it is important that it is a crisp and clear photo. There is a way to order the actual photograph, so for library purposes, something like that printed out is sort of good enough. You've got Willie Mays, 1954 catch. I think libraries, and I'm speaking of public libraries mainly, for the run-of-the-mill uses, don't need a lot of definition in the picture. We have a number of users who turn this material we get from the library into other products, and probably Sue is the one to speak to [about] how the user uses the output. Our thinking at this point about our imaging project is that we will use thumbnails, and then we'll have second images that are bigger and of some better quality, but it is first line, not the final product, because we are... We don't really have major quality issues with what we are doing.
Butterfield: What do you mean by that first line, not final product?
Perry: Well, you are showing the picture [of] Willie Mays. There is a system: if you want to order it, you can order it. If you want to publish it, you can publish it. If you are looking for a better image of photographic quality, this photo is in books on the subject; you can go further with it. So this is first line of several usages of [the image]. Essentially it is a catalog. [People] come and they'll get some [sense of] what it looked like. So I think our standards are not that demanding. In fact, with our collaborating partners, the City Archives and Photo Lab, they really don't want the photograph to be a substitute for the real thing. They hope that this information will create people's demand for reproduction; that would be [their] business. [We are] out of [the] photo reproduction business. In my handout, I've put a couple of examples, of varying quality, that [are] film-on-paper printouts, and they've lost something, but certainly you get a sense. In [the] presentation, these happen to be photocopies of computer [printouts]. They look something like cartoon characters and don't have tight definition, but I'm not sure you need [it]; you recognize the people. It is really the situation. Here's a picture of the Mauritania. You can see what an old tug boat looks like. So it shows what happened, [which is] probably good enough [for] a tenth grader's report. I don't think you need to spend a lot of money creating a level of quality that isn't always appreciated.
Butterfield: Let's [talk about the] second input type you described. [Those] were the manuscripts, and you showed us examples of the Custer and Lincoln manuscripts on Friday. What are the quality characteristics you want to capture there?
Perry: I think there, you want to get the color of the paper; it is off white, maybe it is brown, maybe it is folded and worn at the edges. The ink was maybe pale purple or something. So there, I'm speaking about a higher standard than the Willie Mays photograph, because you are really recreating the item electronically. You're not going to produce a photograph on request, for example. With a photographic database, you can always send for a conventional photograph, but you can't get a conventional manuscript of the letter from George Custer, so the recreation is one of the object itself and what does it look like. With this photocopy, it is straight black and white, so I think you don't really achieve it; but a digital image should capture all of the stains, characteristics, because there the object electronically stands for itself [unintelligible].
Butterfield: The clippings, you talked about capturing the text, but for manuscripts you didn't. That wasn't an oversight?
Perry: Yes. If you can't read George Custer's writing, that is your problem. But that's the way it is. A clipping is a piece of information, not an object in itself. A manuscript is an object in itself, and the information, in a sense, is incidental. In the clippings you need to have the OCR program turn the words "management reorganization" in a story about Kodak [into the same words] in the index, and not have it turn out "management organization" or "mammography" or some other similar word. So you want that to be user friendly in an OCR program, so you don't have to spend a lot of staff time going through the clippings and verifying that the OCR worked. This is not every word, but it is the important words that you are going to organize and classify by. I guess in the clipping file, we begin to meld the image system with the text recognition systems. I don't really understand it very well.
Butterfield: I'm only really interested in the... The technology piece is a response; what I'm really trying to understand is the requirements, and I think you are giving me great information on that. You said merging those two; is that because newspapers have both text and pictures?
Perry: Well, yes, I hadn't even thought of that. Yes, sometimes a newspaper story will have a photograph with it. My comment had to do with, I think, here is where I don't know technically, I think capture is a scanning/image technology and the sorting of the text is a text technology. That is my simple view of it. It begins to take the shift: we capture the image, then turn it into the equivalent of a text file. So I don't understand the technology and why you can't pretend it is a text file [unintelligible].
Butterfield: As a user of a system that could scan in clippings, what would you like to see?
Perry: I was here yesterday working, and I try to stay away from reference questions, but I was helping a guy find... He was writing... He was a new English speaker. He had to do a report on Greenpeace. He got some sources, some internet sources. But taking him as an example, he should be able to put into a computer "Greenpeace" and get books and magazine articles and internet and news sources, and also access to local information. This is the global picture, the European and American picture, [of] what it would be about. Locally, they have [had] one of their boats on Lake Ontario chasing freighters that were dumping oil. Sometimes that would show up in a newspaper index, but newspapers are not particularly well indexed. That is one of the reasons we do all this clipping of local papers. Probably the Science division would clip that Greenpeace story out of our newspaper, [and it would] go to a separate file of newspaper clippings, either automated or manual. Users should not have to go [to] different files. In theory, the newspaper clipping information ought to [have] equivalent status [to] a book catalog or a periodical index, so it [would] show up in the results of searches. But that's not the case. Taking the theory a little further, if that Greenpeace [boat] chasing a freighter dumping oil in Lake Ontario had a photograph, [if] the City Archives or City Photo Lab was over in Charlotte taking pictures of the Greenpeace motor boat, that would show up too. Amalgamating [it] all was just part of the search. Right now, this is fantasy land, [and] we send people to different resources to look at things. But libraries would like to integrate the results of searches of images [and] files.
Butterfield: You talked about the qualities. You said that some of the inputs you described were colored; for example, the postcard of East and Main. The manuscripts had yellow paper and purple ink. What color reproduction, how good does that have to be?
Perry: Well, the model I bring to bear is [this]: we have a color photocopier. It does a pretty good job of giving you an idea of what the original colors were. But it is only pretty good. And I think we'd probably look for better standards than a photocopier, because I think Robin's Egg blue is a lot different than light blue, and things look pretty ugly pretty fast if the color is not quite right, so I think the accuracy of color reproduction is important. And this accuracy may be required on the terminal. Printing it out is another level [unintelligible]. But I think you could draw a distinction between the terminal and actually printing it out. You could accept a lesser standard of accuracy [in print] than [on] the terminal. The reason I say this is that color copiers are not particularly accurate. Film copies could, I'm sure, copy it much better. And I think the expense of getting good color reproductions, [of] accurate color representation on library style color printers that are capable of matching colors, that could be faithful to [the original]...
Butterfield: I think you made some assumptions about the costs and capabilities of technologies there. If it was all free, [and] you could [get] high quality color rendition on the terminal and color rendition on the print, is there still a difference in importance of what you see on the screen and what you see on the print?
Perry: No, that was a pragmatic [distinction].
Butterfield: What kinds of inputs do you think are most difficult? Where have you seen poor quality? What are the Achilles' heels of systems like this?
Perry: Frankly, in terms of the image, I'm so amazed that it happens that the images themselves don't really... Nothing strikes me. I think the important thing from the access point of view is describing the content so you can find it. In other words, this Willie Mays catch, you had to find this through Willie Mays, and I didn't search through World Series. There are other ways it might be searched. This descriptive field: how receptive is this to web searches to get you to this? I don't know whether this is done well or poorly because I don't understand it particularly, but I'm not familiar with the areas of difficulty in terms of capturing.
Butterfield: Do you ever envision having to scan documents that are so faded that even the originals are difficult to see? Do you have any requirements for capturing tiny fonts or extremely detailed line drawings? Exotic fluorescent colors or any of those sorts of things?
Perry: As a public library, probably not. Let me back up to the question of Achilles' heels. I think maybe newspapers are Achilles' heels, I'm not sure, because of their size and their condition. Those that you want to capture before it disintegrates are in really difficult shape. That is not really an issue of doing it poorly. The issue is how do you do it and get the size [unintelligible].
Butterfield: So the early, large, poor quality newspapers are maybe difficult to handle.
Perry: The other thing I forgot to mention, that we have quite a few of, to digitize is maps. They carry some of the same problems as photographs and newspapers. They are also very heavily used. What was the Town of Webster like in 1920? [unintelligible] And we have quite a few maps that could be digitized, but we are staying away from that because of the size mainly [unintelligible].
Butterfield: So that's a new kind of input. Are the quality requirements of maps different from the other types of inputs?
Perry: The detail in the map is important. A cross that marks a cemetery must show up, or [on] a topographic map, the terrain lines, so I'd say in terms of capturing information, a map would have a high degree of accuracy requirement. I'm not sure [about] color per se, but detail, fine print.
Butterfield: If I could, before I go, see some of these. We've talked all about quality. What about the costs of systems like this? What are the cost concerns?
Perry: It is hard to do a lot in the digitizing area and take money from other programs. So I think many of these are externally funded. As developmental projects, cost is not an issue except to find out [how] much that [costs] [unintelligible]. Then, I think, as you understand ongoing costs, you can at least make choices of deciding whether to pursue, to proceed in a dramatic way with an extensive effort to get new money, or transition what you are doing now. An example, probably the clearest example, is the clippings. If we find that we can effectively get access to the clippings, that it does not cost too much to do that, then instead of clipping, we scan the clipping and go directly to electronic. I really can't comment on the degree of cost; as developmental projects you spend what you get and do as much as you can. It is hard to not [draw on] the print book resources in order to scan postcards or convert microfilm. I think they tend to be externally funded. To some degree, you build them into your routine...
Butterfield: Related to that, what about the level of automation to do this sort of thing? How much user intervention, operator intervention would be acceptable?
Perry: In terms of [unintelligible].
Butterfield: In terms of production?
[Tape ends - manual notes (not verbatim) from this point forward.]
Perry: Automation will be [in] production mode; system runs itself. If we were to be doing the clippings, we [would automate]. Photographs will be scanned from resulting negative. Allows for a lot of scanning of documents. [For some items, automation is] not suitable: photographs and manuscripts have odd sizes. Manuscripts have to be individually handled. Depends on what it is and how many we have to do.
Butterfield: What is [the] volume of stuff to be captured?
Perry: Photographs, we know: we have 25,000 photographs. With city archives and photo labs, [there are] about 375,000. We won't do all of [them]; not all [are] interesting.
Manuscripts, we have some, but it is not a major part of our collection.
Clippings, we have 60 filing cabinets, 200 file drawers. I couldn't guess how many actual items there are; hundreds of thousands, certainly.
The postcards, [the number] is in the several thousands. Not sure of the quantity.
Maps, we have, let's say, 1500.
Picture file would not be scanned because of copyright issues. It contains magazine clippings, clippings from books, calendars, arranged by topic. Quality varies, color in some, some line drawings, wide variety of uses. To me, that is the most dramatic resource. Copyright process is such an obstacle that we would not be involved for sometime or ever.
Butterfield: How important is speed?
Perry: Speed of capture is important. Project basis, speed of capture of a clipping is pretty important. Probably need to scan a clipping in 30 seconds, to scan and know you've captured it in text searchable form. Speed is important to understand how much it costs in the future.
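Perry's 30-second target can be turned into a rough planning figure. The sketch below is only an illustration: the count of 200,000 clippings is a hypothetical stand-in for "hundreds of thousands," and the eight-hour scanning day is an assumption, not something stated in the interview.

```python
# Rough scanning-time estimate for a clipping collection.
# Assumptions (hypothetical, not from the interview): the item count
# and the length of the scanning day.

SECONDS_PER_CLIPPING = 30  # per-clipping capture target mentioned by Perry


def scanning_hours(items: int, seconds_per_item: float = SECONDS_PER_CLIPPING) -> float:
    """Total station hours needed to capture `items` clippings."""
    return items * seconds_per_item / 3600


def scanning_days(hours: float, hours_per_day: float = 8.0) -> float:
    """Convert station hours into working days on a single station."""
    return hours / hours_per_day


if __name__ == "__main__":
    clippings = 200_000  # hypothetical stand-in for "hundreds of thousands"
    hours = scanning_hours(clippings)
    print(f"{clippings:,} clippings: {hours:,.0f} hours, "
          f"about {scanning_days(hours):,.0f} eight-hour days on one station")
```

At 30 seconds each, 200,000 clippings come to roughly 1,667 station hours, or about 208 eight-hour days on one station, before any indexing or quality review.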
Butterfield: How long do you expect to be in the business of scanning? Will you ever get electronic input directly from the source, for example for current newspapers?
Perry: Don't know how long we'll be in the business of capturing documents electronically. Marketplace has not made information clear. In the print world, at least you own it in a printed form. In the electronic world, all you're buying is the right to look at it and leave. Not clear what durability of electronic resource is. CD-ROMs wear out. Not clear yet what to do. Not making decisions about this for now.
Butterfield: Thank you for this interview.
[Perry gives Butterfield a tour of pertinent areas of the library, including the picture file, newspaper clipping files, maps, and postcard collection.]
[End of Interview]
Robert Gerlach
Sales and Marketing
Holsons Technologies
Rochester, NY
May 1, 1998
Butterfield: I want to ask you some questions about Electronic Image Management Systems, and as I do with all my interviews, I'm talking about a system that looks roughly like this [diagram]: scanner, computer, storage, and some means of printing, CD-ROM writer, or some sort of online access. Is this the sort of system you are using?
Gerlach: Yes. Actually we do mainly the scanning.
Butterfield: OK. That's the piece I'm focusing on.
Gerlach: There you go. Then we're in the right location; we're in the conversion part of it.
Butterfield: Could you tell me a little about the sorts of work you do here and the kinds of systems you are using?
Gerlach: What Holsons does. Our primary purpose is a document conversion center. We're going to take anybody's hardcopy paper and convert it over to electronic image and ship it out to what their system is going to handle. That could be IBM ImagePlus. It could be a FileNet system. It could be a simple home-grown system. It doesn't matter what they have; we are able to export images out to that system one way or another. If they don't have a system, we work with a couple manufacturers of electronic imaging systems, such as OpenText Live Links or Excalibur, and go ahead and implement those systems for them, to give them a complete package. The other one would be Alchemy; it is a CD writer system. That just takes in their hardcopy documentation, paper, and gives it back on a CD, and the CD would contain the text, the software, and an image. Or it could be a straight conversion shot, giving them back their image.
Butterfield: What sort of scanning systems are you using?
Gerlach: The bread and butter, we're using the Kodak 900 series scanners. And then we go through the Holsons-designed workflow process. Most of their stuff they try to automate with machinery versus human beings, so they've implemented bar code, sort of OCR, their own OCR package process, to basically enter electronically and use minimal labor to make the error corrections off the OCR.
Butterfield: The bar coding you are talking about is for entering metadata?
Gerlach: Yes, metadata, extracting, if it is a form, certain fields, key fields.
Butterfield: How do you evaluate the quality of the scanning systems that you use?
Gerlach: Hopefully, we evaluate our scanning is going to be as good as anybody else's.
Butterfield: What are the attributes of quality that you look at?
Gerlach: Attributes of quality that we look at is basically despeckle, deskew, decrop, and enhance the image if you need it. And that is part of the process that we handle, because when you look back at an image on your screen, you are going to see something you can read versus something you can't read. Worst case scenario, you try to get a clear image off of it. You can utilize your scanner at 200 dpi, the reason being that with the right process it will come out nice. When you go to 300 dpi, that is because the images are so bad that you need it. When you do that, it slows the platforms [...], it brings it down to its knees, and the production value, as far as it goes, takes us twice or three times as long. Scan speed and also the [...].
Butterfield: What are some of those quality problems that you might have now?
Gerlach: Quality problems are really based on the document itself, if it is going to be forms, or like an HCFA form, where we have triplicates of carbon paper that are going to have the red dropouts, and you don't want the red dropouts, and people think you are going to be able to see that, everything that people perceive you are [going to see]. The theory still holds: Garbage In, Garbage Out. So what we try to do is enhance it if they need it, and that is the key point. Do you really need that information, and how much does it take to do it? It does take some work and effort to get the job done.
Butterfield: You said HCFA form?
Gerlach: An HCFA form is a medical insurance claim form that most insurers use, for the doctor to file and hand to the insurance company. I could show you one.
Butterfield: The quality piece of that is the color dropout.
Gerlach: The color dropout. It is a standard form. It has been in the insurance industry for a long time. It all depends on what imaging system or how people want to do it. Some people just want you to extract the data, OCR; they don't care about the form itself. Some other people want to see the form, so it makes it a little interesting. And plus, red is our most difficult color to handle in the document imaging world.
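The red-dropout problem Gerlach describes can be illustrated with a simple rule applied while converting color input to black and white. This is a minimal sketch assuming 8-bit RGB pixels given as (r, g, b) tuples; the thresholds are illustrative guesses, and the function names are hypothetical. Production document scanners often handle dropout optically (by lamp or filter color) rather than in code like this.

```python
# Minimal red-dropout binarization sketch. Pixels are (r, g, b) tuples, 0-255.
# The thresholds below are illustrative assumptions, not calibrated values.

Pixel = tuple[int, int, int]


def is_dropout_red(p: Pixel, min_red: int = 150, margin: int = 60) -> bool:
    """Treat strongly red pixels (printed form lines) as dropout color."""
    r, g, b = p
    return r >= min_red and r - g >= margin and r - b >= margin


def binarize(row: list[Pixel], threshold: int = 128, drop_red: bool = True) -> list[int]:
    """Return 1 for black (keep as ink) and 0 for white (background)."""
    out = []
    for p in row:
        if drop_red and is_dropout_red(p):
            out.append(0)  # the red form vanishes; only the filled-in data remains
            continue
        r, g, b = p
        luma = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 luma weights
        out.append(1 if luma < threshold else 0)
    return out


if __name__ == "__main__":
    row = [(255, 255, 255), (20, 20, 20), (220, 30, 40)]  # paper, black ink, red form
    print(binarize(row))                  # data-only capture
    print(binarize(row, drop_red=False))  # customer who wants to see the form
```

With dropout on, the red pixel becomes background ([0, 1, 0]); with it off, the red form line is dark enough to be kept as black ([0, 1, 1]). That switch corresponds to the two kinds of customers Gerlach mentions.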
Butterfield: You talked about color dropout. Do you have any requirements to be able to capture color, as color?
Gerlach: We don't like to do it. It is not our core. Our core is pretty much black and white documentation. If people need to incorporate color, such as photographs or color flow charts, we can do it, and for that we use a separate scanner. We try to avoid it at all costs.
Butterfield: You try to avoid it.
Gerlach: It is just not our core. We are in the large document conversion world, which means we get a million pieces of paper. We're going to take those million pieces of paper and put them in electronic form, and to do that you want to do it as fast as humanly possible. More people are in the graphic world where you need a photograph; that is a specialized item and a slower process, and the cost to do that is much greater.
Butterfield: Do you think the work is there for high resolution color input?
Gerlach: I think the work is there for people that specialize in that area. In that world, you need the high resolution scanner, you need the specialized equipment, you need the expertise; you are going to do a lot of fine tuning, cropping. In the business of documentation, they don't really care about the fine detail; what they want is the information back as fast as possible.
Butterfield: Are there image elements in documents that are especially tough, tiny fonts?
Gerlach: Well, the smaller the fonts, yeah. We always like to see laser printing inputs because that is the easiest, especially in OCR applications. The harder ones are actually the dot matrix world and dot matrix carbon copy papers, just because you lose elements through the generations. [unintelligible]
Butterfield: Backfile conversions, can you tell me what that is?
Gerlach: Backfile conversion is simple. What happens is, before electronic conversion, before computers were around, paper was your normal mode of document and recording information. Now you have two media: you have electronic files like Word or WordPerfect; like now, you're typing into your computer, that's an electronic file. What happens is that, say you're talking about a company like Xerox that has a long history: they keep all their records in hardcopy paper. That's why this building is here. What they want to do sometimes is take that history in hardcopy format and convert it into electronic format. That is what they call a backfile conversion. It is any old documentation that they want to bring into the electronic arena that hasn't been in the electronic arena.
Butterfield: The backfile is paper archive and the conversion is taking that paper to electronic form.
Gerlach: Right. That is pretty much what we try to specialize in. There is also what they call the Day 1 forward process. Not everything is electronic, of course; people have paper coming in, and that is what they want to incorporate.
Butterfield: The kinds of output you get from the systems you are using: for the most part is it image output, or do you sometimes produce searchable text?
Gerlach: It is mostly, we're going to output a TIFF image, and that TIFF image, it all depends on how you want to retrieve it, whether it went through the OCR process. What happens is there is really two types of retrieval. There is an index scenario, where we could take a document, we scan it, and we have to give that document a name or a retrieval name: name, Social Security number, an ID number to associate with the paper. Or you use an OCR field, where it is ad hoc imaging: I [don't know] what I am looking for, but if I want to see everything with Holsons Technologies, I'm going to key into my database Holsons Technologies, and it is going to give me a hit on any document that contains Holsons Technologies.
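The two retrieval styles Gerlach describes, a lookup on assigned index values versus an ad hoc search over OCR text, can be sketched with two small indexes. This is a toy illustration, not Holsons' software: the class name, field names, and TIFF file names are hypothetical, and `search` returns documents containing all of the query words rather than matching an exact phrase.

```python
from collections import defaultdict


class DocumentStore:
    """Toy store contrasting index-field retrieval with full-text OCR search."""

    def __init__(self) -> None:
        self.by_index_value = {}            # e.g. Social Security number -> image ids
        self.postings = defaultdict(set)    # OCR word -> image ids

    def add(self, image_id: str, index_fields: dict[str, str], ocr_text: str) -> None:
        # Index scenario: values someone keyed (or bar-coded) for this document.
        for value in index_fields.values():
            self.by_index_value.setdefault(value, []).append(image_id)
        # Ad hoc scenario: every word the OCR engine recognized.
        for word in ocr_text.lower().split():
            self.postings[word.strip(".,;:")].add(image_id)

    def lookup(self, value: str) -> list[str]:
        """Exact retrieval by an assigned index value."""
        return list(self.by_index_value.get(value, []))

    def search(self, query: str) -> set[str]:
        """Documents whose OCR text contains every word of the query."""
        words = [w.strip(".,;:") for w in query.lower().split()]
        if not words:
            return set()
        hits = set(self.postings.get(words[0], set()))
        for w in words[1:]:
            hits &= self.postings.get(w, set())
        return hits


if __name__ == "__main__":
    store = DocumentStore()
    store.add("img0001.tif", {"ssn": "123-45-6789"}, "Invoice from Holsons Technologies, Rochester")
    store.add("img0002.tif", {"ssn": "987-65-4321"}, "Memo about scanning schedules")
    print(store.lookup("123-45-6789"))           # index retrieval
    print(store.search("Holsons Technologies"))  # ad hoc retrieval
```

The index lookup only works if someone assigned the value at capture time; the full-text search works on anything the OCR read, which is why it suits the "I don't know what I am looking for" case.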
Butterfield: Do you provide any sort of printed output to your customers or is it mostly electronic?
Gerlach: We don't know if we do at all. We try to avoid printed output because we are in the business of getting rid of paper, not creating it.
Butterfield: So your output basically becomes an on-line archive someplace.
Gerlach: Yes. We're going to give them back [...]. Usually what we do is give back to the customer the hardcopy with the CD, and the electronic format is usually a CD. They can go ahead and import it into their own imaging system, and maybe store it on RAID or optical disk, or whatever it is going to be.
Butterfield: Holsons basically acts as a service provider, taking in paper and providing electronic, but you don't host the electronic archive.
Gerlach: We don't host it, no. We may keep a copy for their backup security requirement, but only if they request it. We try to pass it through.
Butterfield: How do you evaluate the accuracy of the systems that you use?
Gerlach: As far as accuracy, there is two things. There is an accurate count. Accurate means if you give us a million pieces of paper, you are going to have a correct image in electronic format. To do that, we go through a process. When we get a million pieces of paper, we're going to run them through a counter, we're going to run them through a scanner, then we're going to run them through a counter again. If all three things match up, then we know we're correct: we've got a million pieces of paper. If one of those don't match up, then we go through the process again until it matches up. That is one of the parts that we call accuracy. The other part is OCR accuracy, which means that if I scan in a piece of paper, I want to make sure all the O's are O's and all the A's are A's. To do that, what we use is our OCR process, and we go through a lot of checks and balances. Actually, Holsons has designed a system called MIC/VIC, and what it is, it is just sort of a way [...]. When most people do data entry, they do double key. That is the only way to verify [...]: [unintelligible] I'm going to ask someone else [to key it] and match it up. If not, then we go through it again. With the OCR, you throw software at it, with human overriding, to determine the OCR, and have them verify it.
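Both checks Gerlach describes come down to comparing independent results and sending mismatches back for rework. The sketch below is a generic illustration; it does not reproduce Holsons' MIC/VIC system, whose internals are not described in the interview, and the function names are hypothetical.

```python
def counts_reconcile(count_before: int, images_scanned: int, count_after: int) -> bool:
    """Counter, scanner, counter: the batch passes only if all three agree."""
    return count_before == images_scanned == count_after


def double_key_mismatches(first_pass: list[str], second_pass: list[str]) -> list[int]:
    """Positions where two independent keyers entered different values.

    An empty list means the record verifies; otherwise those fields go
    back for a third look.
    """
    if len(first_pass) != len(second_pass):
        raise ValueError("both keyers must enter the same number of fields")
    return [i for i, (a, b) in enumerate(zip(first_pass, second_pass)) if a != b]


if __name__ == "__main__":
    print(counts_reconcile(1_000_000, 999_998, 1_000_000))  # False: rerun the batch
    print(double_key_mismatches(["SMITH", "123-45-6789"], ["SMITH", "123-45-6798"]))
```

The second call reports field 1, the transposed digits in the Social Security number; neither keyer alone would have caught it.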
Butterfield: How do search engines and OCR work?
Gerlach: Well, the OCR engine, that is what OCR is all about. OCR stands for Optical Character Recognition, and it depends on the search engines. If you have three OCR engines looking at the same letter at the same time, it is going to go and try to determine what it is, if it is an O or a zero. If it doesn't get an [agreeing] vote, then it is going to kick it back and flag that image or that letter. And you're going to go and verify what they perceive the images would be. So they will give you, "I think it's an O, I'm not sure it's an O, but it could be an A, it could be a zero."
Butterfield: The software flags suspect letters.
Gerlach: Yes. And a human actually looks at that output and says, is this [an O] or that. [unintelligible] You are going to go and try to fill it in. And usually, based on the content of the word and the letters to the left and the right of it, you should know exactly what the letter is going to be. So OCR is key for forms, and that goes into the HCFA's, because that's the [...]. Most people are using OCR and ICR, and what it is, it is going to take this piece of paper, I'd scan it through, and I'd get 95% accuracy. Well, it means on 100 letters, I've got five missing. I don't know whether my question mark is an O or an A. There is a wide variety of ways that they do it. Some people break it down just to that letter. Some break it down to a word, and they'll flag that letter: this O would be in yellow, say, and that is the flag, and you go, OK, it is an O, it is a 1, it is a P, or whatever. Some of them show the paper underneath; with other ones, you go get the paper and double check that it is going to be an O. So that's the different processes. You don't go to the paper if you don't have to, because it is time consuming. And that usually makes a big difference in the costs of how people can do it.
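The voting scheme Gerlach outlines, several engines reading the same character with disagreements flagged for a person, can be sketched as below. It assumes the engines' outputs are already aligned character by character, which real systems have to work to achieve, and the function names are hypothetical. By default a character is accepted only when every engine agrees; `required` relaxes that to a majority.

```python
from collections import Counter
from typing import Optional


def vote(readings: list[str], required: Optional[int] = None) -> Optional[str]:
    """Return the agreed character, or None if it must be flagged for review."""
    needed = len(readings) if required is None else required
    char, votes = Counter(readings).most_common(1)[0]
    return char if votes >= needed else None


def recognize(engine_outputs: list[str], required: Optional[int] = None) -> tuple[str, list[int]]:
    """Combine aligned outputs from several OCR engines.

    Flagged positions are shown as '?' and their indexes returned so an
    operator can resolve them from the word context or the paper.
    """
    text, flagged = [], []
    for pos, readings in enumerate(zip(*engine_outputs)):
        winner = vote(list(readings), required)
        if winner is None:
            text.append("?")
            flagged.append(pos)
        else:
            text.append(winner)
    return "".join(text), flagged


def character_accuracy(total_chars: int, flagged_chars: int) -> float:
    """Share of characters accepted without review (95 of 100 -> 0.95)."""
    return (total_chars - flagged_chars) / total_chars


if __name__ == "__main__":
    engines = ["HOLSONS", "H0LSONS", "HOLSONS"]  # one engine read a zero
    print(recognize(engines))              # unanimous vote required
    print(recognize(engines, required=2))  # two of three is enough
```

Requiring unanimity flags the second letter of "H?LSONS" for a person; accepting a two-of-three majority resolves it automatically. Where to set that bar is the cost trade-off Gerlach mentions, since every flag means an operator looks at the screen or goes back to the paper.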
Butterfield: And the OCR and ICR?
Gerlach: ICR is just Intelligent Character Recognition. The ICR is going to be handwriting recognition, and right now nobody has perfected that, because everybody's handwriting is totally different. Also, handwriting usually goes into the cursive world, and with cursive all the letters are joined together, so you can't see a true separation. So if someone invents an ICR package, they'll be very rich.
Butterfield: What are you looking for in terms of the level of automation in the systems you are using?
Gerlach: Really depends on what the customer wants. Most backfile conversions, the reason why they are in a large warehouse or warehouses is that they don't need that information in ten seconds or less. For them to get a simple [...]: put it in, retrieve it in 30 seconds or less, and I can find it by filing through 100 pieces of paper. Some people, it is mission-critical documentation, and I need to find it in nano-seconds. To them, they are going to spend the extra money and either OCR it or have multiple ways of indexing that document, such as name, social security number, what kind of document is this, what does this document contain, is this a PO, invoice, application, document, or request for whatever it is going to be, and specify exactly what it is. For them, it is a matter of cost to get it in the system, and how much they want to spend depends on how bad they need that information back.
Butterfield: What about the level of automation of scanning these documents?
Gerlach: The input automation? It comes into really the speed of your scanner and the speed of the software to go through the process, such as OCR, or what we call scan fix, which is despeckle, deskew, decrop.
Butterfield: What is decrop?
Gerlach: Crop is what is going to happen [...]. Deskew is if sometimes your image is cockeyed. Crop is going to be when people want a part of the document blocked out. When you scan in, you are going to have a scan field of an 8½ by 11 field. What happens if you put an 8½ by 14 piece of paper into it, the bottom is going to get cropped out. So you've got to make sure of these parameters. So if you have a mix of 8½ by 11's and 8½ by 14's, what will take place is you set your crop for an 8½ by 14. For 8½ by 11, you are going to see a black band there, and that will take up a lot of memory for people, so what you want to do is what they call decropping. We're going to take that black border out of there, and then the 8½ by 11, and make the image clean, and make the image so it's not taking as much memory on the CD or on the hard drive. So that's what they mean by decropping.
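Two of the "scan fix" steps can be sketched on a bitonal image stored as rows of 0 (white) and 1 (black). This is a simplified illustration under stated assumptions: despeckling here removes only single black pixels with no black neighbours, and decropping assumes the unused part of the 8½ by 14 scan field shows up as fully black rows at the bottom of the image. Deskew is omitted, and the function names are hypothetical.

```python
Image = list[list[int]]  # 1 = black, 0 = white


def despeckle(img: Image) -> Image:
    """Remove isolated black pixels (no black pixel among the 8 neighbours)."""
    height = len(img)
    width = len(img[0]) if img else 0
    out = [row[:] for row in img]
    for y in range(height):
        for x in range(width):
            if img[y][x] != 1:
                continue
            neighbours = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


def decrop(img: Image) -> Image:
    """Drop the all-black band left below a shorter page in a longer scan field."""
    end = len(img)
    while end > 0 and all(px == 1 for px in img[end - 1]):
        end -= 1
    return [row[:] for row in img[:end]]


def scan_fix(img: Image) -> Image:
    return decrop(despeckle(img))


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 1],  # stray speck in the corner
        [0, 1, 1, 0, 0],  # real content
        [0, 1, 1, 0, 0],
        [1, 1, 1, 1, 1],  # black band from the empty part of the scan field
        [1, 1, 1, 1, 1],
    ]
    for row in scan_fix(page):
        print(row)
```

Dropping the band is what saves space on the CD or hard drive: the cleaned image simply has fewer rows to store.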
Butterfield: Do you routinely drop documents into a hopper, like you'd do on a copier, and have them run through automatically?
Gerlach: We usually have a person watching. The scanning world is different than the copying world, like I'm going to put a piece of paper on and say I want 500 copies. The copying world is reams of nicely generated stacks of paper, and it is going to take one after another. We're on the other end. After that paper has been around about 10,000 or 50,000 times, it has crumpled corners and wrinkles and everything else. We're going to go stuff it back through a scanner, and we're going to have people watching it, because there are what we call document jams, or things do get skewed or crinkled and folded over, and you want to watch that. So there's always an operator involved in handling pieces of paper. You have to make sure.
Butterfield: And the output of that process.
Gerlach: Right. That's just [...]. We have the capability of one scanner in an eight hour shift doing [...] documentation. And that's, for most people, fast. That's for image capture, not OCR'ing. Then it goes through other phases. Then you go through the electronic process of despeckle, deskew, decrop, OCR, and that's basically the stages. In fact, over there in the other room I can show you the stages.
Butterfield: And just so I get the numbers right, that is for what resolution scan?
Gerlach: 200 dpi.
Butterfield: Binary output?
Gerlach: Yes.
Butterfield: Do you have an issue with having bound documents come to you? Do [customers want you] to preserve the original form of the document? Is that a problem?
Gerlach: Bound documents: there are scanners, if they need them bound and can't break the binding. There are basically what they call a book scanner, what you call flat bed scanners; that just slows our production time. So it just comes down to an individual case of, hey, I can't break it, so we'll do a hand scan. Personal preference. You just throw an extra layer on, and it means more cost to the person. It is up to them which way they want to do it.
Butterfield: What kinds of costs do you associate with the input system?
Gerlach: As far as costs associated with it [...].
Butterfield: What are the equipment costs? Do you have to pay for it all up front? Are there click charges or equipment [...]? Or do you put in capital equipment, capital costs?
Gerlach: Well, it is just like a copy machine, it is actually a scanner, and in the scanner world, it all depends. Most people, if they are a growing company, are probably going to lease the scanner. The one we have, which can do 10,000 pieces of paper a day, costs about $100,000. Most document conversion places are going to do one of two things: they are going to buy lower end scanners, which are slower, and throw more people at them, or you are going to buy higher end scanners for bigger prices and throw less people at them to get the job done. So it is really our overhead: it is going to be people, and it's going to be our equipment, scanners, PC's, and some software, and some technology, and programmers to get the job done.
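Gerlach's trade-off between cheaper, slower scanners with more operators and expensive, faster scanners with fewer operators can be put into a simple per-page cost model. The only figures taken from the interview are the roughly $100,000 scanner rated at 10,000 pages a day; the low-end scanner, staffing levels, daily wage, and equipment life are hypothetical placeholders, and the model ignores the PCs, software, and programmers he also counts as overhead.

```python
from dataclasses import dataclass


@dataclass
class ScanStation:
    scanner_price: float      # purchase price in dollars
    pages_per_day: int        # one scanner's daily throughput
    operators: int            # people tending this scanner
    lifetime_days: int = 750  # hypothetical: about three years of working days


def cost_per_page(station: ScanStation, daily_wage: float) -> float:
    """Equipment cost spread over its lifetime plus labour, per page."""
    equipment = station.scanner_price / (station.pages_per_day * station.lifetime_days)
    labour = station.operators * daily_wage / station.pages_per_day
    return equipment + labour


def stations_needed(station: ScanStation, daily_target: int) -> int:
    """Stations required to reach a daily page target (integer ceiling division)."""
    return -(-daily_target // station.pages_per_day)


if __name__ == "__main__":
    high_end = ScanStation(scanner_price=100_000, pages_per_day=10_000, operators=1)
    low_end = ScanStation(scanner_price=20_000, pages_per_day=2_500, operators=2)  # hypothetical
    wage = 120.0  # hypothetical dollars per operator per day
    for name, s in [("high end", high_end), ("low end", low_end)]:
        print(f"{name}: ${cost_per_page(s, wage):.3f}/page, "
              f"{stations_needed(s, 30_000)} stations for 30,000 pages/day")
```

With these made-up numbers the faster scanner wins (about $0.025 versus $0.107 per page, 3 stations versus 12 for 30,000 pages a day) because labour dominates; with much cheaper labour the low-end scanner's smaller equipment cost per page would win instead, which is why the choice "all depends."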
Butterfield: The volume of scanning in this shop, is that something you can share publicly?
Gerlach: Volume, with our two sites, right now we are probably pumping out about 30,000 pieces of paper a day. It ebbs and flows.
Butterfield: I talked to you about quality issues, I talked to you about your requirements for accuracy and throughput. If you could improve any one of those elements, which things would you improve?
Gerlach: Throughput, of course. The more documents we can put through in a shorter period of time, the better off we are, which just leads to more profits. Most people came from the world of microfilming, which is taking a photographic image, and right now, versus a microfilm image, you definitely can capture information a lot faster [on microfilm]. If and when the speeds of those scanners come up to meet those of the microfilm world, then a lot of things will happen: better profit margins, and actually lower cost for the customer.
Butterfield: Any of your customers tell you that they wish the quality could be better, or site specific image quality?
Gerlach: It depends on the application, and basically it really varies in price. We have jobs that come out to $10 per page because they expect perfect, exact duplication of what's coming back: mirror the fonts, mirror everything else. Then we have people we're doing it for six cents a page. All they care about is, get me an image. Find me close to an image and I'm happy. It depends on what they want, what their requirements are.
Butterfield: I think you answered most of my questions. If you've just got another minute and wouldn't mind showing me around your scanner.
Gerlach: Yes. Sure.
Butterfield: Thank you. You've been really helpful.
[End of Interview]
Anne R. Kenney
Associate Director for the Department of Preservation
Allen Quirk
Scanning Technician
Cornell University
Ithaca, NY
May 1, 1998
Butterfield: Could you tell me about what sort of systems are being used here at Cornell?
Kenney: In terms of image capture, we have moved to outsourcing a good chunk of our material. We basically define our requirements, and whoever can meet them, we don't care what kind of system they are using. We do some upstairs using XDOD's [Xerox Documents On Demand scanner]. We are beginning to do more investigation into capturing beyond bitonal, and looking at graphical content and what's the resolution, bit depth, enhancement kind of requirements for halftone reproduction.
Butterfield: Capture beyond bitonal, which means capture beyond one bit per pixel?
Kenney: Yes.
Butterfield: Why go beyond bitonal?
Kenney: Well, a good chunk of production material has either gray scale or color information, or subtleties of [...] that require gray scale reproduction.
Butterfield: Could you describe in more depth what kinds of inputs you're requiring?
Kenney: Oh. Well, principally photographic materials, works of art on paper, and book illustrations, which is the focus of one of our current investigations, which we are doing now under contract for the Library of Congress. We are focusing on the most prevalent book illustration types from the 19th and early 20th centuries, for relief, intaglio, and planographic processes. And there was quite a range of them before the introduction of the halftone in the 1880's.
Butterfield: How do you judge the quality of reproduction of those works?
Kenney: The quality of the original or the quality of the digital surrogate?
Butterfield: The quality of the digital. How do you determine what is a good scan and what is a bad scan, and what are you looking for?
Kenney: We do judgements both on screen and via printouts, and we are looking for levels of information, depending on the requirements we've established. We've identified three levels of information. One is what the normal eye would see at normal reading distances: thinking about the illustration as part of a larger text, "Where is the essence of the content?" Then what you'll get at a close examination or under slight magnification, say up to maybe 5X, such things as halftones. Or it can be evidence of the structure or process used to create the illustration, which can be exceedingly fine, such as the telltale black lace [unintelligible], this very fine reticulation of certain processes, similar to the detail of an aquatint. This [example] happens to be the process of a Calotype. Aquatints have, it is like almost a cracked egg surface that is caused by the gelatin process used. We discern Calotypes from Aquatints based on the character of the reticulation. Does the image file allow for positive identification of the process used? Does the stroke and acid bite of an etching, is it evident in the master file? Does the scoop of the stipple engraving come out?
Butterfield: Are you actually looking to make those judgements from the scan of the image instead of the source document?
Kenney: We look at the source document itself and define those levels of information, because we need to represent that information in the digital surrogate. We know what we need to capture. We can obviously scale from that. And in most cases we would probably be talking about 1000, 1500, 2000 dpi for some of these.
Butterfield: What kinds of color requirements do you have? You mentioned color a minute ago.
Kenney: We're very interested in converting museum objects, works of art on paper, color photographs and color [...], and color appearance is important to us. We are following, with interest, the work to define ICC compliant color space, color management for [unintelligible], and particularly the emerging FlashPix file formats that have ICC conforming capabilities associated with them.
Butterfield: What are the important attributes of color that you want to capture?
Kenney: Well, I'm interested in being able to capture the specific hues, the brightness and darkness, the level of saturation. All of that is important. For many color documents, the color appearance doesn't have to be a total, exact match. Color maps, that sort of thing. It is basically representational color rather than true appearance color. But for works of art, it is critical for us to be able to get as close to the Chagall blue as possible in the digital file, that kind of thing. So our color requirements will range across a fairly good spectrum. We are interested in color control, from the point of either creating a photo intermediate or direct scan, via use of emerging color management systems, the use of targets.
Butterfield: By targets you mean?
Kenney: Gray scale, color targets.
Butterfield: Standards.
Kenney: Yes.
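One common way to put a number on how close a capture comes to a target is a color difference measured in CIELAB. The sketch below uses the simple CIE 1976 formula (Euclidean distance, often written ΔE*ab); newer formulas such as CIEDE2000 track perceived difference more closely. The patch names, reference values, and tolerances are hypothetical, and the sketch assumes the scanned patches have already been converted to Lab values through a color-managed workflow.

```python
import math

Lab = tuple[float, float, float]  # (L*, a*, b*)


def delta_e76(a: Lab, b: Lab) -> float:
    """CIE 1976 color difference: straight-line distance in CIELAB."""
    return math.dist(a, b)


def check_target(reference: dict[str, Lab], measured: dict[str, Lab],
                 tolerance: float) -> dict[str, float]:
    """Return the patches whose difference from the reference exceeds tolerance."""
    failures = {}
    for patch, ref in reference.items():
        diff = delta_e76(ref, measured[patch])
        if diff > tolerance:
            failures[patch] = round(diff, 2)
    return failures


if __name__ == "__main__":
    reference = {"neutral gray": (50.0, 0.0, 0.0), "deep blue": (30.0, 20.0, -50.0)}
    measured = {"neutral gray": (53.0, 4.0, 0.0), "deep blue": (30.5, 20.0, -50.0)}
    # Hypothetical tolerances: loose for representational color, tight for art.
    print(check_target(reference, measured, tolerance=6.0))  # map-style work
    print(check_target(reference, measured, tolerance=3.0))  # works of art
```

The same scan passes the looser, representational tolerance and fails the tighter one on the gray patch (ΔE of 5.0), mirroring Kenney's distinction between color maps and the "Chagall blue."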
Butterfield: What proportion is text? Most of what you described so far has been graphics elements. Is text a large part?
Kenney: Oh, text is a good chunk of what we've imaged here. We've converted close to three million pages of text. Our belief, at the center of our imaging approach, has been: you put the document there, you know your document, you love it, you define carefully what is the significant information being conveyed by that document. And then you create a digital surrogate that represents, fully, that informational content, because that digital surrogate has the best shot for longevity and utility if it is rich enough to reflect those attributes and is rich enough for processing. While we may be interested in creating a good, rich digital map, digital image of that document, we'll want to be able to process it or create alternative formats or alternative views to meet user needs. We may want to move to a PDF file. We may want to OCR or encode that information. We may want to investigate new technologies for searching across the [unintelligible].
Butterfield: Your focus then is on capture, and then post-processing operations?
Kenney: Yes. We don't think the technologies all have to be tied in at once. From our perspective, the most stable of the technologies has been the capture of things: understanding what it takes to create a good digital image, capturing the image faithfully, so that you can subsequently do those processing operations. If we created a good rich database of images, we can then derive on the fly alternative views or formats to meet specific needs, as users become more sophisticated and as the technology itself develops. What our user wants changes more quickly than we understand [...]. I'm also interested in the post-processing of raw gray scale information with sufficient resolution, so it is not tied to a particular piece of machinery, a particular kind of scanning approach.
Butterfield: What about the integrity of the original document? Maintaining the physical integrity of the original document. Is that a large issue here?
Kenney: Yes it is. It is a really large issue. As we move beyond basic, brittle printed text, you get a good deal of resistance on the part of the faculty to destroy the document in the name of preserving its content. So we are very interested in development of solutions for high quality, bound volume scanning. Non-destructive.
Butterfield: You have talked a lot about quality. What about storage requirements of documents?
Kenney: We have developed an approach that puts our digital masters at the heart of the system. So we need a storage system that is capable of providing timely access and serving capability, for providing timely processing of those files to meet user needs. We are currently investigating what alternatives there are for amalgamating all of our image collections into one system. We'll go out to bid sometime, maybe, next year. I'm interested in RAID, and in hierarchical storage management: what improvements in HSM have been made to take advantage of declining costs in magnetic storage.
Butterfield: Is storage space a big consideration now? Are you making any concessions in terms of the amount of data you are capturing?
Kenney: No.
Butterfield: Lossy compression?
Kenney: No.
Butterfield: Quality is more important to you than [...]
Kenney: Yes. Quality and migration are both important.
Butterfield: By migration you mean?
Kenney: That if we were to use a lossy compression scheme, we'd have to not only worry about the file format but the compression process used in migrating to new formats. It is just one more step.
Butterfield: So it would just be another attribute of the persistence of that file format.
Kenney: Right. We're not, you know [...]. Storage costs are becoming less of a concern, so the need to compromise is reduced.
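Kenney's migration concern can be shown with a toy example. Lossless compression (here `zlib` from Python's standard library) gives back exactly the original bytes, so a later format change starts from the same data. A lossy step discards information, and moving already-degraded data into a second lossy format can end up further from the original than encoding the master once. The "lossy codec" below is a hypothetical quantizer on 8-bit sample values, not a real image format.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress; lossless means the result equals the input."""
    return zlib.decompress(zlib.compress(data, 9))


def toy_lossy(samples: list[int], step: int) -> list[int]:
    """Toy lossy codec: snap 8-bit samples to the nearest multiple of `step`."""
    return [min(255, round(s / step) * step) for s in samples]


def max_error(original: list[int], decoded: list[int]) -> int:
    return max(abs(o - d) for o, d in zip(original, decoded))


if __name__ == "__main__":
    master = [0, 13, 100, 201, 255]
    assert lossless_roundtrip(bytes(master)) == bytes(master)

    old_format = toy_lossy(master, step=6)     # first lossy encoding
    migrated = toy_lossy(old_format, step=10)  # later migration to another lossy format
    direct = toy_lossy(master, step=10)        # same target format from the lossless master
    print(migrated, max_error(master, migrated))
    print(direct, max_error(master, direct))
```

In this run the migrated copy ends up further from the master (largest error 5) than a single encoding from the master (largest error 3): the extra step Kenney wants to avoid.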
Butterfield: Tell me something about your requirements for a level of automation, or the level of operator involvement in the scanning process. How important is that?
Kenney: Well, to this point, it has been very important. Not so [...]. I mean, I would like to see the shift away from heavy item-by-item intervention, either at the scanner or at the quality control review, to automated processes for inspection, and to defining requirements for a rich enough bit stream that can be post-processed according to guidelines based on the kind of content that we may be dealing with. I see more of that control done by the end user and less human intervention. We'll also develop automated methods for image evaluation.
Butterfield: What are the sorts of checks that are occurring now?
Kenney: We do scan targets to insure that the system is operating correctly. We do 100% quality control on printed facsimiles for text based material. [...] at the same time [...] seems to be a concern with the printouts. And then we'll check the files, a check to make sure that the files are structured appropriately and stored in the proper fashion.
Butterfield: By structure, you mean the ordering of pages.
Kenney: The ordering of pages, segments, things like that.
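A file-structure check of the kind Kenney mentions can be partly automated. This sketch assumes a hypothetical convention in which each page image is named with an eight-digit sequence number (for example `00000001.tif`); it reports missing numbers, duplicates, and files that do not follow the convention. It checks only the numbering, not whether the images actually match the book's page order, which still needs a person or structural metadata.

```python
import re
from collections import Counter

PAGE_NAME = re.compile(r"^(\d{8})\.tif$")  # hypothetical naming convention


def check_sequence(filenames: list[str]) -> dict[str, list]:
    """Report gaps, duplicates, and stray files in one volume's page images."""
    numbers, stray = [], []
    for name in filenames:
        match = PAGE_NAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
        else:
            stray.append(name)
    counts = Counter(numbers)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    expected = set(range(1, max(numbers) + 1)) if numbers else set()
    missing = sorted(expected - set(numbers))
    return {"missing": missing, "duplicates": duplicates, "stray": stray}


if __name__ == "__main__":
    volume = ["00000001.tif", "00000002.tif", "00000004.tif", "00000004.tif", "notes.txt"]
    print(check_sequence(volume))
```

For the sample volume the check reports page 3 missing, page 4 duplicated, and one stray file, the sort of problem that is tedious to catch by eye across thousands of images.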
Butterfield: Metadata? What kinds of metadata are you capturing?
Kenney: We have more work to do in that area. We have defined requirements for TIFF header information in particular, and for basic structuring information for serials and monographs. At the very base level, we can retrieve by page number and variants of page numbers. For serial literature, we want to make sure we provide access down to the article level. We do a combination of keying and OCR at the point of scan to create references to tables of contents, indexes, the beginning and ending of articles, that kind of stuff, and then we provide, through light encoding, access to the article level off that title information. And beyond that, for ... issues, we will move to OCR for text searching capability.
Butterfield: What kinds of costs do you associate with this whole process, and how do you pay for it? Are the costs ... capital?
Kenney: We primarily have received outside funds for conversion efforts. This most recent project, which was to convert about two million images, both at Cornell and the University of Michigan... the capture costs of the selection, review, preparation, scanning, quality control, media, and base indexing involved, which was about fifty cents an image.
Butterfield: The forms of output that you provide, how do you provide it to end users?
Kenney: Web access. ... we were doing ... for text [unintelligible].
Butterfield: And the level of quality?
Kenney: These were 600 dpi bitonals that ...
Butterfield: Turnaround time for scanning? How fast does scanning have to happen?
Kenney: In this last project we did 100,000 images in about 13 months. We don't like to keep the books off the shelf too long, so I'd probably say outside of six months from when you pull it from the shelf to when you either put it back on the shelf or put a replacement on the shelf. We work with outside vendors; we send out a shipment, it has to be returned in about maybe a month, and then there is this elaborate negotiation of... is the quality fine, or if it isn't, rescan.
Butterfield: You said put the replacement back on the shelf; you are scanning and reprinting books. What is the process there?
Kenney: We print on acid-free paper on [Xerox] DocuTech.
Butterfield: So the replacement book is a remanufactured book based on the digital scan.
Kenney: Yes. We also have done some computer output microfilm.
Butterfield: We talked about quality, we talked about costs, we talked about turnaround time. If I had to ask you which was most important to you...
Kenney: Quality. We are a preservation program here! [laughs]
Butterfield: And the specific attributes of quality that you'd like to see improved?
Kenney: I'd like better capture of halftones. As I indicated earlier, I'd like to see a base level of gray scale with sufficient resolution, and then post processing based on parameters specific to document types. I think that's a real nice way to go, so that your capture, for instance, is tied to the item's intellectual requirements rather than to its physical constraints.
Butterfield: The physical image constraints?
Kenney: No, the physical object's constraints. If I have a bound volume, you know, I don't want to have to disbind it, so I have to use a volume scanner. If I have an oversize map, I have to create a photo intermediate. I have to use a particular type of scanning device... I want to be able to have a system with the capability for processing, regardless of how I capture it, in a way that could be clearly defined, that is not so dependent on a machine's native capabilities.
Butterfield: These are all the questions I have here. I'd like to take you
up on your offer to see your lab.
Kenney: Yes. Let me walk you up and I'll show you what we're doing. Let me tell you a little about the project that we're doing. As I said, it is with... it is under contract with the Library of Congress. And what we did was to work with a faculty and curator's advisory committee here to select 22 different samples from books and serials from 1850 to about 1920 or so, the most prevalent kinds of commercially printed illustration during this period. So we have Calotypes, we have line, we have etchings and engravings and photogravure, and we have halftones, line art, a couple of others. And then we worked with the Advisory Committee to define their requirements at different levels of view, and I've been having a lot of fun working with my ophthalmologist in equating what, you know, when you take that simple test for reading the baby "e" at 16 inches, how many minutes of arc are over that, and what that translates into in terms of the digital equivalency. To define an essence level, and then we'll verify that against what our audience, what our viewers would say in a printed facsimile produced from the digital file. I suspect our printing requirements will be higher than our capture requirements for replicating that information. We may capture something at 300 dpi, but print it at 400 or 600 or something to give the appearance of what the human eye sees. So once we've established those requirements, we move it down through the production process: at 1000 dpi, are we seeing it? At 800 dpi? I want to see how low a quality we can get away with and still retain those basic attributes.
Butterfield: So you're looking for necessary and sufficient?
Kenney: Necessary and sufficient. We want to know how high a quality we are seeing, and how low we can get but still meet our customer requirements. Quality and cost and resolution are not linear. At some point, adding more dpi or bit depth isn't doing anything more than increasing your costs. So we want to get that curve and see where it levels off, to hit it right where the curve is level.
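Kenney's "minutes of arc at 16 inches" can be turned into a rough digital equivalent. The sketch below is illustrative only and rests on two labeled assumptions not taken from the interview: that the smallest resolvable detail subtends 1 arcminute (the stroke width of a 20/20 Snellen letter) and that two samples per detail are wanted. The function names are hypothetical.

```python
import math

MM_PER_INCH = 25.4


def detail_size_inches(arcminutes: float, distance_in: float) -> float:
    """Linear size of a detail that subtends `arcminutes` at `distance_in` inches."""
    return distance_in * math.tan(math.radians(arcminutes / 60.0))


def required_dpi(detail_in: float, samples_per_detail: float = 2.0) -> float:
    """Sampling rate that places `samples_per_detail` pixels across one detail."""
    return samples_per_detail / detail_in


# Assumptions for illustration: a 1-arcminute resolvable detail viewed at the
# 16-inch reading distance, sampled at two pixels per detail.
detail = detail_size_inches(1.0, 16.0)
print(f"1 arcminute at 16 in = {detail:.4f} in ({detail * MM_PER_INCH:.3f} mm)")
print(f"two samples per detail -> about {required_dpi(detail):.0f} dpi")
```

Under these assumptions the detail is about 0.0047 in (roughly 0.12 mm) and the result is about 430 dpi, in the same range as the 400 to 600 dpi print resolutions mentioned above. A different acuity figure or viewing distance scales the answer proportionally, which is why such a number would still need the viewer verification Kenney describes.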
Butterfield: So when you said quality was most important, it's not most important in a vacuum.
Kenney: It is quality consonant with what the content of the original is. When we worked with Xerox on the joint study, we were very interested in 600 dpi capability because we did an assessment of production typography and printing techniques of the last century and a half, looking at the use of metal type and the limitations of that metal type with large print runs. And so printers had to be very careful about how small and closely spaced they made those letters. And I looked at all the common typefaces, and they were produced for mass publications at five point type and above. Below that was hand lettering, or wonderful evidence of skill, but not made for mass consumption.
And so, when you look at the size of the smallest material in five point type, it is one millimeter high or greater. And looking at your requirements for capture, 24 dots across a 1 millimeter high character gives you the detail necessary for it. And that translated to 600 dpi. But 600 dpi bitonal is not enough for the kinds of illustrations that are present in those books, and that led us to our current effort with LC [Library of Congress]. As we define those requirements, we can begin to objectify the subjective requirements: we look at what is in the analog, then we map to a digital equivalency. And if we can define those, we have objective requirements for conversion. And we are taking it one step further, working with a small high-end imaging company called Picture Elements. We will take one instance of that, which is halftones, and create software for automatically detecting the halftone content region and applying the appropriate settings at capture to eliminate moire and weirdnesses that will come out of it, and...
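The arithmetic behind the 600 dpi figure can be checked in a few lines. This sketch assumes only the 25.4 mm-per-inch conversion and the roughly 1 mm height of five-point type that Kenney cites; the function names are illustrative and not part of any Cornell tool.

```python
MM_PER_INCH = 25.4


def dots_per_mm(dpi: float) -> float:
    """Convert a scanning resolution from dots per inch to dots per millimetre."""
    return dpi / MM_PER_INCH


def dots_across(feature_mm: float, dpi: float) -> float:
    """Number of sample points that fall across a feature of the given size."""
    return feature_mm * dots_per_mm(dpi)


# Five-point type, the smallest mass-market size cited, is about 1 mm high.
for dpi in (300, 400, 600):
    print(f"{dpi} dpi: {dots_across(1.0, dpi):.1f} dots across a 1 mm character")
```

At 600 dpi the character receives about 23.6 dots, which rounds to the 24 cited; 300 dpi gives about half that (11.8), which is why lower resolutions fall short for the smallest type.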
Butterfield: Autosegmentation?
Kenney: Exactly. We were impressed with what Xerox had done with the Autosegmentation, but it doesn't work very well on older halftone types. So we're looking at something that would be more flexibly responsive to the kinds of materials that we need to convert. And then that would become free software that we would put up on some site, and then continue to roll out and build that capability. What it is going to take to do that... a suite of software to be able to, from any kind of... Probably it is something like a 400 dpi eight-bit stream that is going to do most of what we're interested in. There will be some compromises in some of those things associated with the very fine reticulation in the Calotype, you know, that kind of stuff, but, you know, in the big scheme of things, that is probably going to do it.
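To make the idea of automatically detecting halftone regions concrete, here is a deliberately crude, hypothetical sketch. It is not the Xerox Autosegmentation method or the Picture Elements software, whose details the interview does not give. It assumes an 8-bit grayscale scan, splits it into tiles, and labels a tile "halftone" when neighbouring pixels cross a mid-gray threshold very often (dense dot patterns), otherwise "text", the idea being that text tiles could be thresholded to bitonal while halftone tiles keep gray scale. The tile size and both thresholds are illustrative guesses.

```python
from typing import List

# An 8-bit grayscale image as rows of pixel values in the range 0-255.
Image = List[List[int]]


def transition_density(tile: Image, threshold: int = 128) -> float:
    """Fraction of horizontally adjacent pixel pairs that cross the threshold.

    Halftone dot patterns alternate between ink and paper at a fine pitch,
    so they cross the threshold far more often than text or line art, which
    has long runs of paper broken by comparatively few strokes.
    """
    crossings = 0
    pairs = 0
    for row in tile:
        for left, right in zip(row, row[1:]):
            pairs += 1
            if (left < threshold) != (right < threshold):
                crossings += 1
    return crossings / pairs if pairs else 0.0


def classify_tiles(image: Image, tile_size: int = 8,
                   busy_threshold: float = 0.5) -> List[List[str]]:
    """Label each tile 'halftone' (keep gray scale) or 'text' (bitonal OK)."""
    labels: List[List[str]] = []
    for top in range(0, len(image), tile_size):
        row_labels: List[str] = []
        for left in range(0, len(image[0]), tile_size):
            tile = [row[left:left + tile_size]
                    for row in image[top:top + tile_size]]
            busy = transition_density(tile) > busy_threshold
            row_labels.append("halftone" if busy else "text")
        labels.append(row_labels)
    return labels


if __name__ == "__main__":
    # Synthetic 8 x 16 test page: a checkerboard "halftone" on the left,
    # a single two-pixel-wide black stroke on white paper on the right.
    page = [[0 if (x + y) % 2 == 0 else 255 for x in range(8)]
            + [255, 255, 255, 0, 0, 255, 255, 255]
            for y in range(8)]
    print(classify_tiles(page))
```

On the synthetic page this prints `[['halftone', 'text']]`. Real nineteenth-century halftones at 400 dpi will not look like a clean checkerboard, and a more robust detector would likely examine screen periodicity and angle rather than simple transition counts, which is exactly the kind of flexibility Kenney says older halftone types require.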
Butterfield: The graphic elements that you said are the most challenging content... you talked about five point fonts...
Kenney: Photographic stuff is [feigns choking]. One, it is hard to objectify what is the detail. And two, they are so information rich. In particular when you start moving into color.
Butterfield: Are Asian fonts a requirement?
Kenney: Yes, we've looked into kanji and we've looked at non-Roman alphabets, and actually the Japanese weren't interested in scanning until it got up to 400 [dpi] because of the... it is very subtle distinctions. Let me show you what we use. I'm doing a little show and tell. This is our favorite font. It is the Bodoni italic four point. And this was produced by this guy, Mr. Bodoni, in the late 18th century, and was used to print Dante's Divine Comedy in 1796, and was used in the Wild West for Wanted posters, which is typical. What is nice about Bodoni is, one, the italic is on an angle, and two, it is characterized by the exaggeration between the thickness and the thinness of the strokes that are associated with it. If we can capture this, we capture it with good assurance...
Butterfield: So if we can capture Bodoni...
Kenney: If we can capture Bodoni italic four point, we are real sure we are capturing 99% of the typographical information in printed materials. Most pages don't take a 600 dpi. But we don't want to do page-by-page review, so we [want to be sure] that we are getting it. With Bodoni italic four point, we are capturing whatever is there. It is a bar set higher than the printing standard.
Butterfield: It is a pretty challenging target.
Kenney: It is! Let me take you up and show you our baby lab.
Butterfield: What is the name of your scanning technician?
Kenney: Allen Quirk.
Kenney: Allen, can you talk with Paul about what you've got in terms of the 20 or so different illustrations? Allen may seek your advice. I've been asking him to measure, as best as possible, the finest feature that evidences the process used. So it could be the squiggle of the Calotype, the halftone. This is what he's doing. This is easy because this is line art. It is easy, but as you move beyond, like this is a photogravure, it is very difficult. Those tone processes are very difficult to put your hand on where your resolution requirement is. We're no fools: we start with the text, it is easy to measure. And so we are pressing the limits of that. We don't need to replicate exactly the squiggles, but to represent in the file... All we really need is the evidence that the squiggles convey that this is a Calotype. That's what we'll be looking for.
Butterfield: So you don't actually need to be at the Nyquist of the image information; something lower than that.
Kenney: That's right. So they are looking at something like 0.04, 0.02 millimeters, but we have not gotten to gravure. Photogravure in particular is going to be finer than that.
Quirk: That is what we are looking at right now, 0.08 millimeters to 0.02 millimeters. As we are working through the various etching types and relief printing, we are... And then as we get into more difficult ones, I'm sure that we are going to have to use higher magnification and finer measurements.
Butterfield: By the way, do you mind if I record?
Quirk: Oh, that's ok. So that's what we're finding as we are looking at...
Kenney: And you are using a 50X.
Quirk: I'm using a 50X [loupe] right now and I'm reaching my limits.
Kenney: I'm going to go, as I'm missing this meeting. Listen, send me e-mail if there are any questions.
Butterfield: Thank you. Thank you very much, I really appreciate it.
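Kenney's point that they need not sample "at the Nyquist" can be put in numbers for the 0.08 to 0.02 mm features Quirk reports. This is a minimal sketch under an assumption the interview does not state: each measured line is treated as half of a line pair with an equal gap, and the textbook Nyquist criterion (sampling at twice the highest spatial frequency) is applied. The function names are illustrative.

```python
MM_PER_INCH = 25.4


def nyquist_dpi(line_width_mm: float) -> float:
    """Minimum sampling rate (dpi) for a line of the given width.

    Assumes the line is half of a line pair (line plus an equal gap), so the
    period is 2 * w and the highest spatial frequency is 1 / (2 * w) cycles
    per mm. The Nyquist criterion needs at least twice that frequency, i.e.
    1 / w samples per mm, which is then converted to samples per inch.
    """
    samples_per_mm = 1.0 / line_width_mm
    return samples_per_mm * MM_PER_INCH


def finest_line_mm(dpi: float) -> float:
    """Inverse view: the narrowest such line a given dpi can sample at Nyquist."""
    return MM_PER_INCH / dpi


# Feature widths Quirk reports measuring with the 50X loupe.
for width in (0.08, 0.04, 0.02):
    print(f"{width:.2f} mm line: at least {nyquist_dpi(width):.0f} dpi")
print(f"600 dpi samples lines down to about {finest_line_mm(600):.3f} mm")
```

Under this model the three widths call for roughly 320, 640, and 1270 dpi, and 600 dpi reaches lines of about 0.042 mm. That gives a sense of the ceiling Kenney hopes to stay below by capturing only the evidence of a process rather than every structure.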
Butterfield: The finest structures you've measured so far with a 50X loupe?
Quirk: With the 50X loupe, I have measured with measuring increments down to 0.02 mm. And in etchings I'm finding that. Mezzotint, Calotype, it is going to exceed that. And photogravure I'm sure will go way beyond that. That is going to try our [limits], both in terms of measurement and definition, you know, defining what that element is that we are going to have to capture. Or maybe it is going to be defining a block that will show them the squiggles of the Calotype or the grain of the photogravure. And right now, I don't know if we are going to 24 bit. Anne, I believe, may have told you already.
Butterfield: Well, she told me about both monochrome and color, and I suppose color presumes 24 bit... Not 24 bit per pixel, right?
Quirk: Well, that is what I'm asking. I don't know what Anne has in mind for this.
Butterfield: The types of images you are looking at?
Quirk: Halftones, steel engravings, copper engravings, etchings, photogravure, mezzotint, Calotype, and anastatic print.
Butterfield: Anastatic print?
Quirk: I'm not familiar with it, and given the implication of the name, I'm surprised to read it was used in 1861. But I'm still in my education process in this.
Butterfield: What else can you tell me about your role in this?
Quirk: Well, as I said, I have not gotten to the limit of what I am able to do at this exact point. We're having someone in on Monday who is going to be able to explain the processes, so that between the two of us we will be able to define what is that element that determines that this is a steel engraving as opposed to a [copper] engraving. Or what is the element that shows it was an etching instead of a steel engraving? Does one taper off in a different way, and therefore we need something much finer, that type of thing. And then there are going to be the problems with things such as this. We are just getting to the limits of what we are going to be able to measure with this.
Butterfield: Are you worrying about the angles at which these structures appear as well?
Quirk: I think that we'll be finding out if we have to. I know that some of our halftones are on a 45 degree angle, which is what we usually find. Now we've been finding halftones on other angles.
Butterfield: All of what I see here is monochrome.
Quirk: That is what we are looking at right now. It was intended to be monochrome; it was printed with one color ink, but as you start looking closer and closer, there is definite depth, and actually, you can tell, it is not monochrome. Which suggests to me it is not going to be... I think the plan right now is to consider these gray scale and capture monochrome. And one of the things we are running into is finding, for example here, we measure a line width at 0.02 mm. At 0.04 mm we will have a light line and then a dark line, so at that resolution we are not into bitonal, even though this is printed, presumably with one color ink. And this one here is a steel engraving, a very fine steel engraving. And the other thing, in terms of a scanning challenge, which I think of as a technician, is that I know that I can pick up 0.04 mm even if it is fairly light, as long as I don't have to pick up the same area of lightness in a dark area. In other words, I can have that come out and we can start to get this adequately, but then we are not going to be getting the detail in the lights on dark... black.
Butterfield: So you need both the resolution and the dynamic range to get the...
Quirk: Well, in this particular case, what I am suggesting is that the resolution is cheated towards picking up darkness. In other words, if there is something there, it wants to make sure it gets it at the expense of picking up very small white areas. If this were bitonal, you wouldn't care about dynamic range.
Butterfield: Because you just captured the structure wherever it existed.
Quirk: If it was just black or white, right. You could just capture the structure at whatever level of detail was needed. You wouldn't have to worry about dynamic range. When we measure something at 0.04 mm, we are not measuring down to its bitonal structure yet. We're measuring down to maybe the steel plate structure, but a thinner scratch on the steel plate will lay less ink on the paper, and as a result it will look gray or black, light or dark. So either we're going to have to measure further down, or we are going to have to accept that that is all we can have. If [we had to] resolve your 0.02 mm, that means two line pairs across it, and that is just not going to happen. But if we can resolve that in gray scale, that would be adequate. That is what I'm assuming that we are going to determine: that you don't need to go the next step down to say, this is made up of little hatched black lines, and this is made up of heavier black lines; within that very fine line, they each have different structural characteristics.
[Tape ends]
[End of interview]