EMC Captiva Recognition

INTRODUCTION
This document provides an overview of the recognition engines available for EMC® Captiva®
Capture and EMC Captiva Advanced Recognition. It outlines what each recognition engine
provides and notes which engines are included and which are licensed separately. It also
offers tips on which engine is recommended for certain use cases.
EMC PRODUCT DESCRIPTION GUIDE
Table of Contents
EMC CAPTIVA CAPTURE (FORMERLY KNOWN AS INPUTACCEL)
GENERAL-USE OCR
EAST EURO/APAC OCR
BARCODE RECOGNITION
EMC CAPTIVA ADVANCED RECOGNITION (FORMERLY KNOWN AS DISPATCHER)
GENERAL-USE OCR
WESTERN OCR
EAST EURO/APAC OCR FOR CAPTURE
ADVANCED ZONAL OCR/ICR AND FULL PAGE OCR
BARCODE RECOGNITION
CHECK READING
THIRD PARTY RECOGNITION ENGINE SUPPORT
PRIME RECOGNITION FOR EMC CAPTIVA CAPTURE
CAPTIVA RECOGNITION CAPABILITIES
GENERAL GUIDELINES
ADDITIONAL INFORMATION
EMC CAPTIVA CAPTURE (FORMERLY KNOWN AS INPUTACCEL)
EMC Captiva Capture supports two processing types: zonal recognition and full-text recognition.
Zonal recognition is used to capture data from structured forms where field location is the same
from one image to the next. Full-text recognition is used to capture all text on the page for full-text
search and archiving purposes.
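The difference between the two processing types can be illustrated with a minimal sketch. It assumes an OCR engine has already returned words with pixel coordinates; the Word and Zone classes, the coordinates, and the field names are hypothetical and are not part of any Captiva API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: int  # left edge of the word, in pixels (hypothetical OCR output)
    y: int  # top edge of the word, in pixels


@dataclass(frozen=True)
class Zone:
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word: Word) -> bool:
        # A word belongs to the zone if its top-left corner falls inside it.
        return self.left <= word.x < self.right and self.top <= word.y < self.bottom


def full_text(words):
    """Full-text processing: keep every word on the page, in reading order."""
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in ordered)


def zonal(words, zones):
    """Zonal processing: keep only the words inside fixed, named zones."""
    result = {}
    for zone in zones:
        inside = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
        result[zone.name] = " ".join(w.text for w in inside)
    return result


# Illustrative page content and zone layout (assumed values).
words = [
    Word("Invoice", 40, 30), Word("No.", 120, 30), Word("10482", 400, 30),
    Word("Total", 40, 700), Word("129.50", 400, 700),
]
zones = [
    Zone("invoice_number", 380, 20, 600, 60),
    Zone("total", 380, 690, 600, 730),
]

print(full_text(words))
print(zonal(words, zones))
```

Because the zones are fixed rectangles, this approach only works when each field sits in the same place on every image, which is why zonal recognition suits structured forms.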
CAPTIVA CAPTURE
GENERAL-USE OCR
EMC Captiva General-Use OCR is the standard engine included with the Captiva Capture Standard
and Enterprise Capture Server.
With the inclusion of Advanced Zonal OCR/ICR and its superior handprint recognition into EMC
Captiva Advanced Recognition, the current Standard Handprint/General-Use ICR engine will be
deprecated in the near future. Existing projects that use Standard Handprint/General-Use ICR
should be migrated to use the handprint engine included with Advanced Zonal ICR as soon as
possible.
Processing type: Full-text, Zonal
Recognition type: Machine printed, 1D barcodes, ICR
Licensing: Machine printed and 1D barcodes are delivered as standard. ICR requires an additional
license. Note: The three processing types can be run at the same time.
Notes:
• Recognizes machine print, including dot matrix, OCR-A, and OCR-B fonts
• Uses up to three engines for optimal accuracy
• Provides support for a wide variety of output formats, including TIFF, PDF, PDF/A
• Can carry out full page form recognition, ICR, OCR, multi-line and omni-font character
recognition, and barcode recognition
• Over 100 languages are available
• Engine requires a license
• Supports 2 recognition types:
o Character recognition
o Barcode recognition
Barcode support
EMC Captiva General-Use OCR/ICR engine also provides barcode recognition and supports the
following barcode types:
Codabar, Codabar with start-stop char transmit, Code 128, Code 128 with check digit transmit,
Code 39, Code 39 full ASCII mode, Code 39 with check digit control and transmit, Code 39 with
start-stop char transmit, EAN8/13, EAN/UPC with 2 and 5 digit supplement, ITF (2 of 5 interleaved),
ITF with check digit control and transmit, Postnet code, UCC Code 128, UPC-A, UPC-E (6-digit)
Language recognition support
Afrikaans, Albanian, Aymara, Basque, Bemba, Blackfoot, Brazilian-Portuguese, Breton, Bugotu,
Bulgarian (Cyrillic), Byelorussian (Cyrillic), Catalan, Chamorro, Chechen, Chinese (Simplified),
Chinese (Traditional), Corsican, Croatian, Crow, Czech, Danish, Dutch, English, Eskimo (Inuit),
Esperanto, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Gaelic (Irish), Gaelic
(Scottish), Galician, Ganda, German, Greek, Guarani, Hani, Hungarian, Icelandic, Ido, Indonesian,
Interlingua, Italian, Japanese, Kabardian, Hawaiian, Kasub, Kawa, Kikuyu, Kongo, Korean,
Kpelle, Kurdish, Latin, Latvian, Lithuanian, Luba, Lule Sami, Luxembourgian, Macedonian (Cyrillic),
Malagasy, Malay, Malinke, Maltese, Maori, Mayan, Miao, Minankabaw, Mohawk, Moldavian (Cyrillic),
Nahuatl, Northern Sami, Norwegian, Nyanja, Occidental, Ojibway, Papiamento, Pidgin English, Polish,
Portuguese, Provencal, Quechua, Rhaetic, Romanian, Romany, Ruanda, Rundi, Russian (Cyrillic),
Sami (Lappish), Samoan, Sardinian, Serbian (Cyrillic), Serbian (Latin alphabet), Shona, Sioux,
Slovakian, Slovenian, Somali, Sorbian (Wend), Sotho, Southern Sami, Spanish, Sundanese, Swahili,
Swazi, Swedish, Tagalog, Tahitian, Tinpo, Tongan, Tswana (Chuana), Tun, Turkish, Ukrainian
(Cyrillic), Visayan, Welsh, Wolof, Xhosa, Zapotec, Zulu
Note: Twenty languages are installed by default, and up to 123 languages are available, including
English, Simplified Chinese, Japanese, and Korean.
CAPTIVA CAPTURE
EAST EURO/APAC OCR
EMC Captiva Capture East Euro/APAC OCR module performs optical character recognition of scanned
or imported images and exports the image and index data to more than 25 different word
processing and text formats. It recognizes several languages with a specialty towards Eastern
European and Asia Pacific languages. The EMC Captiva Capture East Euro/APAC OCR module is an
add-on to the Captiva Capture Standard and Enterprise Capture Servers.
Note: The East Euro/APAC OCR engine can be used for zonal extraction without Advanced
Recognition, but this is not recommended. In this mode, the data returned from the OCR engine
is not UIM data but data stored in IA Values. Custom code must therefore be written to convert
the IA Values to UIM data so that the data can be presented in Captiva Desktop. Furthermore,
the zonal setup is performed in the module itself and not through Captiva Designer. It is
therefore strongly recommended to use the East Euro/APAC OCR engine with Advanced Recognition
(formerly known as Dispatcher) instead.
Processing type: Full-text, Zonal (not recommended; see the note above)
Recognition type: Machine printed
Licensing: Unlimited recognition per client
Notes:
• Uses a single recognition engine. Configure and define different zones for different content
types, including text, pictures, tables, barcodes, and check marks.
• Recognizes multiple recognition languages simultaneously
• Supports many popular output formats, including PDF
• Supports more than one hundred languages
• Not recommended for zonal recognition, but appropriate for full-text recognition of
machine-printed text
• Recognizes many types of barcodes
• Can process documents corresponding to one or more languages at a time
Barcode support
EMC Captiva Capture East Euro/APAC OCR module also performs barcode recognition and supports
the following barcode types:
Codabar, Code 128, Code 39, Code 93, EAN 13, EAN 8, IATA 25, Industrial 25, Interleaved 25,
Matrix 25, PostNet, UCC 128, UPC-E
Language recognition support
Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altaic, Armenian (Eastern), Armenian (Grabar),
Armenian (Western), Avar, Aymara, Azerbaijani (Cyrillic), Azerbaijani (Latin), Bashkir, Basque,
Belarussian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Buryat, Catalan, Chamorro, Chechen,
Chinese (PRC), Chinese (Taiwan), Chukcha, Chuvash, Corsican, Crimean Tatar, Croatian, Crow,
Czech, Danish, Dargwa, Dungan, Dutch (Netherlands), Dutch (Belgium), English, Eskimo (Cyrillic),
Eskimo (Latin), Esperanto, Estonian, Even, Evenki, Faroese, Fijian, Finnish, French, Frisian, Friulian,
Gaelic Scottish, Gagauz, Galician, Ganda, German, German (new spelling), German (Luxembourg),
Greek, Guarani, Hani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush,
Interlingua, Irish, Italian, Japanese, Kabardian, Kalmyk, Karachay-Balkar, Karakalpak, Kasub, Kawa,
Kazakh, Khakas, Khanty, Kikuyu, Kirghiz, Kongo, Korean, Korean (Hangul), Koryak, Kpelle, Kumyk,
Kurdish, Lak, Lappish, Latin, Latvian, Lezgin, Lithuanian, Luba, Macedonian, Malagasy, Malay,
Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minankabaw, Mohawk, Mongol, Mordvin, Nahuatl,
Nenets, Nivkh, Nogay, Norwegian, Norwegian (Bokmal), Norwegian (Nynorsk), Nyanja, Occidental,
Ojibway, Old English, Old French, Old German, Old Italian, Old Spanish, Ossetian, Papiamento, Pidgin
English, Polish, Portuguese (Brazil), Portuguese (Portugal), Provencal, Quechua, Rhaeto-Romanic,
Romanian, Romanian (Moldavia), Romany, Ruanda, Rundi, Russian (Old Spelling), Russian,
Samoan, Selkup, Serbian (Cyrillic), Serbian (Latin), Shona, Sioux, Slovak, Slovenian, Somali,
Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabassaran, Tagalog, Tahitian, Tajik,
Tatar, Thai, Tinpo, Tongan, Tswana, Tun, Turkish, Turkmen, Tuvan, Udmurt, Uighur (Cyrillic),
Uighur (Latin), Ukrainian, Uzbek (Cyrillic), Uzbek (Latin), Visayan, Welsh, Wolof, Xhosa, Yakut,
Yiddish, Zapotec, Zulu
Several formal languages are also supported: Basic, C++, Cobol, Fortran, Java, Pascal, chemical
formulas, E13B, CMC7.
CAPTIVA CAPTURE
BARCODE RECOGNITION
EMC Captiva Capture Barcode Recognition carries out barcode recognition. The Barcode Recognition
settings support two sets of barcode parameters:
• 1D barcode parameters
• 2D barcode parameters
Processing type: Zonal
Recognition type: 1D barcodes, 2D barcodes
Licensing: Delivered as standard
Notes:
• Can detect several barcodes of different types in a document
• Configuration files can be customized to detect specific 1D or 2D barcodes
• A resolution of 300 DPI is recommended for smaller-sized barcodes; anything lower could
render the barcode unreadable by the engine
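The resolution recommendation can be made concrete with a short calculation. The sketch below is illustrative only: the 10 mil narrow-bar width is an assumed example value, and the guide does not state a minimum pixel width for the engine.

```python
def narrow_bar_pixels(x_dimension_mils: float, dpi: int) -> float:
    """Width in pixels of the narrowest bar of a barcode.

    x_dimension_mils is the narrow-bar width in mils (1 mil = 1/1000 inch).
    The value used below is an assumption chosen for illustration.
    """
    return x_dimension_mils * dpi / 1000


# Compare how many pixels a 10 mil narrow bar occupies at common scan resolutions.
for dpi in (150, 200, 300):
    print(f"{dpi} DPI: {narrow_bar_pixels(10, dpi)} pixels per narrow bar")
```

At lower resolutions each narrow bar is represented by fewer pixels, which leaves less margin for noise, skew, or binarization, and this effect is strongest for small barcodes.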
Barcode Recognition support
EMC Captiva Capture Barcode Recognition supports the recognition of the following barcode types:
1D barcodes:
Add 2, Add 5, Airline 2 of 5 (IATA 2 of 5), Australian Post, BCD Matrix, Codabar, Code 2 of 5
(Industry 2 of 5), Code 32, Code 39, Code 39 Extended, Code 93, Code 93 Extended, Code 128,
UCC/EAN 128, DataLogic 2 of 5, EAN 8, EAN 13, Intelligent Mail, Interleaved 2 of 5, Invert 2 of 5,
Matrix 2 of 5, Patch Code, Royal Post, UPC-A, UPC-E, and PostNet.
2D barcodes:
PDF-417, QR, Data Matrix
EMC CAPTIVA ADVANCED RECOGNITION (FORMERLY KNOWN AS DISPATCHER)
EMC Captiva Advanced Recognition supports two processing types: zonal recognition and full-text
recognition. Zonal recognition is used to capture data from structured forms where field location is
the same from one image to the next. Full-text recognition is primarily used for semi-structured or
unstructured documents where text does not reside in a static location from one image to the next.
EMC Captiva Advanced Recognition supports several recognition types, including machine-printed
text, hand-printed text, check marks, courtesy amount recognition (CAR), legal amount recognition
(LAR), and MICR/CMC7.
The following information provides a breakdown on all the recognition options for EMC Captiva
Advanced Recognition and indicates which engines are included.
CAPTIVA ADVANCED RECOGNITION
GENERAL-USE OCR
The General-Use OCR is the standard engine included with EMC Captiva Advanced Recognition.
With the inclusion of Advanced Zonal OCR/ICR and its superior handprint recognition into EMC
Captiva Advanced Recognition, the current Standard Handprint/General-Use ICR engine will be
deprecated in the near future. Existing projects that use Standard Handprint/General-Use ICR
should be migrated to use the handprint engine included with Advanced Zonal ICR as soon as
possible.
Processing type: Full-text, Zonal
Recognition type: Machine printed, 1D barcodes, ICR
Licensing: Machine printed and 1D barcodes are delivered as standard. ICR requires an additional
license. Note: The three processing types can be run at the same time.
Notes:
• Recognizes machine print, including dot matrix, OCR-A, and OCR-B fonts
• Uses up to three engines for optimal accuracy
• Provides support for a wide variety of output formats, including TIFF, PDF, PDF/A
• Can carry out full page form recognition, ICR, OCR, multi-line and omni-font character
recognition, and barcode recognition
• Over 100 languages are available
• Engine requires a license
• Supports 2 recognition types:
o Character recognition
o Barcode recognition
Barcode support
EMC Captiva Advanced Recognition General-Use OCR/ICR engine also provides barcode recognition
and supports the following barcode types:
Codabar, Codabar with start-stop char transmit, Code 128, Code 128 with check digit transmit,
Code 39, Code 39 full ASCII mode, Code 39 with check digit control and transmit, Code 39 with
start-stop char transmit, EAN8/13, EAN/UPC with 2 and 5 digit supplement, ITF (2 of 5 interleaved),
ITF with check digit control and transmit, Postnet code, UCC Code 128, UPC-A, UPC-E (6-digit)
Language recognition support
Afrikaans, Albanian, Aymara, Basque, Bemba, Blackfoot, Brazilian-Portuguese, Breton, Bugotu,
Bulgarian (Cyrillic), Byelorussian (Cyrillic), Catalan, Chamorro, Chechen, Chinese (Simplified),
Chinese (Traditional), Corsican, Croatian, Crow, Czech, Danish, Dutch, English, Eskimo (Inuit),
Esperanto, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Gaelic (Irish), Gaelic
(Scottish), Galician, Ganda, German, Greek, Guarani, Hani, Hungarian, Icelandic, Ido, Indonesian,
Interlingua, Italian, Japanese, Kabardian, Hawaiian, Kasub, Kawa, Kikuyu, Kongo, Korean,
Kpelle, Kurdish, Latin, Latvian, Lithuanian, Luba, Lule Sami, Luxembourgian, Macedonian (Cyrillic),
Malagasy, Malay, Malinke, Maltese, Maori, Mayan, Miao, Minankabaw, Mohawk, Moldavian (Cyrillic),
Nahuatl, Northern Sami, Norwegian, Nyanja, Occidental, Ojibway, Papiamento, Pidgin English, Polish,
Portuguese, Provencal, Quechua, Rhaetic, Romanian, Romany, Ruanda, Rundi, Russian (Cyrillic),
Sami (Lappish), Samoan, Sardinian, Serbian (Cyrillic), Serbian (Latin alphabet), Shona, Sioux,
Slovakian, Slovenian, Somali, Sorbian (Wend), Sotho, Southern Sami, Spanish, Sundanese, Swahili,
Swazi, Swedish, Tagalog, Tahitian, Tinpo, Tongan, Tswana (Chuana), Tun, Turkish, Ukrainian
(Cyrillic), Visayan, Welsh, Wolof, Xhosa, Zapotec, Zulu
Note: Twenty languages are installed by default, and up to 123 languages are available, including
English, Simplified Chinese, Japanese, and Korean.
CAPTIVA ADVANCED RECOGNITION
WESTERN OCR
The Western OCR performs full-text and zonal capture of machine-printed text and is included in
EMC Captiva Advanced Recognition.
Processing type: Full-text, Zonal
Recognition type: Machine printed
Licensing: Delivered as standard
Notes: Specialty is Western languages
Language recognition support
English, German, Danish, Spanish, Finnish, French, Dutch, Italian, Norwegian, Portuguese, and
Swedish.
Note: The Western OCR engine cannot read Asian characters, but can be used on Asian operating
systems to read Western characters.
CAPTIVA ADVANCED RECOGNITION
EAST EURO/APAC OCR FOR CAPTURE
EMC Captiva Advanced Recognition East Euro/APAC OCR for Capture module performs optical
character recognition of scanned or imported images and recognizes several languages with a
specialty towards Eastern European and Asia Pacific languages. The East Euro/APAC OCR for
Capture is an add-on to EMC Captiva Advanced Recognition.
Processing type: Full-text, Zonal
Recognition type: Machine printed
Licensing: Unlimited recognition per client
Notes:
• Uses a single recognition engine. Configure and define different zones for different content
types, including text, pictures, tables, barcodes, and check marks.
• Recognizes multiple recognition languages simultaneously
• Supports many popular output formats, including PDF
• Supports more than one hundred languages
• Appropriate for zonal and full-text recognition of machine-printed text
• Recognizes many types of barcodes
• Can process documents corresponding to one or more languages at a time
Barcode support
EMC Captiva Advanced Recognition East Euro/APAC OCR module also performs barcode recognition
and supports the following barcode types:
Codabar, Code 128, Code 39, Code 93, EAN 13, EAN 8, IATA 25, Industrial 25, Interleaved 25,
Matrix 25, PostNet, UCC 128, UPC-E
Language recognition support
Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altaic, Armenian (Eastern), Armenian (Grabar),
Armenian (Western), Avar, Aymara, Azerbaijani (Cyrillic), Azerbaijani (Latin), Bashkir, Basque,
Belarussian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Buryat, Catalan, Chamorro, Chechen,
Chinese (PRC), Chinese (Taiwan), Chukcha, Chuvash, Corsican, Crimean Tatar, Croatian, Crow,
Czech, Danish, Dargwa, Dungan, Dutch (Netherlands), Dutch (Belgium), English, Eskimo (Cyrillic),
Eskimo (Latin), Esperanto, Estonian, Even, Evenki, Faroese, Fijian, Finnish, French, Frisian, Friulian,
Gaelic Scottish, Gagauz, Galician, Ganda, German, German (new spelling), German (Luxembourg),
Greek, Guarani, Hani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush,
Interlingua, Irish, Italian, Japanese, Kabardian, Kalmyk, Karachay-Balkar, Karakalpak, Kasub, Kawa,
Kazakh, Khakas, Khanty, Kikuyu, Kirghiz, Kongo, Korean, Korean (Hangul), Koryak, Kpelle, Kumyk,
Kurdish, Lak, Lappish, Latin, Latvian, Lezgin, Lithuanian, Luba, Macedonian, Malagasy, Malay,
Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minankabaw, Mohawk, Mongol, Mordvin, Nahuatl,
Nenets, Nivkh, Nogay, Norwegian, Norwegian (Bokmal), Norwegian (Nynorsk), Nyanja, Occidental,
Ojibway, Old English, Old French, Old German, Old Italian, Old Spanish, Ossetian, Papiamento, Pidgin
English, Polish, Portuguese (Brazil), Portuguese (Portugal), Provencal, Quechua, Rhaeto-Romanic,
Romanian, Romanian (Moldavia), Romany, Ruanda, Rundi, Russian (Old Spelling), Russian,
Samoan, Selkup, Serbian (Cyrillic), Serbian (Latin), Shona, Sioux, Slovak, Slovenian, Somali,
Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabassaran, Tagalog, Tahitian, Tajik,
Tatar, Thai, Tinpo, Tongan, Tswana, Tun, Turkish, Turkmen, Tuvan, Udmurt, Uighur (Cyrillic),
Uighur (Latin), Ukrainian, Uzbek (Cyrillic), Uzbek (Latin), Visayan, Welsh, Wolof, Xhosa, Yakut,
Yiddish, Zapotec, Zulu
Several formal languages are also supported: Basic, C++, Cobol, Fortran, Java, Pascal, chemical
formulas, E13B, CMC7.
CAPTIVA ADVANCED RECOGNITION
ADVANCED ZONAL OCR/ICR AND FULL PAGE OCR
EMC Captiva Advanced Recognition Advanced Zonal OCR/ICR and Full Page OCR is an engine that
provides full-text and zonal capture of machine-printed text and zonal capture of hand-printed text.
Beginning with Captiva 7.0, Advanced Zonal OCR/ICR and Full Page OCR is included as part of EMC
Captiva Advanced Recognition.
Recognition name: Advanced Zonal OCR/ICR
Processing type: Full-text, zonal
Recognition type: Machine-print, ICR
Licensing: Delivered as standard
Notes: Preferred choice for applications that require a very high degree of accuracy and that
include both machine and hand print recognition requirements

Recognition name: Advanced Full Page OCR
Processing type: Full-page
Recognition type: Machine-print
Licensing: Delivered as standard
Notes: Preferred choice for applications that require a very high degree of accuracy, and that
include both machine and hand print recognition requirements
Language recognition support
Australia, Austria, Azerbaijan, Belgium, Brazil, Bulgaria, Canada, Central America, Central Europe,
Croatia, Czech Republic, Denmark, Estonia, Faroese, Finland, France, Germany, Great Britain,
Greece, Hungary, International (for OCR and MICR classifiers only), Ireland, Italy, Liechtenstein,
Lithuania, Luxembourg, Malaysia, Netherlands, New Zealand, Norway, Poland, Portugal, Romania,
Russia, Rwanda, Scandinavia, Slovakia, Slovenia, Somali, South Africa, South America, Spain,
Sweden, Switzerland, Thailand, Turkey, United States, and Western Europe
CAPTIVA ADVANCED RECOGNITION
BARCODE RECOGNITION
EMC Captiva Advanced Recognition Barcode Recognition carries out barcode recognition. The
Barcode Recognition settings support two sets of barcode parameters:
• 1D barcode parameters
• 2D barcode parameters
Processing type: Zonal
Recognition type: 1D barcodes, 2D barcodes
Licensing: Delivered as standard
Notes:
• Can detect several barcodes of different types in a document
• Configuration files can be customized to detect specific 1D or 2D barcodes
• A resolution of 300 DPI is recommended for smaller-sized barcodes; anything lower could
render the barcode unreadable by the engine
Barcode Recognition support
EMC Captiva Advanced Recognition supports the recognition of the following barcode types:
1D barcodes:
Add 2, Add 5, Airline 2 of 5 (IATA 2 of 5), Australian Post, BCD Matrix, Codabar, Code 2 of 5
(Industry 2 of 5), Code 32, Code 39, Code 39 Extended, Code 93, Code 93 Extended, Code 128,
UCC/EAN 128, DataLogic 2 of 5, EAN 8, EAN 13, Intelligent Mail, Interleaved 2 of 5, Invert 2 of 5,
Matrix 2 of 5, Patch Code, Royal Post, UPC-A, UPC-E, and PostNet.
2D barcodes:
PDF-417, QR, Data Matrix
CAPTIVA ADVANCED RECOGNITION
CHECK READING
The EMC Captiva Advanced Recognition Check Reading engine is used for automatic reading of
business and personal checks, deposit slips, and cash-in and cash-out documents. It can read
hand-printed, handwritten, and machine-printed documents and performs entire check recognition
efficiently. It has been successfully used for check and remittance processing.
Processing type: Zonal
Recognition type: Handwritten, Machine printed, MICR/CMC7, CAR/LAR
Licensing: License with two processing types that can be activated separately, with the
possibility of running both at the same time:
• Check Reading US
• Check Reading France
Notes:
• Requires a license in order to work in development and production environments
• Needs to be registered using the Check Reading license management system
THIRD PARTY RECOGNITION ENGINE SUPPORT
PRIME RECOGNITION FOR EMC CAPTIVA CAPTURE
Prime Recognition is a high-accuracy, high-reliability optical character recognition (OCR) module
that works with other EMC Captiva Capture modules as part of a complete document capture
system. The OCR module is an add-on to the Captiva Capture Standard and Enterprise Capture
Servers and is available through EMC Select.
Processing type: Full-text, zonal
Recognition type: Machine printed, Mark sense
Licensing: Requires optional license
Notes:
• Includes voting between different engine results
• Provides a wide variety of output formats, including text and/or image PDF, PDF-A, JBIG,
XML UTF8/16
• Recognizes a variety of page layouts and sizes
• Recognizes multiple languages simultaneously
• Supports lexical checks
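The idea of voting between engine results can be sketched as a simple majority vote over the readings that several engines return for the same field. This is an illustration of the general technique only; the guide does not describe how Prime Recognition or any Captiva engine actually combines results, and the function name and sample readings are hypothetical.

```python
from collections import Counter


def vote(readings):
    """Return the reading that a strict majority of engines agree on.

    Returns None when there are no readings or when no reading has a
    strict majority, so that the field can be routed for manual review.
    """
    if not readings:
        return None
    winner, count = Counter(readings).most_common(1)[0]
    return winner if count * 2 > len(readings) else None


# Three hypothetical engines read the same field; one confuses 0 with O.
print(vote(["12O45", "12045", "12045"]))

# Two engines disagree, so there is no majority.
print(vote(["12O45", "12045"]))
```

With three engines, a single misread is outvoted by the other two, which is the intuition behind 2-way and 3-way voting.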
Language recognition support
The following languages are recognized by the Prime Recognition engine:
Danish, Dutch, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish, U.K. English,
U.S. English, Japanese, Korean, Traditional and Simplified Chinese, and Russian.
CAPTIVA RECOGNITION CAPABILITIES
The following highlights the features supported by various EMC Captiva Capture and Advanced
Recognition engines:
General-Use
OCR
Western OCR
Advanced
OCR/ICR
East
Euro/APAC
OCR
Prime
Recognition
Feature
Captiva
Capture and
Advanced
Advanced
Capture and
Product
Advanced
Recognition
Recognition
Advanced
integration
Recognition
Recognition
Machine
Machine
Machine
Machine
Machine
type
printed
printed
printed
printed
printed
Recognition
1D barcodes
Specialty
Dot matrix,
font types
customizable
Capture
Mark sense
None
Farrington 7B,
Dot matrix,
OCR A/B
typewriter,
Dot matrix
OCR A/B,
MICR E13B,
MICR CMC7
Character
Alpha,
Alpha,
Alphanumeric,
Alphanumeric,
Alpha,
type settings
numeric,
numeric,
amount,
numeric,
numeric,
alphanumeric,
alphanumeric,
numeric
upper/lower
alphanumeric,
upper/lower
amount,
case,
upper/lower
case
upper/lower
superscript,
case,
case,
subscript,
subscript,
customized
italics, special
superscript
characters
Feature | General-Use OCR | Western OCR | Advanced OCR/ICR | East Euro/APAC OCR | Prime Recognition
Character height/pitch settings | No | Yes | Yes | No | No
Customize character set | Yes | Yes | No | No | No
Image Enhance tools | No | Yes | No | No | Yes
Accurate/balanced/fast setting | Yes | No | No | Yes | Yes
Preconfigured OCR engine settings for specific applications and countries | No | No | Yes | No | No
Multi-engine voting | Yes | No | Yes | No | Yes
Dictionary support | Yes | No | No | Yes | Yes
Logical context checking | No | No | Yes | No | Yes
Detect multiline fields | No | No | No | Yes | No
Multi-core capable | No | No | No | Yes (2) | No
GENERAL GUIDELINES
The following information provides more detail on several of the key engines in Captiva Capture and
Advanced Recognition. To determine the best engine for a given application, you should always test
with your set of documents.
Captiva Capture

General-Use Recognition
Key strengths:
• Good, all-purpose machine-print engine
• 2-way and 3-way voting
• Available in many languages
General use case:
• General use capture applications for machine print
• Ideal for applications that require full-text recognition and/or capturing a few index fields

Prime Recognition
Key strengths:
• Very accurate with machine-print
• Voting between engines
• Dynamic speed/accuracy balance, based on image quality
General use case:
• Preferred choice for applications with high accuracy and speed requirements, and high
variability of image quality, fonts, etc. between documents

East Euro/APAC OCR
Key strengths:
• Available in many languages, with focus on recognition of Eastern European and Asia Pacific
languages
General use case:
• Preferred choice for Eastern European and Asia Pacific languages

Captiva Advanced Recognition

East Euro/APAC OCR
Key strengths:
• Available in many languages, with focus on recognition of Eastern European and Asia Pacific
languages
General use case:
• Preferred choice for Eastern European and Asia Pacific languages

Western OCR
Key strengths:
• Good, all-purpose machine-print engine
• Very accurate on numeric fields
General use case:
• Preferred choice for free-form data capture applications in Western languages

General-Use OCR/ICR
Key strengths:
• Good, all-purpose machine-print engine
• Available in many languages
General use case:
• General use capture applications, whether machine or hand print

Advanced Zonal OCR/ICR and Advanced Full Page OCR
Key strengths:
• Very accurate with machine-print and hand-print
• Available in multiple processing speeds
General use case:
• Preferred choice for applications that require a very high degree of accuracy, and that
include both machine- and hand-print recognition requirements
ADDITIONAL INFORMATION
• In addition to the wide array of recognition options in Capture and Advanced Recognition,
Captiva also provides a software development kit that allows third-party recognition engines
to be incorporated into the EMC Captiva Capture product.
• EMC Captiva Capture governs the capture process and allows recognition data to be passed
between the various Capture and Advanced Recognition modules.
• Licensing for the add-on engines is per client connection. To determine the number of
connections required, determine the number of computers that will be performing keyword-based
document classification, data extraction, and rubber band OCR. That number is the number of
client connections you will need for the add-on OCR engines.
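The licensing rule above can be worked through as a small count. The sketch below assumes that a computer performing more than one of the listed tasks still needs only one client connection; that reading, along with the computer names and task labels, is an assumption made for illustration and should be confirmed against the actual license terms.

```python
# Tasks that, per the licensing note above, require an add-on OCR engine connection.
ADD_ON_TASKS = {"keyword classification", "data extraction", "rubber band OCR"}


def client_connections(computers):
    """Count computers that perform at least one add-on OCR task.

    computers maps a computer name to the set of tasks it performs.
    Assumption: each qualifying computer counts once, however many tasks it runs.
    """
    return sum(1 for tasks in computers.values() if tasks & ADD_ON_TASKS)


# Hypothetical deployment.
deployment = {
    "scan-01": {"scanning"},
    "index-01": {"data extraction", "rubber band OCR"},
    "class-01": {"keyword classification"},
}

print(client_connections(deployment))
```

In this hypothetical deployment, the scanning station does not use the add-on engine, so only the indexing and classification computers are counted.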
CONTACT US
To learn more about how EMC
products, services, and solutions can
help solve your business and IT
challenges, contact your local
representative or authorized reseller—
or visit us at www.EMC.com.
EMC2, EMC, the EMC logo, are registered trademarks or trademarks of EMC Corporation in the
United States and other countries. VMware are registered trademarks or trademarks of VMware,
Inc., in the United States and other jurisdictions. All other trademarks used herein are the
property of their respective owners. © Copyright 2014 EMC Corporation. All rights reserved.
Published in the USA. 06/14 EMC Product Description Guide H4755.6
EMC believes the information in this document is accurate as of its publication date. The
information is subject to change without notice.