EMC CAPTIVA RECOGNITION
EMC PRODUCT DESCRIPTION GUIDE

INTRODUCTION
This document provides an overview of the recognition engines that support both EMC® Captiva® Capture and EMC Captiva Advanced Recognition. It outlines what each recognition engine provides and notes which engines are included and which are licensed separately. It also offers tips on which engine is recommended for certain use cases.

Table of Contents
EMC CAPTIVA CAPTURE (FORMERLY KNOWN AS INPUTACCEL)
  GENERAL-USE OCR
  EAST EURO/APAC OCR
  BARCODE RECOGNITION
EMC CAPTIVA ADVANCED RECOGNITION (FORMERLY KNOWN AS DISPATCHER)
  GENERAL-USE OCR
  WESTERN OCR
  EAST EURO/APAC OCR FOR CAPTURE
  ADVANCED ZONAL OCR/ICR AND FULL PAGE OCR
  BARCODE RECOGNITION
  CHECK READING
THIRD PARTY RECOGNITION ENGINE SUPPORT
  PRIME RECOGNITION FOR EMC CAPTIVA CAPTURE
CAPTIVA RECOGNITION CAPABILITIES
GENERAL GUIDELINES
ADDITIONAL INFORMATION

EMC CAPTIVA CAPTURE (FORMERLY KNOWN AS INPUTACCEL)
EMC Captiva Capture supports two processing types: zonal recognition and full-text recognition. Zonal recognition is used to capture data from structured forms where the field location is the same from one image to the next. Full-text recognition is used to capture all text on the page for full-text search and archiving purposes.

CAPTIVA CAPTURE GENERAL-USE OCR
EMC Captiva General-Use OCR is the standard engine included with the Captiva Capture Standard and Enterprise Capture Servers. With the inclusion of Advanced Zonal OCR/ICR and its superior handprint recognition in EMC Captiva Advanced Recognition, the current Standard Handprint/General-Use ICR engine will be deprecated in the near future. Existing projects that use Standard Handprint/General-Use ICR should be migrated to the handprint engine included with Advanced Zonal ICR as soon as possible.
Processing type: Full-text, zonal
Recognition type:
• Machine printed, including dot matrix, OCR-A, and OCR-B fonts
• ICR
• 1D barcodes
Licensing:
• Machine printed: delivered as standard
• 1D barcodes: delivered as standard
• ICR: requires an additional license
Note: The three recognition types can be run at the same time.
Notes:
• Uses up to three engines for optimal accuracy
• Recognizes machine print and omni-font characters
• Provides support for a wide variety of output formats, including TIFF, PDF, and PDF/A
• Can carry out full-page form recognition, ICR, OCR, multi-line, and barcode processing
• Over 100 languages are available
• Engine requires a license
• Supports two recognition types:
  o Character recognition
  o Barcode recognition

Barcode support
The EMC Captiva General-Use OCR/ICR engine also provides barcode recognition and supports the following barcode types: Codabar, Codabar with start-stop char transmit, Code 128, Code 128 with check digit transmit, Code 39, Code 39 full ASCII mode, Code 39 with check digit control and transmit, Code 39 with start-stop char transmit, EAN8/13, EAN/UPC with 2- and 5-digit supplement, ITF (2 of 5 interleaved), ITF with check digit control and transmit, Postnet code, UCC Code 128, UPC-A, UPC-E (6-digit).

Language recognition support
Afrikaans, Albanian, Aymara, Basque, Bemba, Blackfoot, Brazilian Portuguese, Breton, Bugotu, Bulgarian (Cyrillic), Byelorussian (Cyrillic), Catalan, Chamorro, Chechen, Chinese (Simplified), Chinese (Traditional), Corsican, Croatian, Crow, Czech, Danish, Dutch, English, Eskimo (Inuit), Esperanto, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Gaelic (Irish), Gaelic (Scottish), Galician, Ganda, German, Greek, Guarani, Hani, Hawaiian, Hungarian, Icelandic, Ido, Indonesian, Interlingua, Italian, Japanese, Kabardian, Kasub, Kawa, Kikuyu, Kongo, Korean, Kpelle, Kurdish, Latin, Latvian, Lithuanian, Luba, Lule Sami, Luxembourgian, Macedonian (Cyrillic), Malagasy, Malay, Malinke, Maltese, Maori, Mayan, Miao, Minankabaw, Mohawk, Moldavian (Cyrillic), Nahuatl, Northern Sami, Norwegian, Nyanja, Occidental, Ojibway, Papiamento, Pidgin English, Polish, Portuguese, Provencal, Quechua, Rhaetic, Romanian, Romany, Ruanda, Rundi, Russian (Cyrillic), Sami (Lappish), Samoan, Sardinian, Serbian (Cyrillic), Serbian (Latin alphabet), Shona, Sioux, Slovakian, Slovenian, Somali, Sorbian (Wend), Sotho, Southern Sami, Spanish, Sundanese, Swahili, Swazi, Swedish, Tagalog, Tahitian, Tinpo, Tongan, Tswana (Chuana), Tun, Turkish, Ukrainian (Cyrillic), Visayan, Welsh, Wolof, Xhosa, Zapotec, Zulu

Note: Twenty languages are installed by default, and up to 123 languages are available, including English, Simplified Chinese, Japanese, and Korean.

CAPTIVA CAPTURE EAST EURO/APAC OCR
The EMC Captiva Capture East Euro/APAC OCR module performs optical character recognition of scanned or imported images and exports the image and index data to more than 25 different word processing and text formats. It recognizes many languages, with a specialty in Eastern European and Asia Pacific languages. The module is an add-on to the Captiva Capture Standard and Enterprise Capture Servers.

Note: The East Euro/APAC OCR engine can be used for zonal extraction without Advanced Recognition, but this is strongly discouraged. In this mode, the data returned from the OCR engine is not UIM data but data stored in IA Values. To present the data in Captiva Desktop, custom code must be written to copy the IA Values into UIM data. Furthermore, zonal setup is done in the module itself rather than in Captiva Designer. Therefore, it is strongly recommended to use the East Euro/APAC OCR engine with Advanced Recognition (formerly known as Dispatcher) instead.
Processing type: Full-text; zonal (not recommended, see the note above)
Recognition type: Machine printed
Licensing: Unlimited recognition per client
Notes:
• Uses a single recognition engine
• Lets you configure and define different zones for different content types, including text, pictures, tables, barcodes, and check marks
• Recognizes multiple languages simultaneously
• Supports many popular output formats, including PDF
• Supports more than one hundred languages
• Not recommended for zonal recognition, but appropriate for full-text recognition of machine-printed text
• Recognizes many types of barcodes
• Can process documents in one or more languages at a time

Barcode support
The EMC Captiva Capture East Euro/APAC OCR module also performs barcode recognition and supports the following barcode types: Codabar, Code 128, Code 39, Code 93, EAN 13, EAN 8, IATA 25, Industrial 25, Interleaved 25, Matrix 25, PostNet, UCC 128, UPC-E.

Language recognition support
Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altaic, Armenian (Eastern), Armenian (Grabar), Armenian (Western), Avar, Aymara, Azerbaijani (Cyrillic), Azerbaijani (Latin), Bashkir, Basque, Belarusian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Buryat, Catalan, Chamorro, Chechen, Chinese (PRC), Chinese (Taiwan), Chukcha, Chuvash, Corsican, Crimean Tatar, Croatian, Crow, Czech, Danish, Dargwa, Dungan, Dutch (Netherlands), Dutch (Belgium), English, Eskimo (Cyrillic), Eskimo (Latin), Esperanto, Estonian, Even, Evenki, Faroese, Fijian, Finnish, French, Frisian, Friulian, Gaelic (Scottish), Gagauz, Galician, Ganda, German, German (new spelling), German (Luxembourg), Greek, Guarani, Hani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush, Interlingua, Irish, Italian, Japanese, Kabardian, Kalmyk, Karachay-Balkar, Karakalpak, Kasub, Kawa, Kazakh, Khakas, Khanty, Kikuyu, Kirghiz, Kongo, Korean, Korean (Hangul), Koryak, Kpelle, Kumyk, Kurdish, Lak, Lappish, Latin, Latvian, Lezgin, Lithuanian, Luba, Macedonian, Malagasy, Malay, Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minankabaw, Mohawk, Mongol, Mordvin, Nahuatl, Nenets, Nivkh, Nogay, Norwegian, Norwegian (Bokmal), Norwegian (Nynorsk), Nyanja, Occidental, Ojibway, Old English, Old French, Old German, Old Italian, Old Spanish, Ossetian, Papiamento, Pidgin English, Polish, Portuguese (Brazil), Portuguese (Portugal), Provencal, Quechua, Rhaeto-Romanic, Romanian, Romanian (Moldavia), Romany, Ruanda, Rundi, Russian (Old Spelling), Russian, Samoan, Selkup, Serbian (Cyrillic), Serbian (Latin), Shona, Sioux, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabassaran, Tagalog, Tahitian, Tajik, Tatar, Thai, Tinpo, Tongan, Tswana, Tun, Turkish, Turkmen, Tuvan, Udmurt, Uighur (Cyrillic), Uighur (Latin), Ukrainian, Uzbek (Cyrillic), Uzbek (Latin), Visayan, Welsh, Wolof, Xhosa, Yakut, Yiddish, Zapotec, Zulu

It also supports several formal languages: Basic, C++, COBOL, Fortran, Java, Pascal, chemical formulas, E13B, and CMC7.

CAPTIVA CAPTURE BARCODE RECOGNITION
EMC Captiva Capture Barcode Recognition performs barcode recognition.
Barcode Recognition settings cover two barcode types:
• 1D barcode parameters
• 2D barcode parameters

Processing type: Zonal
Recognition type: 1D barcodes, 2D barcodes
Licensing: Delivered as standard
Notes:
• Can detect several barcodes of different types in a document
• Configuration files can be customized to detect specific 1D or 2D barcodes
• A resolution of 300 DPI is recommended for smaller-sized barcodes; anything lower could render the barcode unreadable by the engine

Barcode Recognition support
EMC Captiva Capture Barcode Recognition supports the recognition of the following barcode types:
1D barcodes: Add 2, Add 5, Airline 2 of 5 (IATA 2 of 5), Australian Post, BCD Matrix, Codabar, Code 2 of 5 (Industry 2 of 5), Code 32, Code 39, Code 39 Extended, Code 93, Code 93 Extended, Code 128, UCC/EAN 128, DataLogic 2 of 5, EAN 8, EAN 13, Intelligent Mail, Interleaved 2 of 5, Invert 2 of 5, Matrix 2 of 5, Patch Code, Royal Post, UPC-A, UPC-E, and PostNet.
2D barcodes: PDF-417, QR, Data Matrix

EMC CAPTIVA ADVANCED RECOGNITION (FORMERLY KNOWN AS DISPATCHER)
EMC Captiva Advanced Recognition supports two processing types: zonal recognition and full-text recognition. Zonal recognition is used to capture data from structured forms where the field location is the same from one image to the next. Full-text recognition is primarily used for semi-structured or unstructured documents where text does not reside in a static location from one image to the next.

EMC Captiva Advanced Recognition supports several recognition types, including machine-printed text, hand-printed text, check marks, courtesy amount recognition (CAR), legal amount recognition (LAR), and MICR/CMC7. The following information breaks down all the recognition options for EMC Captiva Advanced Recognition and indicates which engines are included.

CAPTIVA ADVANCED RECOGNITION GENERAL-USE OCR
The General-Use OCR is the standard engine included with EMC Captiva Advanced Recognition.
With the inclusion of Advanced Zonal OCR/ICR and its superior handprint recognition in EMC Captiva Advanced Recognition, the current Standard Handprint/General-Use ICR engine will be deprecated in the near future. Existing projects that use Standard Handprint/General-Use ICR should be migrated to the handprint engine included with Advanced Zonal ICR as soon as possible.

Processing type: Full-text, zonal
Recognition type:
• Machine printed, including dot matrix, OCR-A, and OCR-B fonts
• ICR
• 1D barcodes
Licensing:
• Machine printed: delivered as standard
• 1D barcodes: delivered as standard
• ICR: requires an additional license
Note: The three recognition types can be run at the same time.
Notes:
• Uses up to three engines for optimal accuracy
• Recognizes machine print and omni-font characters
• Provides support for a wide variety of output formats, including TIFF, PDF, and PDF/A
• Can carry out full-page form recognition, ICR, OCR, multi-line, and barcode processing
• Over 100 languages are available
• Engine requires a license
• Supports two recognition types:
  o Character recognition
  o Barcode recognition

Barcode support
The EMC Captiva Advanced Recognition General-Use OCR/ICR engine also provides barcode recognition and supports the following barcode types: Codabar, Codabar with start-stop char transmit, Code 128, Code 128 with check digit transmit, Code 39, Code 39 full ASCII mode, Code 39 with check digit control and transmit, Code 39 with start-stop char transmit, EAN8/13, EAN/UPC with 2- and 5-digit supplement, ITF (2 of 5 interleaved), ITF with check digit control and transmit, Postnet code, UCC Code 128, UPC-A, UPC-E (6-digit).

Language recognition support
Afrikaans, Albanian, Aymara, Basque, Bemba, Blackfoot, Brazilian Portuguese, Breton, Bugotu, Bulgarian (Cyrillic), Byelorussian (Cyrillic), Catalan, Chamorro, Chechen, Chinese (Simplified), Chinese (Traditional), Corsican, Croatian, Crow, Czech, Danish, Dutch, English, Eskimo (Inuit), Esperanto, Estonian, Faroese, Fijian, Finnish, French, Frisian, Friulian, Gaelic (Irish), Gaelic (Scottish), Galician, Ganda, German, Greek, Guarani, Hani, Hawaiian, Hungarian, Icelandic, Ido, Indonesian, Interlingua, Italian, Japanese, Kabardian, Kasub, Kawa, Kikuyu, Kongo, Korean, Kpelle, Kurdish, Latin, Latvian, Lithuanian, Luba, Lule Sami, Luxembourgian, Macedonian (Cyrillic), Malagasy, Malay, Malinke, Maltese, Maori, Mayan, Miao, Minankabaw, Mohawk, Moldavian (Cyrillic), Nahuatl, Northern Sami, Norwegian, Nyanja, Occidental, Ojibway, Papiamento, Pidgin English, Polish, Portuguese, Provencal, Quechua, Rhaetic, Romanian, Romany, Ruanda, Rundi, Russian (Cyrillic), Sami (Lappish), Samoan, Sardinian, Serbian (Cyrillic), Serbian (Latin alphabet), Shona, Sioux, Slovakian, Slovenian, Somali, Sorbian (Wend), Sotho, Southern Sami, Spanish, Sundanese, Swahili, Swazi, Swedish, Tagalog, Tahitian, Tinpo, Tongan, Tswana (Chuana), Tun, Turkish, Ukrainian (Cyrillic), Visayan, Welsh, Wolof, Xhosa, Zapotec, Zulu

Note: Twenty languages are installed by default, and up to 123 languages are available, including English, Simplified Chinese, Japanese, and Korean.

CAPTIVA ADVANCED RECOGNITION WESTERN OCR
The Western OCR engine performs full-text and zonal capture of machine-printed text and is included in EMC Captiva Advanced Recognition.

Processing type: Full-text, zonal
Recognition type: Machine printed
Licensing: Delivered as standard
Notes: Specialty is Western languages

Language recognition support
English, German, Danish, Spanish, Finnish, French, Dutch, Italian, Norwegian, Portuguese, and Swedish.

Note: The Western OCR engine cannot read Asian characters, but it can be used on Asian operating systems to read Western characters.

CAPTIVA ADVANCED RECOGNITION EAST EURO/APAC OCR FOR CAPTURE
The EMC Captiva Advanced Recognition East Euro/APAC OCR for Capture module performs optical character recognition of scanned or imported images and recognizes many languages, with a specialty in Eastern European and Asia Pacific languages.
The East Euro/APAC OCR for Capture is an add-on to EMC Captiva Advanced Recognition.

Processing type: Full-text, zonal
Recognition type: Machine printed
Licensing: Unlimited recognition per client
Notes:
• Uses a single recognition engine
• Lets you configure and define different zones for different content types, including text, pictures, tables, barcodes, and check marks
• Recognizes multiple languages simultaneously
• Supports many popular output formats, including PDF
• Supports more than one hundred languages
• Appropriate for zonal and full-text recognition of machine-printed text
• Recognizes many types of barcodes
• Can process documents in one or more languages at a time

Barcode support
The EMC Captiva Advanced Recognition East Euro/APAC OCR module also performs barcode recognition and supports the following barcode types: Codabar, Code 128, Code 39, Code 93, EAN 13, EAN 8, IATA 25, Industrial 25, Interleaved 25, Matrix 25, PostNet, UCC 128, UPC-E.

Language recognition support
Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altaic, Armenian (Eastern), Armenian (Grabar), Armenian (Western), Avar, Aymara, Azerbaijani (Cyrillic), Azerbaijani (Latin), Bashkir, Basque, Belarusian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Buryat, Catalan, Chamorro, Chechen, Chinese (PRC), Chinese (Taiwan), Chukcha, Chuvash, Corsican, Crimean Tatar, Croatian, Crow, Czech, Danish, Dargwa, Dungan, Dutch (Netherlands), Dutch (Belgium), English, Eskimo (Cyrillic), Eskimo (Latin), Esperanto, Estonian, Even, Evenki, Faroese, Fijian, Finnish, French, Frisian, Friulian, Gaelic (Scottish), Gagauz, Galician, Ganda, German, German (new spelling), German (Luxembourg), Greek, Guarani, Hani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush, Interlingua, Irish, Italian, Japanese, Kabardian, Kalmyk, Karachay-Balkar, Karakalpak, Kasub, Kawa, Kazakh, Khakas, Khanty, Kikuyu, Kirghiz, Kongo, Korean, Korean (Hangul), Koryak, Kpelle, Kumyk, Kurdish, Lak, Lappish, Latin, Latvian, Lezgin, Lithuanian, Luba, Macedonian, Malagasy, Malay, Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minankabaw, Mohawk, Mongol, Mordvin, Nahuatl, Nenets, Nivkh, Nogay, Norwegian, Norwegian (Bokmal), Norwegian (Nynorsk), Nyanja, Occidental, Ojibway, Old English, Old French, Old German, Old Italian, Old Spanish, Ossetian, Papiamento, Pidgin English, Polish, Portuguese (Brazil), Portuguese (Portugal), Provencal, Quechua, Rhaeto-Romanic, Romanian, Romanian (Moldavia), Romany, Ruanda, Rundi, Russian (Old Spelling), Russian, Samoan, Selkup, Serbian (Cyrillic), Serbian (Latin), Shona, Sioux, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabassaran, Tagalog, Tahitian, Tajik, Tatar, Thai, Tinpo, Tongan, Tswana, Tun, Turkish, Turkmen, Tuvan, Udmurt, Uighur (Cyrillic), Uighur (Latin), Ukrainian, Uzbek (Cyrillic), Uzbek (Latin), Visayan, Welsh, Wolof, Xhosa, Yakut, Yiddish, Zapotec, Zulu

It also supports several formal languages: Basic, C++, COBOL, Fortran, Java, Pascal, chemical formulas, E13B, and CMC7.

CAPTIVA ADVANCED RECOGNITION ADVANCED ZONAL OCR/ICR AND FULL PAGE OCR
EMC Captiva Advanced Recognition Advanced Zonal OCR/ICR and Full Page OCR provides full-text and zonal capture of machine-printed text and zonal capture of hand-printed text. Beginning with Captiva 7.0, Advanced Zonal OCR/ICR and Full Page OCR is included as part of EMC Captiva Advanced Recognition.
Recognition name: Advanced Zonal OCR/ICR
Processing type: Full-text, zonal
Recognition type: Machine print, ICR
Licensing: Delivered as standard
Notes: Preferred choice for applications that require a very high degree of accuracy and that include both machine-print and hand-print recognition requirements

Recognition name: Advanced Full Page OCR
Processing type: Full-page
Recognition type: Machine print
Licensing: Delivered as standard
Notes: Preferred choice for applications that require a very high degree of accuracy and that include both machine-print and hand-print recognition requirements

Language recognition support
Australia, Austria, Azerbaijan, Belgium, Brazil, Bulgaria, Canada, Central America, Central Europe, Croatia, Czech Republic, Denmark, Estonia, Faroese, Finland, France, Germany, Great Britain, Greece, Hungary, International (for OCR and MICR classifiers only), Ireland, Italy, Liechtenstein, Lithuania, Luxembourg, Malaysia, Netherlands, New Zealand, Norway, Poland, Portugal, Romania, Russia, Rwanda, Scandinavia, Slovakia, Slovenia, Somali, South Africa, South America, Spain, Sweden, Switzerland, Thailand, Turkey, United States, and Western Europe

CAPTIVA ADVANCED RECOGNITION BARCODE RECOGNITION
EMC Captiva Advanced Recognition Barcode Recognition performs barcode recognition.
Barcode Recognition settings cover two barcode types:
• 1D barcode parameters
• 2D barcode parameters

Processing type: Zonal
Recognition type: 1D barcodes, 2D barcodes
Licensing: Delivered as standard
Notes:
• Can detect several barcodes of different types in a document
• Configuration files can be customized to detect specific 1D or 2D barcodes
• A resolution of 300 DPI is recommended for smaller-sized barcodes; anything lower could render the barcode unreadable by the engine

Barcode Recognition support
EMC Captiva Advanced Recognition supports the recognition of the following barcode types:
1D barcodes: Add 2, Add 5, Airline 2 of 5 (IATA 2 of 5), Australian Post, BCD Matrix, Codabar, Code 2 of 5 (Industry 2 of 5), Code 32, Code 39, Code 39 Extended, Code 93, Code 93 Extended, Code 128, UCC/EAN 128, DataLogic 2 of 5, EAN 8, EAN 13, Intelligent Mail, Interleaved 2 of 5, Invert 2 of 5, Matrix 2 of 5, Patch Code, Royal Post, UPC-A, UPC-E, and PostNet.
2D barcodes: PDF-417, QR, Data Matrix

CAPTIVA ADVANCED RECOGNITION CHECK READING
The EMC Captiva Advanced Recognition Check Reading engine automatically reads business and personal checks, deposit slips, and cash-in and cash-out documents. It can read hand-printed, handwritten, and machine-printed documents and performs full check recognition efficiently. It has been used successfully for check and remittance processing.
Processing type: Zonal
Recognition type: Handwritten, machine printed, MICR/CMC7, CAR/LAR
Licensing: License with two processing types that can be activated separately, with the possibility of running both at the same time:
• Check Reading US
• Check Reading France
Notes:
• Requires a license in order to work in development and production environments
• Needs to be registered using the Check Reading license management system

THIRD PARTY RECOGNITION ENGINE SUPPORT

PRIME RECOGNITION FOR EMC CAPTIVA CAPTURE
Prime Recognition is a high-accuracy, high-reliability optical character recognition (OCR) module that works with other EMC Captiva Capture modules as part of a complete document capture system. The OCR module is an add-on to the Captiva Capture Standard and Enterprise Capture Servers and is available through EMC Select.

Processing type: Full-text, zonal
Recognition type: Machine printed, mark sense
Licensing: Requires an optional license
Notes:
• Includes voting between different engine results
• Provides a wide variety of output formats, including text and/or image PDF, PDF/A, JBIG, and XML (UTF-8/UTF-16)
• Recognizes a variety of page layouts and sizes
• Recognizes multiple languages simultaneously
• Supports lexical checks

Language recognition support
The Prime Recognition engine recognizes the following languages: Danish, Dutch, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish, U.K. English, U.S. English, Japanese, Korean, Traditional and Simplified Chinese, and Russian.
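Several engines above combine results from more than one recognizer: General-Use OCR "uses up to three engines for optimal accuracy," and Prime Recognition "includes voting between different engine results." This guide does not describe how either product votes, so the sketch below shows only a generic character-level majority vote. It assumes each engine returns a string of the same length for the same zonal field, with position i in every string referring to the same printed character; real voting also has to align results of different lengths and weigh per-character confidence. The function name vote_field is hypothetical and is not part of any Captiva API.

```python
from collections import Counter
from typing import Sequence


def vote_field(candidates: Sequence[str]) -> str:
    """Combine one zonal field as read by several OCR engines (hypothetical helper).

    Assumption: every candidate has the same length and characters are already
    aligned position by position. Ties go to the engine listed first.
    """
    if not candidates:
        raise ValueError("need at least one engine result")
    length = len(candidates[0])
    if any(len(c) != length for c in candidates):
        raise ValueError("this sketch only votes over equal-length results")

    result = []
    # zip(*candidates) yields one tuple per character position,
    # holding that position's character from each engine in order.
    for chars in zip(*candidates):
        counts = Counter(chars)
        best = max(counts.values())
        # Among the characters with the highest vote count, keep the one
        # from the earliest engine in the list.
        result.append(next(ch for ch in chars if counts[ch] == best))
    return "".join(result)


if __name__ == "__main__":
    # Three engines disagree on single characters ("O" vs "0", "I" vs "1").
    print(vote_field(["INV-1O23", "INV-1023", "1NV-1023"]))  # INV-1023
```

With three engines, any single-engine misread is outvoted, which is the intuition behind 2-way and 3-way voting; with only two engines, a disagreement becomes a tie, so a real system needs a tie-break rule such as engine priority (used here) or per-character confidence.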
CAPTIVA RECOGNITION CAPABILITIES
The following highlights the features supported by various EMC Captiva Capture and Advanced Recognition engines: General-Use OCR, Western OCR, Advanced OCR/ICR, East Euro/APAC OCR, and Prime Recognition.

Product integration
• General-Use OCR: Captiva Capture and Advanced Recognition
• Western OCR: Advanced Recognition
• Advanced OCR/ICR: Advanced Recognition
• East Euro/APAC OCR: Capture and Advanced Recognition
• Prime Recognition: Capture

Recognition type
• All five engines: machine printed
• General-Use OCR also recognizes 1D barcodes, and Prime Recognition also recognizes mark sense

Specialty font types (support varies by engine)
• Dot matrix, typewriter, OCR A/B, Farrington 7B, MICR E13B, MICR CMC7, and customizable fonts

Character type settings (options vary by engine)
• Alpha, numeric, alphanumeric, amount, upper/lower case, superscript, subscript, italics, and customized special characters

Additional features (engines that support each feature)
• Character height/pitch settings: Western OCR, Advanced OCR/ICR
• Customize character set: General-Use OCR, Western OCR
• Image enhancement tools: Western OCR, Prime Recognition
• Accurate/balanced/fast setting: General-Use OCR, East Euro/APAC OCR, Prime Recognition
• Preconfigured OCR engine settings for specific applications and countries: Advanced OCR/ICR
• Multi-engine voting: General-Use OCR, Advanced OCR/ICR, Prime Recognition
• Dictionary support: General-Use OCR, East Euro/APAC OCR, Prime Recognition
• Logical context checking: Advanced OCR/ICR, Prime Recognition
• Detect multiline fields: East Euro/APAC OCR
• Multi-core capable: East Euro/APAC OCR (2)

GENERAL GUIDELINES
The following information provides more detail on several of the key engines in Captiva Capture and Advanced Recognition. To determine the best engine for a given application, always test with your own set of documents.
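Testing engines on your own documents is easier to act on when each engine gets a single comparable score. One common metric, used here as an assumption rather than anything Captiva reports, is character accuracy: 1 minus the Levenshtein edit distance between an engine's output and a hand-verified value, divided by the length of the verified value, summed over a test set. The sketch assumes you have already exported each engine's field values next to verified ground truth; the export step and the names edit_distance, character_accuracy, and rank_engines are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """pairs: (ground_truth, engine_output) for each field in the test set."""
    total_chars = sum(len(truth) for truth, _ in pairs)
    if total_chars == 0:
        raise ValueError("ground truth is empty")
    errors = sum(edit_distance(truth, out) for truth, out in pairs)
    # Clamp at 0: heavy insertions can push errors above total_chars.
    return max(0.0, 1.0 - errors / total_chars)


def rank_engines(results: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, float]]:
    """Score every engine on the same test set and sort best first."""
    scores = [(name, character_accuracy(pairs)) for name, pairs in results.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical exports: the same two fields read by two engines.
    results = {
        "engine_a": [("INV-1023", "INV-1023"), ("12.50", "12.50")],
        "engine_b": [("INV-1023", "INV-1O23"), ("12.50", "12.5")],
    }
    for name, score in rank_engines(results):
        print(f"{name}: {score:.3f}")  # engine_a: 1.000, then engine_b: 0.846
```

Scoring every engine against the same verified set keeps the comparison fair; in practice you would also track field-level accuracy and throughput, since several engines above trade speed against accuracy.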
Key strengths and general use cases

Captiva Capture

General-Use Recognition
Key strengths:
• Good, all-purpose machine-print engine
• 2-way and 3-way voting
• Available in many languages
General use case:
• General use capture applications for machine print and/or capturing a few index fields

East Euro/APAC OCR
Key strengths:
• Available in many languages, with a focus on Eastern European and Asia Pacific languages
General use case:
• Ideal for applications that require full-text recognition of documents
• Preferred choice for recognition of Eastern European and Asia Pacific languages

Prime Recognition
Key strengths:
• Very accurate with machine print
• Voting between engines
• Dynamic speed/accuracy balance, based on image quality
General use case:
• Preferred choice for machine-print applications with high accuracy and speed requirements and high variability of image quality, fonts, etc.

Captiva Advanced Recognition

East Euro/APAC OCR
Key strengths:
• Available in many languages, with a focus on Eastern European and Asia Pacific languages
General use case:
• Preferred choice for recognition of Eastern European and Asia Pacific languages

Western OCR
Key strengths:
• Good, all-purpose machine-print engine
• Very accurate on numeric fields
General use case:
• Preferred choice for free-form data capture applications in Western languages

General-Use OCR/ICR
Key strengths:
• Good, all-purpose machine-print engine
• Available in many languages
General use case:
• General use capture applications, whether machine or hand print

Advanced Zonal OCR/ICR and Advanced Full Page OCR
Key strengths:
• Very accurate with machine print and hand print
• Available in multiple processing speeds
General use case:
• Preferred choice for applications that require a very high degree of accuracy and that include both machine-print and hand-print recognition requirements

ADDITIONAL INFORMATION
• In addition to the wide array of recognition options in Capture and Advanced Recognition, Captiva also provides a software development kit that allows third-party recognition engines to be incorporated into the EMC Captiva Capture product.
• EMC Captiva Capture governs the capture process and allows recognition data to be passed between the various Capture and Advanced Recognition modules.
• Licensing for the add-on engines is per client connection. To determine the number of connections required, count the computers that will perform keyword-based document classification, data extraction, and rubber band OCR. That count is the number of client connections you need for the add-on OCR engines.

CONTACT US
To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller, or visit us at www.EMC.com.

EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. VMware is a registered trademark or trademark of VMware, Inc., in the United States and other jurisdictions. All other trademarks used herein are the property of their respective owners. © Copyright 2014 EMC Corporation. All rights reserved. Published in the USA. 06/14 EMC Product Description Guide H4755.6

EMC believes the information in this document is accurate as of its publication date. The information is subject to change without notice.