AddressDoctor Enterprise Documentation

Informatica AddressDoctor
Product Documentation
Version 5.6.0
AddressDoctor GmbH
Roentgenstr. 9
67133 Maxdorf
Germany
+49 (6237) 9774 0
USA
208 S WILMINGTON ST STE 200
Raleigh NC 27601-1434
United States
+1 (866) 402 2800
UK FreeCall
0800-0328-276
France Numéro Vert
0800 917113
India FreeCall
000800 1003486
Singapore FreeCall
800 1301756
[email protected]
www.AddressDoctor.com
Released: November 5, 2014
Foreword
This documentation explains features and functions of Informatica AddressDoctor, previously known
as the AddressDoctor Software Library, for postal address validation. You have selected a leading
data quality product that provides you superb address quality for postal addresses from all around
the world.
This documentation is meant to cater to the information needs of beginners and advanced users
alike. It covers all platforms supported by Informatica AddressDoctor and all available interfaces. The
Introduction chapter (chapter 2) provides a general overview of the Informatica AddressDoctor
components and concepts and looks at them from a business perspective. Chapter 3 describes
the installation process, and chapter 4 helps you get started right away, whether you are new to
Informatica AddressDoctor or migrating from the previous version. The Concepts chapter (chapter 5)
helps you understand important features of Informatica AddressDoctor. We recommend this
chapter for all user groups. Advanced users will find chapter 6 “How do I…” helpful. It provides
sample code for common tasks.
This PDF document provides embedded bookmarks for fast access to document chapters. Open the
bookmark view of your PDF viewer if it is not open by default. Additionally, all chapter
number references in the text provide hyperlink access to the reference targets.
Always check the release notes section included with the API documentation (see chapter 10.2). The
Informatica AddressDoctor documentation is a work in progress, and we always
appreciate user feedback to improve this document. Email your suggestions and comments to
[email protected].
Informatica AddressDoctor Documentation – Last Revision: 5-Nov-14 @ 12:26
Contents
1. Document Conventions
2. Introduction
2.1 Functional Overview
2.2 Supported Platforms
2.3 System Requirements
3. Setup
3.1 General Remarks
3.2 Installing the Library Files
3.3 Installing the Reference Databases
4. Quick Start Guide
4.1 First-Time Use of Informatica AddressDoctor 5
4.2 New Features and Enhancements in Informatica AddressDoctor
5. Concepts
5.1 Character Set Mapping
5.2 Transliteration
5.3 Address Element Abstraction
5.4 Address Parsing
5.5 Address Validation
5.6 Informatica AddressDoctor
5.7 AddressObjects
5.8 Input and Output Encoding
5.9 AddressElement Items and AddressLines
5.10 Address Item Types
5.11 Process Modes
5.12 Process Parameters
5.13 Output Formatting
5.14 Output Standardization
5.15 Alternative Names and Aliases
5.16 AliasStreet Option Examples
5.17 Process Status Values
5.18 Mailability Scores
5.19 Geocoding Status Values
5.20 CAMEO Status Values
5.21 CASS Status Values
5.22 SERP Status Values
5.23 SNA Status Values
5.24 AMAS Status Values
5.25 SendRight Status Values
5.26 Country Specific Enrichment
5.27 Element Status and Relevance Values
5.28 Extended Element Result Status Fields
5.29 ResultPercentage Values
5.30 Language ISO Code Output
5.31 Address Types
5.32 Return Codes
5.33 OptimizationLevel
5.34 Preloading
5.35 Caching
5.36 Multithreading
5.37 Memory Management
6. How do I…
6.1 …initialize Informatica AddressDoctor?
6.2 …determine Informatica AddressDoctor version?
6.3 …specify processing or input parameters and a result format?
6.4 …handle unlock codes?
6.5 …configure reference databases?
6.6 …determine the current engine settings?
6.7 …assign an address to the AddressObject?
6.8 …validate an address?
6.9 …parse an address?
6.10 …check the process mode?
6.11 …retrieve a suggested correction?
6.12 …retrieve the result status and additional information?
6.13 …retrieve address enrichments?
6.14 …analyze error conditions?
6.15 …assign and process addresses in non-Latin script?
6.16 …use Informatica AddressDoctor with multiple processor cores?
6.17 …produce valid Informatica AddressDoctor XML?
6.18 …use Informatica AddressDoctor XML for flexible Business Processes?
6.19 …use Informatica AddressDoctor for Master Data Management?
6.20 …use Informatica AddressDoctor in an eBusiness Environment?
6.21 …use the Quick Address Entry Feature?
6.22 …use Informatica AddressDoctor in a multi-tenant hosted environment?
6.23 …use Informatica AddressDoctor for Web Services?
6.24 …validate an address in CERTIFIED mode?
6.25 …optimize performance?
7. Demonstration Applications
7.1 ConsoleDemo Application
7.2 AddressCheck (Windows only)
8. Sample Address Data for Testing
8.1 Addresses with Status Code Vx
8.2 Addresses with Status Code Cx
9. Miscellaneous Topics
9.1 Background on the (Postal) Reference Database
9.2 Postal Certifications
9.3 Support Information
9.4 Recommended Database Layout for International Addresses
10. Appendix
10.1 API Document Type Definitions
10.2 API Reference
10.3 Schematic Representation of Informatica AddressDoctor Processing Flow
10.4 AddressElement Output Examples
10.5 Province Output
10.6 Reference Data Copyright Notices
11. Glossary
1. Document Conventions
This document uses icons in the margin to indicate if an explanation is specific to a certain version of
the interface. While the functionality of the API is the same for all interfaces, different syntax may be
required.
Applies to the C interface (C wrapper) on all platforms
Applies to the Java interface (Java wrapper) on all platforms
Sample program code may be identified by its fixed space typeface, for example:
For i = 1 to 5
j = i + 1
Next
In some cases, this document abbreviates the Informatica AddressDoctor product name to
AddressDoctor. For example, the document may use AddressDoctor as a sample address element
name.
2. Introduction
Informatica AddressDoctor provides a powerful software library with functions to enhance and
ensure postal address data quality. With a world population of more than 6.5 billion people and
increasingly global trade relationships, more and more people face the challenge of handling
addresses from all around the world.
At the same time, taking care of customer relationships is more important than ever - especially in
today’s rushed world economy. We receive letters generated by computers, talk to computers on
the telephone, and we check-out in supermarkets by ourselves. Data, once in a computer system, is
often considered to be correct. In many cases, it serves as the foundation for numerous business
processes. Rarely ever is the data in the system questioned. This can lead to dangerous situations, as
we could all see in the movie “The Net” where a young woman loses her identity because of deleted
data.
The following example illustrates the situation:
[Figure: the data input by hotel staff (image not reproduced here) compared with the correct address needed for delivery, shown below]
Sven Schreiber
Feuerbergstr. 1
67134 Birkenheide
Germany
The Informatica AddressDoctor product line was introduced in 1994 by Platon Data Technology
GmbH (now AddressDoctor GmbH), a German software company that has become the innovation
leader in data quality tools for postal addresses. From the very beginning, Informatica
AddressDoctor has specialized in international addresses.
Here is an overview of the Informatica AddressDoctor product line:
• Online Applications
• Web Services
• Software Library
• Reference Data
Reference data and the software library are the foundation for all Informatica AddressDoctor
product offerings.
With its global launch, Informatica AddressDoctor has set new standards in flexibility, ease of
use, and processing power. Covering more than 240 countries and territories, the software
encompasses knowledge about postal addresses from virtually anywhere.
The global approach of Informatica AddressDoctor is a direct and major benefit to its customers:
• Cost savings in:
  o Contract Management
  o Integration
  o Deployment
  o Licenses
  o Support and Maintenance
  o …
• Cross-country synergy effects compared to a multi-vendor approach
[Figure: price comparison between many different regional vendors (Vendor 1 for Country 1,
Vendor 2 for Country 2, Vendor 3 for Country 3, …, Vendor n for Country n) and the
“One Vendor, One World” approach of AddressDoctor]
2.1 Functional Overview
Informatica AddressDoctor features several stages of address processing, namely Transliteration,
Parsing, Validation and Formatting, which interact with each other.
2.1.1 Character Set Mapping and Transliteration
Informatica AddressDoctor incorporates functionality to handle international strings and their
complexities. It uses fully Unicode-enabled string processing, which enables the transliteration of
non-Roman characters into the Latin character set and mapping between different character sets.
• Storing data in and mapping between over 30 different character sets, including UTF-8, ISO 8859-1, GBK, BIG5, JIS, and EBCDIC
• Proper “elimination” of diacritics according to language rules
• Transliteration for various alphabets into Latin script:
  o Greek (BGN/PCGN 1962, ISO 843 – 1997)
  o Cyrillic (BGN/PCGN 1947, ISO 9 – 1995)
  o Hebrew
  o Japanese Katakana, Hiragana and Kanji
  o Chinese Pinyin (Mandarin, Cantonese)
  o Korean Hangul
For example:
Data input is in a foreign alphabet:
ΑΘΗΝΑΣ 63
105 52 ΑΘΗΝΑ
GREECE
Transliterated data is in Latin script:
ATHINAS 63
105 52 ATHINA
GREECE
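The difference between generic diacritic removal and the language-aware "elimination" described in section 2.1.1 can be illustrated with a minimal Java sketch. It uses only the standard java.text.Normalizer class, not the AddressDoctor API, and the names DiacriticsSketch and stripDiacritics are hypothetical. The sketch strips combining marks without regard to language, so it turns "Röntgenstr." into "Rontgenstr.", whereas German language rules yield "Roentgenstr." as in the address on the title page. Greek letters pass through unchanged, because removing diacritics is not transliteration.

```java
import java.text.Normalizer;

// Hypothetical illustration only; this is not how AddressDoctor is implemented.
final class DiacriticsSketch {
    // Decompose accented characters (NFD) and drop the combining marks.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(stripDiacritics("Num\u00e9ro Vert"));    // Numero Vert
        System.out.println(stripDiacritics("R\u00f6ntgenstr. 9"));  // Rontgenstr. 9
    }
}
```

A language-aware implementation would need per-language replacement tables (for example ö to oe in German), which this sketch deliberately leaves out.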
2.1.2 Address Parsing, Formatting and Standardization
Restructuring incorrectly fielded address data is a complex and difficult task especially when done
for international addresses. People introduce many ambiguities as they enter address data into
computer systems. Among the problems are misplaced elements (such as company or personal
names in street address fields) or varying abbreviations that are not only language-specific but also
country-specific.
The Informatica AddressDoctor Parser component identifies address elements in totally unfielded or
partially fielded addresses and assigns them to the proper fields. This is an important precursor to
the actual validation. Without restructuring, “no match” situations might result.
Properly identified address elements are also important when addresses have to be truncated or
shortened to fit specific field length requirements. With the proper information in the right fields,
specific truncation rules can be applied.
• Parses and analyzes free form addresses and identifies individual address elements
• Detects countries (names, ISO codes, big cities, and so on)
• High processing speed
• Ideal pre-processing stage before validation
• Processes over 30 different character sets
• Formats addresses according to the postal rules of the country of destination
• Standardizes address elements (such as Avenue to Ave, Street to St, or vice versa)
• Identifies “address trash” elements such as telephone numbers and puts them into the proper
fields
For example:
Data is unstructured or in incorrect fields:
7031 Columbia Gateway Dr, Suite 101
Columbia MD 21046
USA
After parsing, all elements are stored in proper fields:
House number: 7031
Street: Columbia Gateway Dr
Sub-Building: Suite 101
City: Columbia
State: MD
ZIP: 21046
Country: USA
2.1.3 Global Address Validation
Address validation is the correction process where properly fielded address data is compared against
reference tables supplied by postal organizations or other data providers. Informatica AddressDoctor
has to deal with improperly truncated data, incomplete data, missing address elements, ambiguous
names and many other challenges.
The Informatica AddressDoctor validation is designed to provide the best possible matches while
minimizing incorrect modifications to address elements. In many cases, it is not possible to fully
validate an address. Here Informatica AddressDoctor has a unique deliverability assessment feature
that classifies addresses according to their probable deliverability.
The address validation feature of Informatica AddressDoctor has the following advantages:
• Leverages the world’s largest reference database of postal data.
• Validates individual address elements and checks for correctness using sophisticated fuzzy
matching technology.
• Provides a batch validation mode for bulk address validation. Batch validation checks
elements for correctness and changes them if necessary.
• Provides an interactive mode that enables you to check elements for correctness and
improve them if possible. In the interactive mode, Informatica AddressDoctor provides pick
lists of alternatives for ambiguous input data records.
• Provides a Fast Completion mode that automatically completes truncated or incomplete
address elements to facilitate fast data entry.
• Provides a single-line address validation option in the Fast Completion mode. The single-line
address validation option enables you to validate addresses that are entered as a single line
in the AddressComplete element.
• Produces standardized and formatted output based on postal standards and user
preferences.
• Uses an internal performance-optimized data storage mechanism for the reference data. No
third-party database software is required.
For example:
Incorrect input address:
7031 Golumbia Gateway Dr.
Suite 101
Columbia MD 21044
USA
Corrected output address:
7031 COLUMBIA GATEWAY DR
STE 101
COLUMBIA MD 21046
USA
2.1.4 CAMEO Socio-Demographic Encoding
In cooperation with the Callcredit Information Group, Informatica AddressDoctor provides address
enrichment with socio-demographic characteristics (CAMEO) through our product offerings. CAMEO
offers a highly detailed system through which you'll learn all you need to know about your
customers and markets, helping you make the best of every marketing opportunity. Whether you're
managing a customer database, searching for prospects, or conducting market analysis, CAMEO will
provide you with the latest socio-demographic and lifestyle data at Micro-cell level using a wide
range of data variables, including:
Child Presence and Age
Adult Age
Single Households
Retired Households
Movement
Property Age
Urbanicity
Land Values
Qualifications
Further Education
Interest in Fashion
Cars Bought New
Previous Owners
Interest in Family Goods
Interest in Books and CDs
Arts and Culture
Culinary Interests
Interest in Business
Interest in Finance
Interest in the Lottery
Computer Literacy
Concern with Self Image
Activity Levels
Interest in Home and Garden
Mail Order Responsiveness
Car Age and Type
Car Manufacturer Origin
Engine Power and Size
Prestige Car
Purchase of Luxury Goods
Use CAMEO to:
• Enhance and segment consumer databases
• Better understand your customers and responders
• Locate more prospects by finding look-a-likes
• Perform area and site location analysis
• Understand market potential
• Perform advanced statistical analysis and modelling
CAMEO is available for the following countries:
Australia, Austria, Belgium, Brazil, Canada, Czech Republic, Denmark, Estonia, Finland, France,
Germany, Hong Kong, Hungary, Indonesia, Ireland, Italy, Japan, Mexico, Netherlands, New Zealand,
Norway, Poland, Portugal, Romania, Russian Federation, Singapore, Slovakia, South Africa, Spain,
Sweden, Switzerland, the United Kingdom, and United States of America.
As with all enrichment options, Informatica AddressDoctor validates each address before adding the
CAMEO information. This improves the result by enabling socio-demographic information to be
displayed more accurately.
Accessing CAMEO
For CAMEO codes to be available for a country, a customer must first subscribe to the
address validation reference data for that country and have a valid unlock code. Informatica
AddressDoctor always performs address validation prior to any enrichment process such as
Geocoding and CAMEO encoding. Second, the customer must have a valid CAMEO unlock code (as
with Geocoding) and a subscription to the CAMEO database for the selected countries.
See your Informatica AddressDoctor Sales representative for CAMEO pricing and availability.
Output Details
The CAMEO enrichment offering provides multiple new output fields.
CAMEOStatus helps you troubleshoot issues with the enablement of CAMEO in your
environment. It tells you whether databases are missing, whether the address cannot be encoded, or
whether encoding was successful.
CATEGORY provides a code and description for the Age and Affluence of the address per country.
GROUP provides a code and description for the neighborhood of the address per country.
INTERNATIONAL provides a code and description of the Age and Affluence at an International level.
MVID is a match key that can be used to link your CAMEO encoded addresses to another Callcredit
Information Group product called CAMEO Analysis. CAMEO Analysis is a separate product offering
that can be licensed directly from Callcredit Information Group.
2.1.5 Informatica AddressDoctor in a Nutshell
Informatica AddressDoctor provides a single, unified library (a DLL on Windows or a .so/.sl shared library
on Unix systems) with both C function calls ("C API") and a Java API. The library
accesses *.MD database files, which contain the postal reference data. The software consists of a
single engine which, after initialization, processes input addresses contained in AddressObjects, the
data structure for storing an input address, parameter settings, and the processing result (for details
see the “Concepts” chapter 5 below).
2.2 Supported Platforms
Informatica AddressDoctor version 5 is developed using the C++ programming language. The
resulting API is available for C and Java, provided by a single combined software library. While the
Informatica AddressDoctor documentation provides only examples for the most common
implementation languages reflected by those two API flavors, they may be used to guide
implementation in any programming language, such as C++, C#, VB.Net, PHP, Perl, Ruby or Python.
Note that Informatica AddressDoctor can only provide support at an API level and does not provide
support for implementation-specific questions.
While the primary development platform is Windows and Microsoft Visual Studio 2005, the
Informatica AddressDoctor package is available for many hardware and software platforms,
including Windows, AIX, Solaris, Linux, and HP-UX platforms. Some of the packages are available on
request if you cover the full cost of porting, build, test, and support. Contact Informatica
AddressDoctor Support (see chapter 9.3) about the individual availability of certain platform package
versions.
The following table lists the supported platforms and system configurations.
Operating System | Processor Architecture | Java Development Kit
Windows XP Pro SP3, Windows Server 2008 SP2 | x86 (32-bit) | Sun SE 7
Windows XP Pro x64 Edition SP2, Windows XP Pro SP3, Windows Server 2008 R2, Windows Server 2008 SP2 | x64 (64-bit) | Sun SE 7
SUSE Linux Enterprise Server 10 and 11 | x86 (32-bit), x64 (64-bit) | Sun SE 7
RedHat Enterprise 5 and 6 | x86 (32-bit), x64 (64-bit) | Sun SE 7
RedHat Enterprise 5 and 6 | System z (64-bit) | IBM SE 7
AIX 5.3, 6, and 7 | POWER (64-bit) | IBM SE 7
Solaris 10 and 11 | Intel (64-bit), SPARC (64-bit) | Sun SE 7
HP-UX 11 | Intel Itanium (64-bit) | HP SE 5
2.3 System Requirements
Informatica AddressDoctor has been designed to achieve the best possible performance while being
highly efficient in its memory and resource usage. To ensure the best possible performance, a
fast I/O system and sufficient memory are recommended. At the time of writing, the entire worldwide
postal reference database requires around 15 to 20 GB of disk space. Additional disk space is needed
when United States certified databases are also used. As with most applications, the engine
performs better if more memory and a faster processor are installed. The minimum requirements are
512 MB of memory for validation operations and 128 MB of memory if only parsing is required.
To optimize performance, the most commonly used databases should reside in memory (see chapter
6.25 for details). We therefore recommend having at least 1 GB of RAM available; loading the full
reference database set into memory requires at least 16 GB. As this exceeds the maximum of 3 GB
of memory that a 32-bit operating system can typically address (see chapter 6.25), Informatica
AddressDoctor strongly recommends using 64-bit operating systems in production.
3. Setup
Informatica AddressDoctor has been designed to be independent of other software modules, thus
easing the setup process. All components have been linked in such a way that no external
dependencies on libraries or DLL files (on Windows) exist, apart from absolutely necessary core
system libraries like KERNEL32.DLL (on Windows) or libc.so (on Linux). In some cases, runtime files or
software development environments may be required for the demo applications.
3.1 General Remarks
Informatica AddressDoctor is provided as a download in separate ZIP files (packages). There are
packages for:
• Software libraries
• Documentation
• Postal reference data (see section 9.1)
3.1.1 Software packages
The file names of the packages for the software contain the release date in the format YYMMDD as
well as the release version in the format 5.0.x.yyyy. In this naming convention, x is the major build
version and yyyy is the minor build version with a length of three to four digits. In addition, the file
names include the platform (PPP) and the architecture used (32 or 64-Bit). File names may contain
compiler information also, in case different compiler versions are available for one platform.
The file names will thus be as follows (all examples are without compiler information):
AD5_PPP_32/64_YYMMDD_(5.0.x.yyyy).zip
C/Java Packages:
WIN: Windows
RHT_SUSE: Linux (Red Hat and Suse)
AIX: AIX
SOS: Solaris SPARC
HPU: HP-UX
As an example a file could be named AD5_SOS_32_090210_(5.0.11.384).zip, which contains
Informatica AddressDoctor with C and Java API for Solaris Sparc 32 bit. File names containing extra
compiler information would look like AD5_SOS_32_090410_(5.0.11.392)-sun.studio.11.zip. This
contains Informatica AddressDoctor with C and Java API for Solaris Sparc 32 bit, compiled using Sun
Studio 11.
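The naming convention for software packages can be checked mechanically. The following Java sketch is an illustration based only on the pattern and the two example file names given above. The class SoftwarePackageName and its fields are hypothetical helpers, not part of the AddressDoctor API, and the regular expression assumes that compiler information, when present, follows the closing parenthesis after a hyphen.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the AddressDoctor API.
final class SoftwarePackageName {
    // AD5_PPP_32/64_YYMMDD_(version).zip, optionally with "-compiler" before ".zip".
    private static final Pattern NAME = Pattern.compile(
        "AD5_([A-Z]+(?:_[A-Z]+)*)_(32|64)_(\\d{6})_\\((\\d+\\.\\d+\\.\\d+\\.\\d+)\\)(?:-(.+))?\\.zip");

    final String platform;    // for example WIN, RHT_SUSE, AIX, SOS, HPU
    final String bits;        // "32" or "64"
    final String releaseDate; // YYMMDD, kept as text
    final String version;     // for example 5.0.11.384
    final String compiler;    // null when the name carries no compiler information

    private SoftwarePackageName(Matcher m) {
        platform = m.group(1);
        bits = m.group(2);
        releaseDate = m.group(3);
        version = m.group(4);
        compiler = m.group(5);
    }

    static SoftwarePackageName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a software package name: " + fileName);
        }
        return new SoftwarePackageName(m);
    }

    public static void main(String[] args) {
        SoftwarePackageName p = parse("AD5_SOS_32_090410_(5.0.11.392)-sun.studio.11.zip");
        System.out.println(p.platform + " " + p.bits + "-bit, version " + p.version
            + ", compiler " + p.compiler);
    }
}
```

Such a check can help an installation script pick the package that matches the target platform and architecture before unpacking it.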
3.1.2 Documentation package
The documentation for all wrappers and platforms is contained in a ZIP file with a name following
this naming convention:
AD5_DOC_YYMMDD_(5.0.x.yyyy).zip
3.1.3 Postal reference data packages
The postal reference data is available in individual ZIP files for each supported country and territory.
These files are named as follows:
DB5_XXX5ZZ_YYMMDD.zip
Again, YYMMDD stands for the release date of the database. XXX is replaced by
the ISO-3 alpha code (ISO 3166) as found in the Informatica AddressDoctor country list online
(http://www.addressdoctor.com/en/countries-data/country-list.html), and ZZ denotes the type of
reference data: currently BI for Batch/Interactive, FC for Fast Completion, Cx for CERTIFIED, and GC for
Geocoding. For example, DB5_DZA5BI_091210.zip is the file name of an Algerian
Batch/Interactive database released on December 10, 2009.
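The reference data package name can be decoded in the same spirit. The Java sketch below splits a name such as DB5_DZA5BI_091210.zip into country code, data type, and release date. The class ReferenceDataPackageName is a hypothetical helper, not part of the AddressDoctor API, and reading the two-digit year as 20YY is an assumption of this sketch.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the AddressDoctor API.
final class ReferenceDataPackageName {
    // DB5_XXX5ZZ_YYMMDD.zip: XXX = ISO-3 country code, ZZ = data type.
    private static final Pattern NAME =
        Pattern.compile("DB5_([A-Z]{3})5([A-Za-z0-9]+)_(\\d{6})\\.zip");
    // Assumption: two-digit years are interpreted as 20YY.
    private static final DateTimeFormatter YYMMDD = DateTimeFormatter.ofPattern("yyMMdd");

    final String countryCode;
    final String dataType;
    final LocalDate releaseDate;

    private ReferenceDataPackageName(String countryCode, String dataType, LocalDate releaseDate) {
        this.countryCode = countryCode;
        this.dataType = dataType;
        this.releaseDate = releaseDate;
    }

    static ReferenceDataPackageName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a reference data package name: " + fileName);
        }
        return new ReferenceDataPackageName(
            m.group(1), m.group(2), LocalDate.parse(m.group(3), YYMMDD));
    }

    public static void main(String[] args) {
        ReferenceDataPackageName p = parse("DB5_DZA5BI_091210.zip");
        System.out.println(p.countryCode + " " + p.dataType + " " + p.releaseDate);
    }
}
```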
3.2 Installing the Library Files
3.2.1 C Installation
The C .lib file requires no special setup. Simply unpack the ZIP file preserving the directory structure.
The following directories (folders) will be created:
/bin
/etc
/include
/lib
/src
The bin directory contains executable sample applications like the ConsoleDemo (see chapter 7 for
details).
At this time, the sub-directories under etc contain XML configuration file examples that must be
copied to your working directory for adjusting the default behaviour of the ConsoleDemo
application. The include directory contains all required header files.
The .dll (.so/.sl on Unix) file is contained in the lib directory and must be copied to your shared library
path (check it with echo %PATH% on Windows or echo $LD_LIBRARY_PATH on Unix; see also chapter 7.1). The code of
the sample applications (see chapter 7.1) is located in the src directory and its sub-directories.
Take extra care to remove any prior versions of the Informatica AddressDoctor shared library from your
shared library path to avoid confusion and unnecessary support effort. If you use configuration XML
files, make sure that they are present in the directory your application references them from (for
example, in the case of the Informatica AddressDoctor ConsoleDemo, the working directory from which
the executable is called).
On many UNIX platforms, increasing the thread stack size to at least 1MB is required; for example,
using export AIXTHREAD_STK=1000000 on AIX or export PTHREAD_DEFAULT_STACK_SIZE=1000000 on HP-UX,
and so on. Furthermore, we recommend setting ulimit -s unlimited.
3.2.2 Java Installation
To use Informatica AddressDoctor, Version 5.5.0 and later, through the accompanying JAR archive,
you must have the Java Runtime Environment (JRE) version 7 set up on the device on which you
install Informatica AddressDoctor. Previous versions of Informatica AddressDoctor work with the JRE
version 6. If you want to develop your own applications, you must have the Java platform (JDK) SE 7
installed on the device. However, on HP-UX, you can continue to use HP SE 5 version of the JDK.
You can download the JRE package from the Sun Java website. Note that Informatica AddressDoctor
does not officially support Informatica AddressDoctor Version 5.5.0 and later installations that run
on JRE versions earlier than Version 7.
The Informatica AddressDoctor ZIP archive contains several files that should be extracted to a
directory on your computer preserving the stored folder names (for an explanation of the archive
structure see chapter 3.2.1 above). After extraction, files might need to be copied as follows:
Windows
Copy AddressDoctor5.jar and AddressDoctor5.dll to the classpath of your Java Runtime. For instance,
C:\Program Files\Java\jre\lib\ext typically resides in the system-wide classpath, although it is
recommended practice to explicitly set application-specific classpaths using the -cp switch, for
example after unpacking the ZIP archive to the present working directory (see also chapter 7.1):
java -Xss2048k -cp bin;lib/AddressDoctor5.jar -Djava.library.path=lib ConsoleDemoJava
Solaris and other Unix versions
Copy AddressDoctor5.jar and libAddressDoctor5.so (or libAddressDoctor5.sl in the case of the HP-UX
version of the Java wrapper) to the classpath of your Java Runtime. For instance, /usr/j2se/jre/lib/ext
typically resides in the system-wide classpath, although it is recommended practice to explicitly
set application-specific classpaths using the -cp switch, for example after unpacking the ZIP
archive to the present working directory (see also chapter 7.1):
java -Xss2048k -cp bin:lib/AddressDoctor5.jar -Djava.library.path=lib ConsoleDemoJava
Take extra care to remove any prior versions of Informatica AddressDoctor from your system-wide
classpath to avoid confusion and unnecessary support effort. If you use configuration XML
files, make sure that they are present in the directory your application references them from (for
example, in the case of the Informatica AddressDoctor ConsoleDemoJava, the working directory from
which the executable is called).
Ensure that sufficient memory can be allocated by your application. At this time we recommend
a thread stack size of 2048k if you intend to use the validation functionality, as well as a minimum of
512m of heap space. Assuming the name of the main class of your application is MyApp and you are
using Informatica AddressDoctor on Linux in the lib sub-directory, compile and start it as follows:
javac -cp .:lib/AddressDoctor5.jar MyApp.java
java -Xss2048k -Xms512m -Xmx2048m -cp .:lib/AddressDoctor5.jar -Djava.library.path=lib MyApp
The Java Virtual Machine may encounter a limit on the amount of heap memory it can assign to an
application, typically between 1.5-2.5 GB on 32-Bit operating systems. This effectively limits the
number of databases that can be preloaded.
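Before loading the library, an application can print the JVM settings it actually received. The following Java sketch uses only standard JDK calls (Runtime.maxMemory and System.getProperty) and does not touch the AddressDoctor API. The class name JvmSettingsCheck and the helper heapSufficient are hypothetical, and the 512 MB threshold is taken from the heap recommendation above.

```java
// Hypothetical diagnostic helper, not part of the AddressDoctor API.
final class JvmSettingsCheck {
    // Minimum heap recommended in this section: 512 MB.
    static final long MIN_HEAP_BYTES = 512L * 1024 * 1024;

    static boolean heapSufficient(long maxHeapBytes) {
        return maxHeapBytes >= MIN_HEAP_BYTES;
    }

    public static void main(String[] args) {
        // maxMemory() reflects -Xmx; some JVMs report slightly less than the configured value.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (MB): " + maxHeap / (1024 * 1024));
        System.out.println("Meets 512 MB recommendation: " + heapSufficient(maxHeap));
        // The native library is searched in the directories listed here.
        System.out.println("java.library.path: " + System.getProperty("java.library.path"));
    }
}
```

Run it with the same switches as the application, for example java -Xss2048k -Xms512m -Xmx2048m -Djava.library.path=lib JvmSettingsCheck, to confirm that the heap limit and library path are what you expect. The thread stack size set with -Xss is not reported by this sketch.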
On many UNIX platforms, increasing the thread stack size to at least 1MB is required; for example,
using export AIXTHREAD_STK=1000000 on AIX or export PTHREAD_DEFAULT_STACK_SIZE=1000000 on HP-UX,
and so on. Furthermore, we recommend setting ulimit -s unlimited.
Additionally, the IBM J9 JVM requires the JVM call java -Xmso2048k for increasing the
OS stack size.
3.3 Installing the Reference Databases
The postal reference databases are named XXX5YY.MD where XXX stands for the ISO-3 alpha code
(ISO 3166) of the country and YY for the database type. The postal reference databases are read only
and platform independent. The same database files may be used on all supported platforms.
• Batch/Interactive – ISO5BI.MD
• Fast Completion – ISO5FC.MD
• Certified – ISO5Cx.MD with x={1,…,n}
• Address Code Lookup – ISO5AC.MD
• Standard Geocoding – ISO5GC.MD
• High Precision Arrival Point Geocoding – ISO5GCAP.MD
• Parcel Centroid Geocoding – ISO5GCPC.MD
• CAMEO – ISO5CA.MD
• Supplementary – ISO5Ex.MD with x={1,…,n}
To use any of these databases, you must have a valid unlock code that unlocks the database.
For example the following databases are presently available for Germany:
DEU5BI.MD
DEU5FC.MD
DEU5GC.MD
DEU5CA.MD
DEU5AC.MD
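The database naming scheme lends itself to a simple lookup from file name to database type. The Java sketch below maps the type suffixes listed in this section to their descriptions. ReferenceDatabaseFile and describe are hypothetical names, not part of the AddressDoctor API, and the sketch assumes upper-case file names exactly as shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the AddressDoctor API.
final class ReferenceDatabaseFile {
    // XXX5YY.MD: XXX = ISO-3 country code, YY = database type (letters, optional number).
    private static final Pattern NAME = Pattern.compile("([A-Z]{3})5([A-Z]+\\d*)\\.MD");

    // Returns the database type description, or "Unknown" for other file names.
    static String describe(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return "Unknown";
        }
        String type = m.group(2);
        switch (type) {
            case "BI":   return "Batch/Interactive";
            case "FC":   return "Fast Completion";
            case "AC":   return "Address Code Lookup";
            case "GC":   return "Standard Geocoding";
            case "GCAP": return "High Precision Arrival Point Geocoding";
            case "GCPC": return "Parcel Centroid Geocoding";
            case "CA":   return "CAMEO";
            default:
                if (type.matches("C\\d+")) return "Certified";
                if (type.matches("E\\d+")) return "Supplementary";
                return "Unknown";
        }
    }

    public static void main(String[] args) {
        String[] germany = {"DEU5BI.MD", "DEU5FC.MD", "DEU5GC.MD", "DEU5CA.MD", "DEU5AC.MD"};
        for (String name : germany) {
            System.out.println(name + ": " + describe(name));
        }
    }
}
```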
3.3.1 Installation Notices
All reference database ZIP files are typically unpacked into the same directory, but storing them on
different storage devices is also supported (via SetConfig.xml, see the respective DTD in Appendix
10.1 and chapter 6.5). Several applications may share a common set of reference databases on a
shared read-only drive, although performance might suffer in such a setup (unless all databases are
fully pre-loaded to memory, see chapter 6.25). When full pre-loading of all database files you require
is not an option, Informatica AddressDoctor strongly recommends using SSDs (solid state drives)
instead of mechanical HDDs (hard disk drives) for the database files. This will significantly improve
performance, especially under multi-threading conditions (see chapter 6.25 for more details).
Postal reference database files carry expiration date information to honour data provider license
terms and help customers determine whether they are using current and relevant reference data.
This ExpirationDate information is accessible via GetConfig.xml (see the respective DTD in Appendix
10.1). In certain cases, a database file is no longer accessible once its ExpirationDate has
passed, for example, when required by data provider license terms.
Informatica AddressDoctor Documentation – Last Revision: 5-Nov-14 @ 12:26
17
Depending on the number of database files in use, it may be necessary to raise the number of
file handles or descriptors available to a process, for example, by running ulimit -n 8192 on
UNIX-type operating systems. Note that internal limitations of the libc version in use may also
interfere, in which case an upgrade to the latest patch version or fix pack of the operating
system is required.
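On POSIX systems, a process can also raise its own soft limit at startup instead of relying on ulimit in the calling shell. The following sketch assumes a POSIX platform. The helper raise_open_file_limit is hypothetical and not an AddressDoctor function; a value such as 8192 would merely mirror the ulimit example above.

```c
#include <sys/resource.h>

/* Raises the soft limit on open file descriptors to at least `wanted`.
 * Returns 0 if the soft limit is at least `wanted` afterwards,
 *         1 if it could only be raised to a lower hard limit,
 *        -1 if the limits could not be read or changed.
 * Hypothetical helper for illustration; not an AddressDoctor function. */
int raise_open_file_limit(rlim_t wanted)
{
    struct rlimit lim;
    int capped = 0;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return -1;
    }
    /* Nothing to do if the current soft limit is already sufficient. */
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted) {
        return 0;
    }
    /* An unprivileged process cannot exceed the hard limit. */
    if (lim.rlim_max != RLIM_INFINITY && wanted > lim.rlim_max) {
        wanted = lim.rlim_max;
        capped = 1;
    }
    lim.rlim_cur = wanted;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return -1;
    }
    return capped;
}
```

Such a call would have to run before the engine is initialized, so that the higher limit is in effect when the database files are opened.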
3.3.2 Special Remarks for USA CASS Certified Mode
US CASS certified processing requires additional databases that contain CASS-related
information such as Carrier Route codes, EWS, ZIPMOVE, LACSLink, DPV, DSF2, SuiteLink, and so on.
This information is contained in files named USA5C1.MD to USA5C25.MD (at the time of writing,
subject to additions).
These CASS-specific database files must be present in the database directory for their
respective information to be available in the CERTIFIED process mode. The ZIPMOVE file provides
information about past ZIP code changes, whereas the Early Warning System (EWS) file contains
information about upcoming changes to ZIP codes; both are required for CASS certification. The
regular EWS and ZIPMOVE database updates by USPS are made available through Informatica
AddressDoctor and need to be placed in the database directory.
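Before starting a certified run, it can be useful to check which of the certified files are actually present. The following is a minimal sketch of such a check. The helper name is an assumption and not part of the AddressDoctor API; the upper bound (25 at the time of writing) is passed in by the caller because the number of files is subject to additions.

```c
#include <stdio.h>

/* Counts how many of USA5C1.MD .. USA5C<highest>.MD cannot be opened for
 * reading in db_dir and reports each of them on stderr.
 * Returns the number of missing files, or -1 if a path is too long.
 * Hypothetical helper for illustration; not an AddressDoctor function. */
int count_missing_cass_files(const char *db_dir, int highest)
{
    char path[1024];
    int missing = 0;
    int i;

    for (i = 1; i <= highest; i++) {
        FILE *fp;
        int n = snprintf(path, sizeof path, "%s/USA5C%d.MD", db_dir, i);

        if (n < 0 || (size_t)n >= sizeof path) {
            return -1;
        }
        fp = fopen(path, "rb");
        if (fp == NULL) {
            fprintf(stderr, "missing or unreadable: %s\n", path);
            missing++;
        } else {
            fclose(fp);
        }
    }
    return missing;
}
```

A non-zero result is not necessarily an error: as described in this section, some files are available only to US customers.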
For DPV, LACSLink or SuiteLink information the additional files mentioned above are needed, but
USPS licensing terms do not allow storing this data outside the US. Therefore, this data is available
only to US customers.
Since CASS Cycle L (2007 - 2009), DPV and LACSLink processing have been mandatory for CASS
certification. With CASS Cycle M (2009 - 2011), SuiteLink became mandatory as well. Even if the
certified database files are missing, Informatica AddressDoctor adds ZIP+4 Codes, as long as
USA5BI.MD is available in the database folder (see also chapter 6.24).
SuiteLink contains suite numbers for business addresses in select high-rise buildings and targets
high-rise addresses with high-volume default mail. SuiteLink also improves business addressing
information through assignment of suite numbers. The SuiteLink data from the USPS is delivered
to Informatica AddressDoctor end users through an updated USA5C18.MD file. Informatica
AddressDoctor also allows records whose input suite data did not match the ZIP+4 file to go
through the SuiteLink process, ignoring the input suite data. If a match is found during SuiteLink
processing, the input suite data is retained in the residue component and output on DAL2, as
the USPS requires the extraneous data to be retained.
Residential Delivery Indicator (RDI) processing was added to Informatica AddressDoctor for
CASS Cycle N (2011 - 2012) and is available as of the 5.2.7 release. This component is
optional and is not required to obtain postal discounts. The databases needed for RDI
processing are USA5C22.MD and USA5C23.MD. RDI processing is intended for parcel shippers,
their agents or analysts. The result of the processing is a single flag, “Y” or “N”. “Y” represents a
residential delivery and is returned when the 9-digit (zip9) or 11-digit (zip11) ZIP code is not
found in the database. “N” represents a business delivery and is returned when the zip9 or zip11
is found in the database.
As of version 5.2.9, Informatica AddressDoctor supports eLOT (Enhanced Line of Travel)
processing. The feature is enabled automatically as soon as the USA5C24.MD and USA5C25.MD
databases are available. The databases are created and distributed by Informatica AddressDoctor.
The USPS is the official licensor of the eLOT database. Customers requiring eLOT for use with
Informatica AddressDoctor will need to obtain a license to the eLOT data from the USPS and pay the
necessary license fees to the USPS prior to receiving the Informatica AddressDoctor eLOT database
file. Upon proof of license, Informatica AddressDoctor provides the customer with the necessary
entitlements and instructions for obtaining the Informatica AddressDoctor formatted eLOT database
for use with the product.
Contact the USPS National Customer Support Center at 1-800-238-3150 for information on how to
obtain a license, or see https://ribbs.usps.gov/index.cfm?page=elot for additional
information on eLOT licensing.
3.3.3 Special Remarks for Canadian SERP Certified Mode
Canadian SERP processing requires an additional database, CAN5C1.MD, containing the PoCAD (Point
of Call Address Data), which was introduced with the 2011 SERP cycle. The file has to be placed in
the database directory specified in the SetConfig.xml file. SERP processing is not possible without
this database.
Starting with version 5.4.2, Informatica AddressDoctor is SERP 2014-compliant. SERP 2014
compliance ensures that Informatica AddressDoctor versions 5.4.2 and later adhere to the following
changes to the postal rules and regulations set by Canada Post:

When the range-based Point of Call Address Database (PoCAD) has only one suite range
available for a given address and if the suite number in the input address is outside the
available range, Informatica AddressDoctor marks that address as invalid. However, if the
input contains a postal code that maps to a Large Volume Receiver (LVR), Informatica
AddressDoctor copies the (input) suite number (to the output) even when the input suite
number does not match any of the corresponding database entries that contain the correct
single suite-civic number combination.

When the range-based PoCAD has only one address associated with a civic street and if the
input address does not match the address available in the database, Informatica
AddressDoctor marks that address as invalid or non-correctable.

When the range-based PoCAD has a Type 2 record that does not have a route identifier and
delivery mode identifier available for a rural address, Informatica AddressDoctor handles
that address in the same way it handles Type 1 addresses. However, the following conditions
apply to the handling of rural civic addresses:
o If the input address does not have a match in the range-based PoCAD and the postal
code of the input address has a corresponding Type 4 address in the range-based
PoCAD, Informatica AddressDoctor marks the address as VQ (Valid but questionable)
in the SERP category enrichment.
o If the input address is a rural address with a street having no civic street number,
Informatica AddressDoctor adds the civic street number when a unique correction is
possible. If no unique civic street can be added to the input address, that address is
rejected.
Note that Post Office box numbers from 99900 through 99905 in Canada denote Deliver to Post
Office (DTPO) addresses for retail outlet locations. Addresses with Post Office box numbers from
99900 through 99905 are specifically meant for parcel delivery and should not be used for other mail
items such as letters, publications, etc.
For more information about SERP 2014 certification, contact Canada Post.
3.3.4 Special Remarks for Australian AMAS Certified Mode
Australian AMAS certified processing requires two additional databases: AUS5C1.MD and
AUS5C2.MD. These database files need to be placed in the database directory specified in
SetConfig.xml in the <DataBase> section with Type="CERTIFIED" for CountryISO3="AUS" or "ALL".
Without the additional files, no AMAS processing is possible.
The databases contain Postal Address File (PAF) data which includes Australia Post’s Delivery Point
Identifiers (DPIDs).
4. Quick Start Guide
To install Informatica AddressDoctor, unpack the ZIP file for the selected platform in such a way that
the directory structure is preserved. For more information about the ZIP file and directory structure,
see section 3. Similarly, unpack the postal reference database files to a destination directory of your
choice.
Informatica AddressDoctor consists of a single engine which, after initialization, processes input
addresses contained in AddressObjects, the data structure for storing an input address, parameter
settings and the processing result (for details see the "Concepts" chapter 5 below).
4.1 First-Time Use of Informatica AddressDoctor 5
The engine must be initialized and de-initialized in a specific sequence:
o AD_Initialize() must be called first to initialize the engine. It evaluates the
settings and configures the engine accordingly. AD_GetAddressObject() or any other
function may be called only after this function has returned successfully.
o AD_DeInitialize() must be called last to de-initialize the engine; the engine is then
ready to be initialized again. All AddressObjects must have been released by calling
AD_ReleaseAddressObject() before AD_DeInitialize() is called.
The following minimal C example (the program flow is similar for Java) corrects a single
address from Singapore and can serve as a starting point for your application (also refer to the
latest API documentation, see Appendix 10.2):
/* Requires the AddressDoctor header from the include directory of the library package. */
AD_AOHandle hAOHandle;
char sResultXML[ 16 * 1024 ];

/* Initialize the engine with the configuration XML (SetConfig). */
AD_Initialize(
    "<?xml version='1.0' encoding='iso-8859-1' ?>\n"
    "<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>\n"
    "<SetConfig>\n"
    "<General />\n"
    "<UnlockCode>(Enter Code here)</UnlockCode>\n"
    "<DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE' Path='/ADDB' "
    "PreloadingType='NONE'/>\n"
    "</SetConfig>\n",
    NULL,
    NULL,
    NULL
);

/* Obtain an AddressObject and assign the input address. */
AD_GetAddressObject( &hAOHandle );
AD_SetInputDataXML( hAOHandle,
    "<?xml version='1.0' encoding='ISO-8859-1'?>\n"
    "<!DOCTYPE InputData SYSTEM 'InputData.dtd'>\n"
    "<InputData>\n"
    "<AddressElements>\n"
    "<Country Item='1' Type='NAME'>SGP</Country>\n"
    "<Locality Item='1' Type='COMPLETE'>Singapore</Locality>\n"
    "<PostalCode Item='1' Type='FORMATTED'>048624</PostalCode>\n"
    "<Street Item='1' Type='COMPLETE'>Raffles Place</Street>\n"
    "<Number Item='1' Type='COMPLETE'>80</Number>\n"
    "<Building Item='1' Type='COMPLETE'>#50-01 UOB Plaza 1</Building>\n"
    "<Organization Item='1' Type='NAME'>AddressDoctor GmbH</Organization>\n"
    "</AddressElements>\n"
    "</InputData>\n"
);

/* Process the address and fetch the result as XML. */
AD_Process( hAOHandle );
AD_GetResultXML( hAOHandle, sResultXML, sizeof( sResultXML ) );

/* Release the AddressObject before de-initializing the engine. */
AD_ReleaseAddressObject( hAOHandle );
AD_DeInitialize();
Ensure that the minimal configuration XML passed to AD_Initialize() (see SetConfig.dtd in chapter
10.1 for configuration setting details) contains the valid Unlock Code you received when
purchasing the Informatica AddressDoctor library and the correct path to the directory that your
reference database files have been unpacked to. See InputData.dtd in chapter 10.1 for more details
on the structure of data input as XML using AD_SetInputDataXML(). Depending on your requirements,
you can also use 16-bit input and output (which is the default for Java), see
chapter 5.8 for details.
Now compile your application as usual, making sure that Informatica AddressDoctor dependencies
are met. How to achieve this varies greatly between platforms and compilers. For example, on Linux
with GCC, the following command builds the ConsoleDemo C++ example code (see chapters
3.2.1 and 7.1). The libraries are listed after the source file so that the linker can resolve them:
g++ -Iinclude -Llib -o bin/ConsoleDemo src/ConsoleDemo.cpp -lAddressDoctor5 -lpthread
The output of Informatica AddressDoctor processing will be provided in sResultXML in XML format,
see Result.dtd (chapter 10.1) for the structure of the XML result from AD_GetResultXML():
<?xml version="1.0" encoding="UTF-16"?>
<Result
ProcessStatus="V2"
ModeUsed="BATCH"
Count="1"
CountOverflow="NO"
CountryISO3="SGP"
PreferredScript="DATABASE"
PreferredLanguage="DATABASE">
<ResultData ResultNumber="1"
MailabilityScore="4"
ResultPercentage="100.00"
ElementResultStatus="F0F000F0F000404440E0"
ElementInputStatus="60600060600020222060"
ElementRelevance="10100010100000000010">
<AddressElements>
<Country Type="NAME_EN" Item="1">SINGAPORE</Country>
<Locality Item="1">SINGAPORE</Locality>
<PostalCode Item="1">048624</PostalCode>
<Street Item="1">RAFFLES PLACE</Street>
<Number Item="1">80</Number>
<Building Item="1">UOB PLAZA 1</Building>
<SubBuilding Item="1"># 50</SubBuilding>
<SubBuilding Item="2">01</SubBuilding>
<Organization Item="1">ADDRESSDOCTOR GMBH</Organization>
</AddressElements>
<AddressLines>
<RecipientLine Line="1">ADDRESSDOCTOR GMBH</RecipientLine>
<DeliveryAddressLine Line="1">80 RAFFLES PLACE</DeliveryAddressLine>
<DeliveryAddressLine Line="2">#50-01 UOB PLAZA 1</DeliveryAddressLine>
<CountrySpecificLocalityLine Line="1">SINGAPORE 048624</CountrySpecificLocalityLine>
<FormattedAddressLine Line="1">ADDRESSDOCTOR GMBH</FormattedAddressLine>
<FormattedAddressLine Line="2">80 RAFFLES PLACE</FormattedAddressLine>
<FormattedAddressLine Line="3">#50-01 UOB PLAZA 1</FormattedAddressLine>
<FormattedAddressLine Line="4">SINGAPORE 048624</FormattedAddressLine>
</AddressLines>
<AddressComplete>ADDRESSDOCTOR GMBH
80 RAFFLES PLACE
#50-01 UOB PLAZA 1
SINGAPORE 048624
</AddressComplete>
</ResultData>
</Result>
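For a quick check of the result without a full XML parser, an attribute such as ProcessStatus can be picked out of the result text with plain string functions. The sketch below assumes that the result is held as 8-bit text, as in the C example above. The helper names are hypothetical and not part of the AddressDoctor API, and production code should rather parse the result against Result.dtd with a proper XML parser. Treating an I prefix as non-valid follows the "process status = Ix" notation used in section 4.2.3.

```c
#include <stddef.h>
#include <string.h>

/* Copies the value of the first attribute name="..." found in xml into out.
 * Returns 0 on success, -1 if the attribute is absent or out is too small.
 * Plain string search for illustration; not a substitute for an XML parser. */
int ad_result_attr(const char *xml, const char *name, char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *p = xml;

    if (name_len == 0) {
        return -1;
    }
    while ((p = strstr(p, name)) != NULL) {
        const char *q = p + name_len;
        /* The name must start after whitespace and be followed by ="
         * so that "Count" does not match inside "CountOverflow". */
        int at_boundary = (p == xml) || p[-1] == ' ' || p[-1] == '\n' ||
                          p[-1] == '\t' || p[-1] == '\r';

        if (at_boundary && q[0] == '=' && (q[1] == '"' || q[1] == '\'')) {
            const char *start = q + 2;
            const char *end = strchr(start, q[1]);
            size_t len;

            if (end == NULL) {
                return -1;
            }
            len = (size_t)(end - start);
            if (len >= out_size) {
                return -1;
            }
            memcpy(out, start, len);
            out[len] = '\0';
            return 0;
        }
        p += name_len;
    }
    return -1;
}

/* Returns 1 for a process status of the form Ix (non-valid), else 0. */
int ad_is_non_valid_status(const char *status)
{
    return status != NULL && status[0] == 'I';
}
```

Applied to the result shown above, ad_result_attr(sResultXML, "ProcessStatus", value, sizeof value) yields V2, for which ad_is_non_valid_status() returns 0.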
Finally, an example for Java. In contrast to the C example, the “Encoding” attribute of the “Input”
and “Result” elements must be set explicitly to UTF-16 or UCS-2 via Parameters.xml, and
WriteXMLEncoding must be set for both SetConfig.xml and Parameters.xml, because Java defaults to
its native 16-bit string handling (see chapter 5.8):
// The class name is chosen for illustration only. Import the AddressDoctor
// classes as described in the API documentation (see Appendix 10.2).
public class QuickStartExample
{
    private static AddressObject m_oAO;

    public static void main(String[] args)
    {
        int iLastError = 0;
        String sResultXML = "";

        // Initialize the engine with the SetConfig and Parameters XML.
        try
        {
            AddressDoctor.initialize(
                "<?xml version='1.0' encoding='UTF-16' ?>"+
                "<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>"+
                "<SetConfig><General WriteXMLEncoding='UTF-16' />"+
                "  <UnlockCode>(Enter Code here)</UnlockCode>"+
                "  <DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE'"+
                "  Path='/ADDB' PreloadingType='NONE' />"+
                "</SetConfig>", null,
                "<?xml version='1.0' encoding='UTF-16' ?>"+
                "<!DOCTYPE Parameters SYSTEM 'Parameters.dtd'>"+
                "<Parameters WriteXMLEncoding='UTF-16'>"+
                "  <Input Encoding='UTF-16' />"+
                "  <Result Encoding='UTF-16' />"+
                "</Parameters>", null);
            iLastError = AddressDoctor.getLastError();
            System.out.println("Using AddressDoctor version: " + AddressDoctor.getVersion());
            System.out.println("Init returned " + iLastError);
        } catch (AddressDoctorException ex)
        {
            System.out.println("Exception while initializing "+
                "AddressDoctor: " + ex.toString());
            System.out.println("Further processing not possible, "+
                "application ends!");
            return;
        }

        // Obtain an AddressObject; de-initialize the engine if this fails.
        try
        {
            m_oAO = AddressDoctor.getAddressObject();
        } catch (AddressDoctorException ex)
        {
            System.out.println("Exception while trying to get an "+
                "AddressObject: " + ex.toString());
            System.out.println("Further processing not possible, "+
                "application ends!");
            try
            {
                AddressDoctor.deinitialize();
            } catch (AddressDoctorException ex2){}
            return;
        }

        // Assign the input address.
        try
        {
            m_oAO.setInputDataXML(
                "<?xml version='1.0' encoding='UTF-16'?>"+
                "<!DOCTYPE InputData SYSTEM 'InputData.dtd'>"+
                "<InputData>"+
                "<AddressElements>"+
                "  <Key>4711</Key>"+
                "  <Country Item='1' Type='NAME'>SGP</Country>"+
                "  <Locality Item='1' Type='COMPLETE'>Singapore</Locality>"+
                "  <PostalCode Item='1' Type='FORMATTED'>048624</PostalCode>"+
                "  <Street Item='1' Type='COMPLETE'>Raffles Place</Street>"+
                "  <Number Item='1' Type='COMPLETE'>80</Number>"+
                "  <Building Item='1' Type='COMPLETE'>#50-01 UOB Plaza 1</Building>"+
                "  <Organization Item='1' Type='NAME'>AddressDoctor GmbH</Organization>"+
                "</AddressElements>"+
                "</InputData>");
        } catch (Exception ex)
        {
            System.out.println("Data could not be assigned! Closing "+
                "application: " + ex.toString());
            try
            {
                AddressDoctor.releaseAddressObject(m_oAO);
                AddressDoctor.deinitialize();
            } catch (AddressDoctorException ex2){}
            return;
        }

        // Process the address.
        try
        {
            AddressDoctor.process(m_oAO);
            iLastError = AddressDoctor.getLastError();
            System.out.println("Process returned " + iLastError);
        } catch (AddressDoctorException ex)
        {
            System.out.println("Exception during process: " +
                ex.toString());
        }

        // Print the result. On failure, fall through so that the
        // AddressObject is still released and the engine de-initialized.
        if (iLastError == 0)
        {
            try
            {
                sResultXML = m_oAO.getResultXML();
                System.out.println(sResultXML);
            } catch (AddressDoctorException ex)
            {
                System.out.println("Exception while trying to get "+
                    "ResultXML: " + ex.toString());
            }
        }

        // Release the AddressObject before de-initializing the engine.
        try
        {
            AddressDoctor.releaseAddressObject(m_oAO);
            AddressDoctor.deinitialize();
        } catch (AddressDoctorException ex)
        {
            System.out.println("Exception while releasing the AO and "+
                "de-initializing AddressDoctor: " + ex.toString());
        }
    }
}
Refer also to the C and Java source code provided for Informatica AddressDoctor 5 ConsoleDemo
described in chapter 7.1.
4.2 New Features and Enhancements in Informatica AddressDoctor
Informatica AddressDoctor adds features and enhancements in each product release. The following
lists describe the features and enhancements in the current release and in earlier releases. For
complete information on the product changes in any release, consult the release notes for the
release.
4.2.1 What’s New in Version 5.2.8
Informatica AddressDoctor introduces the following features and enhancements in version 5.2.8:
CAMEO social and demographic analysis
Informatica AddressDoctor returns CAMEO code values for the following countries:
Australia, Austria, Belgium, Brazil, Canada, Czech Republic, Denmark, Estonia, Finland, France,
Germany, Hong Kong, Hungary, Italy, Japan, Mexico, Netherlands, New Zealand, Norway, Poland,
Portugal, Romania, Singapore, Slovakia, Spain, Sweden, Switzerland, the United Kingdom, and the
United States.
Australian localities and vanity names
Informatica AddressDoctor maintains a valid vanity name from an input field to an output field when
you run the engine in Certified mode.
Australian Incremental Change File
Informatica AddressDoctor supports the Incremental Changes File through the AMAS Certified
Software Program. The file contains a list of delivery point IDs that have changed between releases
of the Postal Address File.
SERP and AMAS Certification
The software library has passed the respective requirements for SERP and AMAS 2012 certification.
Czech Republic Address Validation Enhancements
Reference data for the Czech Republic includes district numbers.
AddressType Output Field
Informatica AddressDoctor populates the AddressType output field for all countries whose reference
data supports the identification of the address type.
Geocoding Enhancements
You can perform geocoding without first cleansing and standardizing the address data. As always,
Informatica AddressDoctor recommends cleansing all addresses prior to geocoding in order to
provide the most accurate coordinates possible.
4.2.2 What’s New in Version 5.2.9
Informatica AddressDoctor introduces the following features and enhancements in version 5.2.9:
India Enhancements
Informatica AddressDoctor provides enhanced parsing and validation operations and also provides
enhancements to the India reference database.
Note that Informatica AddressDoctor does not support the Parse-Only mode for India address
validation.
Note also that older versions of the database are incompatible with version 5.2.9. Use the older
database for version 5.2.8 and earlier versions, and use the new database for version 5.2.9 and later
versions.
Country Improvements
Informatica AddressDoctor offers improved address validation for the following countries:
Italy, Netherlands, Singapore, Hong Kong, Malaysia, Great Britain, Germany, and the United States.
United States improvements include support for eLOT sequence numbers and street name aliases.
Enhancements to Transliteration, Parsing, and Formatting for Japan Addresses
Informatica AddressDoctor has made significant improvements to the way Japan addresses are
processed and validated. Due to these changes, the format of the Japanese database has changed.
To obtain the new functionality, download the new database for Japan.
Note: if you use a previous version of the API with the new database, you will not benefit from the
enhancements in version 5.2.9.
4.2.3 What’s New in Version 5.3.0
Informatica AddressDoctor introduces the following features and enhancements in version 5.3.0:
Address Resolution Code
The Address Resolution Code is a new twenty-character output string that is similar to the Element
Result Status field. It is populated for all non-valid (process status = Ix) records. The Address
Resolution Code explains why an address is rejected and directs you to possible solutions.
Extended Element Result Status Code
The Extended Element Result Status code is a new twenty-character output string that is similar to
the Element Result Status field. It is populated for valid or corrected addresses. The code indicates
that additional information may be available in the reference database for the given address.
Standardization of Non-valid Address Elements
Informatica AddressDoctor can standardize address elements for non-valid (process status = Ix)
addresses. Standardized addresses can improve downstream business processes such as matching
and de-duplication.
Dual Addresses
You can specify the address type against which to validate an address.
Additional Results in Fast Completion and Interactive Modes
Informatica AddressDoctor has increased the upper limit of the suggestion list from twenty to one
hundred results. In addition, house number ranges can be expanded for countries where individual
house numbers exist.
Support for Ireland
Informatica AddressDoctor introduces support for Ireland.
British Forces Postal File
Informatica AddressDoctor implements the Royal Mail British Forces Post Office (BFPO) data.
Multi-Language Support for Belgium
Informatica AddressDoctor introduces multi-language support for Belgian addresses. You can specify
the language of the output, or you can preserve the language of the input address. Use the
PreferredLanguage parameter to write the output address in French, Flemish or German.
Language ISO Code Output
Informatica AddressDoctor can write the ISO language code as output when the output address
contains data from the reference database. The output is an ISO 639 three-letter code, for example
“DEU” for German. For transliterated output, the original language is reported, for example “JPN”
for romanized Japanese output.
Austrian Postal Changes
Informatica AddressDoctor supports the latest address format for Austria. Austrian Post changed its
address format in 2011. Informatica AddressDoctor 5.3.0 reflects the changes.
New and Removed Databases
Informatica AddressDoctor introduces the databases for the following countries:

Curacao (CUW)

Sint Maarten (SXM)

Bonaire, Sint Eustatius and Saba (BES)

Montenegro (MNE)

Serbia (SRB)

South Sudan (SSD)
The following databases are no longer supported:

Serbia and Montenegro (SCG)

Netherlands Antilles (ANT)
Australian Sub-Building Changes
Informatica AddressDoctor adds sub-building data to the Australian Database as of August 2012.
Version 5.3.0 validates addresses with sub-building information for Australian addresses in Batch
and Interactive modes. Informatica AddressDoctor supports two versions of the Australian
databases to ensure compatibility with previous versions of the software library.
Singapore Updates
The reference data from Singapore Post includes floor, suite and door values. The Software Library
supports the information in all processing modes.
Japan Updates
Japan reference data includes house numbers and address codes. Due to these changes, a
new format of the Japanese database has been introduced. Informatica AddressDoctor supports two
versions of the Japan data to ensure compatibility with previous versions of the software library.
Certifications
Informatica AddressDoctor is officially certified by five postal organizations around the globe: the
United States Postal Service, Canada Post, Australia Post, New Zealand Post, and La Poste of France.
4.2.4 What’s New in Version 5.3.1
Informatica AddressDoctor introduces the following features and enhancements in version 5.3.1:
Address Resolution Code Updates
Informatica AddressDoctor adds output values to the Address Resolution Code.
Extended Element Result Status
Informatica AddressDoctor adds output values to the Extended Element Result Status.
Country Improvements and Enrichments
Informatica AddressDoctor offers improved address validation logic for the following countries:
France, South Africa, China, Japan, the United Kingdom, and Serbia.
Informatica AddressDoctor supports the Choumei Aza code as an enrichment for Japan.
Informatica AddressDoctor supports the Unique Delivery Point Reference Number as an enrichment
for the United Kingdom.
Informatica AddressDoctor supports the Postal Address Code as an enrichment for Serbia.
Canada Enhancements
Informatica AddressDoctor adds enhancements in the following areas:

Multi-language support

Thirteen-character abbreviation for localities

Rural Route information
4.2.5 What’s New in Version 5.4.0
Informatica AddressDoctor introduces the following features and enhancements in version 5.4.0:
Point Address Geocoding
Point Address Geocoding enables highly accurate and precise “to the premise” geocoding points for
properties and premises. Point address geocoding includes the following types of geocoding:

Arrival Point geocoding. The geo coordinates are calculated for a point that is placed in the
center of a street segment in front of the house.

Parcel Centroid geocoding. The geo coordinates are calculated for a point that is at the
geographic center of the parcel of land.
4.2.6 What’s New in Version 5.4.1
Informatica AddressDoctor introduces the following features and enhancements in version 5.4.1:
Address Code Lookup
Address Code Lookup enables you to enter a country-specific address code and retrieve the
complete or partial address for the code. Address Code Lookup is currently available for the
following countries:

Germany

Great Britain

Japan

South Africa

Serbia
Country Improvements and Enrichments
The parser handles fields that are unique to Turkish addresses.
Informatica AddressDoctor now supports native parsing for China. The parsing improvements enable
better-quality address validation for China.
Informatica AddressDoctor supports the two characters that Swiss Post has added to the Swiss
postal codes.
Informatica AddressDoctor can now return the new address code for deprecated or outdated
addresses for Japan.
Informatica AddressDoctor has improved the parsing and validation of Japan addresses, including
the following:

Support for the transliteration of the JIS-2004 Japanese character set into the Latin character
set.

Support for the Preserve Input Script parameter for Japan addresses.
Informatica AddressDoctor supports additional variations for Post Office Box values in Australian
addresses.
Informatica AddressDoctor has improved the performance of address validation for India.
Informatica AddressDoctor also returns suggestions for partial or incomplete Indian addresses in
Interactive mode.
Informatica AddressDoctor provides the National Address Database ID as an enrichment output field
for South African addresses.
Informatica AddressDoctor provides the Brazilian Institute of Geography and Statistics (IBGE) code as
an enrichment output field for Brazilian addresses.
Informatica AddressDoctor provides enrichment output fields for the Amtliche Gemeindeschlüssel
(AGS), the locality ID, and the street ID in German addresses.
Sort Order for House Numbers
Informatica AddressDoctor returns a list of house numbers in logical order instead of alphanumeric
order. For example, Informatica AddressDoctor now sorts and returns numbers in the following
logical order:
1, 2, 3, 11, 12, 13, 14, 21, 22
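The difference between the two orderings can be illustrated with a small comparator for the C standard library function qsort(). This is only a sketch of one way to obtain a logical order; it is not the algorithm the engine uses, and the function name is an assumption made for this example. A plain string comparison would sort the same values as 1, 11, 12, 13, 14, 2, 21, 22, 3.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of const char * house numbers.
 * Compares the leading decimal part numerically and any remainder as
 * text, so that "2" sorts before "11" and "11" before "11A".
 * Assumes unsigned values; illustration only, not the engine's logic. */
int compare_house_numbers(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    char *rest_a;
    char *rest_b;
    unsigned long na = strtoul(sa, &rest_a, 10);
    unsigned long nb = strtoul(sb, &rest_b, 10);

    if (na != nb) {
        return (na < nb) ? -1 : 1;
    }
    /* Same number: order by suffix, e.g. "" < "A" < "B". */
    return strcmp(rest_a, rest_b);
}
```

Calling qsort(numbers, count, sizeof numbers[0], compare_house_numbers) on an array holding the values above in any order yields the logical sequence 1, 2, 3, 11, 12, 13, 14, 21, 22.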
Unlock Codes and Engine Expiration
Informatica AddressDoctor includes improvements and changes that relate to unlock codes and
engine expiration. Starting with release 5.4.1, new unlock codes are needed for supplementary
databases. You must reinitialize the 5.4.1 engine with the new unlock codes in order to enable the
supplementary databases.
In addition, the GetConfig.xml file reflects the status of the engine at the time of the AD_Initialize
call.
SendRight Certification
Informatica AddressDoctor has passed the 2014 Cycle of the SendRight Certification.
4.2.7 What’s New in Version 5.4.2
Informatica AddressDoctor introduces the following features and enhancements in version 5.4.2:
SERP 2014 Compliance
Informatica AddressDoctor is SERP 2014‐compliant.
Extended Coverage for Point Geocoding
Informatica AddressDoctor extends point geocoding support to addresses in Austria, Denmark,
Germany, the Netherlands, and Sweden.
4.2.8 What’s New in Version 5.5.0
Informatica AddressDoctor introduces the following features and enhancements in version 5.5.0:
Support for Single Line Address Validation
Informatica AddressDoctor can parse and validate addresses that are entered in a single line in Fast
Completion mode. Single line address validation is available for the following countries:

Australia

Canada

Germany

Great Britain

New Zealand

United States
Support for Taiwan
Informatica AddressDoctor database coverage includes Taiwanese (Republic of China) addresses.
Note that Informatica AddressDoctor currently supports Taiwanese addresses in the Latin script.
Support for Locality Aliases (Vanity Names)
You can retain locality aliases, also known as vanity names, in the validated output.
Support for Java Version 7
Informatica AddressDoctor 5.5.0 uses version 7 of the Java Run-Time Environment. To develop your
own applications, you must install the Java Development Kit SE 7 on the development machine.
Note: You can continue to use the HP SE 5 version of the Java Development Kit on a machine that
runs the HP-UX operating system.
Country Improvements
Informatica AddressDoctor supports the new street name-based address system applied in South
Korea.
Informatica AddressDoctor supports the Postal Address Code as an enrichment for addresses in
Austria. Informatica AddressDoctor supports the INSEE code as an enrichment for addresses in
France.
Informatica AddressDoctor supports Gmina codes, Locality TERYT IDs, and Street TERYT IDs as
enrichments for addresses in Poland.
Support for Preserving Input Scripts
You can preserve the input script of addresses from Belarus, China, Greece, Kazakhstan, Macedonia,
Russia, and Ukraine.
Cyrillic Support for Belarus and Macedonia
Informatica AddressDoctor extends Cyrillic transliteration support to Belarus and Macedonia
addresses. You can enter and validate Belarus and Macedonia addresses in the native script.
Enhancements to United States Address Validation
Informatica AddressDoctor version 5.5.0 introduces the following improvements to United States
address processing:

Support for default unique ZIP code assignments

Support for locality name override

Improved handling of delivery instructions

Improved handling of leading zeros in sub-building number elements
4.2.9 What’s New in Version 5.6.0
Informatica AddressDoctor introduces the following features and enhancements in version 5.6.0:
Country Improvements
Informatica AddressDoctor adds support for Taiwan addresses in the Mandarin Traditional Chinese
script, the native and official script in Taiwan.
Informatica AddressDoctor Documentation – Last Revision: 5-Nov-14 @ 12:26
32
Informatica AddressDoctor adds an attribute, GlobalPreferredDescriptor, to specify the output
format for street, building, and sub-building element descriptors in addresses from Australia and
New Zealand.
Informatica AddressDoctor adds support for Kilometer information as additional street information
in valid Brazil addresses.
Informatica AddressDoctor adds support for county and sub-building information in the Fast
Completion output for United States addresses.
Informatica AddressDoctor adds support for the new seven-digit postal codes in Israel.
Enhancements for Countries in the DACH Region
Informatica AddressDoctor adds support for keywords such as Zimmer and App in the house number
field of addresses from Germany, Austria, and Switzerland. Informatica AddressDoctor parses the
Zimmer and App information in the House Number field as sub-building information.
Japan Enhancements
Informatica AddressDoctor adds support for Ban or block information in Japan addresses.
Informatica AddressDoctor adds support for Gaiku code in Japan addresses. Informatica
AddressDoctor now provides old and new Choumei Aza codes and the Gaiku code in Japan address
output and supports a combination of the Choumei Aza code and the Gaiku code in Address Code
Lookup for Japan addresses.
Spain Enhancements
Informatica AddressDoctor provides improved reference address data and the following validation
improvements for Spain addresses:

Identification of the building name and street name in the Delivery Address Line 1 field.

Addition of a slash symbol (/) between a building element and a sub-building element when
the sub-building element is a number.
United Kingdom Enhancements
Informatica AddressDoctor adds support for rooftop geocoding for United Kingdom addresses.
Informatica AddressDoctor adds support for Address Key values in United Kingdom addresses.
4.2.10 New Parameters Added in Version 5.6.0
Informatica AddressDoctor adds the following new parameters and values in Version 5.6.0.

GlobalPreferredDescriptor attribute for the Result element in Parameters.DTD. Configures
the output format for street, building, and sub-building element descriptors in Australia and
New Zealand addresses, and the Strasse element in Germany addresses. Supported values
are DATABASE, LONG, SHORT, and PRESERVE_INPUT.

ADDRESS_KEY attribute for the SupplementaryGB element in Result.DTD. Provides the
address key as an address enrichment for United Kingdom addresses.

GAIKU_CODE attribute for the SupplementaryJP element in Result.DTD. Provides the Gaiku
code as an address enrichment to Japan addresses.
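As a rough illustration of the GlobalPreferredDescriptor attribute, the following Java sketch builds a Parameters XML string that sets the attribute on the Result element. The class and method names are hypothetical helpers and not part of the AddressDoctor API; only the attribute name, its parent element, and the four supported values are taken from the list above. The minimal document layout (a Parameters root with a single Result child) is an assumption modeled on the Parameters XML used in the Java example in chapter 4.2.13.

```java
import java.util.Arrays;
import java.util.List;

public class PreferredDescriptorSketch {

    // Values documented for the GlobalPreferredDescriptor attribute.
    static final List<String> SUPPORTED =
            Arrays.asList("DATABASE", "LONG", "SHORT", "PRESERVE_INPUT");

    /** Hypothetical helper: builds a Parameters XML that sets GlobalPreferredDescriptor. */
    static String buildParametersXml(String descriptor) {
        if (!SUPPORTED.contains(descriptor)) {
            throw new IllegalArgumentException("Unsupported value: " + descriptor);
        }
        // Assumed minimal layout: Parameters root element with one Result child.
        return "<?xml version='1.0' encoding='UTF-16' ?>"
             + "<!DOCTYPE Parameters SYSTEM 'Parameters.dtd'>"
             + "<Parameters>"
             + "<Result GlobalPreferredDescriptor='" + descriptor + "' />"
             + "</Parameters>";
    }

    public static void main(String[] args) {
        System.out.println(buildParametersXml("SHORT"));
    }
}
```

Rejecting unknown values in the helper keeps a typing error from reaching the engine, where it would only surface as a return code.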
4.2.11 Setting up Informatica AddressDoctor
Informatica AddressDoctor processes the input addresses contained in AddressObjects when
AD_Process() is called. Because there is only one engine per process, there is no Informatica
AddressDoctor handle.
The engine must be initialized and de-initialized in a specific sequence:

AD_Initialize() must be called first to initialize the engine. It evaluates the settings and
configures the engine accordingly. AD_GetAddressObject() and all other functions may be
called only after AD_Initialize() has returned successfully.

AD_DeInitialize() must be called last to de-initialize the engine; the engine is then ready to
be initialized again. All AddressObjects must have been released by calling
AD_ReleaseAddressObject() before AD_DeInitialize() is called.
The engine stores the following data for its internal use (see SetConfig.dtd):

General engine configuration, for example the maximum amount of memory the engine may
request from the OS

The access codes (unlock codes); at least one valid code must be supplied when calling
AD_Initialize()

Optional preloading parameters for the databases
In addition, the engine stores the following parameter data as a default for the AddressObjects (see
Parameters.dtd in chapter 10.1):

Process parameters, for example the processing mode to be used

Input parameters, for example the input encoding to be used

Format specifications for the result, for example casing specifications
This configuration data has default values as specified by the corresponding SetConfig.dtd. The
defaults can be changed by passing a corresponding configuration XML as a parameter to AD_Initialize().
The configuration cannot be changed after AD_Initialize() has been called. The AddressObjects
use this parameter configuration data by default when no alternative setting is made for
a specific AddressObject.
4.2.12 AddressObjects
The AddressObject is a data structure for storing an input address, parameter settings and a result.
AddressObjects store the following (configuration) data:
• Parameter settings (see Parameters.dtd)
• An input address (see InputData.dtd)
• A result (see Result.dtd)
• The last return code
AddressObjects should not be created and destroyed frequently, but rather be reused for
performance reasons. Specifically, the parameter settings should be reused to avoid repeated
settings overhead.
Informatica AddressDoctor manages all AddressObjects. Only a limited number of AddressObjects
can be created. This number can be set in the initialization phase of the engine (using the
“MaxAddressObjectCount” attribute). The recommended setting is between one and two times the
number of threads, which must be set using the “MaxThreadCount” attribute and is currently limited
to a practical maximum value of 32 threads (see chapter 5.36).
Note that the default parameter settings differ between Informatica AddressDoctor 4 and 5, for
example the standard settings for script and casing (see chapters 5.12 and 5.14 for details).
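The sizing recommendation above amounts to simple arithmetic. The following Java sketch computes the recommended range for MaxAddressObjectCount from a given MaxThreadCount. The class and method are hypothetical, and treating the practical maximum of 32 threads as a hard limit is an assumption made for this illustration only.

```java
public class AddressObjectCountSketch {

    // Practical maximum thread count stated in the text above.
    static final int PRACTICAL_MAX_THREADS = 32;

    /** Returns {lower, upper} bounds of the recommended MaxAddressObjectCount. */
    static int[] recommendedRange(int maxThreadCount) {
        // Assumption: values outside 1..32 are rejected in this sketch.
        if (maxThreadCount < 1 || maxThreadCount > PRACTICAL_MAX_THREADS) {
            throw new IllegalArgumentException(
                    "MaxThreadCount should be between 1 and " + PRACTICAL_MAX_THREADS);
        }
        // Between one and two times the number of threads.
        return new int[] { maxThreadCount, 2 * maxThreadCount };
    }

    public static void main(String[] args) {
        int[] range = recommendedRange(8);
        System.out.println("MaxAddressObjectCount between " + range[0] + " and " + range[1]);
    }
}
```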
4.2.13 Direct API
Engine input and output in Informatica AddressDoctor 5 are based on an XML API. See the
corresponding DTDs in Appendix 10.1 and the following chapter 5 for details.
To ease the transition from Version 4, the engine partly supports setting and getting some of the
XML values and attributes directly. For example,
AD_SetInputAddressElement( hAOHandle, "PostalCode", 1, NULL, "67133" );
sets the postal code of item 1 to "67133".
To set a street and a dependent street using the direct API in C, the “Item” parameter has to be 1 for
the first and 2 for the second:
AD_SetInputAddressElement( hAOHandle, "Street", 1, NULL, "Main St 5" );
AD_SetInputAddressElement( hAOHandle, "Street", 2, NULL, "Dependent St 8" );
Similarly, to set three formatted address lines, the “Line” parameter has to be set from 1 to 3:
AD_SetInputAddressLine( hAOHandle, "FormattedAddressLine", 1, "AddressDoctor GmbH" );
AD_SetInputAddressLine( hAOHandle, "FormattedAddressLine", 2, "Roentgenstr. 9" );
AD_SetInputAddressLine( hAOHandle, "FormattedAddressLine", 3, "D-67133 Maxdorf" );
Similarly, setting street and dependent street in Java:
m_oAO.setInputAddressElement("Street", 1, "COMPLETE", "Main St 5");
m_oAO.setInputAddressElement("Street", 2, "COMPLETE", "Dependent St 8");
And setting 3 formatted address lines in Java:
m_oAO.setInputAddressLine("FormattedAddressLine", 1, "AddressDoctor GmbH");
m_oAO.setInputAddressLine("FormattedAddressLine", 2, "Roentgenstr. 9");
m_oAO.setInputAddressLine("FormattedAddressLine", 3, "D-67133 Maxdorf");
Both kinds of API functions may be intermixed, although this is not recommended. In particular, note
that calling AD_SetInputDataXML() first clears any existing input data, because input cannot be
assigned using both the direct and the XML API (see also the respective return code in chapter 5.32).
Complete direct API examples in C and Java follow (also refer to the latest API documentation, see
Appendix 10.2):
AD_AOHandle hAOHandle;
char sCompleteAddress[ 4096 ];
AD_U32 ulNumResults;

/* Initialize the engine */
AD_Initialize(
    "<?xml version='1.0' encoding='iso-8859-1' ?>\n"
    "<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>\n"
    "<SetConfig>\n"
    "<General />\n"
    "<UnlockCode>(Enter Code here)</UnlockCode>\n"
    "<DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE' Path='/ADDB' "
    "PreloadingType='NONE'/>\n"
    "</SetConfig>\n",
    NULL,
    NULL,
    NULL
);

/* Get an AddressObject to use */
AD_GetAddressObject( &hAOHandle );

/* Set the address elements */
AD_SetInputAddressElement( hAOHandle, "Country", 1, NULL, "SGP" );
AD_SetInputAddressElement( hAOHandle, "Locality", 1, NULL, "Singapore" );
AD_SetInputAddressElement( hAOHandle, "PostalCode", 1, NULL, "048624" );
AD_SetInputAddressElement( hAOHandle, "Street", 1, NULL, "Raffles Place" );
AD_SetInputAddressElement( hAOHandle, "Number", 1, NULL, "80" );
AD_SetInputAddressElement( hAOHandle, "Building", 1, NULL, "#50-01 UOB Plaza 1" );
AD_SetInputAddressElement( hAOHandle, "Organization", 1, NULL, "AddressDoctor GmbH" );

/* Process the AddressObject */
AD_Process( hAOHandle );

/* If there is at least one result, retrieve the complete address */
AD_GetResultCount( hAOHandle, &ulNumResults );
if( ulNumResults > 0 )
    AD_GetResultAddressComplete( hAOHandle, 1, sCompleteAddress, sizeof( sCompleteAddress ) );

/* Not necessary here, only if hAOHandle were to be filled with another input address */
AD_ClearData( hAOHandle );

/* Release the AddressObject; all AddressObjects must be released before de-initializing */
AD_ReleaseAddressObject( hAOHandle );
AD_DeInitialize();
Or, alternatively, in Java:
private static AddressObject m_oAO;

public static void main(String[] args) {
    try {
        // Initialize the engine with the configuration XML and the default parameters XML
        AddressDoctor.initialize(
            "<?xml version='1.0' encoding='UTF-16' ?>"+
            "<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>"+
            "<SetConfig><General WriteXMLEncoding='UTF-16' />"+
            "  <UnlockCode>(Enter Code here)</UnlockCode>"+
            "  <DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE'"+
            "    Path='/ADDB' PreloadingType='NONE' />"+
            "</SetConfig>", null,
            "<?xml version='1.0' encoding='UTF-16' ?>"+
            "<!DOCTYPE Parameters SYSTEM 'Parameters.dtd'>"+
            "<Parameters WriteXMLEncoding='UTF-16'>"+
            "  <Input Encoding='UTF-16' />"+
            "  <Result Encoding='UTF-16' />"+
            "</Parameters>", null);
        // Get an AddressObject to use
        m_oAO = AddressDoctor.getAddressObject();
        // Set the address elements
        m_oAO.setInputAddressElement("Country", 1, "ISO_3", "SGP");
        m_oAO.setInputAddressElement("Locality", 1, null, "Singapore");
        m_oAO.setInputAddressElement("PostalCode", 1, null, "048624");
        m_oAO.setInputAddressElement("Street", 1, "NAME", "Raffles Place");
        m_oAO.setInputAddressElement("Number", 1, null, "80");
        m_oAO.setInputAddressElement("Building", 1, null, "#50-01 UOB Plaza 1");
        m_oAO.setInputAddressElement("Organization", 1, null, "AddressDoctor GmbH");
        // Process the AddressObject
        AddressDoctor.process(m_oAO);
        // If there is at least one result, print the address on the screen
        if (m_oAO.getResultCount() > 0)
            System.out.println(m_oAO.getResultAddressComplete(1));
        // Clear the AddressObject so that it may be filled with another input address
        m_oAO.clearData();
        // Release the AddressObject; all AddressObjects must be released to deinitialize
        AddressDoctor.releaseAddressObject(m_oAO);
        // Deinitialize the engine
        AddressDoctor.deinitialize();
    }
    catch (AddressDoctorException e) {
        // Report the failure before exiting with an error code
        e.printStackTrace();
        System.exit(1);
    }
}
Compare these examples with the native XML example shown in chapter 4.1 to understand the
differences between direct and XML API usage and to evaluate which API mode better suits your needs.
Also note that the C example above uses 8-bit data handling; see chapter 5.8 “Input and Output
Encoding” for the differences when handling 16-bit data (which is the default for Java).
4.2.14 Transliteration (formerly UniString Object)
The “Transliteration only” process mode that Version 4 provided through the UniString object has
been superseded. To obtain transliterated address elements without validation, set the “Mode”
attribute of the “Process” element in Parameters.xml (see the DTD in chapter 10.1) to PARSE using
AD_SetParametersXML() before submitting your AddressObject to AD_Process(). For this specific use
case, an “OptimizationLevel” of NARROW is also recommended (see chapter 5.33).
4.2.15 Unlock Code Mechanism
The Informatica AddressDoctor unlock code mechanism has been slightly redesigned. Note that
multiple unlock codes are to be passed as separate XML elements (see chapter 6.4) and that
information about which databases have been unlocked may be queried using
AD_GetConfigSettingsXML() (see chapter 6.6 for details).
4.2.16 Unlock Codes and Engine Expiration
Informatica AddressDoctor includes the following improvements and changes regarding unlock
codes and engine expiration.
4.2.17 Not Yet Valid Unlock Codes
Previous versions of the engine reported unlock codes that were not yet valid with a status of
“EXPIRED” in the GetConfig.xml file. Informatica AddressDoctor now reports these codes with a
status of “NOT_VALID_YET”. In addition, Informatica AddressDoctor no longer reports the following
warning for these codes when calling AD_Initialize or AD_InitializeW:
AD_SC_WRN_INIT_UNLOCKCODE_EXPIRED = 2 = “The SetConfig.xml contained at least one expired
or not yet valid unlock code”
4.2.18 Adjacent Unlock Codes
Unlock codes that are adjacent to the currently valid unlock codes or unlock codes that overlap are
now handled correctly by extending the internally computed engine expiration date accordingly. In
the GetConfig.xml file, adjacent unlock codes are reported with a status of “NOT_VALID_YET”.
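To make the idea of an extended expiration date more concrete, the following Java sketch chains validity periods together. It is only an illustration of the principle and not the engine's internal algorithm. The Period class, the method name, and the rule that a code counts as adjacent when it starts no later than the day after the current expiration date are assumptions made for this example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class UnlockCodeExpirationSketch {

    /** Hypothetical validity period of one unlock code (both dates inclusive). */
    static final class Period {
        final LocalDate start;
        final LocalDate end;

        Period(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns the last day covered by the chain of overlapping or adjacent
     * periods that includes today, or null if no period is valid today.
     */
    static LocalDate effectiveExpiration(List<Period> periods, LocalDate today) {
        List<Period> sorted = new ArrayList<>(periods);
        sorted.sort(Comparator.comparing((Period p) -> p.start));
        LocalDate expiration = null;
        for (Period p : sorted) {
            if (expiration == null) {
                // Look for the first period that is valid today.
                if (!p.start.isAfter(today) && !p.end.isBefore(today)) {
                    expiration = p.end;
                }
            } else if (!p.start.isAfter(expiration.plusDays(1)) && p.end.isAfter(expiration)) {
                // Overlapping or adjacent period: extend the expiration date.
                expiration = p.end;
            }
        }
        return expiration;
    }

    public static void main(String[] args) {
        List<Period> periods = new ArrayList<>();
        periods.add(new Period(LocalDate.of(2014, 1, 1), LocalDate.of(2014, 6, 30)));
        periods.add(new Period(LocalDate.of(2014, 7, 1), LocalDate.of(2014, 12, 31)));
        System.out.println(effectiveExpiration(periods, LocalDate.of(2014, 5, 15)));
    }
}
```

A period that starts after a gap is not chained in this sketch, so it would not extend the expiration date until it becomes valid itself.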
4.2.19 Engine Expiration
In order to use the library, valid unlock codes of type VALIDATION are required. If no valid codes are
present in the SetConfig.xml file, the engine goes into an “expired” state. The engine does not accept
any process calls in the expired state. Instead, it returns the following critical error code:
AD_SC_CERR_EXPIRED = -1601 - “The engine usage period has expired or is not activated yet”
The same situation occurs with a call to AD_Initialize and AD_InitializeW. In Version 5.4.1 and later,
the engine is initialized so that a GetConfig.xml file can be retrieved to get details about the unlock
codes. Previous versions did not allow any calls to the engine. However, it should be noted that no
process calls can succeed when the engine is in an “expired” state.
If the expiration date is reached and a call to AD_Process is made, the engine goes into an “expired”
state and returns the above error code. The engine must be re-initialized with valid unlock codes.
5. Concepts
This chapter explains the major functions of Informatica AddressDoctor and shows how
Transliteration, Formatting and the two major address processing stages, Parsing and Correction,
interact. The entire functionality is implemented through two objects, Informatica AddressDoctor
(frequently referred to as “Engine”) and AddressObject. For the C and Java interfaces these objects
have been mapped to functions.
The following figure may act as a general guideline in understanding the sequence of the different
processing stages an address is subjected to by Informatica AddressDoctor. A more detailed
discussion is given in chapter 5.6.
Informatica AddressDoctor supports address parsing and address verification for more than 240
countries and territories through one API. Consequently, Informatica AddressDoctor scales easily
from a single country setup to multi country or even global scenarios.
5.1 Character Set Mapping
In today’s computer environments we encounter numerous character sets. In the early days of
computing most systems used either EBCDIC or ASCII character sets. Programmers and system
designers used the concept of code pages to cope with the limited characters that were available on
these computers.
Several years later Unicode was introduced to address the problems associated with the large
number of different character sets that are used around the world. With room for more than 65000
characters it seemed like a sufficient solution at first. Even this proved too small to represent
all characters from around the world, so newer versions of Unicode provide room for well over a
million characters. When data is transferred between different computer systems,
character set mapping problems frequently occur.
These problems result from different numerical values that are assigned to the “same” character in
different character sets. While the basic ASCII characters of the Latin alphabet like A, B, and C are
usually represented with the same numerical values, the problems often start once accented or
other non-standard characters are used. These are often encoded differently in each character set.
The following table shows a comparison of the decimal values for some characters in the Latin and
Unicode character set:
Character   Latin   Unicode
A           65      65
B           66      66
Å           143     197
ß           225     223
さ          n/a     12373
While some characters have the same numerical representation in both character sets, others have
different values. If a file is created using one character set and then displayed using another
character set, mapping problems occur that lead to illegible text at best. Taking the text ABÅß
from Latin to Unicode without any mapping produces the output text AB•á, which is clearly
different from the original input. Other characters, such as Japanese characters, cannot be
represented in Latin at all and are lost when a Unicode file is viewed with a Latin interpretation.
Informatica AddressDoctor’s transliteration stage offers functionality [1] to address these issues.
String data is internally stored in the Unicode UCS-2 format. Strings can be assigned in any of the
more than 30 supported character sets. If data is retrieved in another character set, a
mapping takes place to ensure that the characters are properly represented in that character
set. Provided that each character has a representation in the other character set, no information is
lost. Characters that have no representation in a particular character set (such as さ in Latin) are
mapped to a space.
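The difference between reinterpreting bytes and mapping characters can be reproduced with the standard Java charset classes. The following sketch is not AddressDoctor code. It uses ISO-8859-1 and UTF-8 as stand-ins for two character sets, so the numerical values differ from the table above. Because Java replaces unmappable characters with a question mark by default, the encoder is explicitly configured to use a space, mirroring the behavior described in the preceding paragraph. All class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetMappingSketch {

    /** Reads UTF-8 bytes as if they were ISO-8859-1: no mapping takes place. */
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Maps text into ISO-8859-1; characters without a representation become a space. */
    static String mapToLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { ' ' });
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return StandardCharsets.ISO_8859_1.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("Input could not be encoded", e);
        }
    }

    public static void main(String[] args) {
        // "AB" followed by A with ring above (U+00C5) and sharp s (U+00DF)
        String latinText = "AB\u00C5\u00DF";
        System.out.println(misread(latinText));
        // The same text followed by Hiragana SA (U+3055), which Latin-1 cannot represent
        System.out.println(mapToLatin1(latinText + "\u3055"));
    }
}
```

The first call yields six characters instead of four because each accented character occupies two bytes in UTF-8. The second call keeps the four Latin characters intact and replaces only the Japanese character.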
5.2 Transliteration
Transcription and transliteration are processes that convert characters of one script into
characters of another script, such as converting from Greek to Latin, or from Japanese
Katakana to Latin.
A transliteration uses an invertible mapping, so that a transliteration can be reversed without
information loss. In contrast, a transcription aims to provide non-native speakers with an
approximate pronunciation of a word, based on the pronunciation rules of their own language. In
practice, transliterations are consistent with transcriptions for many character sets, while no real
(i.e. invertible) transliterations exist for most syllabic or ideographic scripts. Thus for the rest of
this document, transliteration is used to denote both transliteration and transcription.
Transliteration surpasses mere character set mapping, which is limited to the mapping between
different numerical representations of a character (see the example in chapter 5.1). A language such
as Japanese with the Katakana, Hiragana and Kanji characters has no direct representation in the
English language. However, each Japanese character has a certain associated sound that can be
approximated using phonetic Latin characters.
[1] Note for users of the previous version: This functionality was provided by the AD_UniString
object (see chapter 4.2.14).
Numerous transliteration schemes have been introduced for different languages. The following
examples show how transliteration works for different languages:
Ä → AE (Latin → Latin)
ĝ → g (Latin → Latin)
个 → ka (Japanese → Latin)
Ж → ZH (Cyrillic → Latin)
We can see that even within the Latin alphabet transliteration can be useful when certain extended
characters cannot be represented in the target character set.
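At its simplest, a transliteration scheme of this kind is a lookup table from source characters to target strings. The following Java sketch applies three of the mappings listed above. It is a toy illustration with hypothetical names and does not reflect how Informatica AddressDoctor implements transliteration; real schemes also take context and casing into account.

```java
import java.util.HashMap;
import java.util.Map;

public class TransliterationTableSketch {

    // Three of the example mappings from the text above.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u00C4', "AE"); // Latin capital A with diaeresis
        TABLE.put('\u011D', "g");  // Latin small g with circumflex
        TABLE.put('\u0416', "ZH"); // Cyrillic capital ZHE
    }

    /** Replaces every character found in the table; all others are copied unchanged. */
    static String transliterate(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String mapped = TABLE.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A with diaeresis followed by "RGER"
        System.out.println(transliterate("\u00C4RGER"));
    }
}
```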
Most languages use only a subset of the sounds a normal human could produce and of course these
subsets differ from language to language. If a sound used by one language cannot be represented
correctly in a different script, it must be approximated: This approximation may be quite inadequate
if the sounds used in the target language for transliteration differ significantly from the sounds in the
original language.
This problem is especially relevant when transliterating languages with very few syllables, such as
Japanese (much less so for Chinese). Here are some examples of circular transliteration (i.e. English
to kana to English) leading to dramatic changes:
Original:       Philippines
Japanese:       フィリピン
Transliterated: Firipin

Original:       Düsseldorf
Japanese:       デュッセルドルフ
Transliterated: Dyusserudorufu

Original:       Beethoven
Japanese:       ベートーベン
Transliterated: Betoben
These transliterated words pose challenges when working with transliterated place names for
non-Asian countries that were previously represented in an Asian language. Examples using
character set mapping and transliteration may be found in chapter 6.15.
One known limitation of the transliteration of Japan addresses from Kanji script is that certain
characters, when they are part of the first name of the contact, are incorrectly transliterated into
Arabic numerals instead of the corresponding Latin letters. The following table shows the Kanji
numerals that can cause this issue, their Arabic equivalents, and the preferred Latin
transliterations.
Kanji Numeral   Arabic Equivalent   Latin Transliteration
一              1                   ichi
二              2                   ni
三              3                   san
四              4                   yon
五              5                   go
六              6                   roku
七              7                   nana
八              8                   hachi
九              9                   kyū
十              10                  jū
5.2.1 Cyrillic Transliteration
Informatica AddressDoctor supports Cyrillic transliteration for the following countries:

Belarus

Kazakhstan

Russia

Macedonia

Ukraine
5.3 Address Element Abstraction
Addresses have developed differently in different countries and cultures. Informatica AddressDoctor
uses mapping and a hierarchical approach to describe the various address elements. At the
foundation of the address “pyramid” is a country. Currently there are 193 United Nations member
countries as well as several dependent and independent territories around the world. Informatica
AddressDoctor covers a total of over 240 countries and territories in its postal reference databases.
A country is often subdivided into provinces or regions. These regions are not always required in
postal addresses, however. Cities or localities belong to a region and buildings are typically assigned
to streets. The buildings themselves can often be subdivided, which is done through the concept of a
sub-building. An example of a sub-building would be a floor or suite number. Organizations then
reside in sub-buildings and are subdivided into departments that in turn employ people (known as
contacts). The following figure visualizes this abstraction model graphically:
Note that not all elements are present (or required) in all cases. Non-business addresses for instance
will lack the Organization and Department elements. Also, different names may be in use for similar
elements: As an example, the territorial subdivision in the USA is called a state. Canada calls it a
province, while Switzerland has given the name Canton to this subdivision. Informatica
AddressDoctor has defined general terms that allow mapping these concepts globally to the
standardized “Item” fields of the AddressObject (see chapters 5.7 and 5.9).
As an example, the AddressObject contains an attribute with the name “Province”. Depending on
the country, this field may either contain the state (USA), the county (UK), the province (Canada),
the prefecture (Japan), the Canton (Switzerland), the Bundesland (Austria or Germany) and so on.
The following figure illustrates this mapping:
[Figure: the general term Province maps to County (e.g. UK), State (e.g. USA), Province (e.g. Canada), Prefecture (Japan), and Kanton (Switzerland).]
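The mapping of the general “Province” field to country-specific terms can be pictured as a simple lookup. The following Java sketch uses the countries named in the text, keyed by ISO 3166-1 alpha-3 codes. The class, the method, and the fallback to the general term for unlisted countries are assumptions made for illustration and are not part of the AddressDoctor API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProvinceTermSketch {

    // Local names for the general "Province" element, taken from the text above.
    private static final Map<String, String> LOCAL_TERM = new LinkedHashMap<>();
    static {
        LOCAL_TERM.put("USA", "State");
        LOCAL_TERM.put("GBR", "County");
        LOCAL_TERM.put("CAN", "Province");
        LOCAL_TERM.put("JPN", "Prefecture");
        LOCAL_TERM.put("CHE", "Canton");
        LOCAL_TERM.put("AUT", "Bundesland");
        LOCAL_TERM.put("DEU", "Bundesland");
    }

    /** Returns the local term for the Province element; unlisted countries fall back to the general term. */
    static String localTermForProvince(String countryIso3) {
        return LOCAL_TERM.getOrDefault(countryIso3, "Province");
    }

    public static void main(String[] args) {
        System.out.println(localTermForProvince("JPN"));
    }
}
```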
Another example is the Japan address system, which is divided into several address levels from
the largest entities down to blocks and buildings. Informatica AddressDoctor can validate Japan
addresses from the postal code and province down to the street level, because these levels are part
of the reference data. The house number level is not included in the reference data and is only
copied from the input.
5.4 Address Parsing
Addresses stored in computer systems were often entered by humans. Frequently, people entering
data do not understand the nature of the information that they enter. Quite often the fields for
storing the data are not sufficient, because they are either not long enough or because there are too
few fields to store addresses from countries all over the world. See section 9.4 on page 159 for a
recommended database layout to store addresses from all around the world.
Computer programs that validate postal addresses often rely on the information provided by the
field names to identify address elements. While this is sufficient in a few cases, most of the time this
information can be misleading because information was entered in the wrong fields. Consignee
information such as names is frequently placed in address fields that are designed to store
street information. Postal codes are entered in city fields and building numbers are placed in the
wrong locations. These are just some of the easier challenges found in international addresses, where
incorrectly fielded data is omnipresent.
Analyzing address elements and assigning them to the proper fields is one of the most difficult
challenges of handling postal addresses. Informatica AddressDoctor implements a parsing engine
that is independent of postal reference data. The parsing engine parses Japanese Kanji addresses
natively.
As a consequence, the parser as implemented in Informatica AddressDoctor can be used without
any postal reference data present and is especially suited for OEM integration scenarios where the
reference data can be added as needed.
The parsing functionality is implemented by the PARSE process mode (and is implicitly included with
the other process modes described later in chapter 5.11). It can either work on fielded data that is
retrieved from address element fields or from totally unfielded data as it can be found in databases
that have just a line by line layout for address data. While structuring an unfielded address seems
more difficult at first, handling the potentially conflicting information in a seemingly fielded address
can be an even more difficult challenge. Here the name of the field might indicate that a street
should be expected but the software has to decide that this “hint” is possibly nonsense and be bold
enough to decide to ignore this information.
Depending on the “OptimizationLevel” attribute set in Parameters.xml (see the DTD in chapter 10.1),
the Informatica AddressDoctor parser behaves differently with respect to fielded input element
assignment (see chapter 5.33 for more detail):
NARROW: The parser will honor input assignment strictly, with the exception of separation of
House Number from Street information.
STANDARD: The parser will separate address elements more actively, for example:
o Province will be separated from Locality information
o PostalCode will be separated from Locality information
o House Number will be separated from Street information
o SubBuilding will be separated from Street information
o DeliveryService will be separated from Street information
o SubBuilding will be separated from Building information
o Locality will be separated from PostalCode information
WIDE: Parser separation will happen similarly to STANDARD, but additionally up to 10 parsing
candidates will be passed to validation for processing. Validation will widen its search tree
and take additional reference data entries into account for matching.
It is very important to note that, except with NARROW, mixed input will result in information
being separated out into different AddressElement items by the parser. For
example, if Street and SubBuilding were jointly assigned as Street on input, the SubBuilding will
be located in a SubBuilding item on output (and not in Street), provided it could be identified as
such.
5.5 Address Validation
When validating an address, each component is compared against a postal reference data set that is
stored in Informatica AddressDoctor reference databases (see section 9.1). All elements of an
address may be correct, thus resulting in positive validation. On the other hand, each individual
element may match against the reference data, but the components do not make sense when
looked at in their combination.
Consider the following example:
City:  Wilmington
ZIP:   90210
State: CA
In this example each component is correct by itself. However, the components do not match, as the
ZIP code does not belong to Wilmington and Wilmington is not in the state of California. Whenever it
is possible, Informatica AddressDoctor will attempt to correct such errors. To do this without
endangering potentially correct data elements and creating “false positives”, great care is taken and
very sophisticated algorithms are used to analyze and potentially correct the data.
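The distinction between elements that are valid individually and elements that are valid in combination can be sketched with a tiny lookup table. The two-entry table below is illustrative only and says nothing about the format or content of the AddressDoctor reference databases; the class and method names are hypothetical, and the sketch performs exact matching only, without the fuzzy matching and correction logic of the engine.

```java
import java.util.HashMap;
import java.util.Map;

public class CombinationCheckSketch {

    // Tiny illustrative reference table: ZIP code -> {city, state}.
    private static final Map<String, String[]> REFERENCE = new HashMap<>();
    static {
        REFERENCE.put("90210", new String[] { "Beverly Hills", "CA" });
        REFERENCE.put("19801", new String[] { "Wilmington", "DE" });
    }

    /** True if city, ZIP code, and state each occur somewhere in the reference table. */
    static boolean existsIndividually(String city, String zip, String state) {
        boolean cityFound = false;
        boolean stateFound = false;
        for (String[] entry : REFERENCE.values()) {
            if (entry[0].equals(city)) {
                cityFound = true;
            }
            if (entry[1].equals(state)) {
                stateFound = true;
            }
        }
        return cityFound && stateFound && REFERENCE.containsKey(zip);
    }

    /** True only if the three elements belong to the same reference entry. */
    static boolean isConsistent(String city, String zip, String state) {
        String[] entry = REFERENCE.get(zip);
        return entry != null && entry[0].equals(city) && entry[1].equals(state);
    }

    public static void main(String[] args) {
        System.out.println(existsIndividually("Wilmington", "90210", "CA"));
        System.out.println(isConsistent("Wilmington", "90210", "CA"));
    }
}
```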
The algorithms used by Informatica AddressDoctor include fuzzy matching and heuristics to predict
the best possible correction for an address. It is always Informatica AddressDoctor’s intention to
correct or improve an address if at all possible. Here it differs from most postal certification schemes
such as the Coding Accuracy Support System (CASS) as introduced by the US Postal Service. These
certification schemes intend to prevent poorly addressed mail from entering the postal mail stream,
thus easing the work of the postal organization. Informatica AddressDoctor’s intent, however, is to
improve as many addresses as possible (see chapter 5.33 for more detail on “OptimizationLevel”).
5.6 Informatica AddressDoctor
Informatica AddressDoctor processes the input addresses contained in AddressObjects when
AD_Process() is called. Because there is only one engine per process, there is no Informatica
AddressDoctor handle.
The engine must be initialized and de-initialized in a specific sequence:

AD_Initialize() must be called first to initialize the engine. It evaluates the settings and
configures the engine accordingly. AD_GetAddressObject() and all other functions may be
called only after AD_Initialize() has returned successfully.

AD_DeInitialize() must be called last to de-initialize the engine; the engine is then ready to
be initialized again. All AddressObjects must have been released by calling
AD_ReleaseAddressObject() before AD_DeInitialize() is called.
The engine stores the following data for its internal use (see SetConfig.dtd):

General engine configuration, for example the maximum amount of memory the engine may
request from the OS

The access codes (unlock codes); at least one valid code must be supplied when calling
AD_Initialize()

Optional preloading parameters for the databases
In addition, the engine stores the following parameter data as a default for the AddressObjects (see
Parameters.dtd in chapter 10.1):

Process parameters, for example the processing mode to be used

Input parameters, for example the input encoding to be used

Format specifications for the result, for example casing specifications
This configuration data has default values as specified by the corresponding SetConfig.dtd. The
defaults can be changed by passing a corresponding configuration XML as a parameter to AD_Initialize().
The configuration cannot be changed after AD_Initialize() has been called. The AddressObjects
use this parameter configuration data by default when no alternative setting is made for
a specific AddressObject.
5.7 AddressObjects
The AddressObject serves as a container object for a postal address. It has several properties that
can store individual components of an address such as postal (ZIP) code, street name and building
number, but also company or contact names.
AddressObjects store the following (configuration) data:

Parameter settings (see Parameters.dtd)

An input address (see InputData.dtd)

A result (see Result.dtd)

The last return code
AddressObjects should not be created and destroyed frequently, but rather be reused for
performance reasons. Specifically, the parameter settings should be reused to avoid repeated
settings overhead.
Informatica AddressDoctor manages all AddressObjects. Only a limited number of AddressObjects
can be created. This number can be set in the initialization phase of the engine (using the
“MaxAddressObjectCount” attribute). The recommended setting is between one and two times the
number of threads, which must be set using the “MaxThreadCount” attribute and is currently limited
to a practical maximum value of 32 threads (see chapter 5.36).
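The sizing recommendation can be expressed as a small calculation. This is a sketch under stated assumptions: ObjectCountRange and recommended_object_count() are hypothetical helpers, not part of the API, and clamping the thread count to the range 1 to 32 is an illustrative choice based on the practical maximum mentioned above.

```c
#include <assert.h>

/* Practical upper limit for MaxThreadCount stated in the text. */
#define MAX_THREADS 32

/* Hypothetical helper type: the recommended MaxAddressObjectCount range. */
typedef struct {
    int min_objects;  /* one AddressObject per thread */
    int max_objects;  /* two AddressObjects per thread */
} ObjectCountRange;

/* Clamp the thread count to 1..32 and derive the recommended range. */
ObjectCountRange recommended_object_count(int thread_count)
{
    ObjectCountRange range;
    if (thread_count < 1) thread_count = 1;
    if (thread_count > MAX_THREADS) thread_count = MAX_THREADS;
    range.min_objects = thread_count;
    range.max_objects = 2 * thread_count;
    assert(range.min_objects <= range.max_objects);
    return range;
}
```

For eight threads this suggests a “MaxAddressObjectCount” between 8 and 16.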
For setting the address elements in an AddressObject (as described in chapter 5.3, examples are
Organization, Department, Street, Province, PostalCode and Locality), see chapter 6.7. For retrieving
them, see chapter 6.11. Retrieving address elements individually is especially useful when the data
to be processed originates from a database that has individual fields for address elements.
Alternatively, the AddressObject may be assigned unfielded FormattedAddressLine data (see chapter
6.7.4), where the address representation is only structured by delimiters such as linefeeds. The
FormattedAddressLine representation is also helpful when retrieving processed address data: It will
return the processed data according to the country specific formatting rules. When assigning
AddressObject values, either the address elements or the FormattedAddressLine representation
should be used, while both representations are provided on output (see the Result.xml example at
the end of chapter 3).
5.8 Input and Output Encoding
The XML encoding is passed within the XML header <?xml ?>; if none is explicitly set, UTF-8 or
UTF-16 is the default (as defined by the XML standard and depending on the bit width chosen as
described below). The different encodings for XML input and output may be specified via attributes
(see chapter 6.3 and SetConfig.dtd/Parameters.dtd in chapter 10.1). For the direct API, the engine
default encoding is ISO-8859-1.
There may be 8 and 16 bit input and result data; to deal with both character sizes, there are two sets
of functions for any Set…() or Get…() functionality:

The 8 bit versions have no special naming (e.g. AD_SetInputDataElement() or
AD_GetResultDataParameter())

The 16 bit versions end in W (for word, e.g. AD_SetInputDataElementW() or
AD_GetResultDataParameterW())
When using the 16 bit API functions it is crucial to have set a corresponding 16 bit encoding,
otherwise an encoding error code is returned. The 16-bit input functions Set...W() also support an
additional parameter for the string length, thereby making it possible to pass non-zero-terminated
strings. To enable passing zero-terminated strings of unknown length, the special value
AD_AUTOLEN can be passed as string length; the engine then automatically determines the length.
The currently active encoding (see Parameters.dtd) must match the function used: when a 16 bit
function is called, the encoding must also be 16 bit (e.g. UTF-16).
This applies in particular to the Java API (see the example in chapter 4.1), which supports 16
bit input and output only, in line with internal Java string handling.
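The matching rule between encoding and function family can be checked before a call is made. The sketch below is illustrative only: is_16bit_encoding() and function_suffix() are hypothetical helpers, and treating every encoding name that starts with "UTF-16" as a 16 bit encoding is an assumption; the full list of encoding names accepted by the engine is defined in Parameters.dtd.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the encoding name starts with "UTF-16" (case-insensitive), else 0.
   Assumption: only UTF-16 variants count as 16 bit encodings in this sketch. */
int is_16bit_encoding(const char *encoding)
{
    static const char prefix[] = "UTF-16";
    size_t i;
    assert(encoding != NULL);
    for (i = 0; prefix[i] != '\0'; ++i) {
        /* A shorter name ends the loop here because '\0' never matches. */
        if (toupper((unsigned char)encoding[i]) != prefix[i]) return 0;
    }
    return 1;
}

/* Suffix of the function family to call: "W" for 16 bit, "" for 8 bit. */
const char *function_suffix(const char *encoding)
{
    return is_16bit_encoding(encoding) ? "W" : "";
}
```

With the direct API default of ISO-8859-1, the helper selects the 8 bit functions without the W suffix.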
5.9 AddressElement Items and AddressLines
Many of the direct API functions have an item or line parameter. The same applies to the XML API,
where XML element attributes are used for that purpose. These parameter numbers refer to the
index or hierarchical level of an address element or line: Items and lines are counted from 1 on, the
default for XML is 1 (see the DTDs in Section 10.1).
To set a street and a dependent street using the direct API in C, the “Item” parameter has to be 1 for
the first and 2 for the second:
AD_SetInputAddressElement( hAOHandle, "Street", 1, NULL, "Main St 5" );
AD_SetInputAddressElement( hAOHandle, "Street", 2, NULL, "Dependent St 8" );
For example, to set 3 formatted address lines, the “Line” parameter has to be set from 1 to 3:
AD_SetInputAddressLine( hAOHandle, "FormattedAddressLine", 1, "AddressDoctor GmbH" );
AD_SetInputAddressLine( hAOHandle, "FormattedAddressLine", 2, "Roentgenstr. 9" );
AD_SetInputAddressLine( hAOHandle, "FormattedAddressLine", 3, "D-67133 Maxdorf" );
Similarly, setting street and dependent street in Java:
m_oAO.setInputAddressElement("Street", 1, "COMPLETE", "Main St 5");
m_oAO.setInputAddressElement("Street", 2, "COMPLETE", "Dependent St 8");
And setting 3 formatted address lines in Java:
m_oAO.setInputAddressLine("FormattedAddressLine", 1, "AddressDoctor GmbH");
m_oAO.setInputAddressLine("FormattedAddressLine", 2, "Roentgenstr. 9");
m_oAO.setInputAddressLine("FormattedAddressLine", 3, "D-67133 Maxdorf");
Refer to chapter 6.7 for understanding the valid combinations of AddressElement Items and
AddressLines for address data input.
In the XML API case, an example for InputData.xml with two items assigned for sub-elements of the
PostalCode (known as ZIP+4, see chapter 6.24) would be:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE InputData SYSTEM "InputData.dtd">
<InputData>
  <AddressElements>
    <Country Item="1" Type="NAME">USA</Country>
    <Locality Item="1" Type="COMPLETE">Raleigh</Locality>
    <PostalCode Item="1" Type="UNFORMATTED">27601</PostalCode>
    <PostalCode Item="2" Type="UNFORMATTED">1356</PostalCode>
    <Province Item="1" Type="COUNTRY_STANDARD">NC</Province>
    <Street Item="1" Type="COMPLETE">Fayetteville Street</Street>
    <Number Item="1" Type="COMPLETE">133</Number>
    <SubBuilding Item="1" Type="COMPLETE">Suite 201</SubBuilding>
    <Organization Item="1" Type="COMPLETE">AddressDoctor</Organization>
  </AddressElements>
</InputData>
Check chapter 6.7 for details on how to process XML input using AD_SetInputDataXML(). In this XML
example all “Type” attributes (see chapter 5.10) correspond to their default values, so omitting them
would yield the same processing results.
The following table gives examples of what certain AddressElement items would typically contain:
Legend:
Typically needed for correct global address representation
Typically not populated for a correct address*
Not available through the Informatica AddressDoctor 5 API
Will be needed for future country support
Not address relevant (will be copied over to output)
*but may contain certain input elements copied over to output
The sequence of AddressElement Items is hierarchical, as defined by reference data. Depending on
reference data detail, some empty Items may thus be followed by ones filled again. Consequently,
the item hierarchy on output is only really meaningful for AddressElements for which reference data
is available (typically the ones that are postally relevant, like locality). Make sure to check the
“ElementResultsStatus” described in chapter 5.27.3 and “ElementRelevance” in chapter 5.27.4 to
decide whether the hierarchical sequence of the output has been retained from input (parsing only,
see chapter 5.4) or was adjusted based on reference data (parsing and validation, see chapter 5.5).
For information on AddressElement Item output from different countries, see Appendix 10.4. For a
more complete introduction to international addresses and their address elements, see “The
Global Source Book for Name and Address Data Management” by Graham Rhind:
http://www.grcdi.nl/book2.htm
5.10 Address Item Types
Normally, each AddressElement Item number should only occur once, although some address
elements may contain several logically separate sub-elements called Item Types. For example, items
of the type TITLE, FIRST_NAME, MIDDLE_NAME, LAST_NAME and FUNCTION may be assigned to the
sub-elements of each “Contact” address element at the same time, while there is little sense in
assigning both the FORMATTED and the UNFORMATTED type for a “PostalCode” address element
(also see the InputData.xml examples in chapter 5.9 and chapter 6.7.2).
For the majority of address elements the default item type is COMPLETE, which may be conveniently
used whenever no separation of address elements into logically separate sub-elements is available
for the input data (see chapter 6.7.2 on fielded assignment of address elements and the
InputData.xml DTD in chapter 10.1 for valid item types): For instance you may assign “Paris Cedex
11” either in one piece, as Locality item 1, Type “COMPLETE” or separately, with “Paris” as Locality
Item 1, Type “NAME” and “Cedex 11” as Locality Item 1, Type “SORTING_CODE”.
The following list describes the valid Item/Type input combinations when the respective default type
is used:
Key:              RECORD_ID and TRANSACTION_KEY may be set concurrently
Organization:     COMPLETE and DEPARTMENT may be set concurrently
Contact:          COMPLETE and FUNCTION and GENDER may be set concurrently (when
                  NAME is used instead, FIRST_NAME, MIDDLE_NAME and LAST_NAME
                  may not be set - see chapter 6.7.2 for more detail)
Province:         COUNTRY_STANDARD, ABBREVIATION and EXTENDED may be set
                  concurrently
PostalCode:       FORMATTED and UNFORMATTED may be set concurrently
Street:           COMPLETE and ADD_INFO* may be set concurrently
Locality:         COMPLETE only
Number:           COMPLETE and ADD_INFO* may be set concurrently
Building:         COMPLETE only
SubBuilding:      COMPLETE only
DeliveryService:  COMPLETE and ADD_INFO* may be set concurrently
* While ADD_INFO could in principle be used to provide additional information on AddressElement input that is supposed to be passed
through validation without change, it is really intended to be filled on output, containing portions (provided such a split-off could be
determined) of not postally relevant AddressElement input that could not be validated against reference data.
It is recommended to refrain from setting “Type” attributes explicitly on input, apart from the
examples given above and in chapter 6.7.2: Omitting them corresponds to their default values,
which yields decent processing results in most practical situations. Under special circumstances,
input item types might need to be adjusted under direction from Informatica AddressDoctor support
(see chapter 9.3). Note that the majority of types documented in the InputData.xml DTD (see
chapter 10.1) is only listed for reasons of symmetry with the Result.xml DTD and not really intended
for actual use on input.
For an overview and explanation of what item types are available on output, refer to the Result.xml
DTD (see chapter 10.1).
5.11 Process Modes
Informatica AddressDoctor supports several validation types. Most of them are country independent
and work for all supported countries. An exception is the CERTIFIED validation type that offers
country specific logic and does not work for all countries. Similarly, the single-line address validation,
which is available in the fast completion mode, is currently available only for select countries. Each
validation type is designed for a specific task.
The validation process modes are:

Correction Only (BATCH)

Suggestions (INTERACTIVE)

Fast Completion (FASTCOMPLETION)

Certified (CERTIFIED)

Address Code Lookup (ADDRESSCODELOOKUP)
The process mode is a parameter to the AD_Process() function of Informatica AddressDoctor. When
calling that function to process an AddressObject, the validation type that was supplied (see chapter
6.3) via AD_SetParametersXML() before the call determines the processing that takes place. Each
AddressObject and thus each individual call of the AD_Process() function may use a different
validation type.
Additionally, two more process modes bypassing validation are available for special pre-processing
purposes:

Obtain separate tokens (possibly including transliteration) from parsed input data without
corrections (PARSE)

Identification of records missing country information, including correction where possible
(COUNTRYRECOGNITION)
See the figure in Appendix 10.3 for more details on the Informatica AddressDoctor processing flow.
Note that process modes might fall back to others as described below, so it is recommended practice
to check that the process mode used was the one intended, both by interpreting the process status
value (see chapter 5.17) and by checking directly (see chapter 6.10).
5.11.1 Batch
The Correction Only (also known as “BATCH”) type is intended to be used in batch processing
environments when no human input or selection is possible. It is optimized for speed and will
terminate its attempts to correct an address when ambiguous data is encountered that cannot be
corrected automatically. The Batch processing mode will fall back to Parse Only (see chapter 5.11.7)
when the respective database is missing for a specific country.
5.11.2 Interactive
When working in interactive environments, it is often useful to generate suggestions when an
address input is ambiguous. This can be achieved by using the suggestions validation type, known
as “INTERACTIVE”. This validation type is especially useful in Web based data entry
environments when capturing data from customers or prospects. It requires the input of an almost
complete address and will attempt to validate or correct the data provided. If ambiguities are
detected, this validation type will generate up to 100 suggestions that can be used for pick lists (the
maximum number of suggestions can be controlled by the MaxResultCount parameter in
SetConfig.xml and Parameters.xml). The Interactive processing mode will fall back to Parse Only (see
chapter 5.11.7) when the respective database is missing for a specific country.
5.11.3 Fast Completion
The Fast Completion validation type is used in quick address entry applications. It allows input of
truncated data in several address fields and will generate suggestions for this input. Due to its fast
response time, the engine can also be used to create suggestions while users type. The Fast
Completion type is best suited when users are aware that they can purposely truncate input data.
However, the “FASTCOMPLETION” validation type does not support extended parsing and thus can
only be used when assigning fielded input data (see chapter 6.7) using the “AddressElement” Items
(see chapter 5.9), as opposed to AddressLine input using “FormattedAddressLine” or
“DeliveryAddressLine”. The Fast Completion processing mode will fall back to Parse Only (see
chapter 5.11.7) when the respective database is missing for a specific country.
Effective in version 5.3.0 of Informatica AddressDoctor, the upper limit of the suggestion list has
been increased from twenty to one hundred results. You can specify the upper limit of the results
returned in the MaxResultCount parameter in SetConfig.xml as well as in Parameters.xml. The
default value is “20” and can be overridden by the user, for example with “100”. The maximum
value for this parameter is one hundred, and the minimum value is one.
Specifying a value greater than the maximum will result in an error. Note that specifying a greater
value in Parameters.xml than in SetConfig.xml will result in a reduction of the value in
Parameters.xml to the value in SetConfig.xml; this will be reported back by the warning
AD_SC_WRN_MAXRESULTCOUNT_REDUCED.
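The interaction of the two MaxResultCount settings can be modelled as follows. This is a sketch, not engine code: ResultLimit and resolve_max_result_count() are hypothetical names, and treating a value below the minimum as an error in the same way as a value above the maximum is an assumption.

```c
#include <assert.h>

#define RESULT_COUNT_MIN 1
#define RESULT_COUNT_MAX 100

/* Hypothetical result type for this sketch. */
typedef struct {
    int value;    /* effective MaxResultCount, or -1 on error */
    int reduced;  /* 1 if the Parameters.xml value was lowered */
} ResultLimit;

/* Combine the SetConfig.xml and Parameters.xml values as described in the text. */
ResultLimit resolve_max_result_count(int set_config_value, int parameters_value)
{
    ResultLimit r = { -1, 0 };
    if (set_config_value < RESULT_COUNT_MIN || set_config_value > RESULT_COUNT_MAX ||
        parameters_value < RESULT_COUNT_MIN || parameters_value > RESULT_COUNT_MAX) {
        return r;  /* out of range: treated as an error */
    }
    if (parameters_value > set_config_value) {
        r.value = set_config_value;
        r.reduced = 1;  /* case reported as AD_SC_WRN_MAXRESULTCOUNT_REDUCED */
    } else {
        r.value = parameters_value;
    }
    assert(r.value >= RESULT_COUNT_MIN && r.value <= RESULT_COUNT_MAX);
    return r;
}
```

With the default of 20 in SetConfig.xml and a request for 100 in Parameters.xml, the effective limit stays at 20 and the reduction flag is set.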
5.11.4 Single-Line Address Validation
The single-line address validation feature is a new addition to the Fast Completion mode starting
with Version 5.5.0. You can use single-line address validation to validate addresses entered into the
AddressComplete element as a single line and receive suggestions to complete the address.
Informatica AddressDoctor Version 5.5.0 supports single-line address validation for the following
countries:

Australia

Canada

Germany

Great Britain

New Zealand

United States
To activate single-line address validation, you need a separate unlock code of type
SINGLE_LINE_VALIDATION. Contact your sales representative for more information about obtaining
the unlock code.
Informatica AddressDoctor identifies address elements in a single-line address input based on their
position in the sequence in which the elements are entered. It is therefore imperative that you follow
the order shown in Table 1 when you enter single-line addresses. When you enter an address as a
single line, ensure that you do not mix Delivery Address Line (DAL) elements and Country-Specific
Location Line (CSLLN) elements.
Table 1 Country-Specific Order of Address Elements
Country         Order of Address Elements
Australia       Sub-building, House Number, Street, Main Locality, Province, Postal Code
Canada          Sub-building, House Number, Street, Delivery Service, Main Locality,
                Province, Postal Code
Germany         Street, House Number, Postal Code, Locality, Province
Great Britain   Sub-building, House Number, Street, Main Locality, Sub-Locality, Postal Code
New Zealand     Sub-building, House Number, Street, Delivery Service, Locality, Postal Code
United States   Sub-building, House Number, Street, Locality, Province, Postal Code
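An application that prompts users for single-line input may want to display the expected element order for the selected country. The lookup below is a sketch: SingleLineOrder and single_line_order() are hypothetical names, and keying the table by three-letter country codes is an assumption made for illustration, since Table 1 lists country names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lookup record for this sketch. */
typedef struct {
    const char *iso3;   /* assumed three-letter key; Table 1 uses country names */
    const char *order;  /* element order copied from Table 1 */
} SingleLineOrder;

static const SingleLineOrder SINGLE_LINE_ORDERS[] = {
    { "AUS", "Sub-building, House Number, Street, Main Locality, Province, Postal Code" },
    { "CAN", "Sub-building, House Number, Street, Delivery Service, Main Locality, Province, Postal Code" },
    { "DEU", "Street, House Number, Postal Code, Locality, Province" },
    { "GBR", "Sub-building, House Number, Street, Main Locality, Sub-Locality, Postal Code" },
    { "NZL", "Sub-building, House Number, Street, Delivery Service, Locality, Postal Code" },
    { "USA", "Sub-building, House Number, Street, Locality, Province, Postal Code" }
};

/* Returns the element order for a country, or NULL if the country is not in Table 1. */
const char *single_line_order(const char *iso3)
{
    size_t i;
    assert(iso3 != NULL);
    for (i = 0; i < sizeof SINGLE_LINE_ORDERS / sizeof SINGLE_LINE_ORDERS[0]; ++i) {
        if (strcmp(iso3, SINGLE_LINE_ORDERS[i].iso3) == 0) {
            return SINGLE_LINE_ORDERS[i].order;
        }
    }
    return NULL;
}
```

A NULL result marks a country outside Table 1, for which single-line address validation is not available in Version 5.5.0.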
If you did not enable single-line address validation, Informatica AddressDoctor returns the process
status code N7 (Feature Not Unlocked). If the input maps to a country that is not supported for
single-line address validation, Informatica AddressDoctor returns the process status code N6,
denoting that single-line address validation is not supported for the specified country.
As you can see from Table 1, the typical sequence of address elements is from the specific to the
generic. You must enter the elements in the specified sequence even if you leave out some of the
elements from the input. However, for optimum results, we recommend that you provide as many
details as possible in the input. Even though delimiters are not mandatory in a single-line address
input, a comma or semicolon in the input is considered as an element separator and might fetch
better suggestions. Note that Informatica AddressDoctor currently does not support country,
organization, building, or contact information in the single-line address input.
If the single-line address input contains only a numeric input, Informatica AddressDoctor considers it
as the Postal Code and returns suggestions accordingly. For countries where the house number
appears on the left side of the street name or locality, if the single-line address input begins with a
number that is followed by a string, Informatica AddressDoctor considers the number as a house
number and the following string as the street name or locality. If no match is found for this
combination, Informatica AddressDoctor attempts to interpret the input as street name without
house number or as a combination of postal code and locality.
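The first interpretation step described above can be sketched as a simple classifier. This is an illustration of the documented rules only, not the engine's parser: SingleLineGuess and guess_single_line() are hypothetical names, the classification applies to countries where the house number precedes the street name, and the fallback interpretations that follow a failed match are not modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical first-guess categories for a single-line input. */
typedef enum {
    SL_EMPTY,             /* nothing but whitespace */
    SL_POSTAL_CODE,       /* numeric input only */
    SL_NUMBER_THEN_TEXT,  /* leading number followed by a string */
    SL_TEXT               /* anything else, e.g. input starting with a street name */
} SingleLineGuess;

/* Classify the input according to the rules stated in the text. */
SingleLineGuess guess_single_line(const char *input)
{
    const char *p;
    int non_digits = 0;
    assert(input != NULL);
    while (isspace((unsigned char)*input)) ++input;  /* skip leading blanks */
    if (*input == '\0') return SL_EMPTY;
    for (p = input; *p != '\0'; ++p) {
        if (!isdigit((unsigned char)*p) && !isspace((unsigned char)*p)) ++non_digits;
    }
    if (non_digits == 0) return SL_POSTAL_CODE;
    if (isdigit((unsigned char)*input)) return SL_NUMBER_THEN_TEXT;
    return SL_TEXT;
}
```

For example, the input “27601” is treated as a postal code, while “133 Fayetteville Street” is first read as a house number followed by a street name or locality.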
When there is no perfect match for an input, Informatica AddressDoctor returns multiple
suggestions to help you choose the most appropriate result. The maximum number of suggestions
that Informatica AddressDoctor returns is determined by the value configured for MaxResultCount in
the Parameters.xml and SetConfig.xml files.
5.11.5 Certified
A number of countries have special requirements for the processing of their addresses. An example
of such a special processing requirement is the CASS certification of the
United States Postal Service (USPS). In order to process addresses compliant with a certification
scheme, the validation type “CERTIFIED” is available. Informatica AddressDoctor supports all
certifications required by the major postal administrations of the world: CASS certification for the
USA, SERP certification for Canada (CAN), AMAS certification for Australia (AUS), SNA certification
for France (FRA) and SendRight certification for New Zealand (NZL). The Certified processing mode
will fall back to Batch if it is not supported for a specific country. Note that extended parsing of
unfielded data is not supported in the “CERTIFIED” processing mode (see chapters 5.5 and 6.24 for
the differences between certified and normal processing).
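The certification list and the fallback rules described for the Batch, Interactive, Fast Completion and Certified modes can be summarized in code. The sketch below is a model of the documented behaviour, not a call into the engine: certification_for() and effective_mode() are hypothetical helpers, and chaining the two fallbacks (Certified to Batch, then Batch to Parse when the database is missing) is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Returns the certification scheme for a country code, or NULL if none is listed. */
const char *certification_for(const char *iso3)
{
    static const char *const table[][2] = {
        { "USA", "CASS" }, { "CAN", "SERP" }, { "AUS", "AMAS" },
        { "FRA", "SNA" }, { "NZL", "SendRight" }
    };
    size_t i;
    assert(iso3 != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(iso3, table[i][0]) == 0) return table[i][1];
    }
    return NULL;
}

/* Models the process mode that would actually be used after fallbacks. */
const char *effective_mode(const char *requested, const char *iso3, int has_database)
{
    assert(requested != NULL);
    /* Certified falls back to Batch when the country has no certified mode. */
    if (strcmp(requested, "CERTIFIED") == 0 && certification_for(iso3) == NULL) {
        requested = "BATCH";
    }
    /* Batch, Interactive and Fast Completion fall back to Parse Only
       when the country database is missing. */
    if (!has_database &&
        (strcmp(requested, "BATCH") == 0 ||
         strcmp(requested, "INTERACTIVE") == 0 ||
         strcmp(requested, "FASTCOMPLETION") == 0)) {
        return "PARSE";
    }
    return requested;
}
```

Because fallbacks happen silently from the caller's point of view, comparing the requested mode with the mode reported after processing remains the reliable check.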
5.11.6 Address Code Lookup
In Version 5.4.1 and later, Informatica AddressDoctor offers Address Code Lookup. Address Code
Lookup is a new process mode that has its own unlock code. With Address Code Lookup, you can
enter a country specific address code and retrieve the complete or partial address for the code.
Address Code Lookup is currently supported for the following countries:

Germany

Great Britain

Japan

South Africa

Serbia
For example, the Choumei Aza code is an eleven-digit code that defines a unique delivery point for
Japanese addresses. You can search on the Choumei Aza code to find the associated address. You
can also use a combination of the Choumei Aza code and the Gaiku code, which is a four-digit code
that identifies a city block in Japan, to find more precise results.
To use address code lookup, you must download the Address Code Lookup database, <XYZ>5AC.MD,
and specify the value ADDRESS_CODE_LOOKUP for the Type attribute of the Database parameter in
the SetConfig.xml file. In addition, you must specify the value ADDRESS_CODE_LOOKUP for the
UnlockCode attribute Type in the GetConfig.xml file to indicate that the Address Code Lookup
database should be unlocked.
The following table describes the values that you can specify for the Type attribute of the
AddressCode parameter in the InputData.xml file.
Address Code Type      Country        Description
DEU_AGS                Germany        The Amtliche Gemeindeschlüssel (AGS) is a variable length
                                      code that uniquely identifies a locality in Germany. There
                                      may be more than one locality for a given AGS code.
                                      For example, a DEU_AGS code with a value of 07338018
                                      returns the following output:
                                        Locality: Maxdorf
                                        Province: Rheinland-Pfalz
DEU_LOCALITY_ID        Germany        The Locality ID is a variable length code that uniquely
                                      identifies a German locality.
                                      For example, a DEU_LOCALITY_ID code with a value of
                                      68015519 returns the following output:
                                        Locality: Maxdorf
                                        Province: Rheinland-Pfalz
DEU_STREET_ID          Germany        The Street ID is a variable length code that uniquely
                                      identifies a German street address.
                                      For example, a DEU_STREET_ID code with a value of
                                      100560690 returns the following output:
                                        Röntgenstr.
                                        67133 Maxdorf
                                        Germany
GBR_UDPRN              Great Britain  The Unique Delivery Point Reference Number (UDPRN) is
                                      an eight character code that uniquely identifies each postal
                                      address of the Royal Mail PAF database.
                                      For example, a GBR_UDPRN code with a value of 15511432
                                      returns the following output:
                                        Flat 16
                                        Haden Court
                                        Lennox Road
                                        London
                                        N4 3HS
                                        United Kingdom
JPN_CHOUMEI_AZA_CODE   Japan          The Choumei Aza code is an eleven-digit code that defines
                                      a unique delivery point for Japan addresses.
                                      For example, a JPN_CHOUMEI_AZA_CODE of 28201160001
                                      returns the following output:
                                        〒670-0081 兵庫県姫路市田寺東1丁目
                                      Or:
                                        01 Chome
                                        Taderahiga-shi
                                        Himeji-shi Hyogo-ken 670-0081
                                        Japan
SRB_PAK                Serbia         The Postal Address Code (PAK) is a six digit code that
                                      defines a unique Serbian address to the street level.
                                      For example, a SRB_PAK code with a value of 251133
                                      returns the following output:
                                        Majora Ilica 1
                                        14000 Valjevo
                                        Serbia
ZAF_NADID              South Africa   The National Address Database (NAD) ID is a unique
                                      numeric ID assigned to each South African street address.
                                      For example, a ZAF_NADID code with a value of 2170232
                                      returns the following output:
                                        4 Balmoral Road
                                        Vincent
                                        East London
                                        5247
                                        South Africa
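Before an address code is submitted, its length can be checked against the fixed lengths stated in the table. This is a sketch with hypothetical helper names (expected_code_length(), code_length_ok()); types that the table describes as variable length, or for which no length is stated, are only checked for being non-empty, which is an assumption of this example.

```c
#include <assert.h>
#include <string.h>

/* Returns the fixed code length for a type, 0 if no fixed length is documented,
   or -1 if the type is not one of the documented Address Code Types. */
int expected_code_length(const char *type)
{
    static const struct { const char *type; int length; } table[] = {
        { "DEU_AGS", 0 }, { "DEU_LOCALITY_ID", 0 }, { "DEU_STREET_ID", 0 },
        { "GBR_UDPRN", 8 }, { "JPN_CHOUMEI_AZA_CODE", 11 },
        { "SRB_PAK", 6 }, { "ZAF_NADID", 0 }
    };
    size_t i;
    assert(type != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(type, table[i].type) == 0) return table[i].length;
    }
    return -1;
}

/* Returns 1 if the code is non-empty and matches the documented length, else 0. */
int code_length_ok(const char *type, const char *code)
{
    int expected = expected_code_length(type);
    size_t len;
    assert(code != NULL);
    len = strlen(code);
    if (expected < 0 || len == 0) return 0;
    return expected == 0 || len == (size_t)expected;
}
```

The check only catches obviously malformed input; whether a code exists is decided by the Address Code Lookup database.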
5.11.7 Parse Only
For separating address input into tokens for subsequent processing in other systems, bypassing
Informatica AddressDoctor validation, the special process mode “PARSE” can be used (see chapter
5.4 for a general introduction on parsing and chapter 6.9 for an example of address parsing).
A typical use case scenario for this mode might be that address data of already high quality simply
needs to be tokenized quickly for export to an external system, possibly including transliteration (see
the “PreferredScript” parameter in chapter 5.12.1), formatting (see chapter 5.13) and
standardization (see chapter 5.14) of the output.
5.11.8 Country Recognition
Sometimes input data lacks country information, which is crucial for successful Informatica
AddressDoctor processing. To identify such problematic records quickly, without having to run the
data set through full validation, a special process mode “COUNTRYRECOGNITION” is provided. This
functionality is the first step of the Informatica AddressDoctor processing flow and thus part of all
process modes (see Appendix 10.3).
This process mode will also attempt to amend missing country information where possible, based on
characteristic information like major locality or territory names (see chapter 5.17 for the possible Rx
process status values). Note that such attempts at adding country information can only succeed
where the information identified is absolutely unambiguous: For example, there is a Berlin in
Germany as well as in numerous US states, South Africa, Colombia and El Salvador. In addition,
“MA” might refer to the ISO2 code for Morocco or the US state of Massachusetts.
5.12 Process Parameters
Numerous parameters pertaining to processing may be specified through Parameters.xml, see the
DTD definition in the Appendix (chapter 10.1). These parameters may usually be defined with a
global Informatica AddressDoctor scope or a per AddressObject scope, an example is given in
chapter 6.3.
5.12.1 The PreferredScript Parameter
The “PreferredScript” attribute of the “Result” element is used to specify in which alphabet the
output should be returned (see the Character Set Mapping and Transliteration chapters 5.1 and 5.2):
DATABASE            All Unicode characters (as per reference database standard)
POSTAL_ADMIN_PREF   All Unicode characters (as preferred by local postal administration)
POSTAL_ADMIN_ALT    All Unicode characters (local postal administration alternative)
ASCII_SIMPLIFIED    ASCII characters
ASCII_EXTENDED      ASCII characters with expansion of special characters
                    (for example: Ö = OE)
LATIN               Latin characters
LATIN_1             Latin I characters
LATIN_ALT           Latin characters (alternative transliteration)
PRESERVE_INPUT      Same characters as the input address (available only for Belarus,
                    China, Greece, Japan, Kazakhstan, Macedonia, Russia, and Ukraine)
The default setting for the “PreferredScript” attribute is “DATABASE”. The alphabet in which the
data is returned differs from country to country. For most countries the output will be Latin I or ASCII
regardless of the selected preferred script.
NOTE: If the input contains address elements that are not in the corresponding database,
Informatica AddressDoctor copies such elements to the output in the script in which the
address was input, irrespective of the value set for the PreferredScript parameter.
If the PreferredScript parameter is set to “PRESERVE_INPUT”, Informatica AddressDoctor preserves
the alphabet of the input address. If the input contains more than one script, Informatica AddressDoctor
overrides the PRESERVE_INPUT configuration and returns the address in the default script of the
reference database.
For example, if a Japanese address contains fields with both Kanji and Latin characters and the
PreferredScript parameter is set to PRESERVE_INPUT, Informatica AddressDoctor returns all address
fields using Kanji characters because that is the default script for Japanese addresses in the reference
database.
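The PRESERVE_INPUT rule amounts to a check whether the input uses exactly one script. The following sketch shows one way a caller could predict the outcome; it is not the engine's detection logic. Script, script_of() and input_script() are hypothetical names, the Unicode ranges are a simplification, and grouping Kana and Han characters into a single class is an assumption, since Japanese addresses routinely combine both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical script classes for this sketch. */
typedef enum {
    SCRIPT_NONE,      /* digits, spaces, punctuation: no script information */
    SCRIPT_LATIN,
    SCRIPT_GREEK,
    SCRIPT_CYRILLIC,
    SCRIPT_CJK,       /* Kana and Han characters, grouped by assumption */
    SCRIPT_MIXED      /* more than one script: the database default script applies */
} Script;

/* Classify one Unicode code point using simplified block ranges. */
static Script script_of(unsigned long cp)
{
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') ||
        (cp >= 0x00C0 && cp <= 0x024F)) return SCRIPT_LATIN;
    if (cp >= 0x0370 && cp <= 0x03FF) return SCRIPT_GREEK;
    if (cp >= 0x0400 && cp <= 0x04FF) return SCRIPT_CYRILLIC;
    if ((cp >= 0x3040 && cp <= 0x30FF) || (cp >= 0x4E00 && cp <= 0x9FFF)) return SCRIPT_CJK;
    return SCRIPT_NONE;
}

/* Returns the single script of the input, SCRIPT_MIXED if several scripts occur,
   or SCRIPT_NONE if no character carries script information. */
Script input_script(const unsigned long *cps, size_t count)
{
    Script seen = SCRIPT_NONE;
    size_t i;
    assert(cps != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        Script s = script_of(cps[i]);
        if (s == SCRIPT_NONE) continue;
        if (seen == SCRIPT_NONE) seen = s;
        else if (seen != s) return SCRIPT_MIXED;
    }
    return seen;
}
```

A SCRIPT_MIXED result corresponds to the case in which PRESERVE_INPUT is overridden and the default script of the reference database is returned.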
For countries that use an alphabet other than Latin I, the returned alphabet differs from country to
country.
The following table shows how the output is returned for specific countries. The table is split into
two parts. (ISO) and (BGN) mark output that is transliterated by the ISO or the BGN standard;
(Mandarin) and (Cantonese) mark the respective transliteration.

Country  DATABASE  POSTAL_ADMIN_PREF  POSTAL_ADMIN_ALT  ASCII_SIMPLIFIED  ASCII_EXTENDED
BLR      Cyrillic  Cyrillic           Cyrillic          Yes               Yes
CHN      Hanzi     Hanzi              Hanzi             Yes               Yes
CRI      Latin-1   Latin-1            Latin-1           Yes               Yes
CZE      Latin-2   Latin-2            Latin-2           Yes               Yes
GRC      Greek     Greek              Greek             Yes               Yes
HKG      ASCII     ASCII              ASCII             Yes               Yes
HUN      Latin-2   Latin-2            Latin-2           Yes               Yes
ISR      ASCII     ASCII              ASCII             Yes               Yes
JPN      Kanji     Kanji              Kana              Yes               Yes
KAZ      Cyrillic  Cyrillic           Cyrillic          Yes               Yes
KOR      ASCII     ASCII              ASCII             Yes               Yes
LAT      Latin-7   Latin-7            Latin-7           Yes               Yes
MDA      Latin-2   Latin-2            Latin-2           Yes               Yes
MKD      Cyrillic  Cyrillic           Cyrillic          Yes               Yes
POL      Latin-2   Latin-2            Latin-2           Yes               Yes
ROM      Latin-3   Latin-3            Latin-3           Yes               Yes
RUS      Cyrillic  Cyrillic           Cyrillic          Yes               Yes
SVK      Latin-2   Latin-2            Latin-2           Yes               Yes
TWN      ASCII     ASCII              ASCII             Yes               Yes
UKR      Cyrillic  Cyrillic           Cyrillic          Yes               Yes

Country  LATIN             LATIN_ALT                LATIN_1
BLR      Latin-2 (ISO)     ASCII (BGN)              ASCII (BGN)
CHN      Latin (Mandarin)  Latin (Cantonese)        Latin-1
CRI      ASCII             Latin-1                  Latin-1
CZE      ASCII             Latin-2                  Latin-1
GRC      Latin-1 (ISO)     ASCII (BGN)              ASCII
HKG      ASCII             ASCII                    ASCII
HUN      ASCII             Latin-2                  Latin-1
ISR      ASCII             ASCII                    ASCII
JPN      Latin-7           Latin-7                  ASCII
KAZ      Latin-2 (ISO)     ASCII (BGN)              ASCII (BGN)
KOR      ASCII             ASCII                    ASCII
LAT      ASCII             Latin-7                  ASCII
MDA      ASCII             Latin-2                  Latin-1
MKD      ASCII             ASCII (Macedonian BGN)   ASCII (Macedonian BGN)
POL      ASCII             Latin-2                  Latin-1
ROM      ASCII             Latin-3                  Latin-1
RUS      Latin-2 (ISO)     ASCII (BGN)              ASCII (BGN)
SVK      ASCII             Latin-2                  Latin-1
TWN      ASCII             ASCII                    ASCII
UKR      ASCII             ASCII (Ukrainian BGN)    ASCII (Ukrainian BGN)
Countries not listed in the table use the default output setting described previously. Examples using
different scripts can be found in chapter 6.15.
Here are some examples that show output based on the PRESERVE_INPUT setting.

Example 1 – Japan address in Kanji script:
<InputData>
  <AddressElements>
    <Country Item="1" Type="NAME">JAPAN</Country>
    <Locality Item="1" Type="COMPLETE">オオサカシ</Locality>
    <Locality Item="2" Type="COMPLETE">ミヤコジマク</Locality>
    <Locality Item="3" Type="COMPLETE">ウチンダイチョウ</Locality>
    <PostalCode Item="1" Type="UNFORMATTED">〒534-0013</PostalCode>
    <Province Item="1" Type="COUNTRY_STANDARD">オオサカフ</Province>
    <Street Item="1" Type="COMPLETE">02 チョウメ</Street>
  </AddressElements>
</InputData>
With PreferredScript set to PRESERVE_INPUT, the output is in Kanji script:
<FormattedAddressLine Line="1">〒5340013オオサカフオオサカシミヤコジマクウチンダイチョウ2チョウメ</FormattedAddressLine>

Example 2 – Japan address in mixed input, Kanji and Latin:
<InputData>
<AddressElements>
Informatica AddressDoctor Documentation – Last Revision: 5-Nov-14 @ 12:26
58
<Country Item="1" Type="NAME">JAPAN</Country>
<Locality Item="1" Type="COMPLETE">ŌSAKA-SHI</Locality>
<Locality Item="2" Type="COMPLETE">MIYAKOJIMA-KU</Locality>
<Locality Item="3" Type="COMPLETE">UCHINDAI-CHŌ</Locality>
<PostalCode Item="1" Type="UNFORMATTED">534-0013</PostalCode>
<Province Item="1" Type="COUNTRY_STANDARD">ŌSAKA-FU</Province>
<Street Item="1" Type="COMPLETE">2 CHŌME</Street>
<Organization Item="1" Type="COMPLETE">オ</Organization>
</AddressElements>
</InputData>
The output in this case, where the input contains both Latin and Kanji scripts, is in Kanji
script, which is the default script of the Japan address database.
<FormattedAddressLine Line="1">〒534-0013 大阪府大阪市都島区内代町2丁目 オ</FormattedAddressLine>

Example 3 – Russian address input in Latin script:
<InputData>
<AddressElements>
<Country Item="1" Type="NAME">RUSSIAN FEDERATION</Country>
<Locality Item="1" Type="COMPLETE">Majma</Locality>
<PostalCode Item="1" Type="UNFORMATTED">649100</PostalCode>
<Province Item="1" Type="COUNTRY_STANDARD">Altaj</Province>
<Street Item="1" Type="COMPLETE">ul. Celinnaâ</Street>
<Number Item="1" Type="COMPLETE">1</Number>
</AddressElements>
</InputData>
The output in this case is in Latin script:
<FormattedAddressLine Line="1">ul. Celinnaâ 1</FormattedAddressLine>
<FormattedAddressLine Line="2">Majma</FormattedAddressLine>
<FormattedAddressLine Line="3">Altaj</FormattedAddressLine>
<FormattedAddressLine Line="4">649100</FormattedAddressLine>

Example 4 – Russian address in mixed input that contains Cyrillic and Latin scripts:
<InputData>
<AddressElements>
<Country Item="1" Type="NAME">RUSSIAN FEDERATION</Country>
<Locality Item="1" Type="COMPLETE">Majma</Locality>
<PostalCode Item="1" Type="UNFORMATTED">649100</PostalCode>
<Province Item="1" Type="COUNTRY_STANDARD">Altaj</Province>
<Street Item="1" Type="COMPLETE">ul. Celinnaâ</Street>
<Number Item="1" Type="COMPLETE">1</Number>
<Organization Item="1" Type="COMPLETE">й</Organization>
</AddressElements>
</InputData>
The output in this case, where the input contains both Latin and Cyrillic scripts, contains only
Cyrillic because that is the default script of the Russian address database.
<FormattedAddressLine Line="1">Й</FormattedAddressLine>
<FormattedAddressLine Line="2">ул. Целинная 1</FormattedAddressLine>
<FormattedAddressLine Line="3">Майма</FormattedAddressLine>
<FormattedAddressLine Line="4">Алтай</FormattedAddressLine>
<FormattedAddressLine Line="5">649100</FormattedAddressLine>
5.12.2 The PreferredLanguage Parameter
The “PreferredLanguage” attribute of the “Result” element is used to specify the language in which
the output should be returned. The default setting for “PreferredLanguage” is “DATABASE”.
The alphabet in which the data is returned differs from country to country (see 5.12.1), but for most
countries the output will be Latin, regardless of the selected preferred language:
Value              Description
DATABASE           Language derived from reference data for each address.
ENGLISH            English locality and province name output, if available.
ALTERNATIVE_1,2,3  Alternative languages for multi-language countries. See the table
                   below. If no alternative is provided as part of the postal reference
                   data, this setting will revert to the default, which is “DATABASE”.
PRESERVE_INPUT     Return output in the same language the input was provided in.
Alternative Language Options
Customers can specify the language of the output address or preserve the language of the input
address for Belgium and Canada. For example, users can output a French-language address in
Flemish or preserve the input language.
The following table describes the PreferredLanguage values that you can define for Belgium and
Canada addresses:
Value          Language Output for Belgium  Language Output for Canada
ALTERNATIVE_1  Flemish                      English
ALTERNATIVE_2  French                       French
ALTERNATIVE_3  German                       [no language]
5.12.1 Multi-Language Support for Belgium
Customers in Belgium can specify the language of the output or preserve the language of the input
address.
Belgium Address Example 1:
For the following French-language address, the user specifies “PRESERVE_INPUT” as the
PreferredLanguage value:
PreferredLanguage = PRESERVE_INPUT
Street/HNO = Rue Royale 4
Locality = Bruxelles
Therefore, the address output is in French. If the option is set to “DATABASE”, the official
language from the reference database is used for the output.
Belgium Address Example 2:
For the following French-language address, the user specifies German as the output language:
PreferredLanguage = ALTERNATIVE_3
Street/HNO = Rue Royale 4
Locality = Bruxelles
Street element: “Rue Royale” is the French value and also the database value; “Koningsstraat” is the
Flemish value of this particular street. The street is not available in German in the database, so
“Rue Royale”, the database value, is also returned as the German value of the street element.
Locality element: The locality for this address is available in French, Flemish and German. French is
also the default value.
Therefore the resulting address will be:
Street/HNO = Rue Royale 4
Locality = Brüssel (German)
The street defaults to the DATABASE value because no ALTERNATIVE_3 (German) value is available for
this record.
The resulting formatted address is:
Rue Royale 4
1000 Brüssel
Note that if the preferred language is not available for an element in the reference database, the
resulting language for that element defaults to “DATABASE”. Therefore your final address may appear
in multiple languages (for example, Street in French and Locality in German). This issue will be
addressed in a future release by providing additional information about the language via the new
status codes.
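The per-element fallback described above can be sketched in Python. This is a minimal illustration, not the engine's implementation: the `REFERENCE_VALUES` dictionary and the `pick_value` helper are hypothetical, and the values are taken from Belgium Address Example 2 (plus the Flemish locality name "Brussel").

```python
# Illustrative values only; the real reference data is not exposed in this form.
REFERENCE_VALUES = {
    "Street": {
        "DATABASE": "Rue Royale",
        "ALTERNATIVE_1": "Koningsstraat",  # Flemish
        "ALTERNATIVE_2": "Rue Royale",     # French
        # no ALTERNATIVE_3 (German) value for this street
    },
    "Locality": {
        "DATABASE": "Bruxelles",
        "ALTERNATIVE_1": "Brussel",
        "ALTERNATIVE_2": "Bruxelles",
        "ALTERNATIVE_3": "Brüssel",
    },
}


def pick_value(element: str, preferred_language: str) -> str:
    """Return the value in the preferred language, else the DATABASE value."""
    values = REFERENCE_VALUES[element]
    return values.get(preferred_language, values["DATABASE"])


street = pick_value("Street", "ALTERNATIVE_3")      # falls back to DATABASE
locality = pick_value("Locality", "ALTERNATIVE_3")  # German value exists
```

Because the fallback is applied element by element, the street and the locality can end up in different languages, as in the example above.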
5.12.2 Multi-Language Support for Canada
Customers in Canada can specify the language of the output or preserve the language of the input
address. For example, customers can output an English-language address in French for Québec.
Note that only street descriptors and provinces are available in multiple languages.
Translations for Street Descriptors and Types
The following are the only translations of street types recognized by Canada Post:
Descriptor  English Symbol  French Symbol
STREET      ST              RUE
AVENUE      AVE             AV
BOULEVARD   BLVD            BOUL
The PreferredLanguage parameter is used to output the address in one of the two languages
supported in Canada:
• DATABASE: The official language of the region in Canada, which is English for all provinces
except Québec. DATABASE is the default option.
• ALTERNATIVE_1: English
• ALTERNATIVE_2: French
Customers may use the “PRESERVE_INPUT” value to preserve the language of the input
address.
Canada Address Example:
PreferredLanguage = ALTERNATIVE_1 (English)
Input:               Output:
615 Av Monique       615 Monique Ave
Québec QC G1B 2A8    Québec QC G1B 2A8
Canada               Canada
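As a rough illustration of the street type translation, the following Python sketch maps a street type symbol to the requested language using only the three descriptor pairs listed for Canada Post above. The `translate_street_type` helper is hypothetical, and the sketch deliberately ignores word order (the example above moves the street type from before the name to after it).

```python
# Street type symbols from the Canada Post table: descriptor -> (English, French).
STREET_TYPES = {
    "STREET": ("ST", "RUE"),
    "AVENUE": ("AVE", "AV"),
    "BOULEVARD": ("BLVD", "BOUL"),
}
# Position of each PreferredLanguage value in the tuples above.
LANGUAGE_INDEX = {"ALTERNATIVE_1": 0, "ALTERNATIVE_2": 1}


def translate_street_type(symbol: str, preferred_language: str) -> str:
    """Translate a street type symbol into the requested language."""
    target = LANGUAGE_INDEX[preferred_language]
    for symbols in STREET_TYPES.values():
        if symbol.upper() in symbols:
            return symbols[target]
    return symbol  # unknown symbols pass through unchanged (assumption)


english = translate_street_type("Av", "ALTERNATIVE_1")
```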
5.12.3 The ForceCountryISO3 and DefaultCountryISO3 Parameters
The “ForceCountryISO3” and “DefaultCountryISO3” attributes of the “Input” element allow a certain
degree of influence on country recognition. “ForceCountryISO3” causes address records to always be
treated as originating from the country set here, overriding any explicitly assigned country
element. “DefaultCountryISO3” applies only to records that lack such explicit country
information.
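The precedence between the two attributes can be summarized in a short Python sketch. The `resolve_country` function is a hypothetical helper written for this illustration; it only models the precedence described above, not the engine's country recognition.

```python
from typing import Optional


def resolve_country(
    explicit_iso3: Optional[str] = None,
    force_iso3: Optional[str] = None,
    default_iso3: Optional[str] = None,
) -> Optional[str]:
    """Model the precedence of ForceCountryISO3 and DefaultCountryISO3."""
    if force_iso3:
        return force_iso3      # overrides any explicitly assigned country
    if explicit_iso3:
        return explicit_iso3   # country supplied with the record
    return default_iso3        # used only when the record has no country


forced = resolve_country("FRA", force_iso3="DEU")
defaulted = resolve_country(None, default_iso3="DEU")
```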
5.12.4 The CountryType and CountryOfOriginISO3 Parameters
The “CountryOfOriginISO3” and “CountryType” attributes of the “Result” element are used to
control country information output. “CountryOfOriginISO3” suppresses country information
output for address records originating from the country set here, while “CountryType”
determines the format in which country information is output.
Some possible values for “CountryType” are listed below. The default is “NAME_EN”; see the DTD in
chapter 10.1 for a complete list:
ABBREVIATION
ISO_2
ISO_3
ISO_NUMBER
NAME_CN
NAME_DA
NAME_DE
NAME_EN
NAME_ES
NAME_FI
NAME_FR
NAME_GR
NAME_HU
NAME_IT
NAME_JP
NAME_KR
NAME_NL
NAME_PL
NAME_PT
NAME_RU
NAME_SA
NAME_SE
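The interaction of the two attributes can be sketched as follows. This is a simplified Python model: the `COUNTRY_FORMATS` table and the `country_output` function are hypothetical, contain a single country, and use illustrative spellings rather than the exact strings the engine returns.

```python
from typing import Optional

# Illustrative entries for one country; not taken from the reference data.
COUNTRY_FORMATS = {
    "DEU": {
        "ISO_2": "DE",
        "ISO_3": "DEU",
        "NAME_EN": "Germany",
        "NAME_DE": "Deutschland",
    },
}


def country_output(
    iso3: str,
    country_type: str = "NAME_EN",
    country_of_origin_iso3: Optional[str] = None,
) -> str:
    """Return the country text for the output, or "" when it is suppressed."""
    if iso3 == country_of_origin_iso3:
        return ""  # domestic mail: country information is suppressed
    return COUNTRY_FORMATS[iso3][country_type]


domestic = country_output("DEU", country_of_origin_iso3="DEU")
foreign = country_output("DEU", "ISO_2", country_of_origin_iso3="FRA")
```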
5.12.5 The MatchingAlternatives and MatchingScope Parameters
The “MatchingAlternatives” and “MatchingScope” attributes of the “Process” element are used to
influence the matching of address elements during validation.
“MatchingAlternatives” allows suppressing the use of historical and synonym (or, more
precisely, exonym; see http://wikipedia.org/wiki/Exonym) data for matching address elements.
Valid values are NONE, SYNONYM_ONLY and ARCHIVE_ONLY, with a default of ALL.
Setting “MatchingScope” to a value other than the default “ALL” reduces the granularity of address
elements (see chapter 5.3) for which matching must succeed: “LOCALITY_LEVEL” only considers
matches on province, locality and postcode level, “STREET_LEVEL” extends matching to streets, and
“DELIVERYPOINT_LEVEL” finally adds house number matching.
Refer to the DTD in chapter 10.1 for a complete list of valid attribute values. These attribute
settings may not have an effect for countries lacking the necessary level of detail in the postal
reference data.
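The three MatchingScope levels described above build on each other, which the following Python sketch makes explicit. The `elements_to_match` helper and the element labels are assumptions for illustration; the default “ALL” is omitted because the text does not spell out which additional elements it covers.

```python
# Element levels in the order in which the scopes extend matching.
LEVELS = ["province", "locality", "postcode", "street", "house number"]

# Number of leading levels that each scope requires to match.
SCOPE_DEPTH = {
    "LOCALITY_LEVEL": 3,
    "STREET_LEVEL": 4,
    "DELIVERYPOINT_LEVEL": 5,
}


def elements_to_match(matching_scope: str) -> list:
    """List the element levels for which matching must succeed."""
    return LEVELS[: SCOPE_DEPTH[matching_scope]]


street_scope = elements_to_match("STREET_LEVEL")
```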
5.12.6 The MatchingExtendedArchive Parameter
In version 5.4.1 and later, the MatchingExtendedArchive parameter can return the new address code
for deprecated or outdated addresses for Japan. If the input address is an outdated address, and the
new process parameter MatchingExtendedArchive = ON, Informatica AddressDoctor validates the
old address against the archived addresses in the reference database. If MatchingExtendedArchive =
OFF, the outdated input address is likely to be rejected, or to be corrected to some other address.
If the address is an outdated address, then Informatica AddressDoctor returns the address with the
following new Extended Element Result Status (EERS) code:
EERS=F (output address is outdated)
If the supplementary enrichment for Japan is activated, Informatica AddressDoctor returns the
validated outdated address with the old Choumei Aza code and the new Choumei Aza code as
enrichment values. The new Choumei Aza code can then be used as an input for the
ADDRESS_CODE_LOOKUP processing mode to retrieve the corresponding new address.
Note that both the JPN5E1.MD and JPN5AC.MD database files are needed with their respective
unlock codes in order to search for the new address using the new Choumei Aza code.
5.12.7 The StandardizeInvalidAddresses Parameter
Version 5.3.0 provides the ability to standardize address elements for invalid (Ix Process Status Code)
addresses. Standardized addresses can improve downstream business processes such as matching
and de-duplication.
Address elements that may be standardized are:
• Street Types
• Pre and Post Directional
• Delivery Service Item
• Sub-building descriptors
• State/Province/Region; for example, California to CA
The standardization of invalid address elements can be controlled by setting the
StandardizeInvalidAddresses parameter of the Result element in the Parameters.xml to “ON”. The
default is “OFF”, ensuring compatibility with previous versions.
5.12.8 The DualAddressPriority Parameter
Starting in version 5.3.0, users can specify which address type they would like to validate against.
For example, when a single address record contains both a PO BOX/Rural Route address and a Street
address, users can select the address they would like validated. Users can validate against the
following address types:

• POSTAL_ADMIN
• DELIVERY_SERVICE
• STREET
The handling of dual addresses can be controlled by setting the DualAddressPriority parameter of
the Result element in the Parameters.xml to one of the values above. The default is
“POSTAL_ADMIN”, ensuring compatibility with previous versions.
5.12.9 The RangesToExpand Parameter
Starting in version 5.3.0, the “RangesToExpand” parameter determines whether house number
ranges should be expanded for countries where individual house numbers exist. RangesToExpand
can have the following values:
• NONE – Do not expand ranges (default value).
• ALL – The house number ranges will be expanded for all addresses where individual house
numbers exist.
• ONLY_WITH_VALID_ITEMS – Only those ranges are expanded where all expandable items are
known to exist in the reference data.
Example:
Option = ONLY_WITH_VALID_ITEMS
HNO range: 5-25
For countries such as the United Kingdom, where individual house numbers exist in the reference
database, the Engine will expand the house number range and list the individual house numbers in
the suggestion list. For countries where the data provider supplies only house number ranges, the
Engine cannot expand the range because the individual house numbers do not exist in the reference
database, and it will only output house number ranges in the suggestion list.
To summarize, when “ONLY_WITH_VALID_ITEMS” option is active, the Engine will only expand
house number ranges if individual house numbers exist in the reference database, otherwise the
behavior will be similar to “NONE”.
The RangesToExpand parameter is used in conjunction with another parameter
“FlexibleRangeExpansion” in order to control range expansion and to give the optimum results.
FlexibleRangeExpansion contains the values ON and OFF. When set to “ON” (default), the Engine
limits the expansion of ranges in such a manner that those at the end of the result list are not
expanded. The Engine’s logic determines the number of results to expand and how many to keep as
ranges without exceeding the MaxResultCount limit. Therefore, a suggestion list could contain both
expanded and unexpanded ranges for house numbers and/or buildings, depending on the values
specified for MaxResultCount, RangesToExpand, and FlexibleRangeExpansion.
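The following Python sketch shows one possible reading of the three RangesToExpand values. It is an assumption-laden model, not the Engine's logic: the `expand_range` helper is hypothetical, a range is assumed to stand for every integer between its bounds, `known_numbers` stands in for the individual house numbers in the reference database, and FlexibleRangeExpansion and MaxResultCount are not modeled.

```python
def expand_range(low: int, high: int, mode: str, known_numbers: set) -> list:
    """Expand a house number range according to one reading of RangesToExpand."""
    label = f"{low}-{high}"
    wanted = list(range(low, high + 1))  # assumption: every integer is a candidate
    present = [n for n in wanted if n in known_numbers]
    if mode == "NONE":
        return [label]
    if mode == "ALL":
        # Expand to the individual numbers that exist; keep the range otherwise.
        return [str(n) for n in present] or [label]
    if mode == "ONLY_WITH_VALID_ITEMS":
        # Expand only when every item of the range exists in the reference data.
        return [str(n) for n in wanted] if len(present) == len(wanted) else [label]
    raise ValueError(f"unknown RangesToExpand value: {mode}")


known = {5, 6, 7}
complete = expand_range(5, 7, "ONLY_WITH_VALID_ITEMS", known)
incomplete = expand_range(5, 8, "ONLY_WITH_VALID_ITEMS", known)
```

In this reading, an incomplete range behaves as it does with “NONE”, which matches the summary given above.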
5.12.10 The GlobalPreferredDescriptor Parameter
You can configure Informatica AddressDoctor Version 5.6.0 to specify the output format for street,
building and sub-building element descriptors. To specify the output format for element descriptors,
configure one of the following values for the GlobalPreferredDescriptor parameter in
parameters.dtd:
DATABASE. Returns the element descriptor available in the reference address database. This
is the default value. If there is no matching entry in the database, Informatica AddressDoctor
copies the input to the output.
LONG. Returns the expanded form of the element descriptor.
SHORT. Returns the abbreviated form of the element descriptor.
PRESERVE_INPUT. Copies the element descriptor in the input to the output. If the input
element descriptor is not an official synonym, Informatica AddressDoctor returns the
corresponding value from the database in the output.
In Informatica AddressDoctor Version 5.6.0 the GlobalPreferredDescriptor parameter works only for
address element descriptors in Australia and New Zealand addresses and the Strasse element
descriptor in Germany addresses.
5.12.11 The EnrichmentGeoCoding Parameter
To enable the Geocoding Enrichment, set the EnrichmentGeoCoding parameter in the
Parameters.xml file to “ON”.
Starting in version 5.4.0, if you enable the Geocoding Enrichment, then you also must specify the
type of geocoding to use in the Process attribute EnrichmentGeoCodingType in the Parameters.xml
file. By default, the arrival point geocoding type is enabled.
In Informatica AddressDoctor version 5.6.0, you can include the rooftop geocoordinates in validated
address output for the United Kingdom. To include the rooftop geocoordinates for the U.K.
addresses, set EnrichmentGeoCodingType to ARRIVAL_POINT.
The following table describes the values that you can specify for EnrichmentGeoCodingType:
Value            Description
NONE             Uses the Standard Geocode database.
ARRIVAL_POINT    Uses the High Precision Arrival Point database. If the database finds the
                 arrival point geocoordinates, the geocode is returned with the EGC9
                 status code. If the arrival point geocoordinates do not exist or if the
                 Arrival Point database cannot be connected to, then Informatica
                 AddressDoctor uses the Standard Geocode database as a fallback to
                 interpolate the geocoordinates.
                 Effective in Version 5.6.0, Informatica AddressDoctor returns rooftop
                 geocoordinates for the United Kingdom addresses.
PARCEL_CENTROID  Uses the Parcel Centroid database. If the database finds the parcel
                 centroid geocoordinates, the geocode is returned with the EGCA status
                 code. If the parcel centroid geocoordinates do not exist, then
                 Informatica AddressDoctor returns the EGC0 (no geocode available)
                 status code. If the Parcel Centroid database cannot be connected to,
                 then Informatica AddressDoctor returns one of the error status codes
                 (EGCU, EGCN, or EGCC).
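The fallback behavior of the three EnrichmentGeoCodingType values can be summarized as a decision function. This Python sketch is a simplified model with a hypothetical `geocode_outcome` helper; it returns a (source, status code) pair and uses None wherever the text above does not name a specific status code.

```python
def geocode_outcome(geocoding_type: str, hit: bool, connected: bool = True) -> tuple:
    """Return (source, status_code) for one geocoding attempt.

    hit: whether the high-precision database holds coordinates for the address.
    connected: whether that database can be connected to.
    """
    if geocoding_type == "NONE":
        return ("standard", None)
    if geocoding_type == "ARRIVAL_POINT":
        if connected and hit:
            return ("arrival_point", "EGC9")
        # Fallback to the Standard Geocode database; status code not stated above.
        return ("standard", None)
    if geocoding_type == "PARCEL_CENTROID":
        if not connected:
            # The engine returns one of EGCU, EGCN or EGCC; which one is not modeled.
            return ("connection_error", None)
        return ("parcel_centroid", "EGCA") if hit else ("none", "EGC0")
    raise ValueError(f"unknown EnrichmentGeoCodingType value: {geocoding_type}")


arrival = geocode_outcome("ARRIVAL_POINT", hit=True)
```

The main difference the sketch highlights is that ARRIVAL_POINT falls back to the Standard Geocode database, while PARCEL_CENTROID reports a status code instead.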
5.13 Output Formatting
The AddressComplete multi-line output format generated by Informatica AddressDoctor when
processing may be modified and adjusted from the standard behavior by setting parameters
provided in the “Result” element of Parameters.xml (see the DTD in Appendix 10.1) passed via
AD_SetParametersXML().
(See chapter 6.7.4 also. For a global reference of postal address formats, see
http://www.addressdoctor.com/en/countries_data/addressformats.asp)
The “Result” element allows formatting control over:
• The type of format through “FormatType”
• The delimiter used in output formatting through “FormatDelimiter”
• Choosing whether the country is included through “FormatWithCountry”
• Number of lines through “FormatMaxLines”
Valid settings for “FormatType” are ALL (the default), ADDRESS_ONLY, WITH_ORGANIZATION,
WITH_CONTACT, WITH_ORGANIZATION_CONTACT and WITH_ORGANIZATION_DEPARTMENT.
Valid settings for “FormatDelimiter” are CRLF (the default), LF, CR, SEMICOLON, COMMA, TAB,
PIPE and SPACE.
“FormatMaxLines” determines the maximum number of overall address lines returned in a range of
1-19 (the default is 19) and “FormatWithCountry” may be switched “ON”, from the default “OFF”.
These formatting parameters are available both as attributes of the Input element (if the input is
provided in multi-line fashion), as well as attributes of the “Result” element that are applied unless
the “AddressComplete” attribute of the “Result” element is set to “OFF” (from the default “ON”).
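To illustrate the effect of “FormatDelimiter” and “FormatWithCountry”, the following Python sketch joins already formatted address lines. The `join_address_lines` helper is hypothetical, the character chosen for each delimiter name is the conventional one, and appending the country as the last line is an assumption; “FormatType” and “FormatMaxLines” are not modeled.

```python
from typing import Optional

# Conventional characters for the FormatDelimiter names.
DELIMITERS = {
    "CRLF": "\r\n", "LF": "\n", "CR": "\r", "SEMICOLON": ";",
    "COMMA": ",", "TAB": "\t", "PIPE": "|", "SPACE": " ",
}


def join_address_lines(
    lines: list,
    country: Optional[str] = None,
    format_delimiter: str = "CRLF",
    format_with_country: str = "OFF",
) -> str:
    """Join formatted address lines the way AddressComplete output might look."""
    output_lines = list(lines)
    if format_with_country == "ON" and country:
        output_lines.append(country)  # assumption: the country is the last line
    return DELIMITERS[format_delimiter].join(output_lines)


lines = ["Rue Royale 4", "1000 Bruxelles"]
piped = join_address_lines(lines, format_delimiter="PIPE")
```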
5.14 Output Standardization
The output generated by Informatica AddressDoctor when processing addresses follows the rules of
the postal administrations and the Universal Postal Union (UPU).
It is possible to modify and adjust the standard output behavior by setting attributes provided in the
“Result” element of Parameters.xml (see the DTD in Appendix 10.1) passed via
AD_SetParametersXML().
The “Result” element allows standardization control over:
• Element length (by means of abbreviation) through “GlobalMaxLength”
• Casing through “GlobalCasing”
• Abbreviation through “ElementAbbreviation”
While the “GlobalMaxLength” attribute determines the default maximum number of characters per
line for all address elements, “FormatMaxLines” (see the previous chapter 5.13) determines the
maximum number of overall address lines returned in the case of multi-line “AddressComplete”
output. Note that the default value for “GlobalMaxLength” is 1024.
The “GlobalCasing” attribute can be used to influence the casing of the output. The five possible
options are native casing as per reference database standard (NATIVE), upper casing (UPPER), lower
casing (LOWER), mixed casing (MIXED) and unchanged (NOCHANGE).
While upper casing and lower casing will create an output independent of the country the data is in,
mixed casing will consider country specific rules, while the default native casing will be based on the
reference database content. Setting the casing to be unchanged will return the data the way it was
entered for the output of PARSE process mode, while validated results will be provided as found in
the reference data and according to postal rules. Result address elements that could not be checked
against the reference data will retain their input casing when NOCHANGE is set.
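The five casing options can be approximated in a few lines of Python. This is only a sketch with a hypothetical `apply_casing` helper: MIXED is approximated with `str.title()`, which does not implement the country-specific rules mentioned above, and NATIVE and NOCHANGE return the value untouched because their result depends on where the value came from (reference data or input).

```python
def apply_casing(value: str, global_casing: str = "NATIVE") -> str:
    """Approximate the effect of a GlobalCasing option on one output value."""
    if global_casing == "UPPER":
        return value.upper()
    if global_casing == "LOWER":
        return value.lower()
    if global_casing == "MIXED":
        return value.title()  # rough stand-in for country-specific rules
    if global_casing in ("NATIVE", "NOCHANGE"):
        return value  # keep the reference data casing or the input casing
    raise ValueError(f"unknown GlobalCasing value: {global_casing}")


upper = apply_casing("Rue Royale", "UPPER")
```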
Additionally, standardization may also be defined per address element via the “MaxLength” and
“Casing” attributes of the “AddressElementStandardize” element, thus overriding the global settings:
“MaxLength” is then used to set the maximum characters per line for each address element. Setting
“MaxLength” to 0 will inherit the length configured globally.
Each address element has a sensible allowed minimum length. Valid minimum values for
“MaxLength” are as follows:
Address Element  Minimum Length    Address Element              Minimum Length
Organization     25                Street                       20
Department       25                Number                       5
Contact          25                DeliveryService              25
First Name       20                Locality                     20
Middle Name      20                PostalCode                   5
Last Name        20                Province                     2
Title            20                Country                      2
Function         20                CountrySpecificLocalityLine  25
Salutation       20                DeliveryAddressLine          25
Gender           1                 RecipientLine                25
Building         25                FormattedAddressLine         25
Sub-Building     25                AddressComplete              25
Note that, depending on the data and the country, return values might still exceed the
selected “MaxLength” value. This happens if there is no useful way to abbreviate the values further.
For example, when the United Kingdom postal code “AB123AD” is limited to 5 characters, the return
value still contains 7 characters: “AB123AD”.
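The relationship between the per-element “MaxLength”, the global “GlobalMaxLength” and the minimum values in the table can be sketched as follows. The `effective_max_length` helper is hypothetical, the `MIN_LENGTH` dictionary copies only a few rows of the table, and rejecting too-small values with an exception is an assumption about how invalid settings are handled.

```python
# A few rows copied from the minimum length table above.
MIN_LENGTH = {
    "Organization": 25,
    "Street": 20,
    "Number": 5,
    "Locality": 20,
    "PostalCode": 5,
    "Province": 2,
}


def effective_max_length(element: str, max_length: int = 0,
                         global_max_length: int = 1024) -> int:
    """Return the maximum number of characters that applies to an element."""
    if max_length == 0:
        return global_max_length  # 0 inherits the globally configured length
    if max_length < MIN_LENGTH[element]:
        # Assumption: values below the documented minimum are rejected.
        raise ValueError(f"MaxLength for {element} must be at least {MIN_LENGTH[element]}")
    return max_length


street_limit = effective_max_length("Street", 30)
inherited = effective_max_length("PostalCode")
```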
Release 5.2.7 introduces the new Parameter “ElementAbbreviation”. At the moment this parameter
will only influence data from the USA.
In CERTIFIED mode, you can set the ElementAbbreviation parameter ON if you want Informatica
AddressDoctor to abbreviate street and locality names when a validated or corrected address has
more characters than the maximum character length (13 characters) defined by the USPS.
Note: You must have the relevant CASS databases set up for the ElementAbbreviation function to
work.
If you set the ElementAbbreviation parameter OFF, Informatica AddressDoctor returns the address
output based on the input, field length setting, and database entries.
In Version 5.4.1 and later, Informatica AddressDoctor abbreviates the locality and street information
in Batch and Interactive modes if the locality and street names exceed the maximum allowable
character length defined by the USPS. Informatica AddressDoctor uses the Alias database
(USAC12.MD) to ensure accuracy of the abbreviation.
In Version 5.2.9 and later, this parameter also has an impact on the output of German and Dutch
addresses.
If the parameter is set to ON, the output street name in German addresses will be abbreviated to 22
characters if the reference database includes the short name for the street. Similarly, for addresses
in the Netherlands, the output street name will be abbreviated to 24 characters if the reference
database includes the short name for the street.
This parameter also influences the output of CHOME addresses in Japan. Usually the word “CHOME”
will be output in the street field together with the number of the CHOME. If the parameter is
switched to “ON” the word CHOME will not be output. The CHOME number will be inside the
Number field.
5.15 Alternative Names and Aliases
Informatica AddressDoctor recognizes alias names for streets, localities and provinces around the
world. The setting “MatchingAlternatives” in the ParametersXML (see chapter 5.12.5) is used to
influence whether aliases should be used for matching or not. Per default, Informatica
AddressDoctor replaces any alias with the official or preferred name of the address item.
With Informatica AddressDoctor Version 5.2.7 this was partially changed for the locality field. A new
sub item has been introduced (PREFERRED_NAME). As a result the alias names will be retained in
the locality NAME and COMPLETE fields in certain process modes (for example, CERTIFIED for Australia
or the USA). The locality PREFERRED_NAME field always contains the official or preferred name for
the locality.
In Version 5.2.9, a new option is available in Certified mode called AliasStreet with the option values
of “OFFICIAL” and “PRESERVE”. OFFICIAL will change the input alias street name to the USPS official
street name, and PRESERVE will retain the input alias street name, unless it is a corrected alias, in
which case it will be converted to the USPS official street name.
5.16 AliasStreet Option Examples
The following examples show the address outputs when the AliasStreet option is set to OFFICIAL and
PRESERVE.
5.16.1 AliasStreet Option = OFFICIAL
Input                 Output
407 W BRONSON HWY     407 W VINE ST
KISSIMMEE FL 34741    KISSIMMEE FL 34741-4154
USA                   USA
Note: The output street name differs from the input street name because the database contains the
Alias record and the AliasStreet option is OFFICIAL.
5.16.1 AliasStreet Option = PRESERVE
Input                 Output
407 W BRONSON HWY     407 W BRONSON HWY
KISSIMMEE FL 34741    KISSIMMEE FL 34741-4154
USA                   USA
Note: The street name is unchanged because the AliasStreet option is PRESERVE.
Starting with Version 5.5.0, you can choose to retain locality aliases, also known as vanity names, in
the validated output. Informatica AddressDoctor, Version 5.5.0, also gives you more control over the
way street aliases are handled in the output. You can set AliasStreet and AliasLocality values in
parameters.xml to define the handling of aliases for streets and localities.
The following table shows the parameters and supported values.
Parameter      PRESERVE             OFFICIAL (Default)            OFF
AliasStreet    Retains the alias    Returns the street name –     Returns the postal name
               for the street in    the alias or the postal       for the street in
               the output.          name – as mandated by the     the output.
                                    postal regulations of the
                                    country.
AliasLocality  Retains the alias    Returns the locality name –   Returns the postal name
               for the locality in  the vanity name or the        for the locality in
               the output.          postal name – as mandated     the output.
                                    by the postal regulations
                                    of the country.
If you want to validate addresses in the Certified mode and generate output that conforms to the
postal regulations of the country, you must ensure that the AliasStreet and AliasLocality parameters
are set to the default value, OFFICIAL.
If you want Informatica AddressDoctor to preserve the vanity name for the locality or the alias for
the street in the validated output, you must set the respective parameters to PRESERVE. If you want
Informatica AddressDoctor to return the postal name of the locality or street in the validated output,
set the respective parameters to OFF.
The following examples show different outputs for PRESERVE and OFFICIAL settings for the same
U.S. address.
5.16.2 Input address:
<InputData>
<AddressElements>
<Country Item="1" Type="NAME">USA</Country>
<Locality Item="1" Type="COMPLETE">SHILOH</Locality>
<PostalCode Item="1" Type="UNFORMATTED">62269</PostalCode>
<Province Item="1" Type="COUNTRY_STANDARD">IL</Province>
</AddressElements>
<AddressLines>
<DeliveryAddressLine Line="1">9468 RIEDER RD</DeliveryAddressLine>
</AddressLines>
</InputData>
5.16.3 Output when AliasLocality is set to PRESERVE:
<AddressElements>
<Country Type="NAME_EN" Item="1">UNITED STATES</Country>
<Locality Item="1">SHILOH</Locality>
<PostalCode Item="1">62269</PostalCode>
<Province Item="1">IL</Province>
<Province Item="2">SAINT CLAIR</Province>
<Street Item="1">RIEDER RD</Street>
<Number Item="1">9468</Number>
</AddressElements>
In the preceding output example, you can see that the Locality 1 information (SHILOH) is the same as
the one provided in the input.
5.16.4 Output when validated in the Certified mode with AliasLocality set to OFFICIAL:
<AddressElements>
<Country Type="NAME_EN" Item="1">UNITED STATES</Country>
<Locality Item="1">O FALLON</Locality>
<PostalCode Item="1">62269</PostalCode>
<Province Item="1">IL</Province>
<Province Item="2">SAINT CLAIR</Province>
<Street Item="1">RIEDER RD</Street>
<Number Item="1">9468</Number>
</AddressElements>
In the preceding output example, you can see that the Locality 1 information (O FALLON) is different
from the one provided in the input (SHILOH). Shiloh is an alias for the locality that has O Fallon as its
postal name, and Informatica AddressDoctor corrects the locality name to the postal name because
AliasLocality is set to OFFICIAL.
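The three AliasLocality values can be modeled with the Shiloh example. In this Python sketch the `locality_output` helper and the `regulations_keep_alias` flag are hypothetical: the flag stands in for whatever the postal regulations of the country mandate, and the input name is assumed to be a recognized alias of the postal name.

```python
def locality_output(
    input_name: str,
    postal_name: str,
    alias_locality: str = "OFFICIAL",
    regulations_keep_alias: bool = False,
) -> str:
    """Choose the locality name for the validated output."""
    if alias_locality == "PRESERVE":
        return input_name   # retain the alias (vanity name)
    if alias_locality == "OFF":
        return postal_name  # always return the postal name
    if alias_locality == "OFFICIAL":
        # Alias or postal name, whichever the postal regulations mandate.
        return input_name if regulations_keep_alias else postal_name
    raise ValueError(f"unknown AliasLocality value: {alias_locality}")


preserved = locality_output("SHILOH", "O FALLON", "PRESERVE")
official = locality_output("SHILOH", "O FALLON")
```

With the default value, the sketch returns O FALLON for the Shiloh input, matching the Certified mode output shown above.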
5.17 Process Status Values
The process status values returned by AD_GetResultXML() or AD_GetResultParameter() summarize the
result output quality of a Process() call to Informatica AddressDoctor. For more detailed information
on the results, consult the “ElementResultStatus” also (see chapter 5.27.3).
The following table describes the process status values:
Value
Description
A1
Address code lookup found a partial address or a complete address for the input code.
A0
Address code lookup found no address for the input code.
C4
Corrected. All postally relevant elements are checked.
C3
Corrected. Some elements cannot be checked.
C2
Corrected, but the delivery status is unclear due to absent reference data.
C1
Corrected, but the delivery status is unclear because user standardization introduced
errors.
I4
Data cannot be corrected completely, but there is a single match with an address in
the reference data.
I3
Data cannot be corrected completely, and there are multiple matches with addresses
in the reference data.
I2
Data cannot be corrected. Batch mode returns partial suggested addresses.
I1
Data cannot be corrected. Batch mode cannot suggest an address.
N7
Validation error. Validation did not take place because single-line validation is not
unlocked.
N6
Validation error. Validation did not take place because single-line validation is not
supported for the destination country.
N5
Validation error. Validation did not take place because the reference database is out of
date.
N4
Validation error. Validation did not take place because the reference data is corrupt or
badly formatted.
N3
Validation error. Validation did not take place because the country data cannot be
unlocked.
Informatica AddressDoctor Documentation – Last Revision: 5-Nov-14 @ 12:26
72
Value
Description
N2
Validation error. Validation did not take place because the required reference
database is not available.
N1
Validation error. Validation did not take place because the country is not recognized or
not supported.
Q3
Suggestion List mode. Address validation can retrieve one or more complete addresses
from the address reference data that correspond to the input address.
Q2
Suggestion List mode. Address validation can combine the input address elements and
elements from the address reference data to create a complete address.
Q1
Suggestion List mode. Address validation cannot suggest a complete address. To
generate a complete address suggestion, add data to the input address.
Q0
Suggestion List mode. There is insufficient input data to generate a suggestion.
RB
Country recognized from abbreviation. Recognizes ISO two-character and ISO threecharacter country codes. Can also recognize common abbreviations such as "GER" for
Germany.
RA
Country recognized from the ForceCountryISO3 setting.
R9
Country recognized from the Default CountryISO3 setting.
R8
Country recognized from the country name.
R7
Country recognized from the country name, but Informatica AddressDoctor identified
errors in the country data.
R6
Country recognized from territory data.
R5
Country recognized from province data.
R4
Country recognized from major town data.
R3
Country recognized from the address format.
R2
Country recognized from a script.
R1
Country not recognized because multiple matches are available.
R0
Country not recognized.
S4
Parse mode. The address was parsed perfectly.
S3
Parse mode. The address was parsed with multiple results.
S1
Parse mode. There was a parsing error due to an input format mismatch.
V4
Verified. The input data is correct. Address validation checked all postally relevant
elements, and inputs matched perfectly.
V3
Verified. The input data is correct, but some or all elements were standardized, or the
input contains outdated names or exonyms.
V2
Verified. The input data is correct, but some elements cannot be verified because of
incomplete reference data.
V1
Verified. The input data is correct, but user standardization has negatively impacted
deliverability. For example, the post code length is too short.
The Vx, Cx, and Ix Process Status values may be returned by “BATCH”, “INTERACTIVE”, or “CERTIFIED”
Process() calls, while Qx is returned only for “FASTCOMPLETION”, Sx for “PARSE”, Rx for
“COUNTRYRECOGNITION”, and Ax for “ADDRESSCODELOOKUP” (see chapter 5.11 for details on the
different Process Modes). Nx Process Status values may be returned for any Process() call.
Processing the same input address in Batch or Interactive mode will usually yield the same process
status, except for I4 / I3 in the case of wrong numeric inputs. In this case, Batch might return I4,
while Interactive gives I3.
Note that for BATCH processing it is strongly recommended to accept only records with a Vx or Cx
status for automated data updates. Ix records must be reviewed manually before their results are
used for any data update.
When N1 is returned, the country was either not recognized or is fundamentally unsupported. A
country that is recognized but fundamentally unsupported is reported in the Result parameter
CountryISO3. This is the case for former countries such as the Soviet Union (SUN) or the Netherlands
Antilles (ANT).
Unrecognized countries leave this parameter empty.
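The rules above can be sketched in a few lines of Python. This is only an illustration: the names MODES_BY_PREFIX and triage_batch_status are hypothetical and not part of the Informatica AddressDoctor API; only the status prefixes and mode names are taken from the text.

```python
# Hypothetical helper, not part of the AddressDoctor API.
# Maps the first character of a process status to the process modes
# that may return it, as described in the text above.
MODES_BY_PREFIX = {
    "V": ("BATCH", "INTERACTIVE", "CERTIFIED"),
    "C": ("BATCH", "INTERACTIVE", "CERTIFIED"),
    "I": ("BATCH", "INTERACTIVE", "CERTIFIED"),
    "Q": ("FASTCOMPLETION",),
    "S": ("PARSE",),
    "R": ("COUNTRYRECOGNITION",),
    "A": ("ADDRESSCODELOOKUP",),
}


def triage_batch_status(process_status: str) -> str:
    """Classify a BATCH process status as 'accept', 'review', or 'error'."""
    prefix = process_status[:1].upper()
    if prefix in ("V", "C"):
        return "accept"   # safe for automated data updates
    if prefix == "I":
        return "review"   # needs manual review before any update
    if prefix == "N":
        return "error"    # validation did not take place
    raise ValueError(f"unexpected status for BATCH mode: {process_status!r}")
```

A caller would typically route "review" records to a manual queue and log "error" records together with the exact Nx value.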
5.18 Mailability Scores
Informatica AddressDoctor provides an estimate of how likely it is that mail sent to an address
will be delivered successfully. This score is a simplification of the process status values (see
chapter 5.17) and helps you decide whether an address is worth mailing to in a specific usage
scenario:
Value  Description
5      Completely confident
4      Almost certain
3      Should be fine
2      Fair chance
1      Risky
0      Undeliverable
Addresses with a mailability score of 5 or 4 may always be considered for sending mail, while
addresses with a score of 0 or 1 should not be used, regardless of the scenario. Addresses marked
with 2 or 3 may be used, but should be treated with caution: 2 indicates that the results were not
corrected and may therefore still contain an incorrect address component, while 3 indicates a
correction that may require a review before the mail piece is sent.
If you need to understand exactly what was validated or corrected in the address, use the
ProcessStatusValue, ElementInputStatus, and ElementResultStatus fields instead of
the Mailability Score.
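As a rough sketch, the guidance on the six scores could be applied as follows. The function name mailing_decision and its return strings are assumptions made for illustration; only the score ranges come from the text.

```python
def mailing_decision(score: int) -> str:
    """Translate a mailability score (0-5) into a coarse mailing decision."""
    if score not in range(6):
        raise ValueError(f"mailability score must be 0..5, got {score!r}")
    if score >= 4:
        return "send"               # 5 and 4: may always be considered
    if score >= 2:
        return "send with caution"  # 3: corrected, 2: not corrected
    return "do not send"            # 1 and 0: regardless of the scenario
```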
5.18.1 5: Completely Confident
All relevant elements of the address that were entered have been checked and verified during
processing.
5.18.2 4: Almost Certain
An address is considered to be Almost Certain in one of the following two scenarios:
Scenario 1: Some of the relevant elements of the address could not be checked because of incomplete
reference data, and the rest of the address has been verified.
Scenario 2: All relevant elements have been entered, and some of them have been corrected with
very high confidence. This only happens if the match was unique and the number of discrepancies
was very low.
5.18.3 3: Should Be Fine
Some of the relevant elements of the address have been corrected in the process. A correction only
happens if the match was unique and the number of discrepancies was acceptable.
5.18.4 2: Fair Chance
The address could not be corrected or validated. This happens in one of two scenarios:
Scenario 1: No candidate match with sufficient confidence could be found.
Scenario 2: The address matching ended with multiple candidates with similar confidence levels
(multi-match situation). The input address therefore has a Fair Chance of being mailable, because
the relevant elements exist.
5.18.5 1: Risky
The address entered could only generate a partial match.
5.18.6 0: Undeliverable
The address entered is either missing too many components or a majority of the components could
not be verified as they generate no matches against the reference data.
5.19 Geocoding Status Values
Informatica AddressDoctor 5 enables geocoding for selected countries: the Version 5 API can
enrich a validated address with the corresponding geo-coordinates in the WGS84
(http://wikipedia.org/wiki/WGS84) format.
The quality of coverage varies from country to country. Informatica AddressDoctor strives to
provide geo-coordinates at house number or building level, but depending on data availability,
only street-level or even locality-level geo-coordinates might be available.
With version 5.4.0 of Informatica AddressDoctor, point address geocoding has been added. Point
address geocoding enables accurate and precise geocoordinates for a specific point at an address
without interpolating the values. The point address geocoding product includes the following types
of geocoding:

Arrival Point geocoding. The geocoordinates are calculated for a point that is placed in the
center of a street segment in front of the house. To use arrival point geocoding, you must
download the High Precision Arrival Point database. If the arrival point geocoordinates do
not exist, then Informatica AddressDoctor uses the Standard Geocode database as a fallback
to interpolate the geocoordinates, if the Standard Geocode database is loaded. Otherwise,
Informatica AddressDoctor returns the EGC0 (no geocode available) status code for the
given address.

Parcel Centroid geocoding. The geocoordinates are calculated for a point that is at the
geographic center of the parcel of land. To use parcel centroid geocoding, you need to download
only the Parcel Centroid database. If the parcel centroid geocoordinates do not exist, then
Informatica AddressDoctor returns the EGC0 (no geocode available) status code.
Version 5.4.2 extends the point address geocoding support to the following European countries:
Austria, Denmark, Germany, the Netherlands, and Sweden. In earlier versions, the support for point
address geocoding was limited to North America. Informatica AddressDoctor will extend support for
other countries as and when the need arises.
The corresponding status values returned with the processing result via AD_GetResultXML() or
AD_GetResultParameter() are as follows:
Value  Description
EGCN   Informatica AddressDoctor cannot find the geocoding database.
EGCU   The geocoding database is not unlocked.
EGCC   The geocoding database is corrupt.
EGC0   Informatica AddressDoctor could not append geocoordinates to the input address because no geocode is available for the address.
EGC4   Geocoordinates are only partially accurate to the postal code level. For example, 795xx.
EGC5   Geocoordinates are accurate to the postal code level.
EGC6   Geocoordinates are accurate to the locality level.
EGC7   Geocoordinates are accurate to the street level.
EGC8   Geocoordinates are accurate to the house number level. (Estimated location of the parcel of land with street-side offset.)
EGC9   High-precision arrival point geocoordinates. (Measured entryway to the parcel of land.)
EGCA   High-precision parcel centroid geocoordinates. (Measured center of the parcel of land.)
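A client that consumes these values might separate the codes that carry coordinates from the codes that report a problem. The following Python sketch assumes hypothetical names (GEOCODE_PRECISION, has_coordinates, is_street_level_or_better); the codes and their meanings are the ones listed above.

```python
# Hypothetical lookup tables built from the status values listed above.
GEOCODE_PRECISION = {
    "EGC4": "partial postal code",
    "EGC5": "postal code",
    "EGC6": "locality",
    "EGC7": "street",
    "EGC8": "house number",
    "EGC9": "arrival point",
    "EGCA": "parcel centroid",
}

GEOCODE_PROBLEMS = {
    "EGCN": "geocoding database not found",
    "EGCU": "geocoding database not unlocked",
    "EGCC": "geocoding database corrupt",
    "EGC0": "no geocode available for the address",
}


def has_coordinates(status: str) -> bool:
    """True if the status indicates that geocoordinates were appended."""
    return status in GEOCODE_PRECISION


def is_street_level_or_better(status: str) -> bool:
    """True for street, house number, arrival point, or parcel centroid."""
    return status in {"EGC7", "EGC8", "EGC9", "EGCA"}
```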
To use point geocoding for any of the supported countries, you must download the corresponding
Point Address Geocoding database. The High Precision Arrival Point database provides
geocoordinates for a point that is placed in the center of a street segment in front of the given address,
whereas the Parcel Centroid database provides geocoordinates for a point that is at the geographic
center of the parcel of land.
Arrival Point and Parcel Centroid Databases for Point Geocoding
The following table lists the High Precision Arrival Point databases and the Parcel Centroid databases
for the supported countries.
Country      Arrival Point Database  Parcel Centroid Database
Austria      AUT5GCAP.MD             AUT5GCPC.MD
Canada       CAN5GCAP.MD             CAN5GCPC.MD
Denmark      DNK5GCAP.MD             DNK5GCPC.MD
Finland      FIN5GCAP.MD             FIN5GCPC.MD
Germany      DEU5GCAP.MD             DEU5GCPC.MD
Hungary      HUN5GCAP.MD             HUN5GCPC.MD
Latvia       LVA5GCAP.MD             LVA5GCPC.MD
Luxembourg   LUX5GCAP.MD             LUX5GCPC.MD
Mexico       MEX5GCAP.MD             Not available
Netherlands  NLD5GCAP.MD             NLD5GCPC.MD
Norway       NOR5GCAP.MD             NOR5GCPC.MD
Slovenia     SVN5GCAP.MD             SVN5GCPC.MD
Sweden       SWE5GCAP.MD             SWE5GCPC.MD
UK           GBR5GCAP.MD             Not available
USA          USA5GCAP.MD             USA5GCPC.MD
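The file names in this table follow a visible pattern: the ISO three-character country code followed by 5GCAP.MD or 5GCPC.MD. The Python sketch below relies on that observed pattern, which is an inference from the table and not a documented naming rule; the function name point_geocoding_files is hypothetical.

```python
# Countries listed in the table above (ISO three-character codes).
SUPPORTED = {
    "AUT", "CAN", "DNK", "FIN", "DEU", "HUN", "LVA", "LUX",
    "MEX", "NLD", "NOR", "SVN", "SWE", "GBR", "USA",
}
# The table lists no Parcel Centroid database for Mexico and the UK.
NO_PARCEL_CENTROID = {"MEX", "GBR"}


def point_geocoding_files(iso3: str):
    """Return (arrival_point_file, parcel_centroid_file_or_None)."""
    iso3 = iso3.upper()
    if iso3 not in SUPPORTED:
        raise KeyError(f"no point geocoding database listed for {iso3!r}")
    arrival_point = f"{iso3}5GCAP.MD"
    parcel_centroid = None if iso3 in NO_PARCEL_CENTROID else f"{iso3}5GCPC.MD"
    return arrival_point, parcel_centroid
```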
5.20 CAMEO Status Values
With version 5.2.8 of Informatica AddressDoctor, a new enrichment type ‘CAMEO’ has been
introduced. The CAMEO Status values indicate if CAMEO codes are available for the input address or
the reason why no codes are available.
Value  Description
ECON   No CAMEO codes provided because no CAMEO database for the country is available.
ECOI   No CAMEO codes provided – no CAMEO lookup was performed, as the address could not be corrected and has an Ix ProcessStatus.
ECO0   No CAMEO codes provided because no CAMEO code was found for the input address.
ECO1   CAMEO codes available.
5.21 CASS Status Values
Informatica AddressDoctor 5 provides the output required by the USPS CASS Standard. See chapter
6.24 for details on the actual output available. The corresponding status values returned with the
processing result via AD_GetResultXML() or AD_GetResultParameter() are:
Value    Description
ECA0     CASS output not available (for this address)
ECA1     CASS attributes only partially provided (some databases are missing)
ECA2..4  Reserved for future use
ECA5     CASS attributes provided
5.22 SERP Status Values
Informatica AddressDoctor 5 provides the output required by the Canada Post SERP Standard. See
chapter 6.24 for details on the actual output available. The corresponding status values returned
with the processing result via AD_GetResultXML() or AD_GetResultParameter() are:
Value  Description
ESE0   SERP output not available (for this address)
ESE1   SERP attributes provided
If the Validation type is CERTIFIED and the SERP Enrichment Status is ON, two enrichments are
provided: CATEGORY and EXCLUDED_FLAG. For details, see chapter 6.24.2.
5.23 SNA Status Values
Informatica AddressDoctor 5 provides the output required by the La Poste SNA Standard. See
chapter 6.24 for details on the actual output available. The corresponding status values returned
with the processing result via AD_GetResultXML() or AD_GetResultParameter() are:
Value  Description
ESN0   SNA output not available (for this address)
ESN1   SNA attributes provided
5.24 AMAS Status Values
Informatica AddressDoctor 5 provides the output required by the Australia Post AMAS Standard. See
chapter 6.24 for details on the actual output available. The corresponding status values returned
with the processing result via AD_GetResultXML() or AD_GetResultParameter() are:
Value  Description
EAM0   AMAS output not available (for this address)
EAM1   AMAS output is provided – Address is corrected or validated and DPID is delivered
EAM2   AMAS output is not provided – no correction or validation is possible – no DPID can be returned
5.25 SendRight Status Values
The SendRightStatus parameter of the EnrichmentData may contain the following values:
Value  Description
ESR0   SendRight output not available (for this address)
ESR1   SendRight output is provided
5.26 Country Specific Enrichment
Informatica AddressDoctor provides additional enrichment output required by the local markets for
the following countries:
Austria
Brazil
France
Germany
Japan
Poland
Serbia
South Africa
Switzerland
United Kingdom
USA
See chapter 6.13 for details on the actual output available. You must use a valid unlock code to use
the supplementary databases for these countries.
5.26.1 Country Specific Enrichment Status Values
The corresponding status values returned with the processing result via AD_GetResultXML() or
AD_GetResultParameter() are:
5.26.2 For USSupplementary:
EUS0:
US country specific output not available (for this address)
EUS1:
US country specific attributes provided (not necessarily all attributes are populated)
EUSC:
Database is corrupt
EUSN:
Database not found
EUSU:
Database not unlocked
5.26.3 For GBSupplementary:
EGB0:
GB country specific output not available (for this address)
EGB1:
GB country specific attributes provided (not necessarily all attributes are populated)
EGBC:
Database is corrupt
EGBN:
Database not found
EGBU:
Database not unlocked
5.26.4 For JPSupplementary:
EJP0:
JP country specific output not available (for this address)
EJP1:
JP country specific attributes provided (not necessarily all attributes are populated)
EJPC:
Database is corrupt
EJPN:
Database not found
EJPU:
Database not unlocked
5.26.5 For RSSupplementary:
ERS0:
RS country specific output not available (for this address)
ERS1:
RS country specific attributes provided (not necessarily all attributes are populated)
ERSC:
Database is corrupt
ERSN:
Database not found
ERSU:
Database not unlocked
5.26.6 For BRSupplementary:
EBR0:
BR country specific output not available (for this address)
EBR1:
BR country specific attributes provided (not necessarily all attributes are populated)
EBRC:
Database is corrupt
EBRN:
Database not found
EBRU:
Database not unlocked
5.26.7 For CHSupplementary:
ECH0:
CH country specific output not available (for this address)
ECH1:
CH country specific attributes provided (not necessarily all attributes are populated)
ECHC:
Database is corrupt
ECHN:
Database not found
ECHU:
Database not unlocked
5.26.8 For DESupplementary:
EDE0:
DE country specific output not available (for this address)
EDE1:
DE country specific attributes provided (not necessarily all attributes are
populated)
EDEC:
Database is corrupt
EDEN:
Database not found
EDEU:
Database not unlocked
5.26.9 For ZASupplementary:
EZA0:
ZA country specific output not available (for this address)
EZA1:
ZA country specific attributes provided (not necessarily all attributes are populated)
EZAC:
Database is corrupt
EZAN:
Database not found
EZAU:
Database not unlocked
5.26.10 For ATSupplementary
EAT0:
AT country-specific output not available (for this address)
EAT1:
AT country-specific attributes provided (not necessarily all attributes are populated)
EATC:
Database is corrupt
EATN:
Database not found
EATU:
Database not unlocked
5.26.11 For FRSupplementary
EFR0:
FR country-specific output not available (for this address)
EFR1:
FR country-specific attributes provided (not necessarily all attributes are populated)
EFRC:
Database is corrupt
EFRN:
Database not found
EFRU:
Database not unlocked
5.26.12 For PLSupplementary
EPL0:
PL country-specific output not available (for this address)
EPL1:
PL country-specific attributes provided (not necessarily all attributes are populated)
EPLC:
Database is corrupt
EPLN:
Database not found
EPLU:
Database not unlocked
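All of the country-specific status values above share one shape: the letter E, a two-letter country code, and a final character 0, 1, C, N, or U. The Python sketch below decodes that shape. The function name describe_enrichment_status is hypothetical, and the pattern is read off the lists above and is not a documented parsing rule.

```python
# Final character of a country-specific enrichment status and its meaning,
# as listed in the sections above.
SUFFIX_MEANING = {
    "0": "output not available for this address",
    "1": "attributes provided",
    "C": "database is corrupt",
    "N": "database not found",
    "U": "database not unlocked",
}
# Two-letter country codes that appear in the status lists above.
COUNTRIES = {"US", "GB", "JP", "RS", "BR", "CH", "DE", "ZA", "AT", "FR", "PL"}


def describe_enrichment_status(status: str):
    """Split a status such as 'ECH1' into (country, meaning)."""
    status = status.strip().upper()
    if len(status) != 4 or status[0] != "E":
        raise ValueError(f"not a country-specific enrichment status: {status!r}")
    country, suffix = status[1:3], status[3]
    if country not in COUNTRIES or suffix not in SUFFIX_MEANING:
        raise ValueError(f"not a country-specific enrichment status: {status!r}")
    return country, SUFFIX_MEANING[suffix]
```

Other enrichment families, such as ECA0 for CASS, are deliberately rejected by this helper because their codes are defined separately.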
5.26.13 Country Specific Enrichment Output Fields
The following output fields are currently supported:
5.26.14 For USSupplementary:
COUNTY_FIPS_CODE
A three-digit number that identifies a county in the United States. The United States Federal Information
Processing Standard (FIPS) maintains a set of codes that identify states, counties, and other
territorial possessions. The two-digit state code identifies each state. The three-digit county code
identifies a county within a state. Together, the five digits of the state and county codes uniquely
identify any county.
STATE_FIPS_CODE
A two-digit number that identifies a state in the United States. The Federal Information Processing Standard
(FIPS) controls the numerical and alphabetical codes that identify states and other territories of the
United States.
MSA_ID
The Metropolitan Statistical Area identification number (MSAID) is a 4 digit number that identifies an
urban area with a population greater than 50,000.
CBSA_ID
Represents a Core-Based Statistical Area (CBSA) identification number. A CBSA identifies an urban
area with a population greater than 10,000. A CBSA can be a Metropolitan Statistical Area or
Micropolitan Statistical Area. A Metropolitan Statistical Area has over 50,000 inhabitants. A
Micropolitan Statistical Area has between 10,000 and 50,000 inhabitants.
A CBSA_ID is a 5 digit number.
FINANCE_NUMBER
A finance number has six digits. The output is a code assigned to United States post offices and other
postal facilities to enable collection of cost and statistical data.
The first two digits of the finance number identify the state. The final four digits identify the USPS
post office or postal facility.
RECORD_TYPE
A single-character code that describes the type of a mailbox or delivery. For example, the
code can indicate if the address is in a high-rise building (value H) or a post office box (value P).
CMSA_ID
Represents a Consolidated Metropolitan Statistical Area (CMSA) identification number. A PMSA
becomes a CMSA if local opinion favors the designation. The CMSA_ID is a 4 digit unique number.
TIME_ZONE_CODE
A numerical value of 1 to 3 characters that gives the offset from GMT in hours, for example “-5” for
Eastern Standard Time.
TIME_ZONE_NAME
A three-character abbreviation of the time zone in which the address is located, for example “EST” for Eastern Standard Time.
CENSUS_TRACT_NO
A Census Tract is a statistical subdivision of a county. The CENSUS_TRACT_NO is a 6 digit number.
CENSUS_BLOCK_NO
Census Block is the smallest entity for which the Census bureau collects census information. The
CENSUS_BLOCK_NO is a 4 digit number.
CENSUS_BLOCK_GROUP
A Census Block Group is a group of Census blocks sharing the same first digit.
PMSA_ID
Represents a Primary Metropolitan Statistical Area (PMSA) identification number. Two or more
PMSAs are created if an MSA reaches a size of 1 million or more people. The PMSA_ID is a unique
4 digit number.
MCD_ID
Represents a Minor Civil Division, which is a primary legal subdivision of a county defined by the
government. The MCD_ID is a 5 digit number.
PLACE_FIPS_CODE
5 digit number identifying localities in the United States. The Federal Information Processing
Standard (FIPS) controls the numerical codes that identify localities in the United States.
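Two of the fields above have an internal structure that is easy to show in code: the state and county FIPS codes combine into a five-digit county identifier, and the finance number splits into a two-digit state part and a four-digit facility part. The Python sketch below uses hypothetical function names and made-up sample values.

```python
def _require_digits(value: str, length: int, name: str) -> None:
    """Raise ValueError unless value consists of exactly `length` ASCII digits."""
    if not (len(value) == length and value.isascii() and value.isdigit()):
        raise ValueError(f"{name} must be {length} digits, got {value!r}")


def county_fips5(state_fips: str, county_fips: str) -> str:
    """Combine STATE_FIPS_CODE and COUNTY_FIPS_CODE into a 5-digit county ID."""
    _require_digits(state_fips, 2, "STATE_FIPS_CODE")
    _require_digits(county_fips, 3, "COUNTY_FIPS_CODE")
    return state_fips + county_fips


def split_finance_number(finance_number: str):
    """Split FINANCE_NUMBER into (state digits, post office or facility digits)."""
    _require_digits(finance_number, 6, "FINANCE_NUMBER")
    return finance_number[:2], finance_number[2:]
```

Keeping the codes as strings preserves leading zeros, which an integer conversion would drop.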
5.26.15 For BRSupplementary:
Informatica AddressDoctor provides the Brazilian Institute of Geography and Statistics (IBGE) code as an
enrichment output field for Brazilian addresses. A government agency in Brazil
publishes a list of cities and states with their official seven-digit numeric codes, called IBGE codes. This
code is used for taxation and auditing purposes. In ecommerce, every order that is placed is eventually cross-referenced with the city and state to get the associated IBGE code.
You must have the new supplementary data for Brazil, BRA5E1.MD as well as version 5.4.1 or later of
Informatica AddressDoctor to leverage this functionality.
For example, if you enter the following address:
Rua da Matriz 9
Centro
Glória do Goitá-pe
55620-000
Brazil
Along with the validated output, Informatica AddressDoctor returns the following enrichment value:
IBGE_CODE: 2606101
5.26.16 For DESupplementary:
Informatica AddressDoctor now provides the following enrichment output fields for German
addresses:

DEU_AGS. The Amtliche Gemeindeschlüssel (AGS) is a variable length code that uniquely
identifies a municipality in Germany. More than one locality can share a given AGS
code.

DEU_LOCALITY_ID. The Locality ID is a variable length code that uniquely identifies a
German locality.

DEU_STREET_ID. The Street ID is a variable length code that uniquely identifies a
German street address.
You must have the new German database as well as Version 5.4.1 or later of Informatica
AddressDoctor to leverage this functionality.
For example, if you enter the following address:
Röntgenstr. 9
67133 Maxdorf
Germany
Along with the validated output, Informatica AddressDoctor returns the following enrichment
values:
DEU_AGS: 07338018
DEU_LOCALITY_ID: 68015519
DEU_STREET_ID: 100560690
5.26.17 For CHSupplementary:
Swiss Post has introduced an additional two characters to the postal codes. Informatica
AddressDoctor has updated its engine to allow the output of the additional postal code characters as
an enrichment field. The new field is named POCO_EXT.
You must have the new supplementary data for Switzerland, CHE5E1.MD, as well as version 5.4.2 or
later of Informatica AddressDoctor to leverage this functionality. To use the new enrichment field,
you must set the EnrichmentSupplementaryCH parameter to ON in the Parameters.xml file.
For example, if you enter the following address in Batch mode:
Hohlen 1
3800 Sundlauenen
Switzerland
Along with the validated output, Informatica AddressDoctor returns the following enrichment
values:
Status: ECH1
POCO_EXT: 05
5.26.18 For GBSupplementary:
DELIVERY_POINT_SUFFIX
The Royal Mail assigns a two-character suffix to every mailbox in a UK post code area. It uses the
post code and delivery point suffix to identify every mailbox.
The delivery point suffix format is a digit followed by a letter.
UDPRN (Unique Delivery Point Reference Number):
The Unique Delivery Point Reference Number, or UDPRN, is an eight character code that uniquely
identifies each postal address of the Royal Mail PAF database. The UDPRN keeps a constant
reference that remains uniquely tied to the physical delivery point regardless of any changes in the
address.
ADDRESS_KEY
Informatica AddressDoctor Version 5.6.0 extends the U.K. address enrichment to include Address
Keys provided by Royal Mail. Address Keys are 8-digit numeric codes that map to addresses in the
Postcode Address File (PAF) from Royal Mail. You can use Address Keys in conjunction with
Organization Keys and the PostCode Type to uniquely identify an address.
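The formats described above lend themselves to simple plausibility checks on the returned values. The Python sketch below uses hypothetical helper names and checks only the stated shapes (a digit followed by a letter, and eight numeric digits); it does not confirm that a value exists in the Royal Mail data.

```python
import re

# Delivery point suffix: one digit followed by one letter.
_DPS = re.compile(r"[0-9][A-Z]")
# Address Key: 8-digit numeric code.
_ADDRESS_KEY = re.compile(r"[0-9]{8}")


def looks_like_delivery_point_suffix(value: str) -> bool:
    """True if value has the shape of a DELIVERY_POINT_SUFFIX."""
    return _DPS.fullmatch(value.upper()) is not None


def looks_like_address_key(value: str) -> bool:
    """True if value has the shape of an ADDRESS_KEY."""
    return _ADDRESS_KEY.fullmatch(value) is not None
```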
5.26.19 For JPSupplementary:
Informatica AddressDoctor provides the old Choumei Aza code, the new Choumei Aza code, and the
Gaiku code as enrichments for Japanese addresses. The Choumei Aza code is an eleven-digit code that
defines a unique delivery point for a Japanese address. The Gaiku code is a four-digit code that
denotes a city block in a Japanese address.
CHOUMEI_AZA_CODE
Returns the old Choumei Aza code. To include the old Choumei Aza code in the output, you must set
the MatchingExtendedArchive attribute of the Process element to ON.
NEW_CHOUMEI_AZA_CODE
Returns the new Choumei Aza Code.
GAIKU_CODE
Returns the Gaiku code.
5.26.20 For RSSupplementary:
Post Serbia has introduced an additional six-digit Postal Address Code (PAK) which goes down to the
street level. The PAK ensures that mail is delivered correctly and promptly to recipients in Serbia. For
items that are addressed to a P.O. Box, “poste restante” or to a military address, the PAK is not
needed in the address.
POSTAL_ADDRESS_CODE
The postal address code (PAK).
5.26.21 For ZASupplementary:
Informatica AddressDoctor provides the National Address Database (NAD) ID as an enrichment
output field for South African addresses. The NAD ID is a unique numeric ID assigned to each street
address. You must have the new South African database as well as version 5.4.2 or later of
Informatica AddressDoctor to leverage this functionality.
For example, if you enter the following address:
4 Balmoral Road
Vincent
East London
5247
South Africa
Along with the validated output, Informatica AddressDoctor returns the following enrichment value:
NAD_ID: 2170232
5.26.22 For ATSupplementary
POSTAL_ADDRESS_CODE
Informatica AddressDoctor provides the Postal Address Code as an enrichment to Austrian
addresses. You must have the AUT5E1.MD database installed and EnrichmentSupplementaryAT in
the parameter.xml set to ON. This is supported only in Informatica AddressDoctor versions 5.5.0 and
later.
For example:
<InputData>
  <AddressElements>
    <Country Item="1" Type="NAME">AUT</Country>
    <Locality Item="1" Type="COMPLETE">Perchtoldsdorf</Locality>
    <PostalCode Item="1" Type="UNFORMATTED">2380</PostalCode>
    <Province Item="1" Type="COUNTRY_STANDARD">Niederösterreich</Province>
    <Street Item="1" Type="COMPLETE">Plättenstraße</Street>
    <Number Item="1" Type="COMPLETE">7</Number>
  </AddressElements>
</InputData>
Along with the validated output, Informatica AddressDoctor returns the following enrichment
values:
Status: EAT1
PAC: 105176447
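An input fragment like the one in this example can also be generated programmatically. The Python sketch below builds the same InputData structure with the standard library. The function name build_input_data is hypothetical, and the sketch covers only the fragment shown in the example, not a complete request document.

```python
import xml.etree.ElementTree as ET


def build_input_data(elements):
    """Build an <InputData> fragment from (tag, type, text) triples."""
    root = ET.Element("InputData")
    address = ET.SubElement(root, "AddressElements")
    for tag, type_name, text in elements:
        # Every element in the example uses Item="1".
        node = ET.SubElement(address, tag, {"Item": "1", "Type": type_name})
        node.text = text
    return ET.tostring(root, encoding="unicode")


austrian_input = build_input_data([
    ("Country", "NAME", "AUT"),
    ("Locality", "COMPLETE", "Perchtoldsdorf"),
    ("PostalCode", "UNFORMATTED", "2380"),
    ("Province", "COUNTRY_STANDARD", "Niederösterreich"),
    ("Street", "COMPLETE", "Plättenstraße"),
    ("Number", "COMPLETE", "7"),
])
```

Using an XML library instead of string concatenation ensures that special characters in address data are escaped correctly.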
5.26.23 For FRSupplementary
INSEE_CODE
Informatica AddressDoctor provides the INSEE code and the INSEE-9 code as enrichments to French
addresses.
The INSEE code is a numerical indexing code used by the French National Institute for Statistics and
Economic Studies (INSEE) to identify various entities including French communes and departments.
INSEE codes are particularly helpful in uniquely identifying French communes that share the same
name, spelling, and pronunciation. Of a five-digit INSEE code for a commune, the first two digits
represent the department and the last three denote the commune. INSEE codes are also used as
National Identification Numbers for French citizens.
The INSEE-9 code is also known as the IRIS code. IRIS stands for aggregated units for statistical
information in French, and represents a demographic group that contains a maximum of 2000
people. France is composed of around 16,100 IRIS units including 650 units in overseas departments.
To use this enrichment, you must have the FRA5E1.MD database installed and
EnrichmentSupplementaryFR in parameter.xml set to ON. This is supported only in Informatica
AddressDoctor versions 5.5.0 and later.
For example:
<InputData>
  <AddressElements>
    <Country Item="1" Type="NAME">FRA</Country>
    <Locality Item="1" Type="COMPLETE">AGEN</Locality>
    <PostalCode Item="1" Type="UNFORMATTED">47000</PostalCode>
    <Street Item="1" Type="COMPLETE">RUE DU PUITS DU SAUMON</Street>
    <Number Item="1" Type="COMPLETE">6</Number>
  </AddressElements>
</InputData>
Along with the validated output, Informatica AddressDoctor returns the following information:
Status: EFR1
INSEE_CODE: 47001
INSEE_9_CODE: 470010115
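The structure of the five-digit INSEE code described above can be shown with a short Python sketch. The function names are hypothetical. The second helper relies on an observation from this single example, namely that the INSEE-9 value begins with the five-digit INSEE code; treat that as an assumption, not as a documented rule.

```python
def split_insee_code(insee_code: str):
    """Split a five-character commune INSEE code into (department, commune)."""
    if len(insee_code) != 5:
        raise ValueError(f"expected a 5-character INSEE code, got {insee_code!r}")
    return insee_code[:2], insee_code[2:]


def insee9_matches_commune(insee_9_code: str, insee_code: str) -> bool:
    """Assumption: an INSEE-9 code starts with the commune's INSEE code."""
    return len(insee_9_code) == 9 and insee_9_code.startswith(insee_code)


department, commune = split_insee_code("47001")
```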
5.26.24 For PLSupplementary
GMINA_CODE, LOCALITY_TERYT_ID, STREET_TERYT_ID
Informatica AddressDoctor provides the Gmina code and the Locality and Street TERYT IDs as enrichments for
addresses in Poland. The National Official Register of the Territorial Division of the Country (TERYT) is the
official Polish register of identifiers and names of territories, localities, roads,
buildings, and so on. A gmina is the Polish equivalent of a commune or municipality. Gmina codes and
TERYT IDs are assigned and managed in TERYT. To use these enrichments, you must have the
POL5E1.MD database installed and EnrichmentSupplementaryPL in parameter.xml set to ON. This is
supported only in Informatica AddressDoctor versions 5.5.0 and later.
For example:
<InputData>
  <AddressElements>
    <Country Item="1" Type="NAME">POL</Country>
    <Locality Item="1" Type="COMPLETE">Wrocław</Locality>
    <PostalCode Item="1" Type="UNFORMATTED">50510</PostalCode>
    <Province Item="1" Type="COUNTRY_STANDARD">dolnośląskie</Province>
    <Street Item="1" Type="COMPLETE">ul. Laskowa</Street>
    <Number Item="1" Type="COMPLETE">1</Number>
  </AddressElements>
</InputData>
Along with the validated output, Informatica AddressDoctor returns the following enrichment
values:
Status: EPL1
GMINA_CODE: 2183
LOCALITY_TERYT_ID: 0986544
STREET_TERYT_ID: 10666
5.27 Element Status and Relevance Values
Element status values give a detailed explanation of the outcome of the validation operation. They
are fully meaningful only after a validation operation has been performed, although some
information is available in the “ElementInputStatus” value after a parsing operation.
In Informatica AddressDoctor 5, 20 address element positions are covered in both “ElementInputStatus”
and “ElementResultStatus”. The former provides per-element information on the matching of input
elements to reference data, while the latter categorizes the result in more detail than the overview
process status values described in section 5.17 (by indicating if and how the output fields have been
changed from the input fields).
5.27.1 Element Positions
The following table lists the element positions from left to right. Level 0 pertains to the status
information for Item 1, while level 1 summarizes the status information for Items 2-6 (see chapter 5.9 on
address element items):
Position  Description
1         PostalCode level 0
2         PostalCode level 1 (for example, ZIP+4 – Plus 4 addition)
3         Locality level 0
4         Locality level 1 (for example, Urbanisation, Dependent Locality)
5         Province level 0
6         Province level 1 (for example, Sub Province)
7         Street level 0
8         Street level 1 (for example, Dependent street)
9         Number level 0
10        Number level 1
11        Delivery service level 0 (for example, PO Box, GPO, Packstation, Private Bags)
12        Delivery service level 1
13        Building level 0
14        Building level 1
15        SubBuilding level 0
16        SubBuilding level 1
17        Organization level 0
18        Organization level 1
19        Country level 0 (Mother country)
20        Country level 1 (for example, Territory)
See chapter 5.3 for more in-depth information on address elements.
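Because each status string has one character per position, it can be unpacked into named fields. The Python sketch below assumes a 20-character string and uses a hypothetical function name and a made-up sample value; the order of the elements is the one given in the table of positions.

```python
# Element names in position order; each occupies two positions (level 0, level 1).
ELEMENTS = (
    "PostalCode", "Locality", "Province", "Street", "Number",
    "DeliveryService", "Building", "SubBuilding", "Organization", "Country",
)


def decode_element_status(status: str) -> dict:
    """Map a 20-character element status string to {'<Element> level <n>': char}."""
    if len(status) != 2 * len(ELEMENTS):
        raise ValueError(f"expected 20 characters, got {len(status)}")
    return {
        f"{name} level {level}": status[2 * index + level]
        for index, name in enumerate(ELEMENTS)
        for level in (0, 1)
    }


# Made-up sample value, for illustration only.
decoded = decode_element_status("60600060600000000060")
```

The same helper works for both ElementInputStatus and ElementResultStatus, since both use the same positions.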
5.27.2 ElementInputStatus
The possible values for validation are:
Value  Description
0      The input address contains no data at this position.
1      The data at this position cannot be found in the reference data.
2      The position cannot be checked because reference data is missing.
3      The data is incorrect. The reference database suggests that the Number or DeliveryService value is outside the range expected by the reference data. In batch and certified modes, the input data at this position is passed uncorrected as output. In suggestion list modes, Informatica AddressDoctor can provide alternatives.
4      The data at this position matches the reference data, but with errors.
5      The data at this position matches the reference data, but the data element was corrected or standardized. For example:
       Parsing: Splitting of house number for “MainSt 1”
       Validation: Replacing an input that is an exonym, or dropping superfluous fielded input that is not valid according to the country reference database
6      The data at this position matches the reference data without any error.
For parsing, the following values are possible:
Value  Description
0      The input address contains no data at this position.
1      The element at this location was moved to another position.
2      The element at this position matched the reference data value but needed to be normalized.
3      The data at this position is correct.
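Note that the same digit has a different meaning depending on whether the string results from validation or from parsing. The Python sketch below makes that distinction explicit. The dictionary and function names are hypothetical, and the descriptions are shortened versions of the two tables above.

```python
# Shortened descriptions of ElementInputStatus values after validation.
VALIDATION_INPUT_STATUS = {
    "0": "no data at this position",
    "1": "not found in the reference data",
    "2": "not checked, reference data missing",
    "3": "incorrect, Number or DeliveryService out of range",
    "4": "matches the reference data, but with errors",
    "5": "matches, but was corrected or standardized",
    "6": "matches the reference data without error",
}

# Shortened descriptions of ElementInputStatus values after parsing.
PARSE_INPUT_STATUS = {
    "0": "no data at this position",
    "1": "moved to another position",
    "2": "matched, but needed to be normalized",
    "3": "correct",
}


def describe_input_status(value: str, parsing: bool = False) -> str:
    """Describe one ElementInputStatus character for validation or parsing."""
    table = PARSE_INPUT_STATUS if parsing else VALIDATION_INPUT_STATUS
    if value not in table:
        raise ValueError(f"unknown ElementInputStatus value: {value!r}")
    return table[value]
```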
5.27.3 ElementResultStatus
ElementResultStatus is set after validation to indicate whether verification (“verified”) or correction
(“changed”) was possible.
The following table describes the possible values for the address elements in positions 1 through 18:
Value  Description
0      The output address contains no data at this position.
1      The data at this position cannot be found in the reference data. The input data is copied to the output data.
2      Data at this position is not checked but is standardized.
3      Data at this position is checked but does not match the expected reference data. The reference data suggests that the number data is not in the valid range. The input data is copied to the output. The status value applies in batch mode only.
4      Data at this position is validated but not changed because reference data is missing.
5      Data at this position is validated but not changed because multiple matches exist in the reference data. The status value applies in batch mode only.
6      Data validation deleted the input value at this position.
7      Data at this position is validated but contained a spelling error. Validation corrected the error by copying the value from the reference data.
8      Data at this position is validated and updated by adding a value from the reference data. It can also mean that the reference database contains additional data for the input element. For example, validation can add a building or sub-building number if a perfect match is found for the street name or building name.
9      Data at this position is validated but not changed, and the delivery status is not clear. For example, the DPV value is wrong.
C      Data at this position is validated and verified, but the name data is out of date. Validation changed the name data.
D      Data at this position is validated and verified but changed from an exonym to an official name.
E      Data at this position is validated and verified. However, data validation standardized the character case or the language. Address validation can change the language if the value fully matches a language alternative. For example, address validation can change "Brussels" to "Bruxelles" in a Belgian address.
F      Data at this position is validated, verified, and not changed, due to a perfect match with reference data.
Positions 19 and 20 in the output string relate to country data. The country data values apply to the
COUNTRYRECOGNITION process mode also. For more information, see chapters 5.11.8 and 5.12.3.
The following table describes the possible values for the address elements in positions 19 through
20:
Value
Description
0
The output address contains no data at this position.
1
The country is not recognized.
4
The country is recognized from the DefaultCountryISO3 setting.
5
The country is not recognized because multiple matches are available.
6
The country is recognized from a script.
7
The country is recognized from the address format.
8
The country is recognized from major town data.
9
The country is recognized from province data.
C
The country is recognized from territory data.
D
The country is recognized from the country name, but the name contains errors.
E
The country is recognized from the country name without errors.
F
The country is recognized from the ForceCountryISO3 setting.
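Because positions 1 through 18 and positions 19 through 20 use different value tables, a decoder has to select the table by position. The following Python sketch shows one possible approach. The function name and the shortened descriptions are illustrative assumptions, not part of the Informatica AddressDoctor API.

```python
# Shortened descriptions for positions 1 through 18 (address elements).
ERS_ELEMENT = {
    "0": "no data", "1": "not found, input copied",
    "2": "not checked but standardized", "3": "number out of range, input copied",
    "4": "not changed, reference data missing", "5": "not changed, multiple matches",
    "6": "input value deleted", "7": "spelling error corrected",
    "8": "value added from reference data", "9": "not changed, delivery status unclear",
    "C": "verified, outdated name changed", "D": "verified, exonym changed to official name",
    "E": "verified, case or language standardized", "F": "verified, perfect match",
}

# Shortened descriptions for positions 19 and 20 (country data).
ERS_COUNTRY = {
    "0": "no data", "1": "country not recognized",
    "4": "recognized from DefaultCountryISO3", "5": "not recognized, multiple matches",
    "6": "recognized from a script", "7": "recognized from the address format",
    "8": "recognized from major town data", "9": "recognized from province data",
    "C": "recognized from territory data", "D": "recognized from the country name, with errors",
    "E": "recognized from the country name without errors",
    "F": "recognized from ForceCountryISO3",
}


def describe_result_status(status: str) -> dict[int, str]:
    """Return {1-based position: description} using the table for that position."""
    if len(status) != 20:
        raise ValueError("expected a 20-character status string")
    described = {}
    for position, value in enumerate(status, start=1):
        table = ERS_COUNTRY if position >= 19 else ERS_ELEMENT
        described[position] = table.get(value, f"unknown value {value!r}")
    return described
```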
5.27.4 ElementRelevance
In addition to the element status values described previously, information is available on which of
the address elements of the address processed are actually relevant from the local postal operator’s
point of view. The possible values for each address element are “1” for relevant and “0” otherwise.
For any given address, all address elements with a value of “1” must be present for an output
address to be deemed valid by the local postal authority. “ElementRelevance” may well vary from
address to address for countries with different address types; for example, rural versus metropolitan
addressing. Furthermore, AddressElements that have actually been validated against reference data
(i.e. with ElementResultStatus 7 and higher) may override the default ElementRelevance value
defined for that AddressElement.
Note that “ElementRelevance” is really only meaningful for a “ProcessStatus” value of Cx or Vx (and
possibly I3 and I4 for Process Mode INTERACTIVE, see chapter 5.17 for details on “ProcessStatus”).
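As a minimal sketch of how the relevance values can be used, the following Python function lists the positions that are marked as relevant but carry no data in the output (ElementResultStatus 0). It assumes that both values are available as strings of equal length with one character per element position. The function name is hypothetical.

```python
def missing_relevant_positions(relevance: str, result_status: str) -> list[int]:
    """Return 1-based positions that are relevant ("1") but empty in the output ("0")."""
    if len(relevance) != len(result_status):
        raise ValueError("relevance and result status must have the same length")
    return [
        position
        for position, (rel, ers) in enumerate(zip(relevance, result_status), start=1)
        if rel == "1" and ers == "0"
    ]
```

An empty list for a record with a Cx or Vx process status is consistent with the rule that all relevant elements must be present in a valid output address.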
5.28 Extended Element Result Status Fields
5.28.1 Address Resolution Code (ARC)
The Address Resolution Code (ARC) is a twenty-character output string, similar to the existing Element
Status fields, that is populated for invalid records (Process Status Code Ix). The ARC explains why an
address is rejected and directs you to possible resolutions. Informatica AddressDoctor generates the
following Address Resolution Code values:
Value
Description
2
Missing element in address.
3
Numeric provided inside element is outside permissible range – for example,
wrong numeric inside street name or house number; 100 Main St when house
numbers range from 400-800.
4
Multiple inputs for the element.
5
Input element ambiguous / multiple matches.
6
Element contradicts other elements. For example, the postal code information
and locality information do not match.
7
3 strike rule/too many corrections in combination of several elements.
8
General Postal Authority Rule.
Note that for all other scenarios this value will be zero.
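The following Python sketch shows how the ARC values might be translated into rejection reasons. It assumes that the ARC string is read together with the process status and evaluates it only for Ix records, as described above. The function name and the shortened reason texts are illustrative.

```python
# Reason texts shortened from the table above.
ARC_REASONS = {
    "2": "missing element in address",
    "3": "numeric outside permissible range",
    "4": "multiple inputs for the element",
    "5": "input element ambiguous / multiple matches",
    "6": "element contradicts other elements",
    "7": "too many corrections across several elements",
    "8": "general postal authority rule",
}


def rejection_reasons(process_status: str, arc: str) -> dict[int, str]:
    """Return {1-based position: reason} for an invalid (Ix) record."""
    if not process_status.startswith("I"):
        return {}  # the ARC is populated for invalid records only
    return {
        position: ARC_REASONS[value]
        for position, value in enumerate(arc, start=1)
        if value in ARC_REASONS
    }
```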
5.28.2 ARC = 3 (Numeric provided inside an element is outside permissible range)
The house number in the following example is outside the range for the address. Two suggestions
are generated for the address in interactive mode.
Input           Output           Output
Röntgenstr. 10  Röntgenstr. 1-9  Röntgenstr. 2-8
67133 Maxdorf   67133 Maxdorf    67133 Maxdorf
Germany         Germany          Germany
Process Status = I3: Data could not be corrected completely.
Element       EIS  ERS  ARC  Relevance  Explanation
House Number  3    7    3    1          House Number provided is outside the permissible range
5.28.3 ARC = 4 (Multiple inputs for the element)
Processing the following address in batch mode gives a process status of I4 and an ARC value of 4.
Input                      Output
Street = Rue des Ardennes  Street = Rue des Ardennes
House Number = 21          House Number = 21
Postal Code = 75019        Postal Code = 75019
Locality = 75935 Paris     Locality = Paris
Process Status = I4: This address has multiple postal codes and cannot be resolved.
Element      EIS  ERS  ARC  Relevance  Explanation
Postal Code  6    0    4    0          Multiple postal codes for the address
This results in ARC = 4 for the postal code; i.e. multiple inputs for the element (two postal codes in
input).
5.28.4 ARC = 5 (Input element ambiguous / multiple matches)
Processing the following address in interactive mode gives a process status of I1 and an ARC value of
5.
Input             Output
C.P. 102          C.P. 102
120557 Bucuresti  120557 Bucuresti
Romania           Romania
This address is ambiguous because the input postal code is incorrect for Bucuresti. There are
multiple suggestions for it in interactive mode; therefore the address is copied to the output, and a
value of 5 is assigned to the postal code.
Process Status Code = I1: Data could not be corrected and no suggestions are available in interactive
mode.
Element      EIS  ERS  ARC  Relevance  Explanation
Postal Code  3    0    5    0          Input element ambiguous because the postal code does not exist
                                       for the locality, leading to multiple suggestions in interactive
                                       mode. The address is therefore copied to the output.
5.28.5 ARC = 6 (Element contradicts other elements; for example, Postal Code/Locality mismatch)
Processing the following address in certified mode gives a process status of I2 and an ARC value of 6.
Input                  Output
301-703 Riverwood Ave  301-703 Riverwood Ave
Winnipeg MB T5A 0P8    Winnipeg MB T5A 0P8
Canada                 Canada
The postal code and locality values in the input contradict each other. Postal code T5A 0P8 is for
Edmonton in Alberta and not for Winnipeg, Manitoba. SERP certification rules state that the postal
code cannot be changed.
Process Status Code = I2: Data could not be corrected in certified mode.
Element      EIS  ERS  ARC  Relevance  Explanation
Postal Code  6    0    6    0          Postal Code contradicts Locality
Locality     6    0    6    0          Locality contradicts Postal Code
5.28.6 ARC = 7 (Too many corrections)
Processing the following address in interactive mode gives a process status of I4 and an ARC value of
7.
Input               Output
Peterlenstrasse 14  Peter-Anders-Str. 14
1000 Berlin         12057 Berlin
Germany             Germany
Process Status = I4: Data could not be corrected completely. The postal code 1000 is incorrect for
Berlin and Peterlenstrasse does not exist in Berlin. Therefore, these elements must be corrected
aggressively to get some results returned. It is unclear whether these elements are completely
correct. The elements are, therefore, assigned an ARC value of 7.
Element      EIS  ERS  ARC  Relevance  Explanation
Postal Code  4    7    7    1          Too many corrections
Street       4    7    7    1          Too many corrections
5.28.7 ExtElementStatus (EERS)
The Extended Element Result Status (EERS) code is a twenty-character output string similar to the
Element Status fields for valid or corrected addresses. The EERS informs the user that additional
information may be available in the reference database for the given address. The code can return
the following values:
Value
Description
1
Data available for the element in the database, but not used for validation
2
Element unchecked, but changed because of wrong syntax/format
3
Numeric in element correct, but element changed because of wrong
syntax/descriptor
4
Element correct or unchecked, but moved because of wrong format
5
Alternative available in database – for example, language, preferred locality name,
alias name
6
Unvalidated parts inside element like additional information
7
Level change like moving HNO1 to HNO2 or swapping Locality2 with Locality1
8
Type change for fielded input only; for example, moving SubBuilding to Building
Level 2
9
General Postal Authority Rule
A
Dominant match for dual address processing
B
Relevance is only a country-wide default and cannot be trusted
C
Fast Completion Overflow
D
Numeric for range expansion (interpolated)
E
Language not available for the country, database language returned
F
Output address is outdated
Note that for all other scenarios this value will be zero.
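The examples in the following sections show the EIS, ERS, EERS, and Relevance values of an element side by side. The Python sketch below builds the same kind of per-element view from the four status strings and keeps only the positions with a non-zero EERS value. It assumes that all four strings have the same length with one character per element position. The ElementRow type and the function name are hypothetical.

```python
from typing import NamedTuple


class ElementRow(NamedTuple):
    """One row of a per-element status view (hypothetical helper type)."""
    position: int
    eis: str
    ers: str
    eers: str
    relevance: str


def rows_with_extended_status(eis: str, ers: str, eers: str, relevance: str) -> list[ElementRow]:
    """Return one row per position whose EERS value is not zero."""
    strings = (eis, ers, eers, relevance)
    if len({len(s) for s in strings}) != 1:
        raise ValueError("all status strings must have the same length")
    rows = []
    for position, values in enumerate(zip(*strings), start=1):
        row = ElementRow(position, *values)
        if row.eers != "0":
            rows.append(row)
    return rows
```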
5.28.8 EERS = 2 (Element unchecked, but changed because of wrong syntax/format)
Processing the following address in batch mode gives a process status of V2 and an EERS value of 2.
Input                             Output
113/115 Rue Germaine Tailleferre  113 Rue Germaine Tailleferre
75019 Paris                       75019 Paris
France                            France
Process Status = V2: Address is correct but some elements could not be verified.
Element       EIS  ERS  EERS  Relevance  Explanation
House Number  2    2    2     1          The input contains the wrong syntax for house number; i.e. two
                                         house numbers. The first part of the house number (113) is not
                                         found in the database and is therefore copied to the output and
                                         the second part 115 is removed from the element
5.28.9 EERS = 3 (Numeric in element correct, but element changed because of wrong
syntax/descriptor)
Processing the following address in batch mode gives a process status of V3 and an EERS value of 3.
Input                      Output
18-20 Rue Edouard Jacques  18 Rue Edouard Jacques
75014 Paris                75014 Paris
Process Status = V3: Verified – input data correct on input but some elements were standardized.
France does not permit ranges for house numbers. Therefore, the first part of the house number is
matched against the reference database and the “-20” is removed. This leads to an assignment of 3
for EERS for the house number element of the address.
Element       EIS  ERS  EERS  Relevance  Explanation
House Number  5    E    3     1          Numeric in element correct, but element changed because of
                                         wrong syntax/descriptor
5.28.10 EERS = 4 (Element correct or unchecked, but moved because of wrong format)
Processing the following address in batch mode gives a process status of C4 and an EERS value of 4.
Input                                     Output
Organization: Sinopia Financial Services  Sinopia Financial Services
DAL1: 4 Place del la Pyramide             Immeuble Ile de France
DAL2: Immeuble Ile de France              4 Place De La Pyramide
CSLLN: Paris La Defense CEDEX 92912       Puteaux
Country: France                           92912 Paris La Defense CEDEX
                                          France
Process Status = C4: Corrected
Building Immeuble Ile de France has been moved above the street in the output and gets an EERS
status of 4.
Element   EIS  ERS  EERS  Relevance  Explanation
Building  6    F    4     0          Building moved one level up, i.e. above the street, because the
                                     input format was incorrect
5.28.11 EERS = 5 (Alternative available in database – for example, language, preferred
locality name, alias name)
In this example, the PreferredLanguage parameter is set to “Database”, and the address is processed
in batch mode.
Input           Output
Koningstraat 4  Rue Royale 4
Brussels        1000 Bruxelles
Belgium         Bruxelles-Capitale
                Belgium
Process Status = C4: Corrected
Element   EIS  ERS  EERS  Relevance  Explanation of EERS
Locality  5    E    5     1          Alternative language available in database
Province  0    8    5     0          Alternative language available in database
Street    4    7    5     1          Alternative language available in database
5.28.12 EERS = 6 (Unvalidated parts inside element like additional information)
Processing the following address in batch mode gives a process status of V2 and an EERS value of 6.
Input                           Output
Leona Vicario 7528 C 1          Leona Vicario 7528 C 1
Condominio Campestre Del Valle  Condominio Campestre Del Valle
52177 Metepec, Mex              52177 Metepec, Mex
Mexico                          Mexico
Process Status = C3: Corrected but some elements could not be checked. LEONA VICARIO is found in
the database as a valid street. “7528 C” is an unvalidated part of the street output. The house
number is copied. It is not known whether the house number is relevant for delivery, so the house
number gets an EERS of B.
Element       EIS  ERS  EERS  Relevance  Explanation
Street        6    F    6     1          7528 C is an unvalidated part of the Street
House Number  2    4    B     1          The element relevance is only a country-wide default and
                                         cannot be trusted
5.28.13 EERS = 7 (Level change like moving HNO1 to HNO2 or swapping Locality2 with
Locality1)
Processing the following address in batch mode gives a process status of C3 and an EERS value of 7.
Input                          Output
RUA EDUARDO RIZK 1135          RUA EDUARDO RIZK 1135
GUARUJÁ                        BALNEÁRIO CIDADE ATLÂNTICA
BALNEÁRIO CIDADE ATLÂNTICA-SP  GUARUJÁ-SP
11441-140                      11441-140
BRAZIL                         BRAZIL
Process Status = C3: Corrected but some elements could not be checked.
Element    EIS  ERS  EERS  Relevance  Explanation
Locality1  5    E    7     1          Locality1 swapped with Locality2 – elements have changed levels
Locality2  6    F    7     0          Locality2 swapped with Locality1 – elements have changed levels
5.28.14 EERS = 8 (Type change for fielded input; for example, moving Sub-Building to
Building Level 2)
The EERS value of 8 is only set for fielded input. Processing the following address in batch mode
gives a process status of V2 and an EERS value of 8.
Input                                            Output
Organization = CENTRE GESTION AGREE ENTREPRISES  Organization = CENTRE GESTION AGREE ENTREPRISES
Building = IMPASSE DE PECHABOUT                  Building = BOUTOULLE
Sub-building = BOUTOULLE                         Street = IMPASSE DE PECHABOUT
House Number = 53                                House Number = 53
Delivery Service Name = BP                       Delivery Service Name = BP
Delivery Service Number = 40098                  Delivery Service Number = 40098
Postal Code = 47003                              Postal Code = 47003
Locality = AGEN                                  Locality = AGEN CEDEX
Country = FRANCE                                 Country = FRANCE
Process Status = V2: Verified – input data correct but some elements could not be verified because
of incomplete reference data.
Element   EIS  ERS  EERS  Relevance  Explanation
Street    6    F    8     1          The street data in the output was the building data in the input
Building  6    F    8     0          The building data in the output was the sub-building data in the
                                     input
5.28.15 EERS = A (Dominant match for dual address processing)
In this example the DualAddressPriority is set to Street. This yields an EERS status of A for the Street
element in batch mode.
Input            Output
3 Poplar St      PO BOX 2
PO BOX 2         3 Poplar St
New Haven 06513  New Haven CT 06513-4325
USA              USA
Process Status = C4: Corrected.
Element  EIS  ERS  EERS  Relevance  Explanation
Street   6    F    A     1          Street is the dominant match
5.28.16 EERS = B (relevance is only a country-wide default and cannot be trusted)
A value of B in the extended element result status output field indicates that the relevance value
cannot be trusted and is only a country-wide default value.
User interpretation:
Element empty, Relevance = 0, EERS = 0      No information about this element.
Element empty, Relevance = 0, EERS = B      This use case will not occur.
Element empty, Relevance = 1                This use case must not happen; a missing relevant
                                            element should lead to a rejection.
Non-empty element, Relevance = 0, EERS = 0  Element is definitely not relevant.
Non-empty element, Relevance = 1, EERS = 0  Element is definitely relevant.
Non-empty element, Relevance = 0, EERS = B  Element relevance is a country-wide default.
Non-empty element, Relevance = 1, EERS = B  Element relevance is a country-wide default.
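The interpretation rules above can be sketched as a small Python function. The function name and return texts are illustrative. As an assumption of this sketch, any EERS value other than B is treated like the EERS = 0 rows of the table, because the table does not cover other values.

```python
def interpret_relevance(element: str, relevance: str, eers: str) -> str:
    """Interpret one element's relevance flag together with its EERS value."""
    is_empty = element.strip() == ""
    if is_empty:
        if relevance == "1":
            return "must not happen: a missing relevant element should lead to a rejection"
        if eers == "B":
            return "will not occur"
        return "no information about this element"
    if eers == "B":
        # The relevance value is only a country-wide default and cannot be trusted.
        return "relevance is a country-wide default"
    return "definitely relevant" if relevance == "1" else "definitely not relevant"
```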
5.28.17 EERS = C (Fast Completion Overflow)
In this example the MaxResultCount = 20 and the following elements are entered in the Fast
Completion mode.
Locality = New York
Country = USA
This results in an EERS value of C (Fast Completion Overflow) for the postal code element.
Process Status = Q1: Suggested address incomplete.
Element      EIS  ERS  EERS  Relevance  Explanation
Postal Code  0    8    C     1          More than 20 suggestions – overflow available
5.28.18 EERS = D (Numeric for range expansion (interpolated))
This value for the EERS output field is assigned if the RangesToExpand parameter is set to “ALL”.
In the following example, the delivery service numeric range is 1-40. Only the interval limits of 1 and
40 are confirmed in the database. For the other 38 results, the EERS value for Delivery Service is D in
the Interactive and Fast Completion modes.
Input
Delivery Service = Postfach
Postal Code = 91279
Locality = Kirchenthumbach
Country = Germany
RangesToExpand = “ALL”
Process Mode = Fast Completion
Process Status = Q3: Suggestions available – complete address
Element           EIS  ERS  EERS  Relevance  Explanation
Delivery Service  0    8    D     1          The numbers 1 and 40 are confirmed in the database. For all
                                             other 38 results the delivery service numbers will be
                                             interpolated and the EERS status is set to “D”
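To make the arithmetic of the example concrete, the following Python sketch expands a confirmed range and marks every interpolated number with the EERS value D. It does not reproduce the internal logic of Informatica AddressDoctor. The function name is hypothetical, and the value “0” for the two confirmed interval limits is an assumption based on the rule that the EERS is zero in all other scenarios.

```python
def expand_range(low: int, high: int) -> list[tuple[int, str]]:
    """Return (number, EERS value) pairs for every number in the range low..high."""
    return [
        (number, "0" if number in (low, high) else "D")  # only the limits are confirmed
        for number in range(low, high + 1)
    ]


expanded = expand_range(1, 40)
interpolated = [number for number, eers in expanded if eers == "D"]
print(len(expanded), len(interpolated))  # prints "40 38"
```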
5.28.19 EERS = E (Language not available for the country, default language returned)
In this example the PreferredLanguage parameter = English, and the address is validated in batch
mode.
Input           Output
Koningstraat 4  Rue Royale 4
Brussels        1000 Brussels
Belgium         Brussels-Capitale
                Belgium
Process Status = C4
Element   EIS  ERS  EERS  Relevance  Explanation
Street    4    7    E     1          Street not available in the preferred language (English)
                                     therefore defaults to Database (French)
Province  0    8    E     0          Province not available in the preferred language (English)
                                     therefore defaults to Database (French)
5.29 ResultPercentage Values
The “ResultPercentage” value indicates how similar a result is to the parsed input; values close to
100% imply high similarity. The values are mainly provided to filter out overly extensive
corrections in records with a Cx BATCH “ProcessStatus” value (see chapter 5.17) in master data
management environments with very stringent data quality requirements.
Also, “ResultPercentage” may be used to determine which INTERACTIVE results show the least
deviation from input. Informatica AddressDoctor discourages using “ResultPercentage” values for
any other use case scenarios than the two described above.
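The two use cases described above can be sketched in Python as follows. The Result class, the function names, and the threshold of 80 are hypothetical placeholders chosen for illustration; a suitable threshold depends on the data quality requirements of the environment.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical container for two values read from a validation result."""
    process_status: str       # for example "C4"
    result_percentage: float  # similarity to the parsed input, in percent


def needs_review(result: Result, threshold: float = 80.0) -> bool:
    """Flag corrected (Cx) records whose similarity falls below the threshold."""
    return result.process_status.startswith("C") and result.result_percentage < threshold


def least_deviation(results: list[Result]) -> Result:
    """Return the result that deviates least from the input."""
    return max(results, key=lambda r: r.result_percentage)
```

The first function corresponds to filtering Cx records in batch processing; the second corresponds to ranking INTERACTIVE results.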
5.30 Language ISO Code Output
In situations where a result address contains data from the database, its language may be output via
the ResultData parameter LanguageISO3 as an ISO 639 3-letter code, for example “DEU” for German. For
transliterated output, the original language is reported, for example “JPN” for romanized
Japanese output.
5.31 Address Types
Informatica AddressDoctor can populate the AddressType output field with a value that represents
the type of mailbox that the address identifies.
For United States addresses, Informatica AddressDoctor returns the address type values that the
United States Postal Service specifies. The United States Postal Service includes a Record Type value
in the reference data that it provides for domestic addresses. Mail carriers from other countries do
not specify an address type flag in the same manner, aside from New Zealand, which specifies a rural
address, and Canada, which specifies a large volume receiver.
For most countries, Informatica AddressDoctor uses a range of criteria to interpret the address type
from the validated address data. For example, Informatica AddressDoctor can recognize a mailbox at
an organization when the mailbox serves a large volume receiver.
Note that Informatica AddressDoctor cannot guarantee the accuracy of the address types when the
reference data does not contain address type information. For more information on the address
types in different countries, see the sections below.
5.31.1 Country-Specific Address Type Indicators
When the reference data for a country does not contain a formal address type designator,
Informatica AddressDoctor uses the data in the address to determine the address type. Informatica
AddressDoctor uses different data elements to assign address types to addresses from different
countries. When you read the address type values for a country that does not define address type
indicators, consider the criteria that Informatica AddressDoctor uses to infer an address type from
the address data.
Informatica AddressDoctor defines a range of criteria to infer the address types in the following
countries:
• Australia
• Canada
• France
• New Zealand
Informatica AddressDoctor also defines a set of criteria that infer the address type when you process
United States addresses in Fast Completion mode.
Address Type Indicators in Addresses from the United States
Informatica AddressDoctor returns the United States Postal Service address type for a United States
address when you perform validation in Batch, Certified, or Interactive mode.
The following table describes the address types that the United States Postal Service can specify for
United States addresses:
Address Type
Description
F
The address identifies an organization.
G
The address is a general delivery address. In a general delivery address, the
postal code and the recipient data identify the address.
H
The address identifies a high-rise building. The address contains sub-building
elements such as apartment or suite.
P
The address identifies a Post Office Box or a delivery service.
R
The address is a rural route/highway contract address.
S
The address identifies a street.
U
Unidentified. The address is not valid, and Informatica AddressDoctor does not
assign an address type.
Address Type Indicators in Addresses from Australia
The following table lists the address types that Informatica AddressDoctor can return for Australia
addresses:
Address Type
Description
B
The address identifies a building.
F
The address identifies an organization.
L
The address post code identifies the organization as a large volume receiver. The
reference data adds or validates the organization name.
Informatica AddressDoctor can determine that the address is a large volume
receiver in one of the following ways:
• The address post code identifies the organization as a large volume
receiver.
• The reference data does not contain street or building information.
P
The address identifies a Post Office Box or a delivery service.
S
The address identifies a street. S is the default address type. If Informatica
AddressDoctor cannot determine the address type from the address data, it
returns the default value.
U
Unidentified. The address is not valid, and Informatica AddressDoctor does not
assign an address type.
If an address meets the criteria for more than one address type, Informatica AddressDoctor assigns
the first applicable address type from the following list:
L, F, P, B, S
Note: For Australia addresses, Informatica AddressDoctor can return information relevant to the
address type on other output elements. Consult the Process Status, Element Input Status, and
Element Result Status values.
Address Type Indicators in Addresses from Canada
The following table lists the address types that Informatica AddressDoctor can return for Canada
addresses:
Address Type
Description
B
The address identifies a building.
F
The address identifies an organization. In Canada addresses, the type F
addresses are a subset of the type L addresses. Therefore, the address type F
also indicates a large volume receiver.
G
The address is a general delivery address. In a general delivery address, the
postal code and the recipient data identify the address. Informatica
AddressDoctor uses the delivery record in the reference data to identify the
address type.
L
The address post code identifies the organization as a large volume receiver. The
address might or might not contain an organization name.
P
The address identifies a Post Office Box or a delivery service.
R
The address identifies a rural route. Informatica AddressDoctor uses the delivery
record in the reference data to identify the address type.
S
The address identifies a street. S is the default address type. If Informatica
AddressDoctor cannot determine the address type from the address data, it
returns the default value.
U
Unidentified. The address is not valid, and Informatica AddressDoctor does not
assign an address type.
If an address meets the criteria for more than one address type, Informatica AddressDoctor assigns
the first applicable address type from the following list:
F, L, P, B, R, S, G
Address Type Indicators in Addresses from France
The following table lists the address types that Informatica AddressDoctor can return for France
addresses:
Address Type
Description
B
The address identifies a building.
F
The address identifies an organization. The address does not include a CEDEX
post code.
G
The address is a general delivery address. The reference data does not contain a
match for the street information, but the reference data contains a match for
the CEDEX post code in the address.
L
The address post code identifies the organization as a large volume receiver. The
address might or might not contain an organization name. The reference data
uses the CEDEX post code to add or validate the organization name.
P
The address identifies a Post Office Box or a delivery service.
S
The address identifies a street. S is the default address type. If Informatica
AddressDoctor cannot determine the address type from the address data, it
returns the default value.
U
Unidentified. The address is not valid, and Informatica AddressDoctor does not
assign an address type.
If an address meets the criteria for more than one address type, Informatica AddressDoctor assigns
the first applicable address type from the following list:
L, F, P, B, S, G
Address Type Indicators in Addresses from New Zealand
The following table lists the address types that Informatica AddressDoctor can return for New
Zealand addresses:
Address Type
Description
B
The address identifies a building.
F
The address identifies an organization.
L
The address post code identifies the organization as a large volume receiver. The
reference data adds or validates the organization name.
Informatica AddressDoctor can determine that the address is a large volume
receiver in one of the following ways:
• The address post code identifies the organization as a large volume
receiver.
• The reference data does not contain street or building information.
P
The address identifies a Post Office Box or a delivery service.
R
The address identifies a rural route. Informatica AddressDoctor uses the delivery
record in the reference data to identify the address type.
S
The address identifies a street. S is the default address type. If Informatica
AddressDoctor cannot determine the address type from the address data, it
returns the default value.
U
Unidentified. The address is not valid, and Informatica AddressDoctor does not
assign an address type.
If an address meets the criteria for more than one address type, Informatica AddressDoctor assigns
the first applicable address type from the following list:
L, F, P, B, R, S
Address Type Indicators in United States Addresses in Fast Completion Mode
The following table lists the address types that Informatica AddressDoctor can return for United
States addresses in Fast Completion mode:
Address Type
Description
B
The address identifies a building.
F
The address identifies an organization.
L
The address post code identifies the organization as a large volume receiver.
The reference data adds or validates the organization name.
Informatica AddressDoctor can determine that the address is a large volume
receiver in one of the following ways:
• The address post code identifies the organization as a large volume
receiver.
• The reference data does not contain street or building information.
P
The address identifies a Post Office Box or a delivery service.
S
The address identifies a street. S is the default address type. If Informatica
AddressDoctor cannot determine the address type from the address data, it
returns the default value.
U
Unidentified. The address is not valid, and Informatica AddressDoctor does not
assign an address type.
If an address meets the criteria for more than one address type, Informatica AddressDoctor assigns
the first applicable address type from the following list:
L, F, P, B, S
Address Type Indicators for the Rest of the World
The following table lists the address types that Informatica AddressDoctor can return for all
countries that the preceding sections do not cover:
Address Type
Description
B
The address identifies a building.
F
The address identifies an organization.
L
The address post code identifies the organization as a large volume receiver.
The reference data adds or validates the organization name.
Informatica AddressDoctor can determine that the address is a large volume
receiver in one of the following ways:
• The address post code identifies the organization as a large volume
receiver.
• The reference data does not contain street or building information.
P
The address identifies a Post Office Box or a delivery service.
S
The address identifies a street. S is the default address type. If Informatica
AddressDoctor cannot determine the address type from the address data, it
returns the default value.
U
Unidentified. The address is not valid, and Informatica AddressDoctor does not
assign an address type.
If an address meets the criteria for more than one address type, Informatica AddressDoctor assigns
the first applicable address type from the following list:
L, F, P, B, S
Note: Informatica AddressDoctor can return information relevant to the address type on other
output elements. Consult the Process Status, Element Input Status, and Element Result Status
values.
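The priority lists in the preceding sections can be illustrated with a short Python sketch that selects the first applicable address type for a country. The sketch only mirrors the documented lists and does not reproduce the internal logic of Informatica AddressDoctor. The function name and the use of ISO3 country codes as keys are assumptions for illustration.

```python
# Default list for Australia, United States addresses in Fast Completion mode,
# and the rest of the world.
DEFAULT_PRIORITY = ["L", "F", "P", "B", "S"]

# Countries whose documented priority list differs from the default.
PRIORITY_BY_COUNTRY = {
    "CAN": ["F", "L", "P", "B", "R", "S", "G"],
    "FRA": ["L", "F", "P", "B", "S", "G"],
    "NZL": ["L", "F", "P", "B", "R", "S"],
}


def resolve_address_type(candidates: set[str], country_iso3: str) -> str:
    """Return the first applicable address type from the country's priority list."""
    priority = PRIORITY_BY_COUNTRY.get(country_iso3, DEFAULT_PRIORITY)
    for address_type in priority:
        if address_type in candidates:
            return address_type
    return "S"  # S is the documented default address type
```

For an address that meets the criteria for both F and L, the sketch returns F for Canada and L for France, in line with the lists above.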
5.32 Return Codes
The use of Informatica AddressDoctor may result in success or error conditions signaled via return
codes.
All API functions return an AD_I32 (32 bit signed integer) return code value:
• A value of 0 (zero) indicates success.
• A negative value of -10000 or below indicates a very critical error, and further processing is
usually impossible. It is strongly advised to shut down the whole process, as it may be in an
unstable state.
• Negative values between -1 and -9999 indicate critical errors, and further processing may be
impossible.
• A positive value of 1000 and above indicates non-critical errors, and further processing is
possible. Return code values between 1 and 999 have been assigned to warnings, indicating
possible issues with configuration settings, address input or output.
The return value must always be checked for by the calling logic. While it informs about fundamental
errors, the actual validation results are returned via separate API functions (see chapter 6.11).
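The value ranges can be turned into a small classification helper. The following C sketch is an illustration only: the enum, the function name classify_return_code, and the use of int32_t as a stand-in for AD_I32 are assumptions and do not come from the API headers.

```c
#include <stdint.h>

/* Severity classes following the value ranges described in this chapter. */
typedef enum {
    RC_SUCCESS,        /* 0                 */
    RC_WARNING,        /* 1 .. 999          */
    RC_ERROR,          /* 1000 and above    */
    RC_CRITICAL,       /* -1 .. -9999       */
    RC_VERY_CRITICAL   /* -10000 and below  */
} rc_severity;

/* int32_t is used here as a stand-in for the AD_I32 return type. */
rc_severity classify_return_code(int32_t rc)
{
    if (rc == 0) {
        return RC_SUCCESS;
    }
    if (rc <= -10000) {
        return RC_VERY_CRITICAL;
    }
    if (rc < 0) {
        return RC_CRITICAL;
    }
    if (rc < 1000) {
        return RC_WARNING;
    }
    return RC_ERROR;
}
```

A caller could, for example, log warnings and continue, skip the current record on RC_ERROR, and stop calling the engine on RC_CRITICAL or RC_VERY_CRITICAL.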
The most common return codes are listed below with an explanation (see the API documentation in chapter 10.2 for a complete and up-to-date list):
5.32.1 Success
The operation was completed without error:
Code  Description
0  OK, no error
5.32.2 Warnings
The operation was completed, but possibly with an unexpected result:
Code  Description
1  The SetConfig.xml contained at least one corrupt unlock code
2  The SetConfig.xml contained at least one expired or not yet valid unlock code
3  The SetConfig.xml listed at least one database file which was not found
4  The SetConfig.xml listed at least one corrupt database file
5  The SetConfig.xml listed at least one database with an unsupported version
6  The SetConfig.xml listed at least one database which is not supported (for example, DEU CERTIFIED)
7  No valid unlock code for a database file
8  The SetConfig.xml listed at least one database more than once
9  The MaxMemoryUsageMB setting in SetConfig.xml was too small to fulfil all preloading settings and/or the CacheSize setting
10  The environmental settings (for example, OperatingSystem) in at least one unlock code are incompatible with the current machine
11  The SetConfig.xml contained at least one unsupported type of unlock code
100  An input element or line which already had content was overwritten
101  The AddressComplete input has too many lines; extra lines will be ignored for further processing
102  At least one character sequence of a string is not valid (i.e. contains control codes or violates some constraint); these sequences are replaced by spaces
200  The output buffer is too small; the output was written, but truncated
201  At least one character of the output could not be encoded in the chosen encoding; these characters were replaced by an underscore ('_')
300  The engine usage period has expired or is not activated yet
301  The unlock code for a database file has expired or is not activated yet
400  Address lines and/or Address Complete were given on input; this part of the input was ignored
401  More than 10 lines were given via FormattedAddressLines or AddressComplete as input; the lines beyond 11 were ignored
500  The MaxResultCount in Parameters.xml was larger than the value in SetConfig.xml; for this reason it was reduced to the value in SetConfig.xml
900  No database at all was found, probably because the path was wrong
901  No database at all was opened, probably because the path was wrong and/or no valid unlock code was given
902  Error while attempting to open at least one of the extra CASS DBs
5.32.3 Errors
The requested operation was not executed:
Code  Description
1000  A pointer parameter was NULL
1001  A function parameter was 0
1002  A NULL pointer to an object was used (not relevant for C API)
1003  Two XMLs were given (as string and within a file); only one may be given
1004  The output buffer size is not valid (for example, 0)
1005  Buffer misalignment: an AD_WCHAR* points to an odd address
1100  A parameter is out of range or illegal
1101  An XML string is invalid
1200  The character sequence of a string is not valid (i.e. contains control codes or violates some constraint)
1201  The encoding parameter did not match the character size of the API call, for example UCS2 (16 bit) vs. char (8 bit)
1300  No SetConfig.xml was given as parameter for AD_Initialize
1301  The engine has already been initialized
1302  AD_DeInitialize() failed because not all AddressObjects have been released
1400  No AddressObject is available (all AddressObject handles have already been obtained via AD_GetAddressObject())
1401  The passed AddressObject handle is not valid
1500  A database file has not been found
1501  A database file is invalid/corrupt
1502  No valid unlock code for a database file
1503  A database file has an unsupported version
1600  A feature has not been unlocked
1700  The country could not be identified or is fundamentally unsupported
1701  The country is not supported for this processing mode and type of input
1800  Results are available; for this reason no AddressObject modification is allowed
1801  XML and direct API calls were intermixed when setting the input data of an AddressObject
1802  AD_Process() has not been called successfully; no result is available
1803  The attempted operation was invalid, for example trying to set incompatible address elements
1900  The result index parameter is out of range (must be >= 1)
1901  The output buffer is too small to hold the result; no output was written
5.32.4 Critical Errors
No further calls, except possibly AD_Initialize() or AD_DeInitialize(), should be made to the engine:
Code  Description
-1300 The engine has not yet been initialized, need to call AD_Initialize()
-1600 No valid unlock code was given
-1601 The engine usage period has expired or is not activated yet
-1602 A clock inconsistency has been detected
-9900 A memory allocation request failed
-9901 A file operation failed
5.32.5 Very Critical Errors
Very critical errors should only occur under highly adverse circumstances. No further calls, except possibly AD_DeInitialize(), should be made to the engine. Please report any occurrence of these errors:
Code  Description
-10000  Some unknown exception has been thrown; this event should never occur
-10001  Some internal assertion has failed; this event should never occur
-10002  Some internal error has been encountered; this event should never occur
5.33 OptimizationLevel
Informatica AddressDoctor processing allows setting the “OptimizationLevel” attribute in
Parameters.xml (see the DTD in chapter 10.1) upon AD_Initialize() for controlling the trade-off
between processing speed and quality:
• NARROW: The parser will honor input assignment strictly, with the exception of separation of House Number from Street information.
• STANDARD: The parser will separate address elements more actively, for example:
o Province will be separated from Locality information
o PostalCode will be separated from Locality information
o House Number will be separated from Street information
o SubBuilding will be separated from Street information
o DeliveryService will be separated from Street information
o SubBuilding will be separated from Building information
o Locality will be separated from PostalCode information
• WIDE: Parser separation will happen similarly to STANDARD, but additionally up to 10 parsing candidates will be passed to validation for processing. Validation will widen its search tree and take additional reference data entries into account for matching.
Note that adjusting “OptimizationLevel” might have no effect for countries that lack the postal
reference data information required for the kind of separation described above.
Increasing separation granularity from NARROW to STANDARD already consumes some processing power. The major impact on processing speed, however, comes with the “OptimizationLevel” WIDE: validation processes a larger search tree, which increases the number of data accesses and comparisons in an attempt to make the most of the input data given.
Thus a recommended batch usage pattern for Informatica AddressDoctor would be (assuming rather low levels of address quality):
• Run a quick sweep through your data using the COUNTRYRECOGNITION process mode (see chapter 5.11.8) for separating out those records lacking country information, which might have to be amended manually before further processing them.
• Do a fast check of overall record quality using “OptimizationLevel” NARROW to identify the valid or correctable records and separate out all records that have not resulted in a V or C “ProcessStatus” value (see chapter 5.17).
• Feed those problematic records back into Informatica AddressDoctor, processing them with “OptimizationLevel” WIDE to see what might be salvaged, indicated by a V, C or I4 “ProcessStatus” value.
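The routing decisions in this pattern can be sketched in C as two predicates on the “ProcessStatus” string. This is an illustration under assumptions: the function names are hypothetical, and the sketch assumes that status values are short strings whose first letter is the category (V, C, I, and so on), for example "V4" or "I4". See chapter 5.17 for the actual status values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Pass 2 (NARROW): a record is kept if its status category is V or C.
 * All other records are candidates for the WIDE pass.
 */
bool accepted_after_narrow(const char *process_status)
{
    if (process_status == NULL) {
        return false;
    }
    return process_status[0] == 'V' || process_status[0] == 'C';
}

/*
 * Pass 3 (WIDE): a record counts as salvaged if its status category is
 * V or C, or if the status is exactly I4.
 */
bool salvaged_after_wide(const char *process_status)
{
    if (process_status == NULL) {
        return false;
    }
    return accepted_after_narrow(process_status)
        || strcmp(process_status, "I4") == 0;
}
```

Records for which salvaged_after_wide() returns false after the third pass would remain for manual review.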
5.34 Preloading
Performance is often critical when deploying Informatica AddressDoctor with large databases.
Typically, the I/O subsystem is the slowest component in a system. As memory prices have fallen
sharply, users can now afford machines with a lot of installed memory.
To utilize the available memory for performance optimization, Informatica AddressDoctor offers the
“PreloadingType” attribute for each DataBase element. It allows loading Informatica AddressDoctor
reference databases (.md files) into the main memory of the computer.
The following preloading types are available:
• No preloading (PreloadingType="NONE", the default)
• Partial preloading (PreloadingType="PARTIAL")
• Full preloading (PreloadingType="FULL")
Partial preloading will load the metadata and indexing structures into memory. The reference data
itself will remain on the hard drive. Partial preloading offers some performance enhancements and
is an alternative when not enough memory is available to fully load the desired databases.
Full preloading will move the entire reference database into memory. This may need a significant
amount of memory for countries with large databases such as the USA or the United Kingdom, but it
will increase the processing speed significantly.
However, there are conditions where full preloading can have a negative impact on speed. See chapter 6.25 for details on this topic. Note that Informatica AddressDoctor itself requires memory (see chapter 2.3) in addition to the memory used for preloading.
The “PreloadingType” attribute can be set per database as a configuration parameter of the
AD_Initialize() call of Informatica AddressDoctor. If no preloading type is explicitly set, the default
preloading (“NONE”) will be used.
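Because “PreloadingType” is set per database, an application that assembles its SetConfig XML at run time needs one DataBase element per database. The following C sketch formats such an element in the attribute layout used by the initialization examples in chapter 6.1. The function name format_database_element is hypothetical, and the sketch does not escape XML special characters in its arguments.

```c
#include <stdio.h>
#include <string.h>

/* Accepts only the three preloading types listed in this chapter. */
static int is_valid_preloading_type(const char *value)
{
    return value != NULL
        && (strcmp(value, "NONE") == 0
            || strcmp(value, "PARTIAL") == 0
            || strcmp(value, "FULL") == 0);
}

/*
 * Writes one DataBase element into buf.
 * Returns the number of characters written, or -1 if an argument is
 * missing, the preloading type is unknown, or the buffer is too small.
 */
int format_database_element(char *buf, size_t buf_size,
                            const char *country_iso3, const char *type,
                            const char *path, const char *preloading_type)
{
    if (buf == NULL || buf_size == 0 || country_iso3 == NULL
        || type == NULL || path == NULL
        || !is_valid_preloading_type(preloading_type)) {
        return -1;
    }
    int written = snprintf(buf, buf_size,
        "<DataBase CountryISO3='%s' Type='%s' Path='%s' PreloadingType='%s'/>\n",
        country_iso3, type, path, preloading_type);
    if (written < 0 || (size_t)written >= buf_size) {
        return -1;  /* encoding error or truncated output */
    }
    return written;
}
```

A configuration that fully preloads one large country while leaving the rest on disk would call the helper once with "FULL" for that country and once with "NONE" for the remaining databases.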
With version 5.1.4, memory-mapped files were introduced as the new default preloading mechanism (PreloadingMethod="MAP" in SetConfig.xml, see Appendix 10.1 for the DTD). Informatica AddressDoctor continues to support the LOAD preloading method (PreloadingMethod="LOAD"), but its use is discouraged for new deployments.
When using the default "MAP" method, the engine uses the file mapping mechanism of the operating system. To actually force the file contents into memory, the data is touched (read) once upon AD_Initialize(). The "LOAD" mechanism, on the other hand, uses a memory allocation call and then reads the .md file data into the allocated memory block (see also chapter 5.37).
If enough physical memory is present, the behavior of both methods, including speed, is nearly identical. One difference is that the OS will typically write-protect mapped data, thereby possibly masking certain bugs. The methods differ mainly in low memory conditions, where the OS either starts discarding the mapped data or swaps the loaded data out to disk.
Memory mapping has two advantages over the LOAD preloading method:
• In multi-process conditions (multiple processes running Informatica AddressDoctor using a common set of .md files), the operating system will load the data into main memory only once, thus sharing preloaded reference databases between separate processes.
• In low memory conditions, the operating system will never write reference data contents to the paging file (but they might get dropped from the file system cache; if the data is needed later on, it is simply read from disk again).
However, due to larger alignment requirements of the OS, "MAP" will use more virtual memory space (2-3% more for all files). As "MAP" is the default, "PreloadingMethod" may be omitted if "MAP" is intended.
Since large amounts of memory may be allocated during preloading, with significant data amounts
moved into memory, it might take some time to load the databases into memory. Databases will be
preloaded in the order they are passed via SetConfig.xml (see the respective DTD in Appendix 10.1)
on AD_Initialize().
The following information is available through AD_GetConfigSettingsXML() to check which databases have been successfully preloaded after the AD_Initialize() call:
• CountryISO3
• Type (BATCH_INTERACTIVE | FASTCOMPLETION | CERTIFIED | GEOCODING | GEOCODING_ARRIVAL_POINT | GEOCODING_PARCEL_CENTROID | CAMEO | ADDRESS_CODE_LOOKUP)
• Path
• Status
• Size
• Version
• StartDate
• ExpirationDate
• UnlockStartDate
• UnlockExpirationDate
• ReleaseDate
• DataDate
• Encoding
• PreloadingType (FULL | PARTIAL | NONE)
• PreloadingSize
If a database could not be found or preloaded for some reason, the output contains no Database section for the corresponding ISO code. For every preloaded database there is a Database section whose “PreloadingType” attribute specifies the actual preloading type.
After AD_Initialize() has been called, the preloading parameters can only be reset by issuing AD_DeInitialize() first (preceded by AD_ReleaseAllAddressObjects() to release all AddressObjects, see chapter 6.1).
5.35 Caching
Caching reserves a certain portion of “MaxMemoryUsageMB” (see the SetConfig.xml DTD in Appendix 10.1) for speeding up file system lookups in reference data that has not been preloaded. The “CacheSize” attribute in SetConfig.xml (passed upon AD_Initialize()) controls the amount of memory reserved in this way; valid settings are NONE, SMALL, and LARGE. The standard setting of “LARGE” is recommended unless all required reference data is preloaded (so that “NONE” may be used) or the memory footprint needs to be reduced via the “SMALL” or “NONE” setting. However, “NONE” should be avoided unless memory is extremely scarce.
The size of the cache may be determined through AD_GetConfigSettingsXML(). The actual size of the cache may be less than requested if not enough memory is available (for example, “SMALL” although “LARGE” was requested).
5.36 Multithreading
The Informatica AddressDoctor API is thread-safe: any number of threads may call any of the API functions at any time without risking a crash due to data corruption. However, different threads must never call API functions at the same time using the same AddressObject; such a call sequence is typically a programming error.
While Informatica AddressDoctor lets applications benefit from multi-core processor architectures, the actual thread handling is strictly in the domain of the calling application: the engine neither creates nor destroys threads, and there are no API functions that process more than one address.
The number of threads which the engine allows to process addresses in parallel (by calling AD_Process() from a separate thread per address) is configurable (the default is 1). If more threads than configured using “MaxThreadCount” (see SetConfig.dtd) call AD_Process() at the same time, the additional threads are blocked until other threads currently calling AD_Process() return. Note that such blocking only influences the timing and sequence of the AD_Process() calls, but not the data processing and its outcome.
The figure below illustrates parallel address processing in a multi-threaded environment with n
calling threads, but “MaxThreadCount” set to 4:
[Figure: Address 1 to Address n are submitted to AddressDoctor 5; some addresses are shown as processing, the others as waiting.]
Currently there is no similar limitation on calls to other API functions which take an AddressObject handle as a parameter. A limit like the one for AD_Process() may be imposed in the future, but it would be fully transparent to the calling thread(s).
“MaxThreadCount” should normally not be set to a value larger than the number of available
cores/CPUs, possibly minus one (to allow for operating system overhead), as this is unlikely to
increase performance. For the moment, a practical maximum value for “MaxThreadCount” of 1024
is enforced in SetConfig.xml (see the DTD in chapter 10.1 for reference). If the maximum number of
AddressObjects as set by “MaxAddressObjectCount” is smaller than “MaxThreadCount”,
“MaxThreadCount” is internally reduced to the number specified by “MaxAddressObjectCount” as
no more parallel calls to AD_Process() could be made anyway.
The actual value of “MaxThreadCount” can be determined by calling AD_GetConfigSettingsXML().
It is recommended to set “MaxAddressObjectCount” to the number of threads set with
“MaxThreadCount”. However, depending on the implementation, 2 AddressObjects per thread are
necessary if a double-buffering mechanism is employed.
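The sizing rules for “MaxThreadCount” can be summarized in a short C sketch. The function names are hypothetical. The sketch assumes that values above the practical maximum of 1024 are clamped rather than rejected, and that values below 1 are raised to 1; the text does not state how the engine treats such values.

```c
/* Practical maximum for MaxThreadCount quoted in the text. */
enum { MAX_THREAD_COUNT_LIMIT = 1024 };

/*
 * Value the engine is expected to use: the requested MaxThreadCount,
 * reduced to MaxAddressObjectCount if that is smaller.
 */
int effective_max_thread_count(int requested, int max_address_object_count)
{
    int threads = requested;
    if (threads < 1) {
        threads = 1;                        /* assumption: at least one */
    }
    if (threads > MAX_THREAD_COUNT_LIMIT) {
        threads = MAX_THREAD_COUNT_LIMIT;   /* assumption: clamped */
    }
    if (max_address_object_count >= 1 && max_address_object_count < threads) {
        threads = max_address_object_count;
    }
    return threads;
}

/* Rule of thumb from the text: number of cores, possibly minus one. */
int suggested_max_thread_count(int available_cores)
{
    return available_cores > 1 ? available_cores - 1 : 1;
}

/* One AddressObject per thread, or two with double buffering. */
int suggested_max_address_object_count(int max_thread_count, int double_buffering)
{
    return double_buffering ? 2 * max_thread_count : max_thread_count;
}
```

On an 8-core machine with double buffering, these helpers suggest MaxThreadCount=7 and MaxAddressObjectCount=14. The value the engine actually uses should still be confirmed via AD_GetConfigSettingsXML().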
The largest performance gains (the best scalability) will be achieved in a multi-core environment
with full preloading for all accessed databases, as otherwise the multiple threads will be
blocked frequently by calls to the file system. In fact, this effect may become so dominant, that
the scalability in most relevant cases will be significantly reduced, if the accessed databases are
not preloaded.
Note: The term scalability refers to the speedup factor which is achieved when trying to utilize
additional cores/CPUs. Obviously the best possible speedup factor for N cores/CPUs as opposed to
using only one core would be N, that is, N-times more addresses could be processed per hour when
utilizing N cores/CPUs instead of one. In reality, such a perfect scaling is almost never achieved,
either because only parts of the called functions can operate in parallel or there is a contention for
some system resource(s) such as the front side bus, file system or memory allocation functions.
The internal design of Informatica AddressDoctor allows good or even very good scaling if the computer system itself is designed appropriately (for example, large local caches for each core and fast memory buses) and blocking due to file system contention is avoided.
5.37 Memory Management
Informatica AddressDoctor handles different types of objects, such as address objects, pre-loaded
reference address databases, and caches, in its memory. While making memory allocations for
Informatica AddressDoctor, you must consider these different objects that have specific memory
requirements.
You can divide the memory requirements of Informatica AddressDoctor into the following blocks:
• General memory block. Used for general management functions. Typically, the general memory block size is 7 MB.
• Thread memory block. Used for address processing and validation routines. One thread memory block is created for each simultaneous thread that Informatica AddressDoctor is configured to handle. The size of a thread memory block is about 38 MB for 32-bit systems and 48 MB for 64-bit systems.
• Address object memory block. Used to store the address objects defined. One address object memory block is created for each address object that Informatica AddressDoctor is configured to handle at any given time. The size of an address object memory block is about 3.7 MB + (0.24 MB x the value set for MaxResultCount) for 32-bit systems. For 64-bit systems, the size of an address object memory block is about 4.8 MB + (0.24 MB x the value set for MaxResultCount).
• Memory block reserved for caching. Informatica AddressDoctor reserves one cache memory block for each of the validation or processing threads.
• Memory blocks for preloading reference address databases. The memory required depends on the number and size of the databases that you want to preload.
• Unallocated memory block.
The following figure gives a schematic overview of the memory layout used for those different
object types:
[Figure: memory layout within MaxMemoryUsageMB, consisting of the General Memory Block, Thread 1 to Thread z Memory Blocks, AddressObject 1 to AddressObject y Memory Blocks, the Memory Block reserved for Caching, Preloaded Country ISO1 to ISOx, and Unallocated Memory Space.]
You can configure the MaxMemoryUsageMB parameter to specify the maximum available memory for Informatica AddressDoctor. Memory allocation for the different blocks is controlled by the values you set for the following parameters:
• MaxThreadCount. The maximum number of threads that Informatica AddressDoctor can process simultaneously. The value set for this parameter controls the number of thread memory blocks and thus the total memory allocation for the thread blocks.
• MaxAddressObjectCount. The maximum number of address objects that Informatica AddressDoctor can store. You can set a maximum of double the number configured for MaxThreadCount. The value you set for MaxAddressObjectCount controls the number of address object memory blocks and thus the total memory allocation for the address objects.
• CacheSize. The memory reserved for caching. If CacheSize is set to NONE, no memory is allocated for caching. When CacheSize is set to SMALL, Informatica AddressDoctor allocates a cache memory block of 0.4 MB for each of the threads. When CacheSize is set to LARGE, Informatica AddressDoctor allocates a cache memory block of 0.75 MB for each of the threads. For example, if MaxThreadCount is set to 4 and CacheSize to SMALL, Informatica AddressDoctor allocates a total of 1.6 MB for cache memory blocks.
5.37.1 Calculating Memory Requirements
If the Informatica AddressDoctor configuration on a 32-bit system includes MaxThreadCount=4,
MaxAddressObjectCount=8, CacheSize=SMALL, and MaxResultCount=20, you can calculate the
dynamic memory requirement as follows:
7 + 8 x (3.7 + 20 x 0.24) + 4 x (38+0.4) = 228.6 MB where 7 is the general memory block size;
3.7, the size of an address object memory block; 20, the value set for MaxResultCount; 0.24,
the size of a result object; 4, the number of threads; 38 the size of the thread block; and 0.4,
the cache memory block size when CacheSize is set to small.
To calculate the total memory requirement, add the total size of the reference address
databases that you want to preload.
If the Informatica AddressDoctor configuration on a 64-bit system includes MaxThreadCount=6,
MaxAddressObjectCount=6, CacheSize=LARGE, and MaxResultCount=100, you can calculate the
dynamic memory requirement as follows:
7 + 6 x (4.8 + 100 x 0.24) + 6 x (48 + 0.75) = 472.3 MB where 7 is the general memory block
size; 4.8, the size of an address object memory block; 100, the value set for MaxResultCount;
0.24, the size of a result object; 6, the number of threads; 48, the size of the thread block;
and 0.75, the cache memory block size when CacheSize is set to large.
To calculate the total memory requirement, add the total size of the reference address
databases that you want to preload.
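The two calculations follow the same formula, which can be written as a small C function. This is a sketch for capacity planning, not an API function: the names are hypothetical, and the block sizes are the approximate figures given in chapter 5.37.

```c
#include <stdbool.h>

typedef enum { CACHE_NONE, CACHE_SMALL, CACHE_LARGE } cache_size;

/*
 * Approximate dynamic memory requirement in MB:
 *   general block
 *   + MaxAddressObjectCount x (address object block + MaxResultCount x 0.24)
 *   + MaxThreadCount x (thread block + cache block)
 * Preloaded databases are not included and must be added separately.
 */
double dynamic_memory_mb(bool is_64bit, int max_thread_count,
                         int max_address_object_count, int max_result_count,
                         cache_size cache)
{
    const double general_mb = 7.0;
    const double thread_mb = is_64bit ? 48.0 : 38.0;
    const double address_object_mb = is_64bit ? 4.8 : 3.7;
    const double result_mb = 0.24;

    double cache_mb = 0.0;
    if (cache == CACHE_SMALL) {
        cache_mb = 0.4;
    } else if (cache == CACHE_LARGE) {
        cache_mb = 0.75;
    }

    return general_mb
         + max_address_object_count
               * (address_object_mb + max_result_count * result_mb)
         + max_thread_count * (thread_mb + cache_mb);
}
```

Called with the 32-bit configuration above (4 threads, 8 address objects, MaxResultCount=20, CACHE_SMALL) the function reproduces the 228.6 MB figure; the 64-bit configuration (6 threads, 6 address objects, MaxResultCount=100, CACHE_LARGE) gives 472.3 MB.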
6. How do I…
6.1 …initialize Informatica AddressDoctor?
AD_Initialize() or AD_InitializeW() must be called to actually initialize the engine: It evaluates the settings and configures the engine accordingly (see chapter 5.6 for an overview). Only after this function has returned successfully may AD_GetAddressObject() or any other functions be called. If the engine was not initialized properly, all Informatica AddressDoctor API functions will produce a return code of -1300 (see chapter 5.32 for reference).
For example:
AD_Initialize(
/* SetConfig XML passed as a string */
"<?xml version='1.0' encoding='iso-8859-1' ?>\n"
"<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>\n"
"<SetConfig>\n"
"<General />\n"
"<UnlockCode>(Enter Code here)</UnlockCode>\n"
"<DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE' Path='/ADDB' "
"PreloadingType='NONE'/>\n"
"</SetConfig>\n",
NULL, /* no SetConfig file name */
NULL, /* no Parameters XML string */
NULL  /* no Parameters file name */
);
Or in Java:
// Initialize the Engine using the 'Direct' API
AddressDoctor.initialize(
"<?xml version='1.0' encoding='UTF-16LE'?>\n" +
"<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>\n" +
"<SetConfig>\n" +
"<General WriteXMLEncoding='UTF-16LE' />\n" +
"<UnlockCode>(Enter Code here)</UnlockCode>\n" +
"<DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE' Path='/ADDB' " +
"PreloadingType='NONE'/>\n" +
"</SetConfig>",
null,
null,
null
);
Alternatively, the SetConfig XML string can be stored in an external file. In this case, the initialize call
looks like this:
AD_Initialize(
NULL,
"SetConfig.xml",
NULL,
NULL
);
Or in Java:
// Initialize the Engine using the 'XML' API
AddressDoctor.initialize(
null,
"SetConfig.xml",
null,
null
);
The following return codes (see section 5.32 for an explanation of return codes) are typical warnings and errors returned by AD_Initialize():
• AD_SC_WRN_INIT_UNLOCKCODE_CORRUPT (1)
• AD_SC_WRN_INIT_UNLOCKCODE_EXPIRED (2)
• AD_SC_WRN_INIT_DB_NOT_FOUND (3)
• AD_SC_WRN_INIT_DB_CORRUPT (4)
• AD_SC_WRN_INIT_DB_UNSUPPORTED_VERSION (5)
• AD_SC_WRN_INIT_DB_NOT_SUPPORTED (6)
• AD_SC_WRN_INIT_DB_NOT_UNLOCKED (7)
• AD_SC_WRN_INIT_MULTIPLE_DB_ENTRIES (8)
• AD_SC_WRN_MAXMEMORYUSAGE_TOO_SMALL (9)
• AD_SC_WRN_INIT_UNLOCKCODE_ENVIRONMENT_MISMATCH (10)
• AD_SC_WRN_MAXRESULTCOUNT_REDUCED (500)
• AD_SC_ERR_INIT_NO_DB_FOUND (900)
• AD_SC_ERR_INIT_NO_DB_OPENED (901)
• AD_SC_ERR_EXTRA_CASS_DBS_ERROR (902)
If one of these codes is returned, the engine is initialized - however, some potential problem
occurred which needs to be investigated: For that purpose, retrieving GetConfig.xml via
AD_GetConfigSettingsXML() is strongly advised, as its contents provide additional information about
problems with unlock codes and / or database files.
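When logging the outcome of AD_Initialize(), it can help to print the symbolic name next to the numeric code. The following C sketch builds a lookup table from the codes listed above. The table type and the function name lookup_init_code_name are hypothetical; an application that includes the API header would normally use the constants defined there instead of string literals.

```c
#include <stddef.h>

typedef struct {
    int code;
    const char *name;
} init_code_entry;

/* Codes and names as listed for AD_Initialize() in this chapter. */
static const init_code_entry INIT_CODES[] = {
    {   1, "AD_SC_WRN_INIT_UNLOCKCODE_CORRUPT" },
    {   2, "AD_SC_WRN_INIT_UNLOCKCODE_EXPIRED" },
    {   3, "AD_SC_WRN_INIT_DB_NOT_FOUND" },
    {   4, "AD_SC_WRN_INIT_DB_CORRUPT" },
    {   5, "AD_SC_WRN_INIT_DB_UNSUPPORTED_VERSION" },
    {   6, "AD_SC_WRN_INIT_DB_NOT_SUPPORTED" },
    {   7, "AD_SC_WRN_INIT_DB_NOT_UNLOCKED" },
    {   8, "AD_SC_WRN_INIT_MULTIPLE_DB_ENTRIES" },
    {   9, "AD_SC_WRN_MAXMEMORYUSAGE_TOO_SMALL" },
    {  10, "AD_SC_WRN_INIT_UNLOCKCODE_ENVIRONMENT_MISMATCH" },
    { 500, "AD_SC_WRN_MAXRESULTCOUNT_REDUCED" },
    { 900, "AD_SC_ERR_INIT_NO_DB_FOUND" },
    { 901, "AD_SC_ERR_INIT_NO_DB_OPENED" },
    { 902, "AD_SC_ERR_EXTRA_CASS_DBS_ERROR" },
};

/* Returns the symbolic name for a listed code, or NULL if it is not listed. */
const char *lookup_init_code_name(int code)
{
    const size_t count = sizeof INIT_CODES / sizeof INIT_CODES[0];
    for (size_t i = 0; i < count; ++i) {
        if (INIT_CODES[i].code == code) {
            return INIT_CODES[i].name;
        }
    }
    return NULL;
}
```

A NULL result means that the code is not one of the typical initialization codes; chapter 5.32 and the API documentation list the remaining values.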
AD_DeInitialize() must be called last to de-initialize the engine; the engine is then ready to be initialized again. All AddressObjects must have been released by calling AD_ReleaseAddressObject() or AD_ReleaseAllAddressObjects() before calling AD_DeInitialize(); see chapter 4.1 for a full example (including Java).
6.2 …determine Informatica AddressDoctor version?
AD_GetVersion() can be called at any time, even before calling AD_Initialize() / AD_InitializeW(), to retrieve the zero-terminated engine version string in the format x.x.x.x, for example "5.0.0.251".
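An application that needs to compare versions can split the x.x.x.x string into its four numeric parts. The C sketch below does this with sscanf(). The type engine_version and the function parse_engine_version are hypothetical helpers; the sketch assumes that the string consists of exactly four decimal numbers separated by dots and rejects anything else.

```c
#include <stdio.h>

typedef struct {
    int part[4];
} engine_version;

/*
 * Parses a version string in the format x.x.x.x.
 * Returns 1 on success and fills *out, or 0 if the text does not match.
 */
int parse_engine_version(const char *text, engine_version *out)
{
    int a, b, c, d;
    char trailing;

    if (text == NULL || out == NULL) {
        return 0;
    }
    /* Exactly four numbers must match; a fifth conversion means trailing text. */
    if (sscanf(text, "%d.%d.%d.%d%c", &a, &b, &c, &d, &trailing) != 4) {
        return 0;
    }
    out->part[0] = a;
    out->part[1] = b;
    out->part[2] = c;
    out->part[3] = d;
    return 1;
}
```

The string passed in would be the one retrieved through AD_GetVersion(); refer to the API documentation in chapter 10.2 for the exact signature of that function.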
6.3 …specify processing or input parameters and a result format?
An AddressObject has a result format configuration. For possible attributes, see Parameters.dtd in chapter 10.1; the most common parameters are described starting with chapter 5.12. These parameter attributes can be set for each AddressObject individually by calling AD_SetParametersXML().
For example:
AD_SetParametersXML(hAOHandle,
"<?xml version='1.0' encoding='iso-8859-1' ?>\n"
"<!DOCTYPE Parameters SYSTEM 'Parameters.dtd'>\n"
"<Parameters>\n"
"<Process Mode='BATCH' />\n"
"<AddressElementStandardize>\n"
"<Country Casing='UPPER' />\n"
"</AddressElementStandardize>\n"
"</Parameters>\n",
NULL,
NULL
);
Or in Java:
// This code assumes you’ve already acquired m_oAO as the active AddressObject
m_oAO.setParametersXML(
"<?xml version='1.0' encoding='UTF-16LE' ?>\n" +
"<Parameters>\n" +
"<Process Mode='BATCH'/>\n" +
// Java uses UTF-16LE as default encoding for its String method
"<Input Encoding='UTF-16LE'/>" +
"<Result Encoding='UTF-16LE'/>" +
"<AddressElementStandardize> \n" +
"<Country Casing='UPPER' />\n" +
"</AddressElementStandardize> \n" +
"</Parameters>",
null
);
As shown for SetConfig.xml in chapter 6.1, alternatively a file name may be provided:
AD_SetParametersXML( hAOHandle,
NULL,
NULL,
"Parameters.xml"
);
Or in Java:
// This code assumes you’ve already acquired m_oAO as the active AddressObject
m_oAO.setParametersXML(
null,
"Parameters.xml"
);
Instead of setting attributes for each AddressObject individually, Parameters.xml may already be
passed on AD_Initialize() (refer to the API documentation in chapter 10.2 for details), thus applying
global defaults to all AddressObjects that do not have individual parameters set via the method
described above.
For example:
AD_Initialize(
/* SetConfig XML passed as a string */
"<?xml version='1.0' encoding='iso-8859-1' ?>\n"
"<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>\n"
"<SetConfig>\n"
"<General />\n"
"<UnlockCode>(Enter Code here)</UnlockCode>\n"
"<DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE' Path='/ADDB' "
"PreloadingType='NONE'/>\n"
"</SetConfig>\n",
NULL,
/* Parameters XML passed as a string: global defaults for all AddressObjects */
"<?xml version='1.0' encoding='iso-8859-1' ?>\n"
"<!DOCTYPE Parameters SYSTEM 'Parameters.dtd'>\n"
"<Parameters>\n"
"<Process Mode='BATCH' />\n"
"<AddressElementStandardize>\n"
"<Country Casing='UPPER' />\n"
"</AddressElementStandardize>\n"
"</Parameters>\n",
NULL
);
Or in Java:
AddressDoctor.initialize(
"<?xml version='1.0' encoding='UTF-16' ?>" +
"<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>" +
"<SetConfig><General WriteXMLEncoding='UTF-16' />" +
" <UnlockCode>(Enter Code here)</UnlockCode>" +
" <DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE'" +
" Path='/ADDB' PreloadingType='NONE' />" +
"</SetConfig>", null,
"<?xml version='1.0' encoding='UTF-16' ?>" +
"<!DOCTYPE Parameters SYSTEM 'Parameters.dtd'>" +
"<Parameters WriteXMLEncoding='UTF-16'>" +
" <Input Encoding='UTF-16' />" +
" <Result Encoding='UTF-16' />" +
"</Parameters>", null);
Again, the Parameters XML string can be stored in an external file (as is the case for SetConfig, see
above). Then the AD_Initialize() call would look like the following:
AD_Initialize(
NULL,
"SetConfig.xml",
NULL,
"Parameters.xml"
);
Or in Java:
AddressDoctor.initialize(
null,
"SetConfig.xml",
null,
"Parameters.xml"
);
Note that adjusting parameters might have no effect for countries that lack the postal reference data information these parameters rely on. Examples are “OptimizationLevel” (chapter 5.33), “PreferredLanguage” (chapter 5.12.2), and “MatchingScope” (chapter 5.12.5). For a reference on country coverage see:
http://www.addressdoctor.com/en/countries-data/country-list.html
6.4 …handle unlock codes?
To validate addresses from a country an unlock code is required. Each code unlocks a number of
countries for a specified time. The unlock code is passed to the initialize function (see chapter 6.1).
The function will return an error if the code is no longer valid or not correct at all. For more details
on error return codes see 5.32. Note that separate unlock codes for validation, Address Code
Lookup, supplementary, geocoding, and CAMEO databases are required.
The use of multiple unlock codes is supported. The unlock codes have to be passed to the initialize
function one after another, as shown in the example below.
If there is more than one unlock code for a country, the one with the latest expiry date is used.
Outdated unlock codes are ignored as long as there is one code that is still valid, for example:
Code A unlocks DEU and USA validation until 31.12.2009
Code B unlocks CHE and USA validation until 31.12.2010
Code C unlocks CHE and USA geocoding until 31.12.2010
In this case DEU validation will be unlocked until 31.12.2009, while CHE and USA validation and
geocoding continue to be unlocked until 31.12.2010. Note that unlock codes also carry a start date
and are invalid before that date. Information on unlock codes may be queried using
AD_GetConfigSettingsXML(), see chapter 6.6 for details.
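The selection rule described above can be illustrated with a small, self-contained Java sketch. It does not call the Informatica AddressDoctor API: the class UnlockCodeModel, its fields and the type labels "VALIDATION" and "GEOCODING" are hypothetical names chosen for illustration only. The sketch models the documented behaviour that, per country and database type, the code with the latest expiry date determines how long that combination stays unlocked. Start dates are not modelled here.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an unlock code; this class is NOT part of the AddressDoctor API.
class UnlockCodeModel {
    final String label;           // for example "A"
    final String type;            // illustrative label, for example "VALIDATION"
    final List<String> countries; // ISO3 codes unlocked by this code
    final LocalDate validUntil;   // expiry date of the code

    UnlockCodeModel(String label, String type, LocalDate validUntil, String... countries) {
        this.label = label;
        this.type = type;
        this.validUntil = validUntil;
        this.countries = Arrays.asList(countries);
    }

    // Returns the latest expiry date among all codes that cover the given
    // country and type, or null if no code covers that combination.
    static LocalDate effectiveExpiry(List<UnlockCodeModel> codes, String country, String type) {
        LocalDate best = null;
        for (UnlockCodeModel code : codes) {
            if (code.type.equals(type) && code.countries.contains(country)) {
                if (best == null || code.validUntil.isAfter(best)) {
                    best = code.validUntil;
                }
            }
        }
        return best;
    }

    // Codes A, B and C from the example in the text.
    static List<UnlockCodeModel> exampleCodes() {
        List<UnlockCodeModel> codes = new ArrayList<UnlockCodeModel>();
        codes.add(new UnlockCodeModel("A", "VALIDATION", LocalDate.of(2009, 12, 31), "DEU", "USA"));
        codes.add(new UnlockCodeModel("B", "VALIDATION", LocalDate.of(2010, 12, 31), "CHE", "USA"));
        codes.add(new UnlockCodeModel("C", "GEOCODING", LocalDate.of(2010, 12, 31), "CHE", "USA"));
        return codes;
    }
}
```

Calling UnlockCodeModel.effectiveExpiry(UnlockCodeModel.exampleCodes(), "USA", "VALIDATION") picks the date of code B, because B expires later than A. For "DEU" and "GEOCODING" the method returns null, since none of the three codes covers that combination.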
The following very simple code example shows how to use multiple unlock codes:
AD_Initialize(
"<?xml version='1.0' encoding='iso-8859-1' ?>\n"
"<SetConfig>\n"
"<General />\n"
"<UnlockCode>(Enter Code A here)</UnlockCode>\n"
"<UnlockCode>(Enter Code B here)</UnlockCode>\n"
"<UnlockCode>(Enter Code C here)</UnlockCode>\n"
"<DataBase CountryISO3='USA' Type='GEOCODING' Path='/ADDB' PreloadingType='NONE' />\n"
"<DataBase CountryISO3='CHE' Type='GEOCODING' Path='/ADDB' PreloadingType='NONE' />\n"
"<DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE' Path='/ADDB' PreloadingType='NONE' />\n"
"</SetConfig>\n",
NULL,
NULL,
NULL
);
Or in Java:
AddressDoctor.initialize(
"<?xml version='1.0' encoding='UTF-16LE'?>\n" +
"<!DOCTYPE SetConfig SYSTEM 'SetConfig.dtd'>\n" +
"<SetConfig>\n" +
"<General WriteXMLEncoding='UTF-16LE'/>\n" +
// Engine & DB Unlock Code
"<UnlockCode>(Enter Code A here)</UnlockCode>\n" +
"<UnlockCode>(Enter Code B here)</UnlockCode>\n" +
"<UnlockCode>(Enter Code C here)</UnlockCode>\n" +
"<DataBase CountryISO3='USA' Type='GEOCODING' Path='/ADDB' PreloadingType='NONE'/>\n" +
"<DataBase CountryISO3='CHE' Type='GEOCODING' Path='/ADDB' PreloadingType='NONE'/>\n" +
"<DataBase CountryISO3='ALL' Type='BATCH_INTERACTIVE' Path='/ADDB' PreloadingType='NONE'/>\n" +
"</SetConfig>",
null, null, null);
6.5 …configure reference databases?
While for convenience reasons the virtual ISO code “ALL” is provided for defining default settings,
you may adjust paths and pre-loading settings (see chapter 5.34) for each country reference
database type separately. The following lines would be an example of a non-trivial SetConfig.xml
(see chapter 6.1 also and the DTD in appendix 10.1):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SetConfig SYSTEM "C:/AddressDoctor/DTD/SetConfig.dtd">
<SetConfig>
<General WriteXMLEncoding="UTF-16" MaxMemoryUsageMB="2048" MaxAddressObjectCount="10"
MaxThreadCount="2"></General>
<UnlockCode>Address Validation Unlock Code</UnlockCode>
<UnlockCode>Geocoding Unlock Code</UnlockCode>
<DataBase CountryISO3="USA" Type="CERTIFIED" Path="C:/AddressDoctor/DB/CASS"
PreloadingType="FULL"></DataBase>
<DataBase CountryISO3="CAN" Type="CERTIFIED" Path="C:/AddressDoctor/DB/SERP"
PreloadingType="FULL"></DataBase>
<DataBase CountryISO3="USA" Type="SUPPLEMENTARY" Path="C:/AddressDoctor/DB/Enrichment"
PreloadingType="PARTIAL"></DataBase>
<DataBase CountryISO3="GBR" Type="SUPPLEMENTARY" Path="C:/AddressDoctor/DB/Enrichment"
PreloadingType="PARTIAL"></DataBase>
<DataBase CountryISO3="ALL" Type="GEOCODING" Path="C:/AddressDoctor/DB/Geocoding"
PreloadingType="NONE"></DataBase>
<DataBase CountryISO3="ALL" Type="CAMEO" Path="C:/AddressDoctor/DB/CAMEO"
PreloadingType="NONE"></DataBase>
<DataBase CountryISO3="ALL" Type="BATCH_INTERACTIVE" Path="C:/AddressDoctor/DB"
PreloadingType="NONE"></DataBase>
<DataBase CountryISO3="ALL" Type="FASTCOMPLETION" Path="C:/AddressDoctor/DB"
PreloadingType="NONE"></DataBase>
</SetConfig>
Note that any country-specific settings must precede the “DataBase” elements with
CountryISO3=“ALL” to actually be applied. The effective database settings and their unlock status
may be verified through GetConfig.xml, as described in chapter 6.6; specifically, check for
“DataBase” element entries with Status attributes other than "ACTIVE". In case of conflicting
“DataBase” elements, the first occurrence in SetConfig.xml will always have precedence.
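The precedence rule can be pictured with the following Java sketch. It is only a model of the behaviour described above, not the engine's actual implementation: the class DataBaseEntry and the method resolve are hypothetical, and the sketch assumes that an entry with CountryISO3="ALL" matches every country.

```java
import java.util.List;

// Hypothetical model of one <DataBase> element; NOT part of the AddressDoctor API.
class DataBaseEntry {
    final String countryIso3;
    final String type;
    final String path;
    final String preloadingType;

    DataBaseEntry(String countryIso3, String type, String path, String preloadingType) {
        this.countryIso3 = countryIso3;
        this.type = type;
        this.path = path;
        this.preloadingType = preloadingType;
    }

    // Walks the entries in document order and returns the first entry whose Type
    // matches and whose CountryISO3 is either the requested country or "ALL".
    // Returns null if nothing matches.
    static DataBaseEntry resolve(List<DataBaseEntry> entries, String country, String type) {
        for (DataBaseEntry entry : entries) {
            boolean countryMatches =
                    entry.countryIso3.equals(country) || entry.countryIso3.equals("ALL");
            if (entry.type.equals(type) && countryMatches) {
                return entry; // first occurrence has precedence
            }
        }
        return null;
    }
}
```

With a USA entry listed before the ALL entry, resolving "USA" yields the USA-specific path and pre-loading setting. With the order reversed, the ALL entry is found first and the USA-specific entry is never reached, which is why country-specific elements have to come first.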
There are a few notable deviations from the standard behaviour that applies to most countries:
For CountryISO3=“USA” and Type=“CERTIFIED” the GetConfig.xml output will only list a subset of the
available reference database files as “DataBase” elements with Type=“CERTIFIED”. Also, USA5BI.md
must always be available, as this database is also the basis for CASS processing (see chapter 6.24).
Furthermore, US CERTIFIED mode will not work without pre-loading, and full pre-loading will always
be enforced on some of the CASS databases described in chapter 3.3.2, irrespective of the settings.
For CountryISO3=”CAN” and Type=”CERTIFIED” the GetConfig.xml output will now list CAN5C1.MD
as the database, as there is now a separate database necessary for certified processing (see chapter
6.24.2).
For CountryISO3=”JPN” and Type=”FASTCOMPLETION” the GetConfig.xml output will only list one
“DataBase” element with Type=”BATCH_INTERACTIVE” also, due to a slightly different internal
database layout.
6.6 …determine the current engine settings?
On the global Informatica AddressDoctor level, calling AD_GetConfigSettingsXML() will return a
GetConfig.xml with the engine configuration, which has been set upon calling AD_Initialize(), see
chapter 6.1.
Accordingly, calling AD_GetParametersSettingsXML() will return a Parameters.xml with the engine
default set of parameters, which again have been set upon calling AD_Initialize(). In contrast, the
parameters effectively applied when processing each AddressObject (which may well be identical to
these global settings unless explicitly set using AD_SetParametersXML(), see the preceding chapter 6.3)
can be queried via AD_GetParametersXML(). See the API reference in chapter 10.2 for details.
6.7 ...assign an address to the AddressObject?
In order to achieve the best possible processing, it is important to understand the structure of your
input data. One can then decide on the best way to input the data into Informatica AddressDoctor
AddressObject (see chapter 5.7).
In general, address data will exist as one of the following:
• Fielded data. In some databases, particularly ones driven by direct input, all the data may be
fielded (for example, Street, City, State, ZIP/PostCode are all stored in individual fields).
• Partially fielded data. In many databases, address data has been partially broken out, for
example into a separate state or postal code field, while the rest of the address is left in
generic “address lines”.
• Unfielded data. This could be data derived from scanning address labels, for example.
6.7.1 General Overview
If an old input address is still present, it must be cleared by a call of AD_ClearData(). The input
address data can be set either with the direct API functions or via the AD_SetInputDataXML() function.
Example for fielded input (direct API):
AD_SetInputAddressElement( hAOHandle, "Country", 1, NULL, "Canada" );
AD_SetInputAddressElement( hAOHandle, "PostalCode", 1, NULL, "G1R 3X2" );
AD_SetInputAddressElement( hAOHandle, "Locality", 1, NULL, "Toronto" );
AD_SetInputAddressElement( hAOHandle, "DeliveryService", 1, NULL, "PO Box 1827" );
Or in Java:
m_oAO.setInputAddressElement("Country", 1, null, "Canada");
m_oAO.setInputAddressElement("PostalCode", 1, null, "G1R 3X2");
m_oAO.setInputAddressElement("Locality", 1, null, "Toronto");
m_oAO.setInputAddressElement("DeliveryService", 1, null, "PO Box 1827");
Example for fielded input (XML API):
AD_SetInputDataXML( hAOHandle,
"<?xml version='1.0' encoding='ISO-8859-1'?>\n"
"<!DOCTYPE InputData SYSTEM 'InputData.dtd'>\n"
"<InputData>\n"
"<AddressElements>\n"
"<Country Item='1' Type='NAME'>SGP</Country>\n"
"<Locality Item='1' Type='COMPLETE'>Singapore</Locality>\n"
"<PostalCode Item='1' Type='FORMATTED'>048624</PostalCode>\n"
"<Street Item='1' Type='COMPLETE'>Raffles Place</Street>\n"
"<Number Item='1' Type='COMPLETE'>80</Number>\n"
"<Building Item='1' Type='COMPLETE'>#50-01 UOB Plaza 1</Building>\n"
"<Organization Item='1' Type='NAME'>AddressDoctor GmbH</Organization>\n"
"</AddressElements>\n"
"</InputData>\n"
);
For comparison in Java:
m_oAO.setInputDataXML(
"<?xml version='1.0' encoding='UTF-16'?>"+
"<!DOCTYPE InputData SYSTEM 'InputData.dtd'>"+
"<InputData>"+
"<AddressElements>"+
"  <Key>4711</Key>"+
"  <Country Item='1' Type='NAME'>SGP</Country>"+
"  <Locality Item='1' Type='COMPLETE'>Singapore</Locality>"+
"  <PostalCode Item='1' Type='FORMATTED'>048624</PostalCode>"+
"  <Street Item='1' Type='COMPLETE'>Raffles Place</Street>"+
"  <Number Item='1' Type='COMPLETE'>80</Number>"+
"  <Building Item='1' Type='COMPLETE'>#50-01 UOB Plaza 1</Building>"+
"  <Organization Item='1' Type='NAME'>AddressDoctor GmbH</Organization>"+
"</AddressElements>"+
"</InputData>");
6.7.2 Fielded address input
Fully fielded addresses will typically provide the most reliable results when cleansing an address.
Even in databases that have the address components in separate columns it is not uncommon to
have the house number and the street name in the same field.
The structure may look like this:
COUNTRY        FIRSTNAME  NAME    STREET           TOWN      STATE  ZIPCODE
United States  Mark       Myers   7563 Bangor Ave  Hesperia  CA
United States  Istvan     Edgars  87 MILL LN       New York  NY     10123
An address is still considered to be fielded when house number and street name reside in the same
field. In this case the field containing both the house number and the street name may be assigned
to the Street element as a whole.
To support environments where databases contain address data broken into discrete fields
Informatica AddressDoctor allows direct input of each address component (including addressing
information such as contact, organization, and so on) via the “AddressElements” element of
InputData.xml (see DTD in chapter 10.1). Possible address elements are: Key, Country, Locality,
PostalCode, Province, Street, Number, Building, SubBuilding, DeliveryService, Organization and
Contact.
Though the data is input to specific address elements, Informatica AddressDoctor can still perform
parsing in case the data has been stored in the incorrect fields (depending on the
“OptimizationLevel” chosen, see chapter 5.33). Incorrect fielding of data is particularly common for
international addresses. If there is a high level of incorrect fielding it may be desirable to explore
other input strategies (pre-processing to correct the fielding, concatenation and partially structured
input, and so on).
Note that InputData.xml does not only allow assigning an item attribute to each input address
element but even supports flagging each of these items with corresponding type information, down
to address sub-element level (see chapter 5.10 for reference).
For example, available types for the “Contact” address element or sub-elements are:
COMPLETE (element), FIRST_NAME (sub-element), MIDDLE_NAME (sub-element),
LAST_NAME (sub-element), NAME (sub-element), TITLE (sub-element), FUNCTION (sub-element),
SALUTATION (sub-element) and GENDER (sub-element).
Or for “Organization”:
COMPLETE (element), NAME (sub-element), DESCRIPTOR (sub-element) and DEPARTMENT
(sub-element).
Consequently, you might either assign “AddressDoctor GmbH Support” as one “Organization” address
element item of type “COMPLETE”, or split it into sub-element items: “AddressDoctor” with type
“NAME”, “GmbH” with type “DESCRIPTOR” and “Support” with type “DEPARTMENT”. Providing such type
attributes on input for each address sub-element (i.e. item) is absolutely crucial for correct
output formatting of “Contact” and “Organization” addressing information, which is not covered by
postal reference data.
See the DTD in chapter 10.1 for a complete and up-to-date list of all the “AddressElements” item
types supported by Informatica AddressDoctor, noting the limitations described in chapter 5.10.
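As an illustration of fielded input, the following Java sketch assembles an InputData.xml string from individual column values. It is a minimal sketch under stated assumptions: the class FieldedInputBuilder is hypothetical and not part of the product, the element names and Type values are taken from the examples in chapter 6.7.1, and the caller is assumed to add elements in the order the DTD expects (see chapter 10.1). Values are XML-escaped so that characters such as “&” do not break the document.

```java
// Hypothetical helper, NOT part of the AddressDoctor API: builds an InputData
// document from values that are already stored in separate fields.
class FieldedInputBuilder {
    private final StringBuilder elements = new StringBuilder();

    // Escapes the characters that are special in XML content and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Adds one address element item; null or blank values are skipped.
    FieldedInputBuilder add(String element, String type, String value) {
        if (value != null && !value.trim().isEmpty()) {
            elements.append("<").append(element)
                    .append(" Item='1' Type='").append(type).append("'>")
                    .append(escapeXml(value.trim()))
                    .append("</").append(element).append(">\n");
        }
        return this;
    }

    // Wraps the collected items in the InputData/AddressElements structure.
    String build() {
        return "<?xml version='1.0' encoding='UTF-16'?>\n"
                + "<!DOCTYPE InputData SYSTEM 'InputData.dtd'>\n"
                + "<InputData>\n<AddressElements>\n"
                + elements
                + "</AddressElements>\n</InputData>\n";
    }

    public static void main(String[] args) {
        // House number and street name share one field and go to Street together.
        String xml = new FieldedInputBuilder()
                .add("Country", "NAME", "United States")
                .add("Locality", "COMPLETE", "Hesperia")
                .add("Street", "COMPLETE", "7563 Bangor Ave")
                .build();
        System.out.println(xml);
    }
}
```

The resulting string could then be handed to setInputDataXML() as shown in chapter 6.7.1. Which Type value suits each element is defined by the DTD, so the values used in main are examples rather than recommendations.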
6.7.3 Partially fielded address input
Often databases store contact information separately from address data, while the address itself
is broken into “address lines”.
For example:
COUNTRY  CUSTOMER     ADDRESS_LINE_1   ADDRESS_LINE_2  CITY      STATE  ZIP
USA      John Smith   7563 Bangor Ave  Suite 107       Hesperia  CA     92345
USA      Vlad Marcos  Acme Products    3198 MARINO ST  El Paso   TX     79925
In this case the address data is input using the fielded address elements where possible (for
example, Contact, Province, Locality, Country, PostalCode), and then the “AddressLines” element of
InputData.xml (see DTD in chapter 10.1) is used to input the remaining data. Typically, that will
involve filling the “DeliveryAddressLines” (DAL) sub-element with input data, but in case data is
available in that specific format, using the “RecipientLine” and “CountrySpecificLocalityLine”
(CSLLN) sub-elements is also possible. As in the case of fully fielded address data, when data has
been partially broken out, the best results are obtained by assigning that data to the appropriate
address element.
6.7.4 Unfielded address input
Since unfielded data has no explicit structure (other than line feeds) this input is the most flexible.
However, for the same reason, it will also produce the least reliable results.
To populate an AddressObject with unfielded data, developers will need to use the
“AddressComplete” element of InputData.xml. The address is simply passed to “AddressComplete” as a
set of strings separated by line feeds (see the following example).
To return the best results it is important to set the most appropriate of the following values for
the “FormatType” attribute (see chapter 5.13 for details) of the Input element:
• ALL
• ADDRESS_ONLY
• WITH_ORGANIZATION
• WITH_CONTACT
• WITH_ORGANIZATION_CONTACT
• WITH_ORGANIZATION_DEPARTMENT
The use of “AddressComplete” must not be combined with other address input, except for
“Country”. In addition, better results will be obtained if the addresses resemble at least some of the
structure used in the respective country.
As an example,
John Smith
7563 Bangor Ave
Hesperia CA 92345
USA
yields significantly better results than:
John Smith
7563
Bangor Ave
Hesperia
CA
92345
USA
A typical database structure might look like this:
ADDRESS_1           ADDRESS_2          ADDRESS_3        ADDRESS_4          ADDRESS_5         ADDRESS_6
John Smith          7563 Bangor Ave    Suite 107        Hesperia CA 92345  USA
AddressDoctor GmbH  Steffen Niehues    Röntgenstr. 9    67133 Maxdorf      Deutschland
Vlad Marcos         c/o Acme Products  123 Main Street  #12                El Paso TX 79925  United States
In case of such data, the “FormattedAddressLine” (FAL) sub-element of “AddressLines” might be a
more appropriate alternative, which allows for input of up to 19 unfielded address lines.
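Turning such generic address columns into FormattedAddressLine input could be sketched in Java as follows. The helper class FormattedLinesBuilder is hypothetical and not part of the product; the element and attribute names follow the input example in chapter 6.15, and the limit of 19 lines is the one stated above. Empty columns are skipped so that line numbers stay consecutive.

```java
// Hypothetical helper, NOT part of the AddressDoctor API: converts generic
// address columns (ADDRESS_1 ... ADDRESS_n) into an <AddressLines> fragment.
class FormattedLinesBuilder {
    static final int MAX_LINES = 19; // limit stated in the text above

    // Escapes the characters that are special in XML element content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String build(String... columns) {
        StringBuilder xml = new StringBuilder("<AddressLines>\n");
        int line = 0;
        for (String column : columns) {
            if (column == null || column.trim().isEmpty()) {
                continue; // skip unused columns
            }
            line++;
            if (line > MAX_LINES) {
                throw new IllegalArgumentException("more than " + MAX_LINES + " address lines");
            }
            xml.append("<FormattedAddressLine Line=\"").append(line).append("\">")
               .append(escapeXml(column.trim()))
               .append("</FormattedAddressLine>\n");
        }
        return xml.append("</AddressLines>\n").toString();
    }

    public static void main(String[] args) {
        // First row of the table above; ADDRESS_6 is empty and therefore omitted.
        System.out.print(build("John Smith", "7563 Bangor Ave", "Suite 107",
                "Hesperia CA 92345", "USA", ""));
    }
}
```

The fragment would be placed inside an InputData document, next to an AddressElements block carrying the country if it is known separately.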
6.8 …validate an address?
The AddressObject must have been filled with an input address. Processing an address is achieved by
calling AD_Process(). The process mode can be set to BATCH, INTERACTIVE, FAST_COMPLETION,
CERTIFIED, or ADDRESSCODELOOKUP; if no mode is set, the default processing mode BATCH is used.
Detailed results are retrieved by specific API functions, see chapter 6.11 below.
Example for C:
AD_Process( hAOHandle );
And for Java:
AddressDoctor.process(m_oAO);
6.9 …parse an address?
The AddressObject must have been filled with an input address. Processing an address is achieved by
calling AD_Process(). The process mode must have been set to PARSE. Detailed results are retrieved
by specific API functions, see below in chapter 6.11.
For example:
AD_SetParametersXML( hAOHandle,
"<?xml version='1.0' encoding='iso-8859-1' ?>\n"
"<!DOCTYPE Parameters SYSTEM 'Parameters.dtd'>\n"
"<Parameters>\n"
"<Process Mode='PARSE'/>\n"
"</Parameters>\n",
NULL
);
AD_Process( hAOHandle );
Or in Java:
m_oAO.setParametersXML(
"<?xml version='1.0' encoding='UTF-16LE' ?>\n" +
"<Parameters>\n" +
"<Process Mode='PARSE'/>\n" +
// Java uses UTF-16LE as default encoding for its String method
"<Input Encoding='UTF-16LE'/>" +
"<Result Encoding='UTF-16LE'/>" +
"</Parameters>",
null);
AddressDoctor.process(m_oAO);
6.10 …check the process mode?
After AD_Process() has been called, the “ModeUsed” attribute of the “Result” element will allow
checking that the process mode used was actually the one intended (see chapter 5.11 for possible
process mode fallbacks):
char sResultParameters[32];
AD_GetResultParameter(hAOHandle,"ModeUsed",sResultParameters,sizeof(sResultParameters));
Or in Java:
System.out.println(m_oAO.getResultParameter("ModeUsed"));
6.11 …retrieve a suggested correction?
AD_Process() must have been called upfront to process the input address; only then are results
available. The return code of AD_Process() already gives some indication of fatal errors (for
example, country not identified – see section 5.32 on return codes).
When using the direct API, the first step before calling AD_GetResultAddressElement() is always to
retrieve the number of results by calling AD_GetResultCount(), while the number of items or lines
for a specific result can be retrieved by calling AD_GetResultAddressElementItemCount() or
AD_GetResultAddressLineCount(), respectively.
Example (direct API, no error handling):
AD_U32 ulNumResults;
size_t stCurResult;
AD_GetResultCount( hAOHandle, &ulNumResults );
for( stCurResult = 1; stCurResult <= ulNumResults; stCurResult++ )
{
    char sStreet[ 256 ];
    AD_U32 ulNumItems;
    size_t stCurItem;
    /* Query the item count for the current result, not only for the first one */
    AD_GetResultAddressElementItemCount( hAOHandle, stCurResult, "Street", &ulNumItems );
    for( stCurItem = 1; stCurItem <= ulNumItems; stCurItem++ )
    {
        AD_GetResultAddressElement( hAOHandle, stCurResult, "Street", stCurItem,
            "COMPLETE", sStreet, sizeof( sStreet ) );
        /* Cast the size_t counters to match the printf format specifiers */
        printf( "Result %lu: Street item %lu: %s\n",
            (unsigned long) stCurResult, (unsigned long) stCurItem, sStreet );
    }
}
Or in Java:
int NumResults = m_oAO.getResultCount();
int CurResult;
for (CurResult = 1; CurResult <= NumResults; CurResult++) {
int NumItems = m_oAO.getResultAddressElementItemCount(CurResult, "Street");
int CurItem;
for (CurItem = 1; CurItem <= NumItems; CurItem++) {
System.out.println(m_oAO.getResultAddressElement(CurResult, "Street", CurItem,
"COMPLETE"));
}
}
Example (C XML API, no error handling):
char sResultXML[ 16 * 1024 ];
AD_GetResultXML( hAOHandle, sResultXML, sizeof( sResultXML ) );
Example (Java XML API, no error handling):
String sResultXML = "";
sResultXML = m_oAO.getResultXML();
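Once the result XML is available as a string, it can be read with the XML parser that ships with the Java platform. The following sketch is not taken from the product documentation: the class ResultXmlReader and its methods are hypothetical, and it assumes a result document shaped like the samples in chapter 6.15 (a “Result” root element with attributes and nested address elements, without a DOCTYPE that would require a DTD file to be present).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, NOT part of the AddressDoctor API: reads values from a
// result XML string using the standard JAXP DOM parser.
class ResultXmlReader {
    // Parses the XML text; the encoding named in the XML declaration is not
    // used for decoding here, because the input is already a Java String.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("result XML could not be parsed", e);
        }
    }

    // Returns an attribute of the root element, for example "ProcessStatus".
    static String resultAttribute(Document doc, String name) {
        return doc.getDocumentElement().getAttribute(name);
    }

    // Returns the trimmed text of all elements with the given tag name,
    // for example all "Street" or "FormattedAddressLine" entries.
    static List<String> elementTexts(Document doc, String tagName) {
        List<String> values = new ArrayList<String>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }
}
```

A caller would pass the string returned by getResultXML() to ResultXmlReader.parse() and then query, for instance, resultAttribute(doc, "ModeUsed") or elementTexts(doc, "FormattedAddressLine"). Note that elementTexts collects matching elements across all results in the document; it does not group them by ResultNumber.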
6.12 ...retrieve the result status and additional information?
For the direct API, AD_GetResultParameter() will return more detailed processing result information
(see the code shown in chapter 6.10 for another example), for example, the process status value
explained in chapter 5.17:
char sResultParameters[32];
AD_GetResultParameter(hAOHandle,"ProcessStatus",sResultParameters,
sizeof(sResultParameters));
Or in Java:
System.out.println(m_oAO.getResultParameter("ProcessStatus"));
To get a detailed status for any specific address element result, AD_GetResultDataParameter() can be
called. Likewise, for Enrichments you may call AD_GetResultEnrichmentDataParameter().
For a list of all parameters available to “Result”, “ResultData” and “ResultEnrichmentData”, see the
attributes for those elements of Result.dtd (chapter 10.1): For instance, the “Result” element
provides parameter attributes like “ProcessStatus” or “ModeUsed”, while the “ResultData” element
provides parameter attributes like “ElementInputStatus” or “ElementResultStatus”.
The XML API on the other hand, has only a single function AD_GetResultXML() which writes a
complete result XML text to the passed buffer (see chapter 6.11 above). The level of detail contained
in that XML construct may be influenced using the three attributes “AddressElements” (NONE,
STANDARD, DETAILED), “AddressLines” (ON, OFF) and “AddressComplete” (ON, OFF) of the “Result”
element in Parameters.xml (see DTD in chapter 10.1 and chapter 6.3).
Note that XML output of all possible address element Types (see chapter 5.10) is only available when
the “AddressElements” attribute for Result.xml is set to “DETAILED”. Many of the types described in
the DTD (see chapter 10.1) are only available where supported by the available reference data and
thus may vary greatly from country to country and even address to address. They are primarily
provided for analytical purposes for now, while for most practical applications the default result
output Type “COMPLETE” is best suited, as returned for “AddressElements” set to “STANDARD”.
6.13 ...retrieve address enrichments?
Informatica AddressDoctor supports the following enrichments (for all Process Modes, except
FAST_COMPLETION):
• GeoCoding (set EnrichmentGeoCoding="ON")
• Point Address Geocoding (set EnrichmentGeoCodingType to "NONE", "ARRIVAL_POINT",
or "PARCEL_CENTROID"; default is "ARRIVAL_POINT")
• CAMEO (set EnrichmentCAMEO="ON")
• SupplementaryUS (presently providing COUNTY_FIPS_CODE, STATE_FIPS_CODE, MSA_ID,
CBSA_ID, FINANCE_NUMBER, RECORD_TYPE, CSMA_ID, TIME_ZONE_CODE,
TIME_ZONE_NAME, CENSUS_TRACT_NO, CENSUS_BLOCK_NO, CENSUS_BLOCK_GROUP,
PMSA_ID, MCD_ID and PLACE_FIPS_CODE; set EnrichmentSupplementaryUS="ON")
• SupplementaryGB (presently providing DELIVERY_POINT_SUFFIX, UDPRN, and
ADDRESS_KEY; set EnrichmentSupplementaryGB="ON")
• SupplementaryJP (set EnrichmentSupplementaryJP="ON")
• SupplementaryRS (set EnrichmentSupplementaryRS="ON")
• SupplementaryBR (set EnrichmentSupplementaryBR="ON")
• SupplementaryDE (set EnrichmentSupplementaryDE="ON")
• SupplementaryZA (set EnrichmentSupplementaryZA="ON")
• SupplementaryCH (set EnrichmentSupplementaryCH="ON")
• SupplementaryPL (introduced in Version 5.5.0, this enrichment supports Gmina code,
Locality and Street TerytIDs for Poland; set EnrichmentSupplementaryPL="ON")
• SupplementaryFR (introduced in Version 5.5.0, this enrichment supports INSEE code for
France; set EnrichmentSupplementaryFR="ON")
• SupplementaryAT (introduced in Version 5.5.0, this enrichment supports the PAC code
for Austrian addresses; set EnrichmentSupplementaryAT="ON")
• SERP (set EnrichmentSERP="ON")
• CASS (set EnrichmentCASS="ON")
• SNA (set EnrichmentSNA="ON")
• AMAS (set EnrichmentAMAS="ON")
• SENDRIGHT (set EnrichmentSENDRIGHT="ON")
Enabled enrichments are processed as the last processing step when calling AD_Process().
To enable Geocoding, for example, the “Process” attribute “EnrichmentGeoCoding” within
Parameters.xml (see Appendix 10.1) must be set to ON (default for all enrichments is OFF) and the
attribute “EnrichmentGeoCodingType” must be set to NONE, ARRIVAL_POINT, or
PARCEL_CENTROID. Respective switches are provided for all enrichments, see the Parameters.xml
DTD in Appendix 10.1 for details.
Enrichments might be subject to providing an extra unlock code (as is the case for Geocoding and
supplementary databases, see chapter 6.4) and will usually require extra database files (see chapter
6.5 for examples).
Enrichment results can then be obtained in the direct API case by first calling the function
AD_GetResultEnrichmentElementExists() to check for their existence and then
AD_GetResultEnrichmentElement() for actually retrieving them.
AD_GetResultEnrichmentDataParameter() is provided to access the enrichment specific result
information, like “GeoCodingStatus” (for a list of the available parameter attributes see the
elements of Result.dtd in chapter 10.1). When using the XML API, calling AD_GetResultXML() provides
all enabled enrichment results also.
For example code see chapter 6.11, also see chapters 5.19 to 2 for GeoCoding, CAMEO, CASS, SERP,
AMAS, SNA and the Supplementary status values and chapter 6.24 for details on the certified CASS,
SERP, AMAS, SNA and SendRight enrichments.
6.14 ...analyze error conditions?
For C, AD_GetLastError() provides you with the last error return code (see section 5.32 for a return
code overview) and AD_GetExtendedErrorMsg() allows access to extended information pertaining to
the last error. Error messages often point to configuration issues that are best analyzed by referring
to GetConfig.xml or Parameters.xml (see chapter 6.6 on how to obtain those).
For Java you use AddressDoctorException.getExtendedMessage() for that same purpose. Make sure to
wrap Informatica AddressDoctor and AddressObject calls with try/catch blocks for proper exception
handling – for a more detailed example see the code in chapter 4.1:
try
{
AddressDoctor.process(m_oAO);
iLastError = AddressDoctor.getLastError();
System.out.println("Process returned " + iLastError);
} catch (AddressDoctorException ex)
{
System.out.println("Exception during process: " + ex.toString());
}
The ConsoleDemo test application in C and Java provided by Informatica AddressDoctor (see chapter
7.1) may prove helpful in analyzing error conditions. Collect the information listed in chapter 9.3
before contacting Informatica AddressDoctor Support.
6.15 ...assign and process addresses in non-Latin script?
This works in much the same way as the examples shown in the preceding chapters 6.7 and 6.8. Make
sure to input your addresses using the appropriate bit width for the source character set you are
using (see chapter 5.8; UTF-16 is typically the safe choice for non-Latin character sets).
Here is an example of a Japanese Kanji address:
<?xml version="1.0" encoding="UTF-16"?>
<InputData>
<AddressElements>
<Country Item="1" Type="NAME">JAPAN</Country>
</AddressElements>
<AddressLines>
<FormattedAddressLine Line="1">〒 949-7277</FormattedAddressLine>
<FormattedAddressLine Line="2">新潟県南魚沼市国際町 777 番地</FormattedAddressLine>
<FormattedAddressLine Line="3">国際大学</FormattedAddressLine>
</AddressLines>
</InputData>
The Rōmaji result, illustrating the Informatica AddressDoctor transliteration capabilities via
PreferredScript set to “LATIN” in Result.xml (see DTD in chapter 10.1), would look like this:
<?xml version="1.0" encoding="UTF-16"?>
<Result ProcessStatus="C4"
ModeUsed="BATCH"
Count="1"
CountOverflow="NO"
CountryISO3="JPN"
PreferredScript="LATIN"
PreferredLanguage="DATABASE">
<ResultData ResultNumber="1"
MailabilityScore="3"
ResultPercentage="83.20"
ElementResultStatus="F0F8F040400040000060"
ElementInputStatus="60606020200020000060"
ElementRelevance="10111000000000000010">
<AddressElements>
<Country Type="NAME_EN" Item="1">JAPAN</Country>
<Locality Item="1">MINAMIUONUMA-SHI</Locality>
<Locality Item="2">ANAJISHINDEN</Locality>
<PostalCode Item="1">949-7277</PostalCode>
<Province Item="1">NIIGATA-KEN</Province>
<Street Item="1">KOKUSAI-CHŌ</Street>
<Number Item="1">777 BANCHI</Number>
<Building Item="1">KOKUSAIDAIGAKU</Building>
</AddressElements>
<AddressLines>
<DeliveryAddressLine Line="1">777 BANCHI KOKUSAI-CHŌ KOKUSAIDAIGAKU</DeliveryAddressLine>
<CountrySpecificLocalityLine Line="1">MINAMIUONUMA-SHI NIIGATA-KEN 9497277</CountrySpecificLocalityLine>
<FormattedAddressLine Line="1">777 BANCHI KOKUSAI-CHŌ KOKUSAIDAIGAKU</FormattedAddressLine>
<FormattedAddressLine Line="2">ANAJISHINDEN</FormattedAddressLine>
<FormattedAddressLine Line="3">MINAMIUONUMA-SHI NIIGATA-KEN 9497277</FormattedAddressLine>
<FormattedAddressLine Line="4">JAPAN</FormattedAddressLine>
</AddressLines>
<AddressComplete>777 BANCHI KOKUSAI-CHŌ KOKUSAIDAIGAKU
ANAJISHINDEN
MINAMIUONUMA-SHI NIIGATA-KEN 949-7277
JAPAN
</AddressComplete>
</ResultData>
</Result>
Similarly a Russian example, in Cyrillic script:
<?xml version="1.0" encoding="UCS-2LE"?>
<InputData>
<AddressElements>
<Country Item="1" Type="NAME">RUS</Country>
</AddressElements>
<AddressLines>
<FormattedAddressLine Line="1">Международный университет в Москве</FormattedAddressLine>
<FormattedAddressLine Line="2">Ленинградский проспект 17</FormattedAddressLine>
<FormattedAddressLine Line="3">125040 Москва</FormattedAddressLine>
</AddressLines>
</InputData>
Results in (with PreferredScript set to “ASCII_SIMPLIFIED” this time, to suppress special characters
like the “ž” in “Meždunarodnyj”, see chapter 5.12.1 for reference):
<?xml version="1.0" encoding="UCS-2LE"?>
<Result ProcessStatus="C4"
ModeUsed="BATCH"
Count="1"
CountOverflow="NO"
CountryISO3="RUS"
PreferredScript="ASCII_SIMPLIFIED"
PreferredLanguage="DATABASE">
<ResultData ResultNumber="1"
MailabilityScore="4"
ResultPercentage="82.50"
ElementResultStatus="F0F080F0F000400000E0"
ElementInputStatus="60600060600020000060"
ElementRelevance="10101000000000000010">
<AddressElements>
<Country Type="NAME_EN" Item="1">RUSSIAN FEDERATION</Country>
<Locality Item="1">Moskva</Locality>
<PostalCode Item="1">125040</PostalCode>
<Province Item="1">Moskva</Province>
<Street Item="1">Leningradskij pr-kt</Street>
<Number Item="1">17</Number>
<Building Item="1">Mezdunarodnyj Universitet V Moskve</Building>
</AddressElements>
<AddressLines>
<DeliveryAddressLine Line="1">Mezdunarodnyj Universitet V Moskve</DeliveryAddressLine>
<DeliveryAddressLine Line="2">Leningradskij Pr-Kt 17</DeliveryAddressLine>
<CountrySpecificLocalityLine Line="1">Moskva</CountrySpecificLocalityLine>
<FormattedAddressLine Line="1">Mezhdunarodnyj Universitet V Moskve</FormattedAddressLine>
<FormattedAddressLine Line="2">Leningradskij Pr-Kt 17</FormattedAddressLine>
<FormattedAddressLine Line="3">Moskva</FormattedAddressLine>
<FormattedAddressLine Line="4">125040</FormattedAddressLine>
<FormattedAddressLine Line="5">Russian Federation</FormattedAddressLine>
</AddressLines>
<AddressComplete>Mezhdunarodnyj Universitet V Moskve
Leningradskij Pr-Kt 17
Moskva
125040
Russian Federation
</AddressComplete>
</ResultData>
</Result>
6.16 …use Informatica AddressDoctor with multiple processor cores?
Let us assume a four processor core machine on which three cores are to be used for address
processing: The main thread of the program integrating Informatica AddressDoctor 5 calls
AD_Initialize() (see chapter 6.1) with MaxThreadCount=3 and MaxAddressObjectCount=3 (see chapter
5.36) and creates three worker threads for processing addresses.
Each worker thread then acquires one AddressObject handle via AD_GetAddressObject( &hAOHandle );
and subsequently keeps repeating the following sequence (see chapters 6.7.1, 6.8 and 6.11):
AD_SetInputDataXML( hAOHandle, <XML string> );
AD_Process( hAOHandle );
AD_GetResultXML( hAOHandle, sResultXML, sizeof( sResultXML ) );
AD_ClearData( hAOHandle );
When you are finally shutting down, the main thread destroys all worker threads and de-initializes
Informatica AddressDoctor (see chapter 6.1):
AD_ReleaseAllAddressObjects();
AD_DeInitialize();
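The threading pattern described in this chapter can be sketched in Java with the standard java.util.concurrent classes. This is an illustration only: the interfaces AddressWorker and WorkerFactory are hypothetical stand-ins for acquiring and using one AddressObject per worker thread, and no Informatica AddressDoctor calls are made. Each worker creates its own processing object once and then keeps taking inputs from a shared queue until it is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, NOT part of the AddressDoctor API.
class WorkerPoolSketch {
    // Stand-in for one AddressObject: set input, process, get result, clear.
    interface AddressWorker {
        String process(String inputXml);
    }

    // Stand-in for acquiring one AddressObject per worker thread.
    interface WorkerFactory {
        AddressWorker create();
    }

    // Processes all inputs with the given number of worker threads.
    // The order of the returned results is not guaranteed.
    static List<String> processAll(List<String> inputs, final WorkerFactory factory, int threads) {
        final Queue<String> queue = new ConcurrentLinkedQueue<String>(inputs);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<List<String>>> futures = new ArrayList<Future<List<String>>>();
        try {
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(new Callable<List<String>>() {
                    public List<String> call() {
                        AddressWorker worker = factory.create(); // one object per thread
                        List<String> out = new ArrayList<String>();
                        String input;
                        while ((input = queue.poll()) != null) {
                            out.add(worker.process(input));
                        }
                        return out;
                    }
                }));
            }
            List<String> results = new ArrayList<String>();
            for (Future<List<String>> future : futures) {
                results.addAll(future.get()); // waits for the worker to finish
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown(); // corresponds to ending the worker threads
        }
    }
}
```

In a real integration, the factory would obtain an AddressObject and the process method would perform the set input, process, get result and clear sequence shown above. Keeping the number of threads equal to the configured MaxThreadCount and MaxAddressObjectCount values mirrors the setup assumed in this chapter.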
6.17 …produce valid Informatica AddressDoctor XML?
Any XML input to Informatica AddressDoctor should always be well-formed and validated against
the DTDs provided for that purpose by Informatica AddressDoctor (see chapter 10.1). Note that the
sequence of the XML elements does matter (but not that of their attributes), which can be checked
through DTD validation as well.
Refer to http://wikipedia.org/wiki/XML for an introduction to XML. Apart from XML functionality
being an integral part of most modern Integrated Development Environments (IDEs), there is a
diverse choice of free validating XML editors, like WMHelp XMLPad or XML Copy Editor from
SourceForge.net.
When dealing with XML files produced on different platforms, note that end-of-line (EOL) characters
differ between Windows (CR+LF) and UNIX (LF), see http://wikipedia.org/wiki/Linebreak.
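As a rough pre-check before handing XML to the engine, well-formedness and line endings can be tested with a few lines of Python. This is only a sketch: the function names are hypothetical, and the standard library parser used here checks well-formedness only. It does not validate against the DTDs from chapter 10.1, so a validating editor or parser is still needed for that step.

```python
import xml.etree.ElementTree as ET


def normalize_eol(text):
    """Convert Windows (CR+LF) and bare CR line ends to UNIX (LF)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def is_well_formed(xml_text):
    """Return True if the text parses as XML. This is NOT DTD validation."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

For example, is_well_formed("<InputData><Key>1</Key></InputData>") returns True, while a document with a missing closing tag returns False.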
6.18 …use Informatica AddressDoctor XML for flexible Business Processes?
Standards like BPEL (the Business Process Execution Language) allow for more flexible business
processes implemented using information technology. For instance, you might model and
implement a business process including global address verification based on an InputData.xml
template that contains placeholder variables mapping to certain input data columns provided by
data sources.
Some of the external influences (like new postal regulations) the business side might have to react
to may thus be implemented without programming knowledge, simply by adjusting these
placeholders in the XML template.
For example, let us assume you are dealing with addresses for a country that has recently introduced
a postal code system.
So far, your InputData.xml template might have looked like this (the “$” character is used to delimit
placeholder names here):
<?xml version='1.0' encoding='UTF-16'?>
<!DOCTYPE InputData SYSTEM 'InputData.dtd'>
<InputData>
<AddressElements>
<Key>$COLUMN1$</Key>
<Country Item='1' Type='NAME'>$COLUMN7$</Country>
<Locality Item='1' Type='COMPLETE'>$COLUMN6$</Locality>
<Street Item='1' Type='COMPLETE'>$COLUMN5$</Street>
<Building Item='1' Type='COMPLETE'>$COLUMN4$</Building>
<Organization Item='1' Type='NAME'>$COLUMN2$</Organization>
<Contact Item='1' Type='NAME'>$COLUMN3$</Contact>
</AddressElements>
</InputData>
Due to that new postal regulation, postal codes have been added to the data source, which now
need to be verified also. A new eighth column has thus been made available and can be mapped as
easily as follows:
<?xml version='1.0' encoding='UTF-16'?>
<!DOCTYPE InputData SYSTEM 'InputData.dtd'>
<InputData>
<AddressElements>
<Key>$COLUMN1$</Key>
<Country Item='1' Type='NAME'>$COLUMN7$</Country>
<Locality Item='1' Type='COMPLETE'>$COLUMN6$</Locality>
<PostalCode Item='1' Type='UNFORMATTED'>$COLUMN8$</PostalCode>
<Street Item='1' Type='COMPLETE'>$COLUMN5$</Street>
<Building Item='1' Type='COMPLETE'>$COLUMN4$</Building>
<Organization Item='1' Type='NAME'>$COLUMN2$</Organization>
<Contact Item='1' Type='NAME'>$COLUMN3$</Contact>
</AddressElements>
</InputData>
All that is needed to facilitate this kind of change is a simple editor as described in chapter 6.17.
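How such a template might be filled from one row of a data source can be sketched as follows. The function name and the sample row are hypothetical; the only convention taken from the text is the $COLUMNn$ placeholder syntax, with columns counted from 1. Values are XML-escaped so that characters such as "&" in the input data do not break the document.

```python
import re
from xml.sax.saxutils import escape

# Matches placeholders such as $COLUMN1$ or $COLUMN8$.
PLACEHOLDER = re.compile(r"\$COLUMN(\d+)\$")


def fill_template(template, row):
    """Replace each $COLUMNn$ with the XML-escaped n-th value of row (1-based)."""
    def substitute(match):
        index = int(match.group(1)) - 1
        if not 0 <= index < len(row):
            raise IndexError("no data for placeholder " + match.group(0))
        return escape(str(row[index]))
    return PLACEHOLDER.sub(substitute, template)
```

With this approach, adding the new postal code column only requires the extra PostalCode line in the template and an eighth value in each row; the substitution code itself stays unchanged.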
6.19 …use Informatica AddressDoctor for Master Data Management?
Informatica AddressDoctor provides a batch validation mode that was designed for mass data
address quality, for example, for use in Master Data Management (MDM) or Data Integration
systems. This validation mode (see chapter 5.11.1 for details) allows address input into the
AddressObject irrespective of data quality. The input is then automatically corrected to the extent
possible, returning the single most likely candidate as the processing result.
When designing an application for batch processing, call the AD_Process() function with the BATCH
validation process mode (see chapter 5.11.1). Informatica AddressDoctor returns a single corrected
result whenever possible (Process Status “Vx” or “Cx”, see chapter 5.17).
For tackling severe address quality challenges, a recommended batch usage pattern for Informatica
AddressDoctor 5 based on the “OptimizationLevel” concept is described in chapter 5.33.
6.20 …use Informatica AddressDoctor in an eBusiness Environment?
Informatica AddressDoctor provides an interactive validation mode that was designed for point of
data entry address quality, for example, for use in online registration forms, be it for a web shop, an
auction platform or a customer feedback system. This validation mode (see chapter 5.11.2 for
details) allows for address input into the AddressObject irrespective of data quality. The input is then
automatically corrected to the extent possible, returning a choice of likely candidates. If processing
can identify one definite candidate, the result returned will only be that candidate.
When designing an application for interactive entry, call the AD_Process() function with the
INTERACTIVE validation process mode (see chapter 5.11.2), for example, using the web form content
that has been posted to a web server online.
Informatica AddressDoctor returns a number of possible results (candidates), which are then to be
presented to the user entering data for picking the most correct result. If the input data entered was
already complete and correct, there would obviously be no need for such user interaction.
Note that the user should usually have the option to edit the returned result once more before final
submission: For instance in the case of new construction activity, an address might not yet be
featured even in the most recent set of postal reference data.
6.21 …use the Quick Address Entry Feature?
Informatica AddressDoctor features a validation mode (Fast Completion) that can be used in call
center environments where data entry personnel should be assisted in their data entry task. The
same use case usually applies to Customer Relationship Management (CRM) systems, Property and
Reservation Management Systems (PMS) or Point of Sale (POS) systems. This validation mode (see
chapter 5.11.3 for details) allows for incomplete address input into the AddressObject. This input is
automatically completed to the extent possible.
When designing an application for quick address entry it is possible to call the AD_Process() function
with the FASTCOMPLETION validation process mode (see chapter 5.11.3) after each keystroke.
Provided the reference databases are either accessible quickly or even stored locally, pick lists can
be displayed in real time.
As an example we are going to input the following data:
Country: USA
Locality: Wash
Street: Pennsyl
Informatica AddressDoctor returns 100 results (suggestions) and an overflow indication will be set: If
the “CountOverflow” attribute of Result.xml (see DTD in chapter 10.1) is set to YES, this indicates
that potentially more results would be available. It is then recommended that the AD_Process()
function is called again with additional input data.
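A calling application might test for the overflow indication as sketched below. This is an assumption-laden illustration: the exact element that carries "CountOverflow" is defined by the Result DTD in chapter 10.1 and is not reproduced here, so the sketch simply searches every element, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def has_count_overflow(result_xml):
    """True if any element in the result XML carries CountOverflow="YES"."""
    root = ET.fromstring(result_xml)
    return any(element.get("CountOverflow") == "YES" for element in root.iter())
```

When the function returns True, the pick list shown to the user is incomplete, and the application would typically wait for further keystrokes before calling AD_Process() again.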
6.22 …use Informatica AddressDoctor in a multi-tenant hosted environment?
A multi-tenant hosted solution requires initialization of separate Informatica AddressDoctor
instances with a customer-specific unlock code for each, in order to meet the terms and conditions
set by the different reference data providers.
Informatica AddressDoctor has examined making use of a RAM disk to share the reference database
files across these several Informatica AddressDoctor instances (typically threads):

Internal benchmarks using ramfs on Linux have shown that the address validation
throughput of a RAM disk (with PreloadingType="NONE", see chapter 5.34) is only about
12% less compared to full preloading (PreloadingType="FULL"), in the case of 4 threads
running on a 4 core machine.

Blocking of the different threads thus seems reduced by the low latency of RAM compared
to hard disk storage.

This is a good compromise between speed and hardware requirements, as about 8-10 GB of
RAM should suffice to hold the world Batch/Interactive reference databases in a shared
RAM disk.
Note that with Informatica AddressDoctor 5.1.4 a new default method of preloading has been
introduced (PreloadingMethod="MAP", see chapter 5.34) which allows sharing memory mapped
reference database files across instances out of the box, without the performance hit due to the
RAM disk driver.
Where several GB of RAM are not available for memory mapped files, each customer will then at least
require separate storage with their own copy of the reference database files for performance
reasons:

These are then only partially preloaded (PreloadingType="PARTIAL") to their own
Informatica AddressDoctor instance (0.5 to 1 GB RAM per customer should usually suffice
here), which features a caching facility for this use case as well.

Keeping separate copies of the reference database files should ensure that customer
instance I/O accesses to these files don't block each other - note that full preloading is a prerequisite for proper multicore scalability (see chapter 5.36), so such a partially preloaded
setup will probably limit the usable processor cores per customer thread to no more than
two (because of I/O blocking again, this time between the multiple threads used for one
customer).

We have found SATA Solid State Disks to improve performance vastly in such a setup; see
chapter 6.25 for reference.
6.23 …use Informatica AddressDoctor for Web Services?
As demonstrated in chapter 4.1, Informatica AddressDoctor 5 introduced an XML API (see Appendix
10.2 for reference) that makes it even easier to integrate global address correction in Web Services
environments - be it Software as a Service (SaaS) SOAP calls for Internet cloud computing or an
Enterprise Service Bus (ESB) as part of a Service Oriented Architecture (SOA) in the Intranet.
Simply feed address data from your web service in XML format directly into Informatica
AddressDoctor via AD_SetInputDataXML() (see chapter 6.7.1 for details), which only requires prior
XML transformation (using broadly adopted technologies like XSLT, see
http://wikipedia.org/wiki/XSLT) on the basis of the DTD information made available by Informatica
AddressDoctor (see chapters 6.17 and 10.1).
Note that Informatica AddressDoctor also offers secure and synchronous Web Services for direct and
ready-to-use integration via SOAP (see the product line overview in chapter 2). The following figure
provides an overview of the Informatica AddressDoctor Data Quality Platform:
[Figure: overview of the Informatica AddressDoctor Data Quality Platform]
Batch (max. 10 addresses per SOAP call)
o Validation and automatic correction of addresses in batch with immediate results.
Interactive
o Online validation of addresses with interactive correction.
o Ideal for online shops and CRM systems.
FastCompletion
o Support for call centers.
The Informatica AddressDoctor Web Services have a proven track record in both high availability (>
99.9%) and high-volume throughput. Web Service pricing is very competitive and transaction based.
Also, address enrichment options are available, for details see
http://www.addressdoctor.com/en/products/ecommerce.
6.24 ...validate an address in CERTIFIED mode?
For some countries, Informatica AddressDoctor offers a special validation process mode “CERTIFIED”
which is used to validate an address according to the certification rules defined by the local postal
authority. This validation type allows integrators to develop their own application for certification by
the respective postal organization. Special database files may be necessary for CERTIFIED processing
- for details see the following chapters.
It is very important to note that the following Parameters must not be changed from their default
settings to ensure proper CERTIFIED processing:
PreferredLanguage (see chapter 5.12.2)
MatchingAlternatives and MatchingScope (see chapter 5.12.5)
GlobalMaxLength, GlobalCasing, and AddressElementStandardize (MaxLength or Casing, see chapter 5.14)
OptimizationLevel (see chapter 5.33)
6.24.1 ...process an address following the rules for CASS certification?
For US addresses, the Informatica AddressDoctor engine offers a special validation process mode
“CERTIFIED” which is used to validate an address according to the USPS CASS rules. This validation
type allows integrators to develop their own CASS Application for certification by USPS. Special
database files are necessary for CASS processing - for details see chapter 3.3.2.
For CountryISO3="USA" and Type="CERTIFIED" the GetConfig.xml output will only list a subset of the
available reference database files as “DataBase” element with Type="CERTIFIED". Also, USA5BI.md
must always be available, as this database is the basis for CASS processing as well (see chapter 6.24).
Furthermore, US CERTIFIED mode will not work without pre-loading, and full pre-loading will always
be enforced on some of the CASS databases described in chapter 3.3.2, irrespective of the settings.
During the validation process the input address is corrected according to CASS rules. In this process
all CASS attributes are generated and the ZIP + 4 is added to the ZIP code. The output address is
retrieved from the AddressObject as usual (see chapter 6.11). A CASS processing status value (see
chapter 5.21) can be retrieved from the AddressObject through the “CASSStatus” attribute returned
with the “EnrichmentData” element of the Result.xml.
To actually have CASS attributes available in the CASS element of Result.xml, the Process attribute
“EnrichmentCASS” within Parameters.xml (see Appendix 10.1) must be set to ON (default is OFF).
With “EnrichmentCASS” set to OFF, Informatica AddressDoctor still provides ZIP+4 codes (as
PostalCode item type “ADD_ON”), as long as USA5BI.MD is available in the database folder. For
convenience, ZIP+4 codes are also provided by the US BATCH process mode, although some
result variations may occur; the definitive ZIP+4 reference is available in US CERTIFIED
process mode only.
You may check for correct initialization of all required CASS databases (see chapter 3.3.2 for the full
list) by querying GetConfig.xml (see the respective DTD in Appendix 10.1) for the
EnrichmentSupportInfo element:
<EnrichmentSupportInfo CountryISO3="USA" Type="CERTIFIED">FULL</EnrichmentSupportInfo>
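The check for this element can be automated along the following lines. This is a sketch only: the function name is hypothetical, and because the overall layout of GetConfig.xml is defined by the DTD in Appendix 10.1 and not reproduced here, the code searches for EnrichmentSupportInfo elements anywhere in the document.

```python
import xml.etree.ElementTree as ET


def certified_support(get_config_xml, country_iso3):
    """Return the CERTIFIED support level for a country, or None if not listed."""
    root = ET.fromstring(get_config_xml)
    for info in root.iter("EnrichmentSupportInfo"):
        if (info.get("CountryISO3") == country_iso3
                and info.get("Type") == "CERTIFIED"):
            return (info.text or "").strip()
    return None
```

An application could refuse to start CERTIFIED processing unless certified_support(config_xml, "USA") returns "FULL"; the same check applies to the other countries with a CERTIFIED mode.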
Note that the CASS attribute output provided by Informatica AddressDoctor is only valid for use
during a special validation period, varying depending on the product.
The valid time ranges are as follows (as defined on the USPS PS 3553 form, which will need to be
created by the calling application to qualify for USPS mailing discounts, for an example, see
http://ribbs.usps.gov/cassmass/documents/tech%5Fguides/PS_FORM_3553):
ZIP + 4/DPV Coded
  From Date: 30 days before (the 15th of each month or bimonthly) or no later than 105 days
  after the file date.
  To Date: 180 days after the ZIP + 4 valid “From” date.
Total Delivery Point Barcoded
  From Date: 30 days before (the 15th of each month or bimonthly) or no later than 105 days
  after the ZIP + 4 product file date.
  To Date: 180 days after the DPBC valid “From” date.
Total Carrier Route Coded
  From Date: 30 days before or up to 105 days after the ZIP + 4, Five-Digit ZIP, or the Carrier
  Route product date (the 15th of each month or bimonthly) or up to 105 days after the file date.
  To Date: 90 days after the Carrier Route Valid “From” date.
Five-Digit Coded
  From Date: 30 days before (the 15th of each month or bimonthly) or no later than 105 days
  after the ZIP + 4, Five-digit ZIP, or the Carrier Route product date.
  To Date: 365 days after the Five-Digit Valid “From” date.
Note that any application based on Informatica AddressDoctor 5 must meet the CASS Terms and
Conditions to qualify for USPS mailing discounts:
http://ribbs.usps.gov/cassmass/documents/tech_guides/FORMS/CASSDEVS.pdf
The following list shows which CASS attributes are available (see Result.dtd in Appendix 10.1 also) –
for an explanation of the different attributes, refer to the CASS documentation at
http://ribbs.usps.gov/cassmass/documents/tech_guides/TECHNICAL_GUIDES/CASSTECH_N.PDF:
Carrier Route Answer                         CARRIER_ROUTE
Record Type Code                             RECORDTYPE
Delivery Point Answer                        DELIVERY_POINT
Delivery Point Check Digit Answer            DELIVERY_POINT_CHECK_DIGIT
High-rise Default                            HIGHRISE_DEFAULT
High-rise Exact                              HIGHRISE_EXACT
Rural route Default                          RURALROUTE_DEFAULT
Rural route Exact                            RURALROUTE_EXACT
DSF² LACS Indicator                          LACS
DPV Confirmation Indicator*                  DPV_CONFIRMATION
DPV CMRA Indicator*                          DPV_CMRA
DPV False Positive Indicator*                DPV_FALSE_POSITIVE
DPV Footnote 1*                              DPV_FOOTNOTE_1
DPV Footnote 2*                              DPV_FOOTNOTE_2
DPV Footnote 3*                              DPV_FOOTNOTE_3
Concatenation of DPV Footnotes*              DPV_FOOTNOTE_COMPLETE
Result of the call to the DPV NOSTATS Table  DSF2_NOSTATS_INDICATOR
Result of the call to the DPV VACANT Table   DSF2_VACANT_INDICATOR
LACSLink Return Code*                        LACSLINK_RETURNCODE
SUITELink Return Code*                       SUITELINK_RETURNCODE
ZIPMove Return Code*                         ZIPMOVE_RETURNCODE
Early Warning System (EWS) Return Code       EWS_RETURNCODE
Congressional District                       CONGRESSIONAL_DISTRICT
Barcode                                      BARCODE
Residential Delivery Indicator **            RDI_INDICATOR
eLOT Ascending/Descending                    ELOT_FLAG
eLOT Sequence Number                         ELOT_SEQUENCE
Note: Attributes marked with * will only be populated for US customers as per USPS licensing
restrictions.
See chapter 7 in http://ribbs.usps.gov/dpv/documents/tech_guides/DPV_LPR.PDF for details on how
to programmatically act on DPV_FALSE_POSITIVE being set.
Attributes marked with ** require customers to acquire the data from USPS to enable the optional
part of the processing. See chapter 3.3.2 on how to acquire the data and rename the necessary files.
6.24.2 ...process an address following the rules for SERP certification?
For Canadian addresses, Informatica AddressDoctor offers a special validation process mode
“CERTIFIED” which is used to validate an address according to the Canada Post SERP rules. This
validation type allows integrators to develop their own SERP Application for certification by Canada
Post. As the new databases now contain PoCAD (Point of Call Address Data) data, an additional
CAN5C1.MD is needed for certified mode. Those who want to use the new engine with older
databases will have to make a copy of CAN5BI.MD and rename the copy to CAN5C1.MD. See chapter
6.5 also.
You may check for correct initialization of all required databases by querying GetConfig.xml (see the
respective DTD in Appendix 10.1) for the EnrichmentSupportInfo element:
<EnrichmentSupportInfo CountryISO3="CAN" Type="CERTIFIED">FULL</EnrichmentSupportInfo>
A SERP processing status value (see chapter 5.22) can be retrieved from the AddressObject through
the “SERPStatus” attribute returned with the “EnrichmentData” element of the Result.xml. To
actually have SERP attributes available in the SERP sub-element of EnrichmentData in Result.xml, the
Process attribute “EnrichmentSERP” within Parameters.xml (see Appendix 10.1) must be set to ON
(default is OFF).
Note that SERP certification requirements are only met when the “PreferredScript” attribute is set
to “ASCII_SIMPLIFIED” (see chapter 5.12.1).
If the Validation type is CERTIFIED and the SERP Enrichment Status is ON, two enrichments are
provided:
CATEGORY and EXCLUDED_FLAG
The category provides the following possible values:
Value  Description
V      Verified. The process status is of type Vx.
C      Corrected. The process status is of type Cx.
N      Incorrect. The process status is of type Ix.
VQ     Valid, but questionable. Rural addresses (those with a '0' as second digit in the
       PostalCode, for example, "K0A 1L0") are usually considered valid because they are
       determined by the PostalCode. Questionable means that either delivery
       information is missing in the input or that some part or all of the delivery input has
       not been verified by the database. See also the address accuracy handbook
       provided by Canada Post.
V1A    Valid, residential type record. Some records in the database containing buildings
       are marked as apartment type records, either residential or commercial. This
       information is provided in the enrichment.
V2A    Valid, commercial type record. This refers to commercial building records in the
       database.
C1A    Corrected, residential type record.
C2A    Corrected, commercial type record.
Since December 2010, PoCAD data has been added to the databases to provide more detailed suite
information. (Note that the corresponding database CAN5C1.MD will be made available early
January 2011.)
The EXCLUDED_FLAG informs about PoCAD addresses with wrong user input.
The Informatica AddressDoctor output for this flag can either be empty or the Text ‘EXCLUDED’:
EXCLUDED: Incorrect suite input for a PoCAD address, category N, process status Ix
Effective January 17, 2011, the statement of accuracy has to report addresses as Excluded. However,
starting August 1, 2011 this flag will no longer be needed. Addresses will no longer show up as being
excluded.
See Result.dtd in Appendix 10.1 also. Refer to Canada Post’s website for more information:
http://www.canadapost.ca/cpo/mc/business/productsservices/atoz/addressaccuracy.jsf
6.24.3 ...process an address following the rules for AMAS certification?
For Australian addresses, Informatica AddressDoctor offers a special validation process mode
“CERTIFIED” which is used to validate an address according to the Australia Post AMAS rules. This
validation type allows integrators to develop their own AMAS Application for certification by
Australia Post. Special databases are necessary for AMAS processing - for details see chapter 3.3.24.
These new databases contain Postal Address File (PAF) data including Australia Post’s Delivery Point
Identifiers (DPIDs).
The additional AMAS information can be found in the EnrichmentData section of the Result.xml.
You may check for correct initialization of all required AMAS databases (see chapter 3.3.24 for the
full list) by querying GetConfig.xml (see the respective DTD in Appendix 10.1) for the
EnrichmentSupportInfo element:
<EnrichmentSupportInfo CountryISO3="AUS" Type="CERTIFIED">FULL</EnrichmentSupportInfo>
The following status codes or parameters are available (see AMAS documentation for more details):
Parameter                 Description
ERRORCODE                 Internal error code
RECORD TYPE               Type of address (for example, S for Street)
DELIVERY_POINT_ID         Delivery point identifier (DPID), 8 digits
LOT_NBR                   Lot number (for example, 100)
POSTAL_DELIVERY_NBR       Postal delivery number (for example, "00123" of "123A")
POSTAL_DELIVERY_NBR_PFX   Postal delivery number prefix (for example, "A" of "A123")
POSTAL_DELIVERY_NBR_SFX   Postal delivery number suffix (for example, "A" of "123A")
HOUSE_NBR_1               House (street) number 1 (for example, "00123" of "123A456B")
HOUSE_NBR_SFX_1           House (street) number 1 suffix (for example, "A" of "123A456B")
HOUSE_NBR_2               House (street) number 2 (for example, "00456" of "123A456B")
HOUSE_NBR_SFX_2           House (street) number 2 suffix (for example, "B" of "123A456B")
Other AMAS relevant fields are regular address elements and can be found in the standard
ResultData section.
Beginning with version 5.2.8 the Locality Sub field “PREFERRED_NAME” has to be used to comply
with AMAS rules because the COMPLETE or NAME fields may contain vanity names if they were
entered instead of the official names requested by the postal administration of Australia.
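The house number examples in the parameter table above can be reproduced with a short worked example. Note the assumptions: this is not the engine's parsing logic, the function name is hypothetical, and both the five-digit zero padding and the letter suffix format are inferred only from the sample values "00123", "A", "00456" and "B" given for "123A456B".

```python
import re

# Number 1, optional letter suffix, then optionally number 2 with its suffix.
HOUSE_RANGE = re.compile(r"^(\d+)([A-Z]?)(?:(\d+)([A-Z]?))?$")


def split_house_number(text):
    """Split e.g. "123A456B" into the four HOUSE_NBR fields of the table."""
    match = HOUSE_RANGE.match(text)
    if match is None:
        raise ValueError("unsupported house number format: " + text)
    nbr1, sfx1, nbr2, sfx2 = match.groups()
    return {
        "HOUSE_NBR_1": nbr1.zfill(5),
        "HOUSE_NBR_SFX_1": sfx1,
        "HOUSE_NBR_2": nbr2.zfill(5) if nbr2 else "",
        "HOUSE_NBR_SFX_2": sfx2 or "",
    }
```

Such a helper is useful mainly for writing test expectations against the enrichment output; the authoritative values are those returned by the engine in the EnrichmentData section.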
6.24.4 ...process an address following the rules for SNA certification?
For French addresses, Informatica AddressDoctor offers a special validation process mode
“CERTIFIED” which is used to validate an address according to the La Poste SNA rules. This validation
type allows integrators to develop their own SNA Application for certification by La Poste. No special
database files apart from FRA5BI.md are necessary for CERTIFIED processing. See chapter 6.5 also.
You may check for correct initialization of all required databases by querying GetConfig.xml (see the
respective DTD in Appendix 10.1) for the EnrichmentSupportInfo element:
<EnrichmentSupportInfo CountryISO3="FRA" Type="CERTIFIED">FULL</EnrichmentSupportInfo>
For SNA certified processing (see: http://www.laposte.fr/sna) it is required to enter addresses in a
six line FormattedAddressLine format, including empty lines wherever a part of the address is
missing:
Line 1: ORGANIZATION IDENTIFICATION or IDENTITY OF THE ADDRESSEE
Line 2: INDIVIDUAL IDENTIFICATION (i.e. Company Contact) or DELIVERY POINT ACCESS INFORMATION (i.e. SubBuilding)
Line 3: DELIVERY POINT LOCATION (i.e. Building)
Line 4: STREET NUMBER or PLOT and THOROUGHFARE
Line 5: DELIVERY SERVICE or THOROUGHFARE COMPLEMENTARY IDENTIFICATION
Line 6: POSTCODE and LOCALITY or CEDEX POSTCODE and DISTRIBUTION AREA INDICATOR
A SNA processing status value (see chapter 5.23) can be retrieved from the AddressObject through
the “SNAStatus” attribute returned with the “EnrichmentData” element of the Result.xml. To
actually have SNA attributes available in the SNA sub-element of EnrichmentData in Result.xml, the
Process attribute “EnrichmentSNA” within Parameters.xml (see Appendix 10.1) must be set to ON
(default is OFF).
Note that SNA certification requirements are only met when the “PreferredScript” attribute is set to
“ASCII_SIMPLIFIED” (see chapter 5.12.1) and “GlobalMaxLength” to 38 (with the “MaxLength” for
each AddressElement set to 0, see chapter 5.14).
The only SNA attribute available is “CATEGORY” with possible values of “ORI/RES/AVE/NOK”, as per
La Poste definition (see Result.dtd in Appendix 10.1 also). Note that the SNA certification of the
CERTIFIED mode for FRA is still pending.
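The six-line input requirement can be enforced in the calling application before the XML is built. The sketch below is hedged: it assumes that input uses the same FormattedAddressLine element with a Line attribute that appears in the result XML shown elsewhere in this documentation, the function name is hypothetical, and the sample address in the usage note is made up.

```python
from xml.sax.saxutils import escape

SNA_LINE_COUNT = 6


def sna_address_lines(lines):
    """Render exactly six FormattedAddressLine elements, keeping empty lines."""
    if len(lines) != SNA_LINE_COUNT:
        raise ValueError("SNA input needs exactly six lines; use '' for missing parts")
    return "\n".join(
        '<FormattedAddressLine Line="%d">%s</FormattedAddressLine>' % (number, escape(line))
        for number, line in enumerate(lines, start=1)
    )
```

For instance, sna_address_lines(["", "", "", "12 RUE DE LA PAIX", "", "75002 PARIS"]) yields six elements, four of them empty, so that each part of the address stays on the line number that the SNA format expects.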
6.24.5 …process an address following the rules for SendRight certification?
Informatica AddressDoctor has passed the stringent rules set by New Zealand Post to obtain the
SendRight Certification. For more information on the Certification Programme, contact New Zealand
Post directly at www.nzpost.co.nz.
One of the requirements of New Zealand Post was that no address cleansing occurs during the
certification process. Therefore it is recommended that customers use the Batch mode before
running their addresses through the Certification mode if address corrections or standardizations are
required.
Here is a quote from the SendRight Certification Handbook, Section 2.4 Software that does more
than PAF validation (page 5):
“SendRight™ certification only assesses PAF validation and SOA-issuing functionality. Any other
functionality such as address cleansing must not impinge on the functionality to be assessed as it
may invalidate the testing process and results.
The purpose of the software testing is to determine whether the software can take an input address,
match it to the PAF data elements and accurately calculate the desired result.”
For more details, refer to the SendRight Certification Handbook from New Zealand Post.
http://www.nzpost.co.nz/sites/default/files/uploads/shared/sendrightcertification.pdf
6.25 ...optimize performance?
The speed of your application will depend on the functionality of Informatica AddressDoctor that is
used. Parsing and Country Recognition, as implemented by Informatica AddressDoctor (AD_Process()
for C) with the Process Modes PARSE or COUNTRYRECOGNITION, do not access any databases,
whereas all other modes (BATCH, INTERACTIVE, FASTCOMPLETION, ADDRESSCODELOOKUP and so
on) do.
When validating an address, a number of read operations access the corresponding country
database. These accesses are random in nature. To reduce the number of read operations accessing
the hard disk, preloading part of or all of a database is very much recommended. See chapter 5.34
for details on this topic.
However, if the size of the free physical memory (or the part made available to the process running
Informatica AddressDoctor) is too small to fully preload every country database needed, Informatica
AddressDoctor must access the hard disk. To reduce the number of file system calls needed,
Informatica AddressDoctor manages its own cache (see chapter 5.35). The operating system will also
cache file accesses, thereby significantly speeding up subsequent calls. Naturally, the OS must have
sufficient free memory available for this purpose.
Preloading, on the other hand, reduces the amount of memory the operating system can use
efficiently for file caching, and the operating system may even temporarily swap out the preloaded
data blocks to the hard disk. For this reason it is recommended to limit the amount of memory used
for preloading if it can be foreseen that additional hard disk accesses are necessary (as would, for
instance, be the case when the total memory available is not sufficient to allow full pre-loading of
all country reference databases needed).
As minimizing accesses to the hard disk is a key to validation performance, installing more memory
(see chapter 5.37 on memory allocation) will speed up processing significantly, as hard disk accesses
can be avoided this way. See chapter 6.22 for reference if you are running a multi-tenant installation
of Informatica AddressDoctor 5.
Note that some operating systems can use more memory for file caching than the supported
memory size per process: For example, 32 Bit Windows 2003 Server Enterprise Edition can address
up to 32 GB of RAM, but the limit per process is still 2 GB (or rather 3 GB in case of using the /3GB
boot.ini switch, for reference see http://support.microsoft.com/kb/291988) and the limit for
standard CPU memory access (unless using AWE/PAE, see http://support.microsoft.com/kb/283037
for reference) is 4 GB.
Monitoring whether you have enough memory installed is possible using a tool to monitor resource
utilization (such as the Performance Monitor perfmon.exe on Windows). If sufficient memory is
installed, there should be almost 100% utilization of one processor core when processing a large
batch of records in single-thread mode.
While Informatica AddressDoctor provides the means to utilize multi-threading internally (see
chapter 5.36), having more than one core or processor in the system speeds up processing in itself
already, because other threads in the system can run independently.
Here is a summary of tips to optimize validation performance:

Install as much memory as possible to allow country databases to be fully pre-loaded into
memory. At least as much memory as the size of the most often used country databases
combined plus 256 MB should be available. If all countries available from Informatica
AddressDoctor are to be used simultaneously, add more memory to cover the entire size of all
databases.

Preload at least the databases of frequently used countries with the proper parameters set in
the SetConfig.xml passed to the AD_Initialize() function.

When full preloading is not an option, store the database files on a fast hard disk or, even better,
a SATA Solid State Disk (ideally exceeding 200MB/sec read transfer rate - for development
purposes, high-speed USB or FireWire flash modules exceeding 30MB/sec read transfer rate
might suffice). The access latency (average seek time) in particular should be minimized: Internal
Informatica AddressDoctor benchmarks for “PreloadingType=NONE” with an Intel X25M G2
SATA SSD have shown a typical performance increase by a factor of 20.

Keep the Informatica AddressDoctor reference databases on a separate hard drive. Read address
data from, and write it to, other drives. Make absolutely sure to keep the database files
defragmented: internal tests have shown that performance may easily decrease by as much as
35% when the files are heavily fragmented.

Informatica AddressDoctor is very data-intensive, with a significant amount of non-localized
memory accesses during processing: As such, it greatly benefits from direct multi-channel
memory access (for example, via Quick Path Interconnect or HyperTransport) with high
bandwidth and low latency, combined with large processor caches, such as found in top-of-the-line
server processors.

Use high performance multi-core processors, like Intel Xeon X55xx/65xx/75xx and higher, AMD
Opteron 24xx/84xx and higher or IBM POWER7 and higher. Provided there is enough memory
available for full preloading, the processor clock frequency will directly determine the speed of
address processing. See http://www.spec.org/cpu2006/results/rint2006.html for a comparison
of integer processing throughput between different processor architectures.

When running batch processes without having a sufficient amount of memory installed, try to
process records ordered by country with intermittent re-initialization of Informatica
AddressDoctor using the appropriate pre-loading settings (see chapter 5.34). The engine will also
benefit from internal and OS caches for addresses sorted by country as compared to addresses
in random order, as they would for instance occur in a Web Service environment.
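
The memory rule of thumb from the first tip above can be sketched as a small calculation. The following Java example is a minimal sketch, not part of the Informatica AddressDoctor API: the class name, the method name and the database sizes are hypothetical, and only the 256 MB headroom is taken from the tip itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Rough memory estimate for fully preloading a set of country databases. */
final class PreloadMemoryEstimate {
    /** Headroom recommended in the tip above, in MB. */
    static final long HEADROOM_MB = 256;

    /** Returns the sum of the given database sizes (in MB) plus the headroom. */
    static long requiredMemoryMb(Map<String, Long> databaseSizesMb) {
        long total = HEADROOM_MB;
        for (long sizeMb : databaseSizesMb.values()) {
            total += sizeMb;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sizes; use the actual sizes of your *.MD files instead.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("DEU", 700L);
        sizes.put("USA", 1500L);
        System.out.println(requiredMemoryMb(sizes) + " MB");
    }
}
```

The result is only a lower bound for the memory that should be available to Informatica AddressDoctor; the operating system and the calling application need memory of their own.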
Examples for typical performance-oriented settings:
Given Resources                              System 1  System 2  System 3  System 4  System 5
Cores for Informatica AddressDoctor          1         2         4         6         12
RAM for Informatica AddressDoctor (MB)       512       1024      2048      6000      16000

Chosen Settings
MaxMemoryUsageMB                             450       950       1950      5950      15950
CacheSize                                    SMALL     LARGE     LARGE     LARGE     LARGE
MaxThreadCount                               1         2         4         6         12
MaxAddressObjectCount*                       1         2         4         6         12
PreloadingMethod                             MAP       MAP       MAP       MAP       MAP
PreloadingType for very important countries  PARTIAL   FULL      FULL      FULL      FULL
PreloadingType for important countries       NONE      PARTIAL   PARTIAL   FULL      FULL
PreloadingType for remaining countries       NONE      NONE      NONE      PARTIAL   FULL
* This setting depends on the implementation of the calling code. In some scenarios with double
buffering two times the given value may be used.
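
The example systems can also be encoded as a lookup table in the calling application. The following Java sketch is one possible way to do this, under the assumption that the largest example system that still fits the available cores and memory is a reasonable starting point. The class and method names are hypothetical; the values are copied from the example settings above.

```java
/** Example settings from this chapter, encoded as a lookup table. */
final class SettingsProfile {
    final String name;
    final int cores;              // also used for MaxThreadCount and MaxAddressObjectCount
    final int ramMb;
    final int maxMemoryUsageMb;
    final String cacheSize;
    final String preloadVeryImportant;
    final String preloadImportant;
    final String preloadRemaining;

    SettingsProfile(String name, int cores, int ramMb, int maxMemoryUsageMb, String cacheSize,
                    String preloadVeryImportant, String preloadImportant, String preloadRemaining) {
        this.name = name;
        this.cores = cores;
        this.ramMb = ramMb;
        this.maxMemoryUsageMb = maxMemoryUsageMb;
        this.cacheSize = cacheSize;
        this.preloadVeryImportant = preloadVeryImportant;
        this.preloadImportant = preloadImportant;
        this.preloadRemaining = preloadRemaining;
    }

    /** Rows ordered from the smallest to the largest example system. */
    static final SettingsProfile[] TABLE = {
        new SettingsProfile("System 1", 1, 512, 450, "SMALL", "PARTIAL", "NONE", "NONE"),
        new SettingsProfile("System 2", 2, 1024, 950, "LARGE", "FULL", "PARTIAL", "NONE"),
        new SettingsProfile("System 3", 4, 2048, 1950, "LARGE", "FULL", "PARTIAL", "NONE"),
        new SettingsProfile("System 4", 6, 6000, 5950, "LARGE", "FULL", "FULL", "PARTIAL"),
        new SettingsProfile("System 5", 12, 16000, 15950, "LARGE", "FULL", "FULL", "FULL"),
    };

    /** Returns the largest example system that fits, or null if even System 1 does not fit. */
    static SettingsProfile pick(int availableCores, int availableRamMb) {
        SettingsProfile best = null;
        for (SettingsProfile candidate : TABLE) {
            if (candidate.cores <= availableCores && candidate.ramMb <= availableRamMb) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SettingsProfile profile = pick(4, 4096);
        System.out.println(profile.name + ": MaxMemoryUsageMB=" + profile.maxMemoryUsageMb
                + ", MaxThreadCount=" + profile.cores
                + ", CacheSize=" + profile.cacheSize);
    }
}
```

The selected row is a starting point only; the PreloadingType values still have to be checked against the actual database sizes.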
In reality the PreloadingType depends on the size of the databases, so for System 2 a FULL preload
for a couple of countries may not be possible in case of large databases. Systems 1 and 2 above
are typical for 32-bit Java usage scenarios.
When benchmarking Informatica AddressDoctor, consider the following:
The operating system will have to read data from the hard disk if any of the databases used are not
fully preloaded. These file system accesses are cached, at least until the OS file cache is full. This
leads to the effect that physical hard disk accesses are always necessary for the first addresses of a
specific country.
Later on some or even all of these accesses will hit the file cache. For this reason, the processing
speed of the first addresses (first meaning the first few thousand) of a specific country is usually
much lower than for the later ones. Thus, it is recommended to use at least 50,000 addresses
per country to produce realistic benchmark results. If, on the other hand, all accessed databases are
fully preloaded, speed is not expected to vary with the number of addresses already processed so
far.
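
The warm-up effect described above can be taken into account in a benchmark harness. The following Java sketch processes a number of warm-up addresses first and measures only the remaining ones. It is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the validate callback stands in for whatever code in your application calls Informatica AddressDoctor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Measures throughput after an unmeasured warm-up phase. */
final class WarmBenchmark {
    /**
     * Runs validate on every address and returns the addresses per second
     * achieved for the addresses that follow the first warmUp addresses.
     */
    static double throughputAfterWarmUp(List<String> addresses, int warmUp,
                                        Consumer<String> validate) {
        if (warmUp < 0 || warmUp >= addresses.size()) {
            throw new IllegalArgumentException("warmUp must be in [0, number of addresses)");
        }
        // Warm-up phase: fills the OS file cache, not measured.
        for (int i = 0; i < warmUp; i++) {
            validate.accept(addresses.get(i));
        }
        long start = System.nanoTime();
        for (int i = warmUp; i < addresses.size(); i++) {
            validate.accept(addresses.get(i));
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return (addresses.size() - warmUp) * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Placeholder data and a no-op callback; replace both with real input and calls.
        List<String> addresses = Arrays.asList("address 1", "address 2", "address 3");
        double perSecond = throughputAfterWarmUp(addresses, 1, address -> { });
        System.out.println(perSecond + " addresses per second");
    }
}
```

For a realistic figure, run the harness per country with an input set of the size recommended above.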
7. Demonstration Applications
The Informatica AddressDoctor package is accompanied by demonstration applications that can be
used to quickly test the functionality of the library. See the ZIP archive structure described in chapter
3.2 for reference.
7.1 ConsoleDemo Application
The ConsoleDemo application, provided as source code under src and also contained as an
executable in the bin directory, gives an overview of the basic address validation process.
Before running the application, copy the example XML files from etc over to your working directory,
so that they may be found by the executable and edited for experimentation purposes. Specifically,
a sample XML file InputData.xml containing an address for XML processing via ConsoleDemo -xml or
ConsoleDemoJava -xml, respectively, is provided in etc. Ensure that the minimal SetConfig
configuration XML provided contains the valid Unlock Code that you received when purchasing
Informatica AddressDoctor and the correct path to which your reference database files have been
unpacked (see chapters 6.4 and 6.5 for details).
Alternatively, make sure to copy (or link) at least the Swiss reference database (CHE5BI.MD, see
chapter 3.3) to the working directory before running the ConsoleDemo executable: The
ConsoleDemo application will attempt to validate a sample address from Switzerland that requires
this database (otherwise, that example address will only be parsed).
Remember that the contents of the lib directory may have to be added to your shared library path
(set PATH=%PATH%;.\lib on Windows or export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./lib on Unix) for
an executable using the C-API to work.
For the Java API, simply call (on UNIX; see also chapter 3.2.2):
java -Xss2048k -cp bin:lib/AddressDoctor5.jar -Djava.library.path=lib ConsoleDemoJava
On Windows:
java -Xss2048k -cp bin;lib/AddressDoctor5.jar -Djava.library.path=lib ConsoleDemoJava
7.2 AddressCheck (Windows only)
Starting with Informatica AddressDoctor 5.1, the AddressCheck demonstration application featuring
a Windows GUI is made available in binary form in the bin sub-directory (AD5_WIN_32 ZIP archive
only). Note that AddressCheck requires installation of the Microsoft .NET Framework 2.0 or higher to
run and is provided as is for experimentation purposes, without warranty or support of any kind.
Note that you will have to copy the AddressDoctor5.dll from lib and the SetConfig.xml example from
etc/C to the bin directory, for AddressCheck to be able to locate them (a corresponding
AddressCheck.cfg that allows for configuring these paths is written upon the first successful
initialization).
As with ConsoleDemo, make sure that the minimal SetConfig configuration XML provided
contains the valid Unlock Code that you received when purchasing Informatica AddressDoctor and the
correct path to which your reference database files have been unpacked (see the preceding
chapter 7.1 for an example and chapters 6.4 and 6.5 for details). AddressCheck requires a
“MaxAddressObjectCount” of at least 6. On 64-bit Windows systems, the following command might
be needed for running AddressCheck:
corflags addresscheck.exe /32bit+
AddressCheck allows for interactive entry of fielded, partially fielded or unfielded address data (see
chapter 6.7) and processing in different ProcessModes (see chapter 5.11), using the processing
parameter settings (see chapters 5.12, 5.13 and 5.14) chosen via menus.
Also, it may be used for producing valid Informatica AddressDoctor InputData (hit the “Get XML”
button on the “XML Input” Tab after parsing or validating your address entered on one of the other
Tabs “Fielded Input”, “Partially Fielded Input” or “Unfielded Input”), GetConfig, Parameters and
Result XML files (see chapter 6.17 also) for submission to Informatica AddressDoctor Support (see
chapter 9.3).
The “Status Help” button is very useful in analysis of the Element Input and Result Status values (see
chapter 5.27) after processing.
8. Sample Address Data for Testing
The following addresses are provided for you to test your implementation of Informatica
AddressDoctor. For each address the status code values are provided and explained. If not otherwise
mentioned the addresses have been processed using the Validation process mode Suggestions
(INTERACTIVE) that is explained in chapter 5.11.2. The data was input using the
FormattedAddressLine element, see chapter 6.7.4. Further example address input and output may
be found in chapters 4.1 and 6.15.
8.1 Addresses with Status Code Vx
Addresses whose processing results in a status code of Vx were correct on input. Depending on
other parameters some minor standardizations may take place.
8.1.1 Correct Address
The following input address is entirely correct: the postal code is properly spaced and the address
is also in the proper capitalization for a Swedish address. Because of this, no standardization
has to take place.
VASAGATAN 22
111 20 STOCKHOLM
SVERIGE
The ElementInputStatus (see chapter 5.27.2) would be: 60600060600000000060
With the PreferredScript parameter set to Latin Script (LATIN) and PreferredLanguage set to English
(ENGLISH) (see chapter 5.12.2), the result would be (process status value V4, see chapter 5.17):
Street: VASAGATAN
HouseNumber: 22
POBox:
Locality: STOCKHOLM
PostalCode: SE-111 20
Province: STOCKHOLMS LÄN
Country: SWEDEN
The ElementResultStatus (see chapter 5.27.3) would be: F0F080F0F000000000E0
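
The 20-character status strings shown above can be read as ten two-character groups, one per address element. The following Java sketch splits a status string into such groups. The element order used here (PostalCode, Locality, Province, Street, Number, DeliveryService, Building, SubBuilding, Organization, Country) is an assumption that is consistent with the examples in this chapter; chapter 5.27 remains the authoritative description. The class and method names are illustrative and not part of the Informatica AddressDoctor API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Splits an ElementInputStatus or ElementResultStatus string into per-element groups. */
final class ElementStatus {
    // Assumed element order; verify against chapter 5.27 before relying on it.
    static final String[] ELEMENTS = {
        "PostalCode", "Locality", "Province", "Street", "Number",
        "DeliveryService", "Building", "SubBuilding", "Organization", "Country"
    };

    /** Returns a map from element name to its two-character status group. */
    static Map<String, String> split(String status) {
        if (status == null || status.length() != 2 * ELEMENTS.length) {
            throw new IllegalArgumentException("Expected a 20-character status string");
        }
        Map<String, String> groups = new LinkedHashMap<>();
        for (int i = 0; i < ELEMENTS.length; i++) {
            groups.put(ELEMENTS[i], status.substring(2 * i, 2 * i + 2));
        }
        return groups;
    }

    public static void main(String[] args) {
        // ElementInputStatus of the Swedish example above.
        for (Map.Entry<String, String> entry : split("60600060600000000060").entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Under this assumption, the Province group of the input status is 00 because no province was supplied, while the result status carries 80 in the same position.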
8.1.2 Address with Exonym replaced
The following input address is written with the English exonym for München (Munich). Because this
is a correct name for the city, the overall status value would now be V3.
Prinzregentenstr. 93
81677 Munich
Germany
The ElementInputStatus would be: 60500060600000000060
With the PreferredLanguage parameter set to English (ENGLISH) the result would be:
Street: Prinzregentenstr.
HouseNumber: 93
POBox:
Locality: Munich
PostalCode: 81677
Province: Bavaria
Country: GERMANY
The ElementResultStatus would be: F0D080F0F000000000E0
With the PreferredLanguage parameter set to the reference data standard (DATABASE) and
CountryType set to “NAME_DE” the result would then be:
Street: Prinzregentenstr.
HouseNumber: 93
POBox:
Locality: München
PostalCode: 81677
Province: Bayern
Country: DEUTSCHLAND
The ElementResultStatus would still be: F0D080F0F000000000E0
8.2 Addresses with Status Code Cx
Addresses that Informatica AddressDoctor can automatically correct will result in a status code of Cx.
This indicates that some address components were either missing or incorrect. The returned address
can be used instead of the original input address.
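
An application that post-processes results may want to branch on the first character of the process status. The following Java sketch distinguishes only the two groups described in this chapter (Vx and Cx) and treats everything else as "other". The class name, method names and category labels are hypothetical; see chapter 5.17 for the complete list of process status values.

```java
/** Coarse classification of a process status value such as "V4" or "C4". */
final class ProcessStatusCheck {
    /** Returns "VERIFIED" for Vx, "CORRECTED" for Cx and "OTHER" for anything else. */
    static String category(String processStatus) {
        if (processStatus == null || processStatus.isEmpty()) {
            return "OTHER";
        }
        switch (Character.toUpperCase(processStatus.charAt(0))) {
            case 'V':
                return "VERIFIED";   // address was correct on input
            case 'C':
                return "CORRECTED";  // address was corrected automatically
            default:
                return "OTHER";      // not covered by this sketch
        }
    }

    /** True if the returned address can be used in place of the input address. */
    static boolean outputUsable(String processStatus) {
        String category = category(processStatus);
        return category.equals("VERIFIED") || category.equals("CORRECTED");
    }

    public static void main(String[] args) {
        System.out.println("V4 -> " + category("V4"));
        System.out.println("C4 -> " + category("C4"));
    }
}
```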
8.2.1 Address with missing Postal Code
The following input address is basically correct, but it is missing the postal code. Informatica
AddressDoctor automatically appends the correct postal code and returns a status value of C4 for the
address:
2827 yonge street
toronto on
Canada
The ElementInputStatus would be: 00606060600000000060
With PreferredLanguage set to DATABASE the result would be:
Street: YONGE STREET
HouseNumber: 2827
POBox:
Locality: TORONTO
PostalCode: M4N 2J4
Province: ON
Country: CANADA
The ElementResultStatus would be: 80F0F0F0F000000000E0
8.2.2 Address with Misspellings in Street and City Name
The following input address is basically correct, but has misspellings in the street and city name.
Informatica AddressDoctor automatically corrects these misspellings and returns a status value of C4
for the address.
100 GOULD ST
Neu York NY 10038
United States
The ElementInputStatus would be: 60406040600000000060
With PreferredLanguage set to DATABASE the result would be:
Street: GOLD ST
HouseNumber: 100
POBox:
Locality: NEW YORK
PostalCode: 10038-1605
Province: NY
Province Item 2 (County): NEW YORK
Country: UNITED STATES
The ElementResultStatus would be: F870F870F000000000E0
9. Miscellaneous Topics
This chapter lists various topics that have not been discussed before.
9.1 Background on the (Postal) Reference Database
In order to validate postal addresses, so-called postal reference data is required. This reference data
is typically a collection of locality (city) names, streets, provinces, building numbers, postal codes (ZIP
codes), and Post Office Box numbers. Informatica AddressDoctor obtains this data from various
sources around the world and updates it regularly. The specific update schedule for a country can be
found on the Informatica AddressDoctor Web Site. Also available online is an interactive map that
illustrates the latest updates and data coverage:
The world map shows, in blue, the countries that Informatica AddressDoctor supports for
address validation. An interactive version of this map is available on the Informatica
AddressDoctor website.
The reference data is typically provided by postal organizations around the globe. The Informatica
AddressDoctor development team checks each dataset and then transfers the data into a central
data store. This master database is then used to create the postal reference data files in the
Informatica AddressDoctor-proprietary, platform-independent (database) file format.
[Figure: Postal data from the individual countries (for example Afghanistan, Albania, Algeria,
Andorra, Angola, Anguilla, Argentina, Armenia, Australia, Austria, Belgium, Canada, China, ...,
United Kingdom, United States, Zimbabwe) is collected in the central AddressDoctor data store
and then written out in the AddressDoctor file format.]
Each country has its own reference database. The databases follow a specific naming scheme that
makes it easy to tell them apart.
XXX5BI.MD
XXX represents the ISO3 code of the country. A list of these codes can be found at the Informatica
AddressDoctor website: http://www.addressdoctor.com/en/countries_data/isocodes.asp
The databases are self-contained and platform independent; that is, they can be used on Windows,
Solaris, Unix, or Linux without changes. An external database system or run-time files are not
required.
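
The naming scheme can be applied in code when checking whether the database for a given country is present. The following Java sketch builds the file name from an ISO3 code. The class and method names are hypothetical and not part of the Informatica AddressDoctor API; only the pattern XXX5BI.MD is taken from the text above.

```java
import java.util.Locale;

/** Builds reference database file names following the XXX5BI.MD scheme. */
final class ReferenceDatabaseName {
    /** Returns the database file name for a three-letter ISO country code. */
    static String fileNameFor(String iso3) {
        if (iso3 == null || !iso3.matches("[A-Za-z]{3}")) {
            throw new IllegalArgumentException("Expected a three-letter ISO3 code: " + iso3);
        }
        return iso3.toUpperCase(Locale.ROOT) + "5BI.MD";
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("che"));
    }
}
```

For Switzerland this yields CHE5BI.MD, the file used by the ConsoleDemo application in chapter 7.1.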
9.1.1 Database Format
The Informatica AddressDoctor postal reference database is a read-only file that stores the postal
reference data and all required indexes for fast data access. The data conversion process to create
this database format is very resource intensive and is performed on a cluster of high speed
computers. While the creation of the database is resource intensive, the access to the data is very
fast.
The databases contain information for fuzzy (fault tolerant) searching as well as for all process
modes supported by Informatica AddressDoctor.
9.1.2 Database Size
The postal reference databases for all countries combined (without enrichments) require
approximately 15 to 20 GB storage space.
9.1.3 Database Updates
Updates to the postal databases are available regularly. Their frequency depends on updates made
available to Informatica AddressDoctor by the data providers. For some countries monthly updates
are available, while others only have an irregular update frequency. When the Informatica
AddressDoctor team receives new data, it is first checked for accuracy and consistency. Then, the
data is transferred into the central data store where enrichment operations take place. Exonyms
(alternate names) for places and streets are added and indexes for fast access are stored in the
database.
To replace a database with an updated version, simply copy the new database file over the existing
file. While doing this, however, no application may be accessing the databases.
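
The replacement step can be scripted. The following Java sketch copies the updated file next to the installed one and then moves it into place, so that the installed file is replaced in a single final step. This staging approach is a design choice of the sketch, not a requirement stated by Informatica AddressDoctor, and the class and method names are hypothetical. The caller remains responsible for making sure that no application is accessing the databases while the update runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Replaces an installed reference database file with an updated version. */
final class DatabaseUpdate {
    /** Copies updatedFile to a staging file beside installedFile, then moves it into place. */
    static void replace(Path updatedFile, Path installedFile) throws IOException {
        // Staging name is an assumption of this sketch.
        Path staged = installedFile.resolveSibling(installedFile.getFileName() + ".new");
        Files.copy(updatedFile, staged, StandardCopyOption.REPLACE_EXISTING);
        Files.move(staged, installedFile, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DatabaseUpdate <updated file> <installed file>");
            return;
        }
        replace(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```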
9.2 Postal Certifications
Some postal operators have instituted a certification process for software vendors. The certification
will ensure that the software conforms to the rules and regulations of a specific postal organization.
Depending on the intended use of the product, a certification might not produce the best results for
poor input data. Certifications tend to be very strict: their major goal is to prevent improperly
addressed mail from entering the postal system, not to improve every address that can
possibly be corrected.
Informatica AddressDoctor Version 5 has been certified by USPS for CASS Cycle M and is regularly
submitted to USPS for re-certification. Informatica AddressDoctor Version 5 was certified by Canada
Post for SERP in 2010 and is regularly submitted to Canada Post for re-certification. In 2011, the
engine was certified for the AMAS Cycle 2011 and is regularly submitted to Australia Post for
re-certification. To process addresses according to the specific rules defined by postal organizations, a
special process mode is available (process mode CERTIFIED, see chapter 5.11.5).
In 2012, the engine was certified for the France (SNA) and New Zealand (SendRight) certifications.
In 2013, the engine was re-certified for the SendRight Cycle 2014.
In 2013, the engine was re-certified for the SERP Cycle 2014.
The current engine meets the following certification cycles:

AMAS Cycle 2013

CASS Cycle N

SERP Cycle 2014

SendRight Cycle 2014

SNA (HEXAPOSTE, HEXAVIA)
Note: The Statement of Accuracy (SOA) ID is now changed from “ADR13_xxxxxxxx” to
“ADR14_xxxxxxxx”, where xxxxxxxx is the unique identifier for the SOA.
9.3 Support Information
You may contact Informatica AddressDoctor Support at: [email protected]
When doing so, make sure to provide the following four XML files (see chapters 6.6 and 6.12 for
more details and 10.1 for the corresponding DTDs) in a ZIP archive, after having run them through
the ConsoleDemo (see chapter 7.1) application provided by Informatica AddressDoctor to check for
reproducibility of your issue:
SetConfig.xml - may be retrieved using AD_GetConfigSettingsXML() (in Java: getConfigXML())
Parameters.xml - may be retrieved using AD_GetParametersXML() (in Java: getParametersXML())
InputData.xml - may be retrieved using AD_GetInputDataXML() (in Java: getInputDataXML())
Result.xml - may be retrieved using AD_GetResultXML() (in Java: getResultXML())
These XML files will provide Informatica AddressDoctor support with a basic set of information like
the software library and reference database versions as well as the parameter settings used to
process an input address facing issues. Additionally, the following information will be needed to
assist with your problem:
The version and patch level of the platform that Informatica AddressDoctor runs on (for supported
platforms see chapter 2.2), including bitness (32 or 64 bit).
In case of Java: the JDK version and the parameters used to initialize the JVM (for example, -Xmx
for maximum heap, see chapter 3.2.2 for examples). Additionally, the Java stack trace (in
case of a crash).
A detailed description of the steps required to trigger the problem.
In case of a crash that could not be reproduced using the Informatica AddressDoctor
ConsoleDemo, a compact binary test application that actually triggers the crash.
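
Collecting the four XML files into a ZIP archive can be automated with the standard Java library. The following Java sketch assumes that the files have already been written to one directory under the names listed above; the class and method names are hypothetical, and the sketch does not call Informatica AddressDoctor itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Packs the four XML files requested by support into one ZIP archive. */
final class SupportBundle {
    static final String[] FILES = {
        "SetConfig.xml", "Parameters.xml", "InputData.xml", "Result.xml"
    };

    /** Writes zipFile from the four XML files found in xmlDir; fails if one is missing. */
    static Path create(Path xmlDir, Path zipFile) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : FILES) {
                Path file = xmlDir.resolve(name);
                if (!Files.isRegularFile(file)) {
                    throw new IOException("Missing file: " + file);
                }
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return zipFile;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SupportBundle <directory with XML files> <zip file>");
            return;
        }
        System.out.println("Wrote " + create(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```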
A constantly updated list of frequently asked questions (FAQ) can be found on the Informatica
AddressDoctor Web Site at: http://www.addressdoctor.com/en/support/FAQ
9.4 Recommended Database Layout for International Addresses
Postal addresses come in numerous varieties around the world. The formats vary in the placement
of postal codes, the placement of building numbers, the usage of provinces and the length of
address elements. Informatica AddressDoctor recommends using just one database layout to store
addresses from all countries of the world. The fields of the proposed format are then mapped to the
various elements that appear in different countries.
As an example, the United States has states, while Canada has provinces. Japan is divided into
prefectures and Switzerland into cantons. Instead of having separate fields for each, Informatica
AddressDoctor maps all of these subdivisions to the “province” field. This mapping is done for all
address elements that can be represented in an AddressObject (see 5.7).
Thus, we recommend storing addresses in a format amenable to AddressObject mapping. As
business and consumer addresses vary in the information they require, some fields are not required
for consumer addresses.
Note that all fields are of type character to allow for any combination of numeric and alpha content.
Field name         Field length (min.)  Content
Organization       50                   Company or Organization name including a company type descriptor such as Inc., AG, or GmbH
Department         50                   Department or Mail Stop information
Function           60                   Function of the contact
Gender             1                    Gender of the contact
FirstName          40                   First name of the contact
MiddleName         40                   Middle name of the contact
LastName           50                   Last name of the contact
Building           50                   Building name. Frequently used in the United Kingdom
Subbuilding_1      50                   Information that further subdivides a Building, for example, the floor.
Subbuilding_2      50                   Information that further subdivides a Building, for example, the suite or apartment number.
Street_1           50                   Name of the street or thoroughfare
Street_2           50                   Dependent street or thoroughfare
Number_1           15                   Number of a Building/House in a street. Placement varies by country.
Number_2           15                   Number of a Building/House in a dependent street. Placement varies by country.
DeliveryService_1  50                   Code of the respective post office in charge of delivery.
DeliveryService_2  50                   Post Box descriptor (POBox, Postfach, Case Postale, and so on) and number.
Locality_1         50                   Primary place name. Typically a “province” is subdivided into localities. Some countries may contain yet another hierarchy level for subdividing provinces. Examples are counties in the US and Kreise in Germany
Locality_2         50                   Dependent place name that further subdivides a Locality. Examples are colonias in Mexico, Urbanisaciones in Spain
Locality_3         50                   Dependent place name that further subdivides a Locality. An example would be Mahalle in Turkey.
SortingCode        10                   Speeds up delivery in certain countries for large localities, like for example Prague or Dublin.
PostalCode         10                   Postal code or ZIP code.
Province_1         50                   Store the state, province, canton, prefecture or other sub-division of a country.
Province_2         50                   Dependent province information that further subdivides a province. An example would be a US county.
CountryName        50                   Optionally needed if required for display. It is recommended to just store the ISO code so that the country name can be displayed in any language.
CountryISO         3                    ISO alpha3 code according to ISO 3166. Can be used to generate the name of a country in any language.
Storing data in the format suggested above is of major benefit when using
Informatica AddressDoctor functionality for automatically generating addresses for printing and
display.
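
The recommended layout can be turned into a table definition for the database system of your choice. The following Java sketch holds the field names and minimum lengths from the table above and generates a generic CREATE TABLE statement from them. The table name, the use of VARCHAR and the class and method names are assumptions of this sketch; adapt the column types to your database system.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Recommended address layout with minimum field lengths, plus a DDL generator. */
final class AddressTableLayout {
    static final Map<String, Integer> MIN_LENGTH = new LinkedHashMap<>();
    static {
        MIN_LENGTH.put("Organization", 50);
        MIN_LENGTH.put("Department", 50);
        MIN_LENGTH.put("Function", 60);
        MIN_LENGTH.put("Gender", 1);
        MIN_LENGTH.put("FirstName", 40);
        MIN_LENGTH.put("MiddleName", 40);
        MIN_LENGTH.put("LastName", 50);
        MIN_LENGTH.put("Building", 50);
        MIN_LENGTH.put("Subbuilding_1", 50);
        MIN_LENGTH.put("Subbuilding_2", 50);
        MIN_LENGTH.put("Street_1", 50);
        MIN_LENGTH.put("Street_2", 50);
        MIN_LENGTH.put("Number_1", 15);
        MIN_LENGTH.put("Number_2", 15);
        MIN_LENGTH.put("DeliveryService_1", 50);
        MIN_LENGTH.put("DeliveryService_2", 50);
        MIN_LENGTH.put("Locality_1", 50);
        MIN_LENGTH.put("Locality_2", 50);
        MIN_LENGTH.put("Locality_3", 50);
        MIN_LENGTH.put("SortingCode", 10);
        MIN_LENGTH.put("PostalCode", 10);
        MIN_LENGTH.put("Province_1", 50);
        MIN_LENGTH.put("Province_2", 50);
        MIN_LENGTH.put("CountryName", 50);
        MIN_LENGTH.put("CountryISO", 3);
    }

    /** Returns a CREATE TABLE statement with one character column per field. */
    static String createTableSql(String tableName) {
        StringJoiner columns = new StringJoiner(",\n  ",
                "CREATE TABLE " + tableName + " (\n  ", "\n)");
        for (Map.Entry<String, Integer> field : MIN_LENGTH.entrySet()) {
            columns.add(field.getKey() + " VARCHAR(" + field.getValue() + ")");
        }
        return columns.toString();
    }

    public static void main(String[] args) {
        // "address" is a hypothetical table name.
        System.out.println(createTableSql("address"));
    }
}
```

All columns are character columns, in line with the note above that any combination of numeric and alpha content must be allowed.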
10. Appendix
10.1 API Document Type Definitions
The following accompanying DTD files are provided in the documentation package (see chapter
3.1.2):
SetConfig.dtd - Configuration settings passed with AD_Initialize()
Parameters.dtd - Parameters passed with AD_SetParametersXML() (or AD_Initialize() with SetConfig)
InputData.dtd - Structure of data input as XML, using AD_SetInputDataXML() and AD_GetInputDataXML()
Result.dtd - Structure of the XML result from AD_GetResultXML()
GetConfig.dtd - Structure of the XML result from AD_GetConfigSettingsXML()
IMPORTANT Notice: As Informatica AddressDoctor sees the DTDs referred to here as part of the API
definition, where disruptive changes must be minimized, these files may at any given time contain
elements without apparent functionality. Refer to chapters 5 and 6 of this document to
understand what functionality is actually available in Informatica AddressDoctor 5 and how it should
be used. Simply relying on the DTDs with their comments will not suffice as basis for a successful
integration of Informatica AddressDoctor.
10.2 API Reference
For details on the available function calls and parameters, see the accompanying Application
Programming Interface Reference provided in HTML format for C and Java as part of the
documentation package (see chapter 3.1.2). Again, the API Reference should not be used without
prior consultation of this documentation, as explained before in the notice to chapter 10.1.
10.3 Schematic Representation of Informatica AddressDoctor Processing Flow
[Figure: schematic of the processing flow from Address Input to Address Output. Stage labels
shown in the diagram: Plausibility Check, Tokenization Step 1, Country Detection, Tokenization
Step 2, Parsing (Parser 1 USA, Parser 2 JPN, Parser 3 CHN, ... Parser n XXX), Normalization,
Validation (Validate 1 USA, Validate 2 JPN, Validate 3 CHN, ... Validate n XXX), Enrichment
(Geo Coding, CASS, ...), Standardization (Truncation/Casing), Transliteration, Formatting,
Plausibility Check. The labels COUNTRYRECOGNITION, PARSE and Enrichment OFF also appear in
the diagram.]
10.4 AddressElement Output Examples
To view the international address formats, see the International Address Formats page under the
Countries and Data section on the Informatica AddressDoctor website at
http://www.addressdoctor.com/en/countries-data/address-formats.html.
10.5 Province Output
When retrieving the Province of a validated or corrected address from the AddressObject, you will
receive either the Province name or the Abbreviation, according to the postal rules of this country.
The following table shows what is returned for a specific country:
Province output form  ISO Codes
Abbreviation          AUS, BRA, CAN, CHE, ITA, MEX, USA
Province name         ABW, AFG, AGO, AIA, ALB, AND, ARE, ARG, ARM, ATA, ATG, AUT, AZE, BDI, BEL,
                      BEN, BES, BFA, BGD, BGR, BHR, BHS, BIH, BLR, BLZ, BMU, BOL, BRB, BRN, BTN,
                      BWA, CAF, CHL, CHN, CIV, CMR, COD, COG, COK, COL, COM, CPV, CRI, CUB, CUW,
                      CYM, CYP, CZE, DEU, DJI, DMA, DNK, DOM, DZA, ECU, EGY, ERI, ESH, ESP, EST,
                      ETH, FIN, FJI, FLK, FRA, FRO, GAB, GBR, GEO, GHA, GIB, GIN, GMB, GNB, GNQ,
                      GRC, GRD, GRL, GTM, GUY, HKG, HND, HRV, HTI, HUN, IDN, IND, IOT, IRL, IRN,
                      IRQ, ISL, ISR, JAM, JOR, JPN, KAZ, KEN, KGZ, KHM, KIR, KNA, KOR, KWT, LAO,
                      LBN, LBR, LBY, LCA, LIE, LKA, LSO, LTU, LUX, LVA, MAR, MCO, MDA, MDG, MDV,
                      MKD, MLI, MLT, MMR, MNE, MNG, MOZ, MRT, MSR, MUS, MWI, MYS, NAM, NER, NFK,
                      NGA, NIC, NIU, NLD, NOR, NPL, NRU, NZL, OMN, PAK, PAN, PCN, PER, PHL, PNG,
                      POL, PRK, PRT, PRY, QAT, ROU, RUS, RWA, SAU, SDN, SEN, SGP, SGS, SHN, SLB,
                      SLE, SLV, SMR, SOM, SRB, SSD, STP, SUR, SVK, SVN, SWE, SWZ, SXM, SYC, SYR,
                      TCA, TCD, TGO, THA, TJK, TKL, TKM, TON, TTO, TUN, TUR, TUV, TZA, UGA, UKR,
                      URY, UZB, VAT, VCT, VEN, VGB, VNM, VUT, WSM, YEM, ZAF, ZMB, ZWE
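
Code that displays or compares the Province element may need to know which form to expect. The following Java sketch records the countries for which the table above lists the Abbreviation form and treats all other codes as returning the Province name. The class and method names are hypothetical, and the fallback for countries that are not listed in the table is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Lookup of the expected Province output form per ISO3 country code. */
final class ProvinceOutputForm {
    /** Countries for which the table in this chapter lists "Abbreviation". */
    static final Set<String> ABBREVIATION = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("AUS", "BRA", "CAN", "CHE", "ITA", "MEX", "USA")));

    /** Returns "Abbreviation" or "Province name" for the given ISO3 code. */
    static String formFor(String iso3) {
        // Codes not listed in the table fall through to "Province name" (assumption).
        return ABBREVIATION.contains(iso3.toUpperCase(Locale.ROOT))
                ? "Abbreviation"
                : "Province name";
    }

    public static void main(String[] args) {
        System.out.println("CAN: " + formFor("CAN"));
        System.out.println("SWE: " + formFor("SWE"));
    }
}
```

This matches the sample results in chapter 8, where the Canadian and US addresses return ON and NY, while the Swedish address returns STOCKHOLMS LÄN.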
10.6 Reference Data Copyright Notices
Australia:
Copyright 2009. Based on data provided under license from PSMA Australia Limited
(www.psma.com.au).
Canada:
In case Licensed User licensed the Canadian reference database, it contains Postal Code OM data
copied under license from Canada Post Corporation. The Canada Post Corporation file from which
this data was copied is from the most current data available from Canada Post Corporation at the
time Informatica AddressDoctor made the data available to Licensed User respective Integrator.
Great Britain:
You are receiving or have received information which is derived from databases (or parts or extracts
thereof) of which Royal Mail is the owner or creator, or otherwise authorised to use (the "Data").
Royal Mail owns, or is licensed, all Intellectual Property Rights which subsist in and/or relate to that
Data from time to time. You must not at any time copy, reproduce, publish, sell, let, lend, extract,
reutilise or otherwise part with possession or control of or relay or disseminate any part of this
information or use it for any purpose other than your own private or internal use.
New Zealand:
The address data within the PAF is sourced from New Zealand Post, Land Information New Zealand
and the Crown. New Zealand Post and Crown copyright reserved.
United States of America:
© United States Postal Service® 2009. Prices are not established, controlled or approved by the
United States Postal Service®. The following trademarks and registrations are owned by the USPS®:
CASS Certified™, CASS™, DPV™, United States Postal Service®, USPS®, ZIP + 4®, ZIP Code™, ZIP™
Geocodes:
The data (“Data”) is provided for your personal, internal use only and not for resale. It is protected
by copyright, and is subject to the terms and conditions which are agreed to by you, on the one
hand, and Informatica AddressDoctor (“AddressDoctor”) and its licensors (including their licensors
and suppliers) on the other hand.
© 2009 NAVTEQ. All rights reserved.
The Data for areas of Canada includes information taken with permission from Canadian authorities,
including: © Her Majesty the Queen in Right of Canada, © Queen's Printer for Ontario, © Canada
Post Corporation, GeoBase®, © Department of Natural Resources Canada. All rights reserved.
NAVTEQ holds a non-exclusive license from the United States Postal Service® to publish and sell
ZIP+4® information.
© United States Postal Service® 2009. Prices are not established, controlled or approved by the
United States Postal Service®. The following trademarks and registrations are owned by the USPS:
United States Postal Service, USPS, and ZIP+4.
Data for Europe and World Markets:
Informatica AddressDoctor Documentation – Last Revision: 5-Nov-14 @ 12:26
Territory: Notice
Australia: “Copyright. Based on data provided under license from PSMA Australia Limited (www.psma.com.au).”
Austria: “© Bundesamt für Eich- und Vermessungswesen”
Croatia, Cyprus, Estonia, Latvia, Lithuania, Moldova, Poland, Slovenia & Ukraine: “© EuroGeographics”
France: “Source: © IGN 2009 – BD TOPO ®”
Germany: “Die Grundlagendaten wurden mit Genehmigung der zuständigen Behörden entnommen”
Great Britain: “Based upon Crown Copyright material.”
Greece: “Copyright Geomatics Ltd.”
Hungary: “Copyright © 2003; Top-Map Ltd.”
Italy: “La Banca Dati Italiana è stata prodotta usando quale riferimento anche cartografia numerica ed al tratto prodotta e fornita dalla Regione Toscana.”
Jordan: “© Royal Jordanian Geographic Centre”
Norway: “Copyright © 2000; Norwegian Mapping Authority”
Portugal: “Source: IgeoE – Portugal”
Spain: “Información geográfica propiedad del CNIG”
Sweden: “Based upon electronic data: © National Land Survey Sweden.”
Switzerland: “Topografische Grundlage: © Bundesamt für Landestopographie.”
11. Glossary
11.1.1 ISO Country Codes
ISO 3166, the international standard for country codes, is one of the most widely used standards
maintained by ISO TC 46. It provides standard numeric, 2-letter, and 3-letter alphabetic codes
for 240 countries or areas of special sovereignty. First released in 1974, ISO 3166 has grown to
encompass three parts, including two new sections on codes for subdivisions (states, regions, major
cities, and so on) and a listing of retired codes. For a list, see
http://www.addressdoctor.com/en/countries_data/isocodes.asp
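The three code forms can be illustrated with a small lookup sketch in Python. The table below holds only four ISO 3166-1 entries, and the names CountryCode, SAMPLE_CODES and find_country are hypothetical; they are not part of the Informatica AddressDoctor API.

```python
from typing import NamedTuple, Optional


class CountryCode(NamedTuple):
    alpha2: str   # 2-letter alphabetic code
    alpha3: str   # 3-letter alphabetic code
    numeric: str  # 3-digit numeric code, kept as text to preserve leading zeros
    name: str


# A handful of ISO 3166-1 entries for illustration only.
SAMPLE_CODES = [
    CountryCode("DE", "DEU", "276", "Germany"),
    CountryCode("FR", "FRA", "250", "France"),
    CountryCode("GB", "GBR", "826", "United Kingdom"),
    CountryCode("US", "USA", "840", "United States"),
]


def find_country(code: str) -> Optional[CountryCode]:
    """Return the entry matching any of the three code forms, or None."""
    key = code.strip().upper()
    for entry in SAMPLE_CODES:
        if key in (entry.alpha2, entry.alpha3, entry.numeric):
            return entry
    return None
```

For instance, find_country("de") and find_country("276") both resolve to the entry for Germany.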
11.1.2 Normalization
Normalization refers to the consolidation of address element descriptors, for example, treating
“Street”, “ST” and “St.” all as equivalent to “St.” It is applied in two places: as an internal step
before validation to aid in matching, and when producing address element output that meets the
postal regulations of each country.
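As a minimal sketch of this idea, the following Python snippet maps several spellings of a descriptor to one form. The descriptor table and the function name normalize_descriptor are assumptions made for illustration; the actual normalization rules are country-specific and internal to the product.

```python
# Hypothetical descriptor table; real rules differ from country to country.
DESCRIPTOR_MAP = {
    "street": "St.",
    "st": "St.",
    "avenue": "Ave.",
    "ave": "Ave.",
}


def normalize_descriptor(token: str) -> str:
    """Map known descriptor spellings to one form; leave other tokens unchanged."""
    key = token.strip().rstrip(".").lower()
    return DESCRIPTOR_MAP.get(key, token)
```

With this table, "Street", "ST" and "St." all yield "St.", while a token such as "Main" passes through unchanged.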
11.1.3 Parsing
Parsing is the capability to split an unstructured address string into meaningful entities. That means
an unstructured address such as
AddressDoctor GmbH
Röntgenstr. 9
D-67133 Maxdorf
would be split into
Company: AddressDoctor GmbH
Street: Röntgenstr.
Number: 9
Postal Code: 67133
Locality: Maxdorf
Country: Germany
Parsing can also be used to rearrange incorrectly fielded data.
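The split shown above can be sketched in Python with two regular expressions. This is a simplified illustration that assumes a three-line German-style address with a five-digit postal code; the function parse_german_address and the prefix table are hypothetical and do not reflect how Informatica AddressDoctor parses addresses internally.

```python
import re
from typing import Dict

# Assumption: "D-" before the postal code denotes Germany.
COUNTRY_PREFIXES = {"D": "Germany"}

STREET_RE = re.compile(r"(?P<street>.+?)\s+(?P<number>\d+\s*[A-Za-z]?)")
LOCALITY_RE = re.compile(
    r"(?:(?P<prefix>[A-Z]{1,3})-)?(?P<postal_code>\d{5})\s+(?P<locality>.+)"
)


def parse_german_address(text: str) -> Dict[str, str]:
    """Split a three-line address into named elements."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError("expected company, street and locality lines")
    company, street_line, locality_line = lines

    street = STREET_RE.fullmatch(street_line)
    locality = LOCALITY_RE.fullmatch(locality_line)
    if street is None or locality is None:
        raise ValueError("address does not match the assumed layout")

    result = {
        "Company": company,
        "Street": street.group("street"),
        "Number": street.group("number"),
        "Postal Code": locality.group("postal_code"),
        "Locality": locality.group("locality"),
    }
    prefix = locality.group("prefix")
    if prefix in COUNTRY_PREFIXES:
        result["Country"] = COUNTRY_PREFIXES[prefix]
    return result
```

A fixed layout like this is far more rigid than real parsing, which must cope with missing lines, reordered elements and country-specific formats.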
11.1.4 Romanization
Romanization is a method of using letters of the Roman alphabet (ABCD...) to recreate the sounds of
a language whose writing system may or may not use the Roman alphabet. A Chinese Hanzi
romanization system would thus be a method of using the Roman alphabet to represent the
pronunciation of Chinese Hanzi characters.
11.1.5 Standardization
Postal regulations and target database standards require address element length and casing to be
adjusted or standardized.
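A minimal Python sketch of adjusting casing and length follows. The function standardize_element, its parameters, and the choice to truncate overlong values are assumptions for illustration; actual postal regulations and the product's behavior may shorten elements differently, for example by abbreviating.

```python
def standardize_element(value: str, max_length: int, casing: str = "upper") -> str:
    """Collapse whitespace, apply the requested casing, and cut to max_length."""
    text = " ".join(value.split())
    if casing == "upper":
        text = text.upper()
    elif casing == "title":
        # str.title() is only a rough approximation of mixed-case rules.
        text = text.title()
    elif casing != "preserve":
        raise ValueError(f"unknown casing: {casing}")
    # Assumption: overlong values are simply truncated.
    return text[:max_length].rstrip()
```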
11.1.6 Tokenization
Address input must be separated into tokens so that these can be mapped to address elements/items.
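As a rough illustration, the Python sketch below splits an address string on whitespace, commas and semicolons. The function name tokenize and the choice of separators are assumptions; the tokenization rules used by Informatica AddressDoctor are not described in this glossary.

```python
import re
from typing import List


def tokenize(address: str) -> List[str]:
    """Split address input on whitespace, commas and semicolons."""
    return [token for token in re.split(r"[\s,;]+", address) if token]
```

Each resulting token, such as a street name, a number or a postal code, could then be mapped to an address element in a later step.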
11.1.7 Transformation
Transformation is the process of changing one character into other characters of the same character
set. For example, the Transform method could convert a string like 'Änderung' into an HTML-encoded
version of this string: '&Auml;nderung'.
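The HTML example can be reproduced with the Python standard library, as sketched below. This is not the Transform method of Informatica AddressDoctor; the function to_html_entities is a hypothetical stand-in that only shows the kind of character replacement involved.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace non-ASCII characters that have a named HTML entity."""
    parts = []
    for char in text:
        name = codepoint2name.get(ord(char))
        # ASCII characters (including '&' and '<') are left untouched here.
        if ord(char) > 127 and name:
            parts.append(f"&{name};")
        else:
            parts.append(char)
    return "".join(parts)
```

Applied to 'Änderung', the function returns '&Auml;nderung'.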
11.1.8 Transcription/Transliteration
Transcription/transliteration is the process of changing one character of one character set into other
characters of another character set, such as converting from Greek to Latin, or Japanese Katakana to
Latin. This conversion makes use of either the sound of the character or the spelling.
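A character-by-character conversion from Greek to Latin might look like the Python sketch below. The mapping table is a simplified assumption that does not implement any official transliteration standard, and the function greek_to_latin is hypothetical rather than part of the product.

```python
import unicodedata

# Simplified, illustrative mapping; not an official transliteration table.
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def greek_to_latin(text: str) -> str:
    """Convert Greek letters to Latin letters, dropping accent marks."""
    result = []
    # NFD splits accented letters into a base letter plus combining marks.
    for char in unicodedata.normalize("NFD", text):
        if unicodedata.combining(char):
            continue
        latin = GREEK_TO_LATIN.get(char.lower())
        if latin is None:
            result.append(char)
        elif char.isupper():
            result.append(latin.capitalize())
        else:
            result.append(latin)
    return "".join(result)
```

With this table, the Greek spelling of Athens, 'Αθήνα', becomes 'Athina'.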
11.1.9 Unified Ideographs
Unified ideographs are characters of the CJK writing system. They consist of Chinese Hanzi, Japanese
Kanji, and Korean Hanja.
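In Unicode, the basic CJK Unified Ideographs block covers the code points U+4E00 to U+9FFF. The Python sketch below tests a character against that block only; the extension blocks are deliberately left out, and the function name is hypothetical.

```python
def is_cjk_unified_ideograph(char: str) -> bool:
    """Check whether a single character lies in the basic block U+4E00..U+9FFF."""
    return len(char) == 1 and 0x4E00 <= ord(char) <= 0x9FFF
```

A Kanji such as '東' (U+6771) falls inside the block, whereas the Katakana character 'ア' (U+30A2) does not.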
11.1.10 Validation
Validation is the process of checking individual address elements against postal reference data. The
validation process will, for instance, verify whether a postal code or a locality exists. The validation
process will also check whether a street name is spelled properly and whether the postal code and
locality combination given is correct for the building number provided for this street.
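The checks described above can be sketched in Python against a toy reference table. The table contents, including the building number range, are invented for illustration, and the function validate with its plain-text results is hypothetical; it does not reflect the reference databases or the result codes of Informatica AddressDoctor.

```python
from typing import Dict, Tuple

# Invented reference data: (postal code, locality) -> street -> valid numbers.
REFERENCE_DATA: Dict[Tuple[str, str], Dict[str, range]] = {
    ("67133", "Maxdorf"): {"Röntgenstr.": range(1, 50)},
}


def validate(postal_code: str, locality: str, street: str, number: int) -> str:
    """Check an address step by step and report the first failing element."""
    streets = REFERENCE_DATA.get((postal_code, locality))
    if streets is None:
        return "postal code and locality combination not found"
    numbers = streets.get(street)
    if numbers is None:
        return "street not found"
    if number not in numbers:
        return "building number not found"
    return "valid"
```

An exact-match lookup like this rejects any misspelled street name; a real validation process would also attempt to correct such input.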