Anti-Money Laundering with Text Analytics

www.basistech.com
[email protected]
+1 617-386-2090
Name Matching Strategies for Compliance, Risk Reduction and
Business Growth
INTRODUCTION
Vigorous enforcement of Anti-Money Laundering (AML) regulations has dramatically impacted
financial institutions around the globe. HSBC’s $1.9 billion settlement [1] is the biggest fine on record,
and numerous other banks have faced stiff fines for failing to adhere to AML regulations. According
to Reuters, “U.S. and European banks have now agreed to settlements with U.S. regulators
totaling some $5 billion in recent years on charges they violated U.S. sanctions and failed to police
potentially illicit transactions.” [2] These fines are not imposed only by U.S. regulators: the
Reserve Bank of India recently penalized six banks for violating know-your-customer (KYC) and AML
rules.
In light of this risk, Citigroup and JPMorgan have pulled back on operations in profitable emerging
markets [3] in the Middle East, Africa and Asia. Additionally, Barclays has chosen to stop doing
business with over 250 money transfer companies [4], used largely by Somali diaspora communities
to send remittances (as much as $2 billion annually) to friends and family. At the same time that
these banks and financial institutions are scaling back their riskier operations, they are also being
forced by regulators [5] to improve their money laundering controls. This represents an increase in
costs and a substantial loss in potential revenue.
Central to AML and KYC regulations is the mandate to match new and existing customers against
the latest watchlists from OFAC [6], FinCEN [7], FATCA [8], the United Nations [9], the European Union [10],
the World Bank [11], and other regulatory bodies around the globe. Fundamentally, a financial
organization needs to ensure that systems and procedures are in place to correctly identify and flag
the names of listed individuals, taking into account variations in spelling, word order, languages,
writing systems, and other factors (e.g., source, quality, completeness, accuracy, consistency), in
order to be compliant. If you can do this well, you can reduce your risk and expand your potential
markets across the globe.
© 2015 Basis Technology Corporation. “Basis Technology Corporation” , “Rosette” and “Highlight” are registered trademarks of Basis Technology Corporation. “Big Text Analytics” is a trademark
of Basis Technology Corporation. All other trademarks, service marks, and logos used in this document are the property of their respective owners. (2014-12-18-AML)
THE COMMON ELEMENT: NAMES
From wire recipients to check payees, the one factor KYC systems can rely on having is a name. But
a name can appear in many different ways: as a nickname, as a foreign spelling, as a “bad” spelling, and
in multiple writing systems. Comparing a name to the OFAC list, as well as other government lists,
is no trivial task.
Would you be able to find the following people on watch lists if they applied to be your customer?
-- An attorney for the Colombian Medellín cartel accused of laundering drug money
-- A Saudi national denied entry into the United Kingdom for financing terrorist groups
-- A British businessman fighting extradition to the U.S. to face a bank fraud case
-- The brother of a Latin American politician tied to Mexico’s Zambada crime syndicate
The answer, and its consequences, depends on the type of name matching you use. Employ the
wrong type and one of two outcomes may occur:
1. You will not correctly match the applicant with a name on a list (a false negative); or
2. You will incorrectly match the applicant with a name on a list (a false positive).
A false negative means that you will say “yes” to someone you don’t want as a customer. A false
positive means you’ll say “no” to valuable business, or you’ll waste time and effort, laboriously and
manually, “clearing their name”. Either outcome can harm your reputation, expose you to legal
liability, and result in lost business.
Matching names against watch lists would be much easier if everyone spelled names consistently.
But even people who speak the same language often spell the same name in different ways (Sean
vs. Shawn vs. Shaun), use a common nickname (Charles vs. Charlie vs. Charley), include different
words in a name (First Middle Last vs. First Last) or change the order of the words in a name (First
Last vs. Last, First). And the problem is compounded for names in non-Latin scripts.
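Two of the variations above (word order and missing words) can be illustrated with a few lines of Python. This is only a minimal sketch under simple assumptions: the function names are hypothetical, names are treated as bags of words, and nothing here handles spelling differences or nicknames.

```python
import re

def name_tokens(name: str) -> frozenset:
    """Lowercase a name and return its words as a set (order and punctuation are discarded)."""
    return frozenset(re.findall(r"[^\W\d_]+", name.lower()))

def same_words_any_order(a: str, b: str) -> bool:
    """True when both names contain exactly the same words, e.g. 'First Last' vs. 'Last, First'."""
    return name_tokens(a) == name_tokens(b)

def one_name_contains_other(a: str, b: str) -> bool:
    """True when one name is the other plus extra words, e.g. 'First Middle Last' vs. 'First Last'."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return False
    return ta <= tb or tb <= ta

# Word order and an omitted middle name are handled...
print(same_words_any_order("Smith, John", "John Smith"))
print(one_name_contains_other("John Quincy Adams", "John Adams"))
# ...but a spelling variation still defeats a purely word-based comparison.
print(same_words_any_order("Sean Smith", "Shawn Smith"))
```

The last call shows why word-level normalization alone is not enough: "Sean" and "Shawn" remain different strings.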
Consider Mohamed Mustafa ElBaradei, the former Director General of the International Atomic Energy
Agency. A review of recent news articles yields at least four different spelling variations:
-- Mohammad Al Baradei
-- Mohamad al Baradi
-- Mohamed ElBaradei
-- Mohamed Elbaradei
None of the above spellings is “right” because his name is properly written in Arabic as:
محمد مصطفى البرادعي. How his name appears in English depends on how it sounds to different people.
To complicate things further, each watchlist may use a different convention for writing names. In
fact the Arabic name ‫ محمد‬is commonly transliterated into English up to 11 different ways:
-- Mahammad
-- Mohamad
-- Mohammad
-- Mohamed
-- Mohammed
-- Muhamad
-- Muhammad
-- Muhamed
-- Muhammed
-- Muhamet
-- Muhammet
This does not include the numerous ways in which Mohammad might be “misspelled”. On top of that,
محمد might be transliterated into another language entirely, such as Chinese: 穆罕默德. For example,
the Japanese Financial Intelligence Center maintains a Taliban/Al-Qaeda watchlist [12] with names
written in English and Japanese, all of which are transliterations of the original Arabic names.
In the age of global commerce, you need a name matching system that understands linguistics – so
that the right matches will occur, and the wrong matches (false positives) will not occur, even if
name spelling varies.
NAÏVE VS LINGUISTIC METHODOLOGIES
Traditional methods of name matching do not use linguistic knowledge. They apply what might
be called the naïve or “brute force” approach. This approach matches names by addressing them
simply as a sequence of letters. If the two names have the same letters in the same order (or a simple
approximation) then they’re considered a match. Naïve methods attempt to handle name variability
(as in the Mohammed example) by maintaining an exhaustive list of possible name options. The
downside to this process is that it requires the continuous collection and storage of name variations.
In addition, it relies on enormous computing power to continuously check every name against a
massive and ever-growing list. This approach is also particularly vulnerable to new transliterations
of non-English names, exposing you to an unacceptable risk of not matching the new name
variation to an existing person on a watch list.
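The naïve approach can be sketched as follows. This is an illustration only, not any vendor's implementation: the stored list holds the 11 transliterations given earlier, the function names are hypothetical, and Python's standard `difflib` stands in for the "simple approximation" of comparing letter sequences.

```python
from difflib import SequenceMatcher

# The exhaustive list a naive system would have to collect and maintain.
KNOWN_VARIANTS = {
    "mahammad", "mohamad", "mohammad", "mohamed", "mohammed", "muhamad",
    "muhammad", "muhamed", "muhammed", "muhamet", "muhammet",
}

def naive_list_match(name: str) -> bool:
    """Match only if the exact letter sequence is already on the stored list."""
    return name.strip().lower() in KNOWN_VARIANTS

def letter_similarity(a: str, b: str) -> float:
    """Compare two names purely as letter sequences (0.0 to 1.0), with no linguistic knowledge."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(naive_list_match("Muhammet"))   # a stored spelling is found
print(naive_list_match("Mouhamed"))   # a spelling nobody has stored yet is missed
print(letter_similarity("Mohammed", "Mouhamed"))
```

The second call is the false negative described above: until someone adds the new spelling to the list, the exact lookup cannot find it.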
A knowledge-based approach is significantly better. Rather than try to specify every possible name
variation, a computer is “taught” to identify variations based on automated linguistic methods –
the same linguistic patterns people use to construct such variations in real life. This often involves
applying phonology (how words sound) to orthography (how words are written).
In the “Charley versus Charlie” example a linguistic engine knows that in this case, “ley” and “lie”
sound the same, and that both forms are reasonable variations of the same name. Other types of
knowledge also apply, such as formal vs informal names, where the “e” sounding suffix indicates
a nickname for the formal name “Charles.” Linguistic algorithms can recognize that the many
different potential spellings of Mohammed “sound” the same and in fact, refer to the name ‫محمد‬,
without the need to maintain an exhaustive list containing each variation.
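As a rough illustration of the idea, the sketch below reduces each name to a crude "sound key" instead of storing every spelling. The rules (drop vowels, collapse doubled consonants, treat a final "t" as "d") are toy assumptions chosen for this example; they are not the rules used by any commercial linguistic engine, and `sound_key` is a hypothetical name.

```python
import re

VARIANTS = [
    "Mahammad", "Mohamad", "Mohammad", "Mohamed", "Mohammed", "Muhamad",
    "Muhammad", "Muhamed", "Muhammed", "Muhamet", "Muhammet",
]

def sound_key(name: str) -> str:
    """Reduce a romanized name to a rough consonant skeleton."""
    s = re.sub(r"[^a-z]", "", name.lower())   # keep letters only
    s = re.sub(r"[aeiouy]", "", s)            # vowels vary most between spellings
    s = re.sub(r"(.)\1+", r"\1", s)           # "mm" and "m" sound alike
    s = re.sub(r"t$", "d", s)                 # toy rule: final t/d treated as one sound
    return s

# All 11 listed spellings share one key, so no variant list is needed.
print({sound_key(v) for v in VARIANTS})
# "Charley" and "Charlie" share a key; "Charles" does not.
print(sound_key("Charley"), sound_key("Charlie"), sound_key("Charles"))
```

A key this crude also over-matches: "Mahmoud", a different name, reduces to the same skeleton. Avoiding such false positives is where real linguistic knowledge, rather than a handful of rules, becomes necessary.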
CULTURAL KNOWLEDGE
If a name is on a watch list, you’re responsible for matching it – regardless of your staff’s language
competency. What happens when a Saudi person of interest applies for an account in London using
an English spelling of his name based on his dialect that differs significantly (but predictably) from
his name on the watch list? Or what happens when a bank employee in Chicago enters a Mexican
national’s paternal surname into the system’s middle name field, not knowing that Mexican naming
customs place it before the maternal surname? Without cultural awareness, cases like these can
easily generate both false negatives and false positives.
Fortunately these types of name variations do not occur at random. Rather, they follow established
patterns of human practice depending on the specific language and cultural context. By identifying
the cultural origin of a name, the patterns can be “reverse engineered” to identify the many ways
that different words are used to represent similar names.
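The surname example can be made concrete with a small sketch. It assumes the cultural origin of the name is already known and passed in as a label; the `ParsedName` type, the `parse_name` function, and the "hispanic" and "anglo" labels are all hypothetical, and multi-word surnames are not handled.

```python
from dataclasses import dataclass

@dataclass
class ParsedName:
    given: str
    primary_surname: str          # the surname a watch list is most likely to index
    secondary_surname: str = ""

def parse_name(full_name: str, convention: str) -> ParsedName:
    """Assign name parts to fields according to a naming convention."""
    parts = full_name.split()
    if not parts:
        raise ValueError("empty name")
    if convention == "hispanic" and len(parts) >= 3:
        # Given name(s), then paternal surname, then maternal surname.
        *given, paternal, maternal = parts
        return ParsedName(" ".join(given), paternal, maternal)
    # Default: everything before the final word is treated as given/middle names.
    *given, last = parts
    return ParsedName(" ".join(given), last)

name = "Gabriel García Márquez"
print(parse_name(name, "hispanic"))  # paternal surname "García" is primary
print(parse_name(name, "anglo"))     # "García" is wrongly treated as a middle name
```

The same string yields two different primary surnames, which is exactly the kind of mismatch that cultural knowledge prevents.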
ADVANTAGES OF THE LINGUISTIC MODEL
The benefits of a knowledge-based approach are clear. Because a computer can apply the knowledge
humans use to derive name variations, you don’t need actual humans to specify as many variations
as they can. Nor does the computer have to make potentially billions of naïve comparisons. Results
are more accurate, they take less time to achieve, and the systems that produce these results can be
much more scalable.
The linguistic approach therefore has several key advantages:
-- Fewer false positives: Computers employ the same knowledge humans would
-- Fewer false negatives: Humans do not have to know all possible variations
-- Faster checking: Not all variations are compared explicitly
-- Greater scalability: Fewer (or smaller) machines can check more names
[Figure: Rosette product family. Entity Extractor (REX): tagged entities; Entity Resolver (RES): real
identities; Name Indexer (RNI): matched names; Name Translator (RNT): translated names;
Categorizer (RCA): sorted content; Sentiment Analyzer (RSA): actionable insights.]
As linguistics experts with deep understanding at the intersection of language and technology,
Basis Technology has developed Rosette Name Indexer (RNI), a knowledge-based solution to the
challenge of name matching. RNI performs an intelligent comparison based on linguistic,
orthographic, and phonologic algorithms. By working on the name in its original script, as opposed
to translating the name into English, RNI takes advantage of all the available contextual
information to properly match names to a target list without introducing the inevitable errors that
transliteration introduces.
Example: U.N. Al-Qaida Sanctions Data [13]
Name (original script): فهد محمد عبد العزيز الخشيبان
Transliterated Name: FAHD MUHAMMAD 'ABD AL-'AZIZ AL-KHASHIBAN
Identified aliases:
-- Fahad H. A. Khashayban
-- Fahad Mohammad Abdulaziz Alkhoshiban
-- Fahad H. A. al-Khashiban
-- Fahad H. A. Khasiban
-- Fahd Muhammad ’Abd al-‘Aziz al-Khushayban
-- Fahad al-Khashiban
-- Fahd Khushaiban
-- Fahad Muhammad A. al-Khoshiban
Since the U.N. list provides the original Arabic script, name identification becomes significantly
less ambiguous as compared to the multiple spelling variations of the transliterated name. RNI
matches all these documented AKAs along with the numerous variants that are not listed.
RNI AND ARABIC NAMES
Arabic names may be written with honorifics, given name, family name, patronymics (son of x,
father of y), tribal affiliation, city of birth, and more. All of this information can provide
valuable clues when matching one name to another. Take the following example:
-- Title: Al-Sheikh (الشيخ)
-- Given name: Abdullah (عبد الله)
-- Patronymic: Bin Hassan (بن حسن)
-- Family name: Al-Ashqar (الأشقر)
In Arabic usage, this name may appear as:
-- Al-Sheikh Abdullah Al-Ashqar (no patronymic) or
-- Abdullah Al-Ashqar (no title, no patronymic) or
-- Al-Sheikh Abdullah Bin Hassan Bin Mohammad Al-Ashqar (with grandfather’s patronymic)
It’s the same name, but the variations are many and complex. RNI understands the structures
of names in each language, so instead of generating countless variations to look up, it does an
intelligent comparison of names based on linguistic, orthographic, and phonological algorithms.
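To make the idea of comparing name structure (rather than enumerating variations) concrete, here is a minimal sketch. It is not RNI's algorithm: the word lists and the `core_name` function are assumptions made for illustration, and it works only on simple romanized names like the example above.

```python
# Hypothetical word lists for this example only.
TITLES = {"al-sheikh", "sheikh"}
PATRONYMIC_MARKERS = {"bin", "ibn", "bint"}

def core_name(name: str) -> tuple:
    """Keep the given and family names; drop titles and patronymic chains."""
    core = []
    skip_next = False
    for token in name.lower().split():
        if skip_next:                      # the ancestor's name after "bin"
            skip_next = False
            continue
        if token in TITLES:
            continue
        if token in PATRONYMIC_MARKERS:
            skip_next = True
            continue
        core.append(token)
    return tuple(core)

forms = [
    "Al-Sheikh Abdullah Bin Hassan Al-Ashqar",
    "Al-Sheikh Abdullah Al-Ashqar",
    "Abdullah Al-Ashqar",
    "Al-Sheikh Abdullah Bin Hassan Bin Mohammad Al-Ashqar",
]
# All four written forms reduce to the same core name.
print({core_name(f) for f in forms})
```

A production system would weigh the optional parts as supporting evidence rather than discard them, since a matching patronymic makes a match more likely.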
INTEGRATING RNI
RNI’s API provides output to optimize your decision-making process. Matching names are returned
with a similarity score from 0 to 100%, and minimum match thresholds can be set both to constrain the
quality of the results returned and to balance speed and accuracy. Specific rules can also be applied to
ignore particular words, force certain names to match, and adjust the score of other kinds of matches so
that match results align with business rules.
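The interplay of scores, thresholds, and rules might look like the sketch below. It does not use or reproduce RNI's API: the scorer is Python's standard `difflib`, used only as a stand-in, and the rule tables, function names, and default threshold are hypothetical.

```python
from difflib import SequenceMatcher

IGNORED_WORDS = {"mr", "mrs", "dr"}        # hypothetical "ignore particular words" rule
FORCED_MATCHES = {("bill", "william")}     # hypothetical "force certain names to match" rule

def _clean(name: str) -> str:
    """Lowercase, strip trailing punctuation from words, and drop ignored words."""
    words = [w.strip(".,") for w in name.lower().split()]
    return " ".join(w for w in words if w and w not in IGNORED_WORDS)

def score(a: str, b: str) -> float:
    """Return a similarity score from 0 to 100 (stand-in scorer, not a linguistic one)."""
    a, b = _clean(a), _clean(b)
    if (a, b) in FORCED_MATCHES or (b, a) in FORCED_MATCHES:
        return 100.0
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def matches(query: str, watchlist: list, threshold: float = 80.0) -> list:
    """Return (entry, score) pairs at or above the threshold, best match first."""
    scored = [(entry, score(query, entry)) for entry in watchlist]
    hits = [(entry, s) for entry, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

print(matches("Dr. John Smith", ["John Smith", "Zhang Wei"]))
```

Raising the threshold returns fewer, higher-quality candidates; lowering it returns more candidates for review.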
RNI has two integration options to fit your use case. The first is a looser integration, in which RNI
serves as an independent index of names that runs parallel to your existing data repository, with
application logic for joining results. The second is a tighter integration, in which RNI is a plug-in
to that repository (e.g., Apache Solr), storing names alongside non-name data so that a single query
can span both.
BASIC RNI WORKFLOW
1. Name input
2. Watchlist matching (RNI)
3. Output: approve, evaluate, or deny
When RNI processes a name, it compares the name to a series of watchlists to determine a match.
Based on the similarity score returned by RNI, the system can trigger different next steps in your workflow.
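The routing step can be sketched in a few lines. The two cut-off values below are assumptions chosen for illustration; in practice they would be set by your compliance policy, and the function name is hypothetical.

```python
def next_step(best_score: float, deny_at: float = 95.0, evaluate_at: float = 70.0) -> str:
    """Map the best watchlist similarity score (0 to 100) to a workflow outcome."""
    if best_score >= deny_at:
        return "DENY"        # near-certain watchlist match
    if best_score >= evaluate_at:
        return "EVALUATE"    # plausible match: route to a human reviewer
    return "APPROVE"         # no meaningful match found

for s in (99.0, 80.0, 10.0):
    print(s, next_step(s))
```

Keeping the thresholds as parameters lets the same logic serve business lines with different risk tolerances.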
UNDERSTANDING WHO YOUR CUSTOMERS ARE PROTECTS YOU AND THEM
AML and KYC compliance is a highly complex challenge with intricacies that are constantly in flux.
Associating a person’s name with all of its variations is not a trivial task, and it must be done
carefully and accurately.
RNI provides an accurate means of resolving names across languages and scripts. Its name matching
technology handles spelling variations and errors, non-standard Romanization, and the cultural
vagaries of how names are written in each language. Because RNI understands the structures of
names in each language, it compares names using linguistic, orthographic, and phonological
algorithms instead of generating countless variations to look up. Results are ranked by match score
so that further analysis can be done more efficiently. With RNI, organizations can increase the
coverage and accuracy of searches for foreign names and more readily achieve compliance within the
labyrinth of the regulatory environment.
1. http://www.reuters.com/article/2012/12/11/us-hsbc-probe-idUSBRE8BA05M20121211
2. http://www.reuters.com/article/2012/12/11/us-hsbc-probe-idUSBRE8BA05M20121211
3. http://www.ft.com/intl/cms/s/0/47c3432a-aa5d-11e2-9a38-00144feabdc0.html?siteedition=intl#axzz2lIhpkzmo
4. http://www.ft.com/intl/cms/s/0/f0eb197e-ff4d-11e2-8a07-00144feabdc0.html#axzz2lIhpkzmo
5. http://www.huffingtonpost.com/2013/03/26/citigroup-money-laundering_n_2956270.html
6. http://www.treasury.gov/resource-center/sanctions/SDN-List/Pages/default.aspx
7. http://www.fincen.gov/
8. http://www.irs.gov/Businesses/Corporations/Foreign-Account-Tax-Compliance-Act-(FATCA)
9. http://www.un.org/sc/committees/1267/aq_sanctions_list.shtml
10. http://eeas.europa.eu/cfsp/sanctions/consol-list_en.htm
11. http://web.worldbank.org/external/default/main?contentMDK=64069844&menuPK=116730&pagePK=64148989&piPK=64148984&querycontentMDK=64069700&theSitePK=84266
12. http://www.npa.go.jp/sosikihanzai/jafic/todoke/list/list.pdf
13. http://www.un.org/sc/committees/1267/AQList.htm