Data Dictionaries: The Complete A-Z

Data Management
In IT departments across the industry, data officers are asking the same question—
how do I manage my data? One solution that should be part of the foundation is a
data dictionary—a comprehensive description of all the data within an organization,
providing a common key to all the repositories across an enterprise. But will it lead to
a universal language? Faye Kilburn reports
A–C Acquisition to Complexity
The data assets of the average financial firm are in a constant state of flux.
Be it through organic growth, mergers and acquisitions or restructuring,
existing departments are perennially reshuffled and new departments
tacked onto the side like conservatories. Take the spate of global mergers and banking acquisitions since
the recession—as you read this, the
respective IT departments of these
companies are likely to still be working to bridge the gulf between their
reference data repositories.
In an ideal world, firms would be able to mastermind and execute seamless integrations, but in reality they have had to take a more pragmatic approach. Therefore, to keep a lid on cost but speed up transition, new systems are added piecemeal to organizations, a practice out of which today’s glutinous, sprawling data repositories have grown.

“A lot of middle- and back-office systems are so old that it is not worth re-plumbing them to take data from a different source.”
Hugh Stewart, SmartStream

Making matters worse is a new wave of more complex instruments and products. “Our products used to be managed in silos, but more and more our customers want cross-products,” says Paris-based Pierre-Jean Crouy, head of reference data at European bank Société Générale.
“Therefore, in terms of reference
data, we are facing an increasing
need to bring various legs into a
combined product.”
D–G Data Management to Governance
To deal with this patchwork quilt of
problems, a pre-crisis movement
emerged towards creating a single
golden copy repository. Unfortunately,
as firms flocked to acquire what was
touted as financial nirvana, they
found the associated costs to be too
high to warrant investment.
Today, the industry’s attitude
towards data repositories has taken
a u-turn—“the industry can cope
with a degree of ambiguity if it
means we don’t spend so much,”
says London-based Martin Cole,
managing director of SIX Telekurs
UK. As such, a new objective to have
aligned data across all repositories
has taken the place of the industry’s
zealous obsession with a single hub.
Tying in nicely with this shift in the boardroom is a post-crash regulatory focus on proper data governance. “The data governance flag
has been waving for a few years,
but it was largely unnoticed in the
wider organization. Before that, the
data guy sat in the corner dealing
with data. They weren’t operations, they weren’t IT and no-one
really cared about them because
they didn’t make any money,” says
London-based Hugh Stewart, sales
director for SmartStream’s DClear
Services business. Now, when a portfolio is priced or capital calculated,
the regulator wants a breakdown of
what data was used, where it came
from and how it was cleaned, so it is
vital firms know their data.
H–N Homogeneity to Nomenclature
This is where a data dictionary comes in. Simply put, a data dictionary is a set of data about the data, or rather, a set of definitional standards for what the data will be. If a data asset is defined as
“a bid price” and that definition is
standardized, anything else that is “a
bid price” in any other data repository across the firm will always have
the same definition, regardless of its
actual label.
© 2010 Incisive Media Investments. All rights reserved. Used by permission. First published in IRD March 2011
This means that if bid prices are labelled “BIDPRICE” in a Bloomberg feed, “BPRICE” in a Thomson Reuters feed and “bid” in the firm’s own repository, the data dictionary would enable IT departments to identify them as the same attribute based on their meaning.
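
The label matching described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual product: the three labels come from the example in the text, while the canonical name "bid_price", the source keys, the wording of the definition and the function names are all assumptions made for the sketch.

```python
# Hypothetical data dictionary: one canonical attribute, its meaning,
# and the label each source uses for it (labels from the article's example).
DATA_DICTIONARY = {
    "bid_price": {
        "definition": "Price a buyer is currently willing to pay",
        "labels": {
            "bloomberg": "BIDPRICE",
            "thomson_reuters": "BPRICE",
            "internal": "bid",
        },
    },
}


def build_label_index(dictionary):
    """Invert the dictionary so (source, label) points to the canonical attribute."""
    index = {}
    for attribute, entry in dictionary.items():
        for source, label in entry["labels"].items():
            index[(source, label)] = attribute
    return index


def normalize(record, source, index):
    """Rename a record's fields to canonical attributes; unknown labels are kept as-is."""
    return {index.get((source, label), label): value for label, value in record.items()}


LABEL_INDEX = build_label_index(DATA_DICTIONARY)
bloomberg_row = normalize({"BIDPRICE": 101.25}, "bloomberg", LABEL_INDEX)
reuters_row = normalize({"BPRICE": 101.30}, "thomson_reuters", LABEL_INDEX)
```

Both rows end up keyed by the same canonical attribute, so downstream systems can compare them without knowing which feed they came from. Keying the index on the source as well as the label is a deliberate choice in this sketch, since two vendors could use the same label for different things.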
“What a data dictionary actually does is
allow non-techies to deal with data,” says Stewart, whose firm SmartStream has developed a data dictionary as part of its DClear offering. “If you look at the operating
instructions and the user manuals of data
feeds or large software products, it is
perfectly complicated. You go from page 9
to page 52 to page 256, and then forget why
you were looking in the first place.”
As such, vendors such as SmartStream turn manuals into simple descriptions of each data label, marking a significant shift away from a focus on the ‘word,’ which is often hard to standardize, to a focus on the ‘meaning,’ which is not.
O–R Obstacles to Returns
Although data dictionaries are regarded as
best practice, there are nuances to their
creation that can be overlooked by firms.
For example, being sure one data vendor’s
“BIDPRICE” correlates with another’s
“BPRICE” requires a dedicated team of IT
boffins who are skilled in identifying and
matching data.
Furthermore, even after a dictionary has
been constructed, the process is ongoing.
The next merger or acquisition comes
along and a whole raft of systems need to
be brought into the fold.
However, for all the drawbacks, the data
dictionary has the potential to deliver
significant returns on investment. “One
of the great things about implementing a
data dictionary,” says London-based Mike
Davis, senior analyst at Ovum (part of the
Datamonitor Group), “is there should be a
level of return in the fact that a) you should
get a better view of your customer and b)
you could potentially reduce the number of
systems to maintain and support, and ultimately the size of IT department.”
In fact, as data dictionaries become more
widely used, these efficiency savings are
set to grow—vendors such as software
provider SmartStream, for example, plan
to transform the data dictionary into an
open source community of users so that if
someone extends the dictionary, the extension can be shared, which reduces resource
expenditure for all involved.
Elsewhere, firms are looking to use
the dictionary to enhance the business
process itself. “For us, a major incentive
in 2011 is to improve our data dictionary
by including the business processes the
data is contributing to,” says Société
Générale’s Crouy.
“Currently, we have a simple yes or no
code that says whether we have authorization to trade with a counterparty or not, but
now we are adding more complex conditions—for example, the authorization for
specific products, and if the counterparty
is based in Spain.”
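
To make the difference between a yes-or-no code and a conditional authorization concrete, here is a rough Python sketch. It is not Société Générale's model: the class, its field names, the product names and the country codes are hypothetical, and treating "based in Spain" as a list of permitted countries of domicile is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TradingAuthorization:
    """Hypothetical dictionary entry describing whether a counterparty may be traded with."""
    counterparty: str
    authorized: bool                       # the simple yes/no code
    products: Optional[frozenset] = None   # None means no product restriction
    countries: Optional[frozenset] = None  # None means no country restriction

    def permits(self, product: str, country: str) -> bool:
        """Return True only if the flag and every stated condition are satisfied."""
        if not self.authorized:
            return False
        if self.products is not None and product not in self.products:
            return False
        if self.countries is not None and country not in self.countries:
            return False
        return True


# Old style: a bare yes/no flag with no further conditions.
simple_rule = TradingAuthorization("Counterparty A", authorized=True)

# New style: authorized only for one product, and only if domiciled in Spain.
conditional_rule = TradingAuthorization(
    "Counterparty B",
    authorized=True,
    products=frozenset({"fx_swap"}),
    countries=frozenset({"ES"}),
)
```

Here `simple_rule.permits(...)` is true for any product and country, while `conditional_rule.permits("fx_swap", "ES")` is true and any other product or country is refused.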
S–U Semantics to Universality
While the outlook for data dictionaries is fairly positive, there is one last worry for the regulator when it comes to external reporting: with banks, asset management firms and vendors all adapting their own data dictionary definitions, how can it be sure that one firm’s definition equates to another’s?
Attempting to address this problem is the EDM Council, an international trade association that has devoted the past two and a half years to compiling a semantics repository.
“The semantics repository is slightly
different to a data dictionary, in that
a data dictionary will define words
and terms; whereas our ontology has
mandatory and optional data attributes
and business relationships described for
all the instruments,” says Washington
DC-based Mike Atkin, managing director
of the EDM Council.
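
The distinction Atkin draws, between defining terms and stating which attributes an instrument must or may carry, can be illustrated with a small Python sketch. It does not reproduce the EDM Council's repository: the instrument type, the attribute names and the function are hypothetical, and business relationships are left out entirely.

```python
# Hypothetical schema: for each instrument type, which attributes are
# mandatory and which are optional. All names are illustrative only.
INSTRUMENT_SCHEMAS = {
    "bond": {
        "mandatory": {"issuer", "maturity_date"},
        "optional": {"call_schedule"},
    },
}


def check_attributes(instrument_type, record, schemas=INSTRUMENT_SCHEMAS):
    """Return (missing mandatory attributes, attributes the schema does not know)."""
    schema = schemas[instrument_type]
    present = set(record)
    missing = schema["mandatory"] - present
    unknown = present - schema["mandatory"] - schema["optional"]
    return sorted(missing), sorted(unknown)


missing, unknown = check_attributes("bond", {"issuer": "Example Corp", "rating": "AA"})
```

In this example the record lacks a maturity date and carries an attribute the schema has never heard of, so both lists come back non-empty. A plain dictionary of terms could define "issuer" but could not flag either problem.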
Although the semantics repository is currently an open source activity, which means subscription to its definitions is not compulsory, there are aspirations to transform it into an industry standard.
“The Council is working with the Object
Management Group (OMG) to establish
how the repository can be maintained as an
OMG standard. We operate in a global environment, so it makes sense to do it across
the industry,” Atkin explains.
V–Z Vendors to Zeitgeists
At the behest of a standards body, the industry may be dragged towards adopting a universal language and, generally speaking, that probably won’t be a bad thing.
There is, however, one missing ingredient: with independent trade associations such as the EDM Council calling for
industry-wide standardization, should the
likes of Bloomberg, Thomson Reuters et al
have to conform with their data dictionaries too?
While SIX Telekurs’ Cole thinks it is
enough for vendors to make sure their
data dictionary is easily available and
understandable, the EDM Council’s Atkin
says anything that makes it easier for
clients to integrate data into the environments of their customers is good for
vendors. “Indeed, a few major ones are
mapping our semantics work to their
structures,” he adds.
In fairness to both, the discussion may
be a little premature. While most firms do
have a data dictionary or definitional standards, an industry-wide semantics repository is still some way off. Even so, the fact that proper data governance is on the table at all is a reflection of the times.