Semantics: A Trillion-Dollar Cottage Industry

CHAPTER 1
The U.S. economy is perched precariously on top of some 200 billion lines of aging legacy mainframe code¹ and a comparable amount of newer, but no less endangered, code on various flavors of servers and PCs. This represents a $3 trillion investment,² most of which will need to be replaced over the next decade, at a cost more likely to approach $10 trillion.
This is pretty much business as usual, except for two things:
•
A large percentage of this cost, perhaps as much as half, is avoidable.
•
The approach we take to this next round of replacement will determine how much of this spending will be a true “investment,” one that carries forward to subsequent generations of technology.
And the technology on which the realization of these benefits hinges is
not really a technology at all. It is a 2500-year-old branch of philosophy, made
suddenly relevant by a confluence of developments: semantics.
Consider the following:
•
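The Mars Climate Orbiter was lost on arrival at Mars, a victim not of a technical problem but of a semantic misunderstanding concerning the units of measure: one team’s software reported thruster impulse in English units (pound-force seconds), while the navigation software assumed metric units (newton-seconds).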
•
Between half and three-quarters of the $300 billion spent annually on systems integration is spent resolving semantic issues.
•
The entire Y2K adventure consisted of two semantic problems piled on top of each other. The first was the seemingly simple problem of determining whether “01” meant “1901” or “2001.” The second was that the stewards of many of the affected systems could not understand their applications in enough detail to predict what would happen to them if they were altered.
•
The most promising technologies currently offered up to solve our application development and implementation problems all share a foundational reliance on semantics: Enterprise Application Integration (EAI), XML, Business Rules, Web Services, Collaboration, and of course the Semantic Web.
1. Rekha Balu, “(Re)Writing Code,” Fast Company, April 2001, pp. 181–189.
2. Paul Strassmann, “End Build-and-Junk,” Computerworld, July 5, 1999. Available at www.strassmann.com/pubs/cw/end-junk.shtml.
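The first of the two Y2K problems described above is easy to reproduce in a few lines. The sketch below is hypothetical: the function name and the pivot value of 50 are assumptions chosen for illustration. It applies one remediation used at the time, often called windowing, and shows that the century of “01” comes from a convention, not from the stored digits.

```python
def expand_two_digit_year(yy: str, pivot: int = 50) -> int:
    """Expand a two-digit year using a fixed pivot window (hypothetical rule).

    Values below the pivot are read as 20xx; values at or above it as 19xx.
    The century comes from this convention, not from the stored digits.
    """
    if len(yy) != 2 or not yy.isdigit():
        raise ValueError(f"expected exactly two digits, got {yy!r}")
    value = int(yy)
    return 2000 + value if value < pivot else 1900 + value


# The same stored field, read under two different conventions.
legacy_reading = 1900 + int("01")  # the implicit assumption in many older programs
windowed_reading = expand_two_digit_year("01")
print(legacy_reading, windowed_reading)  # 1901 2001

# Sorting the raw text puts the 2001 record ahead of the 1999 record.
print(sorted(["99", "01"]))  # ['01', '99']
```

Nothing in the data itself distinguishes the two readings; the meaning lives entirely in whichever rule the surrounding code happens to apply.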
Perhaps this is enough to whet your appetite, and maybe about now you
are wondering: “Where can I buy some semantics?” or “How do I ‘do’ semantics?” or “Can I implement semantics in my organization?” But that’s not
the nature of semantics. Semantics is a discipline you apply, not a technology
you buy.
Monsieur Jourdain, in conversation with his philosophy master in Molière’s play The Bourgeois Gentleman:
“And when one speaks, what is that?”
“That is prose, Monsieur.”
“What! When I say, ‘Nicole, bring me my slippers, and give me my nightcap’; is that prose?”
“Yes, Monsieur.”
“Well, well, well! To think that for more than forty years I have been
speaking prose, and didn’t know a thing about it. I am very much obliged to
you for having taught me this.”
Like Monsieur Jourdain in the accompanying sidebar, I trust most software developers will be quite pleased to find that they have been applying semantics their entire careers. Maybe you haven’t been intentional or rigorous about it, but to get anything at all done in the world of software, you have had to deal with semantics.
In this book, we look at every aspect of business systems anew. We also
put semantics under the microscope and find out what it is composed of, and
how that might guide our further investigations. And we look at our applications and our development technologies from the point of view of semantics, to see how that changes our perceptions.
Before we go any further, let’s get this out of the way:
Semantics is the study of meaning.
Semantics is often defined as the study of the meaning of words, but we take the broader definition here, allowing for the possibility that meaning resides in something other than words. Ultimately, the relevance and success of our application systems rest on what the symbols we manipulate inside the computer really mean in the “real world.” What matters is not only what they mean, but also whether the people, and other computer programs, that deal with the presented information understand and agree with the meaning the system implies.
The Semantic Era of Information Systems
Most of what we had thought were the hard problems of computer science
and business system development have been solved. We know how to write
efficient algorithms. We know the most effective ways to process and store
data. We’ve solved the problems of getting diverse computer platforms to
interoperate. We routinely store terabytes (trillions of bytes) of data in data
warehouses. The average home has more processing power at its disposal than
the largest corporation of just a generation ago. We’ve connected nearly a
billion devices to a single gigantic Internet.
What we’re left with, and what I believe will occupy us for most of the
next decade or two, are some problems that don’t lend themselves to quite as
mechanical a resolution. We have to determine what systems we really want
to build. We have to find a way to determine what parts of a system need to
be made flexible for future change, and which are likely to be stable for a long
time. We need a way to understand the systems we already have, before we
attempt to change them. We need a way to communicate with trading partners without a long burn-in period. And above all else we need a way for computers to help us with some of the processes that until now we have thought of as the exclusive realm of humans: interpretation, negotiation, and reasoning.
Scratch the surface on any of these issues and you’re into semantics.
Indeed, for many of these problems, once the semantic issues are resolved,
the remaining technical problems are routine. No period of time is exclusively
focused on one issue, but there are periods of time when certain issues rise to
the top as the issue on which progress will be marked. In the 1980s it was
application development: We had an incredible appetite to build computerized versions of all our manual processes. In the early 1990s it was user interfaces: What could we do to make these systems easier to learn and use? Later
it was interconnections: If we could just overcome the barriers to getting our
customers and supply chains, to say nothing of our internal systems, hooked
up, we’d be able to move forward. Currently the top-of-mind issue may be security. But a groundswell is developing that suggests an impending sea change toward a semantic focus that may last for some time.
This book is meant to be your guide to taking advantage of this shift. At a minimum, it should help you avoid overinvestment in projects, technologies, and approaches that are unlikely to stand up to the changes. But for many of you this will be the opportunity to vault ahead of your competitors, either corporately or individually. Let’s spend a minute discussing how this book can help with that.
The Plan of this Book
The first third of this book (Chapters 1 through 5) is descriptive. It steeps
you in what semantics is and explains why something so seemingly simple
can be so complex. We deal briefly with the history of semantics and some
of the closely related fields, to familiarize you with this rich subject. To make
sure that you are clear about what aspects of our semantic conundrum were
created by our systems and which were there before computers, we start the
investigation of semantics in business systems before the arrival of computers.
We then follow the progression through to the present, looking at the areas that have used semantics the most to date: data modeling and metadata development.
The second third of the book (Chapters 6 through 10) is prescriptive
and covers approaches and methodologies to uncover and make more explicit
the semantics that are already implicit in your business and your business
systems. This section is built for practitioners who wish to suffuse what they
currently do with techniques and approaches that will raise the level of semantic awareness in all their system-related activities. Accordingly, we cover the role of interpretation in semantics, as well as ways to elicit, record, and convey a more complete semantic understanding of systems and processes.
The last third of the book (Chapters 11 through 15) is subscriptive in
that it deals with relatively new technologies and approaches, some or all of
which you are likely to be subscribing to in the future and each of which has
a semantic twist to it. The chapter on XML deals with getting maximum
value out of the tags, which have the potential to carry semantic information.
The EAI chapter deals with using the study of semantics to overcome the
single largest cause of integration difficulty: late discovery of semantic incongruities. To prevent Web Services from re-creating the tangle of point-to-point
connections that characterize so many integration efforts, we describe a
semantically inspired approach to their adoption. Chapter 14 discusses the Semantic Web, the follow-on project to the World Wide Web. Its semantic aspect needs little explanation, so we concentrate on some of the less obvious technologies being promoted along with the Semantic Web, as well as a scenario that should help you visualize how the Semantic Web will be used.
The book wraps up with a short chapter on getting started in your semantic endeavors, and two appendices: one a set of annotated resources for those
who would like to pursue this further, and the other a glossary of the many
arcane terms that this subject involves.
A Brief History of Semantics
I’ll make this brief, but I do believe some developments in the long history of semantics remain relevant in the twenty-first century. There are some philosophical arguments we must be aware of, or we risk wasting considerable time.
Figure 1.1 outlines some of the key developments in the history of semantics. For our purposes, the most important were the following:
FIGURE 1.1
Key developments in the history of semantics (timeline, 700,000 BC to 1960 AD: Spoken Language, Written Language, Ancient Greece, Enlightenment, Pragmatism, Linguistics, Artificial Intelligence).
•
Spoken language—Most people rank spoken language, along with the development of tools, as the defining event that separated our ancestors from the rest of the primate family. Semantically, early man
had to make a giant leap from screaming and pointing to the use of
abstract sounds to represent things that were not in the immediate
environment.
•
Written language—The advent of writing raised the bar considerably.
Tone and gestures were no longer available as adjuncts to aid with the
communication of meaning. Perhaps the most important development
was the ability to communicate with people who were not present.
Syntax and grammar gradually developed as writing became more formalized.
•
Ancient Greece—Explicit reflection on the meaning our language conveyed had to wait until the Golden Age of Greece to be articulated. We don’t know much about Socrates’ formal position on semantics, other than that his famous Socratic method was mostly aimed at finding deeper meaning in thoughts, words, and deeds. Plato’s forms are a good representation of his take on semantics. He believed that we infer knowledge of the perfect forms (for example, a circle) from the less-than-perfect examples we come in contact with (round things). His allegory of the cave illustrates that we can make inferences about the essence of things only indirectly. Aristotle’s wide-ranging contributions included a great deal on classification and the establishment of identity, both central concerns for semantics. His syllogisms form the basis of how we can infer knowledge of a particular item once we ascribe it to a type.
•
The Enlightenment—The semantic embers burned dimly through the
Middle Ages, and even the Renaissance, with its advances in many
areas, saw little new work on semantics. Sir Francis Bacon, Sir Isaac
Newton, and René Descartes shifted the semantic debate to focus on
what could be observed and verified experimentally. A series of later Enlightenment thinkers, Empiricists such as David Hume, Thomas Reid, John Locke, and George “If a tree falls in a forest” Berkeley, debated the role of the human observer in establishing context in a world otherwise devoid of meaning.
•
Pragmatism—Charles Sanders Peirce was responsible for several early and thought-provoking high-level conceptual ontologies and for a formal approach to logic applied to semantics. William James, another pragmatist, brought us some of the concepts of verification and the belief that nature is to be understood deductively.
•
Linguistics—By comparatively investigating human languages, and
especially anthropologically studying the languages of cultures that have
not been exposed to mainstream languages, we have learned a great deal
about what aspects of language are likely innate and what aspects are a
product of culture. Notable contributors include Alfred Korzybski, Noam Chomsky, Ludwig Wittgenstein, Eleanor Rosch, and George Lakoff; although not all of them worked purely within linguistics, all contributed greatly to the field’s twentieth-century advances. In particular, Rosch and Lakoff have contributed some of the seminal work on what constitutes a category or a type, a concept that those of us in the business of information systems use constantly with little understanding of what we are describing.
•
Artificial intelligence (AI)—The AI community has contributed many
subfields to this pursuit, including the formalization of ontologies
(organization of meaning of terms), inferences (how we deduce new
information from presented information), and interpretations (for
example, how a computer system can be built to interpret spoken
English).
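Aristotle’s syllogism and the AI notion of inference described above can be shown in miniature: once an item is ascribed to a type, it inherits whatever we know about that type and its supertypes. The sketch below is a hypothetical illustration, not a real ontology language; the dictionaries, type names, and properties are invented for the example.

```python
from __future__ import annotations

# Hypothetical mini-ontology: each type names its supertype.
SUPERTYPE = {
    "Man": "Mortal",   # "All men are mortal"
    "Mortal": "Being",
}

# Properties asserted directly on each type (invented for illustration).
PROPERTIES = {
    "Man": {"speaks"},
    "Mortal": {"will die"},
    "Being": {"exists"},
}

# Individuals ascribed to a type ("Socrates is a man").
INSTANCE_OF = {"Socrates": "Man"}


def types_of(individual: str) -> list[str]:
    """Walk up the supertype chain from the individual's asserted type."""
    chain = []
    current = INSTANCE_OF.get(individual)
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = SUPERTYPE.get(current)
    return chain


def inferred_properties(individual: str) -> set[str]:
    """Whatever is true of a type is inferred to be true of its members."""
    result: set[str] = set()
    for t in types_of(individual):
        result |= PROPERTIES.get(t, set())
    return result


# All men are mortal; Socrates is a man; therefore Socrates is mortal.
print(types_of("Socrates"))                           # ['Man', 'Mortal', 'Being']
print("will die" in inferred_properties("Socrates"))  # True
```

Nothing about Socrates is stored except his type; every other fact is derived, which is why getting the type right matters so much.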
This brings us more or less up to the present. Yes, I’ve slighted some groups and individuals, but I wanted to convey as much of the flavor of the subject’s long history as possible without becoming tedious. Throughout this
rich history, people have been refining fields of knowledge, primarily within
the domain of philosophy, specialized to study various aspects of the way we
understand our place in the cosmos. In the next section we introduce some
of these fields of study as they relate to semantics.
Putting Semantics in its Place
Semantics is not a stand-alone discipline; it is interlocked with various other
areas of study that borrow from it, and it from them. If you decide to pursue
this study further, Figure 1.2 should be a helpful roadmap or at least provide
some idea of where the major boundaries are.
Semantics is about meaning, and about distinguishing things that are close in meaning from each other. It is fitting, then, to spend a moment clarifying semantics by distinguishing it from several related terms.
FIGURE 1.2
Semantics in relationship to other branches of metaphysics (diagram relating Metaphysics, Epistemology, Ontology, Phenomenology, Linguistics, Semiotics, Cosmology, Philosophical Theology, Mereology, Semantics, Syntax, and Pragmatics).
•
Metaphysics—Metaphysics attempts to explain the fundamental nature
of everything, in particular the relationship of mind to matter. This is
the more traditional definition and is not to be confused with many
popular definitions that deal with occultism and mysticism.
•
Epistemology—Epistemology is the branch of philosophy that studies
the nature of knowledge. This is more concerned with how we know
things than with what things mean.
•
Mereology—You may not think there could be a branch of study
devoted to the relationship of parts to wholes, but there is, and this is it. The relationship to semantics is a bit complex. At one level, mereology informs us whether we are attempting to understand the meaning of something in its entirety or whether understanding its constituent parts is sufficient. On the other hand, we need to apply semantics to the many mereological distinctions to understand what it means to include something, be part of something, or contain something.
•
Phenomenology—Phenomenology is a philosophy based on the belief
that reality is composed of objects and events as they are perceived by a
human mind. The Sophists believed that “man is the measure of all things” and that reality is as we perceive it to be. “Idealism,” the belief that the only real world is the “ideal” world and that the physical world is constantly changing, is a form of phenomenology.
•
Linguistics—Linguistics is the study of language, and generally is a
broader concept and includes semiotics. Linguistics also covers many other disciplines not discussed here, such as the study of sounds.
•
Ontology—Ontology is a branch of metaphysics that deals with structures of systems. Currently, it is associated with the organization and classification of knowledge. It is closely related to semantics, the primary distinction being that ontology concerns itself with the organization of knowledge once you know what it means, whereas semantics concerns itself more directly with what something means.
•
Semiotics—Semiotics is the study of signs and symbols as used in language. It is a broader study than just the study of meaning in that it
incorporates syntax, semantics, and pragmatics.
•
Syntax—Syntax as a philosophical study is concerned with first-order
logic, or how to construct very basic grammars. It forms the basis for
formal semantics.
•
Pragmatics—Pragmatics is a branch of semiotics concerned with the
relationship between language (or signs) and the people using them.
How does social context interact with meaning? The word pragmatic is
often used to mean practical. This is an important body of work relative to semantics, especially as we come to apply semantics in a predominantly social context (business).
•
Cosmology—Cosmology is a subdiscipline of metaphysics that concerns itself with the nature of being. It is concerned with how the universe works, not with what our terms mean. It has come to be
associated more with astronomy of late. Relative to semantics, it asks
“Why?,” whereas semantics asks “What?”
•
Philosophical theology—Philosophical theology is the branch of
metaphysics that deals with the relationship of a deity to the phenomenology of the world. It has historically been a trump card in the discussion of semantics, in that the things we deal with in semantics could be construed to have a meaning available not to us but only to a divine creator.
I hope that this overview is useful in describing a few of the other fields
that have been closely related to semantics over its long history.
A Semantic Solution to a Semantic Problem
To get us started, I’ve outlined a sketch that says, in effect: We have trillions of dollars’ worth of business software installed and in use. It is obsolete, or
soon to become obsolete, and we are going to have to replace it. I make the
claim that much of the complexity of these systems has its roots in semantics, as do most of the newer technologies with which we are now presented.
And I further claim that a systematic study of the application of semantics to
business systems is our best hope for the future.
But I haven’t really made an airtight case for these claims. That’s what the
rest of the book is about.
I could have opened with the George Jetson-style world of the future, where your refrigerator not only talks with your thermostat but the two have meaningful conversations, and your day timer understands the office politics of staff scheduling. But you’re not likely to buy into that “if only” technological utopia.
Instead, I’d rather appeal to that side of you that knows the current state
of business systems is a deplorable mess, many times more complex than it
needs to be, and yet is still not up to the tasks we have in store for it. You
suspect that things could be much better than they are now. You’re eager to
find out what to do to make things better.
We’ll get there, but before we do, let’s take a moment to understand how
we built this semantic cacophony.