GERHARD SCHURZ AND KAREL LAMBERT*
OUTLINE OF A THEORY OF SCIENTIFIC UNDERSTANDING
Appeared in: Synthese 101, No 1, 1994, 65 - 120.
ABSTRACT. The basic theory of scientific understanding presented in Sections 1-2 exploits
three main ideas. First, that to understand a phenomenon P (for a given agent) is to be able to fit
P into the cognitive background corpus C (of the agent). Second, that to fit P into C is to
connect P with parts of C (via "arguments" in a very broad sense) such that the unification of C
increases. Third, that the cognitive changes involved in unification can be treated as sequences
of shifts of phenomena in C. How the theory fits typical examples of understanding and how it
excludes spurious unifications is explained in detail. Section 3 gives a formal description of the
structure of cognitive corpuses which contain descriptive as well as inferential components.
The theory of unification is then refined in the light of so-called "puzzling phenomena", to
enable important distinctions, such as that between consonant and dissonant understanding. In
Section 4, the refined theory is applied to several examples, among them a case study of the
development of the atomic model. The final part contains a classification of kinds of
understanding and a discussion of the relation between understanding and explanation.
1. INTRODUCTION
The goal of physiological psychology, according to a recent text
(Thompson 1967, p. 1), is "to understand how the complicated system
of cells comprising the brain functions to produce the almost infinite
variety of behavior patterns displayed by organisms".1 Beyond the
intrinsic philosophical interest in the subject of scientific understanding
such declarations stimulate, the notion increasingly has come to occupy
center stage in the enduring debate over the nature of scientific
explanation. For Salmon, for example, yielding scientific
understanding is a criterion by means of which scientific explanations
are identified (Salmon 1978; see also Lambert 1988, 1990). Moreover,
for at least three decades, certain philosophers of science have
suggested that the key feature of scientific explanations is to yield
scientific understanding, and thus support, if only implicitly, the
importance of ascertaining the exact nature of understanding and its
relationship to explanation.2 On the other hand, van Fraassen has
recently asserted in (1985, p. 642) that a major source of the differences
between himself and Salmon vis-à-vis scientific explanation probably
lies in their different views about the relation between scientific
understanding and scientific explanation.
The primary task of this essay is to outline a theory of scientific
understanding. It is in terms of this theory that the relation between
scientific understanding and scientific explanation shall be ascertained.
Two constraints help to define this enterprise. First, the explication of
scientific understanding to be developed presumes that scientific
understanding is an intersubjective (objective) notion and independent
of the psychological features of given persons. Second, to avoid any
hint of circularity in the relation between scientific explanation and
scientific understanding, the explication of scientific understanding
will be given independently of the notion of scientific explanation.
In the remainder of this introduction we give an informal account of
our theory of scientific understanding. The later sections are devoted both to precise explications and extensions of our theory and to its
applications. Hereafter we use the expressions 'understanding' and
'scientific understanding' synonymously unless otherwise specified.
1.1. Understanding as Fitting into a Cognitive Corpus
We begin with an example. When the famous aviator, Eddie Rickenbacker, was found three weeks after he crashed into the Pacific Ocean
during World War II, he was extremely fatigued and nearly starved.
Nevertheless, he had no appetite. Why? Suppose a scientist provides
the following answer: Because Rickenbacker was extremely fatigued,
and the rhythmic contractions in the duodenum tripping off blood
chemistry changes initiating appetite were blocked by extreme fatigue.
This answer clearly yields understanding of Rickenbacker's lack of
appetite. But what is the specific information in which that
understanding consists? The answer does involve new descriptive
information, for example, that Rickenbacker was extremely fatigued,
but it is not merely this kind of information which constitutes
understanding. To understand a phenomenon P is to know how P fits
into one's background knowledge.
In this idea of scientific understanding,3 a key ingredient is the
notion of background knowledge; we shall refer to it more formally as
the cognitive corpus C. In the example above, the phenomenon (P) that
Rickenbacker had no appetite initially did not fit into the cognitive
corpus because one expects starved persons to be hungry and hence to
have an appetite; that is, one would have expected P not to be the case.
The scientist's answer above fits P into the cognitive corpus by
showing that P is inferable from certain facts and physiological laws.
Although this deductive way of fitting P into the cognitive corpus C is
an important one, it is not unique. For example, one can also try to fit P
into C by giving statistically relevant factors, or by indicating what ends
P satisfies, etcetera. An adequate theory of scientific understanding
must accommodate all scientifically relevant ways of fitting a
phenomenon into the cognitive corpus.
Like the concept of explanation, the concept of understanding has a
"natural" location in the logical theory of questions and answers. To
understand means to be able to give an answer to an understanding
seeking how-question, namely, "How does P fit into the cognitive
corpus C?" An explanation seeking why-question - "Why P?" - can
also be implicitly an understanding seeking question because an answer
of the form "Because X" shows implicitly how P fits into C. But it is
only an answer to the above how-question which explicitly describes
how P fits into C, for example, by stating that P is inferable from X. The
basic idea, then, can be regimented as follows: A sentence A yields
understanding of P if and only if it answers the question, "How does P
fit into C?".
What follows about understanding seeking how-questions from
erotetic logic à la Belnap and Steel (1976) can be said very briefly. The
logical form of such questions is "What is the way in which P fits into
C?", formally "(?X)(P fits into C in the way X)", where X is a variable
ranging over statements (or sets of them). Their presuppositions have
the form "There is at least one way in which P fits into C", formally
"(∃X)(P fits into C in the way X)".
Understanding is a ternary relation between an answer A, a
phenomenon P and a cognitive corpus C: A yields understanding of P
relative to C, or, as we shall say later, A contributes understanding of P
to C. C is intended to be the cognitive corpus of the inquirer. So C
contains all statements known or believed by the inquirer (including
observation statements, laws, theories and hypotheses of varying
degree of belief). Since the inquirer may vary, C may vary. It is not
useful to fix C. For example it is not useful to require C to be the current
state of scientific knowledge, because then the resulting notion of
understanding would be historically relative rather than objective. But
if understanding is treated as a ternary relation, with an explicit
argument place for cognitive corpuses, it can be explicated in an
objective way.
In this essay, we mean by a phenomenon roughly anything expressed
by a declarative statement, singular or general, true or false. So
phenomena are nonlinguistic entities which may or may not obtain.
(This notion corresponds to "Sachverhalte", often translated as "states
of affairs".) It is important to note that C does not contain the
phenomena literally, but only their cognitive representations in the
form of statements. We will represent phenomena by special statements
such that each 'elementary' phenomenon is represented in C by exactly
one such statement. The method of representation, described later, is a
cornerstone of our theory, and allows us to identify 'loosely' phenomena
with their linguistic representations in C. So, "a phenomenon is
contained in C, or fits into C" literally means "the statement expressing
this phenomenon is contained in C, or fits into C".
The notion of fitting a phenomenon into a cognitive corpus C is more
complicated than appears at first glance. Since an answer A to the
question "How does P fit into C?" contains much new information not
contained in C, it changes C into some successor state C*. C is the
cognitive corpus of the inquirer before he received an answer, when P
is not yet understood. C* is the cognitive corpus after the inquirer has
received an answer, when, provided the answer is adequate, P is
understood. Understanding, thus, is a feature of the development of C
into C* (C → C*). Because C* emerges from C by adding the
information in the answer A, we shall call C* the A-successor of C and
denote it by C + A. If the information in A is consistent with C, this
addition yields a mere expansion of C; otherwise it yields a revision of
C (precise definitions are given in Section 3.3).
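The expansion/revision distinction can be illustrated with a small sketch. The set-based encoding and the toy consistency test below are our own illustrative assumptions, not the authors' formalism:

```python
# Toy illustration of expansion vs. revision when adding answer-information
# A to a corpus C. Statements are plain strings; "not s" conflicts with "s".
# This encoding is a hypothetical sketch, not the paper's formal definitions.

def consistent(x, y):
    """Toy consistency test: a set is inconsistent iff it contains
    both a statement s and its negation 'not s'."""
    joined = set(x) | set(y)
    return not any(("not " + s) in joined for s in joined)

def add_information(C, A):
    """Return C + A: a mere expansion if A is consistent with C,
    otherwise a revision that first deletes the conflicting statements."""
    if consistent(C, A):
        return set(C) | set(A)                       # expansion of C
    conflict = {s for s in C if not consistent({s}, A)}
    return (set(C) - conflict) | set(A)              # revision of C

print(add_information({"p", "q"}, {"r"}))       # expansion: nothing deleted
print(add_information({"p", "q"}, {"not p"}))   # revision: "p" is deleted
```

The point of the sketch is only the case split: consistent information is simply added, while conflicting information forces a deletion in C before P can be accommodated.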
The new information supplied by A may be descriptive, and be either
empirically factual or theoretical. If the new information contained in A
is only factual, we say that A is factually innovative. For example, for
an inquirer already versed in the physiological theory of appetite, and
who did not know of Rickenbacker's ordeal, what is new in the answer
is the empirical fact that Rickenbacker was extremely fatigued. If the
new information contained in A consists only of new laws or theories,
we say that A is theoretically innovative.4 For example, those answers
that provided understanding of the negative results of the
Michelson-Morley experiment, or of the stable orbits of electrons in
atoms (contra classical electrodynamics), are theoretically innovative.
On the other hand, the answer A may contain no new descriptive
information, but only present an inference in which P is inferable from
some premises X already known in C. Then the only reason why P is
not understood in C is that the new inference is not known or not
mastered in C. For instance, in theoretical chemistry it often happens
that all relevant descriptive information, factual as well as theoretical,
about a certain molecular configuration is known, but nevertheless
understanding of this configuration is absent because the solution to the
differential equation describing this configuration, which is a purely
mathematical task, is not known.
To accommodate both descriptive knowledge and inferential
knowledge in our theory of cognitive corpuses, we shall treat a
cognitive corpus C as a pair ⟨K, I⟩, where K is a set of phenomena (those
believed by the inquirer), and I is a set of inferences (those mastered by
the inquirer). One can think of K as the "parts" of the cognitive corpus,
and of I as the set of instructions about how to connect them. Fitting P
into C = ⟨K, I⟩ is always elliptical for fitting P into K via I. Since the
A-successor of C is denoted by C + A, it follows that C + A =
⟨K + A, I + A⟩. Answers which are only inferentially innovative change
I but not K; that is, K + A = K. Purely factual and/or theoretically
innovative answers change only K but not I; that is, I + A = I.5
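The pair structure of a corpus and the three kinds of innovative answers can be sketched as follows; the set-based encoding and all names are our illustrative assumptions, not the authors' formalism:

```python
# Minimal sketch of a cognitive corpus C = <K, I> and its A-successor.
# K holds phenomenon-statements (here: strings); I holds arguments ibs,
# encoded as (frozenset_of_premises, conclusion) pairs.

def successor(K, I, new_phenomena=frozenset(), new_arguments=frozenset()):
    """C + A = <K + A, I + A>, here as simple expansion (the consistent case)."""
    return K | frozenset(new_phenomena), I | frozenset(new_arguments)

K = frozenset({"fatigued(R)", "no_appetite(R)"})
I = frozenset()

# A purely inferentially innovative answer changes I but not K:
arg = (frozenset({"fatigued(R)"}), "no_appetite(R)")
K2, I2 = successor(K, I, new_arguments={arg})
assert K2 == K and I2 == frozenset({arg})

# A purely factually innovative answer changes K but not I:
K3, I3 = successor(K, I, new_phenomena={"crashed(R)"})
assert I3 == I and "crashed(R)" in K3
```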
1.2. General Characteristics of Fitting-Into
There is a literal sense in which the dynamic character of understanding
conflicts with the basic idea according to which a statement A
contributes understanding of P to C just in case it answers the question
"How does P fit into C?". For if A causes C to change into C + A, then A
no longer describes how P fits into C but rather how it fits into C + A.
This "conflict" shows that the process of fitting is an interactive process.
To fit something X into something Y often consists not only in giving X
a certain place in Y, but also in a certain rearrangement or change of Y
so that X can be given a natural place in the new arrangement of Y.
By way of analogy, consider puzzle building. C is the uncompleted
puzzle, and P is a new piece one is trying to fit into C. Now it often
happens that P can be fitted into C only if C is first changed. For
instance, one may first have to insert other new pieces into the puzzle
before P can be inserted. This corresponds to an expansion, in which A
adds new descriptive information to C while retaining all information
already in C. Or the effort to insert P in C may make it clear that the
puzzle C has been mistakenly arranged, and therefore must be rearranged to accommodate P. This corresponds to a revision in which
the information in A conflicts with information already in C, and thus
requires information deletion in C in order to accommodate P. Finally,
there are 'lucky' moments where one sees how to insert a new piece into
the puzzle without the addition or deletion of other pieces. They
correspond to the situation in which A is purely inferentially innovative,
where, in other words, A changes I, but not K.
The interactive character of the process of fitting into makes it clear
that the question "How does P fit into C?" is elliptical for "How can C
be changed into C* such that P fits into C*?". Since "P fits into C*" is
itself elliptical for "P fits into K* via I*", if one knows that C* = ⟨K*,
I*⟩ such that P fits into C*, one also knows how P fits into K* - namely,
by means of I*. In other words, since a cognitive corpus C also contains
the inferential knowledge which constitutes the means by which P is
fitted into it, how-questions about P's fitting into C are fully reflected in
how-questions about changes in C.
Because fitting a phenomenon P into a knowledge system K via a
given I is interactive, it is not merely a matter of finding the right
"location" of P in K, the right "parts" of K with which P can be
connected via I; hence, it is not merely a local concern. Fitting P into K
also implies that P and K* together form a coherent system. Coherence
is a property distinct from connectability. It is a property of K*
(including P), not simply of the local place in K* in which P has been
fitted. Indeed, connectability of P does not always imply increase of
coherence. It may happen that the change of knowledge K → K*
occasioned by A enables P to be connected with parts of K*, thus
producing a kind of local coherence, but only at the cost of destroying
the coherence of K* elsewhere. If, in such an event, the loss of coherence
in K* exceeds the gain in local coherence, one would not normally say
that P has been successfully fitted into K* and hence that P is
"understood" in K.
There are two main cases of this kind. First, A may contain new
antecedent facts F which make it possible to "fit" P into K* but which
do not themselves fit into K*. An example is the following: You see a
friend, whom you know to be very stable, with slashed wrists (=P), and you
ask: Why? Suppose you are told that he tried to kill himself (=A). A is
even more puzzling than P; even though it enables one to connect P
with K* it produces incoherence elsewhere in K* and thus does not
satisfy the quest for understanding. Second, A may contain new theories
which enable P to be "fitted" into K* but which do not themselves fit
into K*. An example is the case of Bohr's atomic model in 1913.
Although it enabled the phenomenon of the discrete hydrogen spectrum
(= P) to be connected with K*, it was incoherent vis-à-vis classical
electrodynamics. And, indeed, most physicists at the time were not
inclined to say that Bohr's atomic model really provided understanding
of atomic behaviour.
The total coherence of a body of knowledge, then, is properly treated
as a gradual notion reflecting how well its various local parts fit with
each other. This occasions the fundamental thesis of this essay; to fit P
into K* means (informally) to connect P with parts of K* such that the
coherence of K* is increased relative to the coherence of K. The total
coherence of K* increases just in case the local coherence resulting
from the connection of P with parts of K* is not outweighed by a loss of
coherence elsewhere in K*.
1.3. Scientific Constraints on Fitting-Into
We turn now to those features which distinguish scientific
understanding from other kinds of understanding. To borrow a
metaphor from physics, the picture of understanding described so far
has to be placed under the 'boundary conditions' of the scientific
method.
Here scientific knowledge systems are treated as linguistically
represented information systems. Connections between their elements
generally arise from transmitted information. These information
transmitting connections are basically reflected in the notion of an
argument ibs (in the broad sense). An argument ibs is any pair of a set
of statements Prem (the premises) and a statement Con (the conclusion)
where a 'sufficient' amount of information is 'transmitted' from Prem to
Con. This is denoted by Prem ⇒ Con. In probabilistic terms, this means
that the probability (degree of belief) of Con is increased by Prem to a
"sufficiently" great value. This probabilistic characterization figures
only as an intuitively necessary condition for being a correct argument
ibs, reflecting the characteristic feature of information transfer. Since it
is doubtful that a sufficient condition for correct arguments ibs can be
given in general, we will specify a list of the kinds of correct arguments ibs, including deductive, approximative-deductive and inductive
arguments, which extensionally defines the correctness of an argument
ibs. Only in the case of a deductive argument does correctness coincide
with validity and, hence, the truth preserving detachability of the
conclusion.
In addition to these logical features, scientific knowledge systems
are supposed to reflect reality, and reality is grasped via the actually observed phenomena, the empirical data. The elements of scientific
knowledge systems should not seek merely to be mutually coherent;
they should also be empirically confirmed. However, empirical
confirmation is not independent of coherence; it is a special case of the
latter, namely, coherence with the data. (Approaches to the
confirmation of theories via their explanatory coherence are based on
this insight; cf. Harman 1965; Thagard 1978, 1989). This implies two
decisive constraints on coherence in science:
First, that the phenomena be connected in a non-circular way via
arguments ibs. Indeed, if this is the case, then many phenomena are
connected with, and hence are at least partially reducible to, a small set
of unconnected, basic phenomena. So, non-circular coherence
coincides with unification. This can be put into a slogan: coherence
minus circularity = unification.
Second, the non-circular connection of phenomena via arguments ibs
always has to contribute to the unification of the subset of phenomena
called the data. Unification of hypotheses is only of value if
they themselves contribute to data unification. This requirement is
satisfied by giving the unification of data a certain preference over the
unification of hypotheses by means of different 'costs' and 'gains'.
Together, the two constraints guarantee that scientific unification yields
empirical confirmation as a byproduct.
1.4. Fitting-Into as Connecting plus Unifying
To summarize, fitting a phenomenon P into a given cognitive corpus C
= K, I is done by means of an argument ibs which connects P with
parts of K such that K's unification increases. More specifically, every
answer A which shows how P fits into the purely descriptive segment of
C* (= K + A) claims, at least implicitly, that an argument ibs Prem ⇒
Con which connects P with K + A such that K + A is more unified than K
is sound.6 The claim that Prem ⇒ Con is sound means that that
argument is correct and that its premises are rationally acceptable (in a
precise sense, see Sections 3.1-2). Our informal explication
of fitting can also be put in the form of the slogan: fitting = connecting
+ unifying.
The phenomenon P need not always be identified with the
conclusion of the connecting argument ibs, although this is the most
important case. When P = Con, what is being explicated is a case of
understanding-why P; to understand why P is to know the causes or
reasons for P, or some phenomena to which P is "reducible". But even
when P is a premise of the connecting argument it may yield a certain
kind of understanding, namely understanding-about P. To understand
something about P is to know that from P, together with other
knowledge, certain interesting things follow.
The idea of unification as crucial to scientific understanding is not
new. Our theory of unification differs from other approaches in some
important respects. First, we don't primarily treat unification in the
traditional sense of deductive systematization which underlies the approaches of Friedman and Kitcher.7 To be applicable to interesting
cases of science, our theory embodies various kinds of connections;
besides the many kinds of arguments ibs there are also heuristic
connections, disconnections, multiple connections, etcetera. Second, in
contrast to Friedman's and Kitcher's approach, ours has an 'inbuilt data-priority' and a method of knowledge representation which together
enable the successful exclusion of spurious unifications (see Section
2.3 and fn. 22). Moreover, the requirement of non-circularity is
essential for distinguishing scientific from spurious unification. The
latter requirement distinguishes our approach from coherentist
approaches like those of Lehrer (1974, chs. 7-8) and Thagard (1989).
To avoid misunderstandings, it should be noted that this requirement
excludes only complete circles, that is, circularly connected phenomena
which are not connected with the data. In contrast, partial circles, that
is, circularly connected phenomena which have some independent data
confirmation, may very well contribute to scientific unification.
Third, the intended objects of scientific unification are not primarily
scientific statements or other linguistic structures, but the real phenomena.8 The appeal to reality in the search for unification prevents
artificial unification; science can't produce more unification than really
exists in the world. This view of scientific unification has three
important consequences. First, scientific unification must yield
empirical confirmation.9 Second, the method of knowledge
representation must guarantee that every elementary phenomenon is
represented by exactly one elementary statement in K such that
unification in K reflects unification of real phenomena. Third, the
connections between elements in K are intended to reflect real
connections in nature (with reservations to be explained in the next
paragraph).
Talk about real connections inevitably leads to the issue of causality.
Like Salmon (1984; 1989, Section 3.7) we think that causality is important in the explication of scientific understanding, but like van Fraassen
(1980, p. 124) and Kitcher (1989, Section 6.3-5) we do not think that
special views about causality have the status of unimpeachable
metaphysical truths; rather the notion of causation is theory relative and
can change with changing cognitive paradigms, for example, as in
relativistic mechanics versus quantum mechanics. So we assume that
every scientific corpus C contains some general principles about causality, that is, about the structural features of the real connections between real phenomena (in a certain field).10 A system of such principles
is called a causal frame. Causal frames themselves have to be justified
via their unification effect. Connections in K which accord with the
causal frame are called causal connections; the others noncausal. We
will assume, in the explication of unification, that causal connections
are generally preferred over noncausal ones. This implies that if no
preferred causal connections are available, then noncausal connections
may contribute something to scientific unification. For a noncausal law
such as "if the barometer falls, a storm will ensue" reflects at least
something about reality, namely an empirical correlation, although one
which does not directly rely on a causal connection. Moreover, such a
law has scientific value insofar as it enables one to predict phenomena,
although it does not enable one to control them. This balanced view,
giving noncausal connections less value than causal ones but not zero
value, fits actual science very well.
Finally, our treatment of scientific understanding avoids the van
Fraassen objection (1980, p. 109f) that unification is a global notion but
understanding of a phenomenon is a local matter, and Lambert's
complaint (1988, p. 316f) that a single answer to an understanding
seeking question "How does P fit into K*?" does not usually produce
global unification. Our treatment does not require that an answer A to
an understanding seeking question produce global unification, but only
that it increase unification. We characterize an understanding producing answer in terms of its incremental, or differential, role in the process
of unifying our knowledge. Overall unification is not produced by a
single answer to an understanding seeking how-question, but is gradually produced by a large set of such answers in the historical development of the cognitive corpus because each of these answers leads to a
positive unification shift. By way of analogy, consider the differential
notion of velocity: if a body x has a nonzero velocity v at only one or
some time points (where v is taken as primitive), it will not move any
distance. Only if x has a nonzero v during an entire time interval, will
there be a certain "visible" movement of x.
2. THE BASIC THEORY
2.1. Answers to Understanding-Seeking How-Questions
The complete erotetic version of an understanding seeking how-question is this: "What information A changes C into C + A such that P fits
into K + A via I + A?". In what follows it is this question form which is
abbreviated by "How does P fit into C + A?". The erotetic standard
form of the answer would be: "The information A that changes C into C
+ A such that P fits into K + A via I + A is ---", where "---" stands for the
information which changes C, that is, the claim that a certain argument
ibs is sound. In what follows, we identify the answer statement with the
claim "---". So, in the question "How does P fit into C + A?", "A"
figures as a bound variable (bound by the question-operator) but when
we speak of the answer A, "A" figures as the constant "---" instantiating
that variable.
DEFINITION 1. A statement A contributes understanding of P to C iff
A is an adequate answer to the question "How does P fit into C + A?",
where the phenomenon P is contained in the cognitive corpus C.
This definition calls for some remarks. First, contributing understanding of P presupposes that P is already known or believed in C.
Second, the notion of "contributing understanding", as explicated in
Definition 1, assumes that the answer A is not yet known in C. It seems
to us that this notion is the basic one; other notions, like that of
"containing understanding" are reducible to it.11 Third, C = ⟨K, I⟩
stands for the cognitive corpus of some idealized agent at some time.
The corresponding real agent may be a person, a scientific community,
an intelligent machine such as a computer, etc. Definition 1 abstracts
from the agent and speaks only about cognitive corpuses. Just as it
makes sense to say in abstracto that the brain of an agent (rather than
merely the agent) understands, so it makes sense to say in abstracto that
C understands.
As mentioned, K is the set of (elementary) phenomena known or
believed by the agent at the given time, and I is a set containing the
arguments ibs mastered by the agent at the given time. As a rationality
condition, we require the elements in K to be rationally acceptable, and
the arguments in I to be correct (objective definitions of these notions
are given in Sections 3.1-2).
DEFINITION 2. A is an adequate answer to the question "How does P
fit into C + A?" iff A includes the claim that an argument ibs Prem ⇒
Con is sound and this argument fits P into C + A, where C + A = ⟨K +
Prem, I + (Prem ⇒ Con)⟩.
As mentioned earlier, the claim that Prem ⇒ Con is sound implies the
claim that the premises Prem are rationally acceptable and that Prem ⇒
Con is correct. Prem is the descriptive portion of the answer and is
added to K, while Prem ⇒ Con is the inferential part of the answer and
is added to I. Accordingly, K + A = K + Prem and I + A = I + (Prem ⇒
Con).
There are various ways in which A can include the claim that a
certain argument is sound. Since some part of A may already be known
in C, the following useful distinctions can be made. A directly includes
the claim that Prem ⇒ Con is sound (or A is a direct answer) if this
claim follows from A.12 A indirectly includes the claim that Prem ⇒
Con is sound (or A is an indirect answer) if this claim follows from A
together with what is known already in C but neither from A alone nor
from C alone. So A includes the claim in question indirectly if some
part of the direct answer is already known in C and A mentions only that
part not yet known. If a direct answer includes information already
known in C, it is cognitively redundant (relative to C); otherwise it is
cognitively nonredundant. Of course, it may also happen that some part
of A not only is cognitively redundant, but is even logically redundant
in the sense that that part of A is logically unnecessary for fitting P into
K + A via the argument Prem ⇒ Con.
Often A contains further information. If the inference Prem ⇒ Con
is new vis-à-vis C, then A must include a proof of that inference, or at
least sufficient hints enabling the agent to master the inference. If the
premise set contains new descriptive information, then A may mention
additional evidence enabling the agent to accept the new descriptive
information. This occasions a distinction between a minimal answer A,
containing just the claim that Prem ⇒ Con is sound, and an enriched
answer, containing additional kinds of information such as that just
specified.
The kind of answer which is especially relevant here is, of course,
the direct, minimal answer. It may be expressed colloquially in several
ways: "Con follows from the fact(s) that &Prem", where "&Prem"
denotes the conjunction of Prem's elements, or "Prem ⇒ Con is a
sound argument", or "&Prem is true and Con follows therefrom", etc.
DEFINITION 3. Prem ⇒ Con fits P into C + A iff Prem ⇒ Con is a
connector in C + A which contains P as the conclusion (that is, P = Con)
or in the premise set (that is, P ∈ Prem) such that C + A is more unified
than C.
Definition 3 expresses the core of our theory, the explication of fitting
as connecting plus unifying. The requirement that Prem ⇒ Con
connects P with (some part(s) of) K + A is contained in the requirement
that Prem ⇒ Con is a connector in C + A (see Definition 4) which
contains P either as conclusion or as a premise. This corresponds to the
two kinds of understanding mentioned earlier, understanding-why and
understanding-about.
DEFINITION 4. Prem ⇒ Con is a connector in C (= ⟨K, I⟩) iff (i) (Prem ⇒
Con) ∈ I, (ii) Prem ⊆ K, Con ∈ K, and Con ∉ Prem, (iii) there is no
proper subset Prem* of Prem such that Prem* ⇒ Con satisfies clauses
(i) and (ii).
Definition 4 makes clear that connectors in C are more than arguments ibs in I. A connector's premises and conclusion must be in K (which is not
required from elements in I). They must connect different phenomena,
whence the requirement that Con ∉ Prem (i.e., P ⇒ P is no connector).
Moreover, as required in (iii), connectors must be relevant in the sense
that every premise is indeed necessary to infer the conclusion. This
requirement guarantees that the increase of unification required in
Definition 3 is really due to those "parts" of K + A with which P gets
connected, and not due to irrelevant premises, as in the argument
"{T, P&Q} ⇒ P", where T is a theory having nothing to do with P but
which may unify other parts of K. Such an answer may yield
understanding, but not understanding of P. This "relevance" requirement guarantees that the local and the global component of "understanding P" are combined in the right way.
Since we require the elements in K to be rationally acceptable, and
the arguments in I to be correct (as explicated in Sections 3.1-2),
Definition 4 entails that a connector is a sound argument ibs. What
happens if Prem ⇒ Con, claimed to be sound by the answer A, is not
really sound? As will emerge later, if Prem is not rationally acceptable
then K + A = K, and if Prem ⇒ Con is not correct, then I + A = I. In
neither case can Prem ⇒ Con be a connector in C + A and, therefore,
the answer A can't be adequate.
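Definition 4 is effectively checkable for finite corpuses. The following is a minimal sketch (not the authors' formalism), assuming phenomena are represented as strings, K as a set, and I as a set of (premise-set, conclusion) pairs; all names are ours:

```python
from itertools import combinations

def is_connector(prem: frozenset, con: str, K: set, I: set) -> bool:
    """Check clauses (i)-(iii) of Definition 4 for the argument prem => con."""
    # (i) the argument must be among the arguments ibs mastered by the agent
    if (prem, con) not in I:
        return False
    # (ii) premises and conclusion lie in K, and the argument is non-circular
    if not (prem <= K and con in K and con not in prem):
        return False
    # (iii) relevance: no proper premise subset already yields con within I
    for r in range(len(prem)):
        for sub in combinations(prem, r):
            s = frozenset(sub)
            if (s, con) in I and s <= K:
                return False
    return True

# Toy corpus: {L, D} => P is a connector; {T, L, D} => P fails clause (iii),
# since T is an idle premise.
K = {"L", "D", "P", "T"}
I = {(frozenset({"L", "D"}), "P"), (frozenset({"T", "L", "D"}), "P")}
```

With this toy corpus, the argument with the idle premise T is rejected exactly as the text requires: the answer would not yield understanding of P.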
2.2. Unification: The Basic Picture
The explanation of the conditions under which a cognitive corpus C + A
is more unified than a cognitive corpus C is very complex. It is a
weighted combination of several factors, and there are several
situations in which the question whether one cognitive corpus is more
unified than another depends on "subjective" considerations, for
example, the weight of predictive power versus that of theoretical
explanatory power. But there are, nevertheless, some objective rules
which will enable one, in given situations, to make an effective
comparison of the unification afforded by a pair of cognitive corpuses.
Unification is not explicated as a quantitative measure because
assigning numbers to degrees of unification would be quite arbitrary.
Unification is better treated as a comparative concept; that is, as an
ordering relation over the set of cognitive corpuses. Since there will be
some situations in which it is not possible to decide objectively which
of a pair of cognitive corpuses is more unified, the relation that C* is
more unified than C is only a partial ordering relation. That partial
ordering is explicated as follows.
A basis of K is any subset K* of K such that every member of
(K − K*) is inferable from a subset of K*, that is, such that for every
member M of (K − K*) there exists a connector Prem ⇒ M in C with
Prem ⊆ K*. The unification basis of K is that basis of K which yields
the greatest unification of K, according to the criteria developed below.
It is denoted by Kb and the phenomena in it are called the basic
phenomena of K. The complementary set Ka (=K - Kb) is called the set
of assimilated phenomena.13 The partition U(C) = {Kb, Ka} is called the
unification classification of K. We also write U(K) instead of U(C).
According to Section 1.4, science seeks to unify the data. Thus, the
fewer the phenomena that are basic and the more the data that are
assimilated in K, the greater the unification of K will be. But for the
estimation of unification it is not enough just to "count" the phenomena. We assume that every phenomenon P is associated with a certain
intrinsic weight wP, which reflects the cognitive complexity of P. For
the reasons given two paragraphs previously, all one can assume is a
partial ordering among the weights of phenomena. So, the concept
of weights is comparative rather than numerical. As a minimal rule for
wP, we assume that the cognitive complexity of a general phenomenon
is greater than that of a singular one, and that of a theoretical phenomenon is greater than that of an observational one.14 Further plausible
weight-comparisons, restricted to certain contexts, will be assumed
later.
If a phenomenon P is newly added to Kb, its weight is an intrinsic
cost: - wP. If, on the other hand, P is assimilated, this 'cost' is 'saved'.
Moreover, every datum D in K has a certain intrinsic gain, independently of whether it is in Kb or in Ka. For reasons of simplicity, we
identify this gain with the (positive) intrinsic weight, wD. (By convention, gains are written in the unsigned form, and costs in the negative
form). This implies that adding new and basic data to K which do not
affect other parts (in particular do not falsify accepted theories) neither
increases nor decreases the total unification of K. If we call every
phenomenon which is not a datum (D) a hypothesis (H),15 we obtain the
following basic rules for adding phenomena to K: adding a datum D to
Kb costs and gains nothing, while adding a hypothesis H to Kb costs
-wH; adding a datum D to Ka gains wD, while adding a hypothesis H to
Ka costs and gains nothing.
Besides these intrinsic costs and gains, the addition of a phenomenon
P to K also has extrinsic costs or gains, reflecting the decrease or
increase of the unification of K due to positional changes of other
phenomena caused by the addition of P to K. It is characteristic of this
approach that only the data (D) have intrinsic gains vis-à-vis unification.
Thus hypotheses have no intrinsic gain. Of course, good hypotheses
must have an extrinsic gain. This reflects the fact that in science, a
theoretical speculation has no value 'in itself'; it must be empirically
confirmed, via its ability to unify data.
Giving only data an intrinsic gain is a rather simple method to
guarantee the 'data-priority' in scientific unification required in Section
1.16 Two general remarks are appropriate here. First, our approach is
not committed to an instrumentalistic position vis-à-vis scientific
theories. What is implied is that a realistic interpretation of theories is
justified only to the extent to which these theories have an extrinsic gain,
i.e. are empirically confirmed via data unification. Second, our
approach assumes a distinction between data and hypotheses and
thereby a distinction between observational and theoretical concepts.
But even if one takes the radical view that all data are theory-dependent,
our approach still applies: the only difference would be that then a
change of fundamental theories in K would imply changes in
associating intrinsic gains with phenomena in K.
Now, given two corpuses with unification classifications U(K) =
{Kb, Ka} and U(K*) = {K*b, K*a}, we can always decompose the transformation leading from U(K) to U(K*) (that is, U(K) → U(K*)) into a
sequence of one-element changes, called shifts. There are three kinds of
shifts: a new element P can be added to a set Kx ∈ U(K), denoted by
P+x (x = b or x = a); it can be subtracted from such a set, denoted
by P−x; or it can move from one set Kx ∈ U(K) to another set Ky ∈ U(K),
which is denoted by Px→y. The cost or gain of a shift, say s, is called its
u-value, u(s); if the u-value is positive, it is a gain, otherwise a cost. The
u-values of additions are given by the rules three paragraphs above; the
u-value of a subtraction P−x is just the negation of the corresponding
addition P+x. Finally, a move Px→y is formally decomposable into the
subtraction P−x and the addition P+y;17 so its u-value is given as the sum
of the u-values of P−x and of P+y. Summarized, we obtain:
Additions:     u(H+b) = −wH;  u(H+a) = 0;  u(D+b) = 0;  u(D+a) = wD
Subtractions:  u(H−b) = wH;   u(H−a) = 0;  u(D−b) = 0;  u(D−a) = −wD
Moves: for P = H as well as P = D:  u(Pb→a) = wP;  u(Pa→b) = −wP
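These rules are mechanical enough to be tabulated. A small illustrative sketch, with hypothetical numeric weights standing in for the merely comparative weights of the text (the function name and shift codes are ours):

```python
def u_value(shift: str, w: float, is_datum: bool) -> float:
    """u-value of a single shift of a phenomenon with weight w.
    shift codes: '+b' add to Kb, '+a' add to Ka, '-b'/'-a' subtract,
    'b>a'/'a>b' move between the two sets."""
    if is_datum:   # data: intrinsic gain w, so additions to Kb net to 0
        table = {'+b': 0, '+a': w, '-b': 0, '-a': -w, 'b>a': w, 'a>b': -w}
    else:          # hypotheses: intrinsic cost w, no intrinsic gain
        table = {'+b': -w, '+a': 0, '-b': w, '-a': 0, 'b>a': w, 'a>b': -w}
    return table[shift]
```

For the bottle example below, the total u-value of the shift sequence would be `u_value('+b', wD, True) + u_value('b>a', wP, True)`, i.e. 0 + wP, a gain.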
Given a sequence of shifts leading from K to K*, the unification of C
and C* can be compared by "adding up" the u-values of the shifts. We
denote the u-value of a sequence of shifts si by the corresponding
sequence of u-values: u((s1, . . . , sn)) = (u1, . . . , un) where ui = u(si).
Strictly speaking, the u-value of an addition which introduces both a
cost and a gain must also be represented as a pair of two u-values:
u(D+b) = (−wD, wD). [This pair can be viewed as the u-value of a pair of
two 'hypothetical' shifts, one introducing only the cost, the other
introducing only the gain]. In spite of their merely comparative character, u-values obey some obvious rules. For instance, combining two
positive u-values gives an increased u-value: (w1, w2) > w1; combining
a positive and a negative u-value of the same absolute weight cancels
out to zero: (w, −w) = 0. Formally, the set of u-values together with the
sequence-operation "," forms a commutative group, partially ordered
by "≥", where "≥" is strictly monotonic with respect to ",". This fact
implies the following general rule (Comp), according to which all our
shift diagrams will be evaluated. (Justifications of these claims are
given in the appendix.) Given a u-sequence (that is, a sequence of
u-values) u(s) = (u1, . . . , un) of a given shift-sequence s, a fractionation
of u(s) is any set of u-sequences, F(s) = {us1, . . . , usm}, such that u(s)
can be obtained from their concatenation (us1, . . . , usm) by association
and commutation. A negative u-sequence us− (i.e. us− < 0) is balanced
by a positive one us+ (us+ > 0) iff us+ ≥ −us−; us− is dominated by us+ iff
us+ > −us−. (Similarly, if "negative" and "positive" are exchanged).
Now, (Comp) says:
(Comp): For any shift sequence s: If there exists a fractionation
F(s) = {us1, . . . , usm} of u(s) and a one-to-one function f assigning to
every negative us− ∈ F(s) a positive us+ ∈ F(s) which balances us−,
then u(s) ≥ 0. If in addition there exists a positive us+ in F(s) which
dominates its negative f-counterpart us− (= f⁻¹(us+)), or which has no
such counterpart, then u(s) > 0. Similarly, if "negative" and "positive"
are exchanged, and "≥" and ">" are replaced by "≤" and "<", respectively.
We will represent the sequence of shifts leading from K to K* with
associated u-values graphically by a shift diagram. If the comparative
weight-rules together with (Comp) imply that the u-value of the entire
shift sequence is positive (negative), then the unification of C* is
greater (smaller) than that of C, which is abbreviated by
C* >u C (C* <u C). Otherwise, the situation allows no unique comparison.
Consider some basic examples. We assume that we are dealing with
understanding-why; so the phenomenon to be understood (P) appears
in the conclusion of the argument claimed to be sound by the answer
(A).
Consider first a case of purely inferential innovation where Prem ⊆
K and only the argument Prem ⇒ P is new, that is, not in I. To fix the
ideas, let P be the fact that a ball thrown through the air at angle α
follows the path of a parabola, attaining maximum distance when α =
45°, and let Prem contain the already known classical Newtonian
theory of motion. (The previously unknown inference Prem ⇒ P, of course, is
deductive, employing mathematical axioms). The shift diagram then is
this:
K:       Kb = {Prem, P},   Ka = { }
shift:   Pb→a   (u-value: wP)
K + A:   Kb = {Prem},   Ka = {P}
The transformation C → C + A consists of one move Pb→a with gain
wP, so the unification has increased: C + A >u C.
Consider now a case of a purely factual innovation. Here the argument Prem P and the requisite laws or theories in Prem are already
known, but not the antecedent fact D (a datum). For example, let P be
the known fact that a filled, closed bottle of water, standing on a
balcony, burst during the night, let L be the known law that whenever
the temperature drops significantly below 32°F, a filled, closed bottle of
water will burst, and let D be the new fact that the temperature dropped
significantly below 32°F during the night (so the known argument is
{L, D} ⇒ P). The shift diagram for this situation is as follows:
K:       Kb = {L, P},   Ka = { }
shifts:  D+b   (u-value: 0);   Pb→a   (u-value: wP)
K + A:   Kb = {L, D},   Ka = {P}
The transformation CC +A consists of the addition D+b, which costs
nothing because D is a datum, and the move Pb→a with gain wP; so
C + A >u C.
As a rule, in a shift diagram all and only those phenomena in K are
mentioned which are involved in shifts during the transformation
K → K + A. For example, in the immediately preceding diagram this
implies the assumption that the new fact D is neutral with respect to the
remaining knowledge in K (i.e., no further fact gets assimilated and no
theory gets falsified by D+b).18
Consider next a case of purely theoretical innovation where it is
assumed that Prem ⇒ P is the argument in the immediately preceding
example, but now only the law L is new information while D and P
are known (and the inference is mastered). Then the shift-diagram
would be as follows:
K:       Kb = {P, D},   Ka = { }
shifts:  L+b   (u-value: −wL);   Pb→a   (u-value: wP)
K + A:   Kb = {D, L},   Ka = {P}
The net change of unification is (−wL, wP) and because wL > wP (a law is
more general and thus has more weight than a singular fact), it is
negative, i.e. the unification decreases. (The same would hold even if D
were a hypothesis, since D does not shift). So the introduction of a new
law merely to accommodate a single newly inferred fact will decrease the
unification. But if there are a lot of facts explained by the new law, then
the situation will change and the u-increase of the newly inferred facts
will dominate the u-decrease of the new law. Assume the weight of the
law is dominated by the weight of n single facts (where the latter have
equal weights), wL < n·wP, where here and in the following, n·wP stands
for a sequence (w1, . . . , wn) with wi = wP for each 1 ≤ i ≤ n. More
formally, L = ∀x(Dx → Px), Pi = Pai, Di = Dai, and wPi = wP (for all i).
Then the u-decrease due to the addition of L will be dominated by the
u-increase based on m newly inferred single facts if m ≥ n. The
shift-diagram is:
K:       Kb = {D1, . . . , Dm, P1, . . . , Pm},   Ka = { }
shifts:  L+b   (u-value: −wL);   (Pi)b→a for i ≤ m   (u-value: m·wP)
K + A:   Kb = {D1, . . . , Dm, L},   Ka = {P1, . . . , Pm}
The same holds when one seeks to understand a law, and the answer
to the inquiry introduces a new theory. A new basic theory which
permits the assimilation of only one law would be u-decreasing, but
would be u-increasing if many new laws are assimilated via the new
theory.
The two preceding shift diagrams show how the unification of facts
yielded by a law reflects the process of empirical confirmation:19 one
single positive instance (one correctly inferred fact) is not enough to
confirm it, but if the number m of positive instances is sufficiently large,
the law will be empirically confirmed. (How great the number m must
be is arbitrary; what counts is that for any choice of weights, m can be
made sufficiently large).20
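Under hypothetical numeric weights, the trade-off in this diagram reduces to a one-line computation; `net_law_gain` is an illustrative name of ours, not part of the theory:

```python
def net_law_gain(w_L: float, w_P: float, m: int) -> float:
    """Net u-value of the shift sequence (L+b, (Pi)b->a for i <= m):
    cost -w_L for the new basic law, gain m*w_P for m assimilated facts."""
    return -w_L + m * w_P
```

With, say, wL = 10 and wP = 1, a single assimilated fact leaves the net negative, while eleven facts tip it positive, mirroring the confirmation threshold m in the text.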
A general remark on the relation between unification and empirical
confirmation is in order. Unification yields empirical confirmation as a
by-product, but is not identical with it. Unification, and thus understanding, differs from confirmation in two important respects. First, if
in the shift diagram above, the law L stands in conflict with some
accepted theory T in K, then it might happen that the u-decrease of K
due to this conflict with T dominates the u-increase due to the unification of single facts, (−wL, m·wP), whence the addition of L to K will
decrease K's unification, although L is empirically confirmed. This
shows that the empirical confirmation of L reflects only a special kind
of unification, namely unification below - that is, roughly speaking, the
unification effect of L with respect to the subset of those phenomena in
K which get assimilated via L, or which are additional premises in the
assimilating connectors. In contrast, total unification reflects also all
unification effects above, e.g., how well L fits into higher level theories
in K. Does this mean that unification, although not identical with
empirical confirmation, coincides at least with 'overall confirmation'
(including confirmation effects below as well as above)? The answer is
no, because the second difference is that unification is intended to
reflect the real connections and thus gives causal connections a
preference over noncausal ones, whereas for 'overall confirmation',
such a preference is irrelevant.21
To summarize, empirical confirmation can be explicated in our
theory of unification as follows: Given C = ⟨K, I⟩ and a hypothesis H
(not contained in K), the set of K-phenomena below H, K/H, is the set of
all phenomena occurring (as premise or as conclusion) in connectors in
C + H which have H in their premise set. Then, H is empirically
confirmed in C + H iff the addition of H to C/H := ⟨K/H, I⟩ is
u-increasing.
The unification of a given corpus C not only depends on the number
and weights of basic and assimilated phenomena in K, but also on the
strength of the connections in C. So, it is natural to assume that if a
phenomenon P is assimilated in K via a connector Prem P weaker
than a deductive one, then some 'rest' cost of P still remains; let us
denote this rest cost by waP. waP is a value ranging between 0 and wP;
it is 0 if the connector is deductive; for inductive connectors it increases
with the decreasing probability value of the connector; for approximative-deductive arguments, it increases with decreasing approximation. Finally, waP also reflects the mentioned priority of causal
versus noncausal connections: waP is positive if P is assimilated by
means of a noncausal connector; the case waP = 0 holds only in case of a
deductive causal connector.
It may happen that the phenomenon P in question is already assimilated in K, and the answer increases its understanding by presenting a
new argument which makes P better assimilated and thus increases the
unification. In terms of shift diagrams, this situation can be represented
by a move "within Ka", Pa→a' = (P−a, P+a'); the subtraction dissolves the
assimilation of P in Ka via the old argument, and the addition
assimilates P in Ka by means of the new argument. The gain of this
sequence is (waP, −wa'P), which is positive whenever the new rest cost
wa'P is smaller than the old rest cost waP.
2.3. Spurious Unifications of the First Kind
We turn now to what Kitcher (1981, p. 526) calls one of the gravest
problems for any account of unification, namely the problem of distinguishing between genuine (scientific) unification and spurious unification. It is very important to distinguish between two different kinds
of spurious unifications, referred to as the "first kind" (which rest on
empirically contentless speculations), and the "second kind" (which
rest on logical irrelevancies and redundancies). The second kind of
spurious unification will be eliminated by our method of knowledge
representation in Section 3.1, by showing that arguments ibs containing
such irrelevancies do not really connect the phenomena and so on
logical grounds alone can't unify them. Here we discuss spurious
unifications of the first kind, which are philosophically deeper because
they really connect the phenomena, but they do not unify the data and
thus provide no scientific unification.
Assume the corpus K to contain many data of the same type, Dai
(1≤ i ≤ n), and the question is asked how they fit into K. The answer
assimilates them by postulating that the individuals ai have a certain
nonobservational (theoretical) property, Tai, connected with Dai via the
speculative law L = ∀x(Tx → Dx). This is a typical kind of speculation
without scientific value; anything can be 'understood' by postulating
hypotheses of this kind. For example, Dai might be the fact that it was
stormy on day ai, Tai says that Zeus was angry on day ai, and the
law L claims that whenever Zeus is angry, it is stormy. Or, to take
Kitcher's example, Dai might be anything, Tai says that god wants
that Dai is the case, and L says that whatever god wants to be the case, is
the case. The shift diagram is as follows:
K:       Kb = {Da1, . . . , Dan},   Ka = { }
shifts:  L+b   (u-value: −wL);
         (Tai)+b for i ≤ n   (u-values: (−wTai)i≤n);
         (Dai)b→a for i ≤ n   (u-values: (wDai)i≤n)
K + A:   Kb = {L, Ta1, . . . , Tan},   Ka = {Da1, . . . , Dan}
For each i ≤ n, the singular hypothesis Tai has no intrinsic gain, and its
intrinsic cost −wTai dominates its extrinsic gain, that is, the gain wDai of
the move (Dai)b→a, since Tai is theoretical; so ((−wTai)i≤n, (wDai)i≤n) < 0.
Even if the weights of Tai and Dai were equal, the net effect would be a
u-decrease because of the additional cost −wL of the newly added law.
Most importantly, this kind of answer yields a unification decrease for
any number n of inferred data Dai. This is the typical case of a
nonscientific speculation: the set of hypotheses {L, Tai | 1 ≤ i ≤ n} is not
empirically confirmable.
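With hypothetical numeric weights satisfying wT > wD, the u-decrease for every n can be checked directly; the function name is ours:

```python
def speculative_net(w_L: float, w_T: float, w_D: float, n: int) -> float:
    """Net u-value of the Zeus-style answer: -w_L for the speculative law,
    and per datum a theoretical-hypothesis cost -w_T against the gain w_D
    of assimilating that datum. Since w_T > w_D, the per-datum term is
    negative, so the total is negative for every n: the hallmark of an
    empirically unconfirmable speculation."""
    return -w_L + n * (-w_T + w_D)
```

Compare this with `net_law_gain`-style trade-offs for genuine laws: there, more data eventually produce a net gain; here, more data only deepen the loss.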
This shift diagram shows also why the exclusion of circular connections is essential for the elimination of spurious unifications of the first
kind. For example, assume the law L = ∀x(Tx → Dx) is replaced by the
stronger law L* = ∀x(Tx ↔ Dx), saying that Zeus is angry if and only if it
is stormy. If circular connections were admitted, then the Tai's and the
Dai's would count as assimilated and hence be in Ka, which would yield
a "u-increase" provided n is great enough. Of course, this "u- increase"
is a spurious unification, corresponding to the famous example where
one "explains" the storm by pointing out that Zeus was angry, and when
asked how he knows this, answers "because it is stormy". If two
phenomena P1, P2 are circularly related (via ∀x(P1x ↔ P2x)), only one
may get assimilated and thus unified with the help of the other, but not
both. So, in this situation there is a choice between two possible
unification classifications, one in which P1 is basic and P2 assimilated,
and vice versa. But since in many biconditional laws, one direction is
causal and the other noncausal, the preference of causal connections
over noncausal ones will determine the choice.
Another kind of nonscientific understanding is yielded by
explanation of empirical regularities by speculative "powers", e.g. the
explanation of the narcotic effect of opium by its "calming power".
Here L = ∀x(Ox → Nx) is the already accepted (and confirmed)
empirical law (O for 'opium', N for 'narcotic effect'), which is in
question, and the answer presents the two speculative laws T1 = ∀x(Ox →
Px) (P for 'calming power') and T2 = ∀x(Px → Nx). The shift diagram
shows that the result will be a unification decrease, because both T1 and
T2 have more (or at least not less) weight than L.
K:       Kb = {L},   Ka = { }
shifts:  (T1)+b   (u-value: −wT1);   (T2)+b   (u-value: −wT2);
         Lb→a   (u-value: wL)
K + A:   Kb = {T1, T2},   Ka = {L}
Let us, in contrast, discuss now a simple example of an empirically
significant theoretical law. Assume F1, . . . , Fm are empirically
described kinds of substances and G1, . . . , Gn are observational predicates; Kb contains the empirical laws Lik = x(Fix Gkx) for all
i {1, . . . , m} and k {l, . . . , n}. This is the typical situation where one
conjectures in science that there is a certain "intrinsic" and not directly
observable property of objects, say T(x), common to all empirical
substances F1, . . . , Fm, which has as its empirical effects the empirical
properties G1, . . . , Gn. T(x) may e.g. mean that "x has metallic
structure"; then the Fi are different kinds of metals, and the Gk are the
typical properties of metals. In the resulting shift diagram, m + n
theoretical laws, namely Ti = x(Fix Tx) for all i {l, . . . , m} and
T k = x(Tx Gkx) for all k {1, . . . , n}, are introduced in order to
infer m.n empirical laws Lik:
K:       Kb = {L11, . . . , L1n, . . . , Lm1, . . . , Lmn},   Ka = { }
shifts:  (Ti)+b for i ≤ m   (u-value: −m·wT);
         (T'k)+b for k ≤ n   (u-value: −n·wT);
         (Lik)b→a   (u-value: m·n·wL)
K + A:   Kb = {T1, . . . , Tm, T'1, . . . , T'n},   Ka = {L11, . . . , Lmn}
Assuming the Ti's and T'k's have approximately equal weight, and
similarly the Lik's, the net effect is (m·n·wL, −(m + n)·wT). Although
wT > wL (because T is more theoretical than L), we can assume that
wT is dominated by s·wL for sufficiently great s, whence the unification
will increase for sufficiently great m and n, because m·n ≫ m + n. So,
the answer, which presents this set of theoretical laws, increases our
understanding; and at the same time, the theoretical laws become
(simultaneously) empirically confirmed. This shift diagram also shows
the holistic character of the understanding yielded by theories: if just a
few single laws Ti are added to K, the result is always a u-decrease;
only if many Ti’s plus many T'k’s are simultaneously added, will the
unification increase. (Similarly, the confirmation of these theoretical
laws is holistic in nature). It is in this way that our basic theory accounts
for the difference between empirically significant and nonsignificant
theoretical laws, and thus enjoys an advantage over other approaches to
unification or coherence, such as those of Kitcher and of Thagard.22
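The crossover between the m + n theoretical costs and the m·n empirical gains can be illustrated with hypothetical numeric weights satisfying wT > wL (the function name is ours):

```python
def theory_net(w_L: float, w_T: float, m: int, n: int) -> float:
    """Net u-value of introducing m + n theoretical laws (weight w_T each)
    that jointly assimilate m*n empirical laws (weight w_L each)."""
    return m * n * w_L - (m + n) * w_T
```

With wL = 1 and wT = 5, adding only a few theoretical laws (m = n = 2) decreases unification, while a large enough web of them (m = n = 11) increases it, reflecting the holistic character of theoretical understanding described above.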
3. FORMAL DEVELOPMENT: AMPLIFICATION AND
REFINEMENT OF THE THEORY
3.1. Knowledge Representation by Relevant Elements and Spurious
Unification of the Second Kind
FL is a formal language rich enough to express the statements and
inferences contained in the cognitive corpus C of the underlying agent
AG (at a given time). FL contains at least the language of first order
predicate logic. Moreover, FL contains only logical symbols and variables for nonlogical symbols of certain types (in particular
propositional variables p, q, . . .; predicates F, G, . . .; individual
variables a, b . . . (free), x, y . . . (bound)). So a development of
knowledge which introduces a new descriptive notion of some type in
FL will not change the formal language FL. In the following, α, β, . . .
denote statements and Γ, Δ, . . . sets of them. α is a statement of FL if it
is well-formed according to FL, and Δ ⇒ α is an inference over FL if its
premises and conclusion are statements of FL. L denotes the full logic
of FL (and ├L denotes deducibility in L). In most applications L can be
assumed to be classical or free predicate logic. Vis-à-vis AG, L would be
the logic of AG were he logically omniscient; but since we do not assume
AG to be omniscient, his deductive inferential knowledge will be
weaker than L.
Recall that C = ⟨K, I⟩. K is the relevant representation of KNOW,
which is the set of all statements of FL which are known or believed by
AG, and I is the set of all arguments ibs over FL mastered by AG. Note
that the premises and the conclusion of these inferences need not be in
KNOW; indeed arguments with purely hypothetical premises are
crucial in thought-experiments. We adopt several scientific rationality
conditions, which fall into two classes: acceptability and closure conditions. Acceptability conditions are: (1) The inferences in I are correct
in a sense soon to be explained, (2) KNOW is logically consistent, and
(3) all statements in KNOW which are not logically or analytically true
are empirically confirmed (by the data in KNOW), at least to some
degree. The closure condition is: (4) KNOW is closed under all deductive and inductive arguments ibs in I.
That AG 'masters' the arguments Prem ⇒ Con in I means two things:
first, that AG is able to find a proof for Prem ⇒ Con, and second, that if
Prem were contained in KNOW, then AG would be aware that Con is
connected with Prem in KNOW by that argument, and so would also
accept Con, provided the argument is deductive or inductive. In fact, this is contained
in our closure condition.23
To obtain K, KNOW must be decomposed into 'minimal parts' such
that every 'part' corresponds intuitively to one elementary phenomenon. For this purpose, we adopt the method of "relevant elements".24
The basic definitions underlying the method are these:
DEFINITION 5a. α is a relevant conclusion of Γ iff Γ ├L α and there
exists no predicate which is replaceable in α on some of its occurrences
by any other predicate of the same degree salva validitate.
Propositional variables are regarded as predicates of degree 0. Typical
examples of irrelevant conclusions according to Definition 5a are
(the replaceable predicates are, respectively, q, q, q, and H):
p ├L p ∨ q;  p ├L q → p;  p ├L p & (q ∨ ¬q);
∀x(Fx → Gx) ├L ∀x((Fx & Hx) → Gx);
etc. In particular, if α is a relevant conclusion of Δ, then α's predicates
are contained in Δ's. The converse does not hold; e.g., p ├L p ∨ p is
irrelevant (for details cf. fn. 24).
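For the propositional fragment, Definition 5a can be checked by brute force over truth tables: a conclusion is irrelevant iff some single occurrence of a variable can be replaced by a fresh variable salva validitate (replacing by a fresh variable suffices to test replaceability by any other). A sketch under those assumptions; the tuple encoding and all function names are ours:

```python
from itertools import product

# Formulas as nested tuples, e.g. ('or', ('var', 'p'), ('var', 'q')).

def ev(f, v):
    """Evaluate formula f under assignment v (dict of variable -> bool)."""
    op = f[0]
    if op == 'var': return v[f[1]]
    if op == 'not': return not ev(f[1], v)
    if op == 'and': return ev(f[1], v) and ev(f[2], v)
    if op == 'or':  return ev(f[1], v) or ev(f[2], v)
    if op == 'imp': return (not ev(f[1], v)) or ev(f[2], v)

def vars_of(f):
    return {f[1]} if f[0] == 'var' else set().union(*map(vars_of, f[1:]))

def valid(prem, con):
    """Truth-table check that every model of prem is a model of con."""
    vs = sorted(vars_of(con).union(*map(vars_of, prem)))
    return all(ev(con, dict(zip(vs, bits)))
               for bits in product([False, True], repeat=len(vs))
               if all(ev(p, dict(zip(vs, bits))) for p in prem))

def count_occ(f, name):
    if f[0] == 'var': return int(f[1] == name)
    return sum(count_occ(g, name) for g in f[1:])

def replace_occ(f, name, fresh, state):
    """Replace the state[0]-th occurrence of variable name by fresh;
    state[1] is the running occurrence counter."""
    if f[0] == 'var':
        if f[1] == name:
            state[1] += 1
            if state[1] == state[0]:
                return ('var', fresh)
        return f
    return (f[0],) + tuple(replace_occ(g, name, fresh, state) for g in f[1:])

def relevant_conclusion(prem, con):
    """Definition 5a, propositional case: con must follow from prem, and
    no single variable occurrence in con may be replaceable by a fresh
    variable salva validitate."""
    if not valid(prem, con):
        return False
    return not any(valid(prem, replace_occ(con, name, '#fresh', [k, 0]))
                   for name in vars_of(con)
                   for k in range(1, count_occ(con, name) + 1))
```

This reproduces the examples above: p ⊢ p is relevant, while p ⊢ p ∨ q, p ⊢ q → p, and p ⊢ p ∨ p are not.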
DEFINITION 5b. α is a relevant element of KNOW iff α ∈ KNOW and:
(i) α is a relevant conclusion of KNOW, (ii) there exists no finite set Δ
of relevant conclusions of KNOW, each of which is shorter than α, such
that α is logically equivalent with &Δ, (iii) α is not the universal instantiation of some
β ∈ KNOW satisfying (i), nor the existential generalization of a
conjunction of such β's, and (iv) among all relevant conclusions
β ∈ KNOW logically equivalent with α and satisfying (i)-(iii), α is the
first according to some fixed, say alphabetical, enumeration of all
statements.25
K is now identified with the set of KNOW's relevant elements.
(Phenomena in K are denoted by capital letters, like P, Q, etc.)
Definition 5b contains the core of the method of knowledge
representation, which rules out spurious unifications of the second kind.
For instance, if P is a phenomenon in KNOW and I contains the
inferences of addition P ├L P ∨ α (for any sentence α), then infinitely
many irrelevant conclusions of the form P ∨ α are in KNOW. But, of
course, they don't really express 'new' phenomena beyond P and thus
can't contribute to the unification. They are excluded from K by (5b)(i);
whence the inferences P ├L P ∨ α can't be connectors over K. Clause
(5b)(i) also excludes irrelevant laws from K: if ∀x(Fx → Gx) is in
KNOW, then ∀x((Fx & Hx) → Gx) is not in K.
Clause (5b)(ii) decomposes relevant consequences into their
smallest relevant conjuncts. Examples of decompositions according to
(5b)(ii) are, e.g., p & q ⟼ {p, q}, ∀x(Fx ∨ Hx → Gx) & Fa ⟼
{∀x(Fx → Gx), ∀x(Hx → Gx), Fa}; but not p ⟼ {p ∨ q, p ∨ ¬q},
because here the conjuncts are irrelevant consequences of p. This
clause solves the famous 'conjunction paradox' besetting the approach
of Kitcher (1981).26 For any set of phenomena α1, . . . , αn in KNOW,
KNOW always contains a statement from which they all follow, namely
their conjunction α1 & . . . & αn (provided AG masters this inference).
But this conjunction is not really a new phenomenon beyond its
conjuncts; so it does not contribute to the unification. It is excluded
from K by clause (5b)(ii), whence inferences of the form α & β ├L α
can't be connectors over K.
Clause (5b)(iii) rules out yet further spurious unifications. For instance, every conjunction of the form Fa & Ga & . . . implies the existential statement ∃x(Fx & Gx & . . .); but this statement does not constitute a new phenomenon beyond {Fa, Ga, . . .} and thus does not
contribute to the unification. Similarly, every law ∀x(Fx → Gx) implies
the instantiation Fa → Ga for every individual a, but these
instantiations are not new phenomena and thus don't contribute to the
unification. For the same reason, breaking up homogeneous temporal
phenomena into smaller temporal parts does not produce new
phenomena. For example, if one knows that Mary was ill last week,
then one knows that Mary was ill during every second of time last week.
But this does not constitute unification of a multitude of new
phenomena.27
Among statements satisfying (5b)(i)-(iii) there might still be several
which are logically equivalent (e.g. ∀x(Fx → Gx) and ∀x(¬Gx → ¬Fx),
etc.) and thus express the same phenomenon. Clause (5b)(iv) guarantees that every elementary phenomenon is expressed by only one
relevant element. - We note finally that the same line of reasoning can
be applied to most of the explanation paradoxes in the debate on
D-N-explanation: it can be shown that the deductive-nomological arguments underlying them can't be connectors over K and thus can't
contribute understanding.28
Care must be taken in the interpretation of the dictum that every
element of K corresponds to exactly one real phenomenon, given Definition 5. Notice that K may also include disjunctions, for example,
"This substance is silver or tin" (Fa ∨ Ga), provided they are relevant
elements of K. This is the case, for instance, when K contains both the
law ∀x(Dx → (Fx ∨ Gx)) and the data statement Da, and the example
disjunction is inferred from them. For those reluctant to say that disjunctions, whether relevant or not, describe real phenomena, it would
be more appropriate to say that every element of K corresponds either to
a real elementary phenomenon or to a disjunction of such.
3.2. Kinds of Arguments ibs and Weak Connections
An argument Prem Con is correct if it belongs to one of the
following kinds of correct argument forms ibs:
(i) Deductive Arguments, denoted by Prem ├L Con, in which Prem
logically implies Con in accordance with the underlying logic L; in this
case correctness coincides with validity.
(ii) Approximative Arguments, denoted by Prem ⇒a Con. Central in
standard cases of unification in science, approximative inferences are
"indirect" applications of valid deductive inferences: Prem ⇒a Con is
correct if and only if there is a Con* such that Prem logically implies
Con* and Con* approximates Con.29
(iii) Inductive Arguments. These arguments are not truth preserving.
We distinguish between two kinds of inductive arguments,
probabilistic and nonnumerical. Both rely on a nondeterministic
covering law. In probabilistic arguments, this is a statistical law, the
basic form of which is p(G/F) = r (where p is the statistical probability).
According to the view of information transmitting arguments adopted
here, (i) the probability value must be 'sufficiently' high, at least greater
than 1/2, and (ii) the law must be positively relevant, that is
p(G/F) > p(G).30 Nonnumerical inductive arguments, developed
for example in nonmonotonic logic (McDermott and Doyle 1980),
rely on qualitative nondeterministic laws, here denoted by
F ⇒n G, and say that "if something is F, then it is G in the 'normal' case".
In an inductive argument, the conclusion Ga is inferred from a law L of
either type and the antecedent fact Fa.
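The two conditions on the statistical law of a probabilistic argument can be sketched as a simple check; the function name and the numeric values below are our own illustration, not from the text:

```python
# Sketch of the two conditions on the statistical law p(G/F) = r of a
# probabilistic argument ibs: high probability (r > 1/2) and positive
# relevance (p(G/F) > p(G)). Helper and numbers are hypothetical.
def qualifies_as_probabilistic_law(p_G_given_F: float, p_G: float) -> bool:
    high_probability = p_G_given_F > 0.5
    positively_relevant = p_G_given_F > p_G
    return high_probability and positively_relevant
```

For instance, p(G/F) = 0.9 with p(G) = 0.3 satisfies both conditions, while p(G/F) = 0.9 with p(G) = 0.95 is high but not positively relevant.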
A requirement for the correctness of an inductive argument of either
type is that it is correct only if there are no reasons in the remaining part
of one's knowledge to believe that, in spite of the acceptability of the
law and the antecedent fact, the conclusion is false or has a significantly different probability value. This point, recently emphasized in
nonmonotonic logic, has been explicated by Hempel as the requirement
of maximal specificity. Ga is inferrable from Fa and L only if L is
maximally specific for the individual a with respect to the knowledge
system K - abbreviated by MS(L, a/K).31 In contrast to Hempel, we do
not require the maximal specificity requirement as a separate condition on inductive arguments of the form "L, Fa ⇒i Ga". Rather, it is
treated as a special premise in the argument, as in nonmonotonic logic.
So inductive arguments have the general form
"L, Fa, MS(L, a/K) ⇒i Ga", where L is either a statistical law
p(G/F) = r satisfying the conditions mentioned above (in which case r is
simultaneously the inductive probability of the conclusion given the
premises), or it is a nonnumerical law of the form F ⇒n G. Inductive
inferences in
this reconstruction contain statements which say something about K
itself in their premises. Moore (1985) calls them 'autoepistemic' statements. We assume here that these autoepistemic statements are elements of K. The current reconstruction has the great advantage that it
enables a unified treatment of deductive and both sorts of inductive
arguments.
There exist further kinds of weak connections, which, though only
indirectly based on arguments ibs, are nevertheless embeddable in the
present theory.
Via-chain connections: Often a phenomenon P is connected with
some basic phenomena Prem ⊆ Kb via a chain of connectors
Prem ⇒ ... ⇒ P (e.g., in genetic explanations). In the inductive case,
via-chain connections are not reducible to single arguments; they offer
an additional kind of weak connection. So via-chain assimilated
phenomena can be included as an additional kind of weakly assimilated
phenomena in Ka (with a certain rest cost).32
Random-Connections: Probabilistic arguments ibs, in our sense, are
probability-increasing and high-probability arguments. Jeffrey (1971)
and Salmon (1984, p. 109) have convincingly argued that even low
probability arguments may provide understanding if they inform us
about the totality of causes of an event and hence about everything that
can be understood about that event. A low probability argument
informs about the "totality of causes" iff its probabilistic law
'L' = 'p(G/F) = r' is probabilistically complete in K, abbreviated by
PC(L/K), which means that according to K's bank of laws, there exists
no further knowable cause (or antecedent condition) statistically
relevant for G and not contained in F. Improbable events with probabilistically complete knowledge are the typical case of random events. For
example, there is nothing 'puzzling' in a royal flush in a poker play
(= G), because we know there is no 'hidden' cause of the royal flush
beyond the fact that it was a poker play (= F); it was just a matter of
accident. We can include such random events as weakly assimilated
phenomena by (i) adding to I an additional set of low probability
"inferences" of the form "p(G/F) = r, Fa, PC(L/K) ⇒l Ga" (where r is
low; "l" stands for "low") and (ii) defining a phenomenon Ga ∈ K as
random-assimilated if it is the conclusion of a low probability connector in C.
Contrast-Connections: Often we do not simply ask "Why P?", but
"Why P, rather than Q1, . . . ,Qn?", where {P, Q1, . . . , Qn} is called the
contrast class of P, abbreviated as Contr(P). For example, we want to
know why, among all runners (the contrast class), just John has won the
race. Van Fraassen (1980, p. 148) has convincingly argued that it is not
necessary in this case that the premises Prem of the inductive argument
increase the probability of P to a value greater than 1/2; it suffices that
Prem favour P against Contr(P), that is, the probability of P, given
Prem, is greater than the probability of every other member of Contr(P),
given Prem. Our probabilistic arguments ibs are just a special case of
these 'contrast-connections', namely those with the trivial contrast class
{P, ¬P}. We can include contrast-assimilated phenomena as weakly
assimilated phenomena by (i) extending the set I as in the case of
random-connections, (ii) assuming a function which assigns to
members P of K a contrast class Contr(P),33 and (iii) defining a
phenomenon P ∈ K as contrast-assimilated if there exists a set of
premises Prem in K which probabilistically favours P against Contr(P).
3.3. Expansion, Contraction and Revision of Cognitive Corpuses
According to a standard account in the dynamics of knowledge systems
(cf. Gärdenfors 1988, ch. 3), the A-successor of a knowledge system S,
S + A, is defined in terms of expansion, contraction and revision. There
are two main cases. If A is consistent with S, then S + A is the expansion
of S with respect to A. If A is inconsistent with S, then S must first be
contracted in a minimal way such that it does not imply ¬A. S + A is then
defined as the result of expanding the contracted subsystem by A; this is
called the revision of S by A. To give a general description, assume that
a knowledge system is a set of statements S constrained by certain
acceptability conditions AC and certain closure conditions CL. In
Gärdenfors (1988, p. 22), AC is consistency and CL is closure under
logical inference (logically omniscient knowledge). Our knowledge
systems are constrained by a much weaker CL, namely, closure under
all arguments in I, and a much stronger AC, namely that contingent
statements must be empirically confirmed (to some degree).
The expansion of S with respect to A, S+A, is defined simply as the
closure of S ∪ {A} under CL, provided S ∪ {A} satisfies AC; otherwise
S+A = S. The contraction of S with respect to A, S−A, is more difficult to
define. The problem is this. Suppose S implies A. Then there is usually
more than one subset of S with equal maximal cardinality that doesn't
imply A. A common solution is to order the subsets according to their
'importance' and to choose the most important set as the contraction. So,
S−A is that subset of S of maximal cardinality and maximal importance
which does not imply A (i.e., is consistent with ¬A), satisfies AC and is
closed under CL - provided such a set exists,34 otherwise S−A = S.
The revision of S with respect to A, S*A, then, is defined simply as
the closure of (S − ¬A) + A under CL, provided it satisfies AC; otherwise
S*A = S. The significance of AC in the process of 'adding' lies in the fact
that if the answer A contains an alleged new hypothesis, not empirically
justified, no expansion nor any revision can satisfy AC, whence S + A
will coincide with S. In short, the rational inquirer will simply not
accept the answer.
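Under strong simplifying assumptions the three operations can be sketched as follows; the toy logic (statements as atomic strings, implication as membership, acceptability as pairwise consistency) and all function names are our own illustration, standing in for the real L, AC, CL and importance ordering:

```python
# Toy sketch of expansion, contraction and revision for a finite
# statement set S. 'implies', 'closure', 'acceptable' and 'importance'
# are supplied stand-ins; nothing here is the authors' own machinery.
from itertools import combinations

def expansion(S, A, closure, acceptable):
    candidate = closure(S | {A})
    return candidate if acceptable(candidate) else set(S)

def contraction(S, A, implies, closure, acceptable, importance):
    # Among closed, acceptable subsets not implying A, pick those of
    # maximal cardinality, then the most important one.
    for size in range(len(S), -1, -1):
        candidates = [set(c) for c in combinations(S, size)
                      if not implies(set(c), A)
                      and acceptable(set(c)) and closure(set(c)) == set(c)]
        if candidates:
            return max(candidates, key=importance)
    return set(S)

def revision(S, A, neg, implies, closure, acceptable, importance):
    # S * A: contract by the negation of A, then expand by A.
    contracted = contraction(S, neg(A), implies, closure, acceptable,
                             importance)
    return expansion(contracted, A, closure, acceptable)
```

With this toy logic, revising {p, q} by ¬p drops p and adds ¬p, leaving {q, ¬p}, while an expansion that would violate AC leaves the system unchanged.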
A troublesome problem in the case of contraction is that, frequently,
clear criteria for 'importance' are missing. This problem is easily solved
here by means of the notion of unification: the most important subset
among all potential candidates for S - will be that which achieves the
greatest unification for S.
When a knowledge system is contracted with respect to ¬A,
previously irrelevant consequences may become relevant in the
contraction. For example, assume the law L := ∀x(Fx → Gx) is in K, and
the answer A contains the descriptive information Fa & Ha & ¬Ga,
which falsifies L.
In such a case, the scientist rarely if ever will drop the entire law, but
rather assert a weakened form of it by restricting its antecedent
(application domain) such that it becomes consistent with A.35 In this
case, the restriction would be L* = ∀x((Fx & ¬Hx) → Gx). However, L*
is an irrelevant consequence of L, which is in KNOW (provided AG
masters the 'trivial' inference L ⊢L L*) but not in K.
Therefore, contractions and revisions must be applied to KNOW
itself, and not to its relevant representation K. The resulting revised
knowledge system KNOW*A has to be split into 'relevant parts' again
before evaluating its unification. We proceed in the same way with
expansions, because the expansion K+A of K need not be a set of
relevant elements. To summarize, the A-successor of KNOW, KNOW +
A, is defined as KNOW+A[KNOW*A] if A is consistent [inconsistent]
with KNOW. By K + A, we mean the set of relevant elements of KNOW
+ A, where KNOW is the full knowledge system underlying its
representation K (and assumed implicitly with K).
For I, the definition of the A-successor is much simpler. Because
inferences can't contradict each other, I cannot be contracted or revised, but only expanded. Moreover, we have no closure conditions for
I. So, the expansion of I with respect to ∆ ⇒ P, I + (∆ ⇒ P), is just the
union I ∪ {∆ ⇒ P}, provided ∆ ⇒ P is a correct argument ibs;
otherwise I + (∆ ⇒ P) = I. This means that if the rational inquirer gets
an incorrect argument as answer, he won't accept it.
3.4. Dissimilated and Heuristically Assimilated Phenomena: The
Refined Picture of Unification
Assume, while at a market, that you see an elephant wandering around
(P). How is this puzzling fact P to be fitted into your cognitive corpus?
You ask (Q1): "Why is there an elephant walking around in the market?". Suppose the answer you receive is (A1): "Because the market
was designated an animal park some months ago". You are now able to
infer P from the answer (A1) and your background theory. Yet,
intuitively, (A1) does not provide any real understanding of P because
(A1) is even more puzzling than the original fact P. Though (A1) offers
a local connection, it does not unify the cognitive corpus because one
puzzling fact is simply exchanged for the even more puzzling fact, A1,
that the market was designated an animal park some months ago.
Suppose instead that (Q1) had evoked the answer (A2): "Because a
movie is being made at the market". (A2) is a typically "satisfying"
answer; it yields understanding of P despite the unlikeliness of your
market ever being chosen as a movie set. Nevertheless, such things do
happen, and, hence, there is nothing puzzling about it.
The problem this example poses for our basic theory of scientific
understanding is that the two answers (A1) and (A2) cannot be distinguished vis-à-vis their unification effect. What is needed, of course, is a
satisfactory analysis of what it means to be puzzling.
To be puzzling does not mean merely to be improbable. As shown in
Section 3.2, random events are low probability events which are not
puzzling at all. In contrast, the elephant walking in the market is not a
random event; one knows that there must be causes, at present unknown,
for the elephant's presence in the market. But - and this is essential to
our analysis of 'puzzling' phenomena - every imaginable cause of the
elephant's presence in the market, according to one's known bank of
laws and inferential knowledge, is very unlikely; there is strong
evidence against it. For elephants are usually kept in zoos or circuses,
and, it is being supposed, there are no such institutions around.
Moreover, even if there were, how could the elephant escape?
We informally characterize a puzzling - hereafter a dissimilated - phenomenon as one which stands in conflict with K, and, moreover, every
imaginable 'cause' or 'reason' for it, according to the laws and theories
of K, stands in conflict with K.36 To explicate this idea more precisely,
we say first that a singular phenomenon stands in conflict with K iff it is
unlikely, given the remainder of K, and is not random-assimilated in K.
Second, a singular phenomenon P in K is unlikely, given the remainder of K, iff there exists an inductive argument in I having ¬P as the
conclusion and satisfying all preconditions for detaching the conclusion
except that ¬P is in K (cf. fn. 31). This can be expressed more succinctly
by saying that in the (hypothetically revised) corpus C + ¬P, there
exists an inductive connector with ¬P as the conclusion. For a
phenomenon P not in K, being unlikely amounts to the condition that
there is a connector in C having ¬P as conclusion.
Third, an 'imaginable cause or reason' is explicated as follows. Let T
be a theory. A T-auxiliary statement is any statement in the language of
T describing a special condition, such as an initial or boundary
condition. Then ∆ is a set of 'imaginable causes or reasons' for a
phenomenon P in K iff ∆ is a set of T-auxiliary statements, for some
theory T in K, such that P is inferrable from T ∪ ∆ with an argument
in I - in other words, such that T ∪ ∆ ⇒ P is a connector in C + ∆.
(Note that ∆ need not be in K. The "imaginability" of ∆ is contained in
the requirement that ∆ appears in the premises of an inference mastered
by AG.)
Virtually the same analysis also applies to dissimilated law
phenomena. Think, for example, of the laws describing distant
correlations in quantum mechanics (the EPR paradox). They conflict
with the locality assumption of relativity theory; moreover, according
to Bell's theorem, other things being equal, every attempt to
understand these correlations in terms of hidden variables would be
inconsistent with the locality assumption. The only difference is that for
laws, the notion of "standing in conflict with K" must be explicated in a
nonprobabilistic way. Typically, a law L in K stands in conflict with K
iff some general theory T in K contains a special clause in its antecedent
excluding a special class of phenomena to which the phenomena
described by (the antecedent of) L belong, and this exception clause is
necessary to avoid inconsistency, that is, if T did not contain this special
clause, T and L would be inconsistent with K.37 For laws which are not
in K, to stand in conflict with K simply reduces to being inconsistent
with K. To summarize:
DEFINITION 6. A phenomenon P ∈ K is dissimilated in K iff (i) P
stands in conflict with K, and (ii) for every law or theory T in K and
every set of T-auxiliary statements ∆ such that T ∪ ∆ ⇒ P is a
connector in (C + ∆), at least one β ∈ ∆ stands in conflict with K.
It is clear, given Definition 6, why (A1) is dissimilated, but (A2) is
not. Of course, (A2) is unlikely, but one can imagine quite plausible
causes (∆) for making movies in markets. For example, one can
imagine that the director has chosen this location for a scene in which
an elephant makes a miraculous escape from his cruel trainer. Indeed,
given K, we don't know that this isn't the case; after all it is a causal
scenario which is not unlikely. This discussion leads to a second
important notion, the notion of a heuristically assimilated phenomenon:
DEFINITION 7. A phenomenon P ∈ K is heuristically assimilated in
K iff there exists a law or theory T in K and a set of T-auxiliary
statements ∆, none of which stands in conflict with K, such that
T ∪ ∆ ⇒ P is a connector in (C + ∆).
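The interplay of Definitions 6 and 7 can be sketched as a toy decision procedure; the conflict test and the candidate reason sets ∆ are supplied from outside, and the elephant answers below are a hypothetical illustration of the two outcomes:

```python
# Toy classifier for Definitions 6 and 7. 'conflicts' decides whether a
# statement stands in conflict with K; 'reason_sets' are the candidate
# sets Δ of T-auxiliary statements yielding a connector for P.
def status(P_in_conflict: bool, reason_sets, conflicts) -> str:
    has_clean_reason = any(all(not conflicts(b) for b in delta)
                           for delta in reason_sets)
    if has_clean_reason:
        return "heuristically assimilated"   # Definition 7
    if P_in_conflict:
        return "dissimilated"                # Definition 6
    return "neither"

# Elephant example (illustrative): every imaginable reason for A1
# conflicts with K, while A2 has a conflict-free imaginable reason.
conflicts = lambda b: b != "a movie scene needs an elephant"
```

Any phenomenon with at least one conflict-free reason set escapes dissimilation, which is exactly why (A2) but not (A1) provides heuristic assimilation.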
The notion of heuristically assimilated phenomena plays an important
role in the scientific understanding of empirical facts by scientific theories. To borrow an example from Kitcher (1981, 512-515), the success
of Darwin's theory of evolution as a process of mutation and selection
was not so much due to its actual capacity to provide understanding of
the appearance and disappearance of particular species, but rather to its
promise as a picture in terms of which the evolutionary facts can be
inferred, if only the empirical initial and boundary conditions of evolution could be known.
The groundwork for a subtler treatment of unification having now
been laid, let Kβ be, as before, the best unification basis such that every
phenomenon in Ka = (K − Kβ) is assimilated with premises in Kβ (either
directly or 'weakly' in the sense of Section 3.2). In contrast to Section
2.2, not all phenomena in Kβ are 'basic'; Kβ rather divides into three
important subsets: Kh is the set of heuristically assimilated phenomena,
Kd is the set of dissimilated phenomena, and Kb = (Kβ − (Kh ∪ Kd)) is the
set of basic phenomena. Basic phenomena are primarily those which are
not in the range of application of a law or theory, and hence can neither
be dissimilated nor heuristically assimilated. For example, every
fundamental law or theory in K is basic.
The method of assessing the unification via the costs and gains of
phenomena (in terms of their weight) remains the same, except that
now dissimilated, basic and heuristically assimilated phenomena must
be distinguished. Evidently the intrinsic cost of adding a dissimilated
phenomenon is much higher than that of adding a basic phenomenon,
which in turn is much higher than adding a heuristically assimilated
phenomenon. Expressing the the absolute value of the costs in terms of
weights, we have wdP>wP >whP > 0; where wdP, wP and whP stand for
the weight of obtaining P as dissimilated, basic or as heuristically
assimilated phenomenon, respectively. The intrinsic gain of P is given
as in the basic picture: independent of P's position, it is wP if P is a
datum, and zero otherwise.
According to these rules, the u-values of additions are calculated as
follows. For data: u(D+d) = −(wdD, −wD) [= (−wdD, wD)] - the negated
form indicates that the pair is negative; u(D+b) = 0; u(D+h) = (wD, −whD)
- the unsigned form indicates that this pair is positive;
u(D+a) = wD [or (−wD, waD), respectively]. For hypotheses:
u(H+d) = −(wdH); u(H+b) = −wH; u(H+h) = −whH; and u(H+a) = 0
[or −waH, respectively]. As in the basic picture, the u-values of
subtractions P−x and of moves Px→y (x and y ranging over d, b, h, a) are
calculated from additions: u(P−x) = −u(P+x), and u(Px→y) = (u(P−x),
u(P+y)). For example, u(D−d) = (wdD, −wD); u(Pd→h) = (wdP, −whP).
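For concreteness, the bookkeeping for data can be sketched numerically, collapsing each cost/gain pair into its net value; the weight numbers wP = 1, wdP = 2, whP = 0.5 are hypothetical, chosen only to respect wdP > wP > whP > 0:

```python
# Net u-values for shifts of a datum D, with cost/gain pairs collapsed
# into sums. Weights are hypothetical: wP = 1, wdP = 2, whP = 0.5.
wP, wdP, whP = 1.0, 2.0, 0.5

# Net u-values of adding D to each field:
u_add = {
    "d": -(wdP - wP),  # cost wdP, gain wP: negative
    "b": 0.0,          # cost wP, gain wP: zero
    "h": wP - whP,     # cost whP, gain wP: positive
    "a": wP,           # pure gain
}

def u_sub(x):
    """u(D-x) = -u(D+x)."""
    return -u_add[x]

def u_move(x, y):
    """u(Dx->y) = u(D-x) + u(D+y), collapsed to a net number."""
    return u_sub(x) + u_add[y]
```

For instance, u_move('d', 'h') comes out as wdP − whP = 1.5, matching the pair u(Pd→h) = (wdP, −whP) above, and u_move('d', 'a') nets wdP.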
The weight of a dissimilated phenomenon P is the higher, the
stronger the conflict between P (and P's possible reasons) and the
remainder of K. The natural maximum of dissimilation is realized when
every imaginable reason for P is simultaneously excluded as impossible
for certain theoretical reasons; whence there exists no possible reason
for P in C. An example is the EPR paradox generated in the context of
relativity theory plus quantum mechanics; every possible cause for
distant correlations is excluded by the locality assumptions of relativity
theory.
The lower limit of wdP has to be significantly greater than wP. (A
plausible assumption is wdP ≥ 2·wP, because obtaining P as dissimilated
means that previously ¬P was expected in K; so we lose ¬P as
assimilated and obtain P as now dissimilated; we assume w¬P = wP.) The
weight of a heuristically assimilated phenomenon, whP, ranges between
wP and 0, and is the greater, the more likely P's possible reasons are in
K, and the stronger the underlying connector.
Generally speaking, the phenomena in fields "h" and "a" of shift
diagrams are somehow assimilated and thus contribute positively to
unification, while the phenomena in fields "d" and "b" contribute
negatively to unification. This is not exactly reflected in the case of the
addition of hypotheses, because our 'simple method' gives all hypotheses zero-gain. But if we employ the more refined method of calculation mentioned in fn. 16 it follows, for every empirically confirmed
phenomenon P (confP > 1/2), that the u-value of its addition to d or b
will be negative, and the u-value of its addition to h and a will be
positive.
4. APPLICATIONS TO CASE STUDIES AND IMPLICATIONS
4.1. Consonant versus Dissonant Cognitive Changes
Consider again the example of purely factual innovation in Section 2.2:
P is the puzzling fact that the bottle of water on the balcony has burst, A
is the new antecedent fact that the temperature has fallen below 32°F,
and L is the respective covering law already known in K. Now A not
only is not puzzling, but is heuristically assimilated; possible
causes for the fact that temperature has fallen below freezing are easy
to give. The development in knowledge is pictured thus (it does not
matter where L lies):
[Shift diagram, fields d | b | h | a. In K, P lies in d; in K + A, A lies in
h and P in a. Shifts: A+h with u-value (wA, −whA), and Pd→a with
u-value wdP.]
In contrast to the basic picture, the addition of A to K in the refined
picture is u-increasing rather than u-decreasing because A gets heuristically assimilated. So every shift in this shift diagram is u-increasing.
That is why the answer seems so "good". On the other hand, the
example of the elephant in the market (= P) and the puzzling answer
that the market has been designated an animal park (= A) has the
following shift-diagram:
[Shift diagram, fields d | b | h | a. In K, P lies in d; in K + A, A lies in
d and P in a. Shifts: A+d with u-value −(wdA, −wA), and Pd→a with
u-value wdP.]
This diagram consists of the u-decreasing shift A+d and the u-increasing
shift Pd→a. Whether they are in sum u-increasing or u-decreasing
depends on the degree of dissimilation of A in (K + A) compared to that
of P in K. Since in our example the answer A is even more puzzling than
P, it might be that (wdA, −wA) > wdP, which would imply a total
u-decrease. If, on the other hand, A and P are of approximately equal
degree of dissimilation, then the total effect would be a u-increase (in
the amount of wA, the intrinsic gain of A). An example would be an
answer like "The elephant escaped from the circus over there", which
would undoubtedly still puzzle the inquirer, but probably less than
before. However, independent of such variations, the essential
characteristic of the latter kind of shift diagram is that it contains both
u-increasing
and u-decreasing shifts. This situation is different from the shift
diagram above, where every shift is u-increasing. (The same holds for
the answer A2 in our elephant example).
This gives rise to an important distinction. The understanding
provided by a u-increasing cognitive change (and, also, the cognitive
change itself) is called consonant if its shift diagram consists of only
u-increasing shifts. If the shift diagram also contains u-decreasing
shifts, the understanding provided (as well as the cognitive change
itself) is called dissonant. It follows from our rules that a consonant
cognitive change adds or moves phenomena only into Ka or Kh,38 but
never into Kb or Kd; a dissonant change does. Dissonance comes in
natural degrees; a cognitive change is more dissonant the more and the
stronger u-decreasing shifts it contains.
The intuitive notion of a really satisfying answer to an
understanding-seeking how-question, one which is "clear" and "beyond
any doubt", is
an answer which provides consonant understanding. An example is the
answer in the first of the two preceding shift diagrams. In contrast, the
understanding yielded by the answer in the second shift diagram is
clearly dissonant. Intuitively, one would reflect its dissonance by
saying: "A may give one some understanding of P; but nevertheless one
doesn't really understand A!".
There are some close connections between these notions of
consonant and dissonant cognitive changes and Kuhn's notions of
normal and revolutionary science. If K is a richly developed knowledge
system (as scientific disciplines typically are), the only basic
phenomena in K are fundamental theories, because every fact or
empirical law will fall in the range of some theory in K and thus be
either assimilated, heuristically assimilated or dissimilated. Scientific
development remains 'normal' as long as new empirical phenomena can
at least be heuristically covered by the accepted fundamental theories - the accepted "paradigm" in Kuhn's sense. Thus understanding in
normal science is consonant. Normal science becomes unstable when
phenomena are discovered which resist heuristical assimilation and
obstinately remain dissimilated. They correspond to Kuhn's
"anomalies". In such stages, scientific understanding becomes
dissonant. Sometimes (but not always) this leads to Kuhn's famous
"revolutionary" stage where the old paradigm is replaced by a new one.
Since a shift diagram is dissonant not only when a new dissimilated
phenomenon is added, but also when a new basic phenomenon is added,
such paradigm changes are necessarily dissonant.
There is, however, an important difference between Kuhn's view and
ours. Whereas Kuhn's paradigms involve an inescapable element of
incommensurability and thus, subjectivity, our approach enables an
objective comparison of scientific stages with different paradigms via
their success in unification, and thus, an objective notion of progress in
scientific understanding.39
4.2. A Case Study of an "Aha!-Experience"
The present theory neatly explains the typical cognitive change
underlying a "sudden flash of insight" - an "Aha!" experience. It is a
consonant change consisting of a cascade of parallel strongly
u-increasing moves - typically d → a moves - in which previously
puzzling phenomena become assimilated via a fact which is itself
heuristically assimilated. We give an example from psychology. To
simplify the following shift diagrams, we will omit the arrows
representing single shifts and draw only K and K + A. There is no loss in
information because the single shifts plus their associated u-values can
be reconstructed from K and K+A.
Assume one knows (Fl) that a certain friend always behaves in a
stable manner, (F2) that he seems to be extroverted, (F3) that sometimes he seems never to be around, and at other times is often present,
but when next seen, (S) his wrists are observed to have been slashed.
The surprising fact S provokes the question "Why?". His answer (A1) "I
tried to kill myself!" seems to explain the slashed wrists, but is rather
puzzling because F1 and F2 make it very unlikely that he would attempt
suicide; in other words, no plausible cause of the fact in A1 presents
itself. More importantly, F1 itself becomes puzzling in virtue of A1; it
also becomes difficult to maintain F2 as heuristically covered; so, we
assume, F2 becomes basic. The shift diagram for this change is as
follows:
[Shift diagram, fields d | b | h | a. In K1: S in d; F3 in b; F1 and F2 in
h. In K2 (= K1 + A1): F1 and A1 in d; F2 and F3 in b; S in a.]
The chain of shifts leading from K1 to K2 clearly is u-decreasing, since
the gain of Sd→a is dominated by the costs of A1+d, F1h→d and F2h→b,
respectively (assuming equal weights w and wd ≈ 2·w, wh ≤ w/2, we
have 2·w gain but ≥ 3·w costs).
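The arithmetic behind "2·w gain but ≥ 3·w costs" can be checked directly under the stated assumptions; the numeric choices w = 1, wd = 2, wh = 0.5 instantiate "equal weights w, wd ≈ 2·w, wh ≤ w/2":

```python
# Check of the K1 -> K2 assessment: the gain of S: d->a versus the
# costs of A1+d, F1: h->d and F2: h->b, with w = 1, wd = 2, wh = 0.5.
w, wd, wh = 1.0, 2.0, 0.5

gain = wd                       # S: d->a nets wdS = 2w
cost_A1 = wd - w                # adding A1 as dissimilated
cost_F1 = (w - wh) + (wd - w)   # F1: h->d (lose h-gain, pay d-cost)
cost_F2 = w - wh                # F2: h->b (lose h-gain, b is neutral)
costs = cost_A1 + cost_F1 + cost_F2
```

With these numbers the gain is 2·w and the costs sum to 3·w, so the whole chain is u-decreasing by w.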
Now suppose that when asking "But why, in Heaven's name, did he
try to commit suicide?", one gets the answer (A2) that the person has
been manic-depressive for some years. This answer will doubtlessly
provoke an Aha! experience. (F3) suddenly becomes important because
it enables one to understand why the person seemed mentally stable in
spite of his psychological illness, that is, (F3) now prevents (F1) and
(F2) from being counterevidence to (A1), the person's asserted attempted suicide. Furthermore, (A2) is itself no longer puzzling but is
heuristically assimilated, because there are several plausible reasons for
the onset of manic-depressive disease. The cognitive change, as is evident in the shift diagram below, is dramatic:
[Shift diagram, fields d | b | h | a. In K2: F1 and A1 in d; F2 and F3 in
b; S in a. In K3 (= K2 + A2): A2 in h; F1, A1, F2, S and F3 in a.]
4.3. A Scientific Case Study: The Development of the Atomic Model
In advanced science, singular observations are usually omitted; science
rather starts with general observational phenomena (laws) as its "data",
like the empirical formula relating volume and pressure of a gas, or that
describing the spectrum of hydrogen. (The inductive step from singular
to general 'data' is implicitly contained in what is called the
reproducibility of experiments.) We adapt our approach to this
advanced situation by treating general scientific data like singular data,
i.e., we associate intrinsic gains with them.
It was known in the late 19th century that atoms emit light when
excited by light or heat, where the frequencies of these emissions obey
a highly regular and discrete pattern. This frequency pattern was
empirically well determined for hydrogen. It is described by the
formula f = R(1/n2 – 1/m2), where f = the frequency, n and m are
integers, and R = Rydberg's constant. For n = 2 the formula yields the
famous Balmer series, discovered in 1885; n = 1 yields the
Lyman series, n = 3 the Paschen series and n = 4 the Brackett series. We
label the laws describing these spectral data SPEC. There was, of
course, a strong desire to have understanding of the frequency pattern
according to SPEC.
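The frequency formula can be evaluated directly; with the Rydberg frequency R ≈ 3.29 × 10^15 Hz (a standard physical value, not given in the text), the n = 2, m = 3 line comes out near the familiar red Balmer line at about 656 nm:

```python
# Frequencies from f = R(1/n^2 - 1/m^2) for hydrogen, with R the
# Rydberg frequency (approx. 3.2898e15 Hz) and c the speed of light.
R = 3.2898e15   # Hz
c = 2.998e8     # m/s

def frequency(n: int, m: int) -> float:
    return R * (1.0 / n**2 - 1.0 / m**2)

# First Balmer line (n = 2, m = 3):
f = frequency(2, 3)
wavelength = c / f   # about 6.56e-7 m, the red H-alpha line
```

Running through n = 1, 3, 4 for the first argument yields the Lyman, Paschen and Brackett series in the same way.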
Since Hertz and Maxwell it had been known that light consists of
electromagnetic waves. Late in the 19th century Thomson proposed
that atoms consist of "positive matter" in which the negative electrons
are embedded. We shall label Thomson's theory of the atom ATh,
classical mechanics MECH, classical electrodynamics (including
electrostatics) EDYN and the electromagnetic wave theory of light
EWL. (MECH and EDYN - and
maybe also EWL and ATh - consist of several 'parts' and thus correspond not to single relevant elements but to sets of such. But, most
importantly, the following u-comparisons are independent of any specific logical reconstruction of these theories.)
Now ATh + EDYN + MECH + EWL induced the expectation that
SPEC would be derivable. For assuming the electrons to be oscillating,
atoms according to ATh would behave like Hertz dipoles; so if the
oscillation frequencies of these electrons were understood, then the
emission of certain light frequencies could be understood via the
electromagnetic radiation emitted by the Hertz dipoles. In this state of
knowledge, call it K0, SPEC was not at all puzzling, but rather was to
some extent heuristically assimilated.
In 1911, Rutherford's experiments with α-beams showed that
α-particles can go through thick layers of atoms, although some of them
are scattered at large angles. We label these experimental data ERu.
Rutherford concluded that electrons in an atom have to be quite far
away from a compact positive nucleus (so that the α-particles either fly
through these "holes" in atoms, or they bounce off the nucleus). So ATh
was falsified (disconfirmed). Together with Bohr, Rutherford proposed the famous "planetary" model of atoms (ARuB), according to
which the electrons surround the positive nucleus in certain circular
orbits. ARuB was embedded in classical mechanics (centrifugal forces)
and electrostatics (Coulomb forces between nucleus and electrons). It
provided a way to explain ERu. Moreover, it evoked the hope that the
emission spectrum (SPEC) could be understood as caused by the
electron's change of orbits. But then several stunning dissimilations
emerged.
First, according to MECH, every orbit is equally possible. So it was
puzzling why the lines governed by SPEC were sharp and discrete,
rather than continuous (as predicted by MECH), leading to the dissimilation of SPEC. Second, according to EDYN, the circulating electrons
should emit energy, and, hence, collapse into the nucleus. So EDYN,
applied to ARuB, led to a prediction violating the data (namely the
apparent stability of the hydrogen atom). But ARuB could not be rescinded in the light of ERu. So ARuB itself became dissimilated in the
light of EDYN, a dissimilation which was maximal in the sense that
there is no possible 'explanation' (assimilation) of the stability of atoms
from the point of view of EDYN. Indeed, formal consistency could be
achieved only by restricting EDYN in certain ad hoc ways, for example,
by holding that the EDYN principles for oscillating dipoles don't apply to
electrons in atoms. These ad hoc restrictions, call them REDYN, have a
strong u-decreasing effect, because they imply that many microphysical phenomena, previously heuristically assimilated in virtue of EDYN,
are no longer heuristically assimilated because EDYN no longer
applies to them. We denote this loss of previously heuristically
assimilated phenomena simply as LOSS; for instance, every previously
heuristically assimilated phenomenon about electrons in atoms is now
in LOSS.
Summarized, the shifts leading from K0 to this new knowledge situation, K1, are as follows. (We don't mention EWL in the diagram
because it does not change its position):

        K0                  K1
  d:    —                   ARuB, SPEC
  b:    ATh                 REDYN, LOSS
  h:    SPEC, LOSS          —
  a:    EDYN, MECH          EDYN, MECH, ERu
Two u-increasing shifts, namely (ERu)+a and (ATh)-b, stand against
several strongly u-decreasing shifts: (ARuB)+d, (REDYN)+b,
(SPEC)h→d and (LOSS)h→b. So the total effect is strongly u-decreasing.40
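The bookkeeping behind such a shift diagram can be sketched in a few lines of code. The following is a minimal sketch of our own: the dictionary representation, the compartment codes and the reconstructed K0 are ours, chosen only to mirror the paper's shift notation; the text prescribes no implementation.

```python
# Toy bookkeeping for the K0 -> K1 transition described above.
# Compartments: a = assimilated, h = heuristically assimilated,
# b = basic (unassimilated), d = dissimilated.

def add(K, P, x):          # shift (P)+x
    K[P] = x

def remove(K, P, x):       # shift (P)-x
    assert K[P] == x
    del K[P]

def move(K, P, x, y):      # shift (P)x->y, decomposed as ((P)-x, (P)+y)
    remove(K, P, x)
    add(K, P, y)

# K0 as reconstructed from the text
K = {"EDYN": "a", "MECH": "a", "SPEC": "h", "LOSS": "h", "ATh": "b"}

add(K, "ERu", "a")         # (ERu)+a    u-increasing
remove(K, "ATh", "b")      # (ATh)-b    u-increasing
add(K, "ARuB", "d")        # (ARuB)+d   u-decreasing
add(K, "REDYN", "b")       # (REDYN)+b  u-decreasing
move(K, "SPEC", "h", "d")  # (SPEC)h->d u-decreasing
move(K, "LOSS", "h", "b")  # (LOSS)h->b u-decreasing

print(sorted(K.items()))   # the resulting classification K1
```

Applying the six shifts in sequence reproduces the K1 column of the diagram.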
The familiar cognitive corpus had been thoroughly shaken. It was in
this context that Bohr suggested his famous stability postulate, PB, that
stable electron orbits are defined by the condition that their angular
momentum is an integer multiple of Planck's quantum. Let us call this
new cognitive stance K2. Based on the assumption that the spectral
frequencies correspond to the energy differences between the electron's
orbits, Bohr successfully derived SPEC from PB (plus ARuB) with impressive exactness. So there was the strongly u-increasing move
(SPEC)d→a. But at the cost that in K2, PB was now another maximally
dissimilated hypothesis, because PB is inconsistent with the continuity
principle of MECH (according to which electron orbits are indistinguishable).41 Accordingly, ad hoc restrictions for MECH (labelled
RMECH) were called for, which excluded microphysical objects from the
application range of the continuity principle. This again caused the
loss of several previously heuristically assimilatable phenomena, which
we call LOSS*. The shift-diagram is as follows:
        K1                  K2
  d:    ARuB, SPEC          PB, ARuB
  b:    REDYN, LOSS         REDYN, RMECH, LOSS, LOSS*
  h:    LOSS*               —
  a:    EDYN, MECH, ERu     EDYN, MECH, ERu, SPEC
In summary, one u-increasing shift (SPEC)d→a stands against three
u-decreasing shifts (PB)+d, (RMECH)+b, and (LOSS*)h→b. Even if one
thinks that the u-decrease generated by these latter shifts is compensated by the admirable success of Bohr's derivation of SPEC, the
change certainly is not consonant, but rather is quite dissonant.
Moreover, if one compares K2 with K0, then the total effect is still
strongly u-decreasing. This explains why most physicists think that
Bohr's derivation provided no real understanding.
What happened when, in 1923, de Broglie proposed the duality principle that every particle has a certain wavelength which corresponds to
its momentum, and when, in 1925, Schrödinger gave his quantum
mechanical interpretation of Bohr's postulates? Let us designate all of
the elements of the new paradigm as QUMECH. QUMECH replaced
MECH, and later quantum electrodynamics (QUEDYN) replaced
EDYN. QUMECH, together with parts of QUEDYN, provided an
approximative derivation and thus a theoretical understanding of ARuB
and of PB; moreover LOSS and LOSS* were again theoretically covered (in part actually, and in part heuristically). Nevertheless, some
riddles, like the famous EPR paradox, still remained in QUMECH; we
denote these new dissimilated elements by ?QU. The shift diagram
leading to this new situation (K3) is this:
        K2                           K3
  d:    PB, ARuB                     ?QU
  b:    REDYN, RMECH, LOSS, LOSS*    —
  h:    —                            LOSS, LOSS* (in part)
  a:    EDYN, MECH, ERu, SPEC        QUMECH, QUEDYN, PB, ARuB, ERu,
                                     SPEC, LOSS, LOSS* (in part)
This transition generates an overwhelming u-increase because of the
strongly u-increasing moves (ARuB)d→a, (PB)d→a, (LOSS)b→h/a and
(LOSS*)b→h/a. Yet the transition is not consonant because of the riddles
in ?QU. This explains why (almost) all modern day physical scientists
accept QUMECH, and yet many of them feel that the scientific understanding afforded by QUMECH is less than satisfactory.42
4.4. Kinds of Understanding
Both understanding-why and understanding-about are understanding-families, which consist of the following subkinds.
Understanding-why i.b.s. (in the broad sense): Every argument which
contributes understanding-why-P contains P as its conclusion and thus
has the effect that P makes a u-increasing shift, either into Ka or into Kh.
Depending on the nature of this shift, three main kinds of
understanding-why can be distinguished. Understanding-why P i.n.s.
(in the narrow sense) occurs when P moves from Kd or Kb into Ka.
Understanding-how-possible P occurs when P moves from Kd or from
Kb to Kh, that is, when the answer gives only 'possible' causes or reasons
and thus enables a heuristic assimilation of P. For example, suppose
one sees the elephant walking around in the marketplace, asks "How is
that possible?", and gets the answer that possibly the elephant came
from the nearby circus.43 Finally, understanding how P typically occurs
if P moves from Kh to Ka, or if P, already assimilated, gets better
assimilated by an answer which presents more detailed causes. In the
elephant example, an instance is provided if one knows already that the
elephant (possibly or definitely) came from the nearby circus, asks "But
how?", and gets the answer that the caretaker forgot to lock the cage.
Understanding-about: A scientifically important case of understanding-about occurs when the phenomenon P is a fundamental law
(or theory). To give an example, assume that an expert is asked why, in
Newton's theory, bodies retain their velocity when acted upon by no
external forces (= P)? It would not be possible for him to contribute any
understanding-why-P because P is a fundamental law of the underlying
theory. "But", he might add, "I can tell you how it fits into the theory. It,
along with many different kinds of antecedent conditions, allows one to
infer many interesting phenomena". Indeed, the expert's answer does
improve our understanding of P because it shows the role P plays as a
premise in assimilating, and thus unifying, many phenomena in K. The
answer thereby increases unification because it reveals new
connections (with P in the premises). We shall call this kind of
understanding role understanding.44
A different case of understanding-about occurs when an answer
causes a previously dissimilated phenomenon P to become basic by
removing certain counterarguments against it. It typically occurs when
a scientist postulates a new fundamental phenomenon P in order to
make certain facts explainable, but when asked "Why P?", the only
available answer is "Why not?", that is, to show that P - although basic
is at least not dissimilated. We call this kind of understanding-about
'why-not-not' understanding.
A subtle kind of understanding-about is functional, or teleological,
understanding. Here the phenomenon P produces a certain effect E
which has a high value for the underlying object o which is the common
topic of concern of P and E. So, the general form of a functional or
teleological argument is Prem ∪ {P(o)} ⇒ E(o), where Prem contains the laws
and boundary conditions and E has high value for the object o. In
natural language such arguments are paraphrased as "P(o) in order to do
E(o)". Functional understanding in the Rickenbacker example is
afforded by the assertion that his appetite was blocked in order to
decrease the rate of depletion of bodily tissues.
The scientific importance of functional understanding lies in this
feature: the answer, which explicitly provides only an
understanding-about P, enables implicitly a kind of
understanding-why P. For, in the typical case of functional
understanding, there is a special background theory in K which
enables a heuristic assimilation of P whenever P is shown to have
certain valuable effects. For example, the background theory may be
evolutionary theory - which, in the Rickenbacker example, tells us that
decreasing the rate of depletion of bodily tissues in the state of near
starvation increases the chances of survival and thus offers us a
heuristic account of the evolutionary development of appetite-blocking
mechanisms. Similarly in the case of the teleological understanding of
actions, an answer like "Person p did action A in order to reach goal G"
generates, by common sense background theory, the causal argument
that p did A because p intended to reach G.45
4.5. On the Relation between Understanding and Explanation
Many philosophers believe that understanding and explanation are
conceptually interrelated as follows (cf. fn. 2): explanations are the
means of understanding (EMU). In our theory, the means of making
phenomena understandable are the arguments ibs claimed to be sound
by adequate answers to understanding-seeking questions; we call them
understanding-constituting arguments. So, if explanations are the
means of understanding, it would be natural to identify explanations
with these understanding-constituting arguments (or with a paraphrase
of them in the form of an explanatory answer). So we can express the
thesis (EMU) by the following biconditional: Every explanation is (or
paraphrases) an understanding-constituting argument, and vice versa.
But is (EMU) true?
Since the answer depends on the underlying notion of explanation,
an uncontroversial answer cannot be expected. Still, the following
points are worth noting. Consider first the direction (→) of the biconditional: Is every explanation an understanding-constituting argument?
Most investigators, with the exception, perhaps, of van Fraassen, are
inclined to believe that providing understanding is a necessary
condition for explanation. However, the theories of explanation
developed by many of them, for instance by Hempel and Salmon,
make appeal only to local connections and fail to include the important
global feature of increasing unification. But the examples in the
previous sections have shown that establishing a local connection, even
if it contains the complete list of causes, as required by Salmon, in
general is not sufficient to yield understanding. To reemphasize the
point, if one asks his (apparently stable) friend "Why are your wrists
cut?", and then gets the answer "I cut them with a sharp piece of glass",
he knows the causes, but does not yet really understand (why should his
friend have done that?). If the conditional (→) is taken seriously, then
an explanation can be adequate only if it increases global unification.
Accordingly, if one insists on a local notion of explanation one must
reject the conditional (→).
Consider now the direction (←) of (EMU): Is every kind of understanding-constituting argument a kind of explanation? Since the current
theory of understanding includes many kinds of understandings,
anyone who holds a more restricted notion of explanation will say no.
For instance, if one restricts scientific explanation to why-explanations
i.b.s., then any argument which contributes only understanding-about
will fail to correspond to a case of explanation. (This is even more
evident if one thinks that every explanation is a potential prediction, as
did early Hempel, or depicts a complete set of causes, like Salmon, or
favours the explanandum within a given contrast class, like van
Fraassen; all these constitute only special kinds of understanding).
On the other hand, there seems to be no really strong reason why one
can not speak of a corresponding kind of explanation for every kind of
understanding. For instance, it makes perfectly good sense to speak of
role-explanations as the means of role-understanding; the same holds
for 'why-not-not explanations', 'meaning-explanations', and
'explanations-about' in general. Viewed from this perspective, it seems
reasonable to adopt (EMU). In effect, it yields a definition
of explanation in terms of the theory of understanding. But if
explanations are the means of understanding, what then is
understanding itself? It is natural to identify it with the development
C → C + A initiated by an adequate answer to the corresponding
understanding-seeking question. - Note, finally, that (EMU) does not
imply that explicit answers to explanation-seeking questions, in short,
e-answers, are identical with explicit answers to understanding-seeking
questions, in short, u-answers. In the case of why-questions, e-answers
have the form "P because Prem", while u-answers have the form "P fits
into C + A via the sound argument Prem ⇒ P". But e-answers are
implicit answers to corresponding understanding-seeking questions:
they exhibit the way P fits into the cognitive corpus, while u-answers
explicitly describe it.46
Many problems about explanation can be treated from the perspective of our theory. For example, the various elements of the cognitive
corpus can be used to give an "objective explication" of the much
discussed "pragmatic aspects" of explanations.47 Another, and final
example is the problem about the status of explanations based on mere
empirical laws, like "This piece of copper expanded because it was
heated and all copper expands when heated". The covering law of this
argument unifies a lot of data. Nevertheless, many scientists and
philosophers of science think that such explanations are never satisfactory. Why? Because it is being assumed that an already developed
cognitive corpus C contains general theories declaring, for instance,
that empirical phenomena are caused by atoms and molecules, which
are controlled by certain forces, etc. This means that every empirical
law falls in the range of application of these selfsame theories. So every
new law that cannot be understood in terms of these theories will be
dissimilated in C. For this reason no contemporary scientist would
believe that the explanation of the expansion of copper in terms of its
being heated and the respective empirical law yields scientific
understanding. What is required is that the understanding be based on
the behavior of molecules. But for a child or cognitively primitive adult
there need not be anything puzzling about the explanation in question.
So, to reemphasize the point, according to the approach in this essay,
being puzzling is theory-relative; if all of a sudden dinosaurs were to
run up and down the freeways of California, biologists would no doubt
be puzzled, but not necessarily their young offspring.
APPENDIX: ON THE ALGEBRA OF SHIFTS AND THEIR
U-VALUES
Let A be a nonempty set and "," a binary operation on A, called
concatenation, assigning to each a and b in A a unique element in A,
denoted by the pair (a, b). ⟨A, ","⟩ is a group if the following axioms are
satisfied: (G1:) ∀a, b, c ∈ A: ((a, b), c) = (a, (b, c)) (associativity);
(G2:) ∃0 ∈ A ∀a ∈ A: (a, 0) = a (existence of a zero-element 0); (G3:)
∀a ∈ A ∃a⁻¹ ∈ A: (a, a⁻¹) = 0 (existence of an inverse operation ⁻¹).
(G1-3) imply that the inverse operation is unique, that (a, b)⁻¹ = (b⁻¹, a⁻¹),
and that (a, 0) = (0, a). Associativity permits iterated concatenations to
be written as sequences (a1, ..., an), omitting the inner brackets.
(Infinite sequences, if admitted, must be included as basic elements.) A
group is commutative if in addition (G4:) ∀a, b ∈ A:
(a, b) = (b, a) holds. If ⟨A, ","⟩ satisfies only (G1), it is called a
semi-group.
Let S be the set of all shift sequences. S is "nearly" a group: there is a
sequence-operation; a zero shift, denoted by Ø, which leaves K as
it is; and for every s ∈ S there exists an inverse s⁻¹ ∈ S satisfying (s,
s⁻¹) = Ø. The inverse of an addition is the corresponding subtraction
((P+x)⁻¹ = P−x) and vice versa. The inverse of a move is calculated as
(Px→y)⁻¹ = (P−x, P+y)⁻¹ = (P−y, P+x) = Py→x. Formally, shift-sequences can
be treated as functions s: K → K, where K is a suitably specified set of
"possible" knowledge systems; "concatenation" then means
"function-product". But a given shift may not be sensibly applied to
every knowledge system K; for instance, the application of P−b makes
sense only if P is in Kb. So not all concatenations of shifts make sense;
associativity and commutativity hold only in the sensible cases.
Assume a function u: S → U, assigning to each shift s its u-value u(s)
∈ U. It follows from the structure of S that the set U of the u-values of
shift-sequences forms a full commutative group. The u-value of
shift-concatenations is inductively defined as the concatenation of the
respective u-values, u((s1, s2)) = (u(s1), u(s2)); the 'zero' u-value 0 is the
u-value of the zero shift, u(Ø) = 0; and the inverse of a u-value u,
denoted by −u, is the u-value of the corresponding inverse shift:
−u = u(s⁻¹), where u = u(s). All u-values can be concatenated, even if the
underlying shifts can't be concatenated, for concatenation is understood
only hypothetically: (u(s1), u(s2)) = u((s1, s2)) if s1 and s2
are concatenatable. Associativity and commutativity hold.48
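A minimal computational model of this structure can represent a u-value as a formal sum of weights with integer multiplicities, i.e. an element of the free commutative group over the weights. This is our own illustration; the paper itself assumes only the abstract axioms, and the weight names below are arbitrary.

```python
# u-values as formal sums of weights (free commutative group over W).
# concat models the group operation ",", inverse gives -u, ZERO is u(Ø).

ZERO = {}

def concat(u1, u2):
    out = dict(u1)
    for w, n in u2.items():
        out[w] = out.get(w, 0) + n
        if out[w] == 0:
            del out[w]          # wP cancels against -wP
    return out

def inverse(u):
    return {w: -n for w, n in u.items()}

u1 = {"wP": 1}                  # e.g. u((P)+a) = wP
u2 = {"wP": -1, "wQ": 1}        # e.g. the u-value of a move

assert concat(u1, ZERO) == u1                                     # (G2)
assert concat(u1, inverse(u1)) == ZERO                            # (G3)
assert concat(u1, u2) == concat(u2, u1)                           # (G4)
assert concat(concat(u1, u2), u1) == concat(u1, concat(u2, u1))   # (G1)
```

The assertions check exactly the group axioms (G1-4) named in the text; any u-values can be concatenated in this representation, mirroring the "hypothetical" concatenation above.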
Assume "≥" is a strictly monotonic ordering relation on a given
group ⟨A, ","⟩ (">" and "=" are defined via "≥"; "=" means equality, not
identity). It follows from measurement theory that if ≥ is total (∀a, b
∈ A: a ≥ b or b ≥ a) and moreover archimedean (∀a, b ∈ A: n·b ≥ a for
some n ∈ ℕ, where "n·b" = "a sequence of n b's"), then it determines
uniquely an extensive scale, that is, it determines, up to multiplication
with a constant k ∈ Re (real numbers), a mapping r: A → Re satisfying
additivity [r((a, b)) = r(a) + r(b)]. However, all that is assumed in our
theory is a partial ordering among the u-values - which can't determine
a quantitative scale.49 Nevertheless, the partially ordered group ⟨U, ",",
"≥"⟩ satisfies the important condition of strict monotonicity:
(M)
If u1 ≥ u2, then (u1, u3) ≥ (u2, u3) [so also (u3, u1) ≥
(u3, u2) by (G4)].
As is well-known, (M) together with (G1-4) implies that (M) holds
also in the opposite direction; moreover this equivalence holds when ≥
is replaced by >, or by =. Well-known consequences of (G1-4) + (M)
are (T1:) u ≥ 0 iff −u ≤ 0; (T2:) u1 ≥ u2 iff −u1 ≤ −u2; and (T3:) if u1
≥ 0 and u2 ≤ 0, then (u1, u2) ≥ 0 iff u1 ≥ −u2. Again (T1-3) hold if ≥
[≤] is replaced by > [<], or by =.
The rule (Comp) in Part 1, Section 2.2 is now easily proved. Assume
that F(s) = {us1, ..., usm} is a fractionation of u(s) = (u1, ..., un), and f
is a one-to-one function assigning to each negative us− in F(s) a positive
us+ in F(s) with us+ ≥ −us−. Let F−(s) be the set of negative u-sequences
in F(s), and F*(s) be the subset of nonnegative u-sequences in F(s)
which are not in the range of f. Consider the sequences x := ((us−, f(us−)) :
us− ∈ F−(s)) and y := (us+ : us+ ∈ F*(s)). Every pair in x is ≥ 0 (by T3),
and thus x ≥ 0 (by M); every element in y is ≥ 0, and thus y ≥ 0 (by
M); whence (x, y) ≥ 0 (by M). But (x, y) = u(s) by association and
commutation (G1, G4), which proves the first half of (Comp). If in
addition there exists at least one pair in x or one element in y which is
greater than 0, then (x, y) > 0 (by M), which establishes the second half.
(The versions with ≤ and = follow similarly.)
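The proof of (Comp) can be illustrated numerically. The values below are purely illustrative (the text assumes only a partial order among u-values, not numbers); the variable names mirror F−(s), f and F*(s) from the proof.

```python
# Numeric illustration of (Comp): if each negative u-value in the
# fractionation is paired one-to-one with a positive one at least as
# large, the total u-value is nonnegative; strictly positive if some
# pair or leftover positive exceeds 0.

neg  = [-2, -1]          # F-(s): the negative u-values
f    = {-2: 3, -1: 1}    # the pairing f, with f(us-) >= -us- in each pair
rest = [2]               # F*(s): positives not in the range of f

assert all(f[u] >= -u for u in neg)              # precondition of (Comp)
total = sum(neg) + sum(f.values()) + sum(rest)   # (x, y) = u(s)
assert total >= 0                                # first half of (Comp)
assert total > 0                                 # second half: some part > 0
print(total)
```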
The partial ordering among u-values is induced by that among the
weights. For simplicity, we identify weights with the corresponding
positive u-values (wP > 0); so −wP < 0 and −(−wP) = wP. The weight
function can be extended to sequences of phenomena as follows. If P
denotes the set of all sequences of "possible phenomena", that is,
elements in ∪K, then w is formally a function w: P → W assigning to
each sequence of possible phenomena its weight; it is inductively defined via w((P, Q)) = (wP, wQ). ⟨P, ","⟩ forms a semi-group, and ⟨W, ","⟩
a commutative semi-group. U is obtained as the closure of the union of
W and {0} under the operations of inverse and concatenation.
(Similarly, S can be obtained from P.) The partial ordering ≥ among
weights satisfies (M), and in addition (Pos:) (w1, w2) > w1, which means
weights are always "positive".
NOTES
* We are indebted to Peter Woodruff, Brian Skyrms, Jim Woodward and an unknown referee
for helpful comments. Thanks also are due to the Focused Research Program on Scientific
Explanation at the University of California, Irvine, who underwrote the expenses of the
research.
1
Examples of this way of characterizing the goal of this or that scientific discipline are easily
multiplied.
2
See, for example, Scriven (1959); Bromberger (1965, p. 80); Jeffrey (1971, p. 24);
Friedman (1974, p. 6); Hempel (1977, p. 100); Tuomela (1980, p. 212); Kitcher (1981, p. 508);
and Achinstein (1983, p. 16).
3
This idea was first promoted by Lambert in (1988).
4
This category contains conceptual innovation as a special case because learning new theories
typically involves learning new concepts.
5
Various combinations of innovations are possible. For example, many answers in Newton's
Principia were doubtlessly simultaneously factually, theoretically and inferentially innovative.
6
Viewing scientific understanding as involving an argument which increases the unification of
the knowledge background was first proposed in Schurz (1983) and extended in (1988).
7
Cf. Friedman (1974, Section 5) and Kitcher (1981, Section 6), (1989, Section 4.1, Section 5).
8
Here the difference with Kitcher's approach is in the details: technically, in his approach
arguments get unified via argument patterns (1981, Section 6; 1989, Section 4.2), while
informally, his approach is intended to capture the unification of phenomena (1989, Section 8).
9
Morrison (1990, p. 327) has complained that Friedman (1983) does not show how the
unification provided by a theory T yields (empirical) confirmation of T, although Friedman
repeatedly emphasizes it. Nor does Kitcher's approach show (and perhaps does not intend to
show) how unification can yield confirmation. This is a further difference from our approach to
unification.
10
For example, Hume's causality principle says that there is nothing real about causal
connections except empirical regularity. Again, Leibniz, Newton and Kant all sponsored the
doctrine that causality is time-forward directed deterministic necessity.
11
Roughly speaking, the reduction is as follows: A ⊆ C describes the understanding of P
contained in C iff A contributes understanding to C − A, where "C − A" denotes the
A-predecessor of C, that is, the maximal subset of C not containing A.
12
"Follows" must be understood here relative to deductive inferences in I. Statements like
"Prem ⇒ Con is correct" must be expressed in a metalanguage which denotes statements of the
object language by primitive names; the point is to avoid the result that the correctness of every
deductive inference in I follows "from A" because it follows "from nothing".
13
We prefer the notion of "assimilated" over that of "inferred" or "reduced", because we
include kinds of connectors very different from those associated with the traditional ideas of
inference or reduction. Note also that while "x is connected with. . ." is a symmetric notion,
allowing x to occur as the conclusion as well as a premise of a connector, "x is assimilated to . . .
" is an asymmetric notion, requiring x to occur as the conclusion of a connector.
14
A useful specification of the latter rule is this: a singular theoretical (nonobservational)
phenomenon shall weigh not less than the minimal number of observations necessary to infer it,
given the laws in K, either deductively, inductively or abductively.
15
This definition implies that D must be singular, but H may be general, and that D must
contain only observation al concepts, but H may contain theoretical concepts.
16
A more refined method would give phenomena P a gradual intrinsic gain wP · conf P,
depending on their degree of empirical confirmation conf P, where conf P ranges between 0
and 1. Since the empirical confirmation conf P in turn depends on the data unification achieved
by P (as explained in Section 1.3), this measure would imply that the intrinsic gains of
phenomena depend on their extrinsic gains. This would lead to the rather complicated feedback
situation of connectionist models like that of Thagard (1989): starting from some prior values
for conf P, the unification would have to be computed in cycles, each updating conf P. For
comparative purposes, our simple approach yields the same results as the refined one, while
avoiding the complicated computation procedure of the refined approach. However, as will be
seen later, in some situations, the refined method is advantageous.
17
In spite of this formal decomposability, we treat moves as a third kind of shift, for intuitive
and practical reasons.
18
Note further that a shift diagram assumes the unification classification U(C) of the cognitive
corpus C to be uniquely given. There might exist some rare cases where this assumption does
not hold, that is, where there exist two different 'equally best' bases of K. For those cases, a
unique unification comparison can be obtained by declaring that C1 >u C2 holds (in the
unrelativized sense) iff for every unification classification U(C2) there exists a U(C1) such that
C1 >u C2 holds relative to U(C1) and U(C2).
19
The facts Di and Pi need not be data; but because they are assumed to be accepted in K, they
must themselves be empirically confirmed (see Section 3.1); so, by unifying them, L gets
empirically confirmed.
20
According to the preceding shift diagrams, whenever a new positive instance Di & Pi is
added to a corpus K containing the law L = ∀x(Dx → Px), the unification of K increases. But
certainly, science cannot indefinitely increase unification, and thus understanding, just by
collecting more and more instances of the same law (cf. Alston 1971). Collecting instances of
the same law obeys the law of 'decreasing limit-utility'. We take this into account by adding the
following rule: wP shall be the smaller, the more phenomena of the same (or 'similar') type are
already in K.
21
A third difference concerns numerical probabilistic laws: they can't be confirmed via single
positive instances, but only via sample frequencies. For these kind of laws empirical
confirmation is an even more special kind of unification, namely unification of those
phenomena below which contain information about sample frequencies.
22
Kitcher's approach is based on the unification of arguments via argument patterns (cf. fn. 8).
He excludes argument patterns like "God wants P, and whatever God wants is the case,
therefore: P" as 'spurious' by the criterion that in such an argument pattern any sentence may be
substituted for P (1981, pp. 527f). But (as he himself mentions in 1981, on pp. 529), it is
possible to give spurious argument patterns to which this criterion does not apply: for example,
the argument pattern with the speculative law "If Zeus' mood is such and such, then the weather
is such and such" allows only the inference of weather conditions, but not of any sentence.
Kitcher's presentation in (1989) does not solve this problem (cf. Section 7.2-3). Thagard's
(1989) approach to confirmation (acceptability) is based on a notion of "explanation" taken as
primitive. It is based on a symmetric (and thus, circular) notion of 'explanatory coherence',
which assumes that if a set of premises 'explains' a phenomenon P (e.g., logically implies it),
then each premise coheres with P, and all premises mutually cohere (pp. 436f). An application
of his assumptions to our first example of a spurious unification yields the result that both
premises will get confirmed, because the datum Da has a 'prior activation', and hence will
'activate' both premises L and Ta. (Thagard mentions similar limitations of his approach in
connection with parapsychology; p. 454).
23
Closure under approximative arguments must not be required because it would yield
contradictions. The requirement of closure under inductive arguments may seem problematic
because the relation of high conditional probability is not transitive; however this problem is
handled by the autoepistemic formulation of inductive arguments explained below. Note also
that the closure condition does not apply to K because many arguments in I have conclusions
which are not relevant elements.
24
This method is based on the theory of relevant deduction developed in Schurz (1983, ch. V.l;
1991) and Schurz/Weingartner (1987).
25
This definition assumes that defined symbols in K are replaced by primitive symbols.
26
This paradox was first mentioned by Hempel (1965, p. 273, fn. 36). The remarks in fn. 22
about the limitations of Kitcher's method to exclude spurious unification apply also to his
treatment of the conjunction paradox. (He does not distinguish between the two kinds of
spurious unifications).
27
Clause (5b,iii) is meant to cover also instances of axiom schemes.
28
To give some examples: Hempel's Paradox: ∀x(Fx → Gx), ¬Ga, Fa ∨ Ha ├L Ha (1965, p.
276), the Eberle-Kaplan-Montague Paradox: (∃xT(x) → ∀x(Fx → Gx)), (T(a) ∧ Fa) ∨ Ga ├L
Ga (1961, p. 421), and Kim's Paradox: ∀x(Fx → Gx), Fa ∨ Ga ├L Ga (1963, p. 287), can't be
connectors over K (because Fa ∨ Ha, (T(a) ∧ Fa) ∨ Ga, and Fa ∨ Ga, respectively, are
irrelevant elements of K). Cf. Schurz (1983, ch. III) for a detailed overview of
pseudoexplanations, and Schurz (1988).
29
There exist two important ways in which Con* can approximate Con: the first is closeness
approximation, which concerns the approximative inference of quantitative empirical laws, and
the second is limit approximation, and concerns the approximative reduction of theories. For
closeness approximation cf., e.g., Niiniluoto (1982), who gives numerical values for the degree
of approximation, and also Hooker (1981); for limit approximation cf. Scheibe (1981) and
Stegmüller (1986, pp. 246-53).
30
Because it is a requirement that KNOW be closed under inductive arguments in I, (i) is a
minimal condition for avoiding inconsistency of KNOW.
31
A suitable technical explication of the requirement MS(L, a/K) in the context of explanation
is found in Hempel (1968), a refined one in Schurz (1988); for a discussion cf. Fetzer (1981)
and Salmon (1989, pp. 68ff). In the context of inference, where it is not presupposed that the
conclusion belongs to K, the requirement MS(L, a/K) has to contain the additional condition
that the negation of the conclusion, ¬Ga, is not known in K on noninductive grounds, e.g. by
observation; otherwise the requirement that KNOW is closed under I would yield
contradictions.
32
Moreover a phenomenon in Ka can be assimilated to Kb in several different ways, via
different connectors or connector chains. Then the rest cost will decrease with the number of
different ways. This refinement makes it possible to explain the value of partial circles for
unification: assume P1 and P2 are connected with some independent pieces of evidence D1 and
D2, respectively, and in addition circularly related via the law ∀x(P1x ↔ P2x). Then P1 is
assimilated to Kb in two ways: directly with D1, and indirectly with D2 via P2; similarly for P2.
This increases the strength of the assimilations of P1 and P2.
33
Possibly, more than one contrast class is assigned to P (cf. van Fraassen 1980, p. 127). Then P
is assimilated in several ways (cf. fn. 32).
34
If there exists more than one subset of K of maximal cardinality and maximal importance
satisfying these conditions, there are several strategies; one is just to choose one of them as Ka,
another is to take the closure of their intersection, provided it satisfies AC.
35
The restriction of Snell's law to homogeneous substances is a clear example.
36
A related but unclearly explicated idea was suggested by Bromberger (1965, p. 82) in his
notion of a "p-predicament".
37
For example, let T = ∀x(F1x & F2x → G1x), and let L = ∀x(F1x & F2x & F3x → G2x), where
∀x(G1x → ¬G2x) ∈ K. In addition the class F1x & F2x must be much more general than the class F3x,
according to a 'classification hierarchy' accompanying K. 'F3x' thus figures as a typical "ad
hoc"-restriction.
38
For reasons mentioned at the end of Section 3.4, the refined method of calculating u is more
appropriate when adding hypotheses.
39
The same remark applies when our approach is compared with Toulmin's (1963) view of
understanding as a reduction to certain historically given 'ideals of natural order'.
40
For any 'minimally plausible' assumption of weights.
41
Of course, it also contradicts EDYN, but in K2 there exists only the restricted form EDYN +
REDYN which is consistent with PB.
42
A similar case study is possible for Planck's law of black body radiation, as is evident from
Kuhn and Stöckler (1988).
43
In order to embed answers of this sort formally into the basic theory of Section 2, it is
necessary to generalize Definition 3 such that Prem ⇒ Con may also be a "heuristic" connector;
where, according to Definition 7, Prem ⇒ Con is said to be a heuristic connector if Prem =
T ∪ ◊Δ (where "◊Δ" means "the elements of Δ are possible") and T ∪ Δ ⇒ Con is a connector in
C + Δ.
44
Role understanding is an important kind of meaning understanding, because the best way to
explain the meaning of theories is by showing the role they play in why-explanations of facts
(cf. Schurz 1985). However, the more orthodox kind of meaning understanding is given when
P is not a phenomenon, but a concept, and when the connector is not an argument ibs, but a –
partial or complete – definition. Our theory does not directly apply to this case; but if we replace
phenomena by concepts and arguments ibs by definitions ibs, the whole apparatus of fitting into,
unification, costs and gains seems to apply equally well.
45
Note that in contrast to Wright's analysis of functional explanations (1976, p. 81), our
analysis of functional understanding does not presuppose that to be adequate, "P(o) in order to
do E(o)" implies "P(o) because this has the effect E(o)", a case of understanding-why. However,
when functional understanding becomes scientifically important, this is indeed the case.
46
This was emphasized by Lambert (1988, p. 306).
47
For instance, our notions of connectors could be seen as an objective explication of van
Fraassen's pragmatic notion of an "explanatory relevance relation" (1980, pp. 142f), which was
criticized by Lambert and Brittan (1987, p. 41f) and Salmon and Kitcher (1987) for its subjective
character.
48
Note that in spite of (G1-2) u-sequences can't be replaced by sets since then identical
elements would be lost.
49
Even if the ordering relation among u-values could be assumed to be total and archimedean,
the additivity of u-values would still be doubtful, for the reason stated in fn. 20. The possibility
of extending our comparative model into a quantitative one is, of course, not being denied. But
there is a cost: total, archimedean ordering plus additivity must be assumed.
REFERENCES
Achinstein, P.: 1983, The Nature of Explanation, Oxford University Press, Oxford.
Alston, W. P.: 1971, 'The Place of the Explanation of Particular Facts in Science',
Philosophy of Science 38, 13-34.
Belnap, N. and Steel, T.: 1976, The Logic of Questions and Answers, Yale University Press,
New Haven.
Bromberger, S.: 1965, 'An Approach to Explanation', in R. Butler (ed.), Analytical
Philosophy, Second Series, Basil Blackwell, Oxford, pp. 72-105.
Eberle, R., Kaplan, D. and Montague, R.: 1961, 'Hempel and Oppenheim on Explanation',
Philosophy of Science 28, 418-28.
Fetzer, J.: 1981, 'Probability and Explanation', Synthese 48, 371-408.
Friedman, M.: 1974, 'Explanation and Scientific Understanding', Journal of Philosophy 71,
5-19.
Friedman, M.: 1983, Foundations of Space-Time Theories, Princeton University Press,
Princeton.
Gärdenfors, P.: 1988, Knowledge in Flux, MIT, Cambridge, MA.
Harman, G.: 1965, 'The Inference to the Best Explanation', Philosophical Review 74, 88-95.
Hempel, C. G.: 1977, 'Nachwort 1976: Neuere Ideen zu den Problemen der statistischen
Erklärung', in C. G. Hempel (ed.), Aspekte wissenschaftlicher Erklärung, W. de
Gruyter, Berlin, pp. 98-123.
Hempel, C. G.: 1965, Aspects of Scientific Explanation (and Other Essays), Free Press, New
York.
Hempel, C. G.: 1968, 'Maximal Specificity and Lawlikeness in Probabilistic Explanation',
Philosophy of Science 35, 116-33.
Hooker, C. A.: 1981, 'Towards a General Theory of Reduction', Dialogue 20, Part I, 38-59,
Part II, 201-36, Part III, 497-529.
Jeffrey, R. C.: 1971, 'Statistical Explanation vs. Statistical Relevance', in W. Salmon (1971),
pp. 19-28.
Kim, J.: 1963, 'On the Logical Conditions of Deductive Explanation', Philosophy of Science
30, 286-91.
Kitcher, P.: 1981, 'Explanatory Unification', Philosophy of Science 48, 507-31.
Kitcher, P.: 1989, 'Explanatory Unification and the Causal Structure of the World', in P.
Kitcher and W. Salmon (eds.), pp. 410-505.
Kitcher, P. and Salmon, W. (eds.): 1989, Scientific Explanation (Minnesota Studies in the
Philosophy of Science Vol. XIII), University of Minnesota Press, Minneapolis.
Kuhn, W. and Stöckler, M.: 1988, 'Deduktionen und Interpretationen. Erklärungen der
Planckschen Strahlungsformel in physikinterner, wissenschaftstheoretischer und
didaktischer Perspektive", in W. Kuhn (ed.), Didaktik der Physik (Tagungsband der
DGP 1987), FA Didaktik, Giessen.
Lambert, K.: 1988, 'Prolegomenon zu einer Theorie des Verstehens', in G. Schurz (ed.), pp.
299-319.
Lambert, K.: 1990, 'On Whether an Answer to a Why-Question Is an Explanation if and only
if it Yields Scientific Understanding', forthcoming.
Lambert, K. and Brittan, G. Jr.: 1987, An Introduction to the Philosophy of Science, 3rd ed,
Ridgeview, Atascadero.
Lehrer, K.: 1974, Knowledge, Clarendon Press, Oxford.
McDermott D. and Doyle, J.: 1980, 'Non-Monotonic Logic I', Artificial Intelligence 13, 41-72.
Moore, R. C.: 1985, 'Semantic Considerations on Nonmonotonic Logic', Artificial Intelligence 25, 75-94.
Morrison, M.: 1990, 'Unification, Realism and Inference', British Journal for the Philosophy
of Science 41, 305-32.
Niiniluoto, I.: 1982, 'Truthlikeness for Quantitative Statements', PSA 1, 208-16.
Salmon, W.: 1971, Statistical Explanation and Statistical Relevance (with contributions by
R. Jeffrey and J. Greeno), University of Pittsburgh Press, Pittsburgh.
Salmon, W.: 1978, 'Why ask 'Why?'?', Proceedings and Addresses of the American Philosophical Association 51, 683-705.
Salmon, W.: 1984, Scientific Explanation and the Causal Structure of the World, Princeton
University Press, Princeton.
Salmon, W.: 1989, Four Decades of Scientific Explanation, University of Minnesota Press,
Minneapolis.
Salmon, W. and Kitcher, P.: 1987, 'Van Fraassen on Explanation', Journal of Philosophy 84,
315-30.
Scheibe, E.: 1981, 'Eine Fallstudie zur Grenzfallbeziehung in der Quantenmechanik', in
J. Nitsch et al. (eds.), Grundlagenprobleme der modernen Physik, BI, Wien-Zürich, pp.
257-69.
Schurz, G.: 1982, 'Ein logisch-pragmatisches Modell von deduktiv-nomologischer Erklärung
(Systematisierung)', Erkenntnis 17, 321-47.
Schurz, G.: 1983, Wissenschaftliche Erklärung. Ansätze zu einer logisch-pragmatischen
Wissenschaftstheorie, dbv-Verlag der TU Graz, Graz.
Schurz, G.: 1985, 'Die wissenschaftstheoretische Diskussion um den Erklärungsbegriff und
ihre Bedeutung für die Physikdidaktik', in W. Kuhn (ed.), Didaktik der Physik
(Physikertagung 1984), Gahmig, Giessen, pp. 55-68.
Schurz, G.: 1988, 'Was ist wissenschaftliches Verstehen? Eine Theorie verstehensbewirkender Erklärungsepisoden', in G. Schurz (ed.), pp. 235-98.
Schurz, G. (ed.): 1988, Erklären und Verstehen in der Wissenschaft, R. Oldenbourg (Scientia
Nova), Munich.
Schurz, G.: 1991, 'Relevant Deduction. From Solving Paradoxes Towards a General Theory',
Erkenntnis 35, 391-437.
Schurz, G. and Weingartner, P.: 1987, 'Verisimilitude Defined by Relevant Consequence-Elements. A New Reconstruction of Popper's Original Idea', in T. A. F. Kuipers (ed.),
What Is Closer-To-The-Truth, Rodopi, Amsterdam, pp. 47-77.
Scriven, M.: 1959, 'Truisms as the Grounds for Historical Explanation', in P. Gardiner (ed.),
Theories of History, New York, pp. 443-68.
Stegmüller, W.: 1986, Probleme und Resultate der Wissenschaftstheorie und Analytischen
Philosophie, Band II, Dritter Teilband: Die Entwicklung des neuen Strukturalismus
seit 1973, Springer, Berlin.
Thagard, P.: 1978, 'The Best Explanation: Criteria for Theory Choice', Journal of Philosophy
75, 76-92.
Thagard, P.: 1989, 'Explanatory Coherence', Behavioral and Brain Sciences 12, 435-69.
Thompson, R. F.: 1967, Foundations of Physiological Psychology, Harper & Row, New York.
Toulmin, S.: 1963, Foresight and Understanding, Harper & Row, New York.
Tuomela, R.: 1980, 'Explaining Explaining', Erkenntnis 15, 211-43.
Van Fraassen, B.: 1980, The Scientific Image, Clarendon Press, Oxford.
Van Fraassen, B.: 1985, 'Salmon on Explanation', Journal of Philosophy 82, 639-51.
Wright, L.: 1976, Teleological Explanations, University of California Press, Berkeley.
Institut für Philosophie
Universität Salzburg
A-5020 Salzburg, Austria
Europe
University of California at Irvine
Department of Philosophy
Irvine, CA 92717
USA