A Psychological Method to Investigate Verbal Concepts¹

JOURNAL OF MATHEMATICAL PSYCHOLOGY: 6, 169-191 (1969)

GEORGE A. MILLER
The Rockefeller University, New York, New York 10021
The method of sorting is used to study how lexical information might be organized and stored in memory. Judges sort a set of lexical items into clusters, and data from several judges are pooled. The number of judges putting a pair of items into the same cluster is taken as a measure of the proximity of those two items. Constraints on the resulting data that are attributable to the method itself, and consequences following from different structural hypotheses, are considered. Data for 48 common nouns are used to test the assumption that clustering items reflects a decision to ignore particular conceptual features that would normally distinguish those items, and an argument is made that the conceptual features used by most judges derive from presuppositions and assertions contained in the definitions of the nouns.
Part of knowing a language is knowing its vocabulary, and it is an appropriate psychological question to ask how this lexical information is subjectively organized and stored. A language user's lexicon is certainly not organized alphabetically; it is no simple list of discrete and apparently unrelated items. How the multiple interrelations and cross-references of these lexical items should be characterized has long been a subject for psychological speculation and research.

In the present paper the method of sorting is used to explore the nature of our subjective lexicon: lexical items (common nouns, in this case) are sorted into clusters on the basis of “similarity of meaning.” It is assumed that all items are conceptually distinct to a native speaker; he knows enough of English to recognize each of them and to use them appropriately. In order for a native speaker to group nouns together as semantically similar, he must deliberately ignore some of their distinguishing features. By an analysis of the sortings one hopes to discover which conceptual features have been ignored and thus, by indirection, what the features are.

¹ This research was supported in part by the Advanced Research Projects Agency, Grant No. DAHC15 68 G-5 to The Rockefeller University. The author is pleased to acknowledge his indebtedness to Herbert Rubenstein, who participated in the original formulation of the problem; to Virginia Teller Sterba, who collected the sorting data reported here; to Buena Chilstrom for tabulating Roget features; to D. Terence Langendoen for the formulation of noun definitions in terms of presuppositions and assertions; and to numerous colleagues who criticized earlier versions of this paper. The cluster analysis shown in Fig. 2 made use of a computer program written by Stephen C. Johnson.

© 1969 by Academic Press, Inc.
A preliminary
account of this research, and a comparison of the sorting method with
other methods psychologists have used to study the subjective lexicon, was given by
Miller (1967).
RESULTS OF SORTING ENGLISH NOUNS
Each of 48 English nouns was typed on a 3 x 5 in. index card, along with a short
definition specifying the sense of the noun that was intended and a simple sentence
illustrating
that use of the word. (The 48 nouns, along with their definitions
and
examples, are given in the Appendix.)
The resulting pack of 48 cards was handed
to a judge with the request that he sort the cards into piles on the surface of a large
table “on the basis of similarity
of meaning.”
He was allowed as many piles as he
wanted, from 1 to 48, and he could put as many items as he wanted in any pile.
Most people spent from 5 to 30 min at the task. Those who protested that they did
not understand “similarity
of meaning” were not given a clear explanation.
They might
be told, “Words don’t have to be synonyms to be similar in meaning,”
or, if they
pressed further, “We want you to tell us what it means.”
Tests were conducted individually,
and judges were paid for their time. Cooperation
from the 50 Harvard and Radcliffe students who served as judges was excellent. All
had learned English as their first language.
The sorting task is mildly interesting.
It has the nondemanding
character of a
problem for which there are many correct solutions; it resembles a concept formation
task where a subject is free to choose the concepts he wants to use. Solving it is as
much a matter of esthetic judgment
as of conceptual knowledge.
Judges would try
tentative clusters, then frequently break them up and rearrange them; no constraints
were imposed on the order of presentation
of the items or on the judge’s freedom to
revise earlier decisions.
Mandler and Pearlstone (1966) report that when items must be examined and sorted
sequentially,
with no changes allowed, judges have little difficulty in remembering
and repeating their sorting; 42.5% of their college subjects were able to repeat an
initial sorting of 52 items without mistakes. In order to account for this recall Mandler
and Pearlstone argue that their subjects were “imposing
a conceptual rule on the
stimulus array.” In the present study, where subjects are instructed to sort on the
basis of similarity of meaning, we further assume that the conceptual rules are usually
semantic in character.
The number of categories used to sort the 48 words ranged from 6 to 26, with a
mean of 14.3 and a standard deviation of 5.
The clusters formed by each judge were recorded and later converted into an
incidence matrix. The matrix is 48 by 48, and cell i, j represents the particular pair
of nouns i and j. Cell i, j is one if that pair of nouns was put together and zero if they
were in separate clusters. The unweighted
incidence matrices, one for each judge,
were added together and the resulting matrix, which represents the pooled matrices
for 50 judges, was taken as the basis for further analysis. In the resulting data matrix,
$N_{ij}$ represents the number of judges who put nouns i and j together in the same
cluster.
The summed matrix for the 50 judges is shown in Table 1. The matrix is necessarily
symmetric (i cannot be sorted with j unless j is also sorted with i), so only the lower
half of the matrix is given in Table 1. The diagonal has also been omitted; since
each noun must be sorted with itself, the diagonal could be taken to be 50 for every
noun.
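The pooling step described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each judge's sorting is recorded as a list of clusters of item indices; the function names and the three sortings are hypothetical illustrations, not the paper's data or program.

```python
def check_partition(sorting, n_items):
    """Raise unless the sorting is a partition of items 0..n_items-1."""
    seen = sorted(item for cluster in sorting for item in cluster)
    if seen != list(range(n_items)):
        raise ValueError("every item must appear in exactly one cluster")


def incidence(sorting, n_items):
    """One judge's 0/1 matrix: cell [i][j] is 1 if i and j share a cluster.
    Every item shares a cluster with itself, so the diagonal is 1."""
    check_partition(sorting, n_items)
    m = [[0] * n_items for _ in range(n_items)]
    for cluster in sorting:
        for i in cluster:
            for j in cluster:
                m[i][j] = 1
    return m


def pooled_counts(sortings, n_items):
    """Add the unweighted incidence matrices, so that N[i][j] counts
    the judges who put items i and j in the same cluster."""
    N = [[0] * n_items for _ in range(n_items)]
    for sorting in sortings:
        m = incidence(sorting, n_items)
        for i in range(n_items):
            for j in range(n_items):
                N[i][j] += m[i][j]
    return N


# Hypothetical sortings of four items by three judges.
items = ["FEAR", "REGRET", "PLANT", "TREE"]
sortings = [
    [[0, 1], [2, 3]],
    [[0, 1], [2], [3]],
    [[0], [1], [2, 3]],
]
N = pooled_counts(sortings, len(items))
```

The pooled matrix is symmetric by construction and its diagonal equals the number of judges, which is why Table 1 needs to show only the lower half without the diagonal.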
Table 1 shows that there was considerable agreement among judges as to the clusters
they formed. FEAR and REGRET, for example, were put together by 48 of the 50 judges,
as were PLANT and TREE. At the other extreme, many pairs of words (FEAR and PLANT,
say, or REGRET and TREE) were never judged similar in meaning by anyone.
We can regard Table 1 as a matrix of similarity measures, or proximities, where $N_{ij}$
is a measure of the semantic proximity of noun i to noun j, or we can convert it into
a matrix of distances by using

$$D_{ij} = N - N_{ij}, \tag{1}$$

where N is the number of judges (N = 50 here).
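Equation (1) is a one-line conversion. The sketch below applies it to three counts quoted in the text (FEAR-REGRET 48, FEAR-THRILL 42, REGRET-THRILL 42), taking the diagonal to be 50 as suggested above; the function name is a hypothetical choice.

```python
def to_distances(N, n_judges):
    """Equation (1): D_ij = N - N_ij, with N the number of judges."""
    return [[n_judges - count for count in row] for row in N]


labels = ["FEAR", "REGRET", "THRILL"]
counts = [
    [50, 48, 42],
    [48, 50, 42],
    [42, 42, 50],
]
D = to_distances(counts, 50)
```

The conversion reverses the order of the numbers: the most frequently paired items (FEAR and REGRET) end up at the smallest distance, 2.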
There are several methods of data analysis that can be applied to such data. Which
method is most appropriate
depends on our theory of the psychological
processes
underlying
the subjects’ performance.
THEORY OF SORTING
It should be intuitively
obvious that the method of sorting imposes certain constraints on the frequencies of paired occurrences that can be obtained. In order to
explore those constraints we shall assume that each person partitions the set of lexical
items, i.e., creates a collection of subsets such that each element belongs to one and
only one subset. It is not necessary to assume that sorting must yield a partition, but
that is the simplest case. The data of Table 1 were obtained on that basis.
Given that sorting creates partitions,
the decision to interpret
the resulting data
matrix as if it were a similarity matrix is equivalent to assuming that it is appropriate
to represent the relations among the items as distances. If this assumption
is implausible, the method of sorting should not be used.
It is quite simple to prove that D is a metric. First note that the incidence matrix
expressed in D for an individual judge represents a metric. This incidence matrix
contains 0 for all items put together by the judge, and 1 elsewhere. Every item is
necessarily put with itself, so $D_{ii} = 0$; the matrix is symmetric, so that $D_{ij} = D_{ji}$;
and the triangle inequality, $D_{ij} + D_{jk} \ge D_{ik}$, is satisfied, since a judge cannot put
item i with item j, and put item j with item k, without also putting item i with item k.
Since the sum of metrics is itself a metric, it necessarily follows that when incidence
matrices for individual judges are added together to give the data matrix, the result will
also represent a metric.

TABLE 1
SORTING DATA FOR 48 ENGLISH NOUNS
[Lower half of the 48 × 48 matrix of $N_{ij}$ values: the number of the 50 judges who put nouns i and j in the same cluster.]
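The argument can be checked numerically. A minimal sketch, assuming the same list-of-clusters representation as before: each judge's 0/1 distance matrix and the sum over judges are tested for a zero diagonal, symmetry, and the triangle inequality. The sortings and helper names are hypothetical.

```python
from itertools import permutations


def judge_distances(sorting, n_items):
    """One judge's distances: 0 if i and j share a cluster (including
    i with itself), 1 otherwise."""
    cluster_of = {item: idx for idx, cluster in enumerate(sorting) for item in cluster}
    return [[0 if cluster_of[i] == cluster_of[j] else 1 for j in range(n_items)]
            for i in range(n_items)]


def add_matrices(A, B):
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def is_metric(D):
    """Zero diagonal, symmetry, nonnegativity and the triangle inequality.
    Distinct items may lie at distance 0, so strictly this is a
    pseudometric check."""
    n = len(D)
    if any(D[i][i] != 0 for i in range(n)):
        return False
    if any(D[i][j] != D[j][i] or D[i][j] < 0 for i in range(n) for j in range(n)):
        return False
    return all(D[i][j] + D[j][k] >= D[i][k]
               for i, j, k in permutations(range(n), 3))


# Hypothetical sortings of five items by three judges.
sortings = [
    [[0, 1, 2], [3, 4]],
    [[0], [1, 2], [3], [4]],
    [[0, 3], [1], [2, 4]],
]
per_judge = [judge_distances(s, 5) for s in sortings]
total = per_judge[0]
for D in per_judge[1:]:
    total = add_matrices(total, D)
```

Summing per-judge distances gives exactly $N - N_{ij}$, so `total` is the distance matrix of equation (1) for these three judges.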
In order to demonstrate the implications of this fact in more intuitive terms, however, it is useful to derive it in a more explicit form. Consider any three items, $W_i$, $W_j$, and $W_k$. Indicate the sorting of these three by slanted lines, disregarding other items. For example, if a set of six items were sorted $(W_a, W_b, W_i, W_j)(W_c, W_k)$, the sorting of $W_i$, $W_j$, and $W_k$ would be represented as ij/k. On the assumption that every sorting is a partition, the only possible ways to partition three items are: i/j/k, ij/k, ik/j, jk/i, and ijk.

Each judge must choose one of these five possibilities. If N is the total number of judges serving in the experiment, and if n(x) denotes the number of judges who chose partition x, then

$$N = n(i/j/k) + n(ij/k) + n(ik/j) + n(jk/i) + n(ijk). \tag{2}$$

Obviously, all of these numbers are nonnegative.
Let the number of judges who put items $W_i$ and $W_j$ together be denoted by $N_{ij}$,
which is the value tabulated in Table 1. For these three items we can write:

$$N_{ij} = n(ij/k) + n(ijk), \tag{3}$$
$$N_{jk} = n(jk/i) + n(ijk), \tag{3'}$$
$$N_{ik} = n(ik/j) + n(ijk). \tag{3''}$$
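Equations (2) and (3) are bookkeeping over judges, which a short sketch makes concrete. It assumes the list-of-clusters representation used earlier; `triple_pattern`, `pattern_counts`, and `together` are hypothetical helpers, and the six sortings are invented so that each pattern occurs at least once.

```python
from collections import Counter

PATTERNS = ("i/j/k", "ij/k", "ik/j", "jk/i", "ijk")


def triple_pattern(sorting, i, j, k):
    """Restrict one judge's partition to items i, j, k (slash notation)."""
    cluster_of = {item: idx for idx, cluster in enumerate(sorting) for item in cluster}
    a, b, c = cluster_of[i], cluster_of[j], cluster_of[k]
    if a == b == c:
        return "ijk"
    if a == b:
        return "ij/k"
    if a == c:
        return "ik/j"
    if b == c:
        return "jk/i"
    return "i/j/k"


def pattern_counts(sortings, i, j, k):
    """n(x) for each of the five partitions x of the triple."""
    tally = Counter(triple_pattern(s, i, j, k) for s in sortings)
    return {p: tally[p] for p in PATTERNS}


def together(sortings, a, b):
    """N_ab: the number of judges who put items a and b in one cluster."""
    return sum(any(a in cl and b in cl for cl in s) for s in sortings)


sortings = [
    [[0, 1], [2, 3], [4]],      # ij/k
    [[0, 1, 2], [3, 4]],        # ijk
    [[0], [1, 2, 4], [3]],      # jk/i
    [[0, 2], [1], [3, 4]],      # ik/j
    [[0], [1], [2], [3], [4]],  # i/j/k
    [[0, 1, 2, 3, 4]],          # ijk
]
n = pattern_counts(sortings, 0, 1, 2)
assert sum(n.values()) == len(sortings)                  # equation (2)
assert together(sortings, 0, 1) == n["ij/k"] + n["ijk"]  # (3)
assert together(sortings, 1, 2) == n["jk/i"] + n["ijk"]  # (3')
assert together(sortings, 0, 2) == n["ik/j"] + n["ijk"]  # (3'')
```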
By rearranging (2) and substituting (3) it can be shown that

$$(N - N_{ij}) + (N - N_{jk}) = (N - N_{ik}) + 2n(ik/j) + n(i/j/k),$$

from which, since all the n(x) are nonnegative, we obtain

$$(N - N_{ij}) + (N - N_{jk}) \ge (N - N_{ik}). \tag{4}$$

Given the definition of D in (1), (4) can be recognized as the triangle inequality:

$$D_{ij} + D_{jk} \ge D_{ik}. \tag{4'}$$

Since D is symmetric and $D_{ii} = 0$, it has the properties of a distance. In short, if
matrices obtained by the method of sorting are interpreted as similarity matrices, the
measure will have properties necessary for metric representation.²

The triangle inequality can also be written:

$$N_{ij} + N_{jk} \le N + N_{ik},$$

or, equivalently, since it must hold for every labeling of the three items,

$$|N_{ik} - N_{jk}| \le N - N_{ij}. \tag{5}$$

In words: the number of people who did not put items $W_i$ and $W_j$ together provides
an upper bound on the difference between the number who put $W_k$ with each of them.
Obviously, if everybody put $W_i$ and $W_j$ together, then anybody who put $W_k$ with one
of them would necessarily put it with the other.
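Because (4), (4'), and (5) follow from (2) and (3) for any nonnegative counts, they can be spot-checked with random counts. A minimal sketch with hypothetical function names; the random counts stand in for arbitrary experiments, not for real data.

```python
import random

PATTERNS = ("i/j/k", "ij/k", "ik/j", "jk/i", "ijk")


def pair_counts(n):
    """Equations (2) and (3): total judges and pair co-occurrence counts."""
    N = sum(n[p] for p in PATTERNS)
    N_ij = n["ij/k"] + n["ijk"]
    N_jk = n["jk/i"] + n["ijk"]
    N_ik = n["ik/j"] + n["ijk"]
    return N, N_ij, N_jk, N_ik


def check_triple(n):
    N, N_ij, N_jk, N_ik = pair_counts(n)
    D_ij, D_jk, D_ik = N - N_ij, N - N_jk, N - N_ik
    # Rearranged identity behind (4): the slack is 2 n(ik/j) + n(i/j/k).
    assert D_ij + D_jk == D_ik + 2 * n["ik/j"] + n["i/j/k"]
    # Triangle inequality (4').
    assert D_ij + D_jk >= D_ik
    # Bound (5): N - N_ij limits how differently W_k pairs with W_i and W_j.
    assert abs(N_ik - N_jk) <= N - N_ij
    return D_ij, D_jk, D_ik


rng = random.Random(0)
for _ in range(1000):
    check_triple({p: rng.randint(0, 20) for p in PATTERNS})
```

The slack term shows when the triangle inequality becomes an equality: only when no judge chose ik/j or i/j/k for the triple.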
The triangle inequality
holds for sorting data generally, but if we want to obtain
more specific predictions,
we must introduce
further assumptions
about what the
judges are doing. Presumably, in order to put two lexical items together in the same
cluster a judge must decide to ignore certain conceptual differences that would normally
distinguish
those items. If we assume that the items to be sorted satisfy some system
of features, we can consider what will happen to the values of Nij when different
features are ignored by different numbers of judges.
Before considering
the effects of different systems of conceptual features, however,
it should be pointed out that only distinctive features can be ignored. Suppose, for
example, that all of the items sorted share the same value for some particular feature.
It is of no concern whether the feature characterizes all of them, or characterizes none
of them, or is irrelevant for all of them. If they all have the same value, then that
particular feature cannot play a role in the judges’ sortings. In what follows, therefore, no claim is made that the analysis reveals all of the conceptual features that are
characteristic
of any of the items.
Paradigmatic Organization. By a paradigmatic system is meant a set of lexical items (e.g., kinship terms) which all have values for every feature. In order to illustrate the results that we might expect from the method of sorting when the vocabulary being sorted forms a paradigmatic system, it is sufficient to consider a subset of four items, $W_h$, $W_i$, $W_j$, and $W_k$, that are optimally distinguished by two features, $F_1$ and $F_2$, as illustrated in Table 2.

² One class of counter-examples to the triangle inequality as applied to semantic distances follows this model: the words ACROBAT and GOBLET seem to have little in common conceptually and should be quite distant from one another, yet both are close to TUMBLER. Such cases are sometimes used to illustrate associative mediation, since a learner who uses TUMBLER as a mediator will find it easier to remember the pair ACROBAT and GOBLET. TUMBLER, however, represents several different semantic entities which merely happen to have the same phonological realization in English. Although the large distance seems to be shortened by the mediator (which converts it from a large conceptual to a negligible phonological distance), there is no real reduction in conceptual distance. Conceptual similarity must be defined for particular senses of words, and not for their phonological shapes.

TABLE 2
FEATURE-BY-ITEM MATRIX FOR A PARADIGMATIC SEMANTIC SYSTEM

         W_h   W_i   W_j   W_k
  F_1     +     +     −     −
  F_2     +     −     +     −

From this table it should be obvious that when $F_2$ is ignored the result will be hi/jk; when $F_1$ is ignored the result will be hj/ik; and when both $F_1$ and $F_2$ are ignored the result will be hijk. In that case, corresponding to (3) above, we can write:

$$N_{hi} = N_{jk} = n(hijk) + n(hi/jk), \tag{6}$$
$$N_{hj} = N_{ik} = n(hijk) + n(hj/ik), \tag{6'}$$
$$N_{hk} = N_{ij} = n(hijk). \tag{6''}$$

Clearly, (6'') cannot be larger than (6) or (6'), but no prediction can be made as to whether (6) will be larger or smaller than (6'). Thus, the proximities form a partial ordering. Whenever we can find in the data matrix a subset of four items that satisfies this pattern, we can hypothesize that it arose from a paradigmatic system of features.
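The feature-ignoring account can be simulated for the paradigmatic case of Table 2. This sketch assumes a deliberately simple, hypothetical judge model: a judge who ignores a set of features puts together exactly those items that agree on all remaining features. The mix of 50 judges is invented for illustration.

```python
from itertools import combinations

# Table 2: four items optimally distinguished by two binary features.
TABLE_2 = {
    "h": {"F1": "+", "F2": "+"},
    "i": {"F1": "+", "F2": "-"},
    "j": {"F1": "-", "F2": "+"},
    "k": {"F1": "-", "F2": "-"},
}


def sort_by_ignoring(features, ignored):
    """Hypothetical judge: items that agree on every feature not ignored
    are put in the same cluster."""
    groups = {}
    for item, values in features.items():
        key = tuple((f, v) for f, v in sorted(values.items()) if f not in ignored)
        groups.setdefault(key, []).append(item)
    return sorted(sorted(g) for g in groups.values())


def pair_counts(features, judges):
    """judges: list of (set of ignored features, number of judges)."""
    counts = {pair: 0 for pair in combinations(sorted(features), 2)}
    for ignored, how_many in judges:
        for cluster in sort_by_ignoring(features, ignored):
            for pair in combinations(cluster, 2):
                counts[pair] += how_many
    return counts


# Hypothetical judges: 10 ignore nothing, 15 ignore F2, 5 ignore F1, 20 ignore both.
judges = [(set(), 10), ({"F2"}, 15), ({"F1"}, 5), ({"F1", "F2"}, 20)]
N = pair_counts(TABLE_2, judges)
```

With these counts, $N_{hi} = N_{jk} = 35$, $N_{hj} = N_{ik} = 25$, and $N_{hk} = N_{ij} = 20$: the pattern of (6)-(6''), with the diagonal pairs smallest. Swapping the numbers of judges who ignore only $F_1$ or only $F_2$ would reverse the order of (6) and (6'), which is why the ordering is only partial.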
Linear Organization.
If one knew in advance that a particular set of items formed
a linear sequence (e.g., the set BABY, CHILD,
ADOLESCENT,
ADULT),
sorting would
probably not be the method of choice for estimating distances between them. However,
since subsets of linearly related items may be included in a larger vocabulary, we should
consider what the effect would be on the data matrix.
If items $W_i$, $W_j$, and $W_k$, in that order, are part of a rectilinear series of items, we
would not expect to find the sorting ik/j. If n(ik/j) = 0 in equations (2) and (3), therefore, we have:

$$D_{ij} = n(jk/i) + n(i/j/k), \tag{7}$$
$$D_{jk} = n(ij/k) + n(i/j/k), \tag{7'}$$
$$D_{ik} = n(ij/k) + n(jk/i) + n(i/j/k). \tag{7''}$$

From (7) it is obvious that

$$D_{ij} + D_{jk} = D_{ik} + n(i/j/k).$$

One would prefer, of course, to have a distance measure d such that $d_{ij} + d_{jk} = d_{ik}$
when all three items lie along a line. With the method of sorting, however, n(i/j/k) is an
additive constant which may have a different value for every subset of three items. If
we consider only three items, we will not be able to decide from sorting data whether
a linear hypothesis is adequate or not. For longer series, however, the linear constraints
become progressively stronger.
As a practical matter, whenever there is reason to expect either a paradigmatic or
a linear organization, one of the multidimensional scaling techniques should probably
be used (e.g., Kruskal, 1964).
Hierarchical
Organization.
For reasons to be discussed later, hierarchical
(taxonomic) organization
based on relations of class inclusion is a pervasive feature of the
lexicon. In a hierarchical
system, unlike the paradigmatic,
not every item has a value
for every feature. For example, those living things that are classified as animals may
then be further classified as vertebrates, but those living things that are classified as
plants cannot be classified as vertebrates.
Plants have no value for the vertebrate
feature; the result of applying the vertebrate feature to plants is undefined. Thus, the
vertebrate feature is said to depend on the animal feature or, conversely, the conceptual
feature animal dominates the conceptual
feature vertebrate. The vertebrate classification can be applied only to those things that have already been classified as animals.
A feature $F_1$ is said to dominate a feature $F_2$ just in case $F_2$ is defined only for items
having a particular value of $F_1$. If for a given vocabulary we have a sequence of
features such that $F_i$ dominates $F_{i+1}$, i.e., $F_1$ dominates $F_2$, $F_2$ dominates $F_3$, and so on,
then whenever a judge decides to ignore feature $F_i$, he must also ignore all the
features that $F_i$ dominates, since he will not know whether they are relevant or not
without taking $F_i$ into account. This fact will impose a hierarchical ordering on the
data obtained by the sorting method. Let $s_i$ represent the set of items that will be
clustered together if feature $F_i$ is ignored. Then we will have a decreasing sequence of sets

$$s_1 \supset s_2 \supset \cdots \supset s_m.$$

Obviously, items in $s_{i+1}$ differ on fewer features than do items in set $s_i$. If $N_s$ is the
number of judges who put the items of s together, then, because of the dominance
relation among the features (every judge who ignores $F_i$ must also ignore $F_{i+1}$, and so
puts together the items of $s_{i+1}$ whenever he puts together those of $s_i$),

$$N_{s_1} \le N_{s_2} \le \cdots \le N_{s_m},$$

or, in terms of distances,

$$N - N_{s_1} \ge N - N_{s_2} \ge \cdots \ge N - N_{s_m}.$$

In short, the fewer features for which two items in a hierarchical conceptual system
differ, the smaller is the distance between them.

FIG. 1. A hierarchical semantic system, where semantic feature $F_2$ is defined only for lexical items having the value − for $F_1$.
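Dominance can be added to the hypothetical judge model used for Table 2: ignoring a feature now also ignores everything it dominates. A sketch for the three items of Fig. 1, assuming item i carries $F_1 = +$ (so $F_2$ is undefined for it, encoded here as `None`) while j and k carry $F_1 = −$ and differ on $F_2$; the encoding, helper names, and judge counts are assumptions.

```python
from itertools import combinations

# Hypothetical encoding of Fig. 1: F2 is undefined (None) where F1 = '+'.
FIG_1 = {
    "i": {"F1": "+", "F2": None},
    "j": {"F1": "-", "F2": "+"},
    "k": {"F1": "-", "F2": "-"},
}
DOMINATES = {"F1": {"F2"}, "F2": set()}  # F1 dominates F2


def close_under_dominance(ignored):
    """Ignoring a feature forces a judge to ignore every feature it dominates."""
    result, stack = set(), list(ignored)
    while stack:
        f = stack.pop()
        if f not in result:
            result.add(f)
            stack.extend(DOMINATES[f])
    return result


def sort_by_ignoring(features, ignored):
    ignored = close_under_dominance(ignored)
    groups = {}
    for item, values in features.items():
        key = tuple((f, v) for f, v in sorted(values.items()) if f not in ignored)
        groups.setdefault(key, []).append(item)
    return sorted(sorted(g) for g in groups.values())


def pair_counts(features, judges):
    counts = {pair: 0 for pair in combinations(sorted(features), 2)}
    for ignored, how_many in judges:
        for cluster in sort_by_ignoring(features, ignored):
            for pair in combinations(cluster, 2):
                counts[pair] += how_many
    return counts


# Hypothetical judges: 10 ignore nothing, 25 ignore F2, 15 ignore F1.
judges = [(set(), 10), ({"F2"}, 25), ({"F1"}, 15)]
N = pair_counts(FIG_1, judges)
```

Only the clusterings i/j/k, i/jk, and ijk can arise, and the counts come out as $N_{ij} = N_{ik} = 15$ and $N_{jk} = 40$, the ordering that the dominance argument requires.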
The effect of ignoring features in a hierarchical
system of this kind is to produce
what has been called a hierarchical
clustering scheme (Johnson, 1967). A hierarchical
clustering scheme consists of a sequence of clusterings having the property that any
cluster is a merging of two or more clusters in the immediately
preceding clustering.
For example, the sequence of clusterings:
h/i/j/k, h/i/jk, h/ijk, hijk,
forms a hierarchical
clustering scheme, and can easily be represented by a tree graph.
The important
fact about a hierarchical
clustering scheme is that whenever two items
$W_i$ and $W_j$ form a cluster, there cannot be any subsequent cluster ik that excludes $W_j$,
or any cluster jk that excludes $W_i$; once they have been placed together, they stay
together in all subsequent clusterings. A clustering scheme describes a hierarchical
structure, but it differs from a hierarchical
conceptual system in that no interpretation
is given, and no conceptual features are assigned to the various branch points in the
hierarchy.
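Whether a sequence of clusterings forms a hierarchical clustering scheme can be tested directly from this property. A minimal sketch, assuming single-letter item names in the slash notation used above and assuming every clustering partitions the same items; `parse` and `is_hierarchical_scheme` are hypothetical names.

```python
def parse(clustering):
    """'h/i/jk' -> [['h'], ['i'], ['j', 'k']] (one letter per item)."""
    return [list(part) for part in clustering.split("/")]


def is_hierarchical_scheme(clusterings):
    """True if each clustering arises from the one before it by merging
    clusters: every earlier cluster lies inside a single later cluster,
    so items once placed together stay together."""
    parsed = [parse(c) for c in clusterings]
    for prev, curr in zip(parsed, parsed[1:]):
        later = [set(cluster) for cluster in curr]
        for cluster in prev:
            if not any(set(cluster) <= c for c in later):
                return False
    return True
```

For example, `is_hierarchical_scheme(["h/i/j/k", "h/i/jk", "h/ijk", "hijk"])` is true, while a sequence ending in `"hk/ij"` fails because it separates j and k after they were joined.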
From the fact that a conceptual hierarchy must have the structure of a hierarchical
clustering scheme, the triangle inequality can be considerably strengthened. Consider
the case of three items $W_i$, $W_j$, and $W_k$ related as in Fig. 1. Figure 1 shows that the
effect of ignoring $F_1$ will be to produce the clustering ijk; if $F_2$ is ignored, the clustering
will be i/jk; and if neither $F_1$ nor $F_2$ is ignored, the clustering will be i/j/k. These
three clusterings form a hierarchical clustering scheme. Since the clusterings ij/k
and ik/j cannot occur, n(ij/k) = n(ik/j) = 0. Everyone must select from among three
possible clusterings, so (2) becomes

$$N = n(ijk) + n(i/jk) + n(i/j/k),$$
and (3) becomes

$$N_{ij} = n(ijk), \tag{8}$$
$$N_{jk} = n(ijk) + n(i/jk), \tag{8'}$$
$$N_{ik} = n(ijk). \tag{8''}$$

It follows that

$$N \ge N_{jk} \ge N_{ij} = N_{ik} \ge 0,$$

or, in terms of distances,

$$0 \le D_{jk} \le D_{ij} = D_{ik} \le N.$$

For any three items in such a system, therefore, the distances between them must all
be equal, or if one distance is less, the other two must be equal. In this case, as Johnson
has shown, D satisfies the ultrametric inequality:

$$D_{jk} \le \max[D_{ij}, D_{ik}] \tag{9'}$$

for any choice of i, j, and k. The weaker triangle inequality (4') follows directly from
this ultrametric inequality. Expressed in terms of N rather than D, this becomes

$$N_{jk} \ge \min[N_{ij}, N_{ik}]. \tag{9}$$

It should be noted that we have not used more than the ordinal relations among
the $N_{ij}$, and even then only the order within a particular series of dominance relations.
The patterns of relations for which we are looking will remain invariant under any
transformation of the numbers that leaves their order unchanged within any branch.

HIERARCHICAL CLUSTERING SCHEMES
From Johnson’s argument we know that when the ultrametric
inequality is satisfied,
there is a perfect match between a matrix of distances and a tree graph of the hierarchical clustering scheme; from a complete specification of either one, the other can
be directly obtained. We will not recapitulate that argument here, although the method
of constructing
a tree from a matrix of distances satisfying the ultrametric
inequality
can be briefly described. The key to the method is that items separated by the minimum
distance are merged and treated as a single element in a new matrix. The effect of the
merging is to produce another clustering in a hierarchical
sequence of clusterings.
The procedure
is the following.
Find the smallest distance (largest number of
subjects in Table 1, for example) and merge those elements. Suppose, for example,
that $W_i$ and $W_j$ are two elements merged at this minimum distance; if the ultrametric
inequality holds, and if $D_{ij}$ is the smallest distance, then $D_{ik} = D_{jk}$ for any other item
$W_k$. If the distance from $W_i$ to any $W_k$ equals the distance of $W_j$ to that $W_k$, then when
we merge $W_i$ and $W_j$ into a new element, the distance of this cluster to all other $W_k$
must be the distance from either of the merged elements to $W_k$. So there is no difficulty
in forming a new, smaller matrix of distances with the clustered elements replaced
by their merger.

Now repeat this procedure on the new matrix: Find the smallest distance, merge
those items, note the new clustering and the distance associated with it. Again, if
the ultrametric inequality holds, the distances from all other items to the merged
set will equal their distances to each member of the merger. We continue this iteration
until all items are merged together in a single cluster. The tree is simply a graphical
record of this sequence of mergings. Only the ordinal properties of the data are used;
any matrix of values having the same ordinal relations would give the same tree
topologically, although the distances assigned to the branch points would, of course,
be different.
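The merging procedure can be sketched in Python together with a check of the ultrametric inequality (9'). This is an illustration under stated assumptions, not Johnson's program: clusters are frozensets, ties are broken by list position, and `build_tree` raises an error as soon as the two members of a merged pair lie at different distances from some other cluster, which cannot happen when (9') holds. The 4-item matrix is hypothetical.

```python
from itertools import permutations


def is_ultrametric(D):
    """Inequality (9'): D_jk <= max(D_ij, D_ik) for every triple."""
    n = len(D)
    return all(D[j][k] <= max(D[i][j], D[i][k])
               for i, j, k in permutations(range(n), 3))


def build_tree(labels, D):
    """Repeatedly merge the closest pair of clusters and record
    (distance, merged items); the list of merges is the tree."""
    clusters = [frozenset([x]) for x in labels]
    dist = {(clusters[a], clusters[b]): D[a][b]
            for a in range(len(labels)) for b in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        p, q = min(((p, q) for n, p in enumerate(clusters) for q in clusters[n + 1:]),
                   key=lambda pair: dist[pair])
        d = dist[(p, q)]
        merged = p | q
        clusters = [c for c in clusters if c not in (p, q)]
        for c in clusters:
            # Under (9') both members of the merged pair are equally far from c.
            if dist[(p, c)] != dist[(q, c)]:
                raise ValueError("not ultrametric: merged items differ in distance")
            dist[(merged, c)] = dist[(c, merged)] = dist[(p, c)]
        clusters.append(merged)
        merges.append((d, sorted(merged)))
    return merges


labels = ["h", "i", "j", "k"]
D = [
    [0, 10, 30, 30],
    [10, 0, 30, 30],
    [30, 30, 0, 10],
    [30, 30, 10, 0],
]
tree = build_tree(labels, D)
```

Here `tree` records hi and jk joining at distance 10 and all four items at 30, i.e. the clustering sequence h/i/j/k, hi/j/k, hi/jk, hijk. A matrix such as `[[0, 2, 3], [2, 0, 2], [3, 2, 0]]` satisfies the triangle inequality but not (9').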
The procedure runs smoothly when the ultrametric inequality is perfectly satisfied.
In practice, however, even when a hierarchical system is involved, sorting data will
be noisy, so that when $W_i$ and $W_j$ are merged, $D_{ik}$ and $D_{jk}$ will not be precisely equal
for all k. One assumes that some of this noise results from the use of idiosyncratic
features by some judges, or even from a failure to follow instructions. In any case,
the problem arises of defining the distance $D_{(ij)k}$ between the cluster ij and item $W_k$
when $D_{ik} \ne D_{jk}$.
The problem can be illustrated by applying the merging procedure to the data
of Table 1. Consider first the pair of items, FEAR and REGRET, which, according to
Table 1, were judged to be similar in meaning by 48 of the 50 judges. We therefore
assign 2 as a measure of the distance between FEAR and REGRET. PLANT and TREE are
also separated by the same distance. At this level, therefore, the 48 items are grouped
into 46 clusters, 44 of which contain a single item and two of which contain a pair.

At D = 2, therefore, the similarity matrix is reduced to 46 × 46, and the question
is what distances to assign to the clusters. For example, from Table 1 we can extract
the following frequencies of paired occurrences:

  $W_k$       $W_i$ = FEAR   $W_j$ = REGRET
  THRILL          42               42
  WISH            38               37
  URGE            37               37
  EASE            25               25
  HONOR           12               12

When FEAR and REGRET are merged into a single element, the distances of the pair to
every other element are almost the same, although even here there is some noise in the
data. (From (5), of course, we know that the difference here cannot exceed 2.) From
FEAR-REGRET to THRILL the distance must be 8. From FEAR-REGRET to WISH, however,
the distance can be either 12 or 13, depending on which value we take. If the ultrametric
inequality held without exception, no decision would be necessary. With noisy
data, however, some discrepancies are to be expected.
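The discrepancy can be computed directly from the five rows of paired occurrences above. A small sketch; the constant and function names are hypothetical, and the counts are the ones quoted in the text.

```python
N_JUDGES = 50
N_FEAR_REGRET = 48  # judges who put FEAR with REGRET (Table 1)

# Paired-occurrence counts from the text: word -> (with FEAR, with REGRET).
PAIRED = {
    "THRILL": (42, 42),
    "WISH": (38, 37),
    "URGE": (37, 37),
    "EASE": (25, 25),
    "HONOR": (12, 12),
}


def merged_distance_range(n_with_fear, n_with_regret, n_judges=N_JUDGES):
    """Distances (1) from each member of the merged pair; the min and max
    are the two candidate distances from FEAR-REGRET to the other word."""
    d1, d2 = n_judges - n_with_fear, n_judges - n_with_regret
    return min(d1, d2), max(d1, d2)


for word, (a, b) in PAIRED.items():
    lo, hi = merged_distance_range(a, b)
    # Bound (5): the discrepancy cannot exceed N - N_ij = 50 - 48 = 2.
    assert hi - lo <= N_JUDGES - N_FEAR_REGRET
    print(word, lo if lo == hi else (lo, hi))
```

Only WISH yields two candidate values (12 and 13); taking the minimum or the maximum is exactly the choice between the two methods discussed next.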
There are various alternatives open at this point (see Sokal and Sneath, 1963). We
might take the mean or, since the numbers have only weak ordinal validity as measures
of similarity, the median. Johnson's proposal is to solve the problem twice, first using
the minimum distance, and then again using the maximum distance. If, as the ultrametric
inequality demands, the two distances are really equal, then the maximum and
the minimum should not be widely discrepant and the two solutions should give more
or less the same answer. But if the two hierarchies are quite different, we should be
warned either that we are not dealing with a hierarchical conceptual system, or that the
data are too noisy for precise analysis.

[Figure: two tree graphs, "Connectedness method" (left) and "Diameter method" (right); axis: Number of subjects.]

FIG. 2. Tree graphs of cluster analysis applied to data on 48 English nouns sorted by 50 judges (see Table 1). According to the connectedness method, the distance of an item to a cluster is its distance to the nearest member of the cluster. According to the diameter method, the distance of an item to a cluster is its distance to the farthest member of the cluster. In a perfect hierarchical clustering scheme the two methods would give the same results. Here, clusters which are common to the two methods of analysis are indicated by open circles at the appropriate nodes.
The result of applying this analysis to the data in Table 1 is shown in Fig. 2. On the
left is the hierarchical
clustering scheme that results when the distance of an item to
a cluster is taken to be its distance to the nearest member of the cluster (Johnson’s
“connectedness
method”;
Sokal and Sneath’s “clustering
by single linkage”).
On the
right is the hierarchical
clustering scheme that results when the distance of an item to
a cluster is taken to be its distance to the most distant member of the cluster (Johnson’s
“diameter
method”;
Sokal and Sneath’s “clustering
by complete linkage”).
The distance between any pair of items can be read off the graph; it is the number associated
with the branch point representing
the smallest cluster to include them both, e.g.,
by the connectedness method, the distance from MOTHER
to COOK is 9.
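The two methods differ only in how the distance between clusters is taken: the nearest pair of members (connectedness, single linkage) or the farthest pair (diameter, complete linkage). A minimal sketch with a hypothetical 5-item distance table, chosen so that the two methods disagree on one cluster. Note one assumed simplification: the trees in Fig. 2 have fewer than 47 nonterminal nodes, so some of their nodes join more than two branches, whereas this sketch always merges exactly one pair per step.

```python
def agglomerate(labels, D, linkage):
    """Pairwise agglomerative clustering. linkage=min gives the
    connectedness method; linkage=max gives the diameter method.
    Returns the list of (distance, merged cluster) steps."""
    index = {x: n for n, x in enumerate(labels)}
    clusters = [frozenset([x]) for x in labels]

    def cluster_distance(a, b):
        return linkage(D[index[x]][index[y]] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        d, n, m = min((cluster_distance(a, b), n, m)
                      for n, a in enumerate(clusters)
                      for m, b in enumerate(clusters) if n < m)
        merged = clusters[n] | clusters[m]
        clusters = [c for k, c in enumerate(clusters) if k not in (n, m)] + [merged]
        merges.append((d, merged))
    return merges


# Hypothetical, slightly noisy distances (not the paper's data).
labels = ["a", "b", "c", "d", "e"]
PAIRS = {("a", "b"): 1, ("a", "c"): 6, ("a", "d"): 6, ("a", "e"): 4,
         ("b", "c"): 3, ("b", "d"): 6, ("b", "e"): 5,
         ("c", "d"): 2, ("c", "e"): 7, ("d", "e"): 7}
D = [[0 if x == y else PAIRS.get((x, y), PAIRS.get((y, x))) for y in labels]
     for x in labels]

single = {c for _, c in agglomerate(labels, D, min)}
complete = {c for _, c in agglomerate(labels, D, max)}
common = single & complete
```

Both methods form ab, cd, and the full set, but single linkage joins ab with cd while complete linkage joins ab with e. The clusters in `common` play the role of the open-circle nodes in Fig. 2.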
The 48 nouns in Fig. 2 can be listed in an order such that both hierarchies can be
graphically
represented without any crossing lines in the tree graphs. The maximally
connected scheme contains 41 nonterminal
nodes; the minimum
diameter scheme
contains 43 nonterminal
nodes; 29 nodes (those indicated by open circles) represent
clusters that are common to both schemes. Thus, about 70% of the clusters indicated
by the two methods are common to both. (See note added in proof.)
Whether or not this degree of disagreement between the two methods is compatible
with the assumption that these 48 items represent a hierarchical
conceptual subsystem
is a matter for individual judgment.
It should be noted, however, that judgments
of
similarity shared by more than half the judges seem to arrange themselves the same
way according to both methods. It is the long distances (which are based on small
numbers of judges) that are most unreliable.
The connectedness
method tends to
emphasize the smaller and probably unreliable
values of Nii ; the diameter method
tends to suppress them. For that reason, the diameter method probably gives a more
reliable picture of the hierarchical
structure, although longer distances necessarily
remain indeterminate.
DISCUSSION
Any psychological analysis of the structure underlying our system of verbal concepts
should be judged against (at least) two criteria: plausibility and linguistic relevance.
Plausibility implies agreement with what most people would accept as the most basic
verbal concepts in our language. Linguistic relevance implies compatibility with
theories of linguistic semantics that have been proposed in recent years.
Plausibility.
The argument for plausibility
rests on the results obtained by the
diameter method of cluster analysis, shown in the right half of Fig. 2. Those clusters
can readily be interpreted
in terms of abstract concepts that seem to comprise important components of the definitions of the items.
Of the 48 nouns, 24 are names of things and 24 are not. It was thought that if this
important
conceptual feature could not be recovered, the method of sorting should
probably be abandoned as a tool for the study of verbal concepts. In Fig. 2 the first 24
nouns are object names, and the second 24 are not. Was the object concept recovered?
If we consider the connectedness solution, the most basic feature would seem to be
human vs nonhuman;
this is suggestive, but probably wrong, since it is based on the
most unreliable
part of the data in Table 1. If we consider the diameter method, the
object concept is not violated, although it is not fully confirmed, either; the diameter
analysis leaves us with five clusters that might be loosely interpreted as names of living
things, names of nonliving things, quantitative
terms, kinds of social interaction,
and
psychological
terms. Within these five clusters further plausible subdivisions
can be
identified.
In order to test further the general hypothesis that judges cluster items on the basis
of shared conceptual features, all 48 items were looked up in Roget’s Thesaurus and the
maximum number of shared categories in that classification scheme was tabulated for
every pair of items. In Fig. 3 the mean proximity for pairs of items is plotted as a
function of the number of shared Roget features. It can be seen that there is a rough
correlation, although the variability is so great that no precise prediction of the data
in Table 1 could be derived from Roget's classification. It is not obvious whether this
low correlation argues against the plausibility of the present results or of the Roget
criterion. It is conceivable that the method of sorting could be used to construct a
thesaurus that would conform even better to our implicit system of verbal concepts.

[Figure: mean proximity plotted against the number of shared features (Roget).]

FIG. 3. Mean proximity for all pairs of items is plotted as a function of the number of features they share in Roget's Thesaurus.
Linguistic Relevance. The program outlined by Katz and Fodor (1963) can be taken as a representative statement of the goals of linguistic semantics. As they point out, it is not sufficient for a semantic theory to provide a plausible characterization of the meanings of individual lexical items. The meanings must be formulated in such a way that interpretations can be constructed, according to rule, for combinations of words that can occur in grammatical constructions. In order for an analysis of verbal concepts to have linguistic relevance, it must be compatible with this goal of linguistic semantics.
The proposal advanced by Katz and Fodor is that a definition for some particular sense of a word should be formulated as a list of semantic markers, followed by a distinguisher that summarizes features specific to the meaning of that particular word. A semantic marker is any universal basis for classifying meanings; the markers represent just those relations that are systematic in the language. To change a marker would require changing the entries for many words in the lexicon. To change a distinguisher, however, would affect the definition of only that particular word. Katz and Fodor then suggest how rules might be formulated to combine these markers and distinguishers for individual words in order to construct interpretations for grammatical combinations of words.
The question arises, therefore, as to whether clusters obtained by the method of sorting bear any relation to the semantic markers postulated by Katz and Fodor. To the extent that these systems are compatible, the present results might be said to have linguistic relevance. However, since Katz and Fodor do not offer any explicit set of semantic markers, detailed comparison is impossible.
For this discussion, therefore, the question of linguistic relevance will be approached differently, although the ultimate goals of semantic theory as stated by Katz and Fodor will not be questioned. In particular, the distinction between semantic markers and distinguishers will be differently formulated, with greater reliance on the form of the definitional statement itself.
Consider, for example, how a lexical entry might be phrased for the noun knight:
KNIGHT, a man who has been raised to honorary military rank.
Here we have a class name, man, followed by a phrase (usually a relative clause) that specifies how this member of the class is to be distinguished. Let us assume that this is a general formula for the definition of common nouns. In order to provide such definitions, of course, it is necessary to know how much of the definition is to be included in each part of the formula. For example, we might have defined knight as follows:
KNIGHT, a person who is male and who has been raised to honorary military rank.
In this version we have shifted the information that knights are men out of the class term and into the specifying clause. How could we decide which of these two definitions is to be preferred? Although this division can vary somewhat as a function of use and context, a decision can usually be based on our intuitive judgment about the consequences of negation.
Negation has the effect of denying only the most specific feature of a definition. For example, the sentence Leslie is not a knight denies that Leslie has been raised to honorary military rank, but it does not deny that Leslie is a man. In order to use the term knight at all with respect to Leslie, we presuppose that Leslie is a man. Let us, therefore, call the first part of the formula the presupposition of the noun and the second part the assertion of the noun (Langendoen, in press, Ch. 5). Then we can say that negation denies the assertion, but not the presupposition. On this basis, therefore, we are led to prefer the first definition of KNIGHT to the second.
If we wish to deny that Leslie is a man, we would not normally say that Leslie is not a knight, not a bachelor, not a father, etc. We would say Leslie is not a man. Presumably the lexicon contains an entry of the form:
MAN, a person who is male.
Leslie is not a man denies that Leslie is male, but it does not deny that Leslie is a person. If we wish to deny that Leslie is a person, we exploit the definition:
PERSON, a being that is human (a human being).
Leslie is not a person denies that Leslie is human, but leaves standing the presupposition that Leslie is a being (a pet turtle, perhaps).
The principle involved is that a common noun will not normally be used in a predicate phrase unless the subject satisfies the presuppositions for its use; i.e., we do not normally say such things as The shirt is a number. For this principle to hold for both affirmative and negative sentences, negation of the predicate cannot be interpreted as denying these presuppositions. (One consequence is to allow a subtle kind of libel; to say, for example, that Tom is not a thief denies that Tom is a criminal who steals, but presupposes, without asserting it, that he is a criminal of some other sort.) In order to use and understand negative sentences of this type, an adult speaker of English must have his lexical information stored in such a manner that he can distinguish the presupposition from the assertion of any common noun. Since the presupposition of one noun may include the assertion of some more abstract noun, this requirement imposes considerable structure on the subjective lexicon.
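One way to make the presupposition/assertion split concrete is to store each lexical entry as a pair. The sketch below is hypothetical (the `LEXICON` dictionary and both helper functions are illustrative, not part of the method of sorting): negating a predicate noun denies its assertion while leaving its presupposition standing, and following presuppositions upward recovers the chain from KNIGHT through man and person to being, using the definitions given above.

```python
# Hypothetical mini-lexicon: noun -> (presupposition, assertion),
# following the definitions of KNIGHT, MAN, and PERSON given above.
LEXICON = {
    "knight": ("man", "has been raised to honorary military rank"),
    "man": ("person", "is male"),
    "person": ("being", "is human"),
}


def negate(noun):
    """'X is not a <noun>' denies the assertion but leaves the presupposition standing."""
    presupposition, assertion = LEXICON[noun]
    return {
        "presupposed": f"X is a {presupposition}",
        "denied": f"X {assertion}",
    }


def presupposition_chain(noun):
    """Follow presuppositions upward until reaching a term with no entry of its own."""
    chain = [noun]
    while chain[-1] in LEXICON:
        chain.append(LEXICON[chain[-1]][0])
    return chain


print(negate("knight")["presupposed"])  # X is a man
print(presupposition_chain("knight"))   # ['knight', 'man', 'person', 'being']
```

Because the presupposition of KNIGHT (man) is itself an entry whose assertion (is male) is stated elsewhere, the entries link into a chain rather than standing as an unordered list.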
It would be reckless to argue that this presuppositional structure is the only principle of organization for our lexical memory, but a more conservative claim can be made that, since the presuppositional structure must be available to native speakers of English, it might have been exploited by the judges in their performance on our sorting task.
There would seem to be two ways such information could be used. A judge might combine items having the same assertion, as in HUSBAND, WIFE, SPOUSE:
HUSBAND, a man who is married.
WIFE, a woman who is married.
SPOUSE, a person who is married.
Or he might combine items having the same presupposition, as in HUSBAND, KNIGHT, BROTHER.
In making this decision, of course, a judge will be guided by the particular vocabulary he is sorting and probably by a general assumption that a solution with fewer clusters is more satisfying than one with many clusters. Since presuppositions are necessarily more general than assertions, it is probable that any haphazard selection of items to be sorted will lend itself better to presuppositional than to assertional clustering.
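The claim that a haphazard selection lends itself better to presuppositional clustering can be illustrated with the five person nouns discussed below in connection with Fig. 2, using the definitions given there. In this toy sketch (the function `cluster_by` is hypothetical), grouping by shared presupposition yields fewer clusters than grouping by shared assertion, because each assertion is specific to its noun.

```python
from collections import defaultdict

# (presupposition, assertion) for five nouns, following the definitions in this paper.
ITEMS = {
    "MOTHER": ("woman", "has borne a child"),
    "COOK": ("person", "prepares food by using heat"),
    "DOCTOR": ("person", "is licensed to treat diseases"),
    "UMPIRE": ("person", "rules on the plays of a game"),
    "KNIGHT": ("man", "has been raised to honorary military rank"),
}


def cluster_by(items, part):
    """Group nouns sharing the same presupposition (part=0) or assertion (part=1)."""
    clusters = defaultdict(list)
    for noun, definition in items.items():
        clusters[definition[part]].append(noun)
    return dict(clusters)


by_presupposition = cluster_by(ITEMS, 0)
by_assertion = cluster_by(ITEMS, 1)
print(len(by_presupposition), len(by_assertion))  # 3 5
```

A judge who prefers fewer clusters would, on this toy account, gravitate toward the presuppositional grouping.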
For the 48 items used in this study, therefore, we would expect the sorting task to be carried out largely on the basis of what the items presuppose, rather than what they assert. In that case the sorting task will reveal degrees of compatibility among presuppositions, and the structure that emerges should be a presuppositional structure.
This presuppositional structure is hierarchical, as is implied in KNIGHT-man-person-being. That is to say, being dominates person, since person is undefined for nonliving things; person dominates man, since man is undefined for nonpersons; and man dominates KNIGHT, since KNIGHT is undefined for nonmen. As we indicated above, feature dominance ensures that the system will be hierarchical.
Even in paradigmatic systems, where conceptual features are not related by dominance (every item has a value for every feature), an argument can be made for expecting hierarchical structure in the presuppositions of the items. For example, the kin term UNCLE (ignoring uncle-in-law) can be characterized hierarchically as UNCLE-brother-man-person-being if the subjective lexicon includes such definitions as
UNCLE, a brother who has a sibling who is a parent.
BROTHER, a man who has siblings.
MAN, a person who is male.
This sequence implies that sex is the most general kinship feature in English, that lineality is less general, and that generation is least general, even though every kin term can be classified with respect to all three. This argument rests on the judgment that Leslie is not an uncle presupposes that Leslie is somebody's brother, but denies that his sibling is a parent. However, this is an empirical question open to further research.
Similarly, a linear sequence can also receive a hierarchical structure under this interpretation. Consider, for example, the following definitions:
BABY, a child who is very young.
CHILD, a person who is young.
ADOLESCENT, a person who is approaching maturity.
ADULT, a person who is mature.
These definitions rest on the judgments that He is not a baby denies that he is very young but presupposes that he is a child; that He is not a child denies that he is young but presupposes that he is a person; that He is not an adolescent denies that he is approaching maturity but presupposes that he is a person; and that He is not an adult denies that he is mature but presupposes that he is a person. The linear ordering is conveyed by the assertions: very young, young, approaching maturity, and mature. The presuppositions, however, are hierarchical.
Insofar as judges rely on shared presuppositions, therefore, we would expect their clusters to reveal a presuppositional hierarchy. If judges have recourse to other grounds for classification, however, their clusters will probably cut across this hierarchical structure.
The question of linguistic relevance, therefore, comes down to this: How well can we account for the clusters shown in Fig. 2 in terms of our conjecture that judges were sorting on the basis of the presuppositions and assertions of the definitions? The answer demands a detailed analysis of the clusters obtained. Without carrying it through in detail, we can illustrate how such an analysis might proceed.
The first five items in Fig. 2 are the following:
MOTHER, a woman who has borne a child.
COOK, a person who prepares food by using heat.
DOCTOR, a person who is licensed to treat diseases.
UMPIRE, a person who rules on the plays of a game.
KNIGHT, a man who has been raised to honorary military rank.
Since COOK, DOCTOR, and UMPIRE all presuppose person, they can be combined on that basis simply by ignoring the assertions that distinguish them. MOTHER and KNIGHT, however, cannot join the person cluster unless some of their presuppositions are ignored. Since a MOTHER is a woman and a WOMAN is a person, and since a KNIGHT is a man and a MAN is a person, MOTHER and KNIGHT presuppose all that COOK, DOCTOR, and UMPIRE presuppose, plus a little more. Some judges might not be willing to ignore those additional presuppositions, and so the number putting MOTHER and KNIGHT with the person cluster would be correspondingly smaller. This pattern is confirmed by the general topology of the clustering scheme in Fig. 2.
Our hypothesis about what the judges were doing when they combined these nouns having different presuppositions, therefore, might be stated as follows: The greater the number of presuppositions of a noun that have to be ignored in order to include it in a cluster with other nouns, the smaller will be the number of judges who include it. Thus, no presuppositions have to be ignored to cluster COOK, DOCTOR, and UMPIRE; one presupposition has to be ignored in order to put MOTHER and KNIGHT in that cluster. From Table 1 we can determine that when no presuppositions had to be ignored, the numbers of judges putting the items together in pairs were 47, 44, and 44, but when one presupposition had to be ignored the numbers were 41, 40, 39, 38, 38, 36, and 36. On the average, about seven judges were unwilling to overlook the additional presuppositions involved in MOTHER and KNIGHT.
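The figure of about seven judges follows from the pair counts just quoted. As a worked check (the symbols are introduced here only for convenience: n̄0 is the mean count when no presupposition is ignored, n̄1 the mean when one is ignored):

```latex
\bar{n}_0 = \frac{47 + 44 + 44}{3} = 45, \qquad
\bar{n}_1 = \frac{41 + 40 + 39 + 38 + 38 + 36 + 36}{7} = \frac{268}{7} \approx 38.3, \qquad
\bar{n}_0 - \bar{n}_1 \approx 6.7
```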
It is instructive to carry this detailed analysis through the next five items in Fig. 2. They might be defined as follows:
TREE, a plant that is large and has a woody trunk.
PLANT, a living thing (being) that is not an animal.
ROOT, a part of a plant that grows in the soil.
HEDGE, a row of bushes that is planted as a fence.
FISH, an animal that lives in water and breathes with gills.
Since a TREE is a plant, no presuppositions are ignored by putting TREE and PLANT together; indeed, even the assertion of PLANT is respected by this cluster. ROOT, however, poses a new problem, since it is not a kind of plant but a part of a plant. The judges behaved as if part of a plant presupposed plant, i.e., as if one presupposition of ROOT had to be ignored in order to include it in the cluster with PLANT and TREE. A similar argument accounts for HEDGE, which is a collection of bushes, which must presuppose bush, which in turn is a plant; thus, two presuppositions of HEDGE must be ignored in order to include it with PLANT, TREE, and ROOT.
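Testing the hypothesis that fewer judges include a noun the more of its presuppositions must be ignored requires a way to count ignored presuppositions. A minimal sketch, assuming the links are stored in a hypothetical map from each term to the term it presupposes (including the intermediate nodes "part of a plant" and "row of bushes" that the analysis above treats as presuppositions of ROOT and HEDGE): walk upward from the noun's own presupposition and count the steps passed before reaching the term that heads the cluster. On these links it reproduces the counts given in the text for both groups of five items.

```python
# Hypothetical presupposition links (term -> the term it presupposes), transcribed
# from the analysis of the first ten items in Fig. 2.
PRESUPPOSES = {
    "cook": "person", "doctor": "person", "umpire": "person",
    "mother": "woman", "woman": "person",
    "knight": "man", "man": "person",
    "person": "being",
    "tree": "plant", "plant": "living thing",
    "root": "part of a plant", "part of a plant": "plant",
    "hedge": "row of bushes", "row of bushes": "bush", "bush": "plant",
}


def ignored_presuppositions(noun, head):
    """Count the presuppositions of `noun` a judge must ignore to cluster it under `head`."""
    if noun == head:
        return 0
    count = 0
    term = PRESUPPOSES.get(noun)
    while term is not None and term != head:
        count += 1  # this presupposition lies below `head`, so it must be ignored
        term = PRESUPPOSES.get(term)
    if term is None:
        raise ValueError(f"{head!r} is not among the presuppositions of {noun!r}")
    return count


for noun in ["cook", "doctor", "umpire", "mother", "knight"]:
    print(noun, ignored_presuppositions(noun, "person"))  # 0, 0, 0, 1, 1
for noun in ["plant", "tree", "root", "hedge"]:
    print(noun, ignored_presuppositions(noun, "plant"))   # 0, 0, 1, 2
```

The ValueError branch covers cases such as FISH under person, where no chain of presuppositions reaches the cluster head at all; that is exactly where the text has to appeal to folk taxonomy.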
In order to explain why FISH was clustered more with plants than with persons, we must appeal to folk taxonomy. Judges sorted as if they did not consider animal among the presuppositions of person. If the present data are interpreted literally, these judges (about 40% of them) sorted as if PLANT and ANIMAL both presupposed some class of nonhuman-living-thing, which in turn presupposed living-thing (being), which in turn presupposed thing. In order to put FISH with plants, this presupposition of nonhuman-living-thing did not have to be ignored, whereas it would have been ignored if FISH had been put with persons. To the degree that this account is implausible, of course, it may indicate that something here has not been correctly understood. In any case, our account for FISH is tentative and in need of further investigation using a variety of other animals, plants, and persons.
Anyone who pursues this analysis through the rest of the list will discover that the 10 items just analyzed constitute the most tractable subset to treat in this manner. It is obvious that other considerations influenced many of the judges' decisions. For example, JACK and WHEEL probably go together, not on presuppositional grounds, but because a jack is used to raise a car when removing a wheel, i.e., on implicational grounds. Similarly, YACHT and SKATE probably go together because both imply recreational activities. Some of these nonlexical implications were conveyed inadvertently by the sentences that were used to illustrate the intended sense of the word (see Appendix). In future studies it might be advisable to omit such sentences.
The complexity that can result when both assertions and presuppositions are used is illustrated by the quantitative items in the lower half of Fig. 2. Let us assume that the information about these items might be represented as follows:
INCH, a distance that is 1/12 foot.
DISTANCE, a magnitude that is linear.
MEASURE, a number that denotes a magnitude.
NUMBER, a symbol that denotes how many times a thing is taken.
ORDER, an arrangement that is methodical and successive.
GRADE, a class that is relative to an order.
SCALE, an order that is used for measurement.
On this representation, only MEASURE-NUMBER and SCALE-ORDER are related by presuppositions; GRADE is related to ORDER by its assertion; MEASURE is related indirectly to the presupposition of INCH by its assertion (or perhaps more directly if INCH is defined as a measure of distance; That is not an inch is somewhat ambiguous in this respect); and INCH-MEASURE-NUMBER is related to ORDER-GRADE-SCALE only by the assertion of SCALE. Nonetheless, in this context these six nouns form a highly integrated cluster.
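The observation that these six nouns form an integrated cluster even though only two pairs are linked by presuppositions can be checked by treating each relation just listed as an edge in a graph. In this sketch the edge list is transcribed from the preceding paragraph (the MEASURE-INCH edge stands for the indirect link through DISTANCE), and connected components are found with a standard union-find helper; neither is part of the original analysis. Presupposition links alone leave four separate groups, while adding the assertion links connects all six nouns.

```python
# Relations among the quantitative nouns, as described in the text.
RELATIONS = [
    ("MEASURE", "NUMBER", "presupposition"),
    ("SCALE", "ORDER", "presupposition"),
    ("GRADE", "ORDER", "assertion"),
    ("MEASURE", "INCH", "assertion"),   # indirect, via DISTANCE and magnitude
    ("SCALE", "MEASURE", "assertion"),  # SCALE is "used for measurement"
]


def components(nodes, edges):
    """Connected components via union-find with path halving."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, _kind in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


nodes = ["INCH", "MEASURE", "NUMBER", "ORDER", "GRADE", "SCALE"]
all_links = components(nodes, RELATIONS)
presup_only = components(nodes, [e for e in RELATIONS if e[2] == "presupposition"])
print(len(all_links), len(presup_only))  # 1 4
```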
Analysis in terms of definitional presuppositions and assertions should not be pushed further than it wants to go, of course, and there is good reason to believe that judges often had recourse to other grounds for forming their clusters. Introspectively, judges work by trying to put the words into sentence contexts. They explore such formulations as "They are all X's," where X is a presupposition; "They are all things that Y," where Y is an assertion; but also "They all involve Z" or "You use them all to talk about Z" or "They all have something to do with Z," etc., where Z may be almost any kind of nonlexical information. Lacking anything better, some judges will even put two items together if it is easy to form a sentence using them both; Anglin's data (reported in Miller, 1967) suggest that this strategy is common in children, who may not have learned to give special attention to definitional sentences in judging similarity of meaning or, perhaps, have not yet become adept at distinguishing presupposition from assertion.
Insofar as definitional analysis is appropriate, however, the results obtained from the method of sorting can be said to have linguistic relevance. The method of sorting should not be viewed as a discovery procedure, i.e., a mechanical method for discovering the presuppositions and assertions of our subjective definitions. But when it is used cautiously, with appropriate consideration for the choice of items and instructions, it may provide a useful test for semantic hypotheses derived on other grounds.
APPENDIX
The following 48 nouns, with definitions and examples of use, were sorted by the judges to obtain the results given in Table 1. The first 24 are names of objects, the second 24 are names of nonobjects. The definitions were taken with minor modifications from the Thorndike Barnhart Beginning Dictionary (New York: Doubleday, 1964).
1. ANCHOR, shaped piece of iron attached to a chain or rope to hold a ship in place. The ship lost its anchor in the storm.
2. BLEACH, chemical used to whiten something. My wife refuses to use any bleaches in her laundry.
3. COOK, person who prepares food by using heat. Judging by the taste of this, the restaurant must have hired a new cook.
4. DOCTOR, person licensed to treat diseases. Many doctors are convinced that most illnesses simply run their course.
5. EXHAUST, the used gasoline that escapes. The smog above our cities is produced by the exhaust of countless cars.
6. FISH, any animal that lives in water and has gills for breathing. The pond used to have many fish before boating became so popular.
7. GLUE, substance used to stick things together. Epoxy cement seems to be much better than fish glue.
8. HEDGE, thick row of bushes planted as a fence. We have a hedge of multiflora roses in front of our house.
9. IRON, tool for pressing clothes. Most women like these irons that dampen as they press.
10. JACK, machine for lifting. Most cars come with a jack fitted in the trunk compartment.
11. KNIGHT, in the Middle Ages a man raised to an honorable military rank and pledged to do good deeds. I used to love to read stories about the knights of the Round Table.
12. LABEL, a slip of paper or other material attached to anything and marked to show what or whose it is or where it is to go. Can you read the label on the box?
13. MOTHER, a female parent. My own mother would say, "You only have one mother."
14. NEST, a structure used by birds for laying eggs and rearing young. There's a junco nest in the tree outside my window.
15. ORNAMENT, something to add beauty. With all her bejeweled ornaments she looks like a chimp wearing a lei of hibiscus.
16. PLANT, a living thing that is not an animal. We used to have all kinds of plants in the sunporch.
17. QUILT, a bedcover. A quilt seems much warmer than an ordinary blanket.
18. ROOT, part of a plant that grows down into the soil. Pines have a very simple root system.
19. SKATE, a frame with a blade fixed to a shoe so a person can glide over the ice. It's good you brought your skates since the pond is frozen over most of the winter.
20. TREE, large plant with woody trunk. The trouble with oak trees is that they drop a lot of leaves.
21. UMPIRE, person who rules on the plays in a game. Some baseball fans think that umpires come from Transylvania, too.
22. VARNISH, a liquid that gives a smooth, glossy appearance to wood. I still have trouble distinguishing between shellac and light varnish.
23. WHEEL, a round frame turning on its center. I don't think it's worth buying extra wheels for snow tires.
24. YACHT, boat for pleasure trips. Was it old J.P. who said, "If you have to ask how much a yacht costs, you can't afford one"?
25. AID, help, support. The United States gave aid to Europe.
26. BATTLE, fighting, war. One of the greatest battles of the war was at Gettysburg.
27. COUNSEL, advice. He was always ready with good counsel, if not with money.
28. DEAL, business arrangement. That salesman always has several deals going at the same time.
29. EASE, comfort, relief. She tried to find ease from her pain in every way possible.
30. FEAR, state of being afraid, dread. He lost his fear of the dark when he grew older.
31. GRADE, degree of rank, quality, value. She always bought eggs of the best grade.
32. HONOR, glory, fame. We were taught to strive for honor rather than money.
33. INCH, 1/12 foot. The box was only 3 inches deep.
34. JOKE, something said or done to make someone laugh. He often cracked jokes to make his visitors feel at ease.
35. KILL, act of destroying. To be an ace you have to have at least five kills to your credit.
36. LABOR, work. Labor ennobles man but I'm opposed to nobility.
37. MEASURE, size. We must know her waist measure to assure a good fit.
38. NUMBER, sum, total. The number of your fingers is 10.
39. ORDER, way one thing follows another. He wrote the words down in alphabetical order.
40. PLAY, fun, sport. We often watch the children at play.
41. QUESTION, thing asked. Feel free to interrupt if you have any questions.
42. REGRET, feeling of being sorry. He was filled with regrets for what might have been.
43. SCALE, series of steps or degrees. His employees were underpaid by any scale.
44. THRILL, a shivering, exciting feeling. She gets thrills from the movies.
45. URGE, a driving force or impulse. She felt a strong urge to cry out.
46. VOW, a solemn promise. He took a vow not to shave until she returned.
47. WISH, desire or longing. Her wish is quite reasonable, for a yacht, isn't it?
48. YIELD, product. There was an excellent yield of corn this year.
Note added in proof. The results of a Monte Carlo simulation computed by David Presberg provide a context in which to evaluate whether the data conform to a hierarchical clustering scheme. A simulated random sorting by a single judge was generated by, first, permuting 48 items in a pseudorandom order, then partitioning that ordering of items into clusters sequentially according to the rule that item i + 1 would be included in the same cluster with item i with probability 0.702 (which replicates the average cluster size used by the judges). After 50 judges had been simulated, the 50 incidence matrices were summed to give a pseudorandom proximity matrix comparable to Table 1. Ten such matrices were generated. The largest proximity in any of these matrices was 14; the likelihood of obtaining by chance proximities as high as 48 is clearly negligible.
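The pseudorandom sorting procedure described in this note can be sketched as follows. This is an illustrative reconstruction, not Presberg's program: the function names and the use of Python's `random` module are assumptions. Because each successive item stays in the current cluster with probability p, run lengths are geometric with mean about 1/(1 - p), roughly 3.4 items for p = 0.702 (ignoring truncation at the end of the list).

```python
import random


def simulate_judge(n_items, p_join, rng):
    """One pseudorandom sorting: permute the items, then cut the order into clusters,
    keeping item i + 1 with item i with probability p_join."""
    order = list(range(n_items))
    rng.shuffle(order)
    clusters = [[order[0]]]
    for item in order[1:]:
        if rng.random() < p_join:
            clusters[-1].append(item)
        else:
            clusters.append([item])
    return clusters


def pseudorandom_proximity(n_items=48, n_judges=50, p_join=0.702, seed=None):
    """Sum the judges' incidence matrices: entry [a][b] counts judges who put a and b together."""
    rng = random.Random(seed)
    prox = [[0] * n_items for _ in range(n_items)]
    for _ in range(n_judges):
        for cluster in simulate_judge(n_items, p_join, rng):
            for a in cluster:
                for b in cluster:
                    if a != b:
                        prox[a][b] += 1
    return prox


# Ten pseudorandom matrices, as in the note; seeds are arbitrary.
matrices = [pseudorandom_proximity(seed=s) for s in range(10)]
largest = max(max(row) for m in matrices for row in m)
print(largest)
```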
The ten pseudorandom matrices were then subjected to cluster analysis by both the connectedness and the diameter methods; common clusters comprised from 6 to 21 per cent, with an average of 14.9 per cent of the clusters being common to both methods of analysis. (For ten matrices of random numbers uniformly distributed between 0 and 50, the comparable average was 10.4 per cent common clusters.) Thus, a percentage agreement as high as 70 per cent could scarcely have occurred by chance. Although this is not the null hypothesis one might prefer to test, the results clearly indicate that there is some significant degree of structure in the data, and do not contradict the claim that the structure is hierarchical.
For each hierarchical clustering scheme, of course, there is a corresponding matrix that conforms to the ultrametric inequality. One can ask, therefore, how closely the matrices corresponding to the connectedness and diameter solutions in Fig. 2 match the original data matrix from which they were derived. The matrix corresponding to the connectedness solution was constructed and correlated entry by entry with the original data matrix; the product-moment correlation coefficient was 0.947. The correlation between the diameter solution and the original data matrix was 0.954, and between the diameter and connectedness solutions was 0.925. When these correlations were computed for pseudorandom matrices, the averages were 0.24, 0.36, and 0.23, respectively.
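The ultrametric matrix corresponding to a clustering scheme can be obtained by recording, for every pair of items, the level at which the two first fall into the same cluster. The sketch below is a plain agglomerative implementation under stated assumptions: proximities are converted to distances by subtracting them from the number of judges (a hypothetical 50 here), single linkage stands in for the connectedness method and complete linkage for the diameter method, and the 4 x 4 proximity values are toy numbers, not Table 1. The entry-by-entry correlation is then taken over the entries above the diagonal.

```python
from itertools import combinations
from math import sqrt


def ultrametric(dist, linkage):
    """Agglomerative clustering of a symmetric distance matrix.

    linkage=min joins clusters at their closest pair (single link, standing in for the
    connectedness method); linkage=max uses their farthest pair (complete link, for the
    diameter method). Returns the matrix of levels at which each pair first joins.
    """
    n = len(dist)
    clusters = [{i} for i in range(n)]
    ultra = [[0] * n for _ in range(n)]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            level = linkage(dist[i][j] for i in clusters[x] for j in clusters[y])
            if best is None or level < best[0]:
                best = (level, x, y)
        level, x, y = best
        for i in clusters[x]:
            for j in clusters[y]:
                ultra[i][j] = ultra[j][i] = level
        clusters[x] |= clusters[y]
        del clusters[y]  # safe: combinations() guarantees x < y
    return ultra


def upper(matrix):
    """Entries above the diagonal, in a fixed order."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical proximities (judges out of 50) for four items A, B, C, D.
proximity = [
    [0, 45, 10, 12],
    [45, 0, 8, 15],
    [10, 8, 0, 40],
    [12, 15, 40, 0],
]
N_JUDGES = 50
dist = [[0 if i == j else N_JUDGES - p for j, p in enumerate(row)] for i, row in enumerate(proximity)]

connectedness = ultrametric(dist, min)
diameter = ultrametric(dist, max)
print(connectedness[0][2], diameter[0][2])  # 35 42
print(pearson(upper(dist), upper(connectedness)))
```

Both methods agree on the two tight pairs (A-B at level 5, C-D at level 10) and differ only in the level at which the two pairs are finally joined.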
REFERENCES
JOHNSON, S. C. Hierarchical clustering schemes. Psychometrika, 1967, 32, 241-254.
KATZ, J. J., AND FODOR, J. A. The structure of a semantic theory. Language, 1963, 39, 170-210.
KRUSKAL, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 1964, 29, 1-27.
LANGENDOEN, D. T. Essentials of English grammar. New York: Holt, in press.
MANDLER, G., AND PEARLSTONE, Z. Free and constrained concept learning and subsequent recall. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 126-131.
MILLER, G. A. Psycholinguistic approaches to the study of communication. In D. L. Arm (Ed.), Journeys in science. Albuquerque: The University of New Mexico Press, 1967. Pp. 22-73.
SOKAL, R. R., AND SNEATH, P. H. A. Principles of numerical taxonomy. San Francisco: Freeman, 1963.
RECEIVED: May 15, 1968