JOURNAL OF MATHEMATICAL PSYCHOLOGY: 6, 169-191 (1969)

A Psychological Method to Investigate Verbal Concepts¹

GEORGE A. MILLER

The Rockefeller University, New York, New York 10021

The method of sorting is used to study how lexical information might be organized and stored in memory. Judges sort a set of lexical items into clusters, and data from several judges are pooled. The number of judges putting a pair of items into the same cluster is taken as a measure of the proximity of those two items. Constraints on the resulting data that are attributable to the method itself and consequences following from different structural hypotheses are considered. Data for 48 common nouns are used to test the assumption that when items are clustered it reflects a decision to ignore particular conceptual features that would normally distinguish those items, and an argument is made that the conceptual features used by most judges derive from presuppositions and assertions contained in the definitions of the nouns.

Part of knowing a language is knowing its vocabulary, and how this lexical information is subjectively organized and stored is an appropriate question for psychological research. The nature of a language user’s lexicon has long been a subject for psychological speculation. How should our subjective knowledge of vocabulary be characterized? It is certainly not organized alphabetically; it is no simple list of discrete and apparently unrelated items. There are multiple interrelations and cross-references among lexical items.

In the present paper the method of sorting is used to explore the subjective lexicon: items (common nouns, in this case) are sorted into clusters on the basis of “similarity of meaning.” It is assumed that all items are conceptually distinct to a native speaker; he knows enough to recognize and use each of them appropriately. In order for a native speaker of English to group nouns together as semantically similar, he must deliberately ignore some of their distinguishing features. By an analysis of the sortings one hopes to discover which conceptual features have been ignored and thus, by indirection, what the features are. A preliminary account of this research, and a comparison of the sorting method with other methods psychologists have used to study the subjective lexicon, was given by Miller (1967).

¹ This research was supported in part by the Advanced Research Projects Agency, Grant No. DAHC15 68 G-5 to The Rockefeller University. The author is pleased to acknowledge his indebtedness to Herbert Rubenstein, who participated in the original formulation of the problem; to Virginia Teller Sterba, who collected the sorting data reported here; to Buena Chilstrom for tabulating Roget features; to D. Terence Langendoen for the formulation of noun definitions in terms of presuppositions and assertions; and to numerous colleagues who criticized earlier versions of this paper. The cluster analysis shown in Fig. 2 made use of a computer program written by Stephen C. Johnson.

© 1969 by Academic Press, Inc.

RESULTS OF SORTING ENGLISH NOUNS

Each of 48 English nouns was typed on a 3 x 5 in. index card, along with a short definition specifying the sense of the noun that was intended and a simple sentence illustrating that use of the word. (The 48 nouns, along with their definitions and examples, are given in the Appendix.)
The resulting pack of 48 cards was handed to a judge with the request that he sort the cards into piles on the surface of a large table “on the basis of similarity of meaning.” He was allowed as many piles as he wanted, from 1 to 48, and he could put as many items as he wanted in any pile. Most people spent from 5 to 30 min at the task. Those who protested that they did not understand “similarity of meaning” were not given a clear explanation. They might be told, “Words don’t have to be synonyms to be similar in meaning,” or, if they pressed further, “We want you to tell us what it means.” Tests were conducted individually, and judges were paid for their time. Cooperation from the 50 Harvard and Radcliffe students who served as judges was excellent. All had learned English as their first language. The sorting task is mildly interesting. It has the nondemanding character of a problem for which there are many correct solutions; it resembles a concept formation task where a subject is free to choose the concepts he wants to use. Solving it is as much a matter of esthetic judgment as of conceptual knowledge. Judges would try tentative clusters, then frequently break them up and rearrange them; no constraints were imposed on the order of presentation of the items or on the judge’s freedom to revise earlier decisions. Mandler and Pearlstone (1966) report that when items must be examined and sorted sequentially, with no changes allowed, judges have little difficulty in remembering and repeating their sorting; 42.5% of their college subjects were able to repeat an initial sorting of 52 items without mistakes. In order to account for this recall Mandler and Pearlstone argue that their subjects were “imposing a conceptual rule on the stimulus array.” In the present study, where subjects are instructed to sort on the basis of similarity of meaning, we further assume that the conceptual rules are usually semantic in character. 
The number of categories used to sort the 48 words ranged from 6 to 26, with a mean of 14.3 and a standard deviation of 5. The clusters formed by each judge were recorded and later converted into an incidence matrix. The matrix is 48 by 48, and cell $(i, j)$ represents the particular pair of nouns $i$ and $j$. Cell $(i, j)$ is one if that pair of nouns was put together and zero if they were in separate clusters. The unweighted incidence matrices, one for each judge, were added together and the resulting matrix, which represents the pooled matrices for 50 judges, was taken as the basis for further analysis. In the resulting data matrix, $N_{ij}$ represents the number of judges who put nouns $i$ and $j$ together in the same cluster.

The summed matrix for the 50 judges is shown in Table 1. The matrix is necessarily symmetric ($i$ cannot be sorted with $j$ unless $j$ is also sorted with $i$), so only the lower half of the matrix is given in Table 1. The diagonal has also been omitted; since each noun must be sorted with itself, the diagonal could be taken to be 50 for every noun.

Table 1 shows that there was considerable agreement among judges as to the clusters they formed. FEAR and REGRET, for example, were put together by 48 of the 50 judges, as were PLANT and TREE. At the other extreme, many pairs of words (FEAR and PLANT, say, or REGRET and TREE) were never judged similar in meaning by anyone.

We can regard Table 1 as a matrix of similarity measures, or proximities, where $N_{ij}$ is a measure of the semantic proximity of noun $i$ to noun $j$, or we can convert it into a matrix of distances by using

$$D_{ij} = N - N_{ij}, \tag{1}$$

where $N$ is the total number of judges. There are several methods of data analysis that can be applied to such data. Which method is most appropriate depends on our theory of the psychological processes underlying the subjects’ performance.
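The pooling step and the conversion in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pooled_cooccurrence` and `to_distances` are hypothetical, and the three judges below are invented for the example; they are not the data of Table 1.

```python
from itertools import combinations


def pooled_cooccurrence(sortings, items):
    """Add the judges' 0/1 incidence matrices to obtain the counts N_ij."""
    index = {item: pos for pos, item in enumerate(items)}
    size = len(items)
    counts = [[0] * size for _ in range(size)]
    for partition in sortings:            # one partition per judge
        for cluster in partition:         # one pile of cards
            for a, b in combinations(cluster, 2):
                i, j = index[a], index[b]
                counts[i][j] += 1
                counts[j][i] += 1         # the matrix is symmetric
    return counts                         # diagonal left at 0 (omitted in Table 1)


def to_distances(counts, n_judges):
    """Equation (1): D_ij = N - N_ij, with D_ii = 0."""
    size = len(counts)
    return [[0 if i == j else n_judges - counts[i][j] for j in range(size)]
            for i in range(size)]


ITEMS = ["FEAR", "REGRET", "PLANT", "TREE"]
# Three hypothetical judges (illustrative only).
SORTINGS = [
    [["FEAR", "REGRET"], ["PLANT", "TREE"]],
    [["FEAR", "REGRET"], ["PLANT"], ["TREE"]],
    [["FEAR"], ["REGRET"], ["PLANT", "TREE"]],
]
N_MATRIX = pooled_cooccurrence(SORTINGS, ITEMS)
D_MATRIX = to_distances(N_MATRIX, len(SORTINGS))
```

With these invented sortings, FEAR and REGRET are put together by two of the three judges, so their distance is 1, while FEAR and PLANT are never put together and lie at the maximum distance of 3.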
THEORY OF SORTING

It should be intuitively obvious that the method of sorting imposes certain constraints on the frequencies of paired occurrences that can be obtained. In order to explore those constraints we shall assume that each person partitions the set of lexical items, i.e., creates a collection of subsets such that each element belongs to one and only one subset. It is not necessary to assume that sorting must yield a partition, but that is the simplest case. The data of Table 1 were obtained on that basis.

[TABLE 1. Sorting Data for 48 English Nouns]

Given that sorting creates partitions, the decision to interpret the resulting data matrix as if it were a similarity matrix is equivalent to assuming that it is appropriate to represent the relations among the items as distances. If this assumption is implausible, the method of sorting should not be used.

It is quite simple to prove that $D$ is a metric. First note that the incidence matrix expressed in $D$ for an individual judge represents a metric. This incidence matrix contains 0 for all items put together by the judge, and 1 elsewhere. Every item is necessarily put with itself, so $D_{ii} = 0$; the matrix is symmetric, so that $D_{ij} = D_{ji}$; and the triangle inequality, $D_{ij} + D_{jk} \geq D_{ik}$, is satisfied, since a judge cannot put item $i$ with item $j$, and put item $j$ with item $k$, without also putting item $i$ with item $k$. Since the sum of metrics is itself a metric, it necessarily follows that when incidence matrices for individual judges are added together to give the data matrix, the result will also represent a metric.

In order to demonstrate the implications of this fact in more intuitive terms, however, it is useful to derive it in a more explicit form. Consider any three items, $W_i$, $W_j$, and $W_k$. Indicate the sorting of these three by slanted lines, disregarding other items. For example, if a set of six items were sorted $(W_a, W_b, W_i, W_j)(W_c, W_k)$, the sorting of $W_i$, $W_j$, and $W_k$ would be represented as $ij/k$. On the assumption that every sorting is a partition, the only possible ways to partition three items are: $i/j/k$, $ij/k$, $ik/j$, $jk/i$, and $ijk$. Each judge must choose one of these five possibilities. If $N$ is the total number of judges serving in the experiment, and if $n(x)$ denotes the number of judges who chose partition $x$, then

$$N = n(i/j/k) + n(ij/k) + n(ik/j) + n(jk/i) + n(ijk). \tag{2}$$

Obviously, all of these numbers are nonnegative. Let the number of judges who put items $W_i$ and $W_j$ together be denoted by $N_{ij}$, which is the value tabulated in Table 1.
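The bookkeeping behind (2) can be made concrete with a short Python sketch. It assumes each judge's sorting is stored as a list of clusters; the helper names `triple_pattern`, `tally_triple`, and `distance`, and the four judges, are hypothetical and serve only to illustrate the argument.

```python
from collections import Counter


def triple_pattern(partition, i, j, k):
    """Return which of the five partitions of {i, j, k} a sorting induces."""
    cluster_of = {item: pos
                  for pos, cluster in enumerate(partition)
                  for item in cluster}
    same_ij = cluster_of[i] == cluster_of[j]
    same_ik = cluster_of[i] == cluster_of[k]
    same_jk = cluster_of[j] == cluster_of[k]
    if same_ij and same_jk:
        return "ijk"
    if same_ij:
        return "ij/k"
    if same_ik:
        return "ik/j"
    if same_jk:
        return "jk/i"
    return "i/j/k"


def tally_triple(sortings, i, j, k):
    """The counts n(x) of equation (2) for one triple of items."""
    return Counter(triple_pattern(p, i, j, k) for p in sortings)


def distance(sortings, a, b):
    """Number of judges who did not put a and b in the same cluster."""
    return sum(1 for p in sortings
               if not any(a in cluster and b in cluster for cluster in p))


# Four hypothetical judges sorting six items (illustrative only).
JUDGES = [
    [["a", "b", "i", "j"], ["c", "k"]],        # induces ij/k
    [["i", "j", "k"], ["a", "b", "c"]],        # induces ijk
    [["i"], ["j", "k"], ["a", "b", "c"]],      # induces jk/i
    [["i"], ["j"], ["k"], ["a", "b", "c"]],    # induces i/j/k
]
COUNTS = tally_triple(JUDGES, "i", "j", "k")
```

Because every judge falls into exactly one of the five patterns, the counts necessarily sum to the number of judges, and the distances computed from the same sortings respect the triangle inequality.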
For these three items we can write:

$$N_{ij} = n(ij/k) + n(ijk), \tag{3}$$
$$N_{jk} = n(jk/i) + n(ijk), \tag{3'}$$
$$N_{ik} = n(ik/j) + n(ijk). \tag{3''}$$

By rearranging (2) and substituting (3) it can be shown that

$$(N - N_{ij}) + (N - N_{jk}) = (N - N_{ik}) + 2n(ik/j) + n(i/j/k),$$

from which we obtain

$$(N - N_{ij}) + (N - N_{jk}) \geq (N - N_{ik}). \tag{4}$$

Given the definition of $D$ in (1), (4) can be recognized as the triangle inequality:

$$D_{ij} + D_{jk} \geq D_{ik}. \tag{4'}$$

Since $D$ is symmetric and $D_{ii} = 0$, it has the properties of a distance. In short, if matrices obtained by the method of sorting are interpreted as similarity matrices, the measure will have properties necessary for metric representation.² The triangle inequality can also be written:

$$D_{ij} \geq |D_{ik} - D_{jk}|,$$

or, equivalently,

$$N - N_{ij} \geq |N_{ik} - N_{jk}|. \tag{5}$$

In words: the number of people who did not put items $W_i$ and $W_j$ together provides an upper bound on the difference between the number who put $W_k$ with each of them. Obviously, if everybody put $W_i$ and $W_j$ together, then anybody who put $W_k$ with one of them would necessarily put it with the other.

The triangle inequality holds for sorting data generally, but if we want to obtain more specific predictions, we must introduce further assumptions about what the judges are doing. Presumably, in order to put two lexical items together in the same cluster a judge must decide to ignore certain conceptual differences that would normally distinguish those items. If we assume that the items to be sorted satisfy some system of features, we can consider what will happen to the values of $N_{ij}$ when different features are ignored by different numbers of judges.

Before considering the effects of different systems of conceptual features, however, it should be pointed out that only distinctive features can be ignored. Suppose, for example, that all of the items sorted share the same value for some particular feature. It is of no concern whether the feature characterizes all of them, or characterizes none of them, or is irrelevant for all of them.
If they all have the same value, then that particular feature cannot play a role in the judges’ sortings. In what follows, therefore, no claim is made that the analysis reveals all of the conceptual features that are characteristic of any of the items.

² One class of counter-examples to the triangle inequality as applied to semantic distances follows this model: the words ACROBAT and GOBLET seem to have little in common conceptually and should be quite distant from one another, yet both are close to TUMBLER. Such cases are sometimes used to illustrate associative mediation, since a learner who uses TUMBLER as a mediator will find it easier to remember the pair ACROBAT and GOBLET. TUMBLER, however, represents several different semantic entities which merely happen to have the same phonological realization in English. Although the large distance seems to be shortened by the mediator (which converts it from a large conceptual to a negligible phonological distance), there is no real reduction in conceptual distance. Conceptual similarity must be defined for particular senses of words, and not for their phonological shapes.

Paradigmatic Organization. By a paradigmatic system is meant a set of lexical items (e.g., kinship terms) which all have values for every feature. In order to illustrate the results that we might expect from the method of sorting when the vocabulary being sorted forms a paradigmatic system, it is sufficient to consider a subset of four items, $W_h$, $W_i$, $W_j$, and $W_k$, that are optimally distinguished by two features, $F_1$ and $F_2$, as illustrated in Table 2. From this table it should be obvious that when $F_2$ is ignored the result will be $hi/jk$; when $F_1$ is ignored the result will be $hj/ik$; and when both $F_1$ and $F_2$ are ignored the result will be $hijk$.

TABLE 2
FEATURE-BY-ITEM MATRIX FOR A PARADIGMATIC SEMANTIC SYSTEM

          $W_h$   $W_i$   $W_j$   $W_k$
$F_1$       +       +       -       -
$F_2$       +       -       +       -

In that case, corresponding to (3) above, we can write:

$$N_{hi} = N_{jk} = n(hijk) + n(hi/jk), \tag{6}$$
$$N_{hj} = N_{ik} = n(hijk) + n(hj/ik), \tag{6'}$$
$$N_{hk} = N_{ij} = n(hijk). \tag{6''}$$

Clearly, (6'') cannot be larger than (6) or (6'), but no prediction can be made as to whether (6) will be larger or smaller than (6'). Thus, the proximities form a partial ordering. Whenever we can find in the data matrix a subset of four items that satisfies this pattern, we can hypothesize that it arose from a paradigmatic system of features.

Linear Organization. If one knew in advance that a particular set of items formed a linear sequence (e.g., the set BABY, CHILD, ADOLESCENT, ADULT), sorting would probably not be the method of choice for estimating distances between them. However, since subsets of linearly related items may be included in a larger vocabulary, we should consider what the effect would be on the data matrix. If items $W_i$, $W_j$, and $W_k$, in that order, are part of a rectilinear series of items, we would not expect to find the sorting $ik/j$. If $n(ik/j) = 0$ in equations (2) and (3), therefore, we have:

$$D_{ij} = 0 + n(jk/i) + n(i/j/k), \tag{7}$$
$$D_{jk} = n(ij/k) + 0 + n(i/j/k), \tag{7'}$$
$$D_{ik} = n(ij/k) + n(jk/i) + n(i/j/k). \tag{7''}$$

From (7) it is obvious that

$$D_{ij} + D_{jk} = D_{ik} + n(i/j/k).$$

One would prefer, of course, to have a distance measure $d$ such that $d_{ij} + d_{jk} = d_{ik}$ when all three items lie along a line. With the method of sorting, however, $n(i/j/k)$ is an additive constant which may have a different value for every subset of three items. If we consider only three items, we will not be able to decide from sorting data whether a linear hypothesis is adequate or not. For longer series, however, the linear constraints become progressively stronger. As a practical matter, whenever there is reason to expect either a paradigmatic or a linear organization, one of the multidimensional scaling techniques should probably be used (e.g., Kruskal, 1964).

Hierarchical Organization.
For reasons to be discussed later, hierarchical (taxonomic) organization based on relations of class inclusion is a pervasive feature of the lexicon. In a hierarchical system, unlike the paradigmatic, not every item has a value for every feature. For example, those living things that are classified as animals may then be further classified as vertebrates, but those living things that are classified as plants cannot be classified as vertebrates. Plants have no value for the vertebrate feature; the result of applying the vertebrate feature to plants is undefined. Thus, the vertebrate feature is said to depend on the animal feature or, conversely, the conceptual feature animal dominates the conceptual feature vertebrate. The vertebrate classification can be applied only to those things that have already been classified as animals.

FIG. 1. A hierarchical semantic system, where semantic feature $F_1$ dominates feature $F_2$, i.e., $F_2$ is defined only for lexical items having the value $-$ for $F_1$.

A feature $F_1$ is said to dominate a feature $F_2$ just in case $F_2$ is defined only for items having a particular value of $F_1$. If for a given vocabulary we have a sequence of features such that $F_i$ dominates $F_{i+1}$, then whenever a judge decides to ignore feature $F_i$, he must also ignore all the features that $F_i$ dominates, since he will not know whether they are relevant or not without taking $F_i$ into account. This fact will impose a hierarchical ordering on the data obtained by the sorting method. Let $s_i$ represent the set of items that will be clustered together if feature $F_i$ is ignored. Then we will have a decreasing sequence of sets

$$s_1 \supset s_2 \supset \cdots \supset s_i \supset s_{i+1} \supset \cdots.$$

Obviously, items in $s_{i+1}$ differ on fewer features than do items in set $s_i$.
If $N_s$ is the number of judges who put the items of $s$ together, then, because of the dominance relation among the features,

$$N_{s_1} \leq N_{s_2} \leq \cdots \leq N_{s_i} \leq N_{s_{i+1}} \leq \cdots,$$

or, in terms of distances,

$$D_{s_1} \geq D_{s_2} \geq \cdots \geq D_{s_i} \geq D_{s_{i+1}} \geq \cdots.$$

In short, the fewer features for which two items in a hierarchical conceptual system differ, the smaller is the distance between them.

The effect of ignoring features in a hierarchical system of this kind is to produce what has been called a hierarchical clustering scheme (Johnson, 1967). A hierarchical clustering scheme consists of a sequence of clusterings having the property that any cluster is a merging of two or more clusters in the immediately preceding clustering. For example, the sequence of clusterings $h/i/j/k$, $h/i/jk$, $h/ijk$, $hijk$ forms a hierarchical clustering scheme, and can easily be represented by a tree graph. The important fact about a hierarchical clustering scheme is that whenever two items $W_i$ and $W_j$ form a cluster, there cannot be any subsequent cluster $ik$ that excludes $W_j$, or any cluster $jk$ that excludes $W_i$; once they have been placed together, they stay together in all subsequent clusterings. A clustering scheme describes a hierarchical structure, but it differs from a hierarchical conceptual system in that no interpretation is given, and no conceptual features are assigned to the various branch points in the hierarchy.

From the fact that a conceptual hierarchy must have the structure of a hierarchical clustering scheme, the triangle inequality can be considerably strengthened. Consider the case of three items $W_i$, $W_j$, and $W_k$ related as in Fig. 1. Figure 1 shows that the effect of ignoring $F_1$ will be to produce the clustering $ijk$; if $F_2$ is ignored, the clustering will be $i/jk$; and if neither $F_1$ nor $F_2$ is ignored, the clustering will be $i/j/k$. These three clusterings form a hierarchical clustering scheme. Since the clusterings $ij/k$ and $ik/j$ cannot occur, $n(ij/k) = n(ik/j) = 0$.
Everyone must select from among three possible clusterings, so (2) becomes

$$N = n(ijk) + n(i/jk) + n(i/j/k),$$

and (3) becomes

$$N_{ij} = n(ijk), \tag{8}$$
$$N_{jk} = n(ijk) + n(i/jk), \tag{8'}$$
$$N_{ik} = n(ijk). \tag{8''}$$

It follows that

$$N \geq N_{jk} \geq N_{ij} = N_{ik} \geq 0,$$

or, in terms of distances,

$$0 \leq D_{jk} \leq D_{ij} = D_{ik} \leq N.$$

For any three items in such a system, therefore, the distances between them must all be equal, or if one distance is less, the other two must be equal. In this case, as Johnson has shown, $D$ satisfies the ultrametric inequality:

$$D_{jk} \leq \max[D_{ij}, D_{ik}], \tag{9'}$$

for any choice of $i$, $j$, and $k$. The weaker triangle inequality (4') follows directly from this ultrametric inequality. Expressed in terms of $N$ rather than $D$, this becomes

$$N_{jk} \geq \min[N_{ij}, N_{ik}]. \tag{9}$$

It should be noted that we have not used more than the ordinal relations among the $N_{ij}$, and even then only the order within a particular series of dominance relations. The patterns of relations for which we are looking will remain invariant under any transformation of the numbers that leaves their order unchanged within any branch.

HIERARCHICAL CLUSTERING SCHEMES

From Johnson’s argument we know that when the ultrametric inequality is satisfied, there is a perfect match between a matrix of distances and a tree graph of the hierarchical clustering scheme; from a complete specification of either one, the other can be directly obtained. We will not recapitulate that argument here, although the method of constructing a tree from a matrix of distances satisfying the ultrametric inequality can be briefly described. The key to the method is that items separated by the minimum distance are merged and treated as a single element in a new matrix. The effect of the merging is to produce another clustering in a hierarchical sequence of clusterings.

The procedure is the following. Find the smallest distance (largest number of subjects in Table 1, for example) and merge those elements.
Suppose, for example, that $W_i$ and $W_j$ are two elements merged at this minimum distance; if the ultrametric inequality holds, and if $D_{ij}$ is the smallest distance, then $D_{ik} = D_{jk}$ for any other item $W_k$. If the distance from $W_i$ to any $W_k$ equals the distance of $W_j$ to any $W_k$, then when we merge $W_i$ and $W_j$ into a new element, the distance of this cluster to all other $W_k$ must be the distance from either of the merged elements to $W_k$. So there is no difficulty in forming a new, smaller matrix of distances with the clustered elements replaced by their merger.

Now repeat this procedure on the new matrix: Find the smallest distance, merge those items, note the new clustering and the distance associated with it. Again, if the ultrametric inequality holds, the distances from all other items to the merged set will equal their distances to each member of the merger. We continue this iteration until all items are merged together in a single cluster. The tree is simply a graphical record of this sequence of mergings. Only the ordinal properties of the data are used; any matrix of values having the same ordinal relations would give the same tree topologically, although the distances assigned to the branch points would, of course, be different.

The procedure runs smoothly when the ultrametric inequality is perfectly satisfied. In practice, however, even when a hierarchical system is involved, sorting data will be noisy, so that when $W_i$ and $W_j$ are merged, $D_{ik}$ and $D_{jk}$ will not be precisely equal for all $k$. One assumes that some of this noise results from the use of idiosyncratic features by some judges, or even from a failure to follow instructions. In any case, the problem arises of defining the distance $D_{(ij)k}$ between the cluster $ij$ and item $W_k$ when $D_{ik} \neq D_{jk}$. The problem can be illustrated by applying the merging procedure to the data of Table 1.
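Before the noisy case is taken up, the noise-free merging procedure can be sketched in Python. This is a minimal illustration, not Johnson's program: the names `is_ultrametric` and `merge_tree` are hypothetical, distances are assumed to be stored as a nested dictionary, and the four-item matrix is invented so that it reproduces the sequence of clusterings $h/i/j/k$, $h/i/jk$, $h/ijk$, $hijk$ used as an example earlier.

```python
from itertools import combinations


def is_ultrametric(dist, items):
    """Check (9'): no distance in a triple exceeds the larger of the other two."""
    for i, j, k in combinations(items, 3):
        a, b, c = dist[i][j], dist[i][k], dist[j][k]
        if a > max(b, c) or b > max(a, c) or c > max(a, b):
            return False
    return True


def merge_tree(dist, items):
    """Merge the closest pair repeatedly; return the (distance, cluster) record."""
    clusters = [frozenset([item]) for item in items]
    # Distances between clusters, keyed by the unordered pair of clusters.
    d = {frozenset((frozenset([a]), frozenset([b]))): dist[a][b]
         for a, b in combinations(items, 2)}
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        level = d[frozenset((a, b))]
        merged = a | b
        rest = [c for c in clusters if c not in (a, b)]
        for c in rest:
            to_a, to_b = d[frozenset((a, c))], d[frozenset((b, c))]
            if to_a != to_b:              # cannot happen for ultrametric data
                raise ValueError("ultrametric inequality violated")
            d[frozenset((merged, c))] = to_a
        clusters = rest + [merged]
        history.append((level, merged))
    return history


ITEMS = ["h", "i", "j", "k"]
# Hypothetical distances (illustrative only).
PAIRS = {("j", "k"): 1, ("i", "j"): 2, ("i", "k"): 2,
         ("h", "i"): 3, ("h", "j"): 3, ("h", "k"): 3}
DIST = {a: {} for a in ITEMS}
for (a, b), value in PAIRS.items():
    DIST[a][b] = DIST[b][a] = value
HISTORY = merge_tree(DIST, ITEMS)
```

The returned record lists each merger with the distance at which it occurred, which is all the information a tree graph displays. The sketch deliberately raises an error when the two distances to a merged pair disagree, which is exactly the situation that arises with real sorting data.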
Consider first the pair of items, FEAR and REGRET, which, according to Table 1, were judged to be similar in meaning by 48 of the 50 judges. We therefore assign 2 as a measure of the distance between FEAR and REGRET. PLANT and TREE are also separated by the same distance. At this level, therefore, the 48 items are grouped into 46 clusters, 44 of which contain a single item and two of which contain a pair. At $D = 2$, therefore, the similarity matrix is reduced to 46 x 46, and the question is what distances to assign to the clusters. For example, from Table 1 we can extract the following frequencies of paired occurrences:

$W_k$        $W_i$ = FEAR    $W_j$ = REGRET
THRILL            42               42
WISH              38               37
URGE              37               37
EASE              25               25
HONOR             12               12

When FEAR and REGRET are merged into a single element, the distances of the pair to every other element are almost the same, although even here there is some noise in the data. (From (5) of course, we know that the difference here cannot exceed 2.) From FEAR-REGRET to THRILL the distance must be 8. From FEAR-REGRET to WISH, however, the distance can be either 12 or 13, depending on which value we take. If the ultrametric inequality held without exception, no decision would be necessary. With noisy data, however, some discrepancies are to be expected.

There are various alternatives open at this point (see Sokal and Sneath, 1963). We might take the mean or, since the numbers have only weak ordinal validity as measures of similarity, the median. Johnson’s proposal is to solve the problem twice, first using the minimum distance, and then again using the maximum distance. If, as the ultrametric inequality demands, the two distances are really equal, then the maximum and the minimum should not be widely discrepant and the two solutions should give more or less the same answer. But if the two hierarchies are quite different, we should be warned either that we are not dealing with a hierarchical conceptual system, or that the data are too noisy for precise analysis.

FIG. 2. Tree graphs of cluster analysis applied to data on 48 English nouns sorted by 50 judges (see Table 1). [Left panel: connectedness method; right panel: diameter method; scale: number of subjects.] According to the connectedness method, the distance of an item to a cluster is its distance to the nearest member of the cluster. According to the diameter method, the distance of an item to a cluster is its distance to the farthest member of the cluster. In a perfect hierarchical clustering scheme the two methods would give the same results. Here, clusters which are common to the two methods of analysis are indicated by open circles at the appropriate nodes.

The result of applying this analysis to the data in Table 1 is shown in Fig. 2. On the left is the hierarchical clustering scheme that results when the distance of an item to a cluster is taken to be its distance to the nearest member of the cluster (Johnson’s “connectedness method”; Sokal and Sneath’s “clustering by single linkage”). On the right is the hierarchical clustering scheme that results when the distance of an item to a cluster is taken to be its distance to the most distant member of the cluster (Johnson’s “diameter method”; Sokal and Sneath’s “clustering by complete linkage”). The distance between any pair of items can be read off the graph; it is the number associated with the branch point representing the smallest cluster to include them both, e.g., by the connectedness method, the distance from MOTHER to COOK is 9. The 48 nouns in Fig. 2 can be listed in an order such that both hierarchies can be graphically represented without any crossing lines in the tree graphs. The maximally connected scheme contains 41 nonterminal nodes; the minimum diameter scheme contains 43 nonterminal nodes; 29 nodes (those indicated by open circles) represent clusters that are common to both schemes. Thus, about 70% of the clusters indicated by the two methods are common to both.
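The difference between the two methods can be seen on the three items FEAR, REGRET, and WISH, whose paired frequencies (48, 38, and 37 out of 50) are quoted above. The following Python sketch is an illustration only, not Johnson's program; the function name `cluster` and the data layout are hypothetical. The distance between two clusters is taken as the minimum (connectedness) or maximum (diameter) of the distances between their members.

```python
from itertools import combinations


def cluster(dist, items, linkage):
    """Agglomerative clustering; linkage=min is the connectedness method,
    linkage=max is the diameter method."""
    clusters = [frozenset([item]) for item in items]

    def between(a, b):
        # Distance between two clusters under the chosen linkage rule.
        return linkage(dist[x][y] for x in a for y in b)

    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        history.append((between(a, b), a | b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history


N_JUDGES = 50
# Paired frequencies quoted in the text for these three nouns.
TOGETHER = {("FEAR", "REGRET"): 48, ("FEAR", "WISH"): 38, ("REGRET", "WISH"): 37}
NOUNS = ["FEAR", "REGRET", "WISH"]
DIST = {a: {} for a in NOUNS}
for (a, b), count in TOGETHER.items():
    DIST[a][b] = DIST[b][a] = N_JUDGES - count    # equation (1)

CONNECTEDNESS = cluster(DIST, NOUNS, min)
DIAMETER = cluster(DIST, NOUNS, max)
```

Both runs merge FEAR with REGRET at distance 2; they then attach WISH at distance 12 under the connectedness rule and at 13 under the diameter rule. With only three items the clusters themselves coincide, so the sketch shows the source of the discrepancy rather than its effect on the full tree.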
(See note added in proof.) Whether or not this degree of disagreement between the two methods is compatible with the assumption that these 48 items represent a hierarchical conceptual subsystem is a matter for individual judgment. It should be noted, however, that judgments of similarity shared by more than half the judges seem to arrange themselves the same way according to both methods. It is the long distances (which are based on small numbers of judges) that are most unreliable. The connectedness method tends to emphasize the smaller and probably unreliable values of $N_{ij}$; the diameter method tends to suppress them. For that reason, the diameter method probably gives a more reliable picture of the hierarchical structure, although longer distances necessarily remain indeterminate.

DISCUSSION

Any psychological analysis of the structure underlying our system of verbal concepts should be judged against (at least) two criteria: plausibility and linguistic relevance. Plausibility implies agreement with what most people would accept as the most basic verbal concepts in our language. Linguistic relevance implies compatibility with theories of linguistic semantics that have been proposed in recent years.

Plausibility. The argument for plausibility rests on the results obtained by the diameter method of cluster analysis, shown in the right half of Fig. 2. Those clusters can readily be interpreted in terms of abstract concepts that seem to comprise important components of the definitions of the items.

Of the 48 nouns, 24 are names of things and 24 are not. It was thought that if this important conceptual feature could not be recovered, the method of sorting should probably be abandoned as a tool for the study of verbal concepts. In Fig. 2 the first 24 nouns are object names, and the second 24 are not. Was the object concept recovered?
If we consider the connectedness solution, the most basic feature would seem to be human vs nonhuman; this is suggestive, but probably wrong, since it is based on the most unreliable part of the data in Table 1. If we consider the diameter method, the object concept is not violated, although it is not fully confirmed, either; the diameter analysis leaves us with five clusters that might be loosely interpreted as names of living things, names of nonliving things, quantitative terms, kinds of social interaction, and psychological terms. Within these five clusters further plausible subdivisions can be identified.

In order to test further the general hypothesis that judges cluster items on the basis of shared conceptual features, all 48 items were looked up in Roget’s Thesaurus and the maximum number of shared categories in that classification scheme was tabulated for every pair of items. In Fig. 3 the mean proximity for pairs of items is plotted as a function of the number of shared Roget features. It can be seen that there is a rough correlation, although the variability is so great that no precise prediction of the data in Table 1 could be derived from Roget’s classification. It is not obvious whether this low correlation argues against the plausibility of the present results or of the Roget criterion. It is conceivable that the method of sorting could be used to construct a thesaurus that would conform even better to our implicit system of verbal concepts.

FIG. 3. Mean proximity is plotted for all pairs of items as a function of the number of features they share in Roget’s Thesaurus. [Horizontal axis: number of shared features (Roget).]

Linguistic Relevance. The program outlined by Katz and Fodor (1963) can be taken as a representative statement of the goals of linguistic semantics.
As they point out, it is not sufficient for a semantic theory to provide a plausible characterization of the meanings of individual lexical items. The meanings must be formulated in such a way that interpretations can be constructed, according to rule, for combinations of words that can occur in grammatical constructions. In order for an analysis of verbal concepts to have linguistic relevance, it must be compatible with this goal of linguistic semantics. The proposal advanced by Katz and Fodor is that a definition for some particular sense of a word should be formulated as a list of semantic markers, followed by a distinguisher that summarizes features specific to the meaning of that particular word. A semantic marker is any universal basis for classifying meanings; the markers represent just those relations that are systematic in the language. To change a marker would require changing the entries for many words in the lexicon. To change a distinguisher, however, would affect the definition of only that particular word. Katz and Fodor then suggest how rules might be formulated to combine these markers and distinguishers for individual words in order to construct interpretations for grammatical combinations of words. The question arises, therefore, as to whether clusters obtained by the method of sorting bear any relation to the semantic markers postulated by Katz and Fodor. To the extent that these systems are compatible, the present results might be said to have linguistic relevance. However, since Katz and Fodor do not offer any explicit set of semantic markers, detailed comparison is impossible. For this discussion, therefore, the question of linguistic relevance will be approached differently, although the ultimate goals of semantic theory as stated by Katz and Fodor will not be questioned. 
In particular, the distinction between semantic markers and distinguishers will be differently formulated, with greater reliance on the form of the definitional statement itself. Consider, for example, how a lexical entry might be phrased for the noun knight:
    KNIGHT, a man who has been raised to honorary military rank.
Here we have a class name, man, followed by a phrase (usually a relative clause) that specifies how this member of the class is to be distinguished. Let us assume that this is a general formula for the definition of common nouns. In order to provide such definitions, of course, it is necessary to know how much of the definition is to be included in each part of the formula. For example, we might have defined knight as follows:
    KNIGHT, a person who is male and who has been raised to honorary military rank.
In this version we have shifted the information that knights are men out of the class term and into the specifying clause. How could we decide which of these two definitions is to be preferred? Although this division can vary somewhat as a function of use and context, a decision can usually be based on our intuitive judgment about the consequences of negation. Negation has the effect of denying only the most specific feature of a definition. For example, the sentence, Leslie is not a knight, denies that Leslie has been raised to honorary military rank, but it does not deny that Leslie is a man. In order to use the term knight at all with respect to Leslie, we presuppose that Leslie is a man. Let us, therefore, call the first part of the formula the presupposition of the noun and the second part the assertion of the noun (Langendoen, in press, Ch. 5). Then we can say that negation denies the assertion, but not the presupposition. On this basis, therefore, we are led to prefer the first definition of KNIGHT to the second.
If we wish to deny that Leslie is a man, we would not normally say that Leslie is not a knight, not a bachelor, not a father, etc. We would say Leslie is not a man. Presumably the lexicon contains an entry of the form
    MAN, a person who is male.
Leslie is not a man denies that Leslie is male, but it does not deny that Leslie is a person. If we wish to deny that Leslie is a person, we exploit the definition
    PERSON, a being that is human (a human being).
Leslie is not a person denies that Leslie is human, but leaves standing the presupposition that Leslie is a being (a pet turtle, perhaps). The principle involved is that a common noun will not normally be used in a predicate phrase unless the subject satisfies the presuppositions for its use, i.e., we do not normally say such things as The shirt is a number. For this principle to hold for both affirmative and negative sentences, negation of the predicate cannot be interpreted as denying these presuppositions. (One consequence is to allow a subtle kind of libel; to say, for example, that Tom is not a thief denies that Tom is a criminal who steals, but presupposes without asserting it that he is a criminal of some other sort.)
In order to use and understand negative sentences of this type, an adult speaker of English must have his lexical information stored in such a manner that he can distinguish the presupposition from the assertion of any common noun. Since the presupposition of one noun may include the assertion of some more abstract noun, this requirement imposes considerable structure on the subjective lexicon. It would be reckless to argue that this presuppositional structure is the only principle of organization for our lexical memory, but a more conservative claim can be made that, since the presuppositional structure must be available to native speakers of English, it might have been exploited by the judges in their performance on our sorting task.
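As a rough illustration of the presupposition/assertion formula, the following Python sketch stores each noun as a class term plus a specifying clause and reports which part a negative sentence denies. The `Entry` class, the `LEXICON` table, and both function names are hypothetical conveniences introduced here, not anything proposed in the paper; the three entries simply restate the definitions of KNIGHT, MAN, and PERSON given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    presupposition: str  # class term, e.g. "man" in "a man who ..."
    assertion: str       # specifying clause, e.g. "is male"


# Illustrative lexicon: only the three definitions quoted in the text.
LEXICON = {
    "knight": Entry("man", "has been raised to honorary military rank"),
    "man": Entry("person", "is male"),
    "person": Entry("being", "is human"),
}


def negate(subject, noun):
    """Model 'SUBJECT is not a NOUN': deny the assertion, keep the presupposition."""
    entry = LEXICON[noun]
    return {
        "denied": f"{subject} {entry.assertion}",
        "presupposed": f"{subject} is a {entry.presupposition}",
    }


def presupposition_chain(noun):
    """Follow class terms upward until one has no entry of its own."""
    chain = []
    while noun in LEXICON:
        noun = LEXICON[noun].presupposition
        chain.append(noun)
    return chain


result = negate("Leslie", "knight")
chain = presupposition_chain("knight")
print(result["denied"])       # the clause that negation denies
print(result["presupposed"])  # the clause that negation leaves standing
print(chain)                  # class terms from most to least specific
```

Because each class term is looked up in turn, the sketch also shows how the presupposition of one noun draws on the entry for a more abstract noun, which is the source of the structure discussed above.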
There would seem to be two ways such information could be used. A judge might combine items having the same assertion, as in HUSBAND, WIFE, SPOUSE:
    HUSBAND, a man who is married.
    WIFE, a woman who is married.
    SPOUSE, a person who is married.
Or he might combine items having the same presupposition, as in HUSBAND, KNIGHT, BROTHER. In making this decision, of course, a judge will be guided by the particular vocabulary he is sorting and probably by a general assumption that a solution with fewer clusters is more satisfying than one with many clusters. Since presuppositions are necessarily more general than assertions, it is probable that any haphazard selection of items to be sorted will lend itself better to presuppositional than to assertional clustering. For the 48 items used in this study, therefore, we would expect the sorting task to be carried out largely on the basis of what the items presuppose, rather than what they assert. In that case the sorting task will reveal degrees of compatibility among presuppositions, and the structure that emerges should be a presuppositional structure.
This presuppositional structure is hierarchical, as is implied in KNIGHT-man-person-being. That is to say, being dominates person, since person is undefined for nonliving things; person dominates man, since man is undefined for nonpersons; and man dominates KNIGHT, since KNIGHT is undefined for nonmen. As we indicated above, feature dominance insures that the system will be hierarchical. Even in paradigmatic systems, where conceptual features are not related by dominance (every item has a value for every feature), an argument can be made for expecting hierarchical structure in the presuppositions of the items.
For example, the kin term UNCLE (ignoring uncle-in-law) can be characterized hierarchically as UNCLE-brother-man-person-being if the subjective lexicon includes such definitions as
    UNCLE, a brother who has a sibling who is a parent.
    BROTHER, a man who has siblings.
    MAN, a person who is male.
This sequence implies that sex is the most general kinship feature in English, that lineality is less general, and generation is least general, even though every kin term can be classified with respect to all three. This argument rests on the judgment that Leslie is not an uncle presupposes that Leslie is somebody’s brother, but denies that his sibling is a parent. However, this is an empirical question open to further research.
Similarly, a linear sequence can also receive a hierarchical structure under this interpretation. Consider, for example, the following definitions:
    BABY, a child who is very young.
    CHILD, a person who is young.
    ADOLESCENT, a person who is approaching maturity.
    ADULT, a person who is mature.
These definitions rest on the judgments that He is not a baby denies that he is very young but presupposes that he is a child; that He is not a child denies that he is young but presupposes that he is a person; that He is not an adolescent denies that he is approaching maturity but presupposes that he is a person; and that He is not an adult denies that he is mature but presupposes that he is a person. The linear ordering is conveyed by the assertions: very young, young, approaching maturity, and mature. The presuppositions, however, are hierarchical.
Insofar as judges rely on shared presuppositions, therefore, we would expect their clusters to reveal a presuppositional hierarchy. If judges have recourse to other grounds for classification, however, their clusters will probably cut across this hierarchical structure. The question of linguistic relevance, therefore, comes down to this: How well can we account for the clusters shown in Fig.
2 in terms of our conjecture that judges were sorting on the basis of the presuppositions and assertions of the definitions? The answer demands a detailed analysis of the clusters obtained. Without carrying it through in detail, we can illustrate how such an analysis might proceed. The first five items in Fig. 2 are the following:
    MOTHER, a woman who has borne a child.
    COOK, a person who prepares food by using heat.
    DOCTOR, a person who is licensed to treat diseases.
    UMPIRE, a person who rules on the plays of a game.
    KNIGHT, a man who has been raised to honorary military rank.
Since COOK, DOCTOR, and UMPIRE all presuppose person, they can be combined on that basis simply by ignoring the assertions that distinguish them. MOTHER and KNIGHT, however, cannot join the person cluster unless some of their presuppositions are ignored. Since a MOTHER is a woman and a WOMAN is a person, and since a KNIGHT is a man and a MAN is a person, MOTHER and KNIGHT presuppose all that COOK, DOCTOR, and UMPIRE presuppose, plus a little more. Some judges might not be willing to ignore those additional presuppositions, and so the number putting MOTHER and KNIGHT with the person cluster would be correspondingly smaller. This pattern is confirmed by the general topology of the clustering scheme in Fig. 2.
Our hypothesis about what the judges were doing when they combined these nouns having different presuppositions, therefore, might be stated as follows: The greater the number of presuppositions of a noun that have to be ignored in order to include it in a cluster with other nouns, the smaller will be the number of judges who include it. Thus, no presuppositions have to be ignored to cluster COOK, DOCTOR, and UMPIRE; one presupposition has to be ignored in order to put MOTHER and KNIGHT in that cluster.
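The hypothesis can be made concrete with a small counting sketch. It assumes that each noun's presuppositions are listed from most specific to most general; the `CHAINS` table below is an illustrative encoding of the five definitions (with WOMAN and MAN taken to presuppose person), and the function name is hypothetical. Under that assumption, the number of presuppositions ignored when a noun joins a cluster is the number of class terms more specific than the cluster's own class term.

```python
# Presuppositions of each noun, most specific first (illustrative encoding).
CHAINS = {
    "MOTHER": ["woman", "person"],
    "COOK": ["person"],
    "DOCTOR": ["person"],
    "UMPIRE": ["person"],
    "KNIGHT": ["man", "person"],
}


def presuppositions_ignored(noun, cluster_class):
    """Count the class terms of `noun` that are more specific than `cluster_class`."""
    chain = CHAINS[noun]
    if cluster_class not in chain:
        raise ValueError(f"{noun} does not presuppose {cluster_class}")
    # Every term listed before cluster_class must be set aside to join the cluster.
    return chain.index(cluster_class)


ignored = {noun: presuppositions_ignored(noun, "person") for noun in CHAINS}
for noun, count in ignored.items():
    print(noun, count)
```

On the hypothesis as stated, nouns with a larger count should be placed in the person cluster by fewer judges.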
From Table 1 we can determine that when no presuppositions had to be ignored, the numbers of judges putting the items together in pairs were 47, 44, and 44, but when one presupposition had to be ignored the numbers were 41, 40, 39, 38, 38, 36, and 36. On the average, about seven judges were unwilling to overlook the additional presuppositions involved in MOTHER and KNIGHT.
It is instructive to carry this detailed analysis through the next five items in Fig. 2. They might be defined as follows:
    TREE, a plant that is large and has a woody trunk.
    PLANT, a living thing (being) that is not an animal.
    ROOT, a part of a plant that grows in the soil.
    HEDGE, a row of bushes that is planted as a fence.
    FISH, an animal that lives in water and breathes with gills.
Since a TREE is a plant, no presuppositions are ignored by putting TREE and PLANT together; indeed, even the assertion of PLANT is respected by this cluster. ROOT, however, poses a new problem, since it is not a kind of a plant but a part of a plant. The judges behaved as if part of a plant presupposed plant, i.e., as if one presupposition of ROOT had to be ignored in order to include it in the cluster with PLANT and TREE. A similar argument accounts for HEDGE, which is a collection of bushes, which must presuppose bush, which in turn is a plant; thus, two presuppositions of HEDGE must be ignored in order to include it with PLANT, TREE, and ROOT.
In order to explain why FISH was clustered more with plants than with persons, we must appeal to folk taxonomy. Judges sorted as if they did not consider animal among the presuppositions of person. If the present data are interpreted literally, these judges (about 40% of them) sorted as if PLANT and ANIMAL both presupposed some class of nonhuman-living-thing, which in turn presupposed living-thing (being), which in turn presupposed thing.
In order to put FISH with plants, this presupposition of nonhuman-living-thing did not have to be ignored, whereas it would have been ignored if FISH had been put with persons. To the degree that this account is implausible, of course, it may indicate that something here has not been correctly understood. In any case, our account for FISH is tentative and in need of further investigation using a variety of other animals, plants, and persons.
Anyone who pursues this analysis through the rest of the list will discover that the 10 items just analyzed constitute the most tractable subset to treat in this manner. It is obvious that other considerations influenced many of the judges’ decisions. For example, JACK and WHEEL probably go together, not on presuppositional grounds, but because a jack is used to raise a car when removing a wheel, i.e., on implicational grounds. Similarly, YACHT and SKATE probably go together because both imply recreational activities. Some of these nonlexical implications were conveyed inadvertently by the sentences that were used to illustrate the intended sense of the word (see Appendix). In future studies it might be advisable to omit such sentences.
The complexity that can result when both assertions and presuppositions are used is illustrated by the quantitative items in the lower half of Fig. 2. Let us assume that the information about these items might be represented as follows:
    INCH, a distance that is 1/12 foot.
    DISTANCE, a magnitude that is linear.
    MEASURE, a number that denotes a magnitude.
    NUMBER, a symbol that denotes how many times a thing is taken.
    ORDER, an arrangement that is methodical and successive.
    GRADE, a class that is relative to an order.
    SCALE, an order that is used for measurement.
On this representation, only MEASURE-NUMBER and SCALE-ORDER are related by presuppositions; GRADE is related to ORDER by its assertion; MEASURE is related indirectly to the presupposition of INCH by its assertion (or perhaps more directly if INCH is defined as a measure of distance; That is not an inch is somewhat ambiguous in this respect); and INCH-MEASURE-NUMBER is related to ORDER-GRADE-SCALE only by the assertion of SCALE. Nonetheless, in this context these six nouns form a highly integrated cluster.
Analysis in terms of definitional presuppositions and assertions should not be pushed further than it wants to go, of course, and there is good reason to believe that judges often had recourse to other grounds for forming their clusters. Introspectively, judges work by trying to put the words into sentence contexts. They explore such formulations as “They are all X’s,” where X is a presupposition; “They are all things that Y,” where Y is an assertion; but also “They all involve Z” or “You use them all to talk about Z” or “They all have something to do with Z,” etc., where Z may be almost any kind of nonlexical information. Lacking anything better, some judges will even put two items together if it is easy to form a sentence using them both; Anglin’s data (reported in Miller, 1967) suggest that this strategy is common in children, who may not have learned to give special attention to definitional sentences in judging similarity of meaning or, perhaps, have not yet become adept at distinguishing presupposition from assertion.
Insofar as definitional analysis is appropriate, however, the results obtained from the method of sorting can be said to have linguistic relevance. The method of sorting should not be viewed as a discovery procedure, that is, a mechanical method for discovering the presuppositions and assertions of our subjective definitions.
But when it is used cautiously with appropriate consideration for the choice of items and instructions it may provide a useful test for semantic hypotheses derived on other grounds.
APPENDIX
The following 48 nouns, with definitions and examples of use, were sorted by the judges to obtain the results given in Table 1. The first 24 are names of objects, the second 24 are names of nonobjects. The definitions were taken with minor modifications from the Thorndike Barnhart Beginning Dictionary (New York: Doubleday, 1964).
1. ANCHOR, shaped piece of iron attached to a chain or rope to hold a ship in place. The ship lost its anchor in the storm.
2. BLEACH, chemical used to whiten something. My wife refuses to use any bleaches in her laundry.
3. COOK, person who prepares food by using heat. Judging by the taste of this the restaurant must have hired a new cook.
4. DOCTOR, person licensed to treat diseases. Many doctors are convinced that most illnesses simply run their course.
5. EXHAUST, the used gasoline that escapes. The smog above our cities is produced by the exhaust of countless cars.
6. FISH, any animal that lives in water and has gills for breathing. The pond used to have many fish before boating became so popular.
7. GLUE, substance used to stick things together. Epoxy cement seems to be much better than fish glue.
8. HEDGE, thick row of bushes planted as a fence. We have a hedge of multiflora roses in front of our house.
9. IRON, tool for pressing clothes. Most women like these irons that dampen as they press.
10. JACK, machine for lifting. Most cars come with a jack fitted in the trunk compartment.
11. KNIGHT, in the Middle Ages a man raised to an honorable military rank and pledged to do good deeds. I used to love to read stories about the knights of the Round Table.
12. LABEL, a slip of paper or other material attached to anything and marked to show what or whose it is or where it is to go. Can you read the label on the box?
13. MOTHER, a female parent. My own mother would say, “You only have one mother.”
14. NEST, a structure used by birds for laying eggs and rearing young. There’s a junco nest in the tree outside my window.
15. ORNAMENT, something to add beauty. With all her bejeweled ornaments she looks like a chimp wearing a lei of hibiscus.
16. PLANT, a living thing that is not an animal. We used to have all kinds of plants in the sunporch.
17. QUILT, a bedcover. A quilt seems much warmer than an ordinary blanket.
18. ROOT, part of a plant that grows down into the soil. Pines have a very simple root system.
19. SKATE, a frame with a blade fixed to a shoe so a person can glide over the ice. It’s good you brought your skates since the pond is frozen over most of the winter.
20. TREE, large plant with woody trunk. The trouble with oak trees is that they drop a lot of leaves.
21. UMPIRE, person who rules on the plays in a game. Some baseball fans think that umpires come from Transylvania, too.
22. VARNISH, a liquid that gives a smooth, glossy appearance to wood. I still have trouble distinguishing between shellac and light varnish.
23. WHEEL, a round frame turning on its center. I don’t think it’s worth buying extra wheels for snow tires.
24. YACHT, boat for pleasure trips. Was it old J.P. who said, “If you have to ask how much a yacht costs, you can’t afford one”?
25. AID, help, support. The United States gave aid to Europe.
26. BATTLE, fighting, war. One of the greatest battles of the war was at Gettysburg.
27. COUNSEL, advice. He was always ready with good counsel, if not with money.
28. DEAL, business arrangement. That salesman always has several deals going at the same time.
29. EASE, comfort, relief. She tried to find ease from her pain in every way possible.
30. FEAR, state of being afraid, dread. He lost his fear of the dark when he grew older.
31. GRADE, degree of rank, quality, value. She always bought eggs of the best grade.
32. HONOR, glory, fame. We were taught to strive for honor rather than money.
33. INCH, 1/12 foot. The box was only 3 inches deep.
34. JOKE, something said or done to make someone laugh. He often cracked jokes to make his visitors feel at ease.
35. KILL, act of destroying. To be an ace you have to have at least five kills to your credit.
36. LABOR, work. Labor ennobles man but I’m opposed to nobility.
37. MEASURE, size. We must know her waist measure to assure a good fit.
38. NUMBER, sum, total. The number of your fingers is 10.
39. ORDER, way one thing follows another. He wrote the words down in alphabetical order.
40. PLAY, fun, sport. We often watch the children at play.
41. QUESTION, thing asked. Feel free to interrupt if you have any questions.
42. REGRET, feeling of being sorry. He was filled with regrets for what might have been.
43. SCALE, series of steps or degrees. His employees were underpaid by any scale.
44. THRILL, a shivering, exciting feeling. She gets thrills from the movies.
45. URGE, a driving force or impulse. She felt a strong urge to cry out.
46. VOW, a solemn promise. He took a vow not to shave until she returned.
47. WISH, desire or longing. Her wish is quite reasonable (for a yacht, isn’t it?).
48. YIELD, product. There was an excellent yield of corn this year.
Note added in proof. The results of a Monte Carlo simulation computed by David Presberg provide a context in which to evaluate whether the data conform to a hierarchical clustering scheme. A simulated random sorting by a single judge was generated by, first, permuting 48 items in a pseudorandom order, then partitioning that ordering of items into clusters sequentially according to the rule that item i + 1 would be included in the same cluster with item i with probability 0.702 (which replicates the average cluster size used by the judges).
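The random-sorting rule just described can be sketched in a few lines of Python. This is a minimal reading of the verbal description, not Presberg's program: the function names, the seed, and the use of Python's `random` module are assumptions made here for illustration. The pooling step follows the definition of proximity used throughout the paper (the number of judges who put a pair of items in the same cluster).

```python
import random


def simulate_judge(n_items, p_join, rng):
    """Return one random sorting as a list of clusters (lists of item indices)."""
    order = list(range(n_items))
    rng.shuffle(order)                # pseudorandom permutation of the items
    clusters = [[order[0]]]
    for item in order[1:]:
        if rng.random() < p_join:     # item i + 1 joins the cluster of item i
            clusters[-1].append(item)
        else:                         # otherwise it opens a new cluster
            clusters.append([item])
    return clusters


def pooled_proximities(n_items, n_judges, p_join, seed=0):
    """Sum the judges' incidence matrices: entry [a][b] counts co-clusterings."""
    rng = random.Random(seed)
    prox = [[0] * n_items for _ in range(n_items)]
    for _ in range(n_judges):
        for cluster in simulate_judge(n_items, p_join, rng):
            for a in cluster:
                for b in cluster:
                    if a != b:
                        prox[a][b] += 1
    return prox


prox = pooled_proximities(n_items=48, n_judges=50, p_join=0.702, seed=1)
largest = max(max(row) for row in prox)
print("largest pseudorandom proximity:", largest)
```

The largest entry depends on the seed, so no particular value is claimed here; the point of the sketch is only that chance proximities can be generated and compared with those observed.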
After 50 judges had been simulated, the 50 incidence matrices were summed to give a pseudorandom proximity matrix comparable to Table 1. Ten such matrices were generated. The largest proximity in any of these matrices was 14; the likelihood of obtaining by chance proximities as high as 48 is clearly negligible. The ten pseudorandom matrices were then subjected to cluster analysis by both the connectedness and the diameter methods; common clusters comprised from 6 to 21 per cent, with an average of 14.9 per cent of the clusters being common to both methods of analysis. (For ten matrices of random numbers uniformly distributed between 0 and 50, the comparable average was 10.4 per cent common clusters.) Thus, a percentage agreement as high as 70 per cent could scarcely have occurred by chance. Although this is not the null hypothesis one might prefer to test, the results clearly indicate that there is some significant degree of structure in the data, and do not contradict the claim that the structure is hierarchical.
For each hierarchical clustering scheme, of course, there is a corresponding matrix that conforms to the ultrametric inequality. One can ask, therefore, how closely the matrices corresponding to the connectedness and diameter solutions in Fig. 2 match the original data matrix from which they were derived. The matrix corresponding to the connectedness solution was constructed and correlated entry by entry with the original data matrix; the product-moment correlation coefficient was 0.947. The correlation between the diameter solution and the original data matrix was 0.954, and between the diameter and connectedness solutions was 0.925. When these correlations were computed for pseudorandom matrices, the averages were 0.24, 0.36, and 0.23, respectively.
REFERENCES
JOHNSON, S. C. Hierarchical clustering schemes. Psychometrika, 1967, 32, 241-254.
KATZ, J. J., AND FODOR, J. A. The structure of a semantic theory. Language, 1963, 39, 170-210.
KRUSKAL, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 1964, 29, 1-27.
LANGENDOEN, D. T. Essentials of English grammar. New York: Holt, in press.
MANDLER, G., AND PEARLSTONE, Z. Free and constrained concept learning and subsequent recall. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 126-131.
MILLER, G. A. Psycholinguistic approaches to the study of communication. In D. L. Arm (Ed.), Journeys in science. Albuquerque: The Univer. of New Mexico Press, 1967. Pp. 22-73.
SOKAL, R. R., AND SNEATH, P. H. A. Principles of numerical taxonomy. San Francisco: Freeman, 1963.
RECEIVED: May 15, 1968