ORGANIZATIONAL BEHAVIOR AND HUMAN DECISION PROCESSES 48, 193-223 (1991) Selection of Verbal Probabilities: A Solution for Some Problems of Verbal Probability Expression ROBERT M. HAMM Institute of Cognitive Science, University of Colorado, Boulder A method for verbal expression of degree of uncertainty is described. It requires the subject to select a phrase from a list that spans the full range of probabilities. In a second, optional, step, the subject indicates the numerical meaning of each phrase. The method avoids two problems of verbal probabilities-the indefinitely large lexicon and the individual differences in the interpretation of words. To test whether context and ordinal position might bias subjects’ selection or interpretation of the verbal expressions in the list, the list order was varied. When the verbal expressions were arranged in random order, ordinal position had a signiticant effect on the selection of expressions. However, these effects did not occur when the phrases were listed in ascending or descending order. Considerations of accuracy and interpersonal agreement also support the use of ordered phrase lists. o 1991Academic PICSS,hc. INTRODUCTION The issue whether people think and communicate better using numerical or verbal expressions of probability has received recent attention (Beyth-Marom, 1982; Kong, Bamett, Mosteller, & Youtz, 1986; Wallsten, Budescu, Rapoport, Zwick, & Forsyth, 1986a; Zimmer, 1983). In a number of contexts, communication using verbal expressions of probability may be preferable, even though it is less precise than numerical communication (Zwick, 1987). This paper describes a method designed to avoid several problems that may limit the usefulness of verbal expressions of probability, and reports a study that evaluates possible confounding factors. Verbal expressions of probability are used in analyses of decisions in a variety of consequential contexts, such as medicine (Nakao & Axelrod, 1983) and political analysis (Beyth-Marom, 1982). In recent U.S. foreign policy deliberations, President Reagan’s National Security Planning Group is reported to have prepared for him a list of five military options This research was supported by Contract MDA-903-86-K-0265 from the OflIce of Basic Research, Army Research Institute for the Behavioral and Social Sciences, and revised while the author had a National Research Council Associateship at the ARI Fort Leavenworth Field Unit. Tom Wallsten and an anonymous reviewer provided useful comments on an earlier draft. Reprints may be requested from the author at Institute of Cognitive Science, Box 345, University of Colorado, Boulder, CO 80309-0345. 193 07495978f91 $3.00 Copyri&t Q 1991 by Academic Press. Inc. Au rights of reproduction in any form reserved. 194 ROBERT M. HAMM against Libya in December 1981. Their chances of success were expressed using the terms “low,” “moderate,” and “moderate to high” (Woodward, 1987, p. 198). A policy of using verbal rather than numerical probabilities can be criticized for several reasons (Beyth-Marom, 1982; Nakao & Axehod, 1983). Numerical probabilities can be more precise than verbal when pertinent relative frequency data are available. They permit communication to be less ambiguous. They can contribute to the evaluation of decision options, as a weight for the value of possible outcomes. Numerical measurement of uncertain evidence permits Bayesian inference. Yet people very frequently choose to use verbal rather than numerical expressions of probability. Verbal probabilities also have advantages. First, people prefer to use them. In an analysis of medical experts’ thinking about a decision, Kuipers, Moskowitz, and Kassirer (1988) observed that even when numerical terms were used they were qualitative descriptions that “function as easily communicable names for important landmarks rather than as real numbers” (p. 192; see also Teigen, 1988a, and Fox, 1986). When people are required to communicate a probability in which they are not confident, they may prefer to use words that allow them to express the vagueness of theirjudgment (Zwick, 1987). Linguistic terms may facilitate thinking about uncertainty in complex problems (Zimmer, 1983). Finally, making decisions based on verbal rather than numerical probabilities may entail only a slight cost (Hamm, 1988a; Wallsten, Budescu, & Erev, 1988). For example, Budescu, Weinberg, and Wallsten (1988) had subjects bid for gambles whose probabilities were described numerically or verbally. The quality of the bids was evaluated by comparing them to the expected values of the gambles, using the subjects’ own numerical interpretations of the verbal probabilities (elicited separately). Over all trials with gambles that had positive expected value, subjects gained only 1.23% less money in the verbal condition than in the numerical condition. With losing gambles, subjects lost only 4.65% more in the verbal condition than in the numerical condition. Here the ease and comfort of verbal expressions may be worth the small cost. Though verbal expression of probability may be justified in some situations, it presents problems that must be solved if it is to be used for communicating about high stake decisions. First, the meanings of phrases differ between people, although they seem stable for individuals over time (Budescu & Wallsten, 1985). Kong et al. (1986) found no systematic differences between occupational groups, although Nakao and Axelrod (1983) found some differences between physicians and non-physicians. Second, there is an indefinitely large number of words and phrases that could be used to express degree of belief. And indeed collectively people SELECTION OF VERBAL PROBABILITIES 195 use many of them. For example, Budescu et al. (1988, Study 1) required people to express the chance a dart would hit a particular wedge in a circle, using either their own words or the integers between 0 and 100. On 6 repetitions of 11 wedges, the average subject used about 13 phrases and 18integers. However, the 20 subjects used a total of 111different phrases, compared with 73 integers. This makes it difficult to develop a lexicon of the numerical meaningsof all verbal expressions of probability. Any new phrase would pose a problem of interpretation, in contrast with a new number that can be easily understood because it can be placed unambiguously in the [O,l] number line. Measurement theory techniques mediated by sophisticated computer programs have been used to investigate individuals’ interpretations of verbal expressions of probability (e.g., Wallsten ef al., 1986a;Rapoport, Wallsten, & Cox, 1987).These techniques require subjects to make many choices or ratings. But for verbal probabilities to be useful in contexts such as survey research or multiuser decision support systems, simpler methods are required. This paper presents and evaluates such a method. A Method for Verbal Expression of Degree of Uncertainty The method proposed here requires subjects to express their judgment of probability by selecting an expression from a list. This allows them to think about the problem using the familiar, verbal mode. The method addresses the problem of the indefinitely large lexicon by confining the subject’s responses to a limited set of verbal expressions, selected to cover the full range of degreesof belief. It solves the problem of individual differences by having the subject assign a numerical value to each phrase (after completing the probability judgment task), thus providing an interpretation for the chosen expression. A list was constructed (Table l), consisting of 19 verbal expressions covering the range from 0 to lOO%,with symmetry about an easily identifiable midpoint (“toss-up”). There was a term for each 5% mark, except that there was only one term in between 25 and 40%, and only one in between 60 and 75%. Other researchers may wish to use shorter (or longer) lists, lists without a sharply defined midpoint, lists that are not balanced around 50% (see Kong et al., 1986),or different phrases. These distinctions, while important, are not pertinent to the present investigation of factors affecting the selection of expressions from a list. The results of this study are applicable to lists composed of any set of verbal probability expressions that is distributed nearly evenly over a range of probabilities. The present list was produced by reviewing previous studies that elicited numerical values for verbal expressions of probability (Budescu & Wallsten, 1985;Lichtenstein & Newman, 1967;Simpson, 1944;Shanteau, 196 ROBERT M. HAMM TABLE 1 MEANS AND STANDARD DEVIATIONS OF NUMERICAL INTERPRETATIONS OF VERBAL EXPRESSIONSOF PROBABILITY MEASURED IN PREVIOUS STUDIES Standard deviation Verbal expression Mean (Median) Absolutely impossible Rarely .05 .08 39 (. 10) .16 .lO .16 (.15) .20 (.20) .25 (.25) .31 (.33) .27 .41 A0 (50) .07" .06 .07 49 .12” .08 .12 .ll .12 .45 (.45) .50 (.50) .47 .54 .55 (.55) .58 (HI) .66 .69 (.70) .74 (.75) .79 (30) .87 (X9) .89 (30) 234 - Very unlikely Seldom Not very probable Fairly unlikely Somewhat unlikely Uncertain Slightly less than half the time Toss-up Slightly more than half the time Better than even Rather likely Good chance Quite likely Very probable HighIy probable Almost certain Absolutely certain Source Value adopted for this study Author s 1944 B&W 1985 L&N 1%7 B&W 1985 s 1944 L&N 1967 L&N 1%7 L&N 1%7 L&N 1%7 Sh 1974 B&W 1985 L&N 1967 .oo .04 .oo .ll L&N 1%7 L&N 1%7 B&W 1985 Sh 1974 .45 .50 36 .06 L&N 1%7 L&N 1%7 Sh 1974 L&N 1%7 L&N 1%7 L&N 1967 L&N 1967 L&N 1967 Sh 1974 Author Author .55 ho .13 .14 .09 .12 .lO .07 .04 - .05 .lO .15 .20 .25 .33 .40 .70 .75 .@I .85 .90 .95 1.00 Note. Sources are B&W = Budescu & Wahsten, 1985;L&N = Lichtenstein & Newman, 1%7; Sh = Shanteau, 1974; S = Simpson, 1944; Author = author’s judgment. @Interquartile range. 1974; Wallsten et al., 1986), in order to identify a set of words and phrases that (a) have interpretations that cover the entire probability range, in about evenly spaced steps, and (b) have relatively narrow interpretations, as indicated by small standard deviations, compared to other candidates with nearby means (see Table 1). To cover the ends, “absolutely impossible” and “absolutely certain” were chosen. “Almost certain” was used to cover the 95% range; however, it was subsequently learned that Kong et al. (1986) had found that people assign this phrase a mean value of .78 (median 90). SELECTION OF VERBAL PROBABILITIES 197 The list of verbal expressions of probability may optionally be presented in sequential order [ascending (Table 1) or descending] or in random order. After subjects select verbal expressions from a list to express probabilities in a problem solving, choice, or communication task, they are asked to refer to the lists and assign numbers to each expression. To save time, they could be asked to interpret only those expressions that they had selected. Potential Problems with the Proposed Method The validity of using verbal expressions of probability selected from a list may be compromised if the meaning of a word or phrase depends on contextual factors. There are several possible types of context effect. First, a phrase’s immediate neighbors in a list may affect its meaning. Thus, “rarely” may mean something different if positioned between “very unlikely” and “absolutely impossible” than if its neighbors are “good chance” and “slightly less than half the time.” One method of controlling for “immediate neighbor” context effects is to randomize the list of available verbal expressionsfor each trial or each subject (Budescu et al., 1988). But while varying randomized lists is feasible in the computerized laboratory, it would be inconvenient for mailed survey questionnaires, for interviews in which the expression lists are presented to respondents on a card, or for probability elicitations in a decision support system. To determine the extent of immediate neighbor context effects on the selection of verbal probability expressions from lists, Study 1 compares the selection of expressions and the assignmentof numerical interpretations to these expressions when the lists are presented in sequential versus random order. Second, when subjects read a list of candidate phrases they must do so sequentially. Phrases early in the list may be more likely to be chosen if subjects stop reading after finding one that is good enough. Or phrases late in the list may be favored if subjects read through the whole list and choose from among phrases that are still in short term memory at the end. To measure such position effects, Study 1 presented the expressions in reversed orders. Third, while all verbal expressionsof probability have inherently vague meanings, some phrases are more vague than others (Wallsten et al., 1986a).There are several ways these differences in vaguenessmight affect the selection of a term from a list. Study 2 measuresvaguenessas the breadth of the meaning that subjects assign to each of the 19 expressions of probability. With this measure, the data from Study 1 can be analyzed to determine whether phrases’ differential vagueness and the order in which they are presented interact to influence people’s selection or interpretation of the phrases. 198 ROBERT M. HAMM The second step of the method, assigning numerical interpretations to the verbal expressions of probability, requires an additional effort from the subject. When this procedure is administered by an individual interviewer or a computer, the effort could be reduced by requiring numerical interpretations for only those phrases the subject used in the first phase; but this may be too complicated for use with group administrations or mailed questionnaires. An alternative is to use common interpretations of the phrases rather than the subjects’ own interpretations. Whether this is acceptable depends on how much the subjects’ interpretations differ from the common interpretations. The comparison of ordered and random lists raises the issue whether lists should be structured to allow users to find target phrases easily. An ordered list allows a subject to find a desired phrase more rapidly than a random list. An ordered list also might constrain subjects’ interpretations of the phrases. Whether such a constraint is an advantageor disadvantage depends on the strengths of the order effect and the context effect. STUDY 1 The purpose of Study 1 was to investigate whether the following factors infhtence subjects’ selection of verbal expressions of probability from a list: (a) the context, i.e., a phrase’s neighbors in the list; (b) the phrase’s ordinal position in the list; (c) whether the phrase appears in a sequentially ordered or random list; and (d) the vaguenessof the phrase. The influence of these factors on the following factors will be determined: (1) subjects’ tendency to select a phrase, (2) subjects’ assignmentof numbers to phrases, including the extent to which subjects’ interpretations of the phrases agree with common interpretations, and (3) the accuracy of subjects’ responses to the problems. The results concerning vagueness of word meaningwill be presented following Study 2, in which a different set of subjects judged the breadths of meaning for the verbal expressions. Method One hundred and forty-seven subjects from the Introductory Psychology subject pool participated in group sessions. Each did four probabi- listic inference word problems (example in Appendix 1; details in Hamm, 1988a).Half responded with verbal expressions of probability; the others used numbers. Subjects were instructed “to select verbal phrases that represent your estimates of the probability or likelihood that statements are true or that events have happened.” Subjects were told whether the list was random or ordered. Those given ordered lists were told the expressions were ordered according to their meanings as determined in surveys of a large number of people. Subjects were asked to consider all the possible phrases and select the best one for each answer. Following SELECTION OF VERBAL PROBABILITIES 199 the problems, subjects assigned numbers representing “the numerical probability that most closely represents what each of these verbal phrases means” to each of the phrases. Total response time on the questionnaire was recorded. One subject was dropped from the analysis for using the wrong response mode. A number of individual responses were dropped because subjects did not follow directions (e.g., used a verbal expression that was not on the list, assigned a range of values to a phrase, or assigned the same value to every phrase). The phrases were presented in one of four possible orders. Two of the lists were ordered sequentially, either ascending (Table 1) or descending. The other two lists were arranged in a random order, which was produced by folding the ordered list (splitting the list at the middle phrase, reversing one half, and interleaving the two halves) repeatedly.’ In answering the problems, 25 subjects used the ascending phrase list, 14 the descending, 16 the random list in Footnote 1, and 16 the reversed random list. The remaining subjects answered using numerical probabilities. The numbers of subjects assigning values to phrases in the four list orders were 48, 27, 31, and 32, respectively. When those subjects who used the verbal response mode subsequently assigned values to phrases, the list was presented in the same order. Results The results of Study 1 address three topics: the effect of phrase list order on the selection of phrases as answers to the problems, its effect on the numerical values subjects assign to the phrases, and finally its effect on the accuracy of the subjects’ answers to the problems. Each subject selected a phrase from a list to express his or her estimate of the probability of a hypothesis 16 times: 4 times (after zero, one, two, and three pieces of key information had been provided; see Appendix 1) in each of four problems (concerning Cabs, Doctors, Insurance, and Twins; see Hamm, 1988a). Eflect of phrase list order on phrase selection. The sequentially ordered and the random phrase lists were each presented in two orders (between subjects) that are reverses of one another. Comparison of these reversed lists can reveal the overall tendency to pick answers that are early or late, controlling for the identities of the phrases. When subjects selected from ’ The random order used was (1) uncertain, (2) rather likely, (3) somewhat unlikely, (4) rarely, (5) slightly less than half the time, (6) good chance, (7) fairly unlikely, (8) absolutely impossible, (9) toss-up, (10) quite likely, (11) not very probable, (12) absolutely certain, (13) slightly more than half the time, (14) very probable, (15) seldom, (16) almost certain, (17) better than even, (18) highly probable, (19) very unlikely. 200 ROBERT M. HAMM the ascending or descending ordered lists after all three key pieces of information had been presented, the average ordinal position of the selected phrases was 9.66. This is only l/3 of a position in front of the middle of the list. On the random lists, the mean ordinal position of the chosen phrase was 10.84, 5/6 of a position after the midpoint. Though the overall effect of ordinal position is small, its magnitude is larger in the random lists than the ordered lists. For the ordered lists, across all four problems, 8 expressions were selected more when they appeared in the first half of the list, 8 were selected more when they appeared in the second half, 2 were selected equally (or not at all), and 1 expression appeared in the middle position. With random lists, however, subjects chose more often from the second half of the list: 11 phrases were chosen more often when in the second half of the list, only 3 were chosen more often when in the first half, and 4 were chosen equally. Analysis of the probability of choosing a phrase when it is in the first compared to the second half of the list (see Hamm, 1988b) showed that a phrase from the first half of an ordered list would be used 5.33% of the time (averaged across problems), compared with 5.20% for a phrase from the second half of the list. In the random lists a phrase from the first half of the list would be used 4.84% of the time, while a phrase from the second half would be used 5.69% of the time. While these small effects are not significant in the individual problems, when averaged across problems, subjects faced with randomly ordered lists are significantly more likely to select phrases in the second half of the list (t = -2.86, p < .02). Dij%ulty fmding the desired phrase in ordered and random lists. A possible advantage of ordered over random lists is that subjects may be able to find the verbal probability expression they want more easily. First, subjects who know the phrase they want may be able to verify its presence more quickly in a list that is structured in an ascending or descending order. Second, subjects who know the probability they want to express (either as a number, a range of numbers, a phrase not in the list, or an idea that is not modally specific) may be able to find an expression for this probability in the list more quickly, by evaluating list expressions and comparing them to the target probability, when those expressions are ordered. In addition, with a random list the meanings of the phrases subjects select as answers using this process may be more variable. Third, people may not know the degree of probability they want to express until they have considered candidate phrases. It may be easier for them to check whether phrases apply to the situation when using an ordered list, in which phrases’ meanings can be quickly understood because they are implied by the meanings of their neighbors in the list. If so, answers should be both quicker and less variable. Subjects who used the random list took only 10 s longer on average (in SELECTION OF VERBAL 201 PROBABILITIES an 18-minprocedure) than subjects with the ordered list (NS). To measure variability in the meanings of the phrases subjects selected, the a priori values are used (Column 4 of Table 1.) (Using values the subjects assigned would confound list influence on variation in phrase selection with list influence on variation in phrase interpretation). Over all four answers on each of the four problems, the phrases subjects chose from the random lists had a priori numerical interpretations with higher average standard deviations (mean SD = .195)than those they chose from the ordered lists (mean SD = .181). However, while this relation was found for 11 of the 16 subproblems, the difference is not statistically significant (x2 = 1.25). In sum, answers with the random lists were slower and more variable than answers with the ordered lists, but not significantly so. E#ect of list order on values assigned to the verbal expressions of probability. The second procedure used in the proposed method is the assignment of numerical values to the verbal expressions of probability. Its purpose is to increase the accuracy of the proposed method by allowing adjustments for individual differences in the interpretation of phrases. Table 2 shows the mean value subjects assignedto the phrases when they TABLE 2 MEAN NUMERICAL VALUES ASSIGNED TO EACH PHRASE, EACH LIST ORDER Ordered lists N: Absolutely impossible Rarely Very unlikely Seldom Not very probable Fairly unlikely Somewhat unlikely Uncertain Slightly less than half the time Toss-up Slightly more than half the time Better than even Rather likely Good chance Quite likely Very probable Highly probable Almost certain Absolutely certain Random lists 76 Mean SD 64 Mean .011 .094 .132 .176 .230 .282 .333 A06 366 .069 .047 .053 .071 .056 Sk54 .064 .144 .186 .216 .198 .261 .302 .402 A47 .495 .039 .554 .603 .676 .727 .780 .836 .882 .931 l.ooo .053 .066 .075 .076 .073 .073 .070 366 304 .041 Total 140 Mean SD .180 .214 .271 .318 A04 .074 .078 .132 .089 .090 .083 .092 .122 .432 .501 .051 al2 .439 .498 &I6 .021 .566 .618 .679 .719 .758 .834 .878 .8% 997 .068 .082 .166 .135 .560 .611 .677 .I23 .769 .835 .880 ,913 .998 .061 .019 SD .083 .088 .218 .124 .llO .109 .120 .lOl .095 .073 .108 .012 .074 .121 .106 .087 .084 .071 .087 .008 202 ROBERT M. HAMM were presented in ordered and random lists. (These values may be compared with those in Table 1.) For the values elicited in this procedure to be unbiased, they should not be affected by the order in which phrases are presented. Reversal of these lists had little effect on the mean values: none of the verbal probabilities were assigned numerical values that were significantly different in the ordered and random lists. Subjects were less variable in assigning numerical meanings to phrases in the ordered lists than in the random lists. For 17 of the 19 phrases (all save “toss-up” and “high probability”), there was higher variability in the numerical values assigned to the phrases when presented in the randomly ordered lists (x2 = 7.5, df = 1, p < .005, one-tailed). The mean standard deviations of the values assigned to the phrases in the four lists were: Ascending order (.071); Descending order (.047); Random order A (.084); and Random order B (. 118). The random list standard deviation (M = .lOl) was significantly higher than the ordered list standard deviation (A4 = .054; r = 3.92, df = 18, p < .Ol). Effect of list structure on amount of duplication in assigned values. If in the value assignment procedure of the proposed method subjects assign the same numerical value to more than one phrase, this would degrade the precision of the method. People can be expected to do this more often with random than with ordered lists. The extent of such duplication can be measured by counting the number of pairs of phrases to which a subject assigns the same value. For example, if “almost certain” and “highly probable” are both assigned the value 90, that is one duplicate pair. If in addition “quite likely” were to be called 90, this would produce three pairs. If someone assigned the same value to all 19 phrases, there would be (19*18)/2 = 171 duplicate pairs. The mean numbers of pairs of phrases that were assigned duplicate values were Ascending list (1.73); Descending list (1.03); Random list A (4.36); Random list B (4.84). The number of duplications is very small in comparison with the maximum possible count of 171. Significantly more duplicate values were assigned to phrases in the random lists than in the ordered lists, as predicted (F(3,140) = 6.13, p = .0006). Effect of list structure on similarity of subjects’ assigned values to values found in previous studies. Different list orders can be evaluated for use with this method by comparing the interpretations subjects give to the phrases in these lists with the interpretations given them in earlier studies (summarized in the right hand column of Table 1). The closer the interpretations with this method are to the common or conventional meanings, the less the total possible misunderstanding. Subtracting the prior values from the subjects’ values provides a measure of the difference. Table 3 shows the mean difference scores for the ordered and random phrase lists, as well as their variability (standard deviations). In both lists the SELECTION OF VERBAL TABLE 203 PROBABILITIES 3 MEANS AND STANDARDDEVIATIONSOF DIFFERENCESBETWEENSUBJECTS INIWW~ETATIONSAND A PIUORI INTERPRETATIONS OF VERBAL EXPRESSIONS IN ORDERED AND RANDOM LISTS Absolutely impossible Rarely Very unlikely Seldom Not very probable Fairly unlikely Somewhat unlikely Uncertain Slightly less than half the time Toss-up Slightly more than half the time Better than even Rather likely Good chance Quite likely Very probable Highly probable Almost certain Absolutely certain SD List with greater deviation score .114 .089 .221 .124 r r** r** r** Ordered lists Random lists Mean Mean SD .018 .094 .013 .103 a47 .031 ,026 ,031 ,032 .cuM a05 .081 a49 .054 .080 .063 .067 .067 .085 .065 - .002 .llO .OlO .108 - .028 .124 .002 .179 0** - .003 - .005 .042 .039 - .018 .mo .054 .003 r* .002 .002 .026 .025 .022 .017 .021 .020 .053 a66 .076 .077 .076 .077 .074 .068 306 .016 .018 .020 .032 .043 .016 .022 .053 .003 .072 .082 r r .180 .148 .108 .lOl 0 .073 .117 r r+* r - - .ool - .Oll Distance of phrase from nearest anchor 0 r* 0 0 r r 0 Note. N = 76 for Ordered lists and N = 64 for Random lists. *p < .lO; **p < .05. differences tended to be positive in the fast half of the list and negative in the second half, that is, subjects’ interpretations of these verbal expressions are closer to the middle of the probability scale (5) in the present conditions than in previous studies (primarily Lichtenstein & Newman, 1967).For 13 of the 19 phrases, the absolute value of the mean deviation was larger when the lists were presented randomly. Four of these comparisons (for “rarely,” “very unlikely,” “seldom,” and “almost certain”) were significant (in one-way ANOVAs) at p < .05, and two more (“somewhat unlikely” and “slightly less than half the time”) at p < .lO. For only one of the phrases (“not very probable”) were the values assigned using the ordered list significantly more different from the a priori values than the values assigned using the random list. In sum, subjects assign to randomly arranged verbal expressions of probability numerical values that are more variable and farther from the 204 ROBERT M. HAMM conventional meanings of these terms than they assign to expressions in ordered lists. If this reflects their understanding of the meanings of the expressions when they are using them to answer the questions, then it is preferable to use the ordered lists. Effect of distance of a phrase from a reference point on variability of value assignment. Three of the verbal expressions of probability used here have quite specific meanings: “absolutely certain” (1 .O), “toss-up” (JO), and “absolutely impossible” (0). It is possible that subjects use these phrases as reference points when assigning values to other phrases. If so, less variability would be expected in the values assigned to phrases that are near to these landmarks, than in the values assigned to more distant phrases. The distance of a phrase from a landmark will be more salient in an ordered list than in a random list. Therefore, the effect of distance from a reference phrase on the variability of the values assigned to other phrases would be smaller in random lists. However, people already know the meanings of these phrases, and so even in the random lists a phrase whose meaning is near a reference point may have a narrow range of interpretations. In the list of 19 phrases, the distance of a phrase from the nearest reference point is simply measured by counting the number of steps in the list to the nearest reference point (see Column 6 of Table 3). The hypothesis predicts that this measure will be positively correlated (over the 16 nonlandmark phrases) with the standard deviation of the values the subjects assigned to the phrase, and that this correlation will be larger for the ordered lists than for the random lists. The correlations were Ascending list (r = .41, p < .lO); Descending list (r = .30, NS); Ordered lists overall (r = .44, p < .05), Random list A (r = .32, NS); Random list B (r = .14, NS); and Random lists overall (r = .25, NS). As predicted, all correlations were positive and the correlations were higher for ordered than random lists. Effect of phrase list order on accuracy of problem answers. A third criterion for evaluating the proposed method of expressing degree of belief by selecting verbal expressions of probability is the accuracy of its use. In the present study, this accuracy is a joint product of (a) the phrase the subject selects, (b) the meaning assigned to the phrase, and (c) the right answer to the problem. Is the accuracy of subjects’ responses affected by the order in which the phrases are presented, or by the use of subjects’ own values rather than conventional values for the phrases? Effects of list structure on accuracy of problem answers. Accuracy is measured by translating all pertinent phrases (those the experimenter included in the word problem, and those the subject selected as responses) into numbers, and comparing the response number with the correct answer (produced by applying Bayes’ Theorem to the numbers in SELECTION OF VERBAL 205 PROBABILITIES the word problem; see Hamm, 1988a).Translation from phrases to numbers can be done in two ways: using the conventional values available a priori (Table 1) or the values each individual subject assigned to the phrases. Accuracy using both translations will be studied here, to separate those effects of list order on accuracy which are due to selection from those due to value assignment. If phrase list order affects accuracy using the a priori translations, this can only be due to its effects on selection of a phrase as a response. If list order affects accuracy using the subjects’ individual translations, but not using the a priori translations, this must be due to its effects on subjects’ assignment of values to phrases. Results are presented separately for subjects for whom the word problems were presented with verbal and numerical expressions of probability (Table 4). If the probabilities were presented verbally, the numerical values of the right answers depend on an assignmentof numerical values to TABLE 4 COMPARISON OF WORD PROBLEM ACCURACY (ABSOLUTE DEVIATIONS) BETWEEN SUBJECTSWITH ORDEREDAND RANDOM PHRASELISTS, FORSUBJECTSWITH NUMEFWAL AND VERNAL PRESENTATIONOF PROBABILITIES Presentationmode: Responsemode: Problem Cabs 2 3 Doctors 1 3 Insurance 1 3 Twins 1 2 3 Verbal Verbal Mean deviation Amount of info 1 Numerical Verbal Ordered Random N N N N N N N N N N No. of problems(out of 10) where orderedlist is more accurate * p < .os; **p < .Ol. .13 (9) 34 (9) .35 (18) .24 Q-9 .l3 (8) .33 (16) .06 (18) .63 (9) .23 (18) .45 (9) .08 (9) .09 (9) .25 (18) 5 F 2.4 19.0 .4 5.2 1.3 Mean deviation Significance Ordered Random F .14 .oo** .51 .03* .27 .74 .25 .I A4 .O 39 .l3 (7) .09 1.2 .23 (16) (8) (8) .22 (16) .O .2 .91 .64 .13 (10) .09 (11) .42 (21) .17 (21) .62 (10) .I5 (21) .23 (11) .22 (11) .12 (10) .30 (21) 3 Significance .21 2.1 .17 36 1.2 .30 .9 .35 1.3 .26 .47 2.6 .I3 .16 .O .87 .47 7.7 .01* .12 1.0 34 .O 97 .5 .49 (8) (8) .38 (16) .I1 06) (8) (16) (8) (8) .11 (8) .26 (15) 206 ROBERT M. HAMM one or more phrases. Because the subjects responded verbally, the numerical values of their amwer.s also depend on the assignment of numerical values to phrases. Answers to the “no information” subproblems are not analyzed here, because there was little variation in response. There was no single correct answer for the Doctor and Insurance problems when only two pieces of information had been presented (see Hamm, 1987; 1988a), and so these subproblems too are excluded from the analysis. Table 4 shows the accuracy scores (absolute errors) for subjects using the verbal response mode, computed using the a priori translations from phrases to numbers. Only 3 of 20 comparisons between the ordered and random lists were statistically significant. In all three, ordered lists produced more accurate responses. When the deviation score, rather than the absolute deviation score, was used, the results were similar, which shows that the advantage of ordered lists is not simply their smaller variability. In conclusion, there is evidence that the order in which phrases are presented can influence the accuracy of the subjects’ performance on probabilistic inference word problems. There is a possible limitation on the generality of this finding: four of the 10 problems (those with 3 pieces of information in Table 4) are Bayes’ Theorem problems, on which people’s answers are notoriously inaccurate. A parallel analysis, using subjects’ individually assigned values to translate the meaning of the phrases and calculate accuracy, also found the ordered lists produced more accurate responses. Using the argument presented above, this implies that the advantage of the ordered lists depends on mechanisms influencing both the phrase selection and phrase interpretation processes. Effects of use of subject’s own assigned values versus a priori values on accuracy of response. A motivation for the proposed method is to enable subjects to express their degrees of belief in a way that is more natural for them than using numerical probabilities. It might seem that asking the subjects afterwards for their numerical interpretations of the phrases defeats this purpose. However, the virtue of the method is its isolation of the numerical thinking, for it allows subjects to use only the linguistic mode when thinking about the problems. Translating the phrases into numbers is done separately and does not interfere with the problem solving, which is of primary interest. Nonetheless, the assignment of numbers to phrases places a burden on the subjects, and so it is worth considering whether it is possible to do without this procedure by using a priori numerical interpretations of the phrases. What effect does the use of the subjects’ own translations of the phrases have on word problem accuracy? Table 5 shows the mean accuracy score (absolute error) on each problem using both the conventional numerical values (available a priori) and 207 SELECTION OF VERBAL PROBABILITIES TABLE 5 COMPARISON OF MEAN ACCIJRAC~E~ (ABSOLUTE DEVIATIONS) FOR EACH SUBPROBLEM, USING CONVENTIONAL VALUES AND SUEXJECTS’INDIVIDUAL VALUES of info Conventional values Own values t Significance N 1 2 3 1 3 1 3 1 2 3 .19 .lO .38 .17 .61 .20 .39 .17 .ll .28 .19 .13 .36 .17 .55 .19 40 .17 .12 .24 .09 -2.67 1.65 .18 4.06 .69 - .92 .79 .98 2.93 .925 .011* .lOl .857 .ooo** .492 .361 .434 .333 .004** 53 53 105 105 53 107 53 52 52 104 Amount Problem Cabs Doctors Insurance Twins * p < .05; **p < .Ol. the subjects’ own numerical values for the phrases. The comparison includes subjects whose presentation mode and response mode were numerical/verbal, verbal/numerical, or verbal/verbal. The answers using the subjects’ own values were more accurate on 8 of the 10 subproblems [using both absolute deviation scores (Table 5) and simple deviation scores], and significantly so for two subproblems. However, for one subproblem there is significantly higher accuracy using the a priori values. Therefore when accuracy is very important, it is probably preferable to use subjects’ individual interpretations of the phrases, rather than relying on a universal a priori interpretation. However, the evidence is mixed, and the difference in even the significant comparisons is small. In conditions where it is difficult to get subjects to assign values to the phrases, a priori interpretations could probably be used with only a small loss of accuracy. Discussion of Study 1 Study 1 showed that several of the mechanisms anticipated to cause problems for the proposed method have only minor effects. The first possibility was that the subject’s use of a given verbal expression could differ when it appears in different neighborhoods of the list. The second phase of the method is designed to correct for any such effect by having the subject assign values to phrases in the same list that they selected from in the first phase. However, possible variants of the method involve subjects assigning values to a short list (only those phrases they actually used) or skipping the second phase altogether. Because these shortcuts would sacrifice the ability to correct for the effects of immediate neigh- 208 ROBERT M. HAMM bors, they should be used only if these effects are minor. The use of random versus ordered lists tested these effects. When subjects selected phrases from random lists (in the first phase), they took a little longer and selected phrases whose numerical meanings had slightly higher variance; but these findings were not statistically significant. Similarly, there was no significant difference between the ordered and random lists in the values assigned to any individual phrase (in the second phase). The largest effect occurred for “very unlikely,” to which subjects assigned an average value of .132 when it appeared between “rarely” and “seldom” in the ordered list, but .186 when it was at one end of the random list next to “highly probable.” These negative results are offset by several differences that suggest a sequentially ordered list is preferable to a random list. With random lists, there was significantly more variability between subjects in the values assigned to phrases, and subjects more frequently assigned the same number to more than one phrase. With ordered lists, the subjects selected phrases that made their answers to the probabilistic inference word problems more accurate. (This is true when analyzed using both their own and the conventional assignments of numbers to the phrases.) Finally, subjects gave less conventional values to phrases which were displayed in random order. Similar effects probably occur when people interpret the verbal expressions prior to selecting a phrase to answer a word problem. One possible explanation for the more consistent performance with the ordered list is that subjects produce values for the terms by referring to two types of context, and these contexts are consistent in the ordered lists. The first, local context is the phrase’s immediate neighbors. The second, global context is established by reference points on the probability scale-“absolutely certain” for 1.O, “toss-up” for .S, and “absolutely impossible” for 0 (see Reagan, Mosteller, & Youtz, 1989). The evidence for the use of the global context is that the values given to phrases near these reference points were less variable than the values given to phrases farther away. The correlation between phrase value variance and distance from a reference point was statistically significant in the ordered lists, but was not significant (though positive) in the random lists. Thus the sequential arrangement of the list seems to facilitate the use of the reference point strategy in assigning values to phrases. Subjects use both the local and global contexts in interpreting the phrases. In the random lists the global context is difficult to discern, and often conflicts with the local context. In the ordered lists global context is visually obvious and the local and global contexts are coherent. This may explain why values closer to those found in previous studies were assigned to phrases in the ordered lists, and is another reason to prefer ordered lists. A second potential problem investigated in Study 1 was the effect of a SELECTION OF VERBAL PROBABILITIES 209 phrase appearing early or late in the list. If such position effects were strong, extra controls would be required, such as reversal of list orders in a counterbalanced design. Although ordinal position of the verbal expressions had little effect on the assignment of values to the expressions, reversal of lists did intluence choice of phrases from the random phrase list. The effect was small, but statistically significant when data were aggregated across problems. The subjects chose a phrase more often when it appeared in the second half than the first half of the random list. There was no effect in the ordered list. However, the effect may be larger when subjects use a list just once. In this study they used the list 16 times, and their familiarity with the list may have lessened the position effect. Systematic list reversal does not seem necessary for ordered lists. If random lists are used, counterbalanced reversal would be needed to prevent bias due to people’s tendency to select phrases from late in the list. The third issue addressed by Study 1 was whether it is necessary to have subjects individually assign numerical interpretations to the verbal probability expressions. Although this calibration compensates for individual differences in the interpretation of the phrases, it is a burden for subject and researcher. Would it be sufficient to use the conventional interpretations, those measured in previous studies (or indeed in this one) and available a priori, and thus skip the individual calibrations in the second phase? Use of the subjects’ own values to interpret the phrases they selected produced answers to the problems which were usually (but not always) more accurate than use of the conventional values. When the numbers subjects assigned to the verbal expressions were compared with the values found in previous studies, greater deviations were observed in the random lists than the ordered lists. Therefore, a researcher who assumes subjects’ interpretations of the phrases are close enough to the interpretations from other studies that he or she can dispense with the individual calibrations would be risking less error by presenting the phrases in ordered rather than random lists. STUDY 2: MEASURING THE BREADTH OF MEANING OF VERBAL PROBABILITY EXPRESSIONS Verbal expressions of probability differ in vagueness, that is, in the range of numerical probabilities to which they refer. Some phrases, such as “absolutely certain” and “toss-up,” refer to narrow ranges of probabilities (see Kong ef al., 1986), while other phrases, particularly those with meanings around 25 or 75%, refer to broader ranges. The tendency to select a phrase with a broad “membership function” (Wallsten et al., 1986a) may be strongly affected by its ordinal position in a list. Specifically, if people scan a list looking for the first phrase whose meaning 210 ROBERT M. HAMM covers the probability they have in mind, they will be more likely to select a phrase with a broad range of meaning, particularly if the list is random so that broad phrases covering all parts of the probability scale appear early in the list. Additionally, the values people assign to vague phrases may be more influenced by the phrases’ immediate neighbors in the list than the values assigned to phrases with precise meanings. A second study was carried out to measure the breadth of the membership functions of the 19 verbal expressions of probability used in Study 1. This makes it possible to determine whether differences in breadth of meaning have any effect on the two behaviors involved in the proposed method: selection of verbal expressions from a list and assignment of numerical values to verbal expressions. Method Sixty-five subjects, primarily from the Introductory Psychology subject pool, filled out a questionnaire (Appendix 2) which asked them to state the lower and upper bounds of the numerical probabilities that each phrase refers to (similar procedures were used, for different phrases, by Reagan et al., 1989). Half of the subjects named the lower bound for each phrase before the upper bound, and half did the reverse. Crossed with this factor, half of the subjects named the phrases in Random order A (see Footnote 1, above), and half used its reverse, Random order B. Results The mean lower and upper limits, across all conditions, are presented in Columns 1 and 2 of Table 6. The midpoint between these bounds is an estimate of the meaning the individual assigns to the verbal expression of probability. The mean and median midpoints of these ranges and their standard deviations (Columns 3-5) are similar to the values in Table 1. The difference between a phrase’s upper and lower bounds is a measure of the range of meaning the individual assigns to the phrase, and can be used as an estimate of the breadth of the phrase’s membership function. The mean and median range for each phrase are presented in Columns 6 and 7 of Table 6. “Absolutely impossible” (.034), “toss-up” (.042), and “absolutely certain” (.052) have the narrowest ranges, and “uncertain” (.240) and “better than even” (.163) the widest (Column 6). The median range measure does not discriminate well among the phrases, for its value for many of the phrases was .lO. The 8th column shows the standard deviations of the differences between the upper and lower bounds, which reveal that there is an exceptionally high variation across subjects (SD = .336) in the range of meaning attributed to “uncertain.” For comparison with previous work, the differences between the median upper and lower bounds for the three phrases studied by Wallsten et SELECTION OF VERBAL 211 PROBABILITIES TABLE 6 THE MEAN LOWER LIMIT, UPPER LIMIT, AND MIDPOINT BETWEEN THE LIMITS, FOR EACH PHRASE Phrase Absolutely impossible R=lY Very unlikely Seldom Not very probable Fairly unlikely Somewhat unlikely Uncertain Slightly less than half the time Toss-up Sliitly more than halfthetime Better than even Rather likely Good chance Quite likely Very probable Highly probable Almost certain Absolutely certain Lower limit Upper limit .w7 .064 .046 .117 ,113 .176 .217 294 .041 .183 ,145 .243 ,235 .287 349 .534 .390 .484 .524 ,540 .585 ,652 .686 ,733 ,757 340 928 Midpoint Mean Median Ranee SD of SD Mean Median .071 .126 .116 .lll .129 .153 .034 .106 .lOO .126 ,122 .ll2 .132 .240 .oo .10 .10 .lO .I0 .lO .lO .lO .I33 .079 .061 :E .174 .231 ,283 .414 .lOO .075 .150 .150 .225 .250 so0 .470 .526 .430 SOS A40 .500 .050 .045 .080 .042 .07 a0 .048 .087 A04 ,703 .735 .799 .824 .872 .899 .950 980 .564 .621 .660 ,726 .755 .803 .828 .895 .954 .555 .600 .700 .750 .065 .088 .222 .137 .146 .122 .127 .088 .lOS .080 .163 .I50 .147 .137 .139 .142 .109 .052 a9 .lO .I5 .I5 .lO .I0 .lO .lO .oo .053 .123 .087 .077 .087 .093 a34 .090 .142 .024 .117 .ooo :E .850 .925 l.CK!fl :E .061 .M5 .336 Note. N = 65. al. (1986a) that were used in the present study were “toss-up”: .13; “good chance”: .46; and “almost certain”: .l 1. These ranges are generally larger than the individual ranges measured in our study (Columns 6 and 7 of Table 6), which may reflect different elicitation procedures. The standard deviation of the values assignedto a verbal expression of probability is an alternative measureof the breadth of the phrase’s membership function, although it is confounded with individual differences in the meaning of the phrase. Column 6 of Table 2 shows the mean of the standard deviations of the values given to phrases when presented in the four list orders in Study 1. Column 8 of Table 6 shows the standard deviation of the midpoints of the ranges, from Study 2. The intercorrelations among these five measures of breadth of membership function are all fairly high, ranging from .55 to .86 (Table 7). Therefore when a direct measure of the breadth of membership function is lacking, the between subject standard deviation might serve as a useful proxy. E#ect of list reversal on the selection of phrases with broad and narrow membership functions. The question whether the subjects in Study 1 preferred to use phrases with broader meanings is addressed in Table 8, which shows the correlations between the indices of breadth of membership function (derived from data from Studies 1 and 2) and measures of 212 ROBERT M. HAMM TABLE 7 INTERCORRELATIONS AMONG FIVE MEASURESOF BREADTH OF MEMBERSHIPFUNCTION Median range U - L difference SD of value SD of midpointd Mean range” Median range” U - L Differenceb SD of value’ .73** .74 .67** .65** .72 .7Ci** .59** .63 .86 .55** Note. N = 19 for every correlation except those involving the U - L Difference index, for which N = 3. o Difference between upper and lower bounds, Study 2, N = 65. b Difference between median upper bound and median lower bound, from Fig. 4 of Wallsten, Budescu, Rapoport, Zwick, and Forsyth (1986). ’ Mean of standard deviations of values assigned to the phrases, from four lists with different phrase orders, Study 1, N = 138. d Standard deviation of midpoint (average of upper and lower bounds), Study 2. ** p < .Ol. the number of subjects who used each phrase for each problem in Study 1, separately for the ordered and random lists. The relations are generally positive, which suggestspeople prefer to use phrases with broad, even vague, meanings. There was no general tendency for broad phrases to be selected more in random lists than ordered lists, as had been predicted. Effect of breadth of phrase meaning and list structure on variability of value assignment. It was expected that subjects would assign more vari- able numerical values to verbal probability expressions that have broader meanings,and that this would occur particularly when the phrase lists are randomly ordered and the context supplies fewer constraints on the meaning of each term. The correlations between the four measures of breadth of membership function of the 19 phrases (from Study 2; see Table 7) and the standard deviations of the values assigned to those phrases when presented in ordered and random lists (from Study 1; see TABLE 8 CORRELATIONS BETWEEN INDICES OF BREADTH OF PHRASES’ MEMBERSHIP FUNCTIONS AND THE NUMBER OF SUBJECTS WHO SELIXTED EACH PHRASE (SUMMED OVER FOUR PROBLEMS), SEPARATELY FOR ORDEREDAND RANDOM PHIUSE LISTS Measures of breadth of membership function” List structure Ordered Random Total Mean raw Median range SD of values SD of midpoints .35* .29 .35* .43** .38* .43** .22 .22 .24 .41** .29 .38* a Variables are defined in notes to Table 7. *p < .lO; **p < .05. SELECTION OF VERBAL PROBABILITIES 213 Table 2) were all significantly positive (I, C .05). However, this relationship was not substantially stronger for the random lists than the ordered lists. Discussion of Study 2 Subjects in Study 1 preferred phrases which Study 2 showed to have relatively broad meanings, such as “somewhat unlikely” or “good chance.” This preference, however, may be due to the particular word problems used in the study. The answers people chose for these problems tended to be in the .60 to 90 range (see Hamm, 1988a). The verbal expressions covering this range (as well as the .I0 to the .40 range) have broader ranges than the phrases covering other ranges. Therefore subjects’ tendency to select phrases with relatively broad meanings on these problems may be an artifact due to the particular problems used. Further study is needed to clarify this issue. The expected interaction between breadth of phrase meaning and list structure was not found: Subjects were not more likely to select phrases with vague meanings when they appeared in random lists as opposed to ordered lists. Phrasesthat had broad meaningsas indicated by the difference between the lower and upper bounds that individuals place on their meanings also tended to have meanings that varied more between individuals. Wallsten et al. (1986b) have shown that these individual differences in preferred point-meanings for verbal expressions tend to be stable. The fact that people put broader bounds on those expressionsthat others may interpret differently, even though their own interpretation is stable, suggeststhat people know these phrases have greater interpersonal variability of meaning. GENERAL DISCUSSION The proposed method of expressing probability verbally by selecting words or phrases from a list has the advantagesdetailed by Zwick (1987; e.g., that people prefer to use verbal probabilities) without the disadvantages of unconstrained verbal expression. It limits the number of possible verbal probabilities by offering a specific set of expressions. It calibrates the interpretations of the verbal expressions by getting the subjects to state the numerical meanings of the phrases. The studies reported here show that the validity of the method is little affected by several anticipated factors. A phrase’s immediate neighbors have a slight influence on its meaning. The calibration provided by the second phase of the method controls for this. Further, presenting the expression in sequential order exploits the immediate neighbor effect: it makes the meanings subjects assign less 214 ROBERT M. HAMM variable and closer to the conventional meanings of the phrases. Ordinal position, as manipulated by reversing the list, affects subjects’ selection of terms when the lists are randomly ordered but not when they are sequentially ordered. Subjects have a slight preference for phrases that are more vague, but this might be due to the particular problems studied. Use of the ordered list enables subjects to select phrases that are more accurate answers for the word problems. In addition, use of their own interpretations rather than the common meanings usually makes their answers more accurate. Thus the proposed method seems feasible as a way of expressing probability verbally. As long as the phrases are presented in sequential order, it should not be necessary to systematically reverse the order of presentation. The individual calibration in the second phase of the method makes it more accurate, but in certain applications this may not be worth the additional effort. Further Context Effects Several context effects not addressed in the present study should be kept in mind by researchers considering using lists of verbal probabilities. An expression’s meaning may depend on the object whose probability is being discussed (Brun & Teigen, 1988, Study 3; Mapes, 1979; WaIlsten et al., 1986b). Thus, “highly likely” may have a different numerical interpretation if applied to the possible failure of a Broadway play than if applied to the possible meltdown of a nuclear reactor. To control for this effect, the subject could be asked to assign numbers to phrases in a specific context. The meaning may also depend on whether the verbal expression of probability has an object or not. Thus, Brun and Teigen (1988, Studies 1 and 3) found that people tend to assign higher numbers to probability expressions when they appear in a sentence (e.g., “unemployment figures will probably rise sharply during the next three years”) than when they stand alone. However, with medical contexts (Study 2), they found the opposite: context made the interpretations lower. Probability expressions may have different kinds of meaning when used with different response modes. For example, Teigen (1988b) required subjects to use “probable” or “not probable” to describe the chances of each of several contestants in a music competition. When contestants have about equal probabilities of winning, subjects use “probable” for those contestants with greater than average chances. When they thus call 7 of 20 contestants “probable winners,” they implicitly define “probable” as .I5 (at most), much lower than in other studies. But it is not clear whether this would have happened had a broader range of verbal expressions been available to choose from. Another possible contextual effect is that the meaning of a phrase may SELECTION OF VERBAL PROBABILITIES 215 depend on the other phrases available in the choice set (see Pepper, 1981). For example, the meaning of “probable” may depend on whether “not probable” is present in the list. A term and its negation may mutually influence their meanings to be equidistant from the midpoint of 50% (Reyna, 1981; but see Reagan et al., 1989). Two similar terms such as “fairly unlikely” and “somewhat unlikely” may be assigned the same broad meaning if only one of them is in a list, but may be assigned adjacent but nonoverlapping meanings if both are present. Perhaps the most pertinent context issue is the question whether a subject interprets a verbal expression the samewhen reading it in a word problem, when considering whether to select it from a list as the answer to the word problem, and when assigning a numerical value to it. The proposed method depends crucially on the assumption that these interpretations are invariant, as does the method of Kong et al. (1986); hence it needs to be formally tested. Does a Structured List Distort Meaning? Study 1 indicated that the phrase selection method is at least as valid with the sequentially ordered phrase list as with the random list. The coherent local and global contexts seemsto make the conventional meanings of the verbal expressions more easily accessible. But the constraints that an ordered list places on the subject’s interpretation of the phrases may distort, rather than clarify, the meanings of the phrases, and thus may prevent people from using the phrases as they normally would. Both the center of meaning and the breadth of meaning of a phrase may be distorted. Consider, for example, the meaning of “almost certain.” Kong et al. (1986)found subjects assign it a mean value of .78 (median .90), but the author used it to represent .95 in the present study. When subjects assigned values to “almost certain” in the random phrase lists (where there was nothing to indicate that the phrase meant .95), the mean value was 90 (see Table 6). However, in ordered lists (where it appeared in the 18th or 2nd of 19 positions, between “highly probable” and “absolutely certain”), its mean value was .93. Placing the phrase in the ordered list changed its meaning in the direction the author intended. Another example is “uncertain,” the verbal expression that the author selectedto represent 40. The range of meaning people assignto this word is both very wide (an average range of .24 between the lower and upper bounds; see Table 6) and very variable (the standard deviation of the range is .34; some subjects gave it a range of 0 and others a range of 100). Although the mean value assigned to “uncertain” was .40 or .41 in both the ordered and random lists (Table 2), which agrees with the a priori value, these values were much more variable in the random list (SD = 216 ROBERT M. HAMM .18) than the ordered list (SD = .06). Placing a phrase in an ordered list can drastically change the breadth of its meaning. Because of the exceptional variability of the meaning of “uncertain,” an alternative phrase for the region near .40 should be substituted. A candidate is “worse than even,” which Shanteau (1974) found to have a mean value of .38 using two different procedures, and which is symmetric with “better than even” whose assigned meaning is 60. Although it is possible to find replacement phrases for particular inappropriate verbal expressions, still the ordered list will change some phrases’ meanings and breadths of meaning, for many individuals. This can be viewed as a necessary cost of adopting a common set of interpretations of verbal expressions of uncertainty. Kong et al. (1986) advocate improving the use of verbal probabilities through codifying the meaning of probabilistic expressions. They suggest measuring what people usually mean by phrases, publicizing this, and training people to use the terms with these agreed-upon meanings. Such publicity and training would (a) reduce the differences between people, (b) narrow the individual membership functions for each phrase, and (c) get people to use the phrases to mean the same probability in different contexts. Establishing such a convention would require changing people’s interpretations of many phrases. The proposed method of selecting verbal probability expressions from a list could be a tool in such a program. The changes that the use of an ordered phrase list in this method would induce in the meaning of its phrases might be viewed as costs worth incurring in order to improve communication about uncertainty. A list that displayed both the verbal expressions of probability and their numerical interpretations (as in Fig. 1) could be useful in this context. This scale would not require a separate calibration procedure. People would be free to use the mode they found more fitting to the problem and to their cognitive style. The two modes of expression would mutually define each other, constraining people’s interpretations of each to be closer to the common meanings. Finally, use of the graphic scale would train people to associate the verbal and numerical expressions, promoting the availability, acceptance, and use of the conventional interpretation. Alternative lists of verbal expressions of very low or very high probabilities would be useful for making distinctions among degrees of near impossibility or near certainty. These lists should be based on research discovering the phrases people already use in contexts where these ranges of probability are pertinent, such as medicine (cf. Meyer & Pauker, 1987) or technological systems. A recent example highlights this need. To assess the overall risk of space shuttle failure, NASA engineers were asked to make verbal assessmentsof the reliability of space shuttle components. SELECTION Verbal OF VERBAL Expressions Absolutely impossible Numerical - .15 Not very probable .io Worse Slightly Slightly less than more .lO Seldom Somewhat than unlikely .25 unlikely .x3 than even .a3 half the time .A5 Toss-up .m half the time 55 Better than Rather even .%I likely .m .75 Good chance Quite Very Highly Almost Absolutely FIG. 1. likely .m probable .85 probable So certain certain Expressions 05 unlikely Fairly 217 -m Rarely Very PROBABILITIES .s5 - -1m Proposed graphic display of verbal and numerical expressions of probability. These were then translated into numbers, using an arbitrary code (“frequent” = .Ol; “reasonably probable” = .OOl; “occasional” = .OOOl;and “remote” = .OOOOl)that had not been used by the engineers in making their original assessments (Marshall, 1986). This contributed to overconfidence in shuttle safety prior to the Challenger explosion. This poor risk assessment practice has given subjective judgment a bad name in the aerospace community: “the government is relying too much on subjective judgment and too little on statistical analysis in deciding which of thousands of safety problems on the space shuttle should get attention” (Marshall, 1988, p. 1233). Codification of verbal expressions of probability would impose consistent interpretations on the phrases and allow experts’ subjective judgment to make its potentially crucial contribution. The flexibility inherent in the ability to select vague or precise probability expressions is a special attraction of verbal probabilities. Those who value this feature of verbal probabilities may be more disturbed by the potential distortion of verbal expressions’ breadths of meaning when 218 ROBERT M. HAMM placed in a list than by the distortion of their central meanings. For example, presenting a set of phrases with evenly spaced meanings may force people to interpret them as having equal breadths of meaning. It is possible, however, to preserve considerations of breadth of meaning while establishing conventional meanings for verbal expressions of probability. For example, Beyth-Marom (1982) proposed a framework for codifying the breadth of meaning of verbal probability expressions. It divides the probability scale into ranges .lO or .20 wide, and associates each range with from 2 to 6 verbal expressions. For example, the terms “small chance” and “doubtful” would refer to the .lO to .30 range. Although this reflects the fact that verbal expressions apply to ranges of probability, it has disadvantages. It does not distinguish between probabilities within a range. It requires people to learn a number of arbitrary sharp boundaries (e.g., at .lO and .30). If establishing a convention requires people to relearn the meanings of phrases, it seems more useful to associate phrases with points and allow for fuzzy boundaries, than to associate phrases with precise ranges. An alternative way to address breadth of meaning in the context of the method of selecting verbal probabilities from a list is to be explicit about the breadth of each expression’s meaning, in addition to its center of meaning. Initially, researchers taking this approach could require subjects to specify not only a central numerical meaning (as in Study 1) but also a range of meanings (as in Study 2) for each expression used for presenting or answering the question. The researcher would then know the range of possible meanings the subject might intend for the answer to the substantive question. Thus this approach recognizes and roughly measures one of the primary functions of verbal probabilities, the control of vagueness of expression. After sufftcient study to establish conventional breadths of meaning, a more convenient graphic list such as Fig. 2 could be used. This presents the phrases’ conventional ranges as well as the conventional means of Fig. 1. It is distinct from Beyth-Marom’s proposal in that each expression has a center of meaning and its own boundaries. By eliciting or providing interpretations of both the center and the breadth of each phrase’s meanings, it would be possible to assure that neither dimension of meaning is distorted or miscommunicated in the verbal probability selection method. Very Unl,kely I , TOSS1 8 I I 0 .5 Rather likely I I I FIG. 2. Proposed graphic display of verbal and numerical probabilities, incorporating breadths of meaning as well as centers of meaning. SELECTION OF VERBAL PROBABILITIES 219 Two additional issues need exploration. First, not only verbal but also numerical expressions can be said to have breadths of meaning. Paradoxically, the number SO refers to a wider range of probabilities than .60, while the corresponding verbal expression “toss-up” has a narrower meaning than “better than even.” We need to understand this anomaly and its implications for the verbal probability selection method. Second, in many contexts verbal expressions have floating meanings that are negotiated in conversations. Can this feature of communication about probability be integrated with the use of the specific methods proposed here, or with any program for establishing conventional meanings for verbal probabilities? APPENDIX 1 The “Doctor” Problem, One of Four Probabilistic Problems Used in the Study Inference Word In alternative versions of the problem, the probabilistic information was presented in either verbal or numerical form. [0 piecesof key information.] The next word problem is about a doctor trying to figure out what diseasea patient has. The patient is, undeniably, ill, but it is difficult to know what disease he has. You will be asked to estimate how likely it is that the patient has one of two diseases. The patient comes in to the emergency room at night with a very unusual symptom-his eyes are bright yellow. The doctor knows that there are only two diseasesthat can produce this particular symptom-hepatitis and toxic uremia. People never contract both illnesses at the same time. With what you know now, what is the probability that the patient has toxic uremia? [l piece of key information.] A discussion with a colleague reminds the doctor that toxic uremia is a less common disease than hepatitis. He checks a textbook and finds that [it is highly probablethat people] [90% of people] who present to their doctors with the symptom of yellow eyes have hepatitis, therefore, [it is very unlikely that they] [only 20% ofpeople with this symptom] have toxic uremia. With what you now know, what is the probability that the patient has toxic uremia? [2 piecesof key information.] The doctor orders the lab to do a Speck test on the patient’s blood. In two hours the results are back-the Speck test indicates that the patient has toxic uremia. With what you know now, what is the probability that the patient has toxic uremia? [3 pieces of key information]. The doctor consults his diagnostic manual 220 ROBERT M. HAMM and discovers that the Speck test is the best way to find out whether a patient with yellow eyes has hepatitis or toxic uremia. However, the Speck test is not foolproof. When the patient has toxic uremia, [it is rather likely that the Speck test will indicate that the patient has this illness. It is somewhat unlikely that the Speck test will indicate that the patient has hepatitis] [the Speck test correctly indicates this 70% of the time, but 30% of the time it falsely indicates that the patient has hepatitis]. Similarly, when the patient actually has hepatitis, [it is somewhat unlikely that the Speck test will indicate that the patient has toxic uremia] [the Speck test will indicate that the disease is toxic uremia approximately 30% of the time]. With what you know now, what is the probability that the patient has toxic uremia? APPENDIX 2 Instructions for Questionnaire Eliciting Lower and Upper Bounds on the Numerical Meanings of Each Phrase Two versions were prepared. One asked for upper bounds first, and the other asked for lower bounds first. People often use words or phrases such as “impossible” or “very likely” to express a degree of uncertainty or certainty. We are interested in the range of uncertainties for which you think it appropriate to use each of a number of words or phrases. Think of a cafeteria tray that has 100 ping pong balls on it. Some of them are white and rest are yellow. You can see every one of them clearly. You must convey to a friend how many of the balls are white. You want to tell him how likely it is that a white ball would be picked if they were thoroughly mixed up and someone were to draw one without looking. However, you are not allowed to tell the person the actual proportion of white ping pong balls. Rather, you are forced to use a nonnumerical descriptive phrase. We want to know the range of proportions of white ping pong balls, in the tray described above, for which you would consider each term to be appropriate. We will ask you to tell us this for each of 20 terms. The first term is “about even.” What is the highest (lowest) proportion of white balls (out of 100) for which you think~ld be appropriate to use the term “about even,” in trying to tell your friend the proportion of white and yellow ping pong balls? Write that number here: Now what is the lowest (highest) proportion of white balls for which you think it would be appropriate to use the term “about even”? SELECTION OF VERBAL PROBABILITIES 221 Look at your answers. You should have named two numbers somewhere between 0 and 100 (inclusive). The second number should have been lower (higher) than (or equal to) the first. Any number in between the two numbers would be a reasonable interpretation for your friend to make when you tell him that the chance of drawing a white ping pong ball is “about even.” Any number higher (lower) than your fust answer would not be a reasonable interpretation of “about even”; nor would any number lower (higher) than your second answer be reasonable. If these statements are not all true, you may wish to go back and change one or both of your answers. Below is a list of words or phrases expressing degree of uncertainty. Assume that you are using each phrase to describe the chance of drawing a white ping pong ball from the tray of 100balls. For each phrase, please express the upper and lower (lower and upper) numerical limits that you would expect your friend to use in interpreting it. Please focus on each word or phrase by itself, rather than trying to compare it with your answers for other words or phrases. Upper (Lower) limit Uncertain Rather likely Somewhat unlikely Rarely Slightly less than half the time Good chance Fairly unlikely Absolutely impossible Toss-up Quite likely Not very probable Absolutely certain Slightly more than half the time Very probable Seldom Almost certain Better than even Highly probable Very unlikely Lower (Upper) limit 222 ROBERT M. HAMM REFERENCES Beyth-Marom, R. (1982). How probable is probable? A numerical translation of verbal probability expressions. Journal of Forecasting, 1, 257-269. Brun, W., & Teigen, K. H. (1988). Verbal probabilities: Ambiguous, context-dependent, or both? Organizational Behavior and Human Decision Processes, 41, 39OXl4. Budescu, D. V., & Wallsten, T. S. (1985). Consistency in interpretation of probabilistic phrases. Organizational Behavior and Human Decision Processes, 36, 391-405. Budescu, D. V., Weinberg, S., & Wallsten, T. S. (1988). Decisions based on numerically and verbally expressed uncertainties. Journal of Experimental Psychology: Human Perception and Performance, 14, 281-294. Fox, J. (1986). Knowledge, decision making, and uncertainty. In W. A. Gale (Ed.), Artificial intelligence and statistics (pp. 57-76). New York: Addison-Wesley. Hamm, R. M. (1987). Diagnostic inference: People’s use of information in incomplete Bayesian word problems (Publication 87-11). Institute of Cognitive Science, University of Colorado, Boulder. Hamm, R. M. (1988a, November). Accuracy of probabilistic inference using verbal vs. numerical probabilities. Psychonomics Society Meetings, Chicago. Hamm, R. M. (1988b). Evaluation of a method of verbally expressing degree of belief by selecting phrases from a list (Publication 88-6). Institute of Cognitive Science, University of Colorado, Boulder. Kong, A., Bamett, G. O., Mosteller, F., & Youtz, C. (1986). How medical professionals evaluate expressions of probability. New Engand Journal of Medicine, 315, 74&745. Kuipers, B., Moskowitz, A. J., & Kassirer, J. P. (1988). Critical decisions under uncertainty: Representation and structure. Cognitive Science, 12, 177-210. Lichtenstein, S., & Newman, J. R. (1%7). Empirical scaling of common verbal phrases associated with numerical probabilities. Psychonomic Science, 9, 563-564. Mapes, R. E. A. (1979). Verbal and numerical estimates of probability in therapeutic contexts. Social Science and Medicine A, 13, 277-282. Marshall, E. (1986). Feynman issues his own shuttle report, attacking NASA’s risk estimates. Science, 232, 15%. Marshall, E. (1988). Academy panel faults NASA’s safety analysis. Science, 239, 1233. Meyer, K. B., L Pauker, S. G. (1987). Screening for HIV: Can we afford the false positive rate? New England Journal of Medicine, 317, 238-241. Nakao, M. A., & Axelrod, S. (1983). Numbers are better than words: Verbal specifications of frequency have no place in medicine. American Journal of Medicine, 74, 1061-1065. Pepper, S. (1981). Problems in the quantification of frequency expressions. In D. Fiske (Ed.), New directions for methodology of social and behavioral science: Problems with language imprecision (No. 9, pp. 25-41). San Francisco: Jossey-Bass. Rapoport, A., Wallsten, T. S., & Cox, J. A. (1987). Direct and indirect scaling of membership functions of probability phrases. Mathematical Modelling, 9, 397-417. Reagan, R. T., Mosteller, F., & Youtz, C. (1989). Quantitative meanings of verbal probability expressions. Journal of Applied Psychology, 74, 433-442. Reyna, V. F. (1981). The language of possibility and probability: Effects of negation on meaning. Memory and Cognition, 9, 642-650. Shanteau, J. (1974). Component processes in risky decision making. Journal of Experimental Psychology, 103, 680-691. Simpson, R. H. (1944). The specific meanings of certain terms indicating differing degrees of frequency. Quarterly Journal of Speech, 30, 328-330. Teigen, K. H. (1988a). The language of uncertainty. Acta Psychologica, 68, 27-38. Teigen, K. H. (1988b). When are low-probability events judged to be “probable”? Effects SELECTION OF VERBAL PROBABILITIES 223 of outcome set characteristics on verbal probability estimates. Actu Psychologica, 68, 157-174. Wallsten, T. S., Budescu, D. V., & Erev, I. (1988). Understanding and using linguistic uncertainties. Acta Psychologica, 68, 39-52. Wallsten, T. S., Budescu, D. V., Rapoport, A., Zwick, R., & Forsyth, B. (1986a). Measuring the vague meanings of probability terms. Journal of Experimental Psychology: General, 115, 348-365. Wallsten, T. S., Fillenbaum, S., Bt Cox, J. A. (1986b). Base rate effects on the interpretations of probability and frequency expressions. Journal of Memory and Language, 25, 571487. Woodward, B. (1987). Veil: The secret wars of the CIA, 1981-1987. New York: Pocket Books. Zimmer, A. C. (1983). Verbal vs. numerical processing of subjective probabilities. In R. W. Scholz (Ed.), Decision making under uncertainty (pp. 159-182). North-Holland: Elsevier Science Publishers. Zwick, R. (1987). Combining stochastic uncertainty and linguistic inexactness: Theory and experimental evaluation. Ph.D. dissertation, University of North Carolina, Chapel Hill. RECEIVED: May 12, 1988
© Copyright 2026 Paperzz