Selection of Verbal Probabilities - Homepages | The University of

ORGANIZATIONAL
BEHAVIOR
AND
HUMAN
DECISION
PROCESSES
48,
193-223 (1991)
Selection of Verbal Probabilities: A Solution for Some
Problems of Verbal Probability Expression
ROBERT M. HAMM
Institute of Cognitive Science, University of Colorado, Boulder
A method for verbal expression of degree of uncertainty is described. It
requires the subject to select a phrase from a list that spans the full range of
probabilities. In a second, optional, step, the subject indicates the numerical
meaning of each phrase. The method avoids two problems of verbal probabilities-the indefinitely large lexicon and the individual differences in the interpretation of words. To test whether context and ordinal position might bias
subjects’ selection or interpretation of the verbal expressions in the list, the list
order was varied. When the verbal expressions were arranged in random order, ordinal position had a signiticant effect on the selection of expressions.
However, these effects did not occur when the phrases were listed in ascending or descending order. Considerations of accuracy and interpersonal agreement also support the use of ordered phrase lists. o 1991Academic PICSS,hc.
INTRODUCTION
The issue whether people think and communicate better using numerical or verbal expressions of probability has received recent attention
(Beyth-Marom, 1982; Kong, Bamett, Mosteller, & Youtz, 1986; Wallsten, Budescu, Rapoport, Zwick, & Forsyth, 1986a; Zimmer, 1983). In a
number of contexts, communication using verbal expressions of probability may be preferable, even though it is less precise than numerical
communication (Zwick, 1987). This paper describes a method designed to
avoid several problems that may limit the usefulness of verbal expressions of probability, and reports a study that evaluates possible confounding factors.
Verbal expressions of probability are used in analyses of decisions in a
variety of consequential contexts, such as medicine (Nakao & Axelrod,
1983) and political analysis (Beyth-Marom, 1982). In recent U.S. foreign
policy deliberations, President Reagan’s National Security Planning
Group is reported to have prepared for him a list of five military options
This research was supported by Contract MDA-903-86-K-0265 from the OflIce of Basic
Research, Army Research Institute for the Behavioral and Social Sciences, and revised
while the author had a National Research Council Associateship at the ARI Fort Leavenworth Field Unit. Tom Wallsten and an anonymous reviewer provided useful comments on
an earlier draft. Reprints may be requested from the author at Institute of Cognitive Science,
Box 345, University of Colorado, Boulder, CO 80309-0345.
193
07495978f91 $3.00
Copyri&t
Q 1991 by Academic Press. Inc.
Au rights of reproduction
in any form reserved.
194
ROBERT
M.
HAMM
against Libya in December 1981. Their chances of success were expressed using the terms “low,” “moderate,” and “moderate to high”
(Woodward, 1987, p. 198).
A policy of using verbal rather than numerical probabilities can be
criticized for several reasons (Beyth-Marom, 1982; Nakao & Axehod,
1983). Numerical probabilities can be more precise than verbal when
pertinent relative frequency data are available. They permit communication to be less ambiguous. They can contribute to the evaluation of decision options, as a weight for the value of possible outcomes. Numerical
measurement of uncertain evidence permits Bayesian inference. Yet people very frequently choose to use verbal rather than numerical expressions of probability.
Verbal probabilities also have advantages. First, people prefer to use
them. In an analysis of medical experts’ thinking about a decision,
Kuipers, Moskowitz, and Kassirer (1988) observed that even when numerical terms were used they were qualitative descriptions that “function
as easily communicable names for important landmarks rather than as
real numbers” (p. 192; see also Teigen, 1988a, and Fox, 1986). When
people are required to communicate a probability in which they are not
confident, they may prefer to use words that allow them to express the
vagueness of theirjudgment (Zwick, 1987). Linguistic terms may facilitate
thinking about uncertainty in complex problems (Zimmer, 1983).
Finally, making decisions based on verbal rather than numerical probabilities may entail only a slight cost (Hamm, 1988a; Wallsten, Budescu,
& Erev, 1988). For example, Budescu, Weinberg, and Wallsten (1988)
had subjects bid for gambles whose probabilities were described numerically or verbally. The quality of the bids was evaluated by comparing
them to the expected values of the gambles, using the subjects’ own
numerical interpretations of the verbal probabilities (elicited separately).
Over all trials with gambles that had positive expected value, subjects
gained only 1.23% less money in the verbal condition than in the numerical condition. With losing gambles, subjects lost only 4.65% more in the
verbal condition than in the numerical condition. Here the ease and comfort of verbal expressions may be worth the small cost.
Though verbal expression of probability may be justified in some situations, it presents problems that must be solved if it is to be used for
communicating about high stake decisions. First, the meanings of phrases
differ between people, although they seem stable for individuals over time
(Budescu & Wallsten, 1985). Kong et al. (1986) found no systematic differences between occupational groups, although Nakao and Axelrod
(1983) found some differences between physicians and non-physicians.
Second, there is an indefinitely large number of words and phrases that
could be used to express degree of belief. And indeed collectively people
SELECTION
OF VERBAL
PROBABILITIES
195
use many of them. For example, Budescu et al. (1988, Study 1) required
people to express the chance a dart would hit a particular wedge in a
circle, using either their own words or the integers between 0 and 100. On
6 repetitions of 11 wedges, the average subject used about 13 phrases and
18integers. However, the 20 subjects used a total of 111different phrases,
compared with 73 integers. This makes it difficult to develop a lexicon of
the numerical meaningsof all verbal expressions of probability. Any new
phrase would pose a problem of interpretation, in contrast with a new
number that can be easily understood because it can be placed unambiguously in the [O,l] number line.
Measurement theory techniques mediated by sophisticated computer
programs have been used to investigate individuals’ interpretations of
verbal expressions of probability (e.g., Wallsten ef al., 1986a;Rapoport,
Wallsten, & Cox, 1987).These techniques require subjects to make many
choices or ratings. But for verbal probabilities to be useful in contexts
such as survey research or multiuser decision support systems, simpler
methods are required. This paper presents and evaluates such a method.
A Method for Verbal Expression of Degree of Uncertainty
The method proposed here requires subjects to express their judgment
of probability by selecting an expression from a list. This allows them to
think about the problem using the familiar, verbal mode. The method
addresses the problem of the indefinitely large lexicon by confining the
subject’s responses to a limited set of verbal expressions, selected to
cover the full range of degreesof belief. It solves the problem of individual differences by having the subject assign a numerical value to each
phrase (after completing the probability judgment task), thus providing an
interpretation for the chosen expression.
A list was constructed (Table l), consisting of 19 verbal expressions
covering the range from 0 to lOO%,with symmetry about an easily identifiable midpoint (“toss-up”). There was a term for each 5% mark, except
that there was only one term in between 25 and 40%, and only one in
between 60 and 75%. Other researchers may wish to use shorter (or
longer) lists, lists without a sharply defined midpoint, lists that are not
balanced around 50% (see Kong et al., 1986),or different phrases. These
distinctions, while important, are not pertinent to the present investigation of factors affecting the selection of expressions from a list. The
results of this study are applicable to lists composed of any set of verbal
probability expressions that is distributed nearly evenly over a range of
probabilities.
The present list was produced by reviewing previous studies that elicited numerical values for verbal expressions of probability (Budescu &
Wallsten, 1985;Lichtenstein & Newman, 1967;Simpson, 1944;Shanteau,
196
ROBERT M. HAMM
TABLE 1
MEANS AND STANDARD DEVIATIONS OF NUMERICAL INTERPRETATIONS OF VERBAL
EXPRESSIONSOF PROBABILITY MEASURED IN PREVIOUS STUDIES
Standard
deviation
Verbal
expression
Mean
(Median)
Absolutely impossible
Rarely
.05
.08
39 (. 10)
.16
.lO
.16 (.15)
.20 (.20)
.25 (.25)
.31 (.33)
.27
.41
A0 (50)
.07"
.06
.07
49
.12”
.08
.12
.ll
.12
.45 (.45)
.50 (.50)
.47
.54
.55 (.55)
.58 (HI)
.66
.69 (.70)
.74 (.75)
.79 (30)
.87 (X9)
.89 (30)
234
-
Very unlikely
Seldom
Not very probable
Fairly unlikely
Somewhat unlikely
Uncertain
Slightly less than half
the time
Toss-up
Slightly more than half
the time
Better than even
Rather likely
Good chance
Quite likely
Very probable
HighIy probable
Almost certain
Absolutely certain
Source
Value adopted
for this study
Author
s 1944
B&W 1985
L&N 1%7
B&W 1985
s 1944
L&N 1967
L&N 1%7
L&N 1%7
L&N 1%7
Sh 1974
B&W 1985
L&N 1967
.oo
.04
.oo
.ll
L&N 1%7
L&N 1%7
B&W 1985
Sh 1974
.45
.50
36
.06
L&N 1%7
L&N 1%7
Sh 1974
L&N 1%7
L&N 1%7
L&N 1967
L&N 1967
L&N 1967
Sh 1974
Author
Author
.55
ho
.13
.14
.09
.12
.lO
.07
.04
-
.05
.lO
.15
.20
.25
.33
.40
.70
.75
.@I
.85
.90
.95
1.00
Note. Sources are B&W = Budescu & Wahsten, 1985;L&N = Lichtenstein & Newman,
1%7; Sh = Shanteau, 1974; S = Simpson, 1944; Author = author’s judgment.
@Interquartile range.
1974; Wallsten et al., 1986), in order to identify a set of words and phrases
that (a) have interpretations that cover the entire probability range, in
about evenly spaced steps, and (b) have relatively narrow interpretations,
as indicated by small standard deviations, compared to other candidates
with nearby means (see Table 1). To cover the ends, “absolutely
impossible” and “absolutely certain” were chosen. “Almost certain”
was used to cover the 95% range; however, it was subsequently learned
that Kong et al. (1986) had found that people assign this phrase a mean
value of .78 (median 90).
SELECTION
OF VERBAL
PROBABILITIES
197
The list of verbal expressions of probability may optionally be presented in sequential order [ascending (Table 1) or descending] or in random order. After subjects select verbal expressions from a list to express
probabilities in a problem solving, choice, or communication task, they
are asked to refer to the lists and assign numbers to each expression. To
save time, they could be asked to interpret only those expressions that
they had selected.
Potential Problems with the Proposed Method
The validity of using verbal expressions of probability selected from a
list may be compromised if the meaning of a word or phrase depends on
contextual factors. There are several possible types of context effect.
First, a phrase’s immediate neighbors in a list may affect its meaning.
Thus, “rarely” may mean something different if positioned between
“very unlikely” and “absolutely impossible” than if its neighbors are
“good chance” and “slightly less than half the time.” One method of
controlling for “immediate neighbor” context effects is to randomize the
list of available verbal expressionsfor each trial or each subject (Budescu
et al., 1988). But while varying randomized lists is feasible in the computerized laboratory, it would be inconvenient for mailed survey questionnaires, for interviews in which the expression lists are presented to
respondents on a card, or for probability elicitations in a decision support
system. To determine the extent of immediate neighbor context effects on
the selection of verbal probability expressions from lists, Study 1 compares the selection of expressions and the assignmentof numerical interpretations to these expressions when the lists are presented in sequential
versus random order.
Second, when subjects read a list of candidate phrases they must do so
sequentially. Phrases early in the list may be more likely to be chosen if
subjects stop reading after finding one that is good enough. Or phrases
late in the list may be favored if subjects read through the whole list and
choose from among phrases that are still in short term memory at the end.
To measure such position effects, Study 1 presented the expressions in
reversed orders.
Third, while all verbal expressionsof probability have inherently vague
meanings, some phrases are more vague than others (Wallsten et al.,
1986a).There are several ways these differences in vaguenessmight affect the selection of a term from a list. Study 2 measuresvaguenessas the
breadth of the meaning that subjects assign to each of the 19 expressions
of probability. With this measure, the data from Study 1 can be analyzed
to determine whether phrases’ differential vagueness and the order in
which they are presented interact to influence people’s selection or interpretation of the phrases.
198
ROBERT
M.
HAMM
The second step of the method, assigning numerical interpretations to
the verbal expressions of probability, requires an additional effort from
the subject. When this procedure is administered by an individual interviewer or a computer, the effort could be reduced by requiring numerical
interpretations for only those phrases the subject used in the first phase;
but this may be too complicated for use with group administrations or
mailed questionnaires. An alternative is to use common interpretations of
the phrases rather than the subjects’ own interpretations. Whether this is
acceptable depends on how much the subjects’ interpretations differ from
the common interpretations.
The comparison of ordered and random lists raises the issue whether
lists should be structured to allow users to find target phrases easily. An
ordered list allows a subject to find a desired phrase more rapidly than a
random list. An ordered list also might constrain subjects’ interpretations
of the phrases. Whether such a constraint is an advantageor disadvantage
depends on the strengths of the order effect and the context effect.
STUDY 1
The purpose of Study 1 was to investigate whether the following factors
infhtence subjects’ selection of verbal expressions of probability from a
list: (a) the context, i.e., a phrase’s neighbors in the list; (b) the phrase’s
ordinal position in the list; (c) whether the phrase appears in a sequentially ordered or random list; and (d) the vaguenessof the phrase. The
influence of these factors on the following factors will be determined: (1)
subjects’ tendency to select a phrase, (2) subjects’ assignmentof numbers
to phrases, including the extent to which subjects’ interpretations of the
phrases agree with common interpretations, and (3) the accuracy of subjects’ responses to the problems. The results concerning vagueness of
word meaningwill be presented following Study 2, in which a different set
of subjects judged the breadths of meaning for the verbal expressions.
Method
One hundred and forty-seven subjects from the Introductory Psychology subject pool participated in group sessions. Each did four probabi-
listic inference word problems (example in Appendix 1; details in Hamm,
1988a).Half responded with verbal expressions of probability; the others
used numbers. Subjects were instructed “to select verbal phrases that
represent your estimates of the probability or likelihood that statements
are true or that events have happened.” Subjects were told whether the
list was random or ordered. Those given ordered lists were told the expressions were ordered according to their meanings as determined in
surveys of a large number of people. Subjects were asked to consider all
the possible phrases and select the best one for each answer. Following
SELECTION
OF VERBAL
PROBABILITIES
199
the problems, subjects assigned numbers representing “the numerical
probability that most closely represents what each of these verbal phrases
means” to each of the phrases. Total response time on the questionnaire
was recorded.
One subject was dropped from the analysis for using the wrong response mode. A number of individual responses were dropped because
subjects did not follow directions (e.g., used a verbal expression that was
not on the list, assigned a range of values to a phrase, or assigned the
same value to every phrase).
The phrases were presented in one of four possible orders. Two of the
lists were ordered sequentially, either ascending (Table 1) or descending.
The other two lists were arranged in a random order, which was produced
by folding the ordered list (splitting the list at the middle phrase, reversing
one half, and interleaving the two halves) repeatedly.’ In answering the
problems, 25 subjects used the ascending phrase list, 14 the descending,
16 the random list in Footnote 1, and 16 the reversed random list. The
remaining subjects answered using numerical probabilities. The numbers
of subjects assigning values to phrases in the four list orders were 48, 27,
31, and 32, respectively. When those subjects who used the verbal response mode subsequently assigned values to phrases, the list was presented in the same order.
Results
The results of Study 1 address three topics: the effect of phrase list
order on the selection of phrases as answers to the problems, its effect on
the numerical values subjects assign to the phrases, and finally its effect
on the accuracy of the subjects’ answers to the problems.
Each subject selected a phrase from a list to express his or her estimate
of the probability of a hypothesis 16 times: 4 times (after zero, one, two,
and three pieces of key information had been provided; see Appendix 1)
in each of four problems (concerning Cabs, Doctors, Insurance, and
Twins; see Hamm, 1988a).
Eflect of phrase list order on phrase selection. The sequentially ordered
and the random phrase lists were each presented in two orders (between
subjects) that are reverses of one another. Comparison of these reversed
lists can reveal the overall tendency to pick answers that are early or late,
controlling for the identities of the phrases. When subjects selected from
’ The random order used was (1) uncertain, (2) rather likely, (3) somewhat unlikely, (4)
rarely, (5) slightly less than half the time, (6) good chance, (7) fairly unlikely, (8) absolutely
impossible, (9) toss-up, (10) quite likely, (11) not very probable, (12) absolutely certain, (13)
slightly more than half the time, (14) very probable, (15) seldom, (16) almost certain, (17)
better than even, (18) highly probable, (19) very unlikely.
200
ROBERT
M.
HAMM
the ascending or descending ordered lists after all three key pieces of
information had been presented, the average ordinal position of the selected phrases was 9.66. This is only l/3 of a position in front of the middle
of the list. On the random lists, the mean ordinal position of the chosen
phrase was 10.84, 5/6 of a position after the midpoint.
Though the overall effect of ordinal position is small, its magnitude is
larger in the random lists than the ordered lists. For the ordered lists,
across all four problems, 8 expressions were selected more when they
appeared in the first half of the list, 8 were selected more when they
appeared in the second half, 2 were selected equally (or not at all), and 1
expression appeared in the middle position. With random lists, however,
subjects chose more often from the second half of the list: 11 phrases were
chosen more often when in the second half of the list, only 3 were chosen
more often when in the first half, and 4 were chosen equally. Analysis of
the probability of choosing a phrase when it is in the first compared to the
second half of the list (see Hamm, 1988b) showed that a phrase from the
first half of an ordered list would be used 5.33% of the time (averaged
across problems), compared with 5.20% for a phrase from the second half
of the list. In the random lists a phrase from the first half of the list would
be used 4.84% of the time, while a phrase from the second half would be
used 5.69% of the time. While these small effects are not significant in the
individual problems, when averaged across problems, subjects faced with
randomly ordered lists are significantly more likely to select phrases in
the second half of the list (t = -2.86, p < .02).
Dij%ulty fmding the desired phrase in ordered and random lists. A
possible advantage of ordered over random lists is that subjects may be
able to find the verbal probability expression they want more easily. First,
subjects who know the phrase they want may be able to verify its presence more quickly in a list that is structured in an ascending or descending
order. Second, subjects who know the probability they want to express
(either as a number, a range of numbers, a phrase not in the list, or an idea
that is not modally specific) may be able to find an expression for this
probability in the list more quickly, by evaluating list expressions and
comparing them to the target probability, when those expressions are
ordered. In addition, with a random list the meanings of the phrases
subjects select as answers using this process may be more variable. Third,
people may not know the degree of probability they want to express until
they have considered candidate phrases. It may be easier for them to
check whether phrases apply to the situation when using an ordered list,
in which phrases’ meanings can be quickly understood because they are
implied by the meanings of their neighbors in the list. If so, answers
should be both quicker and less variable.
Subjects who used the random list took only 10 s longer on average (in
SELECTION
OF VERBAL
201
PROBABILITIES
an 18-minprocedure) than subjects with the ordered list (NS). To measure
variability in the meanings of the phrases subjects selected, the a priori
values are used (Column 4 of Table 1.) (Using values the subjects assigned
would confound list influence on variation in phrase selection with list
influence on variation in phrase interpretation). Over all four answers on
each of the four problems, the phrases subjects chose from the random
lists had a priori numerical interpretations with higher average standard
deviations (mean SD = .195)than those they chose from the ordered lists
(mean SD = .181). However, while this relation was found for 11 of the
16 subproblems, the difference is not statistically significant (x2 = 1.25).
In sum, answers with the random lists were slower and more variable than
answers with the ordered lists, but not significantly so.
E#ect of list order on values assigned to the verbal expressions of
probability. The second procedure used in the proposed method is the
assignment of numerical values to the verbal expressions of probability.
Its purpose is to increase the accuracy of the proposed method by allowing adjustments for individual differences in the interpretation of phrases.
Table 2 shows the mean value subjects assignedto the phrases when they
TABLE 2
MEAN NUMERICAL VALUES ASSIGNED TO EACH PHRASE, EACH LIST ORDER
Ordered lists
N:
Absolutely impossible
Rarely
Very unlikely
Seldom
Not very probable
Fairly unlikely
Somewhat unlikely
Uncertain
Slightly less than half
the time
Toss-up
Slightly more than half
the time
Better than even
Rather likely
Good chance
Quite likely
Very probable
Highly probable
Almost certain
Absolutely certain
Random lists
76
Mean
SD
64
Mean
.011
.094
.132
.176
.230
.282
.333
A06
366
.069
.047
.053
.071
.056
Sk54
.064
.144
.186
.216
.198
.261
.302
.402
A47
.495
.039
.554
.603
.676
.727
.780
.836
.882
.931
l.ooo
.053
.066
.075
.076
.073
.073
.070
366
304
.041
Total
140
Mean
SD
.180
.214
.271
.318
A04
.074
.078
.132
.089
.090
.083
.092
.122
.432
.501
.051
al2
.439
.498
&I6
.021
.566
.618
.679
.719
.758
.834
.878
.8%
997
.068
.082
.166
.135
.560
.611
.677
.I23
.769
.835
.880
,913
.998
.061
.019
SD
.083
.088
.218
.124
.llO
.109
.120
.lOl
.095
.073
.108
.012
.074
.121
.106
.087
.084
.071
.087
.008
202
ROBERT
M.
HAMM
were presented in ordered and random lists. (These values may be compared with those in Table 1.) For the values elicited in this procedure to
be unbiased, they should not be affected by the order in which phrases are
presented. Reversal of these lists had little effect on the mean values:
none of the verbal probabilities were assigned numerical values that were
significantly different in the ordered and random lists.
Subjects were less variable in assigning numerical meanings to phrases
in the ordered lists than in the random lists. For 17 of the 19 phrases (all
save “toss-up” and “high probability”), there was higher variability in
the numerical values assigned to the phrases when presented in the randomly ordered lists (x2 = 7.5, df = 1, p < .005, one-tailed). The mean
standard deviations of the values assigned to the phrases in the four lists
were: Ascending order (.071); Descending order (.047); Random order A
(.084); and Random order B (. 118). The random list standard deviation (M
= .lOl) was significantly higher than the ordered list standard deviation
(A4 = .054; r = 3.92, df = 18, p < .Ol).
Effect of list structure on amount of duplication in assigned values. If
in the value assignment procedure of the proposed method subjects assign
the same numerical value to more than one phrase, this would degrade the
precision of the method. People can be expected to do this more often
with random than with ordered lists. The extent of such duplication can
be measured by counting the number of pairs of phrases to which a
subject assigns the same value. For example, if “almost certain” and
“highly probable” are both assigned the value 90, that is one duplicate
pair. If in addition “quite likely” were to be called 90, this would produce three pairs. If someone assigned the same value to all 19 phrases,
there would be (19*18)/2 = 171 duplicate pairs. The mean numbers of
pairs of phrases that were assigned duplicate values were Ascending list
(1.73); Descending list (1.03); Random list A (4.36); Random list B (4.84).
The number of duplications is very small in comparison with the maximum possible count of 171. Significantly more duplicate values were
assigned to phrases in the random lists than in the ordered lists, as predicted (F(3,140) = 6.13, p = .0006).
Effect of list structure on similarity of subjects’ assigned values to
values found in previous studies. Different list orders can be evaluated for
use with this method by comparing the interpretations subjects give to the
phrases in these lists with the interpretations given them in earlier studies
(summarized in the right hand column of Table 1). The closer the interpretations with this method are to the common or conventional meanings,
the less the total possible misunderstanding. Subtracting the prior values
from the subjects’ values provides a measure of the difference. Table 3
shows the mean difference scores for the ordered and random phrase
lists, as well as their variability (standard deviations). In both lists the
SELECTION
OF VERBAL
TABLE
203
PROBABILITIES
3
MEANS AND STANDARDDEVIATIONSOF DIFFERENCESBETWEENSUBJECTS
INIWW~ETATIONSAND A PIUORI INTERPRETATIONS OF VERBAL EXPRESSIONS IN
ORDERED AND RANDOM LISTS
Absolutely impossible
Rarely
Very unlikely
Seldom
Not very probable
Fairly unlikely
Somewhat unlikely
Uncertain
Slightly less than
half the time
Toss-up
Slightly more than
half the time
Better than even
Rather likely
Good chance
Quite likely
Very probable
Highly probable
Almost certain
Absolutely certain
SD
List with
greater
deviation score
.114
.089
.221
.124
r
r**
r**
r**
Ordered lists
Random lists
Mean
Mean
SD
.018
.094
.013
.103
a47
.031
,026
,031
,032
.cuM
a05
.081
a49
.054
.080
.063
.067
.067
.085
.065
- .002
.llO
.OlO .108
- .028 .124
.002 .179
0**
- .003
- .005
.042
.039
- .018
.mo
.054
.003
r*
.002
.002
.026
.025
.022
.017
.021
.020
.053
a66
.076
.077
.076
.077
.074
.068
306
.016
.018
.020
.032
.043
.016
.022
.053
.003
.072
.082
r
r
.180
.148
.108
.lOl
0
.073
.117
r
r+*
r
-
- .ool
-
.Oll
Distance of
phrase from
nearest anchor
0
r*
0
0
r
r
0
Note. N = 76 for Ordered lists and N = 64 for Random lists.
*p < .lO; **p < .05.
differences tended to be positive in the fast half of the list and negative in
the second half, that is, subjects’ interpretations of these verbal expressions are closer to the middle of the probability scale (5) in the present
conditions than in previous studies (primarily Lichtenstein & Newman,
1967).For 13 of the 19 phrases, the absolute value of the mean deviation
was larger when the lists were presented randomly. Four of these comparisons (for “rarely,” “very unlikely,” “seldom,” and “almost
certain”) were significant (in one-way ANOVAs) at p < .05, and two
more (“somewhat unlikely” and “slightly less than half the time”) at p <
.lO. For only one of the phrases (“not very probable”) were the values
assigned using the ordered list significantly more different from the a
priori values than the values assigned using the random list.
In sum, subjects assign to randomly arranged verbal expressions of
probability numerical values that are more variable and farther from the
204
ROBERT
M.
HAMM
conventional meanings of these terms than they assign to expressions in
ordered lists. If this reflects their understanding of the meanings of the
expressions when they are using them to answer the questions, then it is
preferable to use the ordered lists.
Effect of distance of a phrase from a reference point on variability of
value assignment. Three of the verbal expressions of probability used
here have quite specific meanings: “absolutely certain” (1 .O), “toss-up”
(JO), and “absolutely impossible” (0). It is possible that subjects use
these phrases as reference points when assigning values to other phrases.
If so, less variability would be expected in the values assigned to phrases
that are near to these landmarks, than in the values assigned to more
distant phrases. The distance of a phrase from a landmark will be more
salient in an ordered list than in a random list. Therefore, the effect of
distance from a reference phrase on the variability of the values assigned
to other phrases would be smaller in random lists. However, people already know the meanings of these phrases, and so even in the random
lists a phrase whose meaning is near a reference point may have a narrow
range of interpretations.
In the list of 19 phrases, the distance of a phrase from the nearest
reference point is simply measured by counting the number of steps in the
list to the nearest reference point (see Column 6 of Table 3). The hypothesis predicts that this measure will be positively correlated (over the 16
nonlandmark phrases) with the standard deviation of the values the subjects assigned to the phrase, and that this correlation will be larger for the
ordered lists than for the random lists. The correlations were Ascending
list (r = .41, p < .lO); Descending list (r = .30, NS); Ordered lists overall
(r = .44, p < .05), Random list A (r = .32, NS); Random list B (r = .14,
NS); and Random lists overall (r = .25, NS). As predicted, all correlations were positive and the correlations were higher for ordered than
random lists.
Effect of phrase list order on accuracy of problem answers. A third
criterion for evaluating the proposed method of expressing degree of belief by selecting verbal expressions of probability is the accuracy of its
use. In the present study, this accuracy is a joint product of (a) the phrase
the subject selects, (b) the meaning assigned to the phrase, and (c) the
right answer to the problem. Is the accuracy of subjects’ responses affected by the order in which the phrases are presented, or by the use of
subjects’ own values rather than conventional values for the phrases?
Effects of list structure on accuracy of problem answers. Accuracy is
measured by translating all pertinent phrases (those the experimenter
included in the word problem, and those the subject selected as responses) into numbers, and comparing the response number with the
correct answer (produced by applying Bayes’ Theorem to the numbers in
SELECTION
OF VERBAL
205
PROBABILITIES
the word problem; see Hamm, 1988a).Translation from phrases to numbers can be done in two ways: using the conventional values available a
priori (Table 1) or the values each individual subject assigned to the
phrases. Accuracy using both translations will be studied here, to separate those effects of list order on accuracy which are due to selection from
those due to value assignment. If phrase list order affects accuracy using
the a priori translations, this can only be due to its effects on selection of
a phrase as a response. If list order affects accuracy using the subjects’
individual translations, but not using the a priori translations, this must be
due to its effects on subjects’ assignment of values to phrases.
Results are presented separately for subjects for whom the word problems were presented with verbal and numerical expressions of probability
(Table 4). If the probabilities were presented verbally, the numerical values of the right answers depend on an assignmentof numerical values to
TABLE 4
COMPARISON OF WORD PROBLEM ACCURACY (ABSOLUTE DEVIATIONS) BETWEEN
SUBJECTSWITH ORDEREDAND RANDOM PHRASELISTS, FORSUBJECTSWITH
NUMEFWAL AND VERNAL PRESENTATIONOF PROBABILITIES
Presentationmode:
Responsemode:
Problem
Cabs
2
3
Doctors
1
3
Insurance
1
3
Twins
1
2
3
Verbal
Verbal
Mean deviation
Amount
of info
1
Numerical
Verbal
Ordered Random
N
N
N
N
N
N
N
N
N
N
No. of problems(out of 10)
where orderedlist is more
accurate
* p < .os; **p < .Ol.
.13
(9)
34
(9)
.35
(18)
.24
Q-9
.l3
(8)
.33
(16)
.06
(18)
.63
(9)
.23
(18)
.45
(9)
.08
(9)
.09
(9)
.25
(18)
5
F
2.4
19.0
.4
5.2
1.3
Mean deviation
Significance Ordered Random F
.14
.oo**
.51
.03*
.27
.74
.25
.I
A4
.O
39
.l3
(7)
.09
1.2
.23
(16)
(8)
(8)
.22
(16)
.O
.2
.91
.64
.13
(10)
.09
(11)
.42
(21)
.17
(21)
.62
(10)
.I5
(21)
.23
(11)
.22
(11)
.12
(10)
.30
(21)
3
Significance
.21
2.1
.17
36
1.2
.30
.9
.35
1.3
.26
.47
2.6
.I3
.16
.O
.87
.47
7.7
.01*
.12
1.0
34
.O
97
.5
.49
(8)
(8)
.38
(16)
.I1
06)
(8)
(16)
(8)
(8)
.11
(8)
.26
(15)
206
ROBERT
M.
HAMM
one or more phrases. Because the subjects responded verbally, the numerical values of their amwer.s also depend on the assignment of numerical values to phrases. Answers to the “no information” subproblems are
not analyzed here, because there was little variation in response. There
was no single correct answer for the Doctor and Insurance problems
when only two pieces of information had been presented (see Hamm,
1987; 1988a), and so these subproblems too are excluded from the analysis.
Table 4 shows the accuracy scores (absolute errors) for subjects using
the verbal response mode, computed using the a priori translations from
phrases to numbers. Only 3 of 20 comparisons between the ordered and
random lists were statistically significant. In all three, ordered lists produced more accurate responses. When the deviation score, rather than
the absolute deviation score, was used, the results were similar, which
shows that the advantage of ordered lists is not simply their smaller variability. In conclusion, there is evidence that the order in which phrases
are presented can influence the accuracy of the subjects’ performance on
probabilistic inference word problems. There is a possible limitation on
the generality of this finding: four of the 10 problems (those with 3 pieces
of information in Table 4) are Bayes’ Theorem problems, on which people’s answers are notoriously inaccurate.
A parallel analysis, using subjects’ individually assigned values to
translate the meaning of the phrases and calculate accuracy, also found
the ordered lists produced more accurate responses. Using the argument
presented above, this implies that the advantage of the ordered lists depends on mechanisms influencing both the phrase selection and phrase
interpretation processes.
Effects of use of subject’s own assigned values versus a priori values on
accuracy of response. A motivation for the proposed method is to enable
subjects to express their degrees of belief in a way that is more natural for
them than using numerical probabilities. It might seem that asking the
subjects afterwards for their numerical interpretations of the phrases defeats this purpose. However, the virtue of the method is its isolation of the
numerical thinking, for it allows subjects to use only the linguistic mode
when thinking about the problems. Translating the phrases into numbers
is done separately and does not interfere with the problem solving, which
is of primary interest. Nonetheless, the assignment of numbers to phrases
places a burden on the subjects, and so it is worth considering whether it
is possible to do without this procedure by using a priori numerical interpretations of the phrases. What effect does the use of the subjects’ own
translations of the phrases have on word problem accuracy?
Table 5 shows the mean accuracy score (absolute error) on each problem using both the conventional numerical values (available a priori) and
207
SELECTION OF VERBAL PROBABILITIES
TABLE 5
COMPARISON OF MEAN ACCIJRAC~E~
(ABSOLUTE DEVIATIONS) FOR EACH SUBPROBLEM,
USING CONVENTIONAL VALUES AND SUEXJECTS’INDIVIDUAL VALUES
of info
Conventional
values
Own
values
t
Significance
N
1
2
3
1
3
1
3
1
2
3
.19
.lO
.38
.17
.61
.20
.39
.17
.ll
.28
.19
.13
.36
.17
.55
.19
40
.17
.12
.24
.09
-2.67
1.65
.18
4.06
.69
- .92
.79
.98
2.93
.925
.011*
.lOl
.857
.ooo**
.492
.361
.434
.333
.004**
53
53
105
105
53
107
53
52
52
104
Amount
Problem
Cabs
Doctors
Insurance
Twins
* p < .05; **p < .Ol.
the subjects’ own numerical values for the phrases. The comparison includes subjects whose presentation mode and response mode were numerical/verbal, verbal/numerical, or verbal/verbal. The answers using the
subjects’ own values were more accurate on 8 of the 10 subproblems
[using both absolute deviation scores (Table 5) and simple deviation
scores], and significantly so for two subproblems. However, for one subproblem there is significantly higher accuracy using the a priori values.
Therefore when accuracy is very important, it is probably preferable to
use subjects’ individual interpretations of the phrases, rather than relying
on a universal a priori interpretation. However, the evidence is mixed,
and the difference in even the significant comparisons is small. In conditions where it is difficult to get subjects to assign values to the phrases, a
priori interpretations could probably be used with only a small loss of
accuracy.
Discussion of Study 1
Study 1 showed that several of the mechanisms anticipated to cause
problems for the proposed method have only minor effects. The first
possibility was that the subject’s use of a given verbal expression could
differ when it appears in different neighborhoods of the list. The second
phase of the method is designed to correct for any such effect by having
the subject assign values to phrases in the same list that they selected
from in the first phase. However, possible variants of the method involve
subjects assigning values to a short list (only those phrases they actually
used) or skipping the second phase altogether. Because these shortcuts
would sacrifice the ability to correct for the effects of immediate neigh-
208
ROBERT
M.
HAMM
bors, they should be used only if these effects are minor. The use of
random versus ordered lists tested these effects.
When subjects selected phrases from random lists (in the first phase),
they took a little longer and selected phrases whose numerical meanings
had slightly higher variance; but these findings were not statistically significant. Similarly, there was no significant difference between the ordered and random lists in the values assigned to any individual phrase (in
the second phase). The largest effect occurred for “very unlikely,” to
which subjects assigned an average value of .132 when it appeared between “rarely” and “seldom” in the ordered list, but .186 when it was at
one end of the random list next to “highly probable.”
These negative results are offset by several differences that suggest a
sequentially ordered list is preferable to a random list. With random lists,
there was significantly more variability between subjects in the values
assigned to phrases, and subjects more frequently assigned the same number to more than one phrase. With ordered lists, the subjects selected
phrases that made their answers to the probabilistic inference word problems more accurate. (This is true when analyzed using both their own and
the conventional assignments of numbers to the phrases.) Finally, subjects gave less conventional values to phrases which were displayed in
random order. Similar effects probably occur when people interpret the
verbal expressions prior to selecting a phrase to answer a word problem.
One possible explanation for the more consistent performance with the
ordered list is that subjects produce values for the terms by referring to
two types of context, and these contexts are consistent in the ordered
lists. The first, local context is the phrase’s immediate neighbors. The
second, global context is established by reference points on the probability scale-“absolutely certain” for 1.O, “toss-up” for .S, and “absolutely
impossible” for 0 (see Reagan, Mosteller, & Youtz, 1989). The evidence
for the use of the global context is that the values given to phrases near
these reference points were less variable than the values given to phrases
farther away. The correlation between phrase value variance and distance
from a reference point was statistically significant in the ordered lists, but
was not significant (though positive) in the random lists. Thus the sequential arrangement of the list seems to facilitate the use of the reference
point strategy in assigning values to phrases. Subjects use both the local
and global contexts in interpreting the phrases. In the random lists the
global context is difficult to discern, and often conflicts with the local
context. In the ordered lists global context is visually obvious and the
local and global contexts are coherent. This may explain why values
closer to those found in previous studies were assigned to phrases in the
ordered lists, and is another reason to prefer ordered lists.
A second potential problem investigated in Study 1 was the effect of a
SELECTION
OF VERBAL
PROBABILITIES
209
phrase appearing early or late in the list. If such position effects were
strong, extra controls would be required, such as reversal of list orders in
a counterbalanced design. Although ordinal position of the verbal expressions had little effect on the assignment of values to the expressions,
reversal of lists did intluence choice of phrases from the random phrase
list. The effect was small, but statistically significant when data were
aggregated across problems. The subjects chose a phrase more often
when it appeared in the second half than the first half of the random list.
There was no effect in the ordered list. However, the effect may be larger
when subjects use a list just once. In this study they used the list 16 times,
and their familiarity with the list may have lessened the position effect.
Systematic list reversal does not seem necessary for ordered lists. If
random lists are used, counterbalanced reversal would be needed to prevent bias due to people’s tendency to select phrases from late in the list.
The third issue addressed by Study 1 was whether it is necessary to
have subjects individually assign numerical interpretations to the verbal
probability expressions. Although this calibration compensates for individual differences in the interpretation of the phrases, it is a burden for
subject and researcher. Would it be sufficient to use the conventional
interpretations, those measured in previous studies (or indeed in this one)
and available a priori, and thus skip the individual calibrations in the
second phase?
Use of the subjects’ own values to interpret the phrases they selected
produced answers to the problems which were usually (but not always)
more accurate than use of the conventional values. When the numbers
subjects assigned to the verbal expressions were compared with the values found in previous studies, greater deviations were observed in the
random lists than the ordered lists. Therefore, a researcher who assumes
subjects’ interpretations of the phrases are close enough to the interpretations from other studies that he or she can dispense with the individual
calibrations would be risking less error by presenting the phrases in ordered rather than random lists.
STUDY 2: MEASURING THE BREADTH OF MEANING OF VERBAL
PROBABILITY EXPRESSIONS
Verbal expressions of probability differ in vagueness, that is, in the
range of numerical probabilities to which they refer. Some phrases, such
as “absolutely certain” and “toss-up,” refer to narrow ranges of probabilities (see Kong ef al., 1986), while other phrases, particularly those
with meanings around 25 or 75%, refer to broader ranges. The tendency
to select a phrase with a broad “membership function” (Wallsten et al.,
1986a) may be strongly affected by its ordinal position in a list. Specifically, if people scan a list looking for the first phrase whose meaning
210
ROBERT
M.
HAMM
covers the probability they have in mind, they will be more likely to select
a phrase with a broad range of meaning, particularly if the list is random
so that broad phrases covering all parts of the probability scale appear
early in the list. Additionally, the values people assign to vague phrases
may be more influenced by the phrases’ immediate neighbors in the list
than the values assigned to phrases with precise meanings.
A second study was carried out to measure the breadth of the membership functions of the 19 verbal expressions of probability used in Study
1. This makes it possible to determine whether differences in breadth of
meaning have any effect on the two behaviors involved in the proposed
method: selection of verbal expressions from a list and assignment of
numerical values to verbal expressions.
Method
Sixty-five subjects, primarily from the Introductory Psychology subject
pool, filled out a questionnaire (Appendix 2) which asked them to state
the lower and upper bounds of the numerical probabilities that each
phrase refers to (similar procedures were used, for different phrases, by
Reagan et al., 1989). Half of the subjects named the lower bound for each
phrase before the upper bound, and half did the reverse. Crossed with this
factor, half of the subjects named the phrases in Random order A (see
Footnote 1, above), and half used its reverse, Random order B.
Results
The mean lower and upper limits, across all conditions, are presented
in Columns 1 and 2 of Table 6. The midpoint between these bounds is an
estimate of the meaning the individual assigns to the verbal expression of
probability. The mean and median midpoints of these ranges and their
standard deviations (Columns 3-5) are similar to the values in Table 1.
The difference between a phrase’s upper and lower bounds is a measure
of the range of meaning the individual assigns to the phrase, and can be
used as an estimate of the breadth of the phrase’s membership function.
The mean and median range for each phrase are presented in Columns 6
and 7 of Table 6. “Absolutely impossible” (.034), “toss-up” (.042), and
“absolutely certain” (.052) have the narrowest ranges, and “uncertain”
(.240) and “better than even” (.163) the widest (Column 6). The median
range measure does not discriminate well among the phrases, for its value
for many of the phrases was .lO. The 8th column shows the standard
deviations of the differences between the upper and lower bounds, which
reveal that there is an exceptionally high variation across subjects (SD =
.336) in the range of meaning attributed to “uncertain.”
For comparison with previous work, the differences between the median upper and lower bounds for the three phrases studied by Wallsten et
SELECTION
OF VERBAL
211
PROBABILITIES
TABLE 6
THE MEAN LOWER LIMIT, UPPER LIMIT, AND MIDPOINT BETWEEN THE LIMITS, FOR
EACH PHRASE
Phrase
Absolutely impossible
R=lY
Very unlikely
Seldom
Not very probable
Fairly unlikely
Somewhat unlikely
Uncertain
Slightly less than
half the time
Toss-up
Sliitly more than
halfthetime
Better than even
Rather likely
Good chance
Quite likely
Very probable
Highly probable
Almost certain
Absolutely certain
Lower
limit
Upper
limit
.w7
.064
.046
.117
,113
.176
.217
294
.041
.183
,145
.243
,235
.287
349
.534
.390
.484
.524
,540
.585
,652
.686
,733
,757
340
928
Midpoint
Mean
Median
Ranee
SD of
SD
Mean
Median
.071
.126
.116
.lll
.129
.153
.034
.106
.lOO
.126
,122
.ll2
.132
.240
.oo
.10
.10
.lO
.I0
.lO
.lO
.lO
.I33
.079
.061
:E
.174
.231
,283
.414
.lOO
.075
.150
.150
.225
.250
so0
.470
.526
.430
SOS
A40
.500
.050
.045
.080
.042
.07
a0
.048
.087
A04
,703
.735
.799
.824
.872
.899
.950
980
.564
.621
.660
,726
.755
.803
.828
.895
.954
.555
.600
.700
.750
.065
.088
.222
.137
.146
.122
.127
.088
.lOS
.080
.163
.I50
.147
.137
.139
.142
.109
.052
a9
.lO
.I5
.I5
.lO
.I0
.lO
.lO
.oo
.053
.123
.087
.077
.087
.093
a34
.090
.142
.024
.117
.ooo
:E
.850
.925
l.CK!fl
:E
.061
.M5
.336
Note. N = 65.
al. (1986a) that were used in the present study were “toss-up”: .13;
“good chance”: .46; and “almost certain”: .l 1. These ranges are generally larger than the individual ranges measured in our study (Columns 6
and 7 of Table 6), which may reflect different elicitation procedures.
The standard deviation of the values assignedto a verbal expression of
probability is an alternative measureof the breadth of the phrase’s membership function, although it is confounded with individual differences in
the meaning of the phrase. Column 6 of Table 2 shows the mean of the
standard deviations of the values given to phrases when presented in the
four list orders in Study 1. Column 8 of Table 6 shows the standard
deviation of the midpoints of the ranges, from Study 2. The intercorrelations among these five measures of breadth of membership function are
all fairly high, ranging from .55 to .86 (Table 7). Therefore when a direct
measure of the breadth of membership function is lacking, the between
subject standard deviation might serve as a useful proxy.
E#ect of list reversal on the selection of phrases with broad and narrow
membership functions. The question whether the subjects in Study 1
preferred to use phrases with broader meanings is addressed in Table 8,
which shows the correlations between the indices of breadth of membership function (derived from data from Studies 1 and 2) and measures of
212
ROBERT M. HAMM
TABLE 7
INTERCORRELATIONS AMONG FIVE MEASURESOF BREADTH OF MEMBERSHIPFUNCTION
Median range
U - L difference
SD of value
SD of midpointd
Mean range”
Median range”
U - L Differenceb
SD of value’
.73**
.74
.67**
.65**
.72
.7Ci**
.59**
.63
.86
.55**
Note. N = 19 for every correlation except those involving the U - L Difference index,
for which N = 3.
o Difference between upper and lower bounds, Study 2, N = 65.
b Difference between median upper bound and median lower bound, from Fig. 4 of Wallsten, Budescu, Rapoport, Zwick, and Forsyth (1986).
’ Mean of standard deviations of values assigned to the phrases, from four lists with
different phrase orders, Study 1, N = 138.
d Standard deviation of midpoint (average of upper and lower bounds), Study 2.
** p < .Ol.
the number of subjects who used each phrase for each problem in Study
1, separately for the ordered and random lists. The relations are generally
positive, which suggestspeople prefer to use phrases with broad, even
vague, meanings. There was no general tendency for broad phrases to be
selected more in random lists than ordered lists, as had been predicted.
Effect of breadth of phrase meaning and list structure on variability of
value assignment. It was expected that subjects would assign more vari-
able numerical values to verbal probability expressions that have broader
meanings,and that this would occur particularly when the phrase lists are
randomly ordered and the context supplies fewer constraints on the
meaning of each term. The correlations between the four measures of
breadth of membership function of the 19 phrases (from Study 2; see
Table 7) and the standard deviations of the values assigned to those
phrases when presented in ordered and random lists (from Study 1; see
TABLE 8
CORRELATIONS BETWEEN INDICES OF BREADTH OF PHRASES’ MEMBERSHIP FUNCTIONS
AND THE NUMBER OF SUBJECTS WHO SELIXTED EACH PHRASE (SUMMED OVER FOUR
PROBLEMS), SEPARATELY FOR ORDEREDAND RANDOM PHIUSE LISTS
Measures of breadth of membership function”
List structure
Ordered
Random
Total
Mean
raw
Median
range
SD of
values
SD of
midpoints
.35*
.29
.35*
.43**
.38*
.43**
.22
.22
.24
.41**
.29
.38*
a Variables are defined in notes to Table 7.
*p < .lO; **p < .05.
SELECTION
OF VERBAL
PROBABILITIES
213
Table 2) were all significantly positive (I, C .05). However, this relationship was not substantially stronger for the random lists than the ordered
lists.
Discussion of Study 2
Subjects in Study 1 preferred phrases which Study 2 showed to have
relatively broad meanings, such as “somewhat unlikely” or “good
chance.” This preference, however, may be due to the particular word
problems used in the study. The answers people chose for these problems
tended to be in the .60 to 90 range (see Hamm, 1988a). The verbal
expressions covering this range (as well as the .I0 to the .40 range) have
broader ranges than the phrases covering other ranges. Therefore subjects’ tendency to select phrases with relatively broad meanings on these
problems may be an artifact due to the particular problems used. Further
study is needed to clarify this issue.
The expected interaction between breadth of phrase meaning and list
structure was not found: Subjects were not more likely to select phrases
with vague meanings when they appeared in random lists as opposed to
ordered lists.
Phrasesthat had broad meaningsas indicated by the difference between
the lower and upper bounds that individuals place on their meanings also
tended to have meanings that varied more between individuals. Wallsten
et al. (1986b) have shown that these individual differences in preferred
point-meanings for verbal expressions tend to be stable. The fact that
people put broader bounds on those expressionsthat others may interpret
differently, even though their own interpretation is stable, suggeststhat
people know these phrases have greater interpersonal variability of meaning.
GENERAL DISCUSSION
The proposed method of expressing probability verbally by selecting
words or phrases from a list has the advantagesdetailed by Zwick (1987;
e.g., that people prefer to use verbal probabilities) without the disadvantages of unconstrained verbal expression. It limits the number of possible
verbal probabilities by offering a specific set of expressions. It calibrates
the interpretations of the verbal expressions by getting the subjects to
state the numerical meanings of the phrases. The studies reported here
show that the validity of the method is little affected by several anticipated factors.
A phrase’s immediate neighbors have a slight influence on its meaning.
The calibration provided by the second phase of the method controls for
this. Further, presenting the expression in sequential order exploits the
immediate neighbor effect: it makes the meanings subjects assign less
214
ROBERT
M.
HAMM
variable and closer to the conventional meanings of the phrases. Ordinal
position, as manipulated by reversing the list, affects subjects’ selection
of terms when the lists are randomly ordered but not when they are
sequentially ordered. Subjects have a slight preference for phrases that
are more vague, but this might be due to the particular problems studied.
Use of the ordered list enables subjects to select phrases that are more
accurate answers for the word problems. In addition, use of their own
interpretations rather than the common meanings usually makes their
answers more accurate.
Thus the proposed method seems feasible as a way of expressing probability verbally. As long as the phrases are presented in sequential order,
it should not be necessary to systematically reverse the order of presentation. The individual calibration in the second phase of the method
makes it more accurate, but in certain applications this may not be worth
the additional effort.
Further Context Effects
Several context effects not addressed in the present study should be
kept in mind by researchers considering using lists of verbal probabilities.
An expression’s meaning may depend on the object whose probability is
being discussed (Brun & Teigen, 1988, Study 3; Mapes, 1979; WaIlsten et
al., 1986b). Thus, “highly likely” may have a different numerical interpretation if applied to the possible failure of a Broadway play than if
applied to the possible meltdown of a nuclear reactor. To control for this
effect, the subject could be asked to assign numbers to phrases in a
specific context. The meaning may also depend on whether the verbal
expression of probability has an object or not. Thus, Brun and Teigen
(1988, Studies 1 and 3) found that people tend to assign higher numbers to
probability expressions when they appear in a sentence (e.g., “unemployment figures will probably rise sharply during the next three years”) than
when they stand alone. However, with medical contexts (Study 2), they
found the opposite: context made the interpretations lower.
Probability expressions may have different kinds of meaning when used
with different response modes. For example, Teigen (1988b) required
subjects to use “probable” or “not probable” to describe the chances of
each of several contestants in a music competition. When contestants
have about equal probabilities of winning, subjects use “probable” for
those contestants with greater than average chances. When they thus call
7 of 20 contestants “probable winners,”
they implicitly
define
“probable” as .I5 (at most), much lower than in other studies. But it is
not clear whether this would have happened had a broader range of verbal
expressions been available to choose from.
Another possible contextual effect is that the meaning of a phrase may
SELECTION
OF VERBAL
PROBABILITIES
215
depend on the other phrases available in the choice set (see Pepper, 1981).
For example, the meaning of “probable” may depend on whether “not
probable” is present in the list. A term and its negation may mutually
influence their meanings to be equidistant from the midpoint of 50%
(Reyna, 1981; but see Reagan et al., 1989). Two similar terms such as
“fairly unlikely” and “somewhat unlikely” may be assigned the same
broad meaning if only one of them is in a list, but may be assigned
adjacent but nonoverlapping meanings if both are present.
Perhaps the most pertinent context issue is the question whether a
subject interprets a verbal expression the samewhen reading it in a word
problem, when considering whether to select it from a list as the answer
to the word problem, and when assigning a numerical value to it. The
proposed method depends crucially on the assumption that these interpretations are invariant, as does the method of Kong et al. (1986); hence
it needs to be formally tested.
Does a Structured List Distort Meaning?
Study 1 indicated that the phrase selection method is at least as valid
with the sequentially ordered phrase list as with the random list. The
coherent local and global contexts seemsto make the conventional meanings of the verbal expressions more easily accessible. But the constraints
that an ordered list places on the subject’s interpretation of the phrases
may distort, rather than clarify, the meanings of the phrases, and thus
may prevent people from using the phrases as they normally would. Both
the center of meaning and the breadth of meaning of a phrase may be
distorted. Consider, for example, the meaning of “almost certain.” Kong
et al. (1986)found subjects assign it a mean value of .78 (median .90), but
the author used it to represent .95 in the present study. When subjects
assigned values to “almost certain” in the random phrase lists (where
there was nothing to indicate that the phrase meant .95), the mean value
was 90 (see Table 6). However, in ordered lists (where it appeared in the
18th or 2nd of 19 positions, between “highly probable” and “absolutely
certain”), its mean value was .93. Placing the phrase in the ordered list
changed its meaning in the direction the author intended.
Another example is “uncertain,” the verbal expression that the author
selectedto represent 40. The range of meaning people assignto this word
is both very wide (an average range of .24 between the lower and upper
bounds; see Table 6) and very variable (the standard deviation of the
range is .34; some subjects gave it a range of 0 and others a range of 100).
Although the mean value assigned to “uncertain” was .40 or .41 in both
the ordered and random lists (Table 2), which agrees with the a priori
value, these values were much more variable in the random list (SD =
216
ROBERT
M.
HAMM
.18) than the ordered list (SD = .06). Placing a phrase in an ordered list
can drastically change the breadth of its meaning.
Because of the exceptional variability of the meaning of “uncertain,”
an alternative phrase for the region near .40 should be substituted. A
candidate is “worse than even,” which Shanteau (1974) found to have a
mean value of .38 using two different procedures, and which is symmetric
with “better than even” whose assigned meaning is 60.
Although it is possible to find replacement phrases for particular inappropriate verbal expressions, still the ordered list will change some phrases’ meanings and breadths of meaning, for many individuals. This can be
viewed as a necessary cost of adopting a common set of interpretations of
verbal expressions of uncertainty. Kong et al. (1986) advocate improving
the use of verbal probabilities through codifying the meaning of probabilistic expressions. They suggest measuring what people usually mean by
phrases, publicizing this, and training people to use the terms with these
agreed-upon meanings. Such publicity and training would (a) reduce the
differences between people, (b) narrow the individual membership functions for each phrase, and (c) get people to use the phrases to mean the
same probability in different contexts. Establishing such a convention
would require changing people’s interpretations of many phrases. The
proposed method of selecting verbal probability expressions from a list
could be a tool in such a program. The changes that the use of an ordered
phrase list in this method would induce in the meaning of its phrases might
be viewed as costs worth incurring in order to improve communication
about uncertainty.
A list that displayed both the verbal expressions of probability and their
numerical interpretations (as in Fig. 1) could be useful in this context.
This scale would not require a separate calibration procedure. People
would be free to use the mode they found more fitting to the problem and
to their cognitive style. The two modes of expression would mutually
define each other, constraining people’s interpretations of each to be
closer to the common meanings. Finally, use of the graphic scale would
train people to associate the verbal and numerical expressions, promoting
the availability, acceptance, and use of the conventional interpretation.
Alternative lists of verbal expressions of very low or very high probabilities would be useful for making distinctions among degrees of near
impossibility or near certainty. These lists should be based on research
discovering the phrases people already use in contexts where these ranges
of probability are pertinent, such as medicine (cf. Meyer & Pauker, 1987)
or technological systems. A recent example highlights this need. To assess the overall risk of space shuttle failure, NASA engineers were asked
to make verbal assessmentsof the reliability of space shuttle components.
SELECTION
Verbal
OF VERBAL
Expressions
Absolutely
impossible
Numerical
-
.15
Not very probable
.io
Worse
Slightly
Slightly
less than
more
.lO
Seldom
Somewhat
than
unlikely
.25
unlikely
.x3
than
even
.a3
half the time
.A5
Toss-up
.m
half the time
55
Better
than
Rather
even
.%I
likely
.m
.75
Good chance
Quite
Very
Highly
Almost
Absolutely
FIG.
1.
likely
.m
probable
.85
probable
So
certain
certain
Expressions
05
unlikely
Fairly
217
-m
Rarely
Very
PROBABILITIES
.s5
-
-1m
Proposed graphic display of verbal and numerical expressions of probability.
These were then translated into numbers, using an arbitrary code
(“frequent” = .Ol; “reasonably probable” = .OOl; “occasional” =
.OOOl;and “remote” = .OOOOl)that had not been used by the engineers
in making their original assessments (Marshall, 1986). This contributed to
overconfidence in shuttle safety prior to the Challenger explosion. This
poor risk assessment practice has given subjective judgment a bad name
in the aerospace community: “the government is relying too much on
subjective judgment and too little on statistical analysis in deciding which
of thousands of safety problems on the space shuttle should get attention”
(Marshall, 1988, p. 1233). Codification of verbal expressions of probability would impose consistent interpretations on the phrases and allow
experts’ subjective judgment to make its potentially crucial contribution.
The flexibility inherent in the ability to select vague or precise probability expressions is a special attraction of verbal probabilities. Those who
value this feature of verbal probabilities may be more disturbed by the
potential distortion of verbal expressions’ breadths of meaning when
218
ROBERT M. HAMM
placed in a list than by the distortion of their central meanings. For example, presenting a set of phrases with evenly spaced meanings may force
people to interpret them as having equal breadths of meaning.
It is possible, however, to preserve considerations of breadth of meaning while establishing conventional meanings for verbal expressions of
probability. For example, Beyth-Marom (1982) proposed a framework for
codifying the breadth of meaning of verbal probability expressions. It
divides the probability scale into ranges .lO or .20 wide, and associates
each range with from 2 to 6 verbal expressions. For example, the terms
“small chance” and “doubtful” would refer to the .lO to .30 range. Although this reflects the fact that verbal expressions apply to ranges of
probability, it has disadvantages. It does not distinguish between probabilities within a range. It requires people to learn a number of arbitrary
sharp boundaries (e.g., at .lO and .30). If establishing a convention requires people to relearn the meanings of phrases, it seems more useful to
associate phrases with points and allow for fuzzy boundaries, than to
associate phrases with precise ranges.
An alternative way to address breadth of meaning in the context of the
method of selecting verbal probabilities from a list is to be explicit about
the breadth of each expression’s meaning, in addition to its center of
meaning. Initially, researchers taking this approach could require subjects
to specify not only a central numerical meaning (as in Study 1) but also a
range of meanings (as in Study 2) for each expression used for presenting
or answering the question. The researcher would then know the range of
possible meanings the subject might intend for the answer to the substantive question. Thus this approach recognizes and roughly measures one of
the primary functions of verbal probabilities, the control of vagueness of
expression.
After sufftcient study to establish conventional breadths of meaning, a
more convenient graphic list such as Fig. 2 could be used. This presents
the phrases’ conventional ranges as well as the conventional means of
Fig. 1. It is distinct from Beyth-Marom’s proposal in that each expression
has a center of meaning and its own boundaries. By eliciting or providing
interpretations of both the center and the breadth of each phrase’s meanings, it would be possible to assure that neither dimension of meaning is
distorted or miscommunicated in the verbal probability selection method.
Very
Unl,kely
I
,
TOSS1
8
I
I
0
.5
Rather likely
I
I
I
FIG. 2. Proposed graphic display of verbal and numerical probabilities, incorporating
breadths of meaning as well as centers of meaning.
SELECTION
OF VERBAL
PROBABILITIES
219
Two additional issues need exploration. First, not only verbal but also
numerical expressions can be said to have breadths of meaning. Paradoxically, the number SO refers to a wider range of probabilities than .60,
while the corresponding verbal expression “toss-up” has a narrower
meaning than “better than even.” We need to understand this anomaly
and its implications for the verbal probability selection method. Second,
in many contexts verbal expressions have floating meanings that are negotiated in conversations. Can this feature of communication about probability be integrated with the use of the specific methods proposed here,
or with any program for establishing conventional meanings for verbal
probabilities?
APPENDIX 1
The “Doctor”
Problem, One of Four Probabilistic
Problems Used in the Study
Inference Word
In alternative versions of the problem, the probabilistic information
was presented in either verbal or numerical form.
[0 piecesof key information.] The next word problem is about a doctor
trying to figure out what diseasea patient has. The patient is, undeniably,
ill, but it is difficult to know what disease he has. You will be asked to
estimate how likely it is that the patient has one of two diseases.
The patient comes in to the emergency room at night with a very unusual symptom-his eyes are bright yellow. The doctor knows that there
are only two diseasesthat can produce this particular symptom-hepatitis
and toxic uremia. People never contract both illnesses at the same time.
With what you know now, what is the probability that the patient has
toxic uremia?
[l piece of key information.] A discussion with a colleague reminds the
doctor that toxic uremia is a less common disease than hepatitis. He
checks a textbook and finds that [it is highly probablethat people] [90% of
people] who present to their doctors with the symptom of yellow eyes
have hepatitis, therefore, [it is very unlikely that they] [only 20% ofpeople
with this symptom] have toxic uremia.
With what you now know, what is the probability that the patient has
toxic uremia?
[2 piecesof key information.] The doctor orders the lab to do a Speck
test on the patient’s blood. In two hours the results are back-the Speck
test indicates that the patient has toxic uremia.
With what you know now, what is the probability that the patient has
toxic uremia?
[3 pieces of key information]. The doctor consults his diagnostic manual
220
ROBERT
M.
HAMM
and discovers that the Speck test is the best way to find out whether a
patient with yellow eyes has hepatitis or toxic uremia. However, the
Speck test is not foolproof. When the patient has toxic uremia, [it is rather
likely that the Speck test will indicate that the patient has this illness. It is
somewhat unlikely that the Speck test will indicate that the patient has
hepatitis] [the Speck test correctly indicates this 70% of the time, but 30%
of the time it falsely indicates that the patient has hepatitis]. Similarly,
when the patient actually has hepatitis, [it is somewhat unlikely that the
Speck test will indicate that the patient has toxic uremia] [the Speck test
will indicate that the disease is toxic uremia approximately 30% of the
time].
With what you know now, what is the probability that the patient has
toxic uremia?
APPENDIX 2
Instructions for Questionnaire Eliciting Lower and Upper Bounds
on the Numerical Meanings of Each Phrase
Two versions were prepared. One asked for upper bounds first, and the
other asked for lower bounds first.
People often use words or phrases such as “impossible” or “very
likely” to express a degree of uncertainty or certainty. We are interested
in the range of uncertainties for which you think it appropriate to use each
of a number of words or phrases.
Think of a cafeteria tray that has 100 ping pong balls on it. Some of
them are white and rest are yellow. You can see every one of them
clearly. You must convey to a friend how many of the balls are white.
You want to tell him how likely it is that a white ball would be picked if
they were thoroughly mixed up and someone were to draw one without
looking. However, you are not allowed to tell the person the actual proportion of white ping pong balls. Rather, you are forced to use a nonnumerical descriptive phrase.
We want to know the range of proportions of white ping pong balls, in
the tray described above, for which you would consider each term to be
appropriate. We will ask you to tell us this for each of 20 terms.
The first term is “about even.” What is the highest (lowest) proportion
of white balls (out of 100) for which you think~ld
be appropriate to
use the term “about even,” in trying to tell your friend the proportion of
white and yellow ping pong balls? Write that number here:
Now what is the lowest (highest) proportion of white balls for which
you think it would be appropriate to use the term “about even”?
SELECTION
OF VERBAL
PROBABILITIES
221
Look at your answers. You should have named two numbers somewhere between 0 and 100 (inclusive). The second number should have
been lower (higher) than (or equal to) the first. Any number in between
the two numbers would be a reasonable interpretation for your friend to
make when you tell him that the chance of drawing a white ping pong ball
is “about even.” Any number higher (lower) than your fust answer would
not be a reasonable interpretation of “about even”; nor would any number lower (higher) than your second answer be reasonable. If these statements are not all true, you may wish to go back and change one or both
of your answers.
Below is a list of words or phrases expressing degree of uncertainty.
Assume that you are using each phrase to describe the chance of drawing
a white ping pong ball from the tray of 100balls. For each phrase, please
express the upper and lower (lower and upper) numerical limits that you
would expect your friend to use in interpreting it.
Please focus on each word or phrase by itself, rather than trying to
compare it with your answers for other words or phrases.
Upper (Lower) limit
Uncertain
Rather likely
Somewhat unlikely
Rarely
Slightly less than
half the time
Good chance
Fairly unlikely
Absolutely impossible
Toss-up
Quite likely
Not very probable
Absolutely certain
Slightly more than
half the time
Very probable
Seldom
Almost certain
Better than even
Highly probable
Very unlikely
Lower (Upper) limit
222
ROBERT M. HAMM
REFERENCES
Beyth-Marom, R. (1982). How probable is probable? A numerical translation of verbal
probability expressions. Journal of Forecasting, 1, 257-269.
Brun, W., & Teigen, K. H. (1988). Verbal probabilities: Ambiguous, context-dependent, or
both? Organizational Behavior and Human Decision Processes, 41, 39OXl4.
Budescu, D. V., & Wallsten, T. S. (1985). Consistency in interpretation of probabilistic
phrases. Organizational Behavior and Human Decision Processes, 36, 391-405.
Budescu, D. V., Weinberg, S., & Wallsten, T. S. (1988). Decisions based on numerically
and verbally expressed uncertainties. Journal of Experimental Psychology: Human
Perception and Performance, 14, 281-294.
Fox, J. (1986). Knowledge, decision making, and uncertainty. In W. A. Gale (Ed.), Artificial
intelligence and statistics (pp. 57-76). New York: Addison-Wesley.
Hamm, R. M. (1987). Diagnostic inference: People’s use of information in incomplete Bayesian word problems (Publication 87-11). Institute of Cognitive Science, University of
Colorado, Boulder.
Hamm, R. M. (1988a, November). Accuracy of probabilistic inference using verbal vs.
numerical probabilities. Psychonomics Society Meetings, Chicago.
Hamm, R. M. (1988b). Evaluation of a method of verbally expressing degree of belief by
selecting phrases from a list (Publication 88-6). Institute of Cognitive Science, University of Colorado, Boulder.
Kong, A., Bamett, G. O., Mosteller, F., & Youtz, C. (1986). How medical professionals
evaluate expressions of probability. New Engand Journal of Medicine, 315, 74&745.
Kuipers, B., Moskowitz, A. J., & Kassirer, J. P. (1988). Critical decisions under uncertainty: Representation and structure. Cognitive Science, 12, 177-210.
Lichtenstein, S., & Newman, J. R. (1%7). Empirical scaling of common verbal phrases
associated with numerical probabilities. Psychonomic Science, 9, 563-564.
Mapes, R. E. A. (1979). Verbal and numerical estimates of probability in therapeutic contexts. Social Science and Medicine A, 13, 277-282.
Marshall, E. (1986). Feynman issues his own shuttle report, attacking NASA’s risk estimates. Science, 232, 15%.
Marshall, E. (1988). Academy panel faults NASA’s safety analysis. Science, 239, 1233.
Meyer, K. B., L Pauker, S. G. (1987). Screening for HIV: Can we afford the false positive
rate? New England Journal of Medicine, 317, 238-241.
Nakao, M. A., & Axelrod, S. (1983). Numbers are better than words: Verbal specifications
of frequency have no place in medicine. American Journal of Medicine, 74, 1061-1065.
Pepper, S. (1981). Problems in the quantification of frequency expressions. In D. Fiske
(Ed.), New directions for methodology of social and behavioral science: Problems with
language imprecision (No. 9, pp. 25-41). San Francisco: Jossey-Bass.
Rapoport, A., Wallsten, T. S., & Cox, J. A. (1987). Direct and indirect scaling of membership functions of probability phrases. Mathematical Modelling, 9, 397-417.
Reagan, R. T., Mosteller, F., & Youtz, C. (1989). Quantitative meanings of verbal probability expressions. Journal of Applied Psychology, 74, 433-442.
Reyna, V. F. (1981). The language of possibility and probability: Effects of negation on
meaning. Memory and Cognition, 9, 642-650.
Shanteau, J. (1974). Component processes in risky decision making. Journal of Experimental Psychology, 103, 680-691.
Simpson, R. H. (1944). The specific meanings of certain terms indicating differing degrees
of frequency. Quarterly Journal of Speech, 30, 328-330.
Teigen, K. H. (1988a). The language of uncertainty. Acta Psychologica, 68, 27-38.
Teigen, K. H. (1988b). When are low-probability events judged to be “probable”? Effects
SELECTION OF VERBAL PROBABILITIES
223
of outcome set characteristics on verbal probability estimates. Actu Psychologica, 68,
157-174.
Wallsten, T. S., Budescu, D. V., & Erev, I. (1988). Understanding and using linguistic
uncertainties. Acta Psychologica, 68, 39-52.
Wallsten, T. S., Budescu, D. V., Rapoport, A., Zwick, R., & Forsyth, B. (1986a). Measuring the vague meanings of probability terms. Journal of Experimental Psychology:
General, 115, 348-365.
Wallsten, T. S., Fillenbaum, S., Bt Cox, J. A. (1986b). Base rate effects on the interpretations of probability and frequency expressions. Journal of Memory and Language, 25,
571487.
Woodward, B. (1987). Veil: The secret wars of the CIA, 1981-1987. New York: Pocket
Books.
Zimmer, A. C. (1983). Verbal vs. numerical processing of subjective probabilities. In R. W.
Scholz (Ed.), Decision making under uncertainty (pp. 159-182). North-Holland:
Elsevier Science Publishers.
Zwick, R. (1987). Combining stochastic uncertainty and linguistic inexactness: Theory and
experimental evaluation. Ph.D. dissertation, University of North Carolina, Chapel Hill.
RECEIVED: May 12, 1988