Sadder but Wiser Induction?

Situation-Personality Interaction Revealed by an Inductive Reasoning Model
Kayo Sakamoto ([email protected])
Japan Society for the Promotion of Science;
Tokyo Institute of Technology, 2-21-1 O-okayama,
Meguro-ku, Tokyo, 152-8552 JAPAN
Masanori Nakagawa ([email protected])
Tokyo Institute of Technology, 2-21-1 O-okayama,
Meguro-ku, Tokyo, 152-8552 JAPAN
Abstract
We have developed a computational model of inductive
reasoning that includes both positive and negative premises
(Sakamoto & Nakagawa, 2007; 2008). The model explains
argument strength ratings in terms of two kinds of
similarities: the similarity between positive premises and the
conclusion, and the similarity between negative premises and
the conclusion. In the model, the similarity functions for
positive and negative premises are each weighted by a
parameter, and the two parameters together model the emphasis
balance between the two kinds of similarity in argument
strength ratings. Emphasis balance has been shown to reflect
situational differences in ratings of identical arguments
(Sakamoto & Nakagawa, 2007; 2008). The present study stresses
the further potential for the representation of emphasis
balance in our model to also account for the interactions
between situational differences and individual personality
differences in argument strength ratings. Parameters estimated
from individual argument strength data provide two insights
into the situation-personality relationship. Specifically, while
neurotic individuals (sadder) are not affected by the situation
(wiser induction), extrovert individuals tend to place greater
emphasis on negative premise similarities regardless of the
situation.
Keywords: inductive reasoning; computational model;
personality; situational reasoning; statistical analysis of
language corpus.
Introduction
This study is concerned with one kind of inductive
reasoning argument (e.g., Rips, 1975; Osherson, Smith,
Wilkie, Lopez, & Shafir, 1990), such as:
    Person A likes wine.
    ----------------------------
    Person A likes champagne.
The strength (the likelihood of the conclusion below the line
given the premise above the line) of this type of argument
depends mainly on the entities in each sentence (e.g.,
“wine”, “champagne”), because these sentences share the
same basic predicate (e.g., “Person A likes ~.”) whose
members we do not know apart from the premise entity.
The arguments in this study also include the following type:
    Person A likes wine.
    Person A doesn’t like beer.
    ----------------------------
    Person A likes champagne.
The second premise, whose predicate takes a negative verbal
form, is called a negative premise, in contrast to the
positive first premise. The strength of
this kind of argument is higher when the conclusion entity is
similar to the positive premise entity, but it is lower when
the conclusion entity is similar to the negative premise
entity. But when the conclusion entity is
similar to both the positive and negative premise entities,
how high will the argument strength be?
In real-world situations, reasoning-based behavior that
involves such argument evaluations can entail some element
of situational context. For example, the situation of giving
somebody a present involves a kind of risk. Even if you
knew that the person in question likes wine but not beer,
could you ‘reasonably’ infer their reactions toward receiving
a bottle of champagne from you? In such a situation, your
argument ratings would probably differ depending on
whether the person is your close friend or your easily-upset
boss. Given this, the model that we have been developing
(Sakamoto & Nakagawa, 2007; 2008) reflects how human
ratings of argument strength are, by their very nature,
context-dependent.
At the same time, risky situations are recognized
differently by different individuals. For example, it is
natural to think that the argument ratings above would also
differ depending on whether the rater has a conservative
or a progressive personality, regardless of the
situation. This study attempts to examine the relationship
between rating situations and rater personality within
inductive reasoning by conducting model simulations.
An outline of the paper is as follows. First, the model
we have been developing is described (in the “Model”
section). Second, an experiment is introduced that
demonstrates situational differences for identical inductive
reasoning arguments (in the “Experiment” section). Then,
model parameters are estimated from the experimental data,
and simulation results are presented that indicate that
situational differences can be attributed to differences in the
balance of emphasis for positive premise and negative
premise similarities. Furthermore, based on emphasis
balances estimated from each individual’s experimental
data, situation-personality variations are also indicated,
including an interaction (in the “Model Simulations” section).
Finally, an interpretation of the situation-personality
interaction is discussed.
Model
Here, we describe a kernel function model for inductive
reasoning that we have been developing (Sakamoto &
Nakagawa, 2007; 2008). Structurally, the model is a kind of
linear regression model,¹ in which the dependent variable
(the model’s output) is the argument strength rating, and the
explanatory variables are two kinds of similarity function
values. One similarity function relates to the positive
premise-conclusion similarity and the other relates to the
negative premise-conclusion similarity. These similarities
are computed based on their distances in a semantic space
constructed from a statistical analysis of a corpus. The
model’s parameters (regression coefficients) can account for
situational differences in inductive reasoning.
Semantic Space Construction for the Model. In order to
construct a semantic space, soft-clustering results for a
Japanese corpus are utilized. In this method, nouns are
clustered based on their feature strengths, and the
cluster-attribution probabilities of nouns are estimated from
predicate-argument frequency data assumed to reflect the
feature strengths of nouns. The structure of this method is
similar to popular methods within natural language
processing, such as Pereira’s method and PLSI (Pereira,
Tishby, & Lee, 1993; Hofmann, 1999). Details of the
method are available in Sakamoto and Nakagawa (2007).
From the analysis results, 600 cluster-attribution
probabilities P(Cluster|Noun) are estimated for 18,142
nouns. In this study, the latent cluster C is assumed to be a
semantic category that can be described in terms of a
typicality gradient (Rosch, 1973). The cluster-attribution
probability of a noun P(Cluster|Noun) is assumed to
represent an entity’s typicality with respect to a category.
When a certain category has a high conditional probability
given a particular noun, it is natural that the entity denoted
by the noun has the features indicated by the category.
Thus, by considering each C as a dimension, entities can be
represented in the semantic space constructed from the
corpus-analysis results.
¹ This structure is the same as Support Vector Machines (SVMs:
Vapnik, 1992) based on the kernel method.

Model Construction. The model outputs an argument
strength, denoted as $v(N^c)$, which is the likelihood of a
conclusion including entity $N^c$, given positive premises
including entities $N_1^+, \ldots, N_{n^+}^+$ and negative premises
including entities $N_1^-, \ldots, N_{n^-}^-$. $v(N^c)$ is represented by
the following function:

$$v(N^c) = a\,\mathrm{SIM}_+(N^c) + b\,\mathrm{SIM}_-(N^c), \tag{3}$$

where

$$\mathrm{SIM}_+(N^c) = \sum_{i=1}^{n^+} e^{-\beta d_{ci}^+}, \tag{4}$$

$$\mathrm{SIM}_-(N^c) = \sum_{j=1}^{n^-} e^{-\beta d_{cj}^-}, \tag{5}$$

$$d_{ci}^+ = \sum_{k=1}^{m} \left( P(C_k \mid N^c) - P(C_k \mid N_i^+) \right)^2, \tag{6}$$

$$d_{cj}^- = \sum_{k=1}^{m} \left( P(C_k \mid N^c) - P(C_k \mid N_j^-) \right)^2. \tag{7}$$

$d_{ci}^+$ and $d_{cj}^-$ are functions for squared word distances based
on the categorical feature (denoted as $C_k$). $d_{ci}^+$ represents
the distance between the conclusion entity $N^c$ and the
positive premise entity $N_i^+$, while $d_{cj}^-$ represents the
distance between the conclusion entity $N^c$ and the negative
premise entity $N_j^-$. Here, the number of categories, m, is
fixed to 20 (out of 600), on the assumption that only
characteristic categorical dimensions for the concerned
entities should be utilized. Each word distance function
constructs Gaussian kernel functions,² such as $\mathrm{SIM}_+(N^c)$
and $\mathrm{SIM}_-(N^c)$, when combined with nonlinear exponential
functions and the parameter β, to which 1 has been applied.
As a cognitive interpretation, the Gaussian kernel functions
can be regarded as nonlinear similarity functions.
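
The model equations can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the three-dimensional vectors are toy stand-ins for the cluster-attribution probabilities P(C_k|N), not values estimated from the corpus.

```python
import math

def squared_distance(p, q):
    """Squared distance between two cluster-probability vectors (Eqs. 6 and 7)."""
    return sum((pk - qk) ** 2 for pk, qk in zip(p, q))

def similarity(conclusion, premises, beta=1.0):
    """Sum of Gaussian kernels between the conclusion and each premise (Eqs. 4 and 5)."""
    return sum(math.exp(-beta * squared_distance(conclusion, p)) for p in premises)

def argument_strength(conclusion, positives, negatives, a, b, beta=1.0):
    """v(N^c) = a * SIM+(N^c) + b * SIM-(N^c) (Eq. 3)."""
    return (a * similarity(conclusion, positives, beta)
            + b * similarity(conclusion, negatives, beta))

# Toy vectors (assumed for illustration only; not corpus estimates).
wine = [0.8, 0.1, 0.1]
beer = [0.1, 0.8, 0.1]
champagne = [0.7, 0.2, 0.1]

# With a > 0 and b < 0, a conclusion close to the positive premise scores above 0.
v = argument_strength(champagne, [wine], [beer], a=1.0, b=-1.0)
print(v > 0)
```

In this toy setting the conclusion lies much closer to the positive premise than to the negative one, so the positive kernel dominates and the output is above zero.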
$\mathrm{SIM}_+(N^c)$ represents the similarities between the
conclusion entity $N^c$ and the positive premise entities,
while $\mathrm{SIM}_-(N^c)$ denotes the similarities between $N^c$ and
the negative premise entities. Furthermore, a and b are
parameters for the similarity functions. In terms of their
cognitive interpretation, parameter a related to the positive
premise similarity function should have a positive value
(a>0) while parameter b related to the negative premise
similarity function should have a negative value (b<0).
Here, these parameters define the hyperplane on which the
argument strength of conclusion $N^c$ is zero, $v(N^c) = 0$. Such a
hyperplane can be viewed as the border between the region
of positive premises and the region of negative premises.
Thus, the absolute ratio |b/a| represents the emphasis balance
between the positive premise similarity and the negative
premise similarity within argument strength ratings. Figure
1 illustrates how different hyperplanes express different
balances of emphasis for the same argument:
    Person A likes wine.
    Person A doesn’t like beer.
    ----------------------------
    Person A likes champagne.
Accordingly, when another conclusion comes close to the
curved line (hyperplane) around “Wine”, the argument
strength exceeds 0, while the argument strength becomes
less than 0 when the conclusion comes close to “Beer”. In
Panel (1) of Figure 1, when the region of positive premise is
too small to include the conclusion entity of “Champagne”,
it would be rated low. On the other hand, in Panel (2) of
Figure 1, “champagne” would be rated high because the
region of the positive premise is sufficiently large to include it.
Therefore, the balance of emphasis represented in our model
can explain situational differences in inductive reasoning.

² When kernel functions are utilized in SVMs, nonlinear
classification problems can be solved in a simple linear model as
the kernel function maps input data onto a space that is capable of
linear classifications or regressions (Vapnik, 1992).

Figure 2. Example of experiment in Over condition (translated into English).
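
To make the role of |b/a| concrete, the border v(N^c) = 0 can be worked out for the simplest case of one positive and one negative premise. This is a sketch under the sign constraints stated for the model (a > 0, b < 0); the shorthand d^+ and d^- for the two squared distances is introduced here for illustration only.

```latex
\begin{align*}
v(N^c) = 0
  &\iff a\,e^{-\beta d^{+}} = |b|\,e^{-\beta d^{-}} \\
  &\iff e^{-\beta (d^{+} - d^{-})} = \left|\frac{b}{a}\right| \\
  &\iff d^{-} - d^{+} = \frac{1}{\beta}\,\ln\left|\frac{b}{a}\right| .
\end{align*}
```

Under these assumptions, v(N^c) > 0 exactly when d^- - d^+ exceeds (1/β) ln|b/a|. With |b/a| = 1 the border lies where the conclusion is equidistant from the two premises. As |b/a| grows, the conclusion must lie further from the negative premise than from the positive premise before it is rated above zero, so the positive region shrinks, as in Panel (1) of Figure 1.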
Experiment
This section introduces the experiment conducted in
Sakamoto and Nakagawa (2008), which sought to
investigate whether argument strength is rated differently in
different situations.
Task. The task was to rate inductive reasoning arguments on
a 7-point scale (from ‘strongly likely’ to ‘strongly unlikely’)
(see Figure 2). Unlike the usual inductive reasoning task,
each rating in the study was scored according to its
deviation from a ‘concocted’ right answer. Participants were
told that their ability to guess the right rating answer would
be a reflection of their ability to learn word meanings in a
new language (e.g., a new word like bamisoya). Thus, for
the participants, there was a kind of risk of receiving low
evaluations of their language ability. When a rating
corresponded to the right answer, it received a perfect score.
The concocted right answer for each argument was assigned
by referring to situation-free rating data without such
scoring. In the over-estimation risk (Over) condition, as the
argument rating increased relative to the right answer, the
score reduction also increased. Conversely, in the
under-estimation risk (Under) condition, as the argument rating
decreased relative to the right answer, the score reduction
increased. Score allocations for each
condition are presented in Table 1, which shows that high
ratings tend to lead to low evaluations in the Over condition,
while low ratings tend to lead to low evaluations in the
Under condition.

Figure 1. Different balances in emphasis. Panel (1): the case of large |b/a|. Panel (2): the case of small |b/a|. Labels in each panel: Positive premise (Wine), Negative premise (Beer), Champagne.
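
The asymmetric scoring can be written as a small lookup. The sketch below is illustrative only: it assumes that a numerically higher rating means ‘more likely’, that deviations of three or more points share the outermost allocation, and that the allocations in Table 1 are ordered as the text describes (over-estimation penalized in the Over condition, under-estimation in the Under condition). The names are hypothetical.

```python
# Score change keyed by signed deviation (rating minus concocted right answer).
# Positive keys are over-estimates, negative keys are under-estimates.
UNDER_SCORES = {3: 0, 2: 35, 1: 65, 0: 100, -1: -35, -2: -65, -3: -100}
OVER_SCORES = {3: -100, 2: -65, 1: -35, 0: 100, -1: 65, -2: 35, -3: 0}

def score_change(rating, right_answer, condition):
    """Return the points added (positive) or subtracted (negative) for one rating."""
    if condition not in ("over", "under"):
        raise ValueError("condition must be 'over' or 'under'")
    # Assumption: deviations beyond three points are treated like three points.
    deviation = max(-3, min(3, rating - right_answer))
    table = OVER_SCORES if condition == "over" else UNDER_SCORES
    return table[deviation]

# The same over-estimate is costly in the Over condition but not in the Under condition.
print(score_change(7, 4, "over"), score_change(7, 4, "under"))
```

Seen this way, the two conditions differ only in which direction of error is penalized, which is what gives raters a reason to shade their ratings downward (Over) or upward (Under).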
Argument materials. Four sets of inductive reasoning
arguments were rated that included entities from four
different semantic domains³ (see Table 2). Each set
contained eight arguments, and each argument consisted of
three positive premises, three negative premises, and a
conclusion. The premise and conclusion statements all
consisted of a combination of a nonsense predicate (‘~’ is
bamisoya) and an entity (a jet plane), such as “A jet plane is
bamisoya”. In the case of negative premises, the predicate
involved a negative verbal form, such as “A trailer is not
bamisoya”.
Participants. The participants were 118 Japanese
undergraduate students, of whom 58 were assigned to the
Over condition, with the remaining 60 being assigned to the
Under condition.
Procedure. The entire experimental procedure was
controlled by a web application executed with Internet
Explorer 6.0. The participants all followed the experimental
procedure together in a computer class. The experimental
procedure was divided into 6 stages. The first stage was for
the experimental instructions. The second stage was a
practice rating session for one of the four argument sets in
which feedback about the right answer and the current score
was shown after each argument rating. The third to the fifth
stages were rating sessions for the remaining argument sets
without feedback. However, during the last of these
sessions, the current total score was displayed to each
participant. The last stage was an announcement of the total
score and the ranking in the computer class. After this
procedure was completed, the true purpose of the
experiment was explained to all participants.

³ In each domain, the positive premise entities are selected from
a specific region of the model’s semantic space, the negative
premise entities are selected from another region, and the
conclusion entities are selected from both regions.

Table 1. Allocation of scores in each risk condition.
Rating relative to the concocted right answer   UNDER       OVER
over 3 points over-estimated                    add 0       minus 100
2 points over-estimated                         add 35      minus 65
1 point over-estimated                          add 65      minus 35
corresponds to the concocted right answer       add 100     add 100
1 point under-estimated                         minus 35    add 65
2 points under-estimated                        minus 65    add 35
over 3 points under-estimated                   minus 100   add 0

Table 2. Examples of task sets. Statement form: <noun> is bamisoya (nonsense word).
Nouns used as entities in positive premises:  jumbo jet, ferry, shallop
Nouns used as entities in negative premises:  bus, train, prison
Nouns used as entities in conclusions:        passenger car, buoy, airplane, aquarium, ward office, trailer, taxi, fishing boat

Result of Experiment
Argument ratings on 7-point scales during the no-feedback
sessions were converted to numerical values (1 to 7)
and were analyzed in terms of the differences between
the two conditions. The average ratings over the three sets
of arguments (24 arguments) were 3.783 (SD = 1.248) for
the Under condition and 3.578 (SD = 1.210) for the Over
condition, respectively, representing a significant difference
between the two conditions according to a paired t test (p <
0.01). These results suggest that participants’ ratings were
affected by the risk-involved situational contexts: in the
Over condition, ratings tended to be lower due to
application of a strategy of avoiding over-estimations that
might incur score reductions, while the participant ratings in
the Under condition tended to be higher because of a
strategy to avoid under-estimations that might incur score
reductions.

Model Simulations
Validity of Model’s Assumptions
First of all, the validity of the model’s assumptions was
evaluated. The model assumes that argument strength
ratings can be explained in terms of two kinds of
similarities: the similarity between positive premises and the
conclusion, and the similarity between negative premises
and the conclusion. Furthermore, these similarities are
computed from categories, and the categories are estimated
from an analysis of a corpus. The validity of these
assumptions is evaluated in a multiple regression analysis.
If the assumptions are not valid, the model’s fit in the
analysis might not be significant, or the estimated
parameters might be inexplicable (for example, parameter a
for the positive premise similarity is negative, or parameter b
for the negative premise similarity is positive). Here, we
estimate parameters for each participant. The parameters a
and b are estimated from the ratings (24 ratings for each
participant) obtained from the experiment using the
least-squares method, and model performance is then evaluated.
Argument ratings on the 7-point scales were translated into
numerical scales (-3 to 3). The results indicate that 107 of
the 118 individual estimations (for both the Over and Under
conditions) have a significant F ratio at p < 0.05, and that all
of these parameters are explicable (a > 0 and b < 0 for all
107). Therefore, the model’s assumptions are
supported by these model fittings.
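
The per-participant estimation can be sketched as an ordinary least-squares fit without an intercept. The code below is a minimal illustration, not the authors' procedure: the function names and the six data points are invented for the example, and the F ratio uses the uncentered (through-the-origin) form, which is an assumption since the paper does not spell out how it was computed.

```python
def fit_no_intercept(sim_pos, sim_neg, ratings):
    """Least-squares estimates of a and b in v = a*SIM+ + b*SIM- (no intercept)."""
    spp = sum(x * x for x in sim_pos)
    snn = sum(z * z for z in sim_neg)
    spn = sum(x * z for x, z in zip(sim_pos, sim_neg))
    spy = sum(x * y for x, y in zip(sim_pos, ratings))
    sny = sum(z * y for z, y in zip(sim_neg, ratings))
    det = spp * snn - spn * spn
    if det == 0:
        raise ValueError("similarity columns are collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (spy * snn - sny * spn) / det
    b = (sny * spp - spy * spn) / det
    return a, b

def f_ratio(sim_pos, sim_neg, ratings, a, b):
    """Uncentered F ratio for a two-parameter model without intercept (assumed form)."""
    n, p = len(ratings), 2
    fitted = [a * x + b * z for x, z in zip(sim_pos, sim_neg)]
    ss_model = sum(f * f for f in fitted)
    ss_resid = sum((y - f) ** 2 for y, f in zip(ratings, fitted))
    if ss_resid == 0:
        return float("inf")
    return (ss_model / p) / (ss_resid / (n - p))

# Hypothetical similarity values and ratings on the -3..3 scale (not experimental data).
sim_pos = [2.5, 1.0, 0.5, 2.0, 1.5, 0.8]
sim_neg = [0.5, 2.0, 2.5, 1.0, 1.5, 0.3]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1]
ratings = [2.0 * x - 2.3 * z + e for x, z, e in zip(sim_pos, sim_neg, noise)]

a_hat, b_hat = fit_no_intercept(sim_pos, sim_neg, ratings)
emphasis_balance = abs(b_hat / a_hat)
f_value = f_ratio(sim_pos, sim_neg, ratings, a_hat, b_hat)
print(a_hat > 0, b_hat < 0, emphasis_balance)
```

A participant's fit would count as explicable in the paper's sense when a_hat is positive and b_hat is negative; emphasis_balance is then that participant's |b/a|.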
Situational Differences in the Model
In the experiment, the participants’ ratings were affected
by the risk-involving situations. Here, we examine whether
this result is due to different balances of emphasis in the two
different experimental conditions: in the Over condition,
greater emphasis is put on the negative premise similarity
(larger |b/a|), while greater emphasis is given to the positive
premise similarity in the Under condition (smaller |b/a|).
This time, the participants’ parameters estimated in the
previous subsection are divided into two groups, based on
the experimental conditions, and compared. The averaged
balance of emphasis for the Over condition is 1.164
(SD=0.087) while for the Under condition, it is 1.104
(SD=0.131), with the difference being significant (p < 0.01),
which is consistent with the hypothesis.
However, the interpretation of this experimental result
remains ambiguous. A possible alternative interpretation is
that the result reflects an across-the-board boost or reduction
in all 24 ratings for each situation. That would be the case if
a rating about a conclusion on the border in Figure 1 differs
in the Over and Under conditions. In order to distinguish
between these alternatives, we constructed another control
model, as follows:

$$v(N^c) = a\,\mathrm{SIM}_+(N^c) + b\,\mathrm{SIM}_-(N^c) + c, \tag{8}$$

This control model differs from the original model
constructed with Equation (3) in parameter c, which reflects an
across-the-board boost or reduction in ratings (rating values
for a conclusion on the border line). We estimated each
participant’s parameters for this control model from their
rating data (24 ratings for each participant), screened them
with the F ratio (p < 0.05), then compared the parameter c
between the Over and Under conditions. The average of
parameter c was 1.064 (SD=1.681) in the Over condition,
and 0.835 (SD=1.520) in the Under condition, with no
significant difference. This suggests that the situational
difference in the inductive reasoning ratings is due to the
balance of emphasis between the positive premise similarity
and the negative premise similarity, and not due to an
across-the-board boost or reduction in ratings. Note that it is
not possible to distinguish between the two interpretations
of the experimental findings without conducting parameter
estimations for the proposed model.

Situation-Personality Relationships
In the previous sub-section, it was suggested that the
absolute ratio |b/a| reflecting the balance of emphasis can
explain situational differences in inductive reasoning.
However, a given situation will be perceived quite
differently by different individuals with different
personalities. Accordingly, this sub-section investigates the
relationship between individual personality and the situation
in terms of the balance of emphasis represented by |b/a|.
Personality Assessment. From the Japanese NEO-PI-R
(The Japanese Revised NEO Personality Inventory:
Shimonaka, Nakazato, Gondo, & Takayama, 1998), ten
items that assess the first factor (Extroversion) and the
second factor (Neuroticism) were used for this assessment.
These items were combined with another 43 filler items
(for the third to fifth factors of NEO-PI-R and the
Achievement Motive Scale by Horino, 1987) under the
control of another web application executed with Internet
Explorer 6.0. Of the 118 participants who joined the
inductive reasoning experiment introduced above, 78
participated in this assessment session. Again, they all
followed the assessment procedure together in the same
computer class about two months after the inductive
reasoning experiment. Until the end of the session, they
were not told about the connection between the assessment
and the previous inductive reasoning experiment. The
participants were classified according to their scores on the
two personality factors: into high or low
neuroticism groups (High-N/Low-N), and into
high or low extroversion groups (High-E/Low-E). These
classifications are based on average scores for the
participants.
For the investigation, two sets of two-way analyses of
variance (ANOVA) were conducted on each individual’s
absolute ratio |b/a|. The factors were Personality group and
Situational condition (High-N/Low-N × Over/Under,
and High-E/Low-E × Over/Under). The result of the
ANOVA for High-N/Low-N × Over/Under indicated a
significant interaction (p < 0.05). In contrast, the result of
the ANOVA for High-E/Low-E × Over/Under indicated
two significant main effects (p < 0.05). As shown in Figure
3, the interaction between the neuroticism groups and the
situational conditions reflects the fact that participants in the
High-N group are unaffected by the situation. On the other
hand, there are situational effects on both the High-E group
and the Low-E group, with the High-E group tending to
emphasize negative premise similarity regardless of the
situation.

Figure 3. Interaction between Neuroticism and situation. (Vertical axis: emphasis balance |b/a|, 1.08 to 1.22; horizontal axis: Low-N, High-N; series: Under, Over.)
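
The grouping behind the two-way design can be illustrated with a short sketch. It covers only the descriptive part (a mean split on a personality score, the four cell means of |b/a|, and an interaction contrast), not the ANOVA significance test itself. The function names and the eight data points are hypothetical, and treating scores above the sample mean as ‘high’ is an assumption about how ties at the mean would be handled.

```python
from statistics import mean

def mean_split(scores):
    """Label each score 'high' if it is above the sample mean, otherwise 'low'."""
    cutoff = mean(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def cell_means(groups, conditions, emphasis):
    """Mean |b/a| for each (personality group, condition) cell."""
    cells = {}
    for g, c, e in zip(groups, conditions, emphasis):
        cells.setdefault((g, c), []).append(e)
    return {key: mean(values) for key, values in cells.items()}

def interaction_contrast(cells):
    """(Over - Under) effect in the low group minus the same effect in the high group."""
    low_effect = cells[("low", "over")] - cells[("low", "under")]
    high_effect = cells[("high", "over")] - cells[("high", "under")]
    return low_effect - high_effect

# Hypothetical participants (not experimental data).
neuroticism = [10, 12, 11, 9, 20, 22, 21, 19]
conditions = ["over", "over", "under", "under", "over", "over", "under", "under"]
emphasis = [1.20, 1.18, 1.08, 1.10, 1.14, 1.15, 1.14, 1.15]

groups = mean_split(neuroticism)
cells = cell_means(groups, conditions, emphasis)
contrast = interaction_contrast(cells)
print(cells, contrast)
```

A contrast near zero would mean the situation shifts |b/a| equally in both personality groups; a clearly non-zero contrast corresponds to the kind of interaction pattern reported for neuroticism.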
Discussion
The present study demonstrates that our model can reveal
the interaction between the situation and personality within
inductive reasoning involving both positive premises and
negative premises. Our kernel function model of inductive
reasoning explains argument strength ratings in terms of two
kinds of similarities: the similarity between positive
premises and the conclusion, and the similarity between
negative premises and the conclusion. Two model
parameters together represent the balance of emphasis
between positive premise similarity and negative premise
similarity in inductive reasoning ratings. The results of
parameter estimations indicate that this emphasis balance
can explain not only situational effects in the inductive
reasoning experiment but also the interaction between
situational effects and participant personality.
In a two-way ANOVA for the balance of emphasis
represented by model parameters, the interaction between
experimental situations and neurotic personality was
significant. This indicates that neurotic individuals do not
adjust their balance of emphasis according to the
risk-related situation for inductive reasoning, although
non-neurotic individuals do. This seems a little strange because
an individual with high neuroticism would generally be
regarded as being easily affected by the surrounding
atmosphere, becoming easily worried, and being quick to
anger, as well as being easily discouraged. It is probably
safe to say that this interaction is analogous to Alloy’s
depressive realism effect (the sadder but wiser effect: Alloy
& Abramson, 1979). Here, we attempt to interpret this
discrepancy in terms of task strategies employed in the
inductive reasoning experiment. We speculate that neurotic
participants will adopt a different strategy from other
participants who are affected by the situation. Within the
situational strategy, participants may refer to the score
allocation presented at every rating, as shown in Figure 2.
While score allocation has absolutely no connection with
the actual right answers, participants are likely to utilize the
available information in front of them. We may, therefore,
regard this strategy as a kind of heuristic. In contrast, with
a neurotic strategy, participants might seek some clues from
the right answer in the feedback session, and somehow
apply this uncertain clue. While the clue from the right
answers would actually be rather vague, it is reasonable to
believe that the answer could be induced from the right
answers in the feedback session. Because the distribution of
right answers has no relation with score allocations, neurotic
participants are likely to be free from situational effects. It
is quite likely that neurotic people routinely employ
effortful but logic-governed forms of thinking, and, thus,
become exhausted from their efforts, and, in turn, easily
become mentally unstable.
In contrast, the results of the second two-way ANOVA
indicated a significant main effect of extroversion
personality. This suggests that extroversive participants
place heavier emphasis on negative premise similarity than
non-extroversive participants. Briefly considered, this result
might be related to the tendency for the thinking of
extroversive people to focus on broader information.
Although these considerations are issues for speculation,
the balance of emphasis represented by the model
parameters differs based on the situation and on personality,
and differences in the emphasis balance may reflect differences in
task strategies utilized in inductive reasoning. Whatever the
case may be, our model has the potential to
provide further insights into the nature of inductive
reasoning.
Acknowledgments
Supported by the Japan Society for the Promotion of
Science, and the Tokyo Institute of Technology 21COE
Program, “Framework for Systematization and Application
of Large-scale Knowledge Resources”. Furthermore, the
authors would like to thank Dr. T. Joyce of Tama
University, for his critical reading of our manuscripts and
valuable comments on an earlier draft.
References
Alloy, L. B., & Abramson, L. Y. (1979). Judgment of
contingency in depressed and nondepressed students:
Sadder but wiser? Journal of Experimental Psychology:
General, 108, 441-485.
Hofmann, T. (1999). Probabilistic latent semantic indexing.
Proceedings of the 22nd International Conference on
Research and Development in Information
Retrieval: SIGIR ’99, 50-57.
Horino, M. (1987). Analysis and reconsideration of the
concept of achievement motive. The Japanese Journal of
Educational Psychology, 35(2), 148-154. (In Japanese)
Osherson, D. N., Smith, E. E., Wilkie, O., Lopez, A., &
Shafir, E. (1990). Category-based induction.
Psychological Review, 97(2), 185-200.
Pereira, F., Tishby, N., & Lee, L. (1993). Distributional
clustering of English words. Proceedings of the 31st
Meeting of the Association for Computational Linguistics,
183-190.
Rips, L. J. (1975). Inductive judgments about natural
categories. Journal of Verbal Learning and Verbal
Behavior, 14, 665-681.
Rosch, E. (1973). On the internal structure of perceptual and
semantic categories. In T. E. Moore (Ed.), Cognitive
Development and the Acquisition of Language (pp.
111-144). New York: Academic Press.
Sakamoto, K., & Nakagawa, M. (2007). Risk Context
Effects in Inductive Reasoning: An Experimental and
Computational Modeling Study. Proceedings of the
Sixth International and Interdisciplinary Conference on
Modeling and Using Context. Kokinov, B. et al. (Eds.):
CONTEXT2007, Springer LNAI 4635, pp. 425-438.
Sakamoto, K., & Nakagawa, M. (2008). A Computational
Model of Risk-Context-Dependent Inductive Reasoning
Based on a Support Vector Machine. Proceedings of the
Third International Conference on Large-scale
Knowledge Resources. Tokunaga, T., & Ortega, A.
(Eds.): LKR2008, Springer LNAI 4938, pp. 295-309.
Shimonaka, J., Nakazato, K., Gondo, Y., & Takayama, M.
(1998). Construction and factorial validity of the
Japanese NEO-PI-R. The Japanese Journal of
Personality, 6(2), 138-147. (In Japanese)
Vapnik, V. (1992). The Nature of Statistical Learning
Theory. Springer.