
Judging Audit Quality in Light of Adverse Outcomes:
Evidence of Outcome Bias and Reverse Outcome Bias
Mark E. Peecher, Ph.D., CPA
Deloitte & Touche Teaching Fellow
Associate Professor of Accountancy
M. David Piercey
Deloitte & Touche Doctoral Fellow
Doctoral Candidate
University of Illinois at Urbana-Champaign
February 2006
Under Revision: Comments Appreciated
We thank Brooke Elliott, Anne Farrell, Mike Gibbins, Jonathan Grenier, Gary Hecht, Josh Herbold, Karim
Jamal, Kathryn Kadous, Susan Krische, Thomas Matthews, Molly Mercer, Joel Pike, Doug Prawitt, Ira
Solomon, Kristy Towry, and George Wu for their helpful comments. We also thank participants at the 9th
BDRM Conference at Duke University and at the 1st Accounting Research Symposium at Brigham Young
University, as well as workshop participants at the University of Alberta, the University of Connecticut,
Emory University, and the University of Illinois at Urbana-Champaign. Address email to either
[email protected] or [email protected] or mail to either author at UIUC, Department of Accountancy,
College of Business, 284 Wohlers Hall, 1206 S. Sixth Street, Champaign, IL, 61820.
Judging Audit Quality in Light of Adverse Outcomes:
Evidence of Outcome Bias and Reverse Outcome Bias
Abstract
Considerable auditing research demonstrates that individuals exhibit outcome effects when
judging audit quality: that is, they judge auditors to be more negligent when given knowledge of
adverse audit outcomes. Many studies conclude that knowledge of adverse audit outcomes biases
individuals against auditors, and attempt to improve individuals' judgments by reducing outcome effects.
Yet, whether outcome effects imply outcome bias is a vexing question: From a Bayesian
perspective, individuals should judge auditors more harshly when given adverse outcomes.
Logically, individuals could exhibit either outcome bias (over-rely on adverse audit outcomes),
reverse outcome bias (under-rely on adverse outcomes), or neither form of bias. Applying
Prospect Theory’s probability weighting function, we hypothesize and find that individuals’
negligence judgments exhibit outcome bias when the Bayesian probability of auditor negligence
is relatively low, but also exhibit reverse outcome bias when the Bayesian probability is relatively
high. This finding is robust to both relatively rich and relatively abstract experimental settings,
and to judgments made in hindsight and in foresight. While many factors likely contribute to
judgments of auditors potentially being too harsh (e.g., large plaintiff losses and auditors’ “deep
pockets”), our conclusions suggest that the effect of adverse outcome information on judgments
of auditor negligence is not so straightforward: it depends on whether the Bayesian probability of
auditor negligence is high or low. We suggest that the model for judgments of auditor negligence
should expand to include both outcome bias and reverse outcome bias, where predicted by our
combination of outcome effects and Prospect Theory’s probability weighting function.
I. INTRODUCTION
In a variety of contexts, individuals assess the quality of auditors’ decision making in
light of adverse outcomes, e.g., material misstatements of earnings. Such outcomes often are
salient in contexts such as litigation, the popular business press, and alternative dispute resolution.
A common concern is that adverse outcomes exert too much influence on individuals’ judgments
of auditor negligence.
The effects of outcome knowledge on individuals’ judgments of auditor negligence are
called outcome effects. A number of accounting studies have shown that individuals judge
auditors more harshly when given information about adverse outcomes than when not given such
information (e.g., Clarkson et al. 2002; Kadous 2001).1 Some studies conclude that larger
outcome effects indicate a larger bias against auditors and, accordingly, attempt to de-bias
individuals’ judgments of auditors by reducing outcome effects (e.g., Cornell et al. 2005;
Clarkson et al. 2002; Kadous 2001).
Yet, the extent to which outcome effects imply outcome bias is more vexing: From a
Bayesian perspective, audit outcomes are informative of original audit quality (Hershey and
Baron 1995; Hawkins and Hastie 1990; Brown and Solomon 1987). A Bayesian evaluator would
judge auditors more harshly given knowledge of adverse audit outcomes, and therefore exhibit
such outcome effects (see Section II; Hershey and Baron 1995; Holmstrom 1979).2 Because
individuals often are non-Bayesian, they could either over- or under-rely on outcome
information (e.g., Edwards 1968). Logically, outcome effects, by themselves, could be the same
as, greater than, or less than those exhibited by a Bayesian evaluator of audit quality.
1 Outcome effects have also been shown in other accounting contexts, including bankruptcy prediction
(Buchman 1985), capital budgeting (Brown and Solomon 1987), variance investigation (Lipe 1993),
taxation (Kadous and Magro 2001), and performance evaluation (Frederickson, Peffer and Pratt 1999).
2 It is generally appropriate for evaluators to use outcome information in most real-world contexts (Hershey
and Baron 1992, 1995). Normatively, adverse outcomes are to some extent diagnostic of auditor negligence
unless either: (1) individuals are certain that they possess all of the information that auditors should have
possessed when making their decisions, or (2) individuals are uncertain whether they possess all such
information but are somehow certain that all missing information is uncorrelated with adverse outcomes
(Brown and Solomon 1987, 565-66).
Several accounting studies include caveats to emphasize that outcome effects can be
normatively appropriate (e.g., Brown and Solomon 1987; Lipe 1993; Anderson et al. 1993, 1997;
Tan and Lipe 1997; Frederickson et al. 1999), while others explicitly assume that outcome effects
likely are non-normative (e.g., Kadous 2001, 441). For example, some studies characterize
evaluators as being “vulnerable” or “susceptible” to outcome effects, or stress that evaluators
are unlikely to be able to ignore outcomes when evaluating auditors or managers, who did not know the
outcomes at the time of their decisions (e.g., Kinney and Nelson 1996; Kadous 2001; Clarkson et
al. 2002; Cornell et al. 2005). Other studies take measures to increase the likelihood that outcome
effects indicate outcome bias by trying to suppress the diagnosticity of outcomes. For example,
some participants are asked to assume they have exactly the same information that auditors
(should have) had when deciding (e.g., Anderson et al. 1993, 1997).3 Some participants are told
that they should ignore outcomes (e.g., Anderson et al. 1993, 1997; Clarkson et al. 2002).4 Yet,
whether and the extent to which outcome effects documented in these studies indicate outcome
bias depends in large part on the diagnosticity of the adverse outcomes for audit quality. Outcome
effects should arise when participants reasonably judge adverse outcomes to have non-zero
diagnosticity for audit quality.
We build upon this literature by using a conceptual approach that does not require caveats
about whether or the extent to which adverse outcomes are diagnostic of audit quality. We allow
adverse outcomes to be diagnostic of audit quality and, unlike prior studies, measure their
3 Of course, it is impossible to know with certainty whether prior research rendered outcome information
non-diagnostic of audit quality. For example, this may depend on the extent to which participants in an
experiment believe it when they are told that they have all of the information that an auditor had. If
individuals believe they have incomplete information, they may use outcome information to infer missing
information (Hershey and Baron 1992).
4 Whether such instructions render outcomes non-diagnostic is also uncertain. An important distinction is
whether such experimental tasks ask participants to recall their prior beliefs about auditor negligence or to
judge auditor negligence. If the judgment task were to recall one’s prior beliefs (i.e., what one thought
audit quality was before knowledge of the outcome), then the optimal judgment would ignore outcomes.
Failure to do so is called hindsight bias (e.g., Kennedy 1995). However, if the judgment task were to judge
auditor negligence (i.e., to diagnose original decision-making quality), then the normative judgment (to
maximize accurate judgments of auditor negligence) would use outcome information, even if one were
given instructions to ignore it (Hershey and Baron 1992, 1995). It is the over-use, not the mere use, of
outcome information leading to unduly large outcome effects that constitutes outcome bias (Baron and
Hershey 1988).
diagnosticity, as suggested by Hershey and Baron (1992, 1995). Because we empirically measure
the diagnosticity of adverse outcomes via a Bayesian Benchmark, we can separate outcome bias
from outcome effects.
In addition, we present theory that has not been used to illuminate outcome effects —
Prospect Theory’s probability weighting function (cf., Kahneman and Tversky 1979; Tversky and
Kahneman 1992). By combining extant auditing theory with Prospect Theory, we hypothesize
conditions under which judgments of auditor negligence exhibit outcome bias (i.e., are too high)
and exhibit reverse outcome bias (i.e., are too low). We predict and find that individuals’
judgments of auditor negligence exhibit outcome bias when the Bayesian probability of auditor
negligence is relatively low (i.e., below the vicinity of 40%), but also exhibit reverse outcome
bias when the Bayesian probability of negligence is relatively high (i.e., above the vicinity of
40%). Our results support this prediction in two experiments that collectively use both relatively
abstract and relatively rich experimental settings (adapted from Kadous 2001, 2000), and for
judgments made in hindsight and in foresight.
Prior auditing research has identified many factors that increase the harshness with which
individuals judge auditors. For example, individuals may be motivated by plaintiffs’ losses
(Kadous 2000), or simply by auditors’ “deep pockets” (i.e., regardless of whether they were in
fact negligent), and these factors likely do increase the probability that individuals’ judgments
of auditors are too harsh. However, with respect to adverse outcome information, our theory and
findings suggest that its incremental effects are not obvious, but rather depend on the Bayesian
probability of auditor negligence.
Our theory and experimental findings contribute to the accounting literature in at least
four ways. One, we use a novel measurement method that conceptually distinguishes among
outcome effects, outcome bias, and reverse outcome bias, and that directly measures these biases.
Two, based on Prospect Theory’s probability weighting function, we predict that probability
weighting constitutes a previously overlooked source of both outcome bias and reverse outcome
bias, with the sign of bias depending on the Bayesian probability of auditor negligence. This
extends auditing theory with the first prediction of both outcome bias and reverse outcome bias
following adverse outcomes, and extends Prospect Theory with its first prediction of either
outcome bias or reverse outcome bias. Three, we report two experiments to empirically test and
replicate our theory-based predictions. Four, our theory and findings suggest that the theoretical
model for judgments of auditor negligence — including de-biasing frameworks — should expand
to include both outcome bias and reverse outcome bias (i.e., judgments following adverse
outcomes that are reliably too lenient), where predicted by our application of Prospect Theory’s
probability weighting function.
The remainder of this paper is organized as follows. Section II develops our theory and
hypotheses. Sections III and IV describe two experiments designed to test our hypotheses.
Section V discusses our conclusions and limitations.
II. THEORY
In this section, we develop three theory-based hypotheses. We begin by reviewing the
accounting outcome-effect literature from a Bayesian point of view. We then discuss Prospect
Theory’s probability weighting function and its applicability to outcome effects observed in audit
negligence contexts to develop hypothesis one (H1). Next, we develop H2 by discussing how
evaluations conditioned on past, instead of future, outcomes can increase outcome effects.
Finally, in developing H3, we discuss the joint effects of probability weighting in H1 and
outcome temporality in H2. For expositional purposes, we emphasize the audit negligence context
when developing our hypotheses, but our underlying theory generalizes to other accounting
contexts.
2.1 A Bayesian Perspective on Outcome Effects
Many accounting studies find that evaluators with information about adverse outcomes
more harshly assess auditors’ decision processes than do evaluators without such information
(see, e.g., Reimers and Butler 1992; Anderson et al. 1993; Lowe and Reckers 1994; Kinney and
Nelson 1996; Anderson et al. 1997; Kadous 2000, 2001; and Clarkson et al. 2002).5 Normatively,
outcome effects usually are desirable, since adverse audit outcomes can be and often are
informative of the quality of auditors’ decision processes. Hawkins and Hastie (1990, 312) note
that outcome effects are consistent with learning from outcome feedback:
“The rational or adaptive response to outcome feedback should be to
change beliefs or reasoning procedures to incorporate the implications of
new information.”
In the rare case that an evaluator has certain, accurate, and complete information about
auditors’ ex ante decision processes, outcomes would add no incremental information and thus
would be non-diagnostic (Hershey and Baron 1992). Under such pristine conditions, evaluators’
outcome-informed and outcome-uninformed assessments of auditor negligence should be
equivalent.
Pristine conditions rarely occur in the natural audit ecology, however. Even auditors
themselves inaccurately recall the information sets used to make their judgments and decisions
(Moeckel and Plumlee 1989; Moeckel 1990).6 Third-party evaluators, who were not at the audit,
likely obtain uncertain and incomplete information about auditors’ ex ante decision processes.
When evaluators have such information, outcomes are diagnostic of auditors’ ex ante decision
processes (Brown and Solomon 1987; Hoch and Loewenstein 1989; Hershey and Baron 1992;
Kelman, Fallas and Folger 1998). That is, evaluators should use outcome information, along with
other diagnostic signals, to assess the quality of auditors’ ex ante decision processes (Hershey and
Baron 1995; cf., Holmstrom 1979).
5 Elsewhere, outcome effects are measured as the difference between evaluators’ assessments of decision-making conditional on good, as opposed to bad, outcomes (Frederickson et al. 1999; Tan and Lipe 1997).
Brown and Solomon (1987) also include a good outcome condition, along with no outcome and bad
outcome conditions. All three of these studies include cautionary statements that larger outcome effects
might or might not represent greater outcome bias, but none measure the extent to which this is true.
6 Rather than recalling actual evidence, experienced auditors sometimes rely on default values to
reconstruct past mental representations and decision processes (Moeckel 1990, 371-72). Whether outcomes
affect default values auditors use for reconstruction purposes is a topic for future research.
To illustrate, suppose that an evaluator assesses the probability of auditor negligence
(AN) conditional on an outcome that indicates materially misstated (MM) financial statements
(p(AN|MM)). Before considering the outcome, the evaluator had prior beliefs about base rate
probabilities of auditor negligence p(AN) and material misstatements p(MM), as well as about the
conditional probability of materially misstated financial statements when auditors are negligent
(p(MM|AN)). So long as material misstatements are more likely to go unprevented or undetected
when auditors are negligent than when they are not, the likelihood ratio p(MM|AN)/p(MM) must
exceed 1. The greater this ratio, the greater the outcome diagnosticity, and the greater the factor
by which an evaluator should increase the base rate p(AN) when assessing auditor negligence,
p(AN|MM):

p(AN \mid MM) = p(AN) \cdot \frac{p(MM \mid AN)}{p(MM)} \qquad (1)
Therefore, the extent to which outcome effects imply that evaluators are overly harsh or
overly lenient is hard to say without measurement. Bayesian evaluators would exhibit outcome
effects (p(AN|MM) > p(AN) in equation 1, above), to maximize the expectation of an accurate
evaluation.
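For concreteness, here is a minimal sketch of equation (1) in code (our illustration, not part of the original instrument; the base rates are hypothetical and match the illustrative frequencies used later in footnote 14):

```python
def posterior_negligence(p_an, p_mm, p_mm_given_an):
    """Equation (1): p(AN|MM) = p(AN) * [p(MM|AN) / p(MM)]."""
    likelihood_ratio = p_mm_given_an / p_mm  # exceeds 1 when misstatements
                                             # are likelier under negligence
    return p_an * likelihood_ratio

# Hypothetical priors: p(AN) = 0.2, p(MM) = 0.1, p(MM|AN) = 0.4.
# A known misstatement quadruples the base rate: 0.2 * (0.4 / 0.1) = 0.8.
print(posterior_negligence(p_an=0.2, p_mm=0.1, p_mm_given_an=0.4))  # 0.8
```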
An additional complication is that evaluators often need to assess the quality of auditors’
decisions based on probabilistic, instead of deterministic, outcomes. The probability with which a
material misstatement exists often is based on rumor and can be a matter of contentious debate.
Were earnings really misstated and, if so, at what point did the misstatement really become
material? Deloitte & Touche LLP, as just one example, publicly and vigorously disagreed with the SEC’s
allegation that an adverse outcome – a material misstatement – existed in Pre-paid Legal Services
Inc.’s financial statements: “Deloitte & Touche . . . took the unusual step yesterday of stating
publicly that it believes the Securities and Exchange Commission was wrong when it forced a
company to restate its financial results. The firm said it would not certify the revised books
because it does not believe they are correct” (Glater and Norris 2001). When considering
probabilistic adverse outcomes, evaluators generally should treat them as more diagnostic of poor
audit decisions as they become more probable.
2.2 Prospect Theory, Probability Weighting & Reverse Outcome Bias
Even aside from Prospect Theory, a Bayesian perspective recognizes the possibility that:
(1) outcome effects could reflect evaluators’ warranted belief revision instead of outcome bias, (2)
evaluators could exhibit outcome bias (i.e., over-harshness) or reverse outcome bias (i.e., over-leniency), and (3) reductions in outcome effects unwittingly could cause or amplify reverse
outcome bias (cf., Hershey and Baron 1992, 1995). While prior accounting studies on outcome
effects are silent about or assume away reverse outcome bias, we next apply Prospect Theory’s
probability weighting function to specify conditions under which reverse outcome bias (and
outcome bias) is likely to obtain.
When Kahneman and Tversky’s (1979) Prospect Theory emerged, it included a nascent
probability weighting function to account for mounting empirical evidence of individuals’
systematic mistreatment of probabilities (Phillips, Hays and Edwards 1966; Phillips and Edwards
1968; Edwards 1968). This probability weighting function is distinct from Prospect Theory’s
more famous value function, which features a reference point and is about twice as steep for
losses as for gains.7
Research on the probability weighting function during the past 25 years has revealed
several theory-consistent empirical regularities. Most strikingly, individuals overweight relatively
low probabilities but underweight relatively high probabilities, with these trends abating for
probabilities in the vicinity of 0 and 1. Multiple empirical studies show that plotting individuals’
weighted probabilities (w(p)) on actual probabilities (p) results in an inverse-S (see Figure 1).
Multiple empirical studies also show that, in the vicinity of 40%, the probability weighting
function switches from overweighting to underweighting probabilities and also goes from
7 Without the probability weighting function, Prospect Theory cannot explain well-replicated decision
behaviors with respect to risk preferences or violations of first-order stochastic dominance in choosing
among risky prospects (see, e.g., Fox and Tversky 1998).
exhibiting concavity to convexity (e.g., Tversky and Kahneman 1992; Camerer and Ho 1994).8
While initial tests of the probability weighting function examine how humans weight stated
probabilities when choosing between alternative abstract gambles, more recent work does so in
applied contexts featuring intuitive probability assessment (e.g., Fox and Tversky 1998; Wu and
Gonzalez 1999).
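To see the over- and underweighting pattern numerically, the following sketch (ours) evaluates the one-parameter weighting function presented formally as equation (3) in footnote 17, at γ = 0.61, one of Tversky and Kahneman's (1992) estimates:

```python
def w(p, gamma=0.61):
    """Tversky-Kahneman (1992): w(p) = p^g / (p^g + (1-p)^g)^(1/g)."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

for p in (0.05, 0.25, 0.40, 0.75, 0.95):
    print(f"p = {p:.2f} -> w(p) = {w(p):.3f}")
# Low probabilities are overweighted (w(p) > p) and high probabilities
# underweighted (w(p) < p); the cross-over falls in the vicinity the text
# describes (roughly 0.3 to 0.4, depending on gamma).
```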
[Insert Figure 1 here]
Prospect Theorists characterize the probability weighting effect as a primitive,
unconscious bias in humans’ perceptions of uncertainty (Wu and Gonzalez 1996; Gonzalez and
Wu 1999; Wu, Zhang and Gonzalez 2004). We contend that, as such, Prospect Theory’s
probability weighting function can be used to predict conditions under which evaluators under- and over-react to the diagnostic value of probabilistic (i.e., uncertain and certain) adverse audit
outcomes. If evaluators mis-weight the probabilities of auditor negligence in accord with Prospect
Theory, evaluators will overweight the Bayesian probability of auditor negligence when it is
relatively low (below the vicinity of 40%), but underweight the Bayesian probability of auditor
negligence when it is relatively high (above the vicinity of 40%). This suggests the following
hypothesis (see Figure 1):
H1: (Probability-Weighting Effect) Outcome-informed evaluators will exhibit
outcome bias (over-harshness) when the Bayesian probability of auditor
negligence is relatively low (below the vicinity of 40%), but reverse outcome
bias (over-leniency) when the Bayesian probability of auditor negligence is
relatively high (above the vicinity of 40%).
If H1 were empirically supported, it would contribute to theory of how and how well
evaluators assess auditors’ negligence. It would constitute an important boundary condition for
the presumption that outcome effects imply overly harsh judgments of auditors and for the idea
that outcome effects generally should be reduced. Notably, H1 predicts over-leniency for
probabilities that have ecological significance in audit litigation contexts – those probabilities that
8 Wu and Gonzalez (1996) demonstrate the cross-over point characteristic of an inverse-S curvature (also
see, e.g., Tversky and Fox 1995). In addition, Prelec’s (1998) elegant model predicts the cross-over point to
be at 1/e ≈ 37%.
are above legal standards of proof such as “preponderance of the evidence,” “clear and
convincing,” and “beyond reasonable doubt.”
2.3 Outcome Temporality: Revising Beliefs in Foresight vs. Hindsight
H1 predicts the way evaluators revise their prior beliefs of auditor negligence, conditional
on outcome information, relative to a Bayesian. As prior accounting studies on outcome effects
demonstrate, many factors likely influence the relative harshness or leniency of evaluators’
judgments of auditor negligence. For example, some individuals may be motivated by plaintiffs’
losses (Kadous 2000), by auditors’ “deep pockets” (i.e., regardless of whether a material
misstatement really exists), or by a personal pre-disposition against auditors.
Of particular interest to our belief-revision orientation is whether, holding the stated
probability of outcomes constant, evaluators’ pre-posterior analyses, or how they prospectively
revise beliefs given future adverse outcomes, differ from their posterior analyses, or how they
retrospectively revise beliefs given past adverse outcomes. As Hawkins and Hastie (1990, 311)
note, “It is a common observation that events in the past appear simple, comprehensible, and
predictable compared to events in the future. Everyone has had the experience of believing that
they ‘knew it all along’ the outcome of a horse race, football game, marriage, business
investment…” (italics added). Evaluators likely treat past material misstatements as more
predictable or controllable (i.e., by auditors) than future material misstatements.9 Since
controllability increases outcome effects (cf., Tan and Lipe 1997), evaluators likely will assess
auditors less favorably given past, as opposed to future, adverse outcomes. Thus, we propose the
following hypothesis:
9 Hawkins and Hastie (1990, 315) discuss several other ways individuals may respond differently to past, as
opposed to future, otherwise identical outcomes. They argue individuals’ processing of past outcomes
“involves narrow-minded thinking backwards from the given outcome to precipitating events… whereas
foresight involves consideration of many possible outcomes.” Thus, individuals naturally may respond to
future outcomes by processing them in frequentist terms but to otherwise identical past outcomes by
processing them in non-frequentist terms. We leave examination of this interesting possibility to future
research, as it is beyond the scope of our study.
H2: (Temporality Effect) Outcome-informed evaluators will judge auditors to
be more negligent conditional on past, as opposed to future, adverse audit
outcomes.
Two points are noteworthy: One, this outcome-informed comparison differs from typical
comparisons in the accounting literature. Typically, assessments of evaluators given information about
past outcomes are compared to assessments of evaluators who are uninformed about outcomes
(e.g., Lipe 1993; Nelson and Kinney 1996; Kadous 2001; Kadous and Magro 2001; Clarkson et
al. 2002). Thus, prior accounting studies do not isolate the extent to which outcome effects are
caused by information about outcomes or by information about past, as opposed to future,
outcomes.
Two, while H2 predicts that evaluators will make harsher judgments of auditor
negligence given past, as opposed to future, adverse outcomes, it is silent on whether their
evaluations will be too harsh or lenient. Theory for H1 warrants predicting two kinds of
directional bias, whereas theory for H2 warrants predicting one directional effect (without
specification of bias).10
2.4 Joint Effects of Probability Mis-Weighting and Outcome Temporality
For relatively low Bayesian probabilities of auditor negligence, H1’s predicted outcome
bias and H2’s predicted temporality effect for hindsight evaluations go in the same direction. As
such, H1 and H2 jointly warrant predicting outcome bias, for both foresight and hindsight
evaluations, given relatively low Bayesian probabilities of auditor negligence.
For relatively high Bayesian probabilities, though, H1’s predicted reverse outcome bias
and H2’s predicted temporality effect for hindsight evaluations go in opposite directions. Absent
10 Hindsight bias differs from the temporality effect discussed in H2. Hindsight bias refers to evaluators’
propensity to mis-remember what their prior beliefs about the probability of an outcome were (or would
have been), after learning something about a realized outcome (e.g., Kennedy 1995, 253). That would be
analogous to giving evaluators outcome information that probabilistically points towards a material
misstatement, and then asking them to provide their prior beliefs, as if they did not have the outcome
information. In contrast, outcome effects refer to evaluators’ propensity to revise their prior beliefs about
decision-making quality, in light of outcome information. For the former type of task, ignoring outcome
information will maximize the expectation of accurately recalled priors and reduce hindsight bias. For the
latter task, using outcome information will maximize the expectation of accurately updated evaluations of
auditor negligence (e.g., Baron and Hershey 1988). The temporality effect of H2 predicts that evaluators’
revised beliefs will be harsher given past versus future outcomes.
additional theory, we would conjecture that the net bias is simply an empirical question. A
theoretical case for predicting reverse outcome bias, even in hindsight, exists, however.
Specifically, Fox and Tversky (1998) and Wu and Gonzalez (1999) model probability judgments
as if they follow a two-stage process. In Stage 1, evaluators amass “support” for probabilities
based on event salience and, in Stage 2, they weight the probabilities in accord with the
probability weighting function. Many factors influence event salience and thus how much support
is amassed; “unpacking” is one (Tversky and Koehler 1994; Fox and Tversky 1998; Fox and
Birke 2002). Unpacking breaks outcomes into components – one could unpack the outcome
“material misstatement” into “incorrect revenue recognition,” “incorrect inventory valuation,”
and so on.
We argue that hindsight temporality, like unpacking, increases event salience and thus
how much support evaluators amass in Stage 1, resulting in higher judged probabilities (e.g.,
Hawkins and Hastie 1990). Then, in Stage 2, evaluators mis-weight these judged probabilities
in accord with the probability weighting function. Thus, for relatively high Bayesian probabilities
of auditor negligence, we predict reverse outcome bias to obtain in hindsight as well as foresight.
We cannot, however, predict whether the magnitude of reverse outcome bias will be greater in
foresight versus in hindsight: that would depend on the relative effect sizes associated with
Stage 1’s support gathering versus Stage 2’s probability mis-weighting. As the shape of the
probability weighting function clearly indicates (see Figure 2), the extent of underweighting of
probabilities nonlinearly varies for relatively high probabilities. So questions about the relative
size of Stage 1’s hindsight-related incremental “support” acquisition and of Stage 2’s probability
under-weighting ultimately are empirical.
This discussion warrants the following hypotheses, broken down into cases that feature
outcome bias for relatively low (H3a) and reverse outcome bias for relatively high (H3b)
Bayesian probabilities of auditor negligence:
[Insert Figure 2 here]
H3a: (Relatively Low Bayesian Probabilities of Negligence) Outcome bias will
obtain for relatively low Bayesian probabilities of auditor negligence,
regardless of whether evaluations are made in hindsight or foresight.
H3b: (Relatively High Bayesian Probabilities of Negligence) Reverse outcome
bias will obtain for relatively high Bayesian probabilities of auditor
negligence, regardless of whether evaluations are made in hindsight or
foresight.
If these hypotheses were empirically supported, it would provide a new lens through
which to view prior accounting studies on how evaluators use past adverse outcomes to evaluate
decisions of auditors or managers. Prior accounting studies assume or emphasize the condition of
evaluator over-harshness and ignore or downplay the possibility of evaluator over-leniency (e.g.,
Kadous 2001; Kadous and Magro 2001; Clarkson et al. 2002).
III. EXPERIMENT 1
3.1 Participants.
Undergraduates enrolled in an introductory accountancy course at the University of
________ served as participants.11 Nine hundred thirty-three volunteered for extra credit, worth
up to 1% of their final grade. Participants averaged 1.37 years of post-high-school education (s =
0.97), 0.09 accounting courses (s = 0.38), and 3.29 business, accounting, and economics courses
(s = 1.98). 59.0% were male. On an 11-point Likert scale centered at 0, participants were
slightly unsympathetic to auditors, on average (mean = –0.11, s = 1.65, t = –2.03, p = 0.043).
3.2 Task.
The experimental materials consisted of a single paper packet and featured three sections:
an introduction, an experimental case, and a post-case questionnaire. The introduction explained
basic concepts such as material misstatements, unqualified audit opinions, reasonable assurance,
11 Like most outcome-effect studies in accounting, we use non-professional evaluators. Many non-professional and professional evaluators assess the quality of auditors’ decisions in many different
contexts – including business school students and other readers of the popular business press, jurors,
journalists, mediators, state or federal judges, lawyers, regulators, and so on. Absent a theory that our
students revise beliefs differently than other evaluators, our use of student participants is both theoretically
and practically justified (Libby, Bloomfield, and Nelson 2002; Peecher and Solomon 2001).
auditor negligence, and due professional care. It asked participants simple review questions to
emphasize these concepts.
Because this is the first outcome-effect accounting study to provide theory predicting a
condition under which reverse outcome bias (i.e., over-leniency) obtains, we wanted to provide
ample opportunity for evaluators’ assessments to be overly harsh. Thus, the introduction included
language adapted from the “severe consequences” condition in Kadous (2000), emphasizing
financial and emotional losses associated with undetected material misstatements and audit
failures. We also added language to highlight how the Enron and WorldCom debacles resulted in
lost savings and jobs (Appendix A).
In the experimental case section, participants used natural frequencies to convey their
prior beliefs and revised beliefs, conditional on probabilistic outcome information. We chose to
solicit their beliefs with natural frequencies because people come closer to being Bayesians when
using natural frequencies instead of probabilities (Gigerenzer and Hoffrage 1995, 1999;
Gigerenzer 2000). Our use of natural frequencies instead of probabilities, therefore, prejudices us
against observing our predicted form of non-Bayesian belief revision that is consistent with the
probability weighting function.
In the experimental case, the instrument referred to all the audits of U.S. companies by
Big-4 public accounting firms as the reference population, and elicited participants’ prior beliefs
(Appendix B). Specifically, it asked participants about the base rate frequency of material
misstatements (p(MM)), auditor negligence (p(AN)), and material misstatements given auditor
negligence (p(MM|AN)). Based on these measures, we computed the likelihood ratio
(p(MM|AN)/p(MM)) and the Bayesian posterior.
3.3 Manipulations and Experimental Design.
We manipulated outcome temporality, outcome probability (P*), and Order in a 2 × 5 ×
2 between-participants, fixed-factorial design. The first two factors manipulate attributes of
outcomes, and the last factor caused participants to provide different priors (Appendix C). We next
explain each factor in turn.
The two levels of outcome temporality are hindsight and foresight. In the hindsight
condition, participants evaluated auditors based on audits and misstatement outcomes that already
had occurred, and all verbs appeared in the past tense. In the foresight conditions, participants
evaluated auditors based on audits and misstatement outcomes that had not yet occurred, and all
verbs appeared in the future tense (Appendix C). This manipulation differs from manipulations in
extant outcome-effect accounting research, in which the baseline conditions typically withhold
outcome information. With our manipulation, any differences in evaluations of auditor negligence
are due to participants’ different encoding or processing of past versus future outcomes.
The five levels of outcome probability, P*, are 0%, 25%, 75%, 90%, or 100%. These
were five levels of explicitly stated probabilities of material misstatement (i.e., the adverse
outcome of interest). To accomplish this manipulation, the instrument described a hypothetical
watch list of U.S. financial statement audits. It included wording to the effect that a stated
percentage of companies on the list had (will have) materially misstated financial statements, with
the percentage stated depending on the level of P* to which given participants were assigned.
Later, participants evaluated audit quality for an audit randomly drawn from the list (Appendix
C).12
The manipulation of P* at five levels provides three benefits. One, it allows for a
robustness check for the outcome temporality effect for misstatements at multiple, explicitly
stated levels of probability. Two, it enables us to compare participants’ revised beliefs against a
Bayesian benchmark for the entire [0,1] probability interval. Three, while levels of P* denoting
certainty (i.e., 0% and 100%) make our research comparable to prior outcome effect studies in
12 For parsimony, we used a single mechanism (i.e., an earnings management watch list) for all five levels
of P*. In the real world, different mechanisms (e.g., an SEC investigation) may signal different levels of
P*. Identification of these mechanisms and the levels of P* that people typically associate with them are
beyond the scope of Experiment 1. We let participants infer levels of P* based on two realistic signals in
Experiment 2.
accounting, levels of P* that convey uncertainty (i.e., 25%, 75%, and 90%) are advantageous
because, in the real world, evaluators regularly confront uncertain outcomes (e.g., sometimes
whether a material misstatement exists is a judgment call and frequently the probability of
material misstatement < 100%).
The third factor is Order, by which we caused participants’ priors to differ, for control
purposes. We counterbalanced whether the instrument first elicited participants’ outcome-uninformed prior beliefs p(AN), p(MM), and p(MM|AN) (Appendix B), or their outcome-informed
revised beliefs, p(AN|MM_P*) (Appendix C). That is, some participants processed outcome
information before providing their priors whereas other participants first provided their priors.
Although Order significantly affected participants’ priors, it is insignificant in all tests of our
hypotheses and does not change any of our inferences. For simplicity, we collapse across Order
in our analyses.
3.4 Manipulation Checks.
To encourage participants’ attention, review questions appeared throughout the
instrument and in its final section. We told participants before the experiment that their extra credit depended on the accuracy of their answers on the review questions and on the
reasonableness of their responses for the case questions. Participants spent an average of 28.9
minutes on the experiment (s = 4.9).
To check on the outcome temporality manipulation, participants responded to a multiple-choice question immediately after receiving outcome information but before responding with
their posterior judgment. The question asked whether the outcome information related to audits
that “have already finished” or “have not yet started” (Appendix C). Of the 933 participants, 856
(91.8%) passed and 77 (8.2%) failed the manipulation check. When deciding whether to present
our findings with or without participants who failed our manipulation check, we wanted to bias
against supporting our theory. In H3b, we predict that reverse outcome bias (i.e., over-leniency)
will obtain, even in hindsight outcome temporality. So, to reduce the chance of classification
errors biasing our findings, we dropped participants who failed the manipulation check from the
sample a priori, resulting in a beginning sample size of 856.13
3.5 Experimental Findings.
Because of the specific inverse-S empirical shape of the data predicted by our theory
(Figure 2), we use a cubic polynomial regression to estimate our participants’ revised beliefs
about auditor negligence (p(AN|MM_P*)). H1 and H3 predicted that participants’ revised beliefs
would obtain as a cubic polynomial in the computed Bayesian revised belief, Bayes.14 Thus, we
regress participants’ revised beliefs on our two manipulated factors and on the Bayesian revised
belief up to a power of 3 (i.e., Bayes, Bayes², and Bayes³). No higher-order interactions obtain.
We also include time spent on the task, Minutes, as an explanatory covariate.15
Note that, if participants were perfect Bayesians, the plotting of their revised beliefs and
Bayesian revised beliefs would produce a 45° line coinciding with the main diagonal (Figure 2).
13 The experimental findings do not change statistically or qualitatively with inclusion of participants who
failed the manipulation check. We also exclude observations due to missing or unintelligible data (30) and
outliers (15) identified by Cook’s distance (Neter et al. 1996). The experimental findings do not statistically
or qualitatively change if we include outlier responses.
14 If a misstatement outcome, MM, has a probability P*, the Bayesian posterior of auditor negligence is:

Bayes = P^{*} \left[ p(AN) \cdot \frac{p(MM \mid AN)}{p(MM)} \right] + (1 - P^{*}) \left[ p(AN) \cdot \frac{p(\overline{MM} \mid AN)}{p(\overline{MM})} \right] \qquad (2)

where \overline{MM} denotes a no-misstatement outcome, p(\overline{MM} \mid AN) = 1 - p(MM \mid AN), and p(\overline{MM}) = 1 - p(MM). Note that, when P* = 1, equation (2) reduces to Bayes = p(AN|MM), and when P* = 0, equation (2) reduces to Bayes = p(AN|\overline{MM}). For example, if 10 out of every 100 audits allow a material misstatement (p(MM) = 0.1), if auditors are negligent on 20 out of every 100 financial statement audits (p(AN) = 0.2), if 8 out of those 20 negligent audits allow a material misstatement (p(MM|AN) = 8/20 = 0.4), and if you are 90% sure that a particular audit under evaluation did end in a material misstatement (P* = 0.9), then the Bayesian probability that the auditors are negligent for that particular audit under evaluation is:

Bayes = 0.9 \left( 0.2 \cdot \frac{0.4}{0.1} \right) + (1 - 0.9) \left( 0.2 \cdot \frac{1 - 0.4}{1 - 0.1} \right) = 0.733 \qquad (2a)
Since the Bayesian probability of auditor negligence is above 40%, H1 would predict that individuals with
these prior beliefs about auditor negligence and about the diagnosticity of outcomes would under-react to
this outcome information (a material misstatement with P* = 90% probability), and, in this case,
underestimate the probability of auditor negligence given this outcome (73.3%).
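A short sketch of equation (2) in code, reproducing the worked example in this footnote (our code; the function and variable names are ours):

```python
def bayes_posterior(p_an, p_mm, p_mm_given_an, p_star):
    """Equation (2): posterior of negligence when the misstatement outcome
    MM is itself probabilistic, occurring with probability P*."""
    post_if_mm = p_an * p_mm_given_an / p_mm                 # p(AN|MM)
    post_if_none = p_an * (1 - p_mm_given_an) / (1 - p_mm)   # p(AN|no MM)
    return p_star * post_if_mm + (1 - p_star) * post_if_none

# Worked example above: p(AN) = 0.2, p(MM) = 0.1, p(MM|AN) = 0.4, P* = 0.9.
print(bayes_posterior(0.2, 0.1, 0.4, 0.9))  # 0.7333..., i.e., 73.3%
```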
15 For exploratory purposes, the post-test also included thirteen follow-up questions. While some questions
asked participants about their gender, education, and degree of sympathy towards auditors, others asked
them to think about covariation, randomness, and diagnostic inference. None of these thirteen response
variables are statistically significant in our analyses (all p’s > 0.10), and none qualitatively change our
findings. We thus omit these exploratory variables for simplicity.
The coefficients for the polynomial terms Bayes² and Bayes³ would be 0, and the coefficient for
Bayes would be 1.
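To illustrate this specification, the following minimal sketch (ours; the data are simulated under an assumed inverse-S response with γ = 0.61 and a small hindsight shift, not the study's data) fits the cubic polynomial and shows how departures from a coefficient of 1 on Bayes and 0 on the higher-order terms would register:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulate: Bayesian benchmarks on [0, 1] and inverse-S-shaped revised beliefs.
bayes = rng.uniform(0, 1, 500)
temporality = rng.integers(0, 2, 500)            # 1 = hindsight, 0 = foresight
gamma = 0.61                                     # assumed weighting parameter
w = bayes**gamma / (bayes**gamma + (1 - bayes)**gamma) ** (1 / gamma)
belief = np.clip(w + 0.04 * temporality + rng.normal(0, 0.05, 500), 0, 1)

# Cubic polynomial regression of revised beliefs on the Bayesian benchmark.
X = sm.add_constant(np.column_stack([temporality, bayes, bayes**2, bayes**3]))
fit = sm.OLS(belief, X).fit()
print(fit.params)  # perfect Bayesians: Bayes coefficient = 1, others = 0
```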
Table 1 presents the cubic polynomial regression model, and Figure 3 displays the
marginal profile plot.16 As shown, coefficients for Bayes, Bayes², and Bayes³ are all significant
(all p’s < 0.001). Furthermore, inspection of the findings in Table 1 and Figure 3 suggests that H1, H2,
and H3 are supported. One, statistical significance obtains up to a cubic power for Bayes, and, in
the vicinity of 40%, the profile plot plainly switches from being concave and exhibiting outcome
bias to being convex and exhibiting reverse outcome bias (when the plot is above the diagonal,
over-harshness is indicated). This pattern of findings comprises the exact empirical footprint
predicted by H1. Inspection of Figure 3 also reveals that hindsight evaluations are harsher
than foresight evaluations, consistent with H2. Finally, consistent with H3a and H3b, inspection
of Figure 3 reveals that hindsight evaluations exhibit outcome bias for relatively low Bayesian
probabilities of auditor negligence, but reverse outcome bias for relatively high Bayesian
probabilities of auditor negligence. We present additional tests of H1-H3 below.
[Insert Table 1 and Figure 3 here]
As further tests of H1 and H3, we chose a relatively low and a relatively high value of
Bayes, and contrasted the predicted values of participants’ revised beliefs p(AN|MM_P*)
from the cubic polynomial regression model against the null-hypothesized value of Bayes (Table 2).
We chose 25% and 75% as representative low and high values of Bayes, well below and above
Prospect Theory’s 40%, respectively. H1 would be supported by observing significant outcome
bias at Bayes = 25% and significant reverse outcome bias at Bayes = 75%. H3 would be
supported by observing these two effects in both the foresight and the hindsight conditions.
Table 2 reports results for contrasts of p(AN|MM_P*) against Bayes at 25% and 75% for all
five levels of P* and on an overall basis (sensitivity analyses showed that results were
qualitatively and statistically similar for other nearby, arbitrarily chosen values). The right
16 Centering the polynomial terms Bayes² and Bayes³ to reduce multicollinearity yields statistically and
qualitatively similar results (Neter et al. 1996).
column of Table 2 provides results on an overall basis. When the Bayesian probability of auditor
negligence is 25%, the predicted value for p(AN|MM_P*) is significantly greater than 25% both
the foresight condition at 31.78% (t = 4.384, p < 0.001) and in the hindsight condition at 35.53%
(t = 6.782, p < 0.001). These two findings indicate significant outcome bias. When the Bayesian
probability of auditor negligence is 75%, however, the predicted value for p(AN|MM_P*) is
significantly lower than 75% in the foresight condition at 50.08% (t = –10.795, p < 0.001) and in
the hindsight condition at 53.84% (t = –9.218, p < 0.001). These two findings indicate significant
reverse outcome bias. Overall, this pattern of findings supports H1 and H3.17
[Insert Table 2 here]
To test H2, we examine the main effect of hindsight in the cubic polynomial regression
model at Table 1. As the Temporality coefficient shows, participants’ judgments of auditor
negligence were 3.75 percentage points higher (i.e., more harsh) in the hindsight conditions than
in the foresight conditions (t = 2.343, p = 0.019). This main effect supports H2.
3.6 Supplemental Findings.
To supplement our tests of H1, H2, and H3, we measure participants’ outcome effects,
outcome bias, and reverse outcome bias over low and high ranges of Bayesian probabilities of
auditor negligence, across the hindsight and foresight conditions. Specifically, since our
hypotheses predict outcome bias (reverse outcome bias) for Bayesian posteriors below (above)
the vicinity of 40%, we tabulate participants’ outcome effects and outcome bias (reverse outcome
17 To obtain convergent validity (cf., Trochim 2001), we compare our findings to those of other studies of
the probability weighting function. Tversky and Kahneman (1992) use a single-parameter model to specify
the probability weighting function:

w(p) = \frac{p^{\gamma}}{\left( p^{\gamma} + (1 - p)^{\gamma} \right)^{1/\gamma}} \qquad (3)

When γ < 1, the predicted inverse-S curve obtains (as in Figure 1), and the curve becomes more linear as γ
approaches 1. For γ = 1, w(p) = p, so the curve coincides with the main diagonal. When γ > 1, a regular-S
curve obtains. We estimated γ using non-linear regression and an iterative algorithmic process that
continued until reductions in the sum of squared residuals were locally minimized. We estimate γ at 0.61
for hindsight and at 0.57 for foresight. These estimates yield inverse-S curves like those in Figure 3 and
typify prior estimates of γ (e.g., Camerer and Ho’s 1994 review reports an average γ of 0.56, Wu and
Gonzalez 1996 estimate γ at 0.71, and Tversky and Kahneman 1992 estimate γ at 0.61 and 0.69).
bias) within the ranges 10% ≤ Bayes ≤ 30% and 50% ≤ Bayes ≤ 90%. As Table 3 shows, we
replicate the finding of outcome effects documented in prior auditor negligence studies, in both
the hindsight and the foresight conditions. However, consistent with the tests of H1 and H3,
participants exhibit, on average, outcome bias within the 10% ≤ Bayes ≤ 30% range, but reverse
outcome bias within the 50% ≤ Bayes ≤ 90% range. Within the hindsight conditions,
participants in the 10% ≤ Bayes ≤ 30% range exhibit outcome effects of 14.8 percentage points (t
= 7.17, p < 0.001), which is 12.3 percentage points too high (i.e., outcome bias; t = 6.18, p <
0.001). Participants in the 50% ≤ Bayes ≤ 90% range, in contrast, exhibit outcome effects of 16.5
percentage points (t = 5.53, p < 0.001), which is 10.2 percentage points too low (i.e., reverse
outcome bias; t = –3.39, p < 0.001). Results for the foresight conditions are similar, but they
exhibit less outcome bias and more reverse outcome bias (Table 3). These findings are consistent
with H1, H2, and H3.
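Footnote 17 describes estimating γ by non-linear regression. A minimal sketch of one such estimation follows (scipy's curve_fit stands in for the authors' unspecified iterative algorithm; the data arrays are hypothetical placeholders, not the study's responses):

```python
import numpy as np
from scipy.optimize import curve_fit

def weight(p, gamma):
    """Equation (3): w(p) = p^g / (p^g + (1-p)^g)^(1/g)."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

# Hypothetical placeholders: Bayesian posteriors and mean revised beliefs.
bayes = np.array([0.05, 0.15, 0.25, 0.40, 0.60, 0.75, 0.90])
revised = np.array([0.13, 0.23, 0.29, 0.37, 0.47, 0.57, 0.71])

gamma_hat, _ = curve_fit(weight, bayes, revised, p0=[0.7], bounds=(0.01, 2.0))
print(gamma_hat[0])  # estimates near 0.6 would typify the literature
```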
[Insert Table 3 here]
IV. EXPERIMENT 2
Since Experiment 1 is the first to report evidence of reverse outcome bias and uses a
relatively abstract audit negligence setting, we desired to test whether our theory-based
hypotheses would replicate and extend to a richer, more realistic audit negligence setting. To do
so, we used a setting quite similar to the Big Time Gravel scenario from Kadous (2000, 2001).
Participants evaluated auditor negligence with respect to this particular, vivid audit rather than
with respect to an audit randomly drawn from a stylized earnings management watch list.
Undergraduates (n = 168) from a different semester of the same introductory accountancy
course at the University of _______ volunteered, again for 1% extra course credit. Participants
had completed an average of 1.47 years of post-high-school education (s = 1.20), 0.12 accounting
courses (s = 0.38), and 3.15 business, accounting, and economics courses (s = 1.77). On an 11-
point scale ranging from unsympathetic to auditors (–5) to sympathetic with auditors (+5),
participants’ average rating was –0.01 (s = 1.39, p = 0.99).
We employed a 2 × 2 between-participants factorial design. We manipulated outcome
temporality at two levels, hindsight and foresight, by describing the Big Time Gravel audit and
misstatement information in either past or future tense (e.g., Appendix D). We manipulated
outcome probability, P*, at two levels, with more realism than in Experiment 1. At one level of
P*, participants received outcome information stating that Big Time Gravel’s inventory was
(or will be, in foresight conditions) overstated according to an SEC investigation (Appendix D).
At the other level of P*, the outcome information was rumors among analysts of an SEC
investigation of possible overstatement. We measured participants’ perceptions of P* by asking
them for the probability that the outcome actually was (or will be) a material misstatement and
then used these measured probabilities as P* in computing the Bayesian
posterior, Bayes (see equation 2, footnote 14).
Results for the polynomial model appear at Table 4 Panel A.18 Consistent with H1, the
coefficients for Bayes, Bayes², and Bayes³ are once again statistically significant (see Table 4
Panel A). Inconsistent with H2, however, the effects caused by the hindsight-foresight distinction
are statistically insignificant in this richer, more realistic audit negligence context (t = –0.754, p =
0.452). One plausible explanation is that the influence of the foresight-hindsight distinction on
outcome effects and outcome bias becomes negligible as one moves to relatively rich, realistic
decision contexts (see, e.g., Christensen-Szalanski and Willham 1991).
Since we observe no outcome-temporality effect in Experiment 2, we do not test H3 and
collapse across the hindsight and foresight conditions in the posterior probability plot at Figure
18 We used the same manipulation check for the hindsight manipulation as in Experiment 1. Of the 166
subjects, only 14 failed the manipulation check. Results are statistically robust to their inclusion or
exclusion at α = 0.05. Centering the polynomial terms Bayes² and Bayes³ to reduce multicollinearity
produced statistically similar results at α = 0.05 (Neter et al. 1996). Additionally, the tabulated results
exclude 11 observations with missing or nonsense responses.
4.19 Observe that the modeled posteriors switch from concavity to convexity, crossing the
diagonal at approximately 40%, leaving the empirical footprint of the probability weighting
function (H1). Moreover, as Table 4 Panel B shows, collapsed across outcome temporality,
participants’ evaluations exhibit statistically significant outcome bias when the Bayesian
probability of auditor negligence is relatively low, or 25% (t = 1.604, p = 0.055), but also
statistically significant reverse outcome bias when the Bayesian probability of auditor negligence
is relatively high, or 75% (t = –3.022, p = 0.001), consistent with H1.20, 21
V. CONCLUSION: LIMITATIONS AND IMPLICATIONS
We report theory and experimental-empirical findings that contribute to the accounting
literature on outcome effects and outcome bias. Outcome effects occur when evaluators use
outcomes to revise their beliefs about others’ decision quality, and they robustly obtain in many
accounting contexts (e.g., Buchman 1985; Brown and Solomon 1987; Lipe 1993; Kinney and
Nelson 1996; Frederickson et al. 1999; Kadous 2000, 2001; Kadous and Magro 2001). Although
outcome effects generally should obtain after negative outcomes, several accounting studies try to
improve evaluators’ judgments by reducing outcome effects or their consequences. Such studies
use two basic approaches: (1) identify interventions that reduce outcome effects (e.g., Kadous
2001) and (2) identify offsetting effects likely to exist in accounting contexts (e.g., Kinney and
Nelson 1996).
19 We do not test H3 because it is about the joint influence of probability weighting (H1) and outcome
temporality (H2). In Experiment 2, only H1’s probability weighting effects obtain, so there is no joint
influence to consider and H3 becomes moot.
20 We repeated the supplemental analyses for H1 performed in Experiment 1, estimating the amount of
outcome bias (reverse outcome bias) in the 10%-to-30% (50%-to-90%) range of Bayesian probabilities. As
expected, the average outcome bias within the low range was +8.1 percentage points (p < 0.001, n = 58).
Similarly, the average reverse outcome bias within the high range was –11.7 percentage points (p = 0.023,
n = 11).
21 The estimate of γ (see footnote 17) for all assessments of auditor negligence (collapsed across hindsight
and foresight) was 0.71. This estimate yields a curve that appears very similar to that of the cubic
polynomial regression model at Figure 4 and, as with Experiment 1, aligns closely with historical
estimates from Prospect Theory.
Although these two approaches have led to identification of factors that reduce or offset
evaluators’ outcome effects, they tend to downplay the possibility of reverse outcome bias and
vexingly do not separate outcome effects into appropriate belief revision and inappropriate bias.
Guided by Prospect Theory’s probability weighting function, we predict conditions under which
outcome bias and reverse outcome bias will obtain. We also empirically measure the degree to
which outcome effects reflect warranted belief revision, outcome bias, or reverse outcome bias.
Consistent with our theory, findings from two experiments show outcome bias for relatively low
Bayesian probabilities of auditor negligence, but reverse outcome bias for relatively high
Bayesian probabilities. Notably, even though reverse outcome bias has not been emphasized in
extant outcome-effect accounting studies, we predict and observe that it obtains – even when
evaluators assess audit quality in hindsight.
As with all experimental investigations of theory, this study naturally has some
limitations. While this is the first study to measure how evaluators’ assessments of audit quality
compare to a Bayesian benchmark, we rely on participants’ judgmental inputs for this benchmark.
We neither identify, nor contend that we could identify, accurate base rates of auditor negligence
or materially misstated financial statements. Of course, such accuracy information would be quite
costly if not impracticable to acquire and, of course, this limitation applies to all studies of
outcome effects in the accounting literature. One potentially profitable avenue for future research
is to obtain “best-practice” estimates of such base rates from very experienced academics,
auditors, standard setters, regulators and/or managers. A second limitation is that our theory and
experimental findings pertain to “on average” results. It would be profitable if future research
could develop theory as to how evaluator characteristics influence, or interact with environmental
factors to influence, the shape of the probability weighting function (Gonzalez and Wu (1999)
report preliminary, exploratory work in this area).
Despite these limitations, we provide the first theory-based empirical experimental
evidence regarding conditions under which outcome effects in accounting contexts likely reflect
warranted belief revision, outcome bias, and reverse outcome bias. Empirical findings from two
experiments suggest that both foresight and hindsight evaluators’ assessments exhibit outcome
bias and reverse outcome bias, depending on whether Bayesian probabilities of auditor
negligence are relatively low or relatively high, respectively. Interestingly, we predict and
observe reverse outcome bias for ecologically meaningful probabilities within audit litigation
contexts – those that fall at or above legal standards of proof such as “preponderance of the
evidence,” “clear and convincing,” and “beyond reasonable doubt.”
Our theory and findings collectively suggest it would be helpful to add reverse outcome
bias to the lexicon of biases germane to accounting contexts and to extend de-biasing
frameworks in the accounting literature so they better account for: (1) the influence of multiple effects on
evaluator bias in applicable accounting contexts (e.g., audit litigation, capital budgeting,
performance evaluation), and (2) conditions under which interventions likely will influence such
effects or evaluator bias.
REFERENCES
ANDERSON, J.; D. J. LOWE; AND P. RECKERS. “Evaluation of Auditor Decisions: Hindsight Bias
Effects and the Expectation Gap.” Journal of Economic Psychology (1993): 711-737.
ANDERSON, J.; M. M. JENNINGS; D. J. LOWE; AND P. RECKERS. “The Mitigation of Hindsight
Bias in Judges’ Evaluations of Auditor Decisions.” Auditing: A Journal of Practice &
Theory (1997): 20-39.
BARON, J. AND J. C. HERSHEY. “Outcome Bias in Decision Evaluation.” Journal of Personality
and Social Psychology 54 (1988): 569-579.
BROWN, C. E. AND I. SOLOMON. “Effects of Outcome Information on Evaluations of Managerial
Decisions.” The Accounting Review 62 (1987): 564-577.
BUCHMAN, T. “An Effect of Hindsight on Predicting Bankruptcy with Accounting Information.”
Accounting, Organizations and Society 10 (1985): 267-285.
CAMERER, C. F. AND T. HO. “Violations of the Betweenness Axiom and Nonlinearity in
Probability.” Journal of Risk and Uncertainty 8 (1994): 167-196.
CHRISTENSEN-SZALANSKI, J. J. J. AND C. F. WILLHAM. “The Hindsight Bias: A Meta-Analysis.”
Organizational Behavior and Human Decision Processes 48 (1991): 147-168.
CLARKSON, P. M.; C. EMBY; AND V. W-S WATT. “Debiasing the Outcome Effect.” Auditing: A
Journal of Practice & Theory 21 (2002): 1-20.
CORNELL, R. M.; R. C. WARNE; AND M. M. EINING. “Remedial Tactics in Auditor Negligence
Litigation.” Working paper, University of Utah, 2005.
EDWARDS, W. “Conservatism in Human Information Processing.” Formal Representations of
Human Judgment. B. Kleinmuntz, ed., New York, NY: Wiley, 1968: 17-52.
FOX, C. R. AND R. BIRKE. “Forecasting Trial Outcomes: Lawyers Assign Higher Probability to
Possibilities that are Described in Greater Detail.” Law and Human Behavior 26 (2002):
159-173.
FOX, C. R. AND A. TVERSKY. “A Belief-Based Account of Decision under Uncertainty.”
Management Science 44 (1998): 879-895.
FREDERICKSON, J. R.; S. A. PEFFER; AND J. PRATT. “Performance Evaluation Judgments: Effects
of Prior Experience Under Different Performance Evaluation Schemes and Feedback
Frequencies.” Journal of Accounting Research 37 (1999): 151-165.
GIGERENZER, G. Adaptive Thinking: Rationality in the Real World. Oxford, UK: Oxford
University Press, 2000.
GIGERENZER, G. AND U. HOFFRAGE. “How to Improve Bayesian Reasoning Without Instruction:
Frequency Formats.” Psychological Review 102 (1995): 684-704.
GIGERENZER, G. AND U. HOFFRAGE. “Overcoming Difficulties in Bayesian Reasoning: A Reply
to Lewis & Keren and Mellers & McGraw.” Psychological Review 106 (1999): 425-430.
GLATER, J. D. AND F. NORRIS. “Deloitte Parts with S.E.C. Over Audit of Company.” The New
York Times Online (August 2, 2001).
GONZALEZ, R. AND G. WU. “On the Shape of the Probability Weighting Function.” Cognitive
Psychology 38 (1999): 129-166.
HAWKINS, S. AND R. HASTIE. “Hindsight: Biased Judgments of Past Events After the Outcomes
Are Known.” Psychological Bulletin 107, 3 (1990): 311-327.
HERSHEY, J. AND J. BARON. “Judgments by Outcomes: When is it Justified?” Organizational
Behavior and Human Decision Processes 53 (1992): 89-93.
HERSHEY, J. AND J. BARON. “Judgments by Outcomes: When is it Warranted?” Organizational
Behavior and Human Decision Processes 62, 1 (1995): 127.
HOCH, S. AND G. LOEWENSTEIN. “Outcome Feedback: Hindsight and Information.” Journal of
Experimental Psychology: Learning, Memory, and Cognition 15, 4 (1989): 605-619.
HOGARTH, R. M. “Beyond Discrete Biases: Functional and Dysfunctional Aspects of Judgmental
Heuristics.” Psychological Bulletin 90, 2 (1981): 197-217.
HOLMSTROM, B. “Moral Hazard and Observability.” The Bell Journal of Economics (Spring
1979): 74-91.
KADOUS, K. “The Effects of Audit Quality and Consequence Severity on Juror Evaluations of
Auditor Responsibility for Plaintiff Losses.” The Accounting Review 75, 3 (2000): 327-341.
KADOUS, K. “Improving Jurors’ Evaluations of Auditors in Negligence Cases.” Contemporary
Accounting Research (2001): 425-449.
KADOUS, K. AND A. MAGRO. “The Effects of Exposure to Practice Risk on Tax Professionals’
Judgments and Recommendations.” Contemporary Accounting Research (Fall 2001):
451-475.
KAHNEMAN, D. AND A. TVERSKY. “Prospect Theory: An Analysis of Decision under Risk.”
Econometrica 47, 2 (1979): 263-291.
KELMAN, M.; D. FALLAS; AND H. FOLGER. “Decomposing Hindsight Bias.” Journal of Risk and
Uncertainty 16 (1998): 251-269.
KENNEDY, J. “Debiasing Audit Judgment with Accountability: A Framework and Experimental
Results.” Journal of Accounting Research (Autumn 1993): 231-245.
KENNEDY, J. “Debiasing the Curse of Knowledge in Audit Judgment.” The Accounting Review
70, 2 (1995): 249-273.
KINNEY, W. AND M. NELSON. “Outcome Information and the ‘Expectation Gap’: The Case of
Loss Contingencies.” Journal of Accounting Research 34, 2 (1996): 281-299.
LIBBY, R.; R. BLOOMFIELD; AND M. W. NELSON. “Experimental Research in Financial
Accounting.” Accounting, Organizations, and Society 27 (2002): 775-810.
LIPE, M. “Analyzing the Variance Investigation Decision: The Effects of Outcomes, Mental
Accounting, and Framing.” The Accounting Review 68, 4 (1993): 748-764.
LOWE, D. J. AND P. M. J. RECKERS. “The Effects of Hindsight Bias on Jurors’ Evaluations of
Auditor Decisions.” Decision Sciences 25, 2 (1992): 401-426.
MOECKEL, C. “The Effect of Experience on Auditors’ Memory Errors.” Journal of Accounting
Research (Autumn 1990): 368-387.
MOECKEL, C., AND R. PLUMLEE. “Auditors’ Confidence in Recognition of Audit Evidence.” The
Accounting Review (October 1989): 635-668.
NETER, J.; M. H. KUTNER; C. J. NACHTSHEIM; AND W. WASSERMAN. Applied Linear Statistical
Models, 4th ed. McGraw-Hill, 1996.
PEECHER, M. E. AND I. SOLOMON. “Theory and Experimentation in Studies of Audit Judgments
and Decisions: Avoiding Common Research Traps.” International Journal of Auditing 5
(2001): 193-203.
PHILLIPS, L. D., AND W. EDWARDS. “Conservatism in a Simple Probability Inference Task.”
Journal of Experimental Psychology 72, 3 (September 1966): 346-354.
PHILLIPS, L. D.; W. L. HAYS; AND W. EDWARDS. “Conservatism in Complex Probabilistic
Inference.” IEEE Transactions on Human Factors in Electronics HFE-7, 1 (March
1966): 7-18.
PRELEC, D. “The Probability Weighting Function.” Econometrica 66 (1998): 497-527.
RAIFFA, H. Decision Analysis: Introductory Lectures on Choices under Uncertainty. Reading,
MA: Addison-Wesley Publishing Company, 1997.
REIMERS, J. AND S. BUTLER. “The Effect of Outcome Knowledge on Auditors’ Judgmental
Evaluations.” Accounting, Organizations and Society 17, 2 (1992): 185-194.
TAN, H. AND M. LIPE. “Outcome effects: The Impact of Decision Process and Outcome
Controllability.” Journal of Behavioral Decision Making 10 (1997): 315-325.
TROCHIM, W. M. The Research Methods Knowledge Base, 2nd ed. Cincinnati, OH: Atomic Dog,
2001.
TVERSKY, A. AND C. R. FOX. “Weighing Risk and Uncertainty.” Psychological Review 102, 2
(1995): 269-283.
TVERSKY, A. AND D. KAHNEMAN. “Advances in Prospect Theory: Cumulative Representation of
Uncertainty.” Journal of Risk and Uncertainty 5 (1992): 297-323.
TVERSKY, A. AND D. KOEHLER. “Support Theory: A Non-Extensional Representation of
Subjective Probability.” Psychological Review 101 (1994): 547-567.
WU, G. AND R. GONZALEZ. “Curvature of the Probability Weighting Function.” Management
Science 42, 12 (1996): 1676-1690.
WU, G. AND R. GONZALEZ. “Nonlinear Decision Weights in Choice under Uncertainty.”
Management Science 45, 1 (1999): 74-85.
WU, G.; J. ZHANG; AND R. GONZALEZ. “Decision Under Risk.” Blackwell Handbook of Judgment
and Decision Making, ed. N. Harvey and D. Koehler, Malden, MA: Blackwell
Publishers, 2004.
Table 1
Cubic Polynomial Regression of p(AN|MMP*) on Bayes – Experiment 1
Dependent variable: p(AN|MMP*) is participants’ posterior beliefs of auditor negligence.
Independent variables: Hindsight = 1 for subjects in the hindsight conditions (verbs
describing outcome information in past tense), 0 otherwise (verbs describing outcome
information in future tense). Bayes, Bayes², and Bayes³ refer to the simple, squared, and
cubed Bayesian posteriors for p(AN|MMP*), respectively. P* refers to the stated outcome
probability that we manipulated at five levels between subjects (P* = 0%, 25%, 75%, 90%,
or 100%). Minutes refers to how long each participant took to complete the instrument.
                Estimated
                Coefficient   Std. Error        t          p
Intercept         –10.123        5.495        –1.842      0.066
Hindsight           3.753        1.602         2.343      0.019
Bayes               1.266        0.243         5.203    < 0.001
Bayes²             –2.416        0.645        –3.746    < 0.001
Bayes³              1.867        0.470         3.975    < 0.001
P*25%               1.237        2.729         0.453      0.651
P*75%              11.595        2.948         3.934    < 0.001
P*90%              18.322        2.986         6.136    < 0.001
P*100%              5.021        3.071         1.635      0.102
Minutes             0.525        0.169         3.116      0.002
F                  10.123                               < 0.001
R²                  36.7%
R²adj               36.0%
n                     811
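As a replication aid, the regression in Table 1 can be expressed in the following generic form. This is a minimal sketch assuming a long-format data file with one row per participant; the file name and column names (posterior, bayes, hindsight, pstar, minutes) are hypothetical stand-ins, not the paper's actual variable names.

# Sketch: cubic polynomial regression of judged posteriors on Bayesian
# posteriors, with hindsight, outcome-probability, and time covariates.
# File and column names are hypothetical stand-ins for the actual data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("experiment1.csv")  # hypothetical file, one row per participant

model = smf.ols(
    "posterior ~ hindsight + bayes + I(bayes ** 2) + I(bayes ** 3)"
    " + C(pstar, Treatment(reference=0)) + minutes",  # P* = 0% as baseline
    data=df,
).fit()
print(model.summary())  # same structure as the coefficients reported in Table 1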
Table 2
Cubic Polynomial Contrast Tests of p(AN|MMP*) against Bayes – Experiment 1
Contrasts of participants’ posterior beliefs of auditor negligence, p(AN|MMP*), to Bayesian posteriors at low (25%) and high (75%) values of Bayesian posteriors
using estimated marginal means from the cubic polynomial model (Table 1).
(Contrast = p(AN|MMP*) – Bayes; Sign = hypothesized sign of the contrast.)

Panel A: Foresight Conditions

            Bayes = 25.00%                                           Bayes = 75.00%
P*       p(AN|MMP*)  Contrast  Sign  Std. Error      t        p      p(AN|MMP*)  Contrast  Sign  Std. Error      t         p
0%         24.54%     –0.46%    +      2.78%      –0.165    0.869      42.85%    –32.15%    –      3.23%      –9.947   < 0.001
25%        25.78%      0.78%    +      2.09%       0.372    0.355      44.08%    –30.92%    –      3.14%      –9.842   < 0.001
75%        36.14%     11.14%    +      2.17%       5.125  < 0.001      54.44%    –20.56%    –      2.62%      –7.845   < 0.001
90%        42.86%     17.86%    +      2.28%       7.844  < 0.001      61.17%    –13.83%    –      2.58%      –5.355   < 0.001
100%       55.82%     30.82%    +      7.42%       4.151  < 0.001      47.87%    –27.13%    –      2.80%      –9.706   < 0.001
Overall    31.78%      6.78%    +      1.55%       4.384  < 0.001      50.08%    –24.92%    –      2.31%     –10.795   < 0.001

Panel B: Hindsight Conditions

            Bayes = 25.00%                                           Bayes = 75.00%
P*       p(AN|MMP*)  Contrast  Sign  Std. Error      t        p      p(AN|MMP*)  Contrast  Sign  Std. Error      t         p
0%         28.29%      3.29%    +      2.80%       1.177    0.120      46.60%    –28.40%    –      3.24%      –8.764   < 0.001
25%        29.53%      4.53%    +      2.09%       2.172    0.015      47.84%    –27.16%    –      3.12%      –8.693   < 0.001
75%        39.89%     14.89%    +      2.18%       6.839  < 0.001      58.20%    –16.80%    –      2.61%      –6.441   < 0.001
90%        46.62%     21.62%    +      2.29%       9.434  < 0.001      64.92%    –10.08%    –      2.58%      –3.906   < 0.001
100%       59.57%     34.57%    +      3.47%       9.963  < 0.001      51.62%    –23.38%    –      2.77%      –8.450   < 0.001
Overall    35.53%     10.53%    +      1.55%       6.782  < 0.001      53.84%    –21.16%    –      2.30%      –9.218   < 0.001
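The contrasts in Table 2 are simple arithmetic on the fitted model: evaluate the cubic at Bayes = 25% and 75% and subtract the Bayesian benchmark. A minimal sketch follows; the coefficients are illustrative placeholders, not fitted estimates, and a full test would also use the fitted model's covariance matrix to attach standard errors to each contrast.

# Sketch: model-implied posterior and its contrast against Bayes.
# The coefficients below are illustrative placeholders, not fitted estimates.
def implied_posterior(bayes, b0, b1, b2, b3, hindsight_shift=0.0):
    """Cubic prediction of the judged posterior at a given Bayesian posterior."""
    return b0 + b1 * bayes + b2 * bayes ** 2 + b3 * bayes ** 3 + hindsight_shift

for bayes in (25.0, 75.0):
    pred = implied_posterior(bayes, b0=10.0, b1=1.2, b2=-0.012, b3=0.00007)
    contrast = pred - bayes  # + implies outcome bias, - implies reverse outcome bias
    print(f"Bayes = {bayes:.0f}%: implied = {pred:.1f}%, contrast = {contrast:+.1f}%")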
Table 3
Supplemental Analyses – Experiment 1
Mean outcome effects and outcome bias (reverse outcome bias) within low (10% ≤ Bayes ≤ 30%)
and high (50% ≤ Bayes ≤ 90%) ranges of Bayesian posterior probabilities of auditor negligence.

                                          10% ≤ Bayes ≤ 30%        50% ≤ Bayes ≤ 90%
                                          Estimate       p         Estimate       p
Hindsight
  Observed Outcome Effect                   14.8      < 0.001        16.5      < 0.001
  Bayesian Outcome Effect                    2.5        n/a          26.7        n/a
  Outcome Bias (Reverse Outcome Bias)       12.3      < 0.001       –10.2      < 0.001
  n                                          144                      81
Foresight
  Observed Outcome Effect                   10.4      < 0.001         8.9        0.001
  Bayesian Outcome Effect                    2.7        n/a          24.1        n/a
  Outcome Bias (Reverse Outcome Bias)        7.7      < 0.001       –15.2      < 0.001
  n                                          133                      81
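The decomposition behind Table 3 is a pair of differences: the observed outcome effect is the shift in judged negligence attributable to the adverse outcome, the Bayesian outcome effect is the shift warranted by Bayes' rule, and outcome bias (or reverse outcome bias) is their gap. A one-line worked check using the low-range hindsight row:

# Worked check of Table 3's decomposition (low-range hindsight row).
observed_outcome_effect = 14.8  # percentage-point shift in judged negligence
bayesian_outcome_effect = 2.5   # percentage-point shift warranted by Bayes' rule
outcome_bias = observed_outcome_effect - bayesian_outcome_effect
print(outcome_bias)  # 12.3; a negative value would indicate reverse outcome bias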
Table 4
Panel A: Cubic Polynomial Regression of p(AN|MMP*) on Bayes – Experiment 2
Dependent variable: p(AN|MMP*) is participants’ posterior beliefs of auditor negligence.
Independent variables: Hindsight = 1 for subjects in the hindsight conditions (verbs
describing outcome information in past tense), 0 otherwise (verbs describing outcome
information in future tense). Bayes, Bayes², and Bayes³ refer to the simple, squared, and
cubed Bayesian posteriors for p(AN|MMP*), respectively. P*SEC refers to the outcome
probability manipulation, and equals 1 if the material misstatement outcome was based upon
the results of an SEC investigation, and 0 if the outcome was based upon rumors about an
SEC investigation. Participants then evaluated P* numerically, and those evaluations of P*
were used in the computation of Bayes.
                Estimated
                Coefficient   Std. Error        t          p
Intercept           2.128        3.589         0.593      0.554
Hindsight          –1.802        2.390        –0.754      0.452
Bayes               1.909        0.377         5.068    < 0.001
Bayes²             –3.143        1.117        –2.814      0.006
Bayes³              2.237        0.836         2.677      0.008
P*SEC              –0.169        2.417        –0.070      0.944
F                  43.609                               < 0.001
R²                  59.1%
R²adj               57.7%
n                     157
Panel B: Cubic Polynomial Contrast Tests of p(AN|MMP*) against Bayes – Experiment 2
Contrasts of participants’ posterior beliefs of auditor negligence, p(AN|MMP*), to Bayesian
posteriors at low (25%) and high (75%) values of Bayesian posteriors using estimated marginal
means from the cubic polynomial model (Panel A).
Bayes      p(AN|MMP*)   Contrast   Hypothesized Sign   Std. Error      t        p
25.00%       31.61%       6.61%            +              4.12%      1.604    0.055
75.00%       60.74%     –14.26%            –              4.72%     –3.022    0.001
Figure 1
A Hypothetical Probability Weighting Function
(e.g., Tversky and Kahneman 1992)
[Figure: the weighting function w(p) plotted against probability p, with both axes running
from 0% to 100%; the plotted curve is labeled “Behavioral Observation.”]
Figure 2
Probability Weighting Effect (H1) and Temporality Effect (H2)
[Figure: observed posterior (vertical axis) plotted against Bayesian posterior (horizontal
axis), both from 0% to 100%, with separate foresight and hindsight curves. “Outcome Bias
(H1)” marks the region of low Bayesian posteriors, “Reverse Outcome Bias (H1)” the region
of high Bayesian posteriors, and “Temporality Effect (H2)” the vertical gap between the
hindsight and foresight curves.]
Figure 3
Polynomial Profile Plots of p(AN|MMP*) on Bayes – Experiment 1
from the polynomial regression model at Table 1
[Figure: fitted profile plots of observed posterior against Bayesian posterior, both from
0% to 100%, for the foresight and hindsight conditions.]
Figure 4
Polynomial Profile Plots of p(AN|MMP*) on Bayes – Experiment 2
from the polynomial regression model at Table 4
[Figure: fitted profile plot of observed posterior against Bayesian posterior, both from
0% to 100%.]
Appendix A
Excerpt from Introduction of Experiment 1
Why is this Important? Unfortunately, undetected material misstatements can cause enormous
losses of money, as in the appalling Enron and WorldCom cases. Investors and lenders who rely
on the financial statements with undetected material misstatements can lose their money.
Employees of the company can lose their jobs and their life-savings, which is often invested in
the stock of the company they work for. Many innocent people, both inside and outside of the
company, can be harmed by an accounting or auditing failure. As we have seen, the negative
effects of misstated financial statements can touch hundreds of thousands of people. Thus, it is
absolutely critical to society that auditors exercise due professional care.
Appendix B
Excerpt from Experiment 1: Elicitation of Prior Beliefs
1. How often do you think that audited financial statements contain a
material misstatement?
A: _________ out of every 1,000 audited financial statements contain
a material misstatement.
2. How often do you think the auditors are negligent?
A: The auditors are negligent on _________ out of every 1,000
financial statement audits.
THE NEXT QUESTION WILL REFER TO YOUR ANSWER TO #2.
BEFORE ANSWERING #3, PLEASE WRITE YOUR ANSWER TO
#2 EVERYWHERE YOU SEE A DOTTED LINE IN #3.
3. Now, imagine the ............ negligent audits in your answer to #2.
Given that these auditors are negligent, how many out of these
............ do you believe contain a material misstatement in the audited
financial report?
A: The financial statements contain a misstatement on _________ of
the ............ negligent audits from #2.
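These three elicited frequencies are the inputs to each participant's Bayesian benchmark. The sketch below shows one way to compute it, assuming (as an illustration, not the paper's exact formula) that the benchmark mixes the misstatement and no-misstatement posteriors by the stated outcome probability P*; the elicited values are illustrative.

# Sketch: a participant-specific Bayesian posterior of auditor negligence
# built from the Appendix B elicitations (illustrative values throughout).
p_mm = 100 / 1000           # Q1: rate of materially misstated audited statements
p_an = 50 / 1000            # Q2: rate of auditor negligence
p_mm_given_an = 600 / 1000  # Q3: misstatement rate among negligent audits

# Bayes' rule for the two conditional posteriors of negligence:
p_an_given_mm = p_mm_given_an * p_an / p_mm
p_an_given_not_mm = (1 - p_mm_given_an) * p_an / (1 - p_mm)

# Assumed mixture by the stated outcome probability P* (here P* = 25%):
p_star = 0.25
bayes = p_star * p_an_given_mm + (1 - p_star) * p_an_given_not_mm
print(f"Bayesian posterior = {bayes:.1%}")  # ~9.2% for these inputs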
Appendix C
Excerpt from Experiment 1: Outcome Information and Elicitation of Posterior Beliefs
from the hindsight [foresight] condition at P* = 25%
Instructions: For questions #4-5, we will be referring to an Earnings
Management Watch List compiled by a major securities rating agency. This is a
list of U.S. companies who have been identified by the rating agency as having
potentially manipulated [as potentially manipulating] their reported earnings
numbers in the financial statements [in the future]. This Watch List includes
information about a number of companies and the Big Four auditors who audited
[who will soon begin the audits of] their financial statements. After changing the
name of both the auditors and their clients, the list looks like the following:
Auditor
BIG-FOUR-A
BIG-FOUR-B
BIG-FOUR-C
BIG-FOUR-D
BIG-FOUR-A
BIG-FOUR-B
…
Client
U.S. Company, Inc. 1
U.S. Company, Inc. 2
U.S. Company, Inc. 3
U.S. Company, Inc. 4
U.S. Company, Inc. 5
U.S. Company, Inc. 6
…
IMPORTANT: Among the financial statement audits from this Earnings
Management Watch List, SOME (25%) of the audited financial reports actually
were [will be] materially misstated, even though the auditor gave [will give]
the financial statements a clean audit opinion after the audit was completed [once
the audit is complete].
Review Question
4. The audits that you just read about of the financial statements of companies
from the Earnings Management Watch List:
☐ have already finished.
☐ have not yet started.
Case Question
5. We have randomly chosen one audit from the above list. How frequently do
you believe that the auditors would [will] have been negligent on audits like
the one randomly chosen from this list?
A: I believe that the auditors would [will] have been negligent on ______
out of every 100 audits like the one randomly chosen from this Watch
List.
Appendix D
Excerpt from Experiment 2: Outcome Information and Elicitation of Posterior Beliefs
from the hindsight [foresight] condition at P* = SEC Investigation
Outcome Information
IMPORTANT! For questions #11-13, also assume the following: After the audit was completed
[is complete], Big Time Gravel encountered [will encounter] some financial difficulty. The SEC
(the U.S. Securities and Exchange Commission) began [will begin] an investigation, alleging that
the financial statements of Big Time Gravel were [will have been] materially misstated, even
though Jones & Company gave [will have given] the financial statements a clean audit opinion.
At the end of their investigation, the SEC concluded [will conclude] that Big Time Gravel [will
have] understated the amount of gravel and concrete inventory in its audited financial statements
by $15 million. As a result of Big Time Gravel’s financial difficulty, Bierhoff Inc., a lender who
loaned [will loan] Big Time Gravel $25 million, had [will have] to settle for collecting only $15
million of its $25 million loan, losing $10 million. Bierhoff Inc. alleged [will allege] that it [will
have] made the loan relying on the financial statements, audited by Jones & Company, which
were [will have been] found by the SEC to be materially misstated.
Case Questions
11. The SEC conducted [will conduct] an investigation and concluded [will conclude] that the
financial statements of Big Time Gravel were [will have been] materially misstated. Think
of 100 SEC investigations like the one described above. How frequently do you believe that
audited financial statements with an SEC conclusion such as this actually would [will] have
been materially misstated?
A: I believe that the audited financial statements actually would [will] have been materially
misstated for ______ out of every 100 audits with an SEC judgment like this one.
12. Think of 100 cases like Jones & Company’s audit of Big Time Gravel, and with the same
outcome information (described above). Given this outcome, how frequently do you believe
that the auditors would [will] have been negligent on audits like the one described above?
A: I believe that the auditors would [will] have been negligent on ______ out of every 100
audits like this one.