Technical note
Application of likelihood ratios for firearm and toolmark analysis
Stephen Bunch, Gerhard Wevers
ESR, Mt Albert Science Centre, Hampstead Rd, Mt Albert, Private Bag 92021, Auckland, New Zealand
Article info
Article history:
Received 26 February 2012
Received in revised form 17 December 2012
Accepted 19 December 2012
Keywords:
Bayes' rule
Firearms
Likelihood ratio
Posterior odds
Prior odds
Toolmarks
Abstract
Historically, firearm and toolmark examiners have rendered categorical or inconclusive opinions and eschewed probabilistic ones, especially in the United States. We suggest this practice may no longer be necessary or desirable, and outline an alternative approach within a comprehensive logical/Bayesian paradigm. Hypothetical forensic and non-forensic examples are provided for readers who are practicing firearm and toolmark examiners, and the strengths and weaknesses of both approaches are considered.
© 2013 Forensic Science Society. Published by Elsevier Ireland Ltd. All rights reserved.
1. Introduction
In the United States especially, firearm–toolmark examiners often
provide in their scientific reports conclusions that are categorical—
identifications (individualizations) and exclusions. These conclusions
attempt to answer, for example, this question: Was this bullet fired
from this gun? The answers given are often Yes or No, with a “reasonable degree of scientific certainty,” or practical certainty. These results,
though usually highly probative, we believe, do not reach, and cannot
reach, perfection (not even an exclusion of two bullets with fundamentally different rifling characteristics reaches absolute certainty, though it
comes vanishingly close). Thus what is important to note here is that
(1) these conclusions are in the form of posterior, or final odds, and
(2) mathematically, posterior odds can never equal infinity (probability
of 1).
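In symbols, this is the odds form of Bayes' rule (the notation Hp and Hd for the prosecution and defense hypotheses is ours, not the original's):

\[ \underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}} = \underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio}} \times \underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}} \]

A probability of 1 would require infinite posterior odds, which no finite LR and finite prior odds can produce.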
Moreover, posterior odds cannot rationally exist without the assumption of prior odds, whether explicitly or implicitly. The unwitting
assumption of prior odds is not so apparent with firearm and toolmark
analysis, but it happens nonetheless with each categorical conclusion
offered.
It has been a forceful argument of many experts in evidence evaluation for many years that forensic experts ideally should not assume
prior odds and then produce posterior odds [1–3]. If a scientist ignores
all non-scientific, contextual information in assigning a prior, then
final probabilities could be very low and extremely misleading to
attorneys and the court. But if she does account for outside,
non-scientific information in the prior (that is, she is thinking of the
probability as a juror would), then not only is she risking scientific
bias, but she would be effectively usurping the rightful role of the
court in assessing investigative and other information, and the court
also would likely double count evidence contained in the scientist's
prior. The forensic scientist should properly consider only outside information that could affect her laboratory observations (what one actually
sees under the microscope) or non-case information that is scientific
and background in nature (such as the technical literature, training
and experience) and that bears on the interpretation of her results.
There is no bright line crisply separating the proper from the improper
use of contextual information by the forensic scientist. Judgment is involved. But ideally the scientist should leave the assignment of prior
odds to the judge and/or jury.
There are at least three solutions to this less-than-ideal state of affairs. First, the discipline could continue with the current paradigm
where practiced, but in the interest of intellectual honesty examiners
could provide transparency in reports and testimony by (1) avoiding
any suggestion of absolute certainty and indeed by providing words
such as “practical certainty” to categorical conclusions, and by (2) disclosing the assumption of contextual information that is embedded in
prior odds, which odds are implicit and logically necessary to effect
these conclusions. It is unclear how the second of these would be accomplished without adding a detailed verbal explanation, and we view this
solution as less than desirable.
As a second alternative, examiners could provide in the examination
report a conclusion by using the complete Bayes' rule. That is, provide a
final probability in quantitative terms. Both an LR and the prior odds
must be calculated and assigned, respectively. Clearly, however, this
runs counter to the principle that scientists should not be assigning
prior odds. Another option would be for the scientist to leave the assignment of priors to the attorneys and the court, but provide guidance. That
is, the scientist could provide a sensitivity table in her report, such as the
one below, to demonstrate how changing the prior odds would change
the final odds.
Prior odds    Likelihood ratio (LR)    Posterior odds
1 to 1000     5000                     5 to 1
1 to 20       5000                     250 to 1
1 to 5        5000                     1000 to 1
1 to 1        5000                     5000 to 1
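As a minimal illustrative sketch (ours, not the authors'), the table follows directly from multiplying prior odds by the LR:

```python
# Sketch: posterior odds = prior odds x likelihood ratio (odds form of
# Bayes' rule). Reproduces the sensitivity table above.
LR = 5000  # the examiner's likelihood ratio

# Prior odds expressed as (for, against) pairs, e.g. 1 to 1000.
priors = [(1, 1000), (1, 20), (1, 5), (1, 1)]

for num, den in priors:
    posterior = (num / den) * LR
    print(f"prior {num} to {den}: posterior odds {posterior:g} to 1")
```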
This solution is actually similar to the third alternative below, with the exceptions that the LRs must be quantitative and that the sensitivity table instructs the contributor and the court on how to reason to final odds given the examiner's LR conclusion.
The authors' preference is to abandon conclusions in the form of
posterior odds and shift to using a full likelihood approach (this also
could be termed a probabilistic or logical approach). Importantly, an
LR could be provided in either qualitative or quantitative terms. Consider how this solution could work in the following example.
2. A concrete example
In addition to Fig. 1, here are the relevant facts for this hypothetical
case, from either the submittal letter or the bullet examination:
1. A submitted .22 caliber projectile is on the left of the photograph; a
second .22 caliber projectile was recovered from another crime
scene 300 miles distant and is on the right. Both bullets are typical, in appearance and measurable features, of bullets fired from .22 Long Rifle cartridges.
2. No firearm was submitted or found.
3. Fig. 1 displays the best microscopic agreement observed. Overall, the
microscopic agreement between the two specimens was very poor,
although a small amount of correspondence was observed around
the bullets.
4. All class characteristics were in agreement. The rifling was 7 lands/
grooves with a right twist. A search of the FBI's “General Rifling
Characteristics” database returned no firearms in .22 caliber with
these general rifling characteristics (GRCs).
Fig. 1. Photograph of a comparison between two .22 caliber projectiles found 300 miles apart.
5. Also submitted to the laboratory was a videotape showing the defendant threatening to shoot a convenience store clerk with what
appears to be a large-caliber revolver. The tape was submitted for
the purpose of having the revolver's make and model identified as
part of a second examination.
The contributor wishes to know if these two projectiles were fired
from the same firearm. Assume that you are an examiner working this
case. By using the traditional categorical model, an inconclusive result
would be the norm. How would one (you) calculate a likelihood ratio
in this case? First, let us note that when determining or calculating a
likelihood ratio, the only matters typically under consideration are
your observations and your scientific knowledge base. Thus for a microscopic comparison anything associated with the activity (the shooting),
such as, the videotape and the differing locations where the bullets
were recovered, should be ignored.
Let us also note again that a likelihood ratio need not be quantitative [4]. For this case, the prosecution hypothesis is that the two
bullets were fired from the same firearm. The defense hypothesis
is that they were fired from different firearms. We will designate
these propositions with symbols, SF and DF, respectively. The LR is
thus the probability of your examination observations given the scientific background information and assuming that they were fired
from the same firearm, divided by the probability of your examination observations given the background information and that they
were fired from different firearms. This can be written symbolically
as follows:
\[ \mathrm{LR} = \frac{P(\mathrm{obs} \mid I, \mathrm{SF})}{P(\mathrm{obs} \mid I, \mathrm{DF})} \]
where “I” represents the scientific background information (and is usually omitted from the expression for the sake of brevity). The vertical line
(|) denotes a conditional, and simply means “assuming that,” or “given
that.” Thus P(obs|I, SF) can be read as “the probability of the observations
given that the bullets were fired from the same firearm." Happily, one of the features of the odds form of Bayes' rule is that multiple LRs for separate pieces of evidence (observations) that are independent (i.e., changing one does not change another) can simply be multiplied together to form a single, updated LR for a given hypothesis and conditioning information. In our example, class-characteristic evidence is very likely independent of microscopic-agreement evidence.
Changing one should not change the other, or only slightly, such that
the assumption of independence is reasonable. In this case there are
two pieces of evidence—the caliber and the GRC data/match on the one
hand, and the degree of microscopic correspondence on the other. Thus
we can assign two LRs, one for each, then multiply them together to obtain the overall strength of the bullet evidence.
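As a sketch of that multiplication step (the symbols E1 and E2 for the two pieces of evidence are ours, not the original's):

\[ \frac{P(E_1, E_2 \mid \mathrm{SF})}{P(E_1, E_2 \mid \mathrm{DF})} = \frac{P(E_1 \mid \mathrm{SF})}{P(E_1 \mid \mathrm{DF})} \times \frac{P(E_2 \mid \mathrm{SF})}{P(E_2 \mid \mathrm{DF})} = \mathrm{LR}_1 \times \mathrm{LR}_2 \]

which holds exactly when E1 and E2 are independent given each hypothesis.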
So what are your probabilities of observing .22 caliber and GRCs of 7/
right on these bullets? Go to your background information. Let us assume your laboratory does not yet possess an internal caliber/GRC database from physical evidence submissions. But you do know that the
FBI GRC database returns zero guns in .22 caliber (rimfire or centerfire) that are also 7/right. Your search of sales literature and the internet also returns nothing. And you have never seen 7/right in your experience. Let us pretend more research finally uncovers one make of rifle
in .22 Long Rifle caliber made in Latin America in the 1950s with rifling
of 7/right, but that is all (again, this is a hypothetical example). Therefore,
as 7/right is a rare rifling characteristic, you would assess the probability
of observing 7/right rifling, assuming that the bullets were fired from
different guns, as quite low. And conversely, as this rifling characteristic
is rare, the probability of observing 7/right rifling, assuming the bullets
were fired from the same gun, is extraordinarily high, virtually 1. Thus
the ratio of 1 to "quite low" is quite high. So for now our class-characteristics LR is "quite high."
But we should refine this further. In New Zealand the Institute of Environmental Science and Research Limited (ESR) has adopted the
following verbal likelihood scale that we will use for illustration purposes:
Report language                    Likelihood ratio
Extremely strong support against   <0.000001
Very strong support against        0.000001–0.001
Strong support against             0.001–0.01
Moderate support against           0.01–0.1
Slight support against             0.1–1
Neutral                            1
Slight support for                 1–10
Moderate support for               10–100
Strong support for                 100–1000
Very strong support for            1000–1,000,000
Extremely strong support for       >1,000,000
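A small illustrative sketch (ours; the boundary handling is an assumption, since the published scale does not state which label a borderline value takes) shows how a numeric LR could be mapped to this verbal scale:

```python
# Sketch: map a numeric likelihood ratio to the ESR verbal scale above.
# Each entry is (upper bound, label); boundary handling is an assumption.
SCALE = [
    (0.000001, "extremely strong support against"),
    (0.001,    "very strong support against"),
    (0.01,     "strong support against"),
    (0.1,      "moderate support against"),
    (1,        "slight support against"),
    (10,       "slight support for"),
    (100,      "moderate support for"),
    (1000,     "strong support for"),
    (1000000,  "very strong support for"),
]

def verbal_label(lr: float) -> str:
    if lr == 1:
        return "neutral"
    for upper, label in SCALE:
        if lr <= upper:
            return label
    return "extremely strong support for"

print(verbal_label(1000))  # strong support for
print(verbal_label(4000))  # very strong support for
```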
From the caliber match, the GRC data and the GRC match, and from just perusing the verbal side of the scale, you might decide, given the rarity of these GRCs, that the LR is approximately 1000 and that strong support for is the best single fit with your observations.
You next observe the overall microscopic agreement between the
specimens. Again, it is very poor, but some correspondence does exist,
although much less than would be expected for a known match comparison. As part of your overall knowledge base, you especially consider
the possibility of a subclass marking from the technical literature and
from your laboratory experience and training. You decide that due to
the poor correspondence, the microscopic agreement alone—separate
from the GRCs—has an LR greater than 1, but less than 10 and therefore
provides only slight support for the proposition that the bullets were
fired from the same gun. Combining these two pieces of evidence (the
two LRs), you believe that the best fit is very strong support for. Note
that this result is quite different from a traditional conclusion of
inconclusive.
Most examiners would probably prefer the above qualitative over
the quantitative approach for two reasons: (1) they are familiar and
comfortable with present-day, traditional conclusions that also are
reached via a qualitative process, and (2) there is no firm and accepted
database for population frequencies of class characteristics for bullets,
nor a probability model for microscopic marks that is nearly as well developed as those for, say, DNA evidence. Nevertheless, a quantitative
approach is logical and justifiable as an option. Further, it is easy to underestimate how much information already exists from extant
research. Then too, it would be a relatively small matter for a laboratory to keep track of class-characteristic data on all submitted firearms, via their own internal database. The frequencies yielded
would be directly relevant to calculating class-characteristic likelihood ratios. (Note that caliber and GRC frequencies from firearms submitted to a laboratory are arguably more suitable for class-characteristic
LRs than are background population frequencies—which would be
much more difficult or impossible to obtain in any case. The information
used would be from a pool of potential crime guns, as opposed to a pool
of all guns.) In actual practice, of course, an examiner could use both the
qualitative and quantitative approaches and compare them to settle on
a final conclusion.
In any event, let us proceed with the quantitative method for this
example. Of all the firearms in the relevant geographic area, or better,
from your laboratory's caliber/gauge/GRC database (and this would
include shotguns), and also from your knowledge base, you estimate
that about 1 in 800 fire .22 caliber projectiles with GRCs of 7/right.
This yields a class-characteristics LR of 800 given the alternative
hypothesis in this case. As for the micro-correspondence, you again
take into account your knowledge base, including if you wish, the
number and groupings of consecutive matching striations (CMS)
that you observe. You decide that this evidence alone is weak; that
perhaps 1 in 10 “innocent” guns might display this low level of
agreement (again, including shotguns). And the micro LR's numerator—
the probability of observing this level of correspondence given they
were fired in the same gun? You might judge this as anywhere
between zero and 1, depending on several factors that could
affect what you observe, but for the sake of demonstration, let us
use 0.5. So the LR for this micro evidence is (0.5/0.1)=5, and thus
the total, overall LR follows:
\[ \mathrm{LR} = \frac{800}{1} \times \frac{0.5}{0.1} = 4000 \]
This means that you believe that your observations were 4000 times
more likely if the bullets were fired from the same gun as opposed to
different guns. You consult the verbal scale; your verbal conclusion is
that your observations provide very strong support for the proposition
that the bullets were fired from the same gun as opposed to having
been fired in different guns.
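A minimal sketch of this arithmetic (ours; all figures are the authors' hypothetical assignments for the worked example):

```python
# Sketch: combine the class-characteristics LR and the micro LR for the
# hypothetical .22 bullet comparison.
class_lr  = 800   # about 1 in 800 guns fire .22 with GRCs of 7/right
micro_num = 0.5   # P(observed correspondence | same gun), assumed
micro_den = 0.1   # P(observed correspondence | different guns), 1 in 10

total_lr = class_lr * (micro_num / micro_den)
print(round(total_lr))  # 4000 -> "very strong support for" on the scale
```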
Note here that if the scientist wishes, a “worst-case/best-case”
bracket could inform the conclusion, over and above the quantitative
LR range in the verbal scale. Doing so is attractive for those cases for
which unreliable or no data exists, or for which the examiner has little
confidence in the data. An examiner may believe that a single LR
could be significantly off the mark, but at the same time strongly believe
that the LR lies within certain limits. The procedure is analogous to providing quantitative distance brackets in gunshot residue examinations
for muzzle-to-garment distance. In this case, one might strongly believe
that the true population frequency of .22 caliber and 7/right in the relevant geographic area lies in the bracket of from 1 in 300 to 1 in 10,000. It
is the same for the micro-correspondence in that it might lie between 1
in 8 and 1 in 35. These figures would correspond to total LRs of 1200 to
175,000—ignoring for simplicity's sake any bracket for the micro-LR
numerator—which in turn would correspond on the verbal scale to
the language of very strong support for. Thus your report language
could state that your observations provide very strong support for the
proposition that the bullets were fired in the same gun as opposed to
having been fired in different guns.
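A sketch of the bracket computation (ours; the frequency brackets are the authors' hypothetical figures, and the micro-LR numerator is held at 0.5 as above):

```python
# Sketch: "worst-case/best-case" bracket for the total LR.
MICRO_NUM = 0.5  # micro-LR numerator, held fixed for simplicity

def total_lr(class_n: float, micro_n: float) -> float:
    """Total LR from a class frequency of '1 in class_n' and a
    micro-correspondence frequency of '1 in micro_n'."""
    return class_n * (MICRO_NUM * micro_n)

low  = total_lr(300, 8)     # worst case:  300 * 0.5 * 8  =   1200.0
high = total_lr(10000, 35)  # best case: 10000 * 0.5 * 35 = 175000.0
print(low, high)  # both ends fall in "very strong support for"
```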
3. A second example
In addition to the accompanying photomicrographs (Figs. 2–4), here are many of the
relevant facts for this hypothetical case, taken from the shotshell
examinations:
1. Two 12-gauge shotguns were submitted: a break-open, hammerless single-barrelled shotgun, and a pump-action shotgun. Also
submitted was a fired 12-gauge evidence shotshell (Fig. 2).
Fig. 2. Single-barrelled test-fire on left; evidence shotshell on right.
2. The single-barrelled shotgun's firing pin was not free-floating, and
protruded slightly into the breech area when the action was in an
open position.
3. Whether visible in these reproductions or not, the single-barrelled
test-fired shotshells featured fine, parallel breechface marks, along
with the vertical mark on the primer at the 12:00 position. These
marks were highly similar among all test-fires. The pump-action
test-fired shotshells featured non-linear, “pebbly” marks on the
primer, as did the submitted evidence shotshell. The vertical mark
was absent from both pump-action test-fired shotshells and from
the evidence shotshell (Fig. 3).
4. The firing pin impressions for the evidence and all test-fired specimens were highly similar in size and shape, with virtually no
marks in their interiors. Thus, by using the Association of Firearm
& Tool Mark Examiners (AFTE) Glossary definition of class characteristics, they (the firing pin impressions) were virtually the same
for all specimens [5]. The two marks on the battery cup of the
single-barrelled test-fired shotshells in the 7:00 position were inconsistent across test-fired shotshells. They were absent on the
test shotshells prior to firing (Fig. 4).
The contributor wishes to know if the submitted, fired evidence
shotshell was fired from the single-barrelled shotgun. Again, assume
that you are an examiner working this case. By using the traditional
model, many examiners might individualize the submitted specimen
to the pump-action shotgun, given the small marks on the left and
right sides of the primers. Some might exclude the evidence shotshell
as having been fired in the single-barrelled shotgun. Others, on the
ground of similar class characteristics, might render an inconclusive
verdict. But how might you approach this case using likelihood ratios?
For the sake of brevity, let us consider only the quantitative approach, though realizing that the purely qualitative approach is perfectly legitimate. The prosecution hypothesis is that the evidence shotshell
was fired in the single-barrelled shotgun. We have two plausible defense hypotheses: that the submitted evidence shotshell was fired in
some unknown firearm, or that it was fired in the pump-action shotgun.
First let us do the math given that we are comparing the prosecution
hypothesis to defense hypothesis #1, that the shotshell was fired in
some unknown firearm. From your knowledge base, and these examination observations, you might assign the following probabilities that
comprise two LRs, one for the class characteristics and the other for
the micro-correspondence:
P(class observations | shotshell fired from the single-barrelled shotgun) = 0.95
P(class observations | shotshell fired from an unknown firearm) = 0.2
P(micro observations | shotshell fired from the single-barrelled shotgun) = 1/2000, or 0.0005
P(micro observations | shotshell fired from an unknown firearm) ≅ 1

Fig. 3. Pump-action test-fire on left; evidence shotshell on right.
Fig. 4. Pump-action test-fire on left; evidence shotshell on right.
Thus the overall LR for comparing the single-barrelled shotgun to
the unknown = (0.95/0.2) × (0.0005/1) ≅ 0.00238.
Since this is less than one, it weighs against the prosecution hypothesis in the LR's numerator and in favor of the defense hypothesis
that is in the LR's denominator. (The reciprocal, approximately 420, favors the defense hypothesis by this amount. That is, the evidence is about 420 times more likely given that the shotshell was fired in an unknown shotgun as opposed to being fired in the single-barrelled shotgun.)
Consulting the verbal scale, our conclusion is that our observations
provide strong support against the proposition that the shotshell was
fired in the single-barrelled shotgun, as opposed to having been fired
in some unknown shotgun. (This is equivalent to saying that the observations provide strong support for the proposition that the shotshell was
fired in some unknown firearm as opposed to having been fired in the
single-barrelled shotgun.)
Now let us do the computations given that we are comparing the
prosecution hypothesis to defense hypothesis #2, that the shotshell
was fired in the pump-action shotgun. You might assign the following
probabilities:
P(class observations | shotshell fired from the single-barrelled shotgun) = 0.95
P(class observations | shotshell fired from the pump-action shotgun) = 0.95
P(micro observations | shotshell fired from the single-barrelled shotgun) = 0.0005
P(micro observations | shotshell fired from the pump-action shotgun) = 0.9
Thus the overall LR for comparing the single-barrelled shotgun to
the pump-action shotgun = (0.95/0.95) × (0.0005/0.9) ≅ 0.00056.
Consulting the verbal scale, our conclusion is that our observations
provide very strong support against the proposition that the shotshell
was fired in the single-barrel shotgun as opposed to the pump-action
shotgun. (This is equivalent to saying that the observations provide
very strong support for the proposition that the shotshell was fired in
the pump-action shotgun as opposed to the single-barrelled shotgun.)
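A sketch of both computations (ours; all probabilities are the authors' hypothetical assignments):

```python
# Sketch: the two shotshell LRs, each comparing the prosecution
# hypothesis (single-barrelled shotgun) to one defense hypothesis.
def overall_lr(p_class_hp, p_class_hd, p_micro_hp, p_micro_hd):
    """Overall LR = class-characteristics LR x micro-correspondence LR."""
    return (p_class_hp / p_class_hd) * (p_micro_hp / p_micro_hd)

# Hd #1: fired in some unknown firearm.
print(overall_lr(0.95, 0.2, 0.0005, 1.0))   # ~0.00238, strong support against

# Hd #2: fired in the pump-action shotgun.
print(overall_lr(0.95, 0.95, 0.0005, 0.9))  # ~0.00056, very strong support against
```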
Several observations should be made here. First, as mentioned, probabilities are personal. Examiners will vary in their assignment of them.
For example, consider the 0.9 figure for the P(micro-obs|pump-action).
Here one would normally tend toward higher probabilities if the micro-
correspondence were high, and tend toward lower probabilities as the
correspondence decreases. However, if the guns were available for
test-firing, information about how well they reproduce marks from
shot-to-shot would be highly relevant, as would information about
the time span between crime-shot and test-firing, information about
tampering, etc. (It is our view that the latter type of information
would be legitimate scientific information to be included in the assignment of probabilities, because the examination observations would be affected.) Clearly, an expert's judgment is involved. Thus
examiner “A” might assign a numerator value of 0.9 for this LR; examiner
“B” may assign a value of 0.7 or 0.95, or what-have-you.
But this variance is perfectly legitimate and simply reflects reality.
Fully informed and competent experts can, and will, reasonably disagree in their judgments, whether radiologists or civil engineers; nor
is this different from traditional practice. It is well understood and accepted that for any specific case, some examiners may conclude Yes
(or No), while others offer inconclusive conclusions. Moreover, the
New Zealand experience suggests that there is a good degree of consistency in verbal conclusions across examiners (inter-examiner reliability/precision), and this performance parameter is capable of a more
meaningful measurement for scaled conclusions, via validity and proficiency tests, than it is for the Yes/No/Maybe responses.
Second, notice how the LRs–and thus the strength of the evidence–
change depending on the hypotheses compared. Comparing one possibility to many possibilities is not equivalent to comparing one possibility to another single possibility.
Third, all known scientifically relevant information can be invoked
when assigning probabilities in the LRs. This includes empirical and theoretical studies on the probability of random striation matches, research
on subclass marks, and laboratory error rates from validity and proficiency tests. Error rates are a source of much debate, and clearly past
error rates should not be imported directly into a specific examination.
Nevertheless, the possibility of a false-positive laboratory error from
specimen mix-ups, for example, is real. Though admittedly not a purely
scientific consideration per se, the probability of an error in a report or
in testimony obviously bears on the administration of justice. And the
probability of laboratory error for a specific case easily can be many orders of magnitude greater than, say, a DNA random match probability.
One of the advantages of the likelihood approach is that, if the scientist
wishes, and if it is in accord with laboratory policy, the potentiality for
laboratory error can be incorporated directly into the LRs.
4. Laboratory error
For example, let us assume that in a specific cartridge-case examination for which your initial micro-correspondence LR very strongly supports the prosecution hypothesis of same-gun (SG), your probability for
the microscopic observations assuming a different gun (DG) is 1 in 2000
(essentially your probability of a random- or a subclass-match, where
“match” represents the quality and quantity of micro-correspondence
observed in this comparison). This figure of 0.0005 would be the denominator of a micro-correspondence LR. But now you also wish to consider the possibility of a false-positive (FP) laboratory/practitioner
error, specifically a false-positive error from a specimen mix-up between evidence and test-fired specimens. Given your lab's history of false-positive errors stemming from specimen mix-ups, its review policies, the nature of the corrective actions taken subsequently, and of course your own mix-up experience, you assign a false-positive probability due to mix-up of 1 in 800 (0.00125) for this specific case. The two events, a coincidental or
subclass match on the one hand, and a false positive match due to a
mix-up on the other, are mutually exclusive, i.e., they cannot happen
at the same time. Thus the two probabilities are simply added: the
LR's denominator would now be 0.0005 + 0.00125= 0.00175, and
the micro LR, assuming for simplicity a numerator of 1, would be
approximately 571, not 2000. In symbols this would be expressed as
follows:
\[ \frac{P(\text{micro observations} \mid \mathrm{SG})}{P(\text{micro observations or FP} \mid \mathrm{DG})} = \frac{1}{0.0005 + 0.00125} \approx 571 \]
Thus the inclusion of the false-positive potentiality from an evidence/
control mix-up has changed the microscopic LR from 2000 to 571, both
favoring the prosecution hypothesis.
To avoid double counting the potential lab error, the class-characteristics LR would remain the same: let us assign its value as
50 in this example. Therefore the overall post-error LR would be
571 × 50= 28,550. From the verbal scale this result would still be
classified as very strong evidence in favor of the prosecution hypothesis.
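A sketch of this adjustment (ours; the figures are the authors' hypothetical assignments):

```python
# Sketch: fold a false-positive (specimen mix-up) probability into the
# micro-LR denominator. The two denominator events are mutually
# exclusive, so their probabilities simply add.
p_coincidental = 0.0005   # P(micro obs | different gun), 1 in 2000
p_false_pos    = 0.00125  # P(false positive via mix-up), 1 in 800

micro_lr = round(1 / (p_coincidental + p_false_pos))  # 571, down from 2000
class_lr = 50             # class-characteristics LR, unchanged

print(micro_lr * class_lr)  # 28550, still "very strong support for"
```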
Of course laboratory/practitioner errors of the false-negative type
also can occur. These can be included for cases in which initial LRs
support a negative, e.g., support a defense hypothesis of DG. Though
we include no sample calculation here, for these cases the probability
of an error would be incorporated into an LR's numerator instead of a
denominator.
5. Conclusion: the pros and cons of the likelihood approach
compared with the current categorical approach
The main weakness of the dominant paradigm has already been largely set forth in this paper: categorical conclusions take the
structure of posterior odds, which ideally should not be provided in
reports and testimony. By their very nature, posterior odds in the firearm–toolmark discipline incorporate non-scientific contextual information that is contained in the prior odds, which odds thus are
inappropriate for the forensic scientist to assume. Moreover, presently most firearm and toolmark examiners are unaware that they are
assuming prior odds, and thus are unaware that they are incorporating contextual information in their conclusions.
Consequently, it is also our belief that the traditional model is
far more prone to a contextual bias. Consider the following four
statements:
1. Given the potentiality for contextual bias, no responsible forensic
scientist should include information that is completely external
to an examination when forming an evaluative opinion; only the
observations before him and the relevant background scientific information should be brought to bear. The outside information is for
the court to consider, not the scientist.
2. Firearm and toolmark examiners, indeed examiners in all of the individualizing sciences, often offer categorical conclusions. And
these take the form of posterior odds;
3. Posterior odds require prior odds. In actual practice, examiners
who individualize unwittingly assume a primitive or naive prior
of 1:1 or thereabouts, which contains non-scientific contextual information. Thus their categorical conclusion also necessarily draws
on non-scientific contextual information.
4. When seeking an answer of posterior odds, Bayes' rule provides that
new, relevant information of all types be accounted for in either the
prior odds or via an additional likelihood ratio having to do with this
new information. In this way posterior odds are updated, as we have
seen.
But this set of statements is clearly self-contradictory. Interestingly, the experiments conducted by Itiel Dror and his colleagues with
fingerprint examiners illustrate this contradiction [6,7]. The examiners often changed–updated–their conclusions given new, outside
information. This was a quite logical step in and of itself, in accordance with Bayes' rule. But it violated the principle that forensic
scientists should never incorporate non-scientific information external to the examination process.
Another weakness of the traditional paradigm is the all-or-nothing nature of the conclusions. Imagine a bullet comparison in
which very strong micro correspondence slowly disappears. At the
beginning of this thought experiment the correspondence would result in a traditional individualization. But as the correspondence gradually disappears, the examiner reaches a decision point where she
“falls off the cliff” and renders an inconclusive decision. The result is
that probative information is lost, information that in general can go
either toward exoneration or incrimination. Another example of lost
information is in the present treatment of caliber and GRC data for
bullets. It is used as a filter, a screening device to be applied before
proceeding onward. If further microscopic examination yields little
or nothing, the caliber and GRC evidence are then lost in an inconclusive outcome. The rationale sometimes given for this is that many firearms bear those GRCs, along with that caliber, so that this information should not be considered. This reasoning is
faulty. To place this reasoning in sharp relief, we know that habitual
cigarette smoking increases the probability of lung cancer by many
times. And yet perhaps we have all heard the “Uncle Henry” fallacy:
“My Uncle Henry lived to 95 and smoked like a chimney all his life
without a speck of lung cancer. Obviously people smoke and do not
get lung cancer. I do not put any stock in all that risk nonsense.”
True, there are no formal databases containing caliber and GRC
frequencies (though each laboratory could relatively easily construct
them). But examiners know much about relative caliber/gauge and
GRC frequencies from their laboratory experience, from the FBI's
GRC database, from doing research and from general familiarity with
firearms. Their knowledge is not perfect, but none is. Even DNA databases are imperfect representations of reality. Forensic scientists should
be paid to use their expert judgment, and all judgment is imperfect [4].
Note here that part of using expert judgment may in some cases lie in
the use of best-case/worst-case analyses, as presented earlier. If the
boundaries of the LR-bracket from such an analysis are above and
below a value of one, then no conclusion or a weak conclusion is rational and expected. If the bracket boundaries are reasonable and both lie either below one or above one, then it is our view that forensic scientists should seriously consider reporting these findings in the interests
of justice and sound public policy and be willing to discuss their reasoning and assumptions in testimony and perhaps in reports. Once again,
however, expert judgment is involved in deciding this question.
Related to this, much of the data that firearm examiners observe is
continuous in nature. There are no discrete levels attached to the
quantity and quality of micro correspondence. And yet the conclusions offered in the traditional paradigm are bluntly discrete. Clearly
many individualization and exclusion conclusions are stronger than
others. The traditional approach has no means to accommodate this
fact.
Of course, lost information also occurs at the opposite end of the
spectrum from individualizations. The generally accepted policy and
protocol is to offer exclusions only with clearly incompatible class characteristics, with some exceptions [8]. But this exclusion criterion appears to be more stringent than the criteria for individualizations. The
latter have to do with the degree of microscopic correspondence observed. The criterion for exclusions largely ignores incompatible microscopic correspondence (in many or most cases) for the reason that it is
conceivable that the firearm or tool could have changed or been tampered with from the time of the crime to the submission of the specimen to the laboratory, thus changing the microscopic marks. But just
as with cigarette smoking or GRCs, a possibility of tampering, for example, does not logically justify ignoring probabilities and expert judgment
about them. A likelihood approach not only creates a greater space
for examiner judgment and a range of conclusions at the negative end
of the spectrum, it also allows the examiner to apply the same or
near-same standard across the spectrum: class characteristics can
tend toward the positive or negative; likewise, microscopic correspondence (or lack thereof) also can tend toward the positive or negative.
Then too, the examiner using the traditional paradigm must decide when class characteristics are truly “incompatible.” Again, this
kind of incompatibility is not a discrete phenomenon but a continuous one. Judgment is called for in many cases (7/right vs. 5/left on
bullets is obviously easy. Differing firing pin impressions on cartridge
cases often are not). But a discrete cutoff, as with individualizations,
means that once again, “falling off the cliff” can and does occur.
We argue that there are no scientific or logical advantages to the
traditional approach. Only deficits. Nevertheless, tradition does offer
some practical advantages. First, there is the ease of use by examiners,
and the ease of understanding (though this ease of understanding is
often illusory when Bayes' rule and prior odds are introduced as
a necessity for complete understanding). Moreover, judges and lawyers currently understand the notion of final probability better than
they understand the meaning of likelihood ratios.
These practical advantages are hardly lethal to the likelihood approach, however. From 2000 to 2006 the now-disbanded British Forensic Science Service provided lectures in Continuing Education Seminars to help judges better understand the entire likelihood approach. This educational program was received positively [4].
The pros and cons for the likelihood approach are more or less the
mirror images of those for the traditional approach. It is scientifically
and logically sound. There is no usurping the role of the court/jury;
there is no falling-off-the-cliff problem; evidence is not lost. All scientific information is capable of being smoothly and logically accounted
for, including CMS data and laboratory/examiner error possibilities.
There is less risk of a contextual bias. The likelihood approach should
improve the overall administration of justice.
But initially at least, not all judges and juries may feel comfortable with likelihoods; perhaps more telling, neither would examiners steeped in traditional thinking. Training for examiners would require
updating. And persistent training workshops for jurists also would
prove helpful, as in the British experience. Still, for some time there
could be resistance in the courts, at least to some degree, and certainly resistance would come from many established examiners. But at
the end of the day, as in Britain and other countries, we believe that
the logical force of the likelihood approach would win converts and
prevail on a practical basis as well. The experience in New Zealand
has been quite positive overall.
Moreover, adoption of this approach would improve the scientific
and professional status of both the science and of the firearm–toolmark
practitioner, respectively. Conclusions would flow more logically and
coincide far more closely with the strength of the observations/
evidence. There would be no need to explain in trials or scientific validity hearings how the term individualization fails to mean certainty. The
likelihood approach logically would do away with the misleading
all-or-nothing debates over, for example, subclass marks, for this possibility could be incorporated into LRs.
For the firearm–toolmark examiner, assertions that they are mere
technicians would be harder to make stick. The hallmark of a technician
is the lack of theoretical knowledge of a subject and the unquestioning,
rote compliance with written procedures. Training and understanding
of the likelihood approach would move the needle farther toward a scientist and away from a technician. To properly execute this method, the
examiner going to court must have a deeper theoretical understanding
of the science and the significance of findings. In sum, it is our recommendation that examiners worldwide–especially in the United States–
begin moving toward the likelihood approach and toward the standardization of verbal scales and training.
References
[1] B. Robertson, G.A. Vignaux, Interpreting Evidence: Evaluating Forensic Science in
the Courtroom, John Wiley & Sons, Chichester, 1995, pp. 20–21, 219–220.
[2] C.G.G. Aitken, F. Taroni, Statistics and the Evaluation of Evidence for Forensic
Scientists, John Wiley & Sons, Chichester, 2004, pp. 105–112.
[3] Guest Editorial, Expressing evaluative opinions: a position statement, Science &
Justice 51 (2011) 1–2.
[4] C.E.H. Berger, J. Buckleton, C. Champod, I.W. Evett, G. Jackson, Evidence evaluation: a
response to the Court of Appeal judgment in R v T, Science & Justice 51 (2011) 43–49.
[5] Association of Firearm & Tool Mark Examiners (AFTE) Glossary, 5th ed., version 5.070207, 2007, p. 172.
[6] I.E. Dror, D. Charlton, A.E. Peron, Contextual information renders experts vulnerable to
making erroneous identifications, Forensic Science International 156 (2006) 74–78.
[7] I.E. Dror, D. Charlton, Why experts make errors, Journal of Forensic Identification
56 (2006) 600–616.
[8] Guideline on exclusions, www.swggun.org.