Law, Probability and Risk (2012) 11, 1−24
Advance Access publication on October 24, 2011
doi:10.1093/lpr/mgr020

Scale of conclusions for the value of evidence

ANDERS NORDGAARD†
The Swedish National Laboratory of Forensic Science (SKL), SE-58194 Linköping, Sweden and Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden

RICKY ANSELL
The Swedish National Laboratory of Forensic Science (SKL), SE-58194 Linköping, Sweden and Department of Physics, Chemistry and Biology, Linköping University, SE-58183 Linköping, Sweden

AND

WEINE DROTZ AND LARS JAEGER
The Swedish National Laboratory of Forensic Science (SKL), SE-58194 Linköping, Sweden

[Received on 8 April 2011; revised on 12 September 2011; accepted on 13 September 2011]

Scales of conclusion in forensic interpretation play an important role in the interface between scientific work at a forensic laboratory and different bodies of the jurisdictional system of a country. Of particular importance is the use of a unified scale that allows interpretation of different kinds of evidence in one common framework. The logical approach to forensic interpretation comprises the use of the likelihood ratio as a measure of evidentiary strength. While fully understood by forensic scientists, the likelihood ratio may be hard to interpret for a person not trained in natural sciences or mathematics. Translation of likelihood ratios to an ordinal scale including verbal counterparts of the levels is therefore a necessary procedure for communicating evidence values to the police and in the courtroom. In this paper, we present a method to develop an ordinal scale for the value of evidence that can be applied to any type of forensic findings. The method is built on probabilistic reasoning about the interpretation of findings and the number of scale levels chosen is a compromise between a pragmatic limit and mathematically well-defined distances between levels. The application of the unified scale is illustrated by a number of case studies.

Keywords: evidence value; ordinal scales; likelihood ratio; logical approach.

† Email: [email protected]
© The Author 2011. Published by Oxford University Press. All rights reserved.

1. Introduction

Forensic science is science used for the purpose of law (Caddy and Cobb, 2004). The demand for forensic science expertise has grown steadily throughout the last decades. To some extent this is due to the increased demand for the prosecution to present physical evidence, such as forensic evidence in court, as well as developments of novel techniques and introduction of forensic databases. Some disciplines such as DNA and IT forensics have experienced an exponential growth, whereas other disciplines such as handwriting have at the same time experienced a diminishing demand.

In order to be used accurately, the results of the forensic scientific analyses of traces and evidence material must first be interpreted for evidential value, which then has to be converted into phrasings that are standardized and understandable in the court process. One facilitator in bridging different scientific findings and making their weight transparent and comparable to the legal process is to report their weight by using a verbal scale.

2. Forensic casework

Forensic casework in Sweden today is to some extent served by county police laboratories, though the bulk is served by one centralized national forensic science laboratory, SKL (Swedish National Laboratory of Forensic Science), positioned as an independent laboratory within the police. In Sweden, the disciplines of forensic medicine, forensic psychology, forensic toxicology and paternity testing are positioned outside the police, forming a separate authority serving those specific demands.
The police laboratories perform a limited array of forensic investigations, whereas the duties of the national laboratory cover most forensic disciplines and also include research and development of old and novel forensic skills and techniques. Thus, the vast majority of the forensic investigations requested at SKL originate from the police and they cover investigation of any kind of evidence seized at a crime scene investigation or from individuals involved. The investigations requested vary considerably in complexity, depending on the specific case in question. A large part of the investigations performed at the laboratory are limited to one or a few forensic disciplines, whereas other cases such as murder and armed robbery cases require a broad span of expertise to fulfil the investigations requested. The laboratory digital case management system (LCMS) in use is modern and there is a high degree of electronic information passing to and from the police and within the different laboratory functions. More or less digitally connected to the LCMS are a multitude of different expert systems and databases that support the different kinds of expertise in use. The laboratory information management system (LIMS) created for the analyses and information flow for biological samples and DNA, in particular the high throughput chain handling reference DNA samples from sampling to DNA databasing and hit reporting, is highly developed (Hedman et al., 2008). As the laboratory delivers expert witness statements from multiple forensic disciplines to serve as part of the crime investigation report, sometimes reported in one multidisciplinary statement, a unified way to report the findings is needed. In addition, the combination of evidence from results originating from different disciplines must be achievable in a balanced way. The endpoint of any forensic crime investigation is the presentation of the forensic evidence in court.
One or several forensic reports covering different areas of expertise will be part of the written evidence presented by the prosecution in the court proceedings. The forensic expert, or reporting officer, may also be required to attend court as an expert witness giving oral evidence, although this is generally not the case in Sweden. A vital part of the testimony is to present the robustness and evidential weight of the findings.

3. A unified scale of conclusions at the Swedish National Laboratory of Forensic Science

The forensic interpretation at SKL has almost always been made according to a scale of conclusions. The wording of the scale has to some extent spanned the range from the inconclusive to some type of support for an assumption; words (intensifiers) are used to mark increasing strength. During the 1990s, a number of predominant intensifiers became the skeleton of a kind of common scale of conclusions, although scales with other wordings or other levels still existed in some of the disciplines. Moreover, there was no consistency in the way the questions from the commissioner were answered or in when and how an interpretation was done. The differences between identifying something, interpreting what could be the source of something and interpreting some type of activity/event were also not distinct. The propositions forwarded often concerned a mix of those three issues, which led to diffuse statements, and it was frequently unclear how the outcomes of different methods entered into the interpretation according to propositions at different levels. Should the intrinsic features of a trace and the location, number and distribution of findings have an input on the interpretation of a source or of an activity? Then, which evidence could be combined and in what way?
A project was initiated during the late 1990s with the aim to decide what principle of evidence interpretation to use, standardize a common scale of conclusions and increase the overall knowledge about the interpretation of evidence. The existing scale of conclusions today at SKL (first edition in 2005) is designed in such a way that the interpretation of the findings is made with respect to a pair of propositions, and at the same time the probabilities of the propositions themselves are not interpreted. It has also been designed to fit a logical approach using likelihood ratios (Buckleton, 2005). In our training (in interpretation) of forensic experts, it has been emphasized that the proposition addressed must be clear and transparent, and that it must be stated what is included in the alternative proposition. It has further been thoroughly discussed what shall and can be used when interpreting at a source level of propositions compared to an activity level. Today we often reformulate the issue of the commissioner into distinct propositions, and try to pre-assess the value of the investigation, even before the investigation is commenced. Fig. 1 shows the current edition of the scale of conclusions at SKL translated into English. The scale has nine levels expressed as consecutive integers ranging from −4 to +4. The positive numbers are used when the findings are such that they are more consistent with the proposition forwarded by the commissioner (or more specifically the proposition that is the reformulation of the commissioner’s question) than with the alternative proposition. The negative numbers are used for the opposite situation. Level 0 is used when the findings are (in principle) equally consistent with both propositions.
To exemplify: If the commissioner asks whether a trace G originates from a potential source S, this question is typically reformulated as the proposition ‘S is the source of G’ and an alternative proposition is typically formulated as ‘Some other source is the source of G’. If the findings are more consistent with the former proposition, a positive level is used, and if they are more consistent with the latter proposition then a negative level is used. The question from the commissioner does not, however, always lead to a formulated proposition that is incriminating. As an example, the commissioner may ask: Is this passport authentic? The reformulated proposition would be ‘The passport is authentic’, but the suspicion behind the question is of course that the passport is a forgery, which also becomes the alternative proposition. Thus, if the findings are consistent with a forgery, they will be reported with a negative level in the scale, which in turn shall be interpreted as support for a criminal activity.

FIG. 1. Scale of conclusions used at The Swedish National Laboratory of Forensic Science (Statens Kriminaltekniska Laboratorium).

Each level of the scale comes with its number, a verbal equivalent to this number and an explanatory text (in italics in Fig. 1). The explanatory text is for most levels formulated in such a way that it should be clear that the logical approach (Buckleton, 2005) is used for evaluation. Exceptions are the levels −1, 0 and 1 where a simpler explanation is used, still not jeopardizing the meaning of the level. Note that in the scale the word ‘hypothesis’ is used instead of ‘proposition’. Generally we consider these two to be synonymous expressions in the framework of forensic interpretation, but in the scale we have avoided changing the wording as the word hypothesis is basically the same in Swedish. In the following, we shall merely use the word ‘proposition’ with some exceptions (e.g.
when relating to general statistical theory).

4. Ordinal scales in forensic interpretation

4.1 Scales in general

Many features in our daily environment are valued against scales of different kinds. One of the most common scales is the temperature scale. No person without knowledge about physics would be able to give an objective and transparent definition of what is meant by stating that the temperature is 10 °C or 60 °F, but most people would have a personal opinion about what such a statement means to her or him. If we could follow the interpretation within a person’s mind we would probably find a less accurate scale with levels like ‘extremely cold’, ‘very cold’, ‘cold’, ‘chilly’, ‘moderately chilly’, ‘moderately warm’, ‘warm’, ‘hot’, ‘very hot’, ‘extremely hot’. There are several levels used with this scale, but still far from the accuracy that characterizes e.g. a thermometer. While the thermometer has what is referred to as an ‘interval’ scale, the verbal levels described above constitute an ‘ordinal’ scale. The difference between these two is about the distances between levels. For the interval scale, the distance is the same between two consecutive levels no matter where in the scale the measuring is done. For the ordinal scale, the corresponding distances (may) vary. Ordinal scales may be easier to use since the levels are limited in number, but the interpretation often suffers from subjectivity. This is particularly true for different types of grading scales, such as course grades, DNA quality grades, wine tasting grades etc. Hence, to make an ordinal scale useful, there is need for careful instructions about how to select levels in a particular situation. Ordinal scales are frequent in the area of jurisdiction and forensic interpretation of evidence.
For instance, the (Swedish) Police use a scale to express the degree of suspicion on which an individual is arrested or remanded in custody for being the perpetrator of a crime, with the levels ‘identification’, ‘reasonable suspicion’ (lower degrees) and ‘probable cause’, ‘objectively grounded suspicion’ (higher degrees). At the other end of the legal process, the court statements would be dichotomous in the sense that they would either convict ‘beyond reasonable doubt’ or acquit ‘with respect to insufficient evidence’. One would expect that in a particular case, the level of such a scale is selected with almost no element of subjectivity, but there is yet no guarantee that two cases that are in principle identical would result in identical levels. The science of law is not exact and the use of scales therein cannot be compared with the selection of scale levels within e.g. physics or chemistry. The ordinal and interval scales are the most frequent ones for evaluation purposes, i.e. for judging whether a particular value should be considered low or high (or anything in-between). For classification purposes, there is also the ‘nominal’ scale, which completely lacks numerical relationships between the levels (e.g. nuances of paint). Scientific measurements are most often given on a ‘ratio’ scale in which there is a well-defined zero value allowing us to compare values on a relative basis. These two scales, however, are not in focus for the evaluation of evidence and will therefore not be considered in the current paper.

4.2 Scales of conclusions

Forensic science takes an intermediate position in that it comprises several parts from various disciplines, more or less exact. Ordinal scales of conclusions are practised within forensic interpretation with the aim of presenting the findings in a form that is free from statements and terminology requiring knowledge within such different scientific areas.
In particular, statements regarding forensic evidence will for natural reasons contain probabilistic reasoning, but probabilities themselves may be very difficult to communicate outside the field of expertise. Therefore, the selection of wording in an ordinal scale of conclusions for forensic evidence evaluation becomes a challenge. We shall illustrate this through an example. At a forensic laboratory, one might have come to the conclusion that a particular piece of evidence is consistent with a statement forwarded by the police, for instance that a glove was worn by a particular suspect. Let us assume that the police bring the glove and some biological samples (hairs, blood etc.) from the suspect to the forensic laboratory along with the question ‘Was this glove worn by the suspect?’. Without careful thinking, it might be tempting to express the findings like ‘There is a high probability that the glove was worn by the suspect’ since this in fact is a direct answer to the question. However, such a statement does not express the value of evidence; it is a statement about the case once the forensic findings have been applied to the prior beliefs about whether the suspect wore the glove or not (cf. Aitken and Taroni, 2004). It is not up to the forensic examiner to give statements about the case since he or she should not possess any prior beliefs. Now let us alter the statement so it reads, ‘Our findings give that the probability that the glove was worn by the suspect is high’. A person inexperienced with probability reasoning might find the meaning of the latter statement different from the meaning of the former, but they of course mean the same, and it is only the wording that is different. To change the meaning, the word probability (or a synonymous term or expression) must apply to the findings and not to the case. Consider the following two expressions: (i) ‘Our findings are highly probable if the glove was worn by the suspect’.
(ii) ‘There is much higher probability to obtain our findings if the glove was worn by the suspect than if it was not’. What is the main difference between these two statements? Both statements are consistent with the way evidence evaluation is pursued as they both express probabilities for the findings and not for the statements forwarded by the police (or by a prosecutor). However, (i) is not complete since it only concerns the probability of the findings if the suspect actually wore the glove. There is no value in this statement useful to the court because the probability of the findings might be equally high if the suspect did not wear the glove. Statement (ii) concerns two probabilities for the findings, one given the glove was worn by the suspect and one given it was not. Thus, the findings have a value that can be related to prior beliefs about the case. The example shows that care must be taken about what the word probability refers to, and in addition that the statement cannot be built on a single probability. This in turn implies that the probability scale cannot be used directly for evidence values, which might sound like a drawback keeping in mind that the inclusion of technical evidence in a case requires probabilistic reasoning. However, it is not until the evidence has been applied to the prior beliefs that statements may be forwarded with a degree of (un)certainty; in other words, that the final statement is true up to a probability. As we shall discuss below, statement (ii) is built up from the ‘ratio’ of two probabilities and such a ratio has a different scale from the probability scale. It is seldom easy to interpret a statement telling how much higher the probability of the findings is given that the glove was worn by the suspect compared with the case where it was not. For instance, what would we mean by stating that ‘the findings are 10 000 times more probable if the glove was worn by the suspect than if it wasn’t’?
To make the statement more interpretable, we must translate it from wording with probabilities to wording with more ordinary expressions. The statement (ii) may for example be alternatively expressed as (ii*) ‘Our findings strongly support that the glove was worn by the suspect’. Note that this wording might be interpreted as if we were referring to the case and not the evidence. The expression is then confused with something like ‘Our findings give with high probability that the glove was worn by the suspect’. Nevertheless, there is a distinction between the expressions ‘strongly supports’ and ‘gives with high probability’. The former means that looking at the evidence from different perspectives (i.e. the suspect wore or did not wear the glove), we consider the findings to be (much) more expected if the glove was worn by the suspect, which thus makes that proposition the best explanation for the findings. The latter means that we have come to a conclusion about the state of the proposition with a high probability. Once we have selected a system for rephrasing statements originally involving probabilities, as is the case when translating (ii) to (ii*), we can construct an ordinal scale within this system. A statement representing a lower value than does (ii*) can be ‘Our findings support that the glove was worn by the suspect’ and with an even lower value it can be ‘Our findings support to some extent that. . . ’. Corresponding expressions amplifying the degree of support will give higher levels on the scale. The resulting scale, i.e. the collection of levels, becomes ordinal, since there is no possibility to select expressions that have numerically meaningful distances (that would be normalized) unless the expressions are just verbal counterparts to real numbers.
The number of levels that should be defined on such a scale depends to a great extent on how the scale is adopted among jurors, prosecutors, defence attorneys and the police. Usually there must be a multistage procedure of developing a scale, including several proposals, feedback and revision before the final scale is settled.

4.3 Issues of interpretation

Interpretation of verbal expressions for the value of evidence has previously been studied. Sjerps and Biesheuvel (1999) made an experiment in which they studied jurists’ opinions about two different scales. The first scale was the one used at that time within The Netherlands Forensic Institute. This scale used expressions that from a probabilistic point of view related more to the case and less to the evidence. The second scale was a suggested alternative in which the formulations were in line with the expressions (ii) and (ii*) above, i.e. the scale levels related to how probable the findings were under the assumption that one proposition was true compared with how probable they were under other assumptions. The results nevertheless showed that jurists in general had problems understanding the necessity of the second scale and preferred the first one. Broeders (1999) addresses the same type of problem, but by studying in which way forensic scientists report and understand their reporting of their findings. Recently, SKL has conducted a survey among different actors in the Swedish judicial process (Nordgaard et al., 2010) about their perceived strength of different phrases (in Swedish) used as ‘amplifiers’ of the support. The results from this survey show that there is substantial variation among the actors in perceived strength but also in their final opinion about a particular case when a certain phrase has been used in the evaluation statement for the technical evidence.
For instance, wording that from a linguistic perspective should mean that the support in practice implies certainty about the case was by many respondents interpreted as just moderately strong. So far we have not included probabilistic formalism in the discussion. This is partly due to the fact that mathematics is problematic to use in the interface between the forensic laboratory, which to a large extent uses scientific methods originating in natural and mathematical sciences, and the judicial community (police, prosecutors, lawyers and judges), where such scientific methods may be very hard to understand. Later, we will briefly review and use results from probability theory to explain how findings can be translated to an ordinal scale, but for the moment we may say that the development of ordinal scales for forensic interpretation is thoroughly built on correct probabilistic reasoning and in addition on how relations between probabilities can be interpreted. This is a work that needs to be done by forensic scientists, with sufficient knowledge about probabilistic reasoning and its applicability in forensic biology, chemistry, informatics etc., but at the same time integration with the judicial community is by all means necessary.

5. Reporting forensic findings on the scale of conclusions

5.1 The likelihood ratio

The state of the art in forensic interpretation is to evaluate forensic evidence with the use of a likelihood ratio (cf. Aitken and Taroni, 2004). The likelihood ratio expresses the relative strength of the evidence in the comparison of one proposition, often referred to as the ‘prosecutor’s hypothesis’, against another referred to as the ‘defence’s hypothesis’. The former will hereafter be denoted $H_P$ and the latter $H_D$. It should be mentioned that the terms ‘prosecutor’ and ‘defence’ should not always be literally interpreted, even if this is the case when a particular piece of evidence is handled within the courtroom.
One objective of the current paper is to show how a scale of conclusions can be developed within a forensic laboratory primarily working with evidence material from crime (scene) investigations. However, the body of commission is usually the police and most of the evidence evaluation addresses source level propositions. The prosecutor’s hypothesis may therefore very well be a reformulation of the question (task) forwarded by the police into a proposition (cf. Section 3) and the defence’s hypothesis is the (natural) alternative to that proposition. In other areas of evidence evaluation, these two hypotheses are just two disjoint alternatives where the task is to decide upon which of these is the one most probable. The likelihood ratio used in evidence evaluation is a particular component in the theory of Bayesian hypothesis testing, in which the hypotheses are evaluated by the so-called ‘Bayes factor’ (cf. Berger, 1985). For a pair of mutually exclusive and exhaustive hypotheses ($H_P$, $H_D$) the prior ‘odds’ of $H_P$ to $H_D$ is the ratio $\Pr(H_P)/\Pr(H_D)$ and the posterior odds of $H_P$ to $H_D$ given the observed data $E$ is the ratio $\Pr(H_P \mid E)/\Pr(H_D \mid E)$. The Bayes factor, $B$, is defined as the ratio of the posterior odds to the prior odds, i.e.

$$B = \frac{\Pr(H_P \mid E)/\Pr(H_D \mid E)}{\Pr(H_P)/\Pr(H_D)}. \qquad (5.1)$$

If both hypotheses are simple, i.e. each hypothesis depicts a fixed scenario for which a prior probability can be assumed, the Bayes factor simplifies to a likelihood ratio:

$$\frac{L(H_P; E)}{L(H_D; E)} = V, \qquad (5.2)$$

where the likelihoods $L(H_P; E)$ and $L(H_D; E)$ measure how likely the observed data are (relative to other data) under the assumptions that $H_P$ and $H_D$, respectively, are true. For evidence evaluation, $E$ stands for the evidence itself, or the findings from a case. Depending on the scale of the measured data of these findings, the likelihood is either a distinct probability or a value from a probability density function.
The former case is the one mostly appearing in forensic literature and the likelihood ratio can then be written

$$V = \frac{\Pr(E \mid H_P)}{\Pr(E \mid H_D)}. \qquad (5.3)$$

As a simple example of the latter, consider a case where the piece of evidence is a digital image of a disguised person and the question is about whether it is a man or a woman. The forensic method used is estimating the height of the person from the image. Let us say the height estimate is 177 cm. This scale of measurements is not of the kind where we may assign distinct probabilities to each value. Instead, the probability densities of heights for males and females must be used. Denoting the former $f_M(x)$ and the latter $f_F(x)$, the applicable likelihood ratio is $V = L(H_P; E)/L(H_D; E) = f_M(177)/f_F(177)$. Equating the Bayes factor with the likelihood ratio, (5.1) can be rewritten as

$$\frac{\Pr(H_P \mid E)}{\Pr(H_D \mid E)} = V \cdot \frac{\Pr(H_P)}{\Pr(H_D)} \qquad (5.4)$$

(also known as Bayes’ theorem on odds form). A likelihood ratio greater than one will thus make the posterior odds higher than the prior odds, while the opposite holds for a likelihood ratio less than one. In a forensic case where $V$ is of the kind (5.3) and greater than one the test result can be expressed as ‘the probability of the findings when $H_P$ is true is $V$ times higher than the probability of the findings when $H_D$ is true’, but this expression is less proper when $V$ is a ratio of evaluated densities. A more general expression would be ‘the findings increase the prior odds of $H_P$ to $H_D$ by a factor $V$’ and such an expression also moves the focus from discussing probabilities in court to the discussion of supporting evidentiary strength. Moreover, the last expression is also consistent with the general case of Bayesian hypothesis testing, i.e. without substituting the likelihood ratio for the Bayes factor.
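The density-based likelihood ratio of the height example and the odds update in (5.4) can be illustrated with a minimal sketch. It assumes, purely for illustration, that heights of males and females are normally distributed; the means and standard deviations below are hypothetical placeholders and are not taken from the paper or from any population study.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical population parameters in cm (assumptions for illustration only).
MALE_MEAN, MALE_SD = 180.0, 7.0
FEMALE_MEAN, FEMALE_SD = 166.0, 6.0


def likelihood_ratio_height(x):
    """V = f_M(x) / f_F(x): ratio of the two densities at the estimated height."""
    return normal_pdf(x, MALE_MEAN, MALE_SD) / normal_pdf(x, FEMALE_MEAN, FEMALE_SD)


def posterior_odds(v, prior_odds):
    """Bayes' theorem on odds form: posterior odds = V * prior odds."""
    return v * prior_odds


v = likelihood_ratio_height(177.0)
print(f"V = {v:.2f}")
print(f"posterior odds with even prior odds = {posterior_odds(v, 1.0):.2f}")
```

Under these assumed parameters an estimated height of 177 cm gives a likelihood ratio above one, i.e. the findings raise the prior odds that the person is a man; with other population parameters the value would of course differ.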
One example where the Bayes factor cannot be separated from the prior odds (as is the case when it is replaced by the likelihood ratio) is where $H_P$ states that the donor of a blood stain is X and $H_D$ states that the donor is either the brother of X or another person not related to X. The evidence is a match between the DNA profile obtained from the stain and the DNA profile of X. Here $H_D$ is the union of two mutually exclusive propositions with different likelihoods and possibly different prior probabilities, and therefore the likelihood ratio (as expressed in (5.2)) is not the equivalent of the Bayes factor (cf. Buckleton et al., 2006). The more general view of interpreting the likelihood ratio (or the Bayes factor) as how much the prior odds is increased (or decreased) is supportive of the way evidence evaluation should be reported to the commissioner (e.g. in court) when the likelihood ratio cannot be computed explicitly, but only estimated to an order of magnitude. Instead of interpreting the likelihood ratio as how much more (or less) probable the findings are given that one proposition is true than given that the other is true, we can say that the findings support one of the propositions and thus imply an amplification (or attenuation) of the prior odds to get the posterior odds. In the absence of an explicit numerical likelihood ratio, the degree of support can be given on an ordinal scale.

5.2 From a likelihood ratio to a scale level

Once the likelihood ratio (or the Bayes factor when they are not equivalents) has been obtained, it should be interpreted on the scale of conclusions used. Evett et al. (2000) suggested a scale where a likelihood ratio between 1 and 10 would be reported verbally as ‘Limited evidence to support’, while a likelihood ratio above 10 000 would be reported as ‘Very strong evidence to support’.
DNA evidence is probably still the area of application where a likelihood ratio can be reported on almost a continuous scale, and it is well known that the vast majority of calculated likelihood ratios exceed the value of 10 000. The Netherlands Forensic Institute uses a scale (NFI, 2008), in which the findings (analogously to Evett et al., 2000) are reported to support with different degrees the statement in hand. The scale currently used at SKL (Fig. 1) is analogous to the ones in Evett et al. (2000) and NFI (2008), with the exception that the statement for which the level is reported is always the proposition advanced in the forensic examination. This means that likelihood ratios above and below one are differently interpreted on the scale. For the other scales mentioned above only likelihood ratios greater than one are interpreted and the proposition supported may vary from case to case. Consider the scale of Fig. 1. To simplify, we will hereafter refer to this scale as the SKL scale. There are nine levels used but the Levels −1, −2, −3 and −4 can be seen as ‘mirrors’ of the Levels +1, +2, +3 and +4 since interchanging the propositions would change a level −L to level +L and vice versa (L = 1, 2, 3, 4). Rules for translation of obtained likelihood ratios to the scale can therefore be developed for the Levels 0, +1, +2, +3 and +4 and be applied either directly to the likelihood ratio or to its inverse. One might ask why the scale is not constructed with only five levels, but the current reporting system at the laboratory is such that findings consistent with what could be the prosecutor’s hypothesis should be reported with positive levels, and findings not consistent with that proposition should be reported with negative levels. Translation now means dividing the range from one to infinity of possible likelihood ratio values into five intervals corresponding with the Levels 0, +1, +2, +3 and +4.
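The mirror property described above, where translation rules are defined for Levels 0 to +4 and applied either to the likelihood ratio or to its inverse, can be sketched as follows. The function name `scale_level` and the interval limits in `EXAMPLE_LIMITS` are hypothetical: only the lower limit of 100 for Level +2 is motivated later in this section, and the remaining numbers are placeholders chosen so that the example runs, not the limits of the SKL scale.

```python
from bisect import bisect_right


def scale_level(v, lower_limits):
    """Translate a likelihood ratio v into a signed level between -4 and +4.

    lower_limits holds the ascending lower limits of Levels +1, +2, +3, +4.
    Values of v below one are handled by applying the same rule to 1/v and
    reporting the resulting level with a negative sign.
    """
    if v <= 0:
        raise ValueError("a likelihood ratio must be positive")
    if v >= 1.0:
        # Number of lower limits that v has reached equals the level.
        return bisect_right(lower_limits, v)
    return -bisect_right(lower_limits, 1.0 / v)


# Placeholder limits (assumption): only the value 100 for Level +2 is
# derived in the text; the others are illustrative.
EXAMPLE_LIMITS = (10.0, 100.0, 1000.0, 10000.0)

print(scale_level(150.0, EXAMPLE_LIMITS))        # level for V = 150
print(scale_level(1.0 / 150.0, EXAMPLE_LIMITS))  # mirrored level for V = 1/150
```

Keeping the limits as a parameter separates the mirroring logic, which follows directly from the text, from the choice of intervals, which is a matter of compromise within and outside the laboratory.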
It might be argued that Level 0 should be used solely for the case where the likelihood ratio is exactly one, but here and in the forthcoming we must remember that a likelihood ratio obtained in a particular case is most often a point estimate with the amount of uncertainty it may contain (i.e. the likelihood ratio used is an approximation). Thus, an interval for Level 0 is also motivated and the lower limit of this interval is naturally one (for likelihood ratios greater than or equal to one). Each individual may have their own opinions about how intervals should be chosen to be consistent with the verbal expressions used in the SKL scale, and there is no unique mathematical rule for doing it. In a forensic laboratory, the choice of a unified translation system is therefore a question of compromising between the different opinions within and outside that laboratory. We will however suggest a statistically assisted choice based on Bayes’ theorem and some common opinion about uncertainty. Assume that one piece of evidence is the component that should decide whether a suspect should be convicted or acquitted. The evidence value is then combined with the prior odds of guilt to obtain the posterior odds of guilt. What is the lowest posterior odds that can lead to conviction? This question has naturally no distinct answer, but reasoning about it is important for the construction of intervals for the likelihood ratio. Ceci and Friedman (2000) discuss this topic in terms of the social cost of conviction. One conclusion is that the celebrated statement by the British judge Sir William Blackstone (1723–1780) that ‘it is better that ten guilty persons escape, than that one innocent suffer’ should be updated, and if numbers are to be set out it would be more consistent with today’s practice in conviction ‘beyond reasonable doubt’ to substitute 99 for 10 in Blackstone’s statement. 
Adopting this, a posterior odds of 99:1 or equivalently a posterior probability of 0.99 is considered to be high enough to say that there is support for adopting the corresponding proposition. The same argument was used by Thompson et al. (2003) when discussing how the probability of false positives affects the posterior probabilities in DNA cases. Following these arguments, we suggest that Level +2 of the SKL scale (‘support’ without attributes) should be reached when the likelihood ratio is greater than or equal to a value that ‘on the average’ would give a posterior probability of at least 0.99 for the proposition forwarded. To explain this more thoroughly, consider again Bayes’ theorem in odds form as presented in (5.4). If we assume HP and HD to be complementary propositions, the probability ratios of that expression are true odds and we may rewrite the relationship as

\[ \frac{\Pr(H_P \mid E)}{1 - \Pr(H_P \mid E)} = V \cdot \frac{\Pr(H_P)}{1 - \Pr(H_P)} = V \cdot \mathrm{Odds}(H_P), \tag{5.5} \]

where Odds(HP) stands for the ‘prior’ odds of HP. From (5.5) we deduce the posterior probability of HP as

\[ \Pr(H_P \mid E) = \frac{V \cdot \mathrm{Odds}(H_P)}{V \cdot \mathrm{Odds}(H_P) + 1}. \tag{5.6} \]

‘On the average’ may now be represented by a prior odds equal to one, which changes (5.6) into

\[ \Pr(H_P \mid E) = \frac{V}{V + 1}. \tag{5.7} \]

Another argument for temporarily setting the prior odds to 1 is that it is the case where the posterior odds are solely determined by the value of evidence. It therefore seems natural to ‘normalize’ the scale at this point. For the posterior probability to be at least 0.99, the likelihood ratio V must be at least 99. This is however an odd number and the choice of 100 is easier to communicate. Hence, the lower limit of the interval interpreted as Level +2 on the SKL scale is settled. The interval corresponding with Level +4 is for natural reasons open-ended towards infinity.
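The relationship in (5.6) and (5.7) between the likelihood ratio V, the prior odds and the posterior probability can be sketched in a few lines of Python. This is only an illustration; the function names `posterior_probability` and `min_lr_for_posterior` are chosen here and do not come from the laboratory's practice.

```python
def posterior_probability(v: float, prior_odds: float = 1.0) -> float:
    """Posterior probability of HP following (5.6).

    With the default prior odds of one this reduces to V / (V + 1), i.e. (5.7).
    """
    if v < 0 or prior_odds < 0:
        raise ValueError("likelihood ratio and prior odds must be non-negative")
    posterior_odds = v * prior_odds
    return posterior_odds / (posterior_odds + 1.0)


def min_lr_for_posterior(p: float, prior_odds: float = 1.0) -> float:
    """Smallest V for which (5.6) gives a posterior probability of at least p."""
    if not 0.0 < p < 1.0 or prior_odds <= 0:
        raise ValueError("need 0 < p < 1 and positive prior odds")
    return p / ((1.0 - p) * prior_odds)


# Prior odds of one: V = 99 gives a posterior probability of 0.99,
# and V = 100 (the rounded limit for Level +2) gives slightly more.
print(round(posterior_probability(99), 4))
print(round(posterior_probability(100), 4))
print(round(min_lr_for_posterior(0.99), 2))
```

Passing a different `prior_odds` shows how the same value of evidence leads to different posterior probabilities, which is the reason the scale is normalized at prior odds equal to one.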
The practice up to now for reporting DNA evidence values on the scale has been to require the random match probability to be less than one in a million to reach Level +4. This choice depends on the sizes of the populations of potential perpetrators involved in Swedish crimes. We have no reason today to question this choice and therefore the lower limit for the interval interpreted as Level +4 of the SKL scale is set to one million. The corresponding lowest posterior probability with a prior odds of one is approximately 0.999999, which is far beyond any debate about the uncertainty. This anchoring of the lower interval limits for the scale Levels +2 and +4 will now be used to construct the complete interval division of the range one to infinity of likelihood ratio values. The first two interval limits, i.e. 1 and 100, are exactly the scale levels exponentiated with base 10. It is then natural to think of the scale as a logarithmic one even if the third interval limit (i.e. one million) does not fit into this. From another perspective, we strive to come above Level 0 for evidence that clearly favours one proposition over another, even if we consider the evidence value not to be strong. This is particularly important for evidence primarily analysed for intelligence work. As a consequence, the interval lengths must increase with increased level. They certainly do with a full Briggsian logarithmic scale, i.e. logarithms with base 10 (named after the British 17th century mathematician Henry Briggs, 1561–1630), but using all consecutive numbers in such a scale would require seven levels before the interval limit one million is reached. To avoid that many levels, we suggest the following: the increase in Briggsian logarithm of two consecutive lower interval limits should be (approximately) proportional to the level corresponding with the higher of the two limits.
We can write the intervals and corresponding scale levels as

\[
\begin{aligned}
1 \le V < R_1 &: \; 0 \\
R_1 \le V < R_2 &: \; +1 \\
R_2 \le V < R_3 &: \; +2 \\
R_3 \le V < R_4 &: \; +3 \\
R_4 \le V < \infty &: \; +4,
\end{aligned}
\tag{5.8}
\]

where we have already fixed R2 to 100 and R4 to 10^6. The lower limit 1 for the interval corresponding with Level 0 is as natural as is the upper undefined limit (∞) for the interval corresponding with Level +4. Thus, the first and the last interval are left out of this construction, and we require

\[ \log_{10} R_i - \log_{10} R_{i-1} \approx k \cdot i; \quad i = 2, 3, 4. \tag{5.9} \]

With R2 = 100 and R4 = 10^6, a solution to (5.9) is found in which k ≈ 0.5, R1 ≈ 5.625 and R3 ≈ 5625. Now what do these values imply for the posterior probabilities? With prior odds equal to one (as before), we obtain the lowest posterior probabilities for Levels +1 and +3 as 0.8491 and 0.9998, respectively. The former should be consistent with that the findings support HP ‘to some extent’ according to the SKL scale (Fig. 1). Any probability above 0.5 would be consistent with that expression depending on how ‘some extent’ is interpreted, but of course 0.8491 would not be a debatable value in that sense. The latter probability, i.e. 0.9998, is quite high but still difficult to interpret from general probability reasoning. Although it is not appropriate to compare posterior probabilities with P values from traditional frequentistic approaches to hypothesis testing, we may reflect on common discussions about such quantities. In general a P value below 0.05 is interpreted as significant evidence and a P value below 0.01 as strong significant evidence. In medical studies on effects of drug therapies, P values below 0.001 are considered to give very strong statistical evidence of effects. In the view of this, a posterior probability of at least 0.9998 would be consistent with the expression ‘strong support’ as given in the SKL scale (Fig. 1). It may be argued that the derived lower limits (1), 5.625, 100, 5625 and 10^6 are impractical from a numerical point of view.
Rounding the numbers upwards, i.e. substituting 6 for 5.625 and 6000 for 5625, gives a more convenient representation. The resulting minimum posterior probabilities with prior odds equal to one will then be 0.8571 and 0.9998 (the latter is slightly changed although not visible until the fifth decimal). The increase in logarithms of lower limits between two consecutive levels will no longer be proportional to the level corresponding with the higher of the two limits, but fairly close. The resulting translation table between values of V and scale levels (both sides of 0) is given in Table 1. Figure 2 plots the lowest posterior probabilities against prior odds for the five scale levels 0 to +4 (equating each level with its lowest possible likelihood ratio).

TABLE 1 Intervals of V (likelihood ratios) and corresponding scale levels

Interval                 Scale level
V ≤ 10^−6                −4
10^−6 < V ≤ 1/6000       −3
1/6000 < V ≤ 1/100       −2
1/100 < V ≤ 1/6          −1
1/6 < V < 6               0
6 ≤ V < 100              +1
100 ≤ V < 6000           +2
6000 ≤ V < 10^6          +3
10^6 ≤ V                 +4

FIG. 2. Posterior probabilities plotted versus prior odds for each of the Levels 0, +1, +2, +3 and +4 of the SKL scale of Fig. 1. Calculation of posterior probabilities was done with a likelihood ratio equal to the lower limit for the likelihood ratios for each level.

5.3 Reporting on the scale without reference data

Forensic scientists may be sceptical of the logical approach to evidence evaluation. This is so because a majority of forensic cases are still such that no explicit reference data exist with which estimations of likelihoods can be done. It is however important to realize that the lack of explicit reference data does not disqualify the use of the logical approach, not even the use of a likelihood ratio. The latter always exists for simple propositions and the problem lies in its estimation.
By ‘explicit reference data’, we mean an objectively compiled database of observations of the same kind as our findings for the current piece of evidence, and with enough background data to make it possible to classify these observations with respect to the propositions in question. However, reference data may still exist although not formally stored in a database. An eyewitness may state that the person he saw was a man, implicitly saying that it was not a woman. How is it possible to give such a statement? There are several possibilities, but one may be the case of exclusion: ‘If that person was a woman I wouldn’t have expected her to have the skull shaved, that’s why I think it was a man’. Note that this way of thinking is the same as what is promoted with the logical approach. His findings consist of the shaved skull and these findings are in his opinion more probable if it was a man than if it was a woman, although he merely uses the conditional probability of his findings given it was a man (i.e. the numerator of the likelihood ratio). To be fair, however, his thinking could have been the opposite: ‘The person’s skull was shaved, thus I think it was a man’. The latter is a statement corresponding with a conditional probability that the person was a man given his findings and would be the numerator of the posterior odds. The important issue of the previous example is that reasoning with probabilities or likelihoods can be done without any explicit databases. The forensic scientist has, similar to the eyewitness, an experience that allows her to make likelihood statements with a precision depending on this experience. The statements will to some extent be subjective, yes, but so are the statements of the eyewitness. The difference is that while the eyewitness bases his conclusions upon what may be referred to as common experience, the expert witness bases her conclusions on specific experience.
The logical approach still works no matter if the conclusions are based on an objective database or on specific experience as long as the evidence value is in the form of a likelihood ratio. The forensic scientist should consider the two competing propositions, and based on her experience both from historical cases and from the field of expertise itself she should find out the degrees of expectation of the findings under each of the propositions. The proposition under which the degree of expectation is maximized is the proposition that is supported by the findings. The degree of support is based on how much more expected are the findings under that proposition than they are under the other proposition. This judgement is the evidence value and can be reported on the scale and is at the same time the best estimation at hand of the underlying likelihood ratio. It is possible that neither of the propositions is particularly supported by the findings, or that both propositions are approximately equally supported. The evidence value should then be reported with 0 on the scale. This case is probably the simplest to handle at the forensic laboratory since referencing to general uncertainty may serve as an argument for the stated degree of support. What about all cases where support has been concluded? As soon as the findings give more support to one of the propositions, and this cannot be treated as just having occurred by chance, the forensic scientist should report the evidence value at least at Level +1 or Level −1 of the scale depending on which of the two propositions gets the more support. For sake of simplicity, we will from now on only discuss reporting on the positive side of the scale keeping in mind that the negative side can just be thought of as a mirror of the positive side. 
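The translation of Table 1, including the mirroring of the negative side, can be sketched as follows. This is a minimal illustration assuming the rounded limits 6, 100, 6000 and 10^6; the names `LOWER_LIMITS` and `scale_level` are introduced here for the example only.

```python
# Lower limits of the positive levels as given in Table 1, highest level first.
LOWER_LIMITS = [(4, 1e6), (3, 6000.0), (2, 100.0), (1, 6.0)]


def scale_level(v: float) -> int:
    """Return the scale level (-4 .. +4) for a likelihood ratio V.

    The negative side mirrors the positive side: V <= 1/limit gives the
    negative of the level that V >= limit would give.
    """
    if v <= 0:
        raise ValueError("a likelihood ratio must be positive")
    for level, limit in LOWER_LIMITS:
        if v >= limit:
            return level
        if v <= 1.0 / limit:
            return -level
    return 0  # 1/6 < V < 6


for v in (0.5, 6, 265, 743_000, 1 / 100):
    print(v, scale_level(v))
```

Interchanging the propositions corresponds to calling the function with 1/V, which flips the sign of the returned level.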
Equating the Level +1 to a likelihood ratio of at least 6 according to Table 1 means that we state that the findings support the forwarded proposition at least six times more than the alternative proposition. It might be argued that ‘6 times more’ is too strong to be consistent with the argumentation above stating that +1 should be used whenever the indicated support is not just a matter of chance. However, consider the following simple example: A 12-sided dice is suspected to be false in that sense that it gives the result ‘twelve’ every second roll, while each of the other 11 results are equally likely (i.e. comes each with probability 1/22). The alternative proposition says the dice is balanced. We roll the dice once and obtain twelve. The likelihood for the proposition of a false dice is then 1/2 and the likelihood for the proposition of a balanced dice is 1/12. Thus, the likelihood ratio becomes exactly 6 and our findings support the proposition of a false dice six times more than they support the alternative proposition. However, if our result had been e.g. ‘five’, the likelihood ratio would have been 0.54, which supports the proposition of a balanced dice about 1.8 times more than the proposition of a false dice. This example is of course not comparable with a typical forensic case but still illustrates that a likelihood ratio of 6 is not at all too large to harmonize with the scale Level +1. No jury or court would come to the verdict that the dice is false just on the basis of one roll, but we cannot ignore that the only findings in the case, i.e. the observed twelve, give support to some extent for that proposition. The findings may have occurred by chance but that is not the only explanation. The highest level of the scale, i.e. +4, might from a first point of view be impossible to choose without explicit reference data. 
This is however not true, and this level may on the contrary be quite natural under the right conditions. The forensic scientist still has two propositions (HP and HD) to consider. In a case where she believes that the findings cannot by any means be obtained given HD is true, while they very well can if HP is true, the likelihood of the former proposition is very close to zero, and the resulting likelihood ratio must be so large that scale Level +4 is reached (in support of HP). From a philosophical point of view, this situation resembles the one where it can be stated as a fact that one of the propositions is false, i.e. the so-called leap-of-faith has disappeared. The question whether a scale of conclusions should be used in such cases is of course interesting but is out of the scope of the current paper. Depending on how certain the forensic scientist is about the impossibility of obtaining the findings under one of the propositions, the leap-of-faith is more or less pronounced. The two remaining levels, i.e. +2 and +3, are the ones most difficult to decide upon in a particular case (without explicit reference data). Besides long experience within her own field of expertise, the forensic scientist needs to ‘calibrate’ her opinions against casework from other areas. There are a lot of similar situations in the various fields of forensic interpretation and within a laboratory you should benefit from this. At SKL, we are practising a system of ‘calibration meetings’, where forensic scientists from different fields discuss each other’s cases. A case is prepared in such a way that you clarify what are your propositions, what are your findings, what are your conclusions and why. This should be done in such a way that e.g. a person trained in forensic biology could understand a case from forensic informatics and vice versa.
By presenting your own case, assimilating questions from your colleague and then doing the opposite procedure, the general likelihood reasoning about your own cases is enhanced.

6. Examples from casework at SKL

We will here present three typical cases from daily work at the laboratory to illustrate the use of the scale of conclusions. The first case is about forensic interpretation of glass fragments and shows how the scale is used when the likelihood ratio can be estimated on the basis of reference data. The second case is about comparison of paints and illustrates how evidence evaluation is made and how the scale is used without the access to reference data. The third case comes from DNA analysis and shows a combination of evidence evaluation with and without reference data. We would like to make clear that these examples illustrate the current practice at the laboratory, and our objective is not to communicate recommendations about how particular cases should be handled with a scale of conclusions, merely to show how this handling may work.

6.1 A glass case

6.1.1 Background. A side window of a car has been broken in a smash-and-grab incident. The police later seized two people as suspects of the incident. Control glass from the broken window was collected and the suspects’ jackets were sent for investigation. 6.1.2 Investigation and results. From the jacket of Suspect 1 about 20 glass fragments were recovered. The glass fragments were green-tinted tempered float glass with a thickness of 3.16 mm. Four of the fragments were examined by means of refractive index (RI) with a GRIM instrument (GRIM) before (RIbefore) and after (RIafter) annealing. The following data were obtained: (RIbefore) 1.52084, (RIafter) 1.52252. The elemental composition of one of the recovered glass fragments was also examined by means of a scanning electron microscope equipped with an energy dispersive X-ray detector (SEM/EDX).
From the jacket of Suspect 2 ten glass fragments were recovered. Four of these fragments were examined by means of RI before annealing, and the following data were obtained: (RIbefore ) 1.52088. The recovered glass fragments were too small for further examinations. The control glass from the broken side window was green-tinted tempered float glass with a thickness of 3.16 mm. This glass was examined analogously to above with the following data obtained: (RIbefore ) 1.52081 (RIafter ) 1.52248. The control glass was also examined regarding the elemental composition in SEM/EDX. 6.1.3 Evaluation. The findings of the recovered glass fragments from Suspects 1 and 2 were compared with the findings of the control glass. The examined glass fragments from the jacket of Suspects 1 and 2 could not be distinguished from the control glass from the side window by means of the above-mentioned techniques. It is important to notice that this step cannot by itself grade the conclusion. The results of the glass examination need to be evaluated against two propositions, the forwarded proposition (HP ), stemming from the commissioner’s question, and the alternative proposition (HD ): Proposition HP : ‘The examined glass fragments from the jacket of suspect 1 originate from the broken side window of the car.’ Proposition HD : ‘The examined glass fragments from the jacket of suspect 1 originate from some other glass source.’ The same propositions are used for the recovered glass fragments from Suspect 2. The probability of obtaining matching results if the proposition HP is true is considered to be approximately 1. This is so because we consider the potential sources of errors that may lead to a non-match in either RI or elemental composition if this proposition is true to be negligible. 
The probability of obtaining matching results if the alternative proposition HD is true depends on the probability of obtaining matching results even though the glass fragments originate from another glass source. To estimate the magnitude of this probability, a database is used. The approach here is simplified in the sense that the findings will be interpreted as belonging to intervals for which probabilities are estimated. The practice at the laboratory is moving towards the use of probability densities (cf. Lindley, 1977; Aitken et al., 2007), but a framework for this has not yet been implemented. The glass database contains about 3000 control glasses. The control glasses have been collected from casework over the years. Parameters that can be searched in the database are for example type of glass, float glass or not, colour, thickness, refractive index before and after annealing. A more relevant database to use would be the one with data from glass findings in clothes, but there is no such database available at SKL at present. The search results of the parameters that were possible to examine in the recovered glass fragments are shown in Tables 2 and 3. When searching in the glass database, different search intervals are used for the parameters. For RIbefore the interval (RI average ±1 × 10^−4) is used and for RIafter the interval (RI average ±2 × 10^−4) is used. The use of different intervals is based on studies made at SKL on the variety of RI within glass windows. Moreover, based on studies at SKL, a glass is considered as tempered if the difference between RIbefore and RIafter is more than 1 × 10^−3. In the database search on thickness, the interval (thickness average ±0.2 mm) is used if the glass is thinner than 6 mm. If the average thickness is more than 6 mm an interval of ±0.3 mm is used. The use of different intervals here is based on information, provided by a glass producer, about the variety of thickness in production of glass windows.
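The search intervals described above can be written down compactly. The following sketch uses hypothetical function names and is not the laboratory's software; the treatment of a thickness of exactly 6 mm is an assumption, since the text only distinguishes ‘thinner than’ and ‘more than’ 6 mm.

```python
def ri_interval(ri_mean: float, after_annealing: bool = False) -> tuple[float, float]:
    """Search interval for refractive index: +/-1e-4 before, +/-2e-4 after annealing."""
    half_width = 2e-4 if after_annealing else 1e-4
    return (ri_mean - half_width, ri_mean + half_width)


def thickness_interval(thickness_mm: float) -> tuple[float, float]:
    """Search interval for thickness: +/-0.2 mm below 6 mm, otherwise +/-0.3 mm.

    Assumption: exactly 6 mm is treated like thicker glass.
    """
    half_width = 0.2 if thickness_mm < 6.0 else 0.3
    return (thickness_mm - half_width, thickness_mm + half_width)


def is_tempered(ri_before: float, ri_after: float) -> bool:
    """A glass is considered tempered if RIafter - RIbefore exceeds 1e-3."""
    return (ri_after - ri_before) > 1e-3


# Values reported for the fragments from the jacket of Suspect 1.
low, high = ri_interval(1.52084)
print(round(low, 5), round(high, 5))
t_low, t_high = thickness_interval(3.16)
print(round(t_low, 2), round(t_high, 2))
print(is_tempered(1.52084, 1.52252))
```

The rounding in the printed values only hides floating-point noise; the intervals themselves follow directly from the half-widths stated in the text.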
TABLE 2 Search results of glass fragments recovered from the jacket of Suspect 1. (Note that the frequency column contains successively lower frequencies the more specified the search becomes.)

Parameter          Search interval      Frequency in database
RIbefore           1.52074–1.52094      71/2920
+Tempered glass    Yes                  54/2920
+RIafter           1.52230–1.52270      51/2920
+Colour            Green tinted         25/2920
+Thickness         2.96–3.36 mm         11/2920
+Float glass       Yes                  11/2920

TABLE 3 Search results of glass fragments recovered from the jacket of Suspect 2

Parameter          Search interval      Frequency in database
RIbefore           1.52078–1.52098      75/2920

At the moment the comparison of the elemental composition of glasses by SEM/EDX is mainly used for the possibility to distinguish glasses. There are plans for implementing the elemental composition of control glasses in the database as well, but these ideas have not yet been realized. 6.1.4 Conclusions of glass fragments recovered from the jacket of Suspect 1. The probability of obtaining matching results if the proposition HP is true is considered to be approximately 1. The probability of obtaining matching results if proposition HD is true is estimated as 11/2920, which gives a likelihood ratio of approximately 265, and corresponds to Level +2 on the scale. The results thus support that the examined glass fragments recovered from the jacket of Suspect 1 originate from the broken side window of the car. 6.1.5 Conclusions of glass fragments recovered from the jacket of Suspect 2. The probability of obtaining matching results if the proposition HP is true is considered to be approximately 1. The probability of obtaining matching results if proposition HD is true is estimated as 75/2920, which gives a likelihood ratio of approximately 40, and corresponds to Level +1 on the scale. The results thus support to some extent that the examined glass fragments recovered from the jacket of Suspect 2 originate from the broken side window of the car.
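The arithmetic behind these two conclusions can be reproduced with a short sketch. The function name `likelihood_ratio` is hypothetical, and the estimator simply divides an assumed numerator of 1 by the empirical database frequency; it is not meant for searches that return zero hits.

```python
def likelihood_ratio(matches: int, database_size: int,
                     p_match_given_hp: float = 1.0) -> float:
    """Likelihood ratio with Pr(match | HD) estimated as a database frequency."""
    if matches <= 0 or database_size <= 0:
        raise ValueError("this simple estimator needs at least one database match")
    p_match_given_hd = matches / database_size
    return p_match_given_hp / p_match_given_hd


lr_suspect1 = likelihood_ratio(11, 2920)   # all six parameters searched
lr_suspect2 = likelihood_ratio(75, 2920)   # only RIbefore available

# Compare with the intervals of Table 1: [100, 6000) is Level +2, [6, 100) is Level +1.
print(round(lr_suspect1, 1), 100 <= lr_suspect1 < 6000)
print(round(lr_suspect2, 1), 6 <= lr_suspect2 < 100)
```

The second value comes out at about 39, which the text rounds to approximately 40; the scale level is unaffected by that rounding.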
6.1.6 Discussion. There are several factors that affect the possibility of obtaining matching results even though the glass fragments originate from another glass source. In the example, a lower frequency was obtained when more analytical methods could be used. However, it does not necessarily have to be this way. A very common glass might not lower the frequency even though several different methods can be used on the glass. On the other hand, a small glass fragment with a very rare RI can present a very low frequency even though only one method is possible to use due to the size of the fragment. In this approach, a search was made in the database to end up with a hyperrectangle, the probability of which was estimated by calculating its empirical frequency in the database. If it can be argued that several of the measurements of the parameters in Table 2 are statistically independent, the evidentiary strength may be increased by evaluating the final likelihood ratio as a product of the individual likelihood ratios from the measurements of those parameters. In particular, the three parameters RIbefore, RIafter and tempered glass could be transformed to the two parameters RIbefore and ΔRI = RIafter − RIbefore, the latter taking into account the issue of whether the glass is tempered or not but on a continuous scale (cf. Zadora, 2009). However, the issue of independence has not yet been investigated at the laboratory but will be part of a more general framework for glass analysis in the future.

6.2 A paint case

6.2.1 Background. A red Volvo ran into a blue Saab and the Volvo left the scene before the police arrived. The damaged front left wing area of the blue Saab was investigated and a red paint flake was recovered. Later the police found a red Volvo in a parking place with scratches on the right front door. Comparison paint from the damaged area of the door was collected. 6.2.2 Investigation and results.
The red paint flake recovered from the blue Saab consisted of five layers; a primer, a primersurfacer, a basecoat, a clearcoat and one layer of repair paint. The paint flake was examined layer by layer by means of a light microscope equipped with reflected visible light, transmitted visible light, ultraviolet light and polarized light. The layers were also examined with Fourier Transform Infrared Spectroscopy (FTIR) and SEM/EDX. The comparison paint from the red Volvo, which consisted of four industrial painted layers and one layer of repair paint, was examined the same way as the recovered paint flake from the blue Saab. 6.2.3 Evaluation. The results from the analysis of the recovered paint flake were compared layer by layer with the comparison paint flake. The layers of the recovered paint flake could not be distinguished from the corresponding layers of the comparison paint flake by means of the abovementioned techniques. Like for the glass case in Section 6.1, this step cannot by itself grade the conclusion. Similar to the glass investigations, the results from the paint investigation are evaluated against two non-overlapping propositions, the forwarded proposition (HP ) and an alternative proposition (HD ). Proposition HP : ‘The red multilayered paint flake recovered from the blue Saab originates from the red Volvo’. Proposition HD : ‘The red multilayered paint flake recovered from the blue Saab originates from another car’. How an alternative proposition is chosen depends on the information of the questioned material. For instance, if it were obvious that the questioned material is from a car, then a suitable alternative proposition would be that the questioned material originates from another car. On the other hand, if the recovered material is just a paint flake that could originate from other items, not just a car, a suitable alternative proposition could be that the recovered paint flake originates from something else (another painted object). 
The propositions above are chosen for illustrative purposes. In contrast to a glass investigation, where a database is available for the evaluation of the findings, a paint evaluation is made without an explicit database. The probability of obtaining matching results if the proposition HP is true is considered to be close to 1. The second step in the evaluation process is to consider how likely it is to obtain matching results given that the alternative proposition is true, i.e. the probability of obtaining matching results even though the paint originates from another source. It is generally agreed by the paint examiners at the laboratory that a match between two samples of industrial car paint in four layers is quite rare. It would occur in less than 1 of 100 cases if HD is true, which gives a level of at least +2 on the scale. When there are a couple of additional layers to the industrial paint layers and/or a cross-transfer of paint layers, the probability of obtaining matching results if HD is true may be considered to be so small that the strongest level of support can be used, i.e. Level +4. 6.2.4 Conclusions of paint flake recovered from the blue Saab. The likelihood ratio has not been estimated numerically. The conclusion is based upon logical reasoning. The probability of obtaining matching results if the proposition HP is true is considered very large. The probability of obtaining matching results if the alternative proposition HD is true is considered to be smaller than if the paint flake had consisted of only industrial coating due to the repaint layer. Although the match is unexpected, it could not be classified as almost unique (see further the discussion below). The results therefore strongly support (Level +3) that the recovered red multilayered paint flake originates from the red Volvo, from which the comparison paint is collected. 6.2.5 Discussion.
The judgement leading forward to the conclusion that four matching layers of industrial paint would correspond with Level +2 on the scale is built on the common experience collected by the paint examiners at the laboratory and collected information about how cars are painted and the variety of paint colours. In Table 4, the appreciated levels of knowledge of commonness about car paint among paint examiners at SKL are listed. As can be seen from the table, the knowledge is concentrated to the number of layers and the colour. At the moment SKL does not have a reference database on car paints. However, there is a database (EUCAP), containing information about industrial paint layers (among other things), that is used for example in Germany and Belgium. One problem with EUCAP from a Swedish point of view is that it is not known how well it reflects the Swedish car market. One possibility to reach a more precise estimate of the rarity of four particular layers of industrial paint is to make a detailed review of historical case files at the laboratory. The used figure (less than 1/100) is to a large extent the result of an informal review of such cases, but there has not yet been any attempt to formalize it.

TABLE 4 Appreciated levels of knowledge of commonness about car paint among paint examiners at SKL

Parameter           Level of commonness knowledge
Number of layers    High
Colour              Moderate
IR                  Low
SEM/EDX             Low
Fluorescence        Low
Polarization        Low

In cases where there are layers of repair paint, repaint layers and/or cross-transfer of paint in addition to the industrial paint layers, a cognitive process of evaluation can be accepted. It should be consensus that when there are a couple of additional layers of paint to the industrial paint layers and/or cross-transfer of paint layers, a match could in principle not be found anywhere else than in this specific case.
It could not be stated as a fact that the Volvo was the source of the paint on the Saab (and vice versa for the cross-transfer), but the leap of faith is so small that Level +4 may be the most natural choice in that case. When there is no cross-transfer of paint layers but still at least one extra layer (beyond the four industrial paint layers), the match is still unexpected, but a conservative choice of level would be +3.

6.3 A DNA case

6.3.1 Background. A burglary was reported in a residential housing estate. Small bloodstains were found adjacent to a rummaged wardrobe and later recovered by the police. A couple of days later the police seized one individual as a suspect in the incident. Subsequently, the suspect was swabbed for DNA. Blood from the crime scene and the reference DNA sample were sent for investigation.

6.3.2 Investigation and results. The stain swabbed at the crime scene tested positive for blood with the traditional presumptive blood test, leucomalachite reagent. DNA amplification revealed a partial DNA profile from a single male donor, with results in only six of the autosomal STR markers amplified. The DNA was also typed for Y-chromosomal DNA, giving a haplotype in all 17 STR markers analysed.

6.3.3 Evaluation. The DNA profile from the suspect was compared against the partial DNA profile, revealing a match. DNA from the suspect was also typed for Y-chromosomal DNA and the haplotype obtained is the same. The results are to be evaluated against two propositions, the forwarded proposition (HP), stemming from the commissioner’s question, and the alternative proposition (HD):

Proposition HP: ‘The blood found on the crime scene originates from the suspect’.
Proposition HD: ‘The blood found on the crime scene originates from another (non-related) individual than the suspect’.
The random match probability for the autosomal DNA result was calculated as 1 in 743 000, using a Swedish population database and taking into consideration a 1% FST correction (Wright, 1951) and a lowest allele frequency value of 2%. The rarity of the crime stain Y-chromosomal haplotype was assessed with the YHRD database (YHRD.ORG 3.0, 2011). The haplotype in question has not previously been reported but belongs to the most common haplogroup in Sweden, I1a* (Karlsson et al., 2006).

6.3.4 Conclusions for blood stains recovered from the crime scene matching the suspect. The probability of obtaining matching results if proposition HP is true is considered to be approximately 1. The probability of obtaining matching results if proposition HD is true is calculated as 1/743 000 for the autosomal DNA, which gives a likelihood ratio of 743 000, corresponding to Level +3 on the scale. Thus, according to the scale, the results strongly support that the bloodstain found at the crime scene of the burglary originates from the suspect. The Y-chromosomal haplotype of the crime stain matches the suspect’s haplotype. The haplotype has not previously been reported to the database. As a general approach, a matching 17-STR haplotype alone would render a Level +1 conclusion on the scale, whether or not the haplotype has a match in a database and even without a numerical value. The haplotype frequency for the most common haplotype found in Sweden is 5.8% (Holmlund et al., 2006), which corresponds to about 1 in 17. This figure can be used to approximate the value of the haplotype obtained by a moderate likelihood ratio of 17. By combining the objective value for the autosomal STRs with the approximate value for the Y-chromosomal STRs, simply using the product rule, a combined approximate likelihood ratio of 12 600 000 (743 000 × 17) is obtained. The combined result corresponds to a Level +4 conclusion on the scale.
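The arithmetic of this subsection can be retraced with a short Python sketch. It is an illustration under stated assumptions, not the laboratory's procedure: the lower limits 6 and 100 for Levels +1 and +2 are quoted in the text, whereas the limits 6 000 and 1 000 000 for Levels +3 and +4 are assumed here to be the Table 1 values and should be checked against that table. The names `LOWER_LIMITS` and `level_from_lr` are hypothetical, only the supporting (positive) side of the scale is covered, and the product rule presupposes that the two findings are conditionally independent.

```python
# Lower limits of the likelihood ratio for the positive scale levels.
# 6 (Level +1) and 100 (Level +2) are quoted in the text; 6 000 and
# 1 000 000 are ASSUMED Table 1 limits for Levels +3 and +4.
LOWER_LIMITS = {1: 6, 2: 100, 3: 6_000, 4: 1_000_000}


def level_from_lr(lr: float) -> int:
    """Return the highest positive level whose lower limit lr reaches (0 if none)."""
    level = 0
    for lvl, limit in sorted(LOWER_LIMITS.items()):
        if lr >= limit:
            level = lvl
    return level


# Autosomal result: P(E | HP) is taken as 1 and P(E | HD) as 1/743 000.
lr_autosomal = 743_000

# Y-haplotype proxy: frequency of the most common Swedish haplotype, 5.8 %.
lr_y = round(1 / 0.058)  # roughly 1 in 17

# Product rule, valid only if the two findings are conditionally independent.
lr_combined = lr_autosomal * lr_y

print(level_from_lr(lr_autosomal))  # level for the autosomal result alone
print(lr_y, lr_combined)            # haplotype proxy and combined likelihood ratio
print(level_from_lr(lr_combined))   # level for the combined result
```

The exact product is 12 631 000, which the text rounds to 12 600 000.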
However, it is not necessary to force an explicit approximate likelihood ratio based on haplotype data to attain Level +4 for the combined values of evidence. The match itself renders a Level +1 on the scale, and it suffices to use the lower limit 6 (Table 1) as a likelihood ratio proxy, since 743 000 × 6 = 4 458 000, which exceeds the lower limit for Level +4.

6.3.5 Discussion. Lineage markers are at their strongest in kinship and identification cases. In crime cases, Y-chromosomal DNA analysis is often connected to sexual crime, where DNA mixtures tend to show the culprit’s DNA as a minor component and the victim’s DNA as the overwhelming part, and where there is a need to add evidential strength or to exclude a suspect. A partial single-donor profile, as in this example, can also be of interest to extend with Y STRs. Depending on the case investigation as a whole, a Level +3 conclusion for a crime scene stain might not be strong enough evidence for a court conviction. Interpreting the power of the evidence for lineage markers like Y STRs is challenging or even problematic (Palo et al., 2007). Despite increasing knowledge and growing databases, haplotype occurrences are still not well known, and in addition the patterns of inheritance complicate matters. What weight can be assessed and how should it be reported? Are the haplotype results obtained supported by any autosomal DNA results? The scale of conclusions used at SKL together with the experience of the forensic scientist are important underlying factors for combining results, even if background data clarifying a finding’s value are meagre. The combination of autosomal and Y-chromosomal DNA has been performed at SKL since 2005. A general and moderate approximation of the value of the obtained haplotypes has been used. Approaches using joint match probabilities have been proposed by, for example, Walsh et al. (2008).
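The lower-limit argument that precedes this discussion (743 000 × 6) amounts to reading Table 1 backwards: a finding reported only as a verbal level is replaced by the smallest likelihood ratio consistent with that level. A minimal Python sketch of that conservative combination follows. As before, the limits 6 and 100 come from the text, the limits 6 000 and 1 000 000 are assumed Table 1 values, the helper names `lower_limit` and `reaches_level` are hypothetical, and the product presupposes conditional independence of the findings.

```python
# Lower limits per level; 6 and 100 are quoted in the text, while 6 000 and
# 1 000 000 are ASSUMED Table 1 values for Levels +3 and +4.
LOWER_LIMITS = {1: 6, 2: 100, 3: 6_000, 4: 1_000_000}


def lower_limit(level: int) -> int:
    """Read the table backwards: smallest likelihood ratio consistent with a level."""
    return LOWER_LIMITS[level]


def reaches_level(lr_lower_bound: float, level: int) -> bool:
    """True if a lower bound on the likelihood ratio already guarantees the level."""
    return lr_lower_bound >= LOWER_LIMITS[level]


lr_autosomal = 743_000       # numerical likelihood ratio from the autosomal profile
lr_y_bound = lower_limit(1)  # haplotype match reported only as Level +1

# Conservative combined bound; the product assumes conditional independence.
combined_bound = lr_autosomal * lr_y_bound

print(combined_bound)                    # 743 000 x 6
print(reaches_level(combined_bound, 4))  # checked against the assumed Level +4 limit
```

Because only a lower bound enters the product, the level reached in this way is a conservative statement about the combined findings.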
The combination of autosomal and lineage marker results is used in paternity investigations (Gjertson et al., 2007), but it will obviously add no value if male relatives sharing the same haplotype are part of the alternative hypothesis. This should be the case for crime stains as well. The approach of combining autosomal and lineage marker results is not generally adopted or even accepted throughout the forensic DNA community (cf. de Knijff, 2003; Amorim, 2008). According to Buckleton et al. (2011), combining autosomal and lineage markers is mainly a matter of communication, not calculation. Supporting a reported autosomal DNA match in court with a database-generated haplotype frequency is prone to result in an overestimation of the evidential value. Reporting a haplotype match without an assigned weight leaves it entirely to the court to decide. Having both autosomal and Y-chromosomal DNA results for the same stain will, with or without theoretical and statistical support (in the statement or by testifying in court), add to the difficulties for the court in assessing the evidential value. If the scientist cannot interpret or approximate the value of a scientific result in a written statement, then how could it be possible to do so by testifying in court or, in the worst case, to leave it to the court alone to assess? The court will in the end, whatever route is taken, somehow sum up the DNA results and all other evidence presented. There is an obvious risk of random over- or underestimation between different courts and different cases. The overall knowledge of haplotype occurrence and distribution at, for instance, local and regional levels is still limited. One moderate approach, obviously not exact in detail, is to assess the value using the frequency of the most frequent haplotype reported in the specific country or region of interest.

7. Concluding remarks

Reporting the value of evidence on a scale is not controversial in itself; it is merely a way of simplifying the interpretation of numbers. However, this study has shown that the construction of a scale is far from trivial, and this is particularly true when it comes to choosing the levels of the scale with respect to the underlying likelihood ratio. Apart from the obvious baseline of the scale representing the inconclusive state, our work has been built on a limitation to four levels. Although it might be agreed that the distances between successive levels should increase with level (it would not even be possible to have equal distances, as the likelihood ratio has no upper limit), there are no judicial or common-knowledge grounds for how to select the distances. Our choice is mathematically based on the anchoring of three levels, and the anchoring itself is a mixture of what is commonly accepted as reasonable posterior odds for conviction and what the limits in a Swedish population are. We think that there should be no more than one level between the inconclusive baseline and the level where we consider the evidence to be clearly supportive of one of the forwarded propositions. However, the number of levels above the latter may depend on the general sizes of the populations involved when obtaining the likelihood ratio. The interpretation of DNA evidence can be helpful here; the estimated size of the population of possible donors of a stain is a good basis for the lower limit of the likelihood ratio for the highest level. If that limit is very high, there is a need for more than four levels above the baseline, but the mathematical construction of interval lengths still applies. Once a numerical likelihood ratio has been obtained, its reporting on the scale is trivial. Notwithstanding, the scale can be used even if the likelihood ratio cannot be numerically estimated.
This is not a novelty within forensic science but was part of evidence evaluation long before the logical approach with likelihood ratios was established. Part of the current study has aimed to show more formally how a non-numerical value of evidence can be incorporated into the logical framework of evidence evaluation. A unified scale of conclusions within a laboratory will make this procedure easier to carry out, especially when the scale is continuously used in in-house calibration and training. An extra feature of the type of scale presented in this paper is the possibility to use Table 1 (the translation of intervals of likelihood ratios to scale levels) backwards. Once a scale level has been decided for some particular findings, it is possible to find a (possibly conservative) lower limit for the likelihood ratio, which in turn may be used if these findings are to be combined with other findings (conditionally independent of the former) in the same criminal case. Instead of leaving the issue of combination to the court, the forensic laboratory may thus investigate the total evidentiary strength of the findings addressed at activity level propositions.

Acknowledgements

The authors would like to thank colleagues at the Swedish National Laboratory of Forensic Sciences, in particular members of the evidentiary value project group: Inger Wistedt, Jenny Elmqvist, Tobias Höglund, Mirja Lenz Torbjörnsson, Jane Palmborg, Siw Sullivan and Ing-Marie Wigilius, for valuable inputs, Anna Emanuelson for discussions regarding the glass and paint examples and Staffan Jansson for discussions regarding the DNA examples. The authors would also like to thank Professor Christophe Champod, University of Lausanne, for his comments and two anonymous reviewers for valuable inputs.

References

Aitken, C. G. G. and Taroni, F. (2004). Statistics and the Evaluation of Evidence for Forensic Scientists. 2nd ed. Wiley, Chichester.
Aitken, C. G. G., Zadora, G. and Lucy, D. (2007). A two-level model for evidence evaluation. Journal of Forensic Sciences 52, 412–419.
Amorim, A. (2008). A cautionary note on the evaluation of genetic evidence from uniparentally transmitted markers. Forensic Science International: Genetics 2, 376–378.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. 2nd ed. Springer-Verlag, New York.
Broeders, A. P. A. (1999). Some observations on the use of probability scales in forensic identification. Forensic Linguistics 6, 228–241.
Buckleton, J. (2005). A framework for interpreting evidence. In: Forensic DNA Evidence Interpretation (J. Buckleton, C. M. Triggs, S. J. Walsh eds.), 27–63. CRC Press, Boca Raton, FL.
Buckleton, J. S., Triggs, C. M. and Champod, C. (2006). An extended likelihood ratio framework for interpreting evidence. Science & Justice 46, 69–78.
Buckleton, J. S., Krawczak, M. and Weir, B. S. (2011). The interpretation of lineage markers in forensic DNA testing. Forensic Science International: Genetics 5, 78–83.
Caddy, B. and Cobb, P. (2004). Forensic science. In: Crime Scene to Court. The Essentials of Forensic Science. 2nd ed. (P. C. White ed.), 1–20. The Royal Society of Chemistry, Cambridge.
Ceci, S. J. and Friedman, R. D. (2000). The suggestibility of children: scientific research and legal implications. Cornell Law Review 86, 33–108.
Evett, I. W., Jackson, G., Lambert, J. A. and McCrossan, S. (2000). The impact of the principles of evidence interpretation on the structure and content of statements. Science & Justice 40, 233–239.
Gjertson, D. W., Brenner, C. H., Baur, M. P., Carracedo, A., Guidet, F., Luque, J. A., Lessig, R., Mayr, W. R., Pascali, V. L., Prinz, M., Schneider, P. M. and Morling, N. (2007). ISFG: Recommendations on biostatistics in paternity testing. Forensic Science International: Genetics 1, 223–231.
Hedman, J., Albinsson, L., Ansell, C., Tapper, H., Hansson, O., Holgersson, S. and Ansell, R. (2008). A fast analysis system for forensic DNA reference samples. Forensic Science International: Genetics 2, 184–189.
Holmlund, G., Nilsson, H., Karlsson, A. and Lindblom, B. (2006). Y-chromosome STR haplotypes in Sweden. Forensic Science International 160, 66–79.
de Knijff, P. (2003). Son, give up your gun: Presenting Y-STR results in court. Profiles in DNA 6(2), 3–6.
Karlsson, A. O., Wallerström, T., Götherström, A. and Holmlund, G. (2006). Y-chromosome diversity in Sweden – a long time perspective. European Journal of Human Genetics 14, 963–970.
Lindley, D. (1977). A problem in forensic science. Biometrika 64, 207–213.
NFI (2008). Vakbijlage Reeks waarschijnlijkheidstermen - versie 2.0. The Netherlands Forensic Institute.
Nordgaard, A., Wistedt, I., Drotz, W., Elmqvist, J., Höglund, T., Jaeger, L., Torbjörnsson, M. L., Palmborg, J., Sullivan, S. and Wigilius, I. (2010). Uppfattning av värdeord i sakkunnigutlåtanden - En studie genomförd bland olika aktörer i rättsprocessen i Sverige. SKL Rapport 2010:01. Swedish National Laboratory of Forensic Sciences.
Palo, J. U., Hedman, M., Ulmanen, I., Lukka, M. and Sajantila, A. (2007). High degree of Y-chromosomal divergence within Finland – forensic aspects. Forensic Science International: Genetics 1, 120–124.
Sjerps, M. and Biesheuvel, D. (1999). The interpretation of conventional and ‘Bayesian’ verbal scales for expressing expert opinion: a small experiment among jurists. Forensic Linguistics 6, 214–227.
Thompson, W. C., Taroni, F. and Aitken, C. G. G. (2003). How the probability of a false positive affects the value of DNA evidence. Journal of Forensic Sciences 48, 47–54.
Walsh, B., Redd, A. J. and Hammer, M. F. (2008). Joint match probabilities for Y chromosomal and autosomal markers. Forensic Science International 174, 234–238.
Wright, S. (1951). The genetical structure of populations. Annals of Eugenics 15, 323–354.
YHRD.ORG 3.0 (2011). Y-STR Haplotype Reference Database. www.yhrd.org. Date of visit 2011-03-28.
Zadora, G. (2009). Evaluation of evidence value of glass fragments by likelihood ratio and Bayesian Network approaches. Analytica Chimica Acta 642, 279–290.