Policy: Twenty tips for interpreting scientific claims
http://www.nature.com/news/policy-twenty-tips-for-interpreting-scientific-claims-1.14183
William J. Sutherland, David Spiegelhalter & Mark A. Burgman
20 November 2013
This list will help non-scientists to interrogate advisers and to grasp the limitations of
evidence, say William J. Sutherland, David Spiegelhalter and Mark A. Burgman.
Science and policy have collided on contentious issues such as bee declines, nuclear power and the
role of badgers in bovine tuberculosis.
Calls for the closer integration of science in political decision-making have been commonplace for
decades. However, there are serious problems in the application of science to policy — from
energy to health and environment to education.
One suggestion to improve matters is to encourage more scientists to get involved in politics.
Although laudable, it is unrealistic to expect substantially increased political involvement from
scientists. Another proposal is to expand the role of chief scientific advisers1, increasing their
number, availability and participation in political processes. Neither approach deals with the core
problem of scientific ignorance among many who vote in parliaments.
Perhaps we could teach science to politicians? It is an attractive idea, but which busy politician
has sufficient time? In practice, policy-makers almost never read scientific papers or books. The
research relevant to the topic of the day — for example, mitochondrial replacement, bovine
tuberculosis or nuclear-waste disposal — is interpreted for them by advisers or external
advocates. And there is rarely, if ever, a beautifully designed double-blind, randomized,
replicated, controlled experiment with a large sample size and unambiguous conclusion that
tackles the exact policy issue.
In this context, we suggest that the immediate priority is to improve policy-makers' understanding
of the imperfect nature of science. The essential skills are to be able to intelligently interrogate
experts and advisers, and to understand the quality, limitations and biases of evidence. We term
these interpretive scientific skills. These skills are more accessible than those required to
understand the fundamental science itself, and can form part of the broad skill set of most
politicians.
To this end, we suggest 20 concepts that should be part of the education of civil servants,
politicians, policy advisers and journalists — and anyone else who may have to interact with
science or scientists. Politicians with a healthy scepticism of scientific advocates might simply
prefer to arm themselves with this critical set of knowledge.
We are not so naive as to believe that improved policy decisions will automatically follow. We are
fully aware that scientific judgement itself is value-laden, and that bias and context are integral to
how data are collected and interpreted. What we offer is a simple list of ideas that could help
decision-makers to parse how evidence can contribute to a decision, and potentially to avoid
undue influence by those with vested interests. The harder part — the social acceptability of
different policies — remains in the hands of politicians and the broader political process.
Of course, others will have slightly different lists. Our point is that a wider understanding of these
20 concepts by society would be a marked step forward.
Differences and chance cause variation. The real world varies unpredictably. Science is mostly
about discovering what causes the patterns we see. Why is it hotter this decade than last? Why
are there more birds in some areas than others? There are many explanations for such trends, so
the main challenge of research is teasing apart the importance of the process of interest (for
example, the effect of climate change on bird populations) from the innumerable other sources of
variation (from widespread changes, such as agricultural intensification and spread of invasive
species, to local-scale processes, such as the chance events that determine births and deaths).
No measurement is exact. Practically all measurements have some error. If the measurement
process were repeated, one might record a different result. In some cases, the measurement
error might be large compared with real differences. Thus, if you are told that the economy grew
by 0.13% last month, there is a moderate chance that it may actually have shrunk. Results should
be presented with a precision that is appropriate for the associated error, to avoid implying an
unjustified degree of accuracy.
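To see how a headline figure of 0.13% growth can be consistent with a shrinking economy, here is a minimal sketch in Python; the assumed measurement error of 0.2 percentage points is purely illustrative, not an official figure.

    import numpy as np

    rng = np.random.default_rng(0)

    reported_growth = 0.13   # reported monthly growth, in percent
    error_sd = 0.20          # assumed size of the measurement error (illustrative only)

    # Treat the reported figure as the true value plus noise, and ask what range
    # of true values is plausible given the report.
    plausible_true_values = rng.normal(reported_growth, error_sd, size=100_000)

    share_negative = (plausible_true_values < 0).mean()
    print(f"Share of plausible true values below zero: {share_negative:.0%}")
    # With an error of this size, roughly a quarter of the plausible true values are
    # negative: a 'moderate chance' that the economy actually shrank.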
Bias is rife. Experimental design or measuring devices may produce atypical results in a given
direction. For example, determining voting behaviour by asking people on the street, at home or
through the Internet will sample different proportions of the population, and all may give different
results. Because studies that report 'statistically significant' results are more likely to be written up
and published, the scientific literature tends to give an exaggerated picture of the magnitude of
problems or the effectiveness of solutions. An experiment might be biased by expectations:
participants provided with a treatment might assume that they will experience a difference and so
might behave differently or report an effect. Researchers collecting the results can be influenced
by knowing who received treatment. The ideal experiment is double-blind: neither the participants
nor those collecting the data know who received what. This might be straightforward in drug trials,
but it is impossible for many social studies. Confirmation bias arises when scientists find evidence
for a favoured theory and then become insufficiently critical of their own results, or cease
searching for contrary evidence.
Bigger is usually better for sample size. The average taken from a large number of
observations will usually be more informative than the average taken from a smaller number of
observations. That is, as we accumulate evidence, our knowledge improves. This is especially
important when studies are clouded by substantial amounts of natural variation and measurement
error. Thus, the effectiveness of a drug treatment will vary naturally between subjects. Its average
efficacy can be more reliably and accurately estimated from a trial with tens of thousands of
participants than from one with hundreds.
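A small simulation can make this concrete. The sketch below (Python; the true effect of 10 units and the between-subject variation of 20 units are invented numbers) shows how much trial-to-trial scatter remains in the estimated average effect at different sample sizes.

    import numpy as np

    rng = np.random.default_rng(1)
    true_effect = 10.0          # assumed true average improvement (arbitrary units)
    between_subject_sd = 20.0   # assumed natural variation between subjects

    for n in (100, 1_000, 10_000):
        # Run 1,000 hypothetical trials of size n and see how much the estimated
        # average effect varies from trial to trial.
        estimates = rng.normal(true_effect, between_subject_sd, size=(1_000, n)).mean(axis=1)
        print(f"n = {n:>6}: average estimate {estimates.mean():5.2f}, "
              f"trial-to-trial spread {estimates.std():.2f}")
    # The spread shrinks with the square root of n: a hundred times as many
    # participants gives roughly a tenfold more precise estimate of the average.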
Correlation does not imply causation. It is tempting to assume that one pattern causes
another. However, the correlation might be coincidental, or it might be a result of both patterns
being caused by a third factor — a 'confounding' or 'lurking' variable. For example, ecologists at
one time believed that poisonous algae were killing fish in estuaries; it turned out that the algae
grew where fish died. The algae did not cause the deaths2.
Regression to the mean can mislead. Extreme patterns in data are likely to be, at least in part,
anomalies attributable to chance or error. The next count is likely to be less extreme. For
example, if speed cameras are placed where there has been a spate of accidents, any reduction
in the accident rate cannot be attributed to the camera; a reduction would probably have
happened anyway.
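A sketch of the camera example, with made-up accident rates: sites are chosen because of a bad year, and their next year looks better even though nothing was done anywhere.

    import numpy as np

    rng = np.random.default_rng(2)
    n_sites = 1_000
    true_rate = rng.gamma(shape=2.0, scale=2.5, size=n_sites)  # assumed underlying accident rates

    year1 = rng.poisson(true_rate)   # accidents observed in year 1 (chance on top of the true rate)
    year2 = rng.poisson(true_rate)   # accidents observed in year 2, with NO intervention anywhere

    worst = np.argsort(year1)[-50:]  # the 50 sites with the most accidents in year 1
    print("Worst 50 sites, year 1 average:", round(year1[worst].mean(), 1))
    print("Same sites,     year 2 average:", round(year2[worst].mean(), 1))
    # The year-2 average is noticeably lower although nothing changed: part of the
    # year-1 extreme was bad luck, and bad luck does not repeat on cue.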
Extrapolating beyond the data is risky. Patterns found within a given range do not necessarily
apply outside that range. Thus, it is very difficult to predict the response of ecological systems to
climate change, when the rate of change is faster than has been experienced in the evolutionary
history of existing species, and when the weather extremes may be entirely new.
Beware the base-rate fallacy. The ability of an imperfect test to identify a condition depends
upon the likelihood of that condition occurring (the base rate). For example, a person might have
a blood test that is '99% accurate' for a rare disease and test positive, yet they might be unlikely
to have the disease. If 10,001 people have the test, of whom just one has the disease, that
person will almost certainly have a positive test, but so too will a further 100 people (1%) even
though they do not have the disease. This type of calculation is valuable when considering any
screening procedure, say for terrorists at airports.
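The arithmetic in this example can be checked directly. The short sketch below uses the numbers in the text (one case in 10,001 people, a test that is 99% accurate in both directions).

    prevalence = 1 / 10_001    # base rate: one person in 10,001 has the disease
    sensitivity = 0.99         # probability a true case tests positive
    specificity = 0.99         # probability a healthy person tests negative

    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)

    p_disease_given_positive = true_positive / (true_positive + false_positive)
    print(f"Probability of disease given a positive test: {p_disease_given_positive:.1%}")
    # Roughly 1%: the single true case is swamped by about 100 false positives from
    # the 10,000 healthy people, exactly as the text describes.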
Controls are important. A control group is dealt with in exactly the same way as the
experimental group, except that the treatment is not applied. Without a control, it is difficult to
determine whether a given treatment really had an effect. The control helps researchers to be
reasonably sure that there are no confounding variables affecting the results. Sometimes people
in trials report positive outcomes because of the context or the person providing the treatment, or
even the colour of a tablet3. This underlies the importance of comparing outcomes with a control,
such as a tablet without the active ingredient (a placebo).
Randomization avoids bias. Experiments should, wherever possible, allocate individuals or
groups to interventions randomly. Comparing the educational achievement of children whose
parents adopt a health programme with that of children of parents who do not is likely to suffer
from bias (for example, better-educated families might be more likely to join the programme). A
well-designed experiment would randomly select some parents to receive the programme while
others do not.
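A toy comparison of the two designs, in Python; the link between parental education and joining, and the assumption that the programme itself does nothing, are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50_000

    parent_education = rng.normal(0, 1, n)                       # hidden confounder
    achievement = 5 * parent_education + rng.normal(0, 10, n)    # the programme truly has NO effect

    # Self-selection: better-educated families are more likely to join (assumed).
    joins = rng.random(n) < 1 / (1 + np.exp(-parent_education))
    self_selected = achievement[joins].mean() - achievement[~joins].mean()

    # Randomization: assignment has nothing to do with parental education.
    assigned = rng.random(n) < 0.5
    randomized = achievement[assigned].mean() - achievement[~assigned].mean()

    print(f"Apparent 'benefit' with self-selection: {self_selected:.2f}")
    print(f"Apparent 'benefit' with randomization:  {randomized:.2f}")
    # The self-selected comparison credits a do-nothing programme with a sizeable
    # benefit; the randomized comparison correctly hovers around zero.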
Seek replication, not pseudoreplication. Results consistent across many studies, replicated on
independent populations, are more likely to be solid. The results of several such experiments may
be combined in a systematic review or a meta-analysis to provide an overarching view of the
topic with potentially much greater statistical power than any of the individual studies. Applying an
intervention to several individuals in a group, say to a class of children, might be misleading
because the children will have many features in common other than the intervention. The
researchers might make the mistake of 'pseudoreplication' if they generalize from these children
to a wider population that does not share the same commonalities. Pseudoreplication leads to
unwarranted faith in the results. Pseudoreplication of studies on the abundance of cod in the
Grand Banks in Newfoundland, Canada, for example, contributed to the collapse of what was
once the largest cod fishery in the world4.
Scientists are human. Scientists have a vested interest in promoting their work, often for status
and further research funding, although sometimes for direct financial gain. This can lead to
selective reporting of results and, occasionally, exaggeration. Peer review is not infallible: journal
editors might favour positive findings and newsworthiness. Multiple, independent sources of
evidence and replication are much more convincing.
Significance is significant. Expressed as P, statistical significance is a measure of how likely a
result at least as large as the one observed would be to arise by chance alone. Thus P = 0.01 means
that, if in truth the treatment had no effect, a result looking this much like an effect would occur
by chance only 1 time in 100. Typically, scientists report results as significant when the P-value of the test is less than 0.05
(1 in 20).
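What the 0.05 threshold delivers can be seen in a quick simulation: run many comparisons in which there is genuinely no effect and count how often a standard test declares 'significance'. The sketch below uses a two-sample t-test on pure noise; the group size of 30 is arbitrary.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    n_comparisons, n_per_group = 10_000, 30

    false_positives = 0
    for _ in range(n_comparisons):
        # Both groups come from the SAME distribution: the true effect is zero.
        a = rng.normal(0, 1, n_per_group)
        b = rng.normal(0, 1, n_per_group)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1

    print(f"Null comparisons declared 'significant': {false_positives / n_comparisons:.3f}")
    # Expect roughly 0.05: the threshold fixes how often pure chance gets mistaken
    # for an effect, which is not the same as the probability that any particular
    # significant result is real.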
Separate no effect from non-significance. The lack of a statistically significant result (say a P-value > 0.05) does not mean that there was no underlying effect: it means that no effect was
detected. A small study may not have the power to detect a real difference. For example, tests of
cotton and potato crops that were genetically modified to produce a toxin to protect them from
damaging insects suggested that there were no adverse effects on beneficial insects such as
pollinators. Yet none of the experiments had large enough sample sizes to detect impacts on
beneficial species had there been any5.
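A sketch of this power problem (Python; the assumed 20% reduction in beneficial-insect abundance, the natural variation and the plot counts are hypothetical, not taken from the cited studies).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    control_mean, treated_mean, sd = 100.0, 80.0, 40.0   # assumed: a real 20% reduction
    n_studies = 2_000

    def detection_rate(n_per_group):
        hits = 0
        for _ in range(n_studies):
            control = rng.normal(control_mean, sd, n_per_group)
            treated = rng.normal(treated_mean, sd, n_per_group)
            if stats.ttest_ind(control, treated).pvalue < 0.05:
                hits += 1
        return hits / n_studies

    for n in (5, 20, 100):
        print(f"n = {n:>3} per group: real effect detected in {detection_rate(n):.0%} of studies")
    # With only 5 replicates per group the genuine 20% reduction is usually missed
    # ('not significant'), even though the effect is there all along.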
Effect size matters. Small responses are less likely to be detected. A study with many replicates
might result in a statistically significant result but have a small effect size (and so, perhaps, be
unimportant). The importance of an effect size is a biological, physical or social question, and not
a statistical one. In the 1990s, the editor of the US journal Epidemiology asked authors to stop
using statistical significance in submitted manuscripts because authors were routinely
misinterpreting the meaning of significance tests, resulting in ineffective or misguided
recommendations for public-health policy6.
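The mirror image is also easy to simulate: with a very large study, an effect far too small to matter can still earn a tiny P-value. The numbers below are invented for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n = 2_000_000                            # a very large study

    control = rng.normal(50.0, 10.0, n)
    treated = rng.normal(50.1, 10.0, n)      # assumed true effect: a 0.1-point shift

    result = stats.ttest_ind(treated, control)
    print(f"p-value: {result.pvalue:.1e}")
    print(f"estimated effect: {treated.mean() - control.mean():.3f} points")
    # The result comes out highly 'significant', yet the effect is one hundredth of
    # the person-to-person variation. Whether a 0.1-point shift matters is a
    # substantive question, not a statistical one.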
Study relevance limits generalizations. The relevance of a study depends on how much the
conditions under which it is done resemble the conditions of the issue under consideration. For
example, there are limits to the generalizations that one can make from animal or laboratory
experiments to humans.
Feelings influence risk perception. Broadly, risk can be thought of as the likelihood of an event
occurring in some time frame, multiplied by the consequences should the event occur. People's
risk perception is influenced disproportionately by many things, including the rarity of the event,
how much control they believe they have, the adverseness of the outcomes, and whether the risk
is voluntary or not. For example, people in the United States underestimate the risks associated
with having a handgun at home by 100-fold, and overestimate the risks of living close to a nuclear
reactor by 10-fold7.
Dependencies change the risks. It is possible to calculate the consequences of individual
events, such as an extreme tide, heavy rainfall and key workers being absent. However, if the
events are interrelated (for example, a storm causes a high tide, or heavy rain prevents workers
from accessing the site), then the probability of their co-occurrence is much higher than might be
expected8. The assurance by credit-rating agencies that groups of subprime mortgages had an
exceedingly low risk of defaulting together was a major element in the 2008 collapse of the credit
markets.
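A toy calculation with invented probabilities shows the size of the mistake: treating the three events as independent versus recognising that a single storm can drive all of them at once.

    # Assumed annual probabilities, purely for illustration.
    p_extreme_tide = 0.02
    p_heavy_rain = 0.05
    p_workers_absent = 0.10

    # Naive view: the three events are independent.
    independent = p_extreme_tide * p_heavy_rain * p_workers_absent
    print(f"All three at once, assuming independence: {independent:.3%}")

    # More realistic: a single storm makes a high tide, heavy rain and staff
    # absences far more likely at the same time (assumed conditional probabilities).
    p_storm = 0.03
    p_all_given_storm = 0.5 * 0.8 * 0.6
    correlated = p_storm * p_all_given_storm
    print(f"All three at once via a common storm:     {correlated:.3%}")
    # The co-occurrence is dozens of times more likely once the common cause is
    # recognised; ignoring such dependencies was central to the subprime example.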
Data can be dredged or cherry picked. Evidence can be arranged to support one point of view.
To interpret an apparent association between consumption of yoghurt during pregnancy and
subsequent asthma in offspring9, one would need to know whether the authors set out to test this
sole hypothesis, or happened across this finding in a huge data set. By contrast, the evidence for
the Higgs boson specifically accounted for how hard researchers had to look for it (the 'look-elsewhere effect'). The question to ask is: 'What am I not being told?'
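A short simulation of the dredging problem, using pure noise: scan enough unrelated variables and some will look 'significant'. The 200 'dietary exposures' below are random numbers with no connection to the outcome.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n_people, n_exposures = 500, 200

    outcome = rng.normal(0, 1, n_people)                      # e.g. an asthma score: pure noise
    exposures = rng.normal(0, 1, (n_exposures, n_people))     # 200 unrelated 'dietary' variables

    p_values = np.array([stats.pearsonr(x, outcome)[1] for x in exposures])
    print(f"'Significant' associations found by dredging: {(p_values < 0.05).sum()} of {n_exposures}")
    print(f"Smallest p-value: {p_values.min():.4f}")
    # Around ten of the 200 unrelated variables come out 'significant' at 0.05 by
    # chance alone. A single pre-specified hypothesis, or a correction for how many
    # places were searched (the look-elsewhere effect), avoids this trap.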
Extreme measurements may mislead. Any collation of measures (the effectiveness of a given
school, say) will show variability owing to differences in innate ability (teacher competence), plus
sampling (the children enrolled might by chance be an atypical group), plus bias (the
school might be in an unusually disadvantaged area), plus measurement error
(outcomes might be measured in different ways for different schools). However, the resulting
variation is typically interpreted only as differences in innate ability, ignoring the other sources.
This becomes problematic with statements describing an extreme outcome ('the pass rate
doubled') or comparing the magnitude of the extreme with the mean ('the pass rate in school x is
three times the national average') or the range ('there is an x-fold difference between the highest- and lowest-performing schools'). League tables, in particular, are rarely reliable summaries of
performance.
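A small simulation of a league table (Python; the number of schools, cohort size and spread of true pass rates are assumed) shows how sampling noise inflates the apparent gap between the best and worst performers.

    import numpy as np

    rng = np.random.default_rng(8)
    n_schools, pupils_per_school = 300, 50

    # Modest real differences in ability between schools (assumed).
    true_pass_rate = rng.normal(0.70, 0.05, n_schools).clip(0, 1)
    observed_rate = rng.binomial(pupils_per_school, true_pass_rate) / pupils_per_school

    print(f"True pass rates run from     {true_pass_rate.min():.0%} to {true_pass_rate.max():.0%}")
    print(f"Observed pass rates run from {observed_rate.min():.0%} to {observed_rate.max():.0%}")
    print(f"Observed 'x-fold' gap between extremes: {observed_rate.max() / observed_rate.min():.1f}")
    # The observed league table shows a wider gap between 'best' and 'worst' than the
    # underlying pass rates justify: with only 50 pupils per school, sampling noise is
    # added to the real differences, and the extremes are exactly where it piles up.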
Nature 503, 335–337 (21 November 2013). doi:10.1038/503335a
References
1. Doubleday, R. & Wilsdon, J. Nature 485, 301–302 (2012).
2. Borsuk, M. E., Stow, C. A. & Reckhow, K. H. J. Water Res. Plan. Manage. 129, 271–282 (2003).
3. Huskisson, E. C. Br. Med. J. 4, 196–200 (1974).
4. Millar, R. B. & Anderson, M. J. Fish. Res. 70, 397–407 (2004).
5. Marvier, M. Ecol. Appl. 12, 1119–1124 (2002).
6. Fidler, F., Cumming, G., Burgman, M. & Thomason, N. J. Socio-Economics 33, 615–630 (2004).
7. Fischhoff, B., Slovic, P. & Lichtenstein, S. Am. Stat. 36, 240–255 (1982).
8. Billinton, R. & Allan, R. N. Reliability Evaluation of Power Systems (Plenum, 1984).
9. Maslova, E., Halldorsson, T. I., Strøm, M. & Olsen, S. F. J. Nutr. Sci. 1, e5 (2012).
Author information
Affiliations
1. William J. Sutherland is professor of conservation biology in the Department of Zoology,
University of Cambridge, UK.
2. David Spiegelhalter is at the Centre for Mathematical Sciences, University of Cambridge.
3. Mark Burgman is at the Centre of Excellence for Biosecurity Risk Analysis, School of
Botany, University of Melbourne, Parkville, Australia.
Correspondence to: William J. Sutherland
Comments
1. Tal Galili • 2013-12-08, 06:39 PM
While the article offers nice simplifications of many important topics in understanding
scientific claims, the description of significance (e.g. the p-value) is severely lacking. To
quote Wikipedia's section "misunderstandings of p-value" on this: 1) The p-value is
not the probability that the null hypothesis is true, nor is it the probability that the
alternative hypothesis is false – it is not connected to either of these. In fact,
frequentist statistics does not, and cannot, attach probabilities to hypotheses.
Comparison of Bayesian and classical approaches shows that a p-value can be very
close to zero while the posterior probability of the null is very close to unity (if there is
no alternative hypothesis with a large enough a priori probability and which would
explain the results more easily). This is Lindley's paradox. But there are also a priori
probability distributions where the posterior probability and the p-value have similar
or equal values. 2) The p-value is not the probability that a finding is "merely a fluke."
As calculating the p-value is based on the assumption that every finding is a fluke (that
is, the product of chance alone), it cannot be used to gauge the probability of a finding
being true. The p-value is the chance of obtaining the findings we got (or more
extreme) if the null hypothesis is true. https://en.wikipedia.org/wiki/P-value#Misunderstandings
2. Patrick Elliott • 2013-12-06, 06:15 AM
I would add in one more: #21: Understand what placebo actually does mean. Placebo
isn't "does nothing". Rather, it can be very effective for pain, and even has been shown
to have other physiological effects, which can even include improvements in immune
responses (though, this and other effects are not as great as for pain). The reason being
that it's literally the "mind over matter" effect. The default assumption, among those
who don't believe in "mind over matter", is that all this stuff kind of runs on automatic,
and the brain has no control over it. Ironically, the "true believers" in mind over matter
seem to think that the mind (not the brain, duh...) has some sort of mystical super-power to affect everything, if you just spend enough time meditating. The reality is a
bit more... complicated, and vastly less mystical. Pain needs to react on a "huge" scale
to mental state. For a wounded animal, the ability to, unconsciously, fade out
"existing" pain from their wound, such as a badly injured leg, for the duration of a
fight, or escape, can be life or death. Even the ability to do things that are familiar, and
thus comforting, and thus reduce the pain, might be critical to handling the stress of
such injuries. In short, there is already a mechanism there, which can **selectively**
knock out pain, and humans have way more complex, not just conscious perceptions,
but layers of sub-conscious responses. It's hardly a surprise that, somewhere in there,
is a way for a placebo to selectively have a major effect on pain, even over long
periods of time (like for acupuncture, a practice based on the very silly idea that the
body contains nine rivers of power in it, with tributaries, just like ancient China, where
it was invented). Obviously, state of mind "might" be useful to enhance, or suppress,
intentionally or otherwise, immunity, digestion, and a whole lot of other semi-automatic things. Things that, when upset by certain conditions, or even conditioned
responses (I feel like running away every time someone gets sick at work, and I am the
one "assigned" to clean the floors that day, for example), which may have perfectly
reasonable causes (if everyone else is sick, it might be something you ate, so maybe
you need to be sick too?, for example), but which can be unlearned, or otherwise
modified by experience. So, rather than placebo being "does nothing", a more accurate
description is, "Takes advantage of preexisting triggers and experience, to cause
**limited** changes in what the body is doing." If you think about that for a moment,
you can see why it's a huge headache for medicine research, and a major problem for the
credibility of people selling things that are *not* tested as carefully. In both cases,
trust in the person giving it to you, trust that it will work, belief in the method being
employed, and other factors, will all "trigger" this placebo effect. Someone taking "big
pharma" drugs might feel sicker taking them, but better taking something given to
them by a guru, but it's the lack of trust in the former, and the unreasonable trust in the
latter, as well as the products being given, which is having the effect, not the actual
drugs themselves. It's a major pain for drug makers to actually get provable results,
which are not **purely** due to everyone in the study believing in the medicine, and
the doctor. For the guru/herbalist, etc.... its less of a problem, since the people going to
them are probably completely certain it will help them, and trust them, even when,
from the stand point of actually treating the disease, it might be totally worthless. And,
that can kill people, who would have otherwise *been* cured, by proven medications.
Note: this isn't a denial of there being problems with Big Pharma, or with medicine as
it's practiced. The former is about money (and so is big-CAM, which is quickly
creeping up, closer and closer to being worth as many billions), while doctors are all
too often paid to offer new drugs, instead of proven ones, paid by number of
procedures, instead of based on outcomes, and almost everything we rely on that
*does* undeniably work, is no longer even made by Big Pharma, but by smaller, less
exacting, more prone to errors, shortfalls, or price gouging, because those, often life
saving drugs, are, "no longer cost effective to produce." The same reason why some
promising drugs, for rare, sometimes even fatal, but possibly treatable, manageable, or
even curable, diseases, never make it to market at all. It's undeniable we have problems
with the industry, but.. its also undeniable that the only reason that doctors "generally"
are not allowed to push placebos, but insurance companies are willing to pay for
unlicensed, not recognized by the AMA, "specialists" for them instead, is because
insurance companies can't have their licenses pulled for lying to their patients about
the chances of something curing them. Instead, they just decide which one will cost
less to give you, something that isn't proven, but a lot of people like, but it vastly
cheaper, or something that might not help you anyway, but makes the doctor a lot of
money, involves lots of , sometimes, unnecessary and inconclusive, lab work and tests,
and then, might not help you, if you are ill enough, any more than the placebo would
have. Of course the insurance companies are willing to pay for that as an option...
4. Mike Taylor • 2013-12-03, 01:33 PM
See also the excellent article in response by Chris Tyler (director of the UK's
Parliamentary Office of Science and Technology) on Top 20 things scientists need to
know about policy-making. These two articles make a very helpful complementary
pair.
5. Edward Powell • 2013-11-30, 07:19 PM
To these twenty, I would add two more: 21) The predictive ability of computer
simulations is extremely limited, and should only be relied on when substantial and
detailed agreement of the simulation results with experimental data is achieved, *and*
when a detailed and thorough understanding of the elements of uncertainty in the
models and simulations is achieved. A true scientist tries his best to falsify, disprove,
or even "break" his models and simulations in any possible way, before relying on the
results. John von Neumann said, "With four parameters I can fit an elephant, and with
five I can make him wiggle his trunk." Any simulation with *any* free parameters is
suspect, and detailed, conclusive, and scientifically justified causal relationships must
be well understood for each and every free parameter before a simulation can even
begin to be considered reliable. 22) No scientist shall, under any circumstances, refer
to a simulation run as an "experiment". Simulations are not experiments, only data
taken from actual reality-based experiments can be called real data.
6. Oliver H. • 2013-12-07, 06:22 PM
What is "reality-based"? The vast majority of basic research is not "reality-based", but
based on lab experiments that do not present reality but are actively controlled to keep
certain factors fixed that would vary in "reality", so as to isolate an effect. Tissue culture
is nothing BUT a simulation as to what happens in actual tissue.
7. Chris Atkins • 2013-11-28, 08:30 PM
This is an excellent article and a subject close to my heart. It could (and should) be
used with secondary (high) school students, particularly (but not exclusively) of
science. For that purpose each 'tip' would benefit from a short (one page) article which
describes the issue in more detail with one or two examples to bring it to life.
8. Lee De Cola • 2013-11-27, 05:48 PM
This is a great list, but to be widely read it needs to be shortened... I've tried and come
up with 3 groupings: psychology (of scientists), experiments, and statistics - could we
get along with 3 or 4 tips in each group?
9. James Scanlan • 2013-11-27, 05:29 PM
This is a useful list in many respects. But it overlooks what may be the most serious
problem in the interpretation of data on differences between outcome rates. Virtually
all efforts to interpret data on group differences in outcome rates, including data on
rates of treated and control subjects in clinical trials, are undermined by failure to
recognize patterns by which standard measures of differences between outcome rates
tend to be systematically affected by the prevalence of an outcome. The most notable
of these patterns is that by which the rarer an outcome the greater tends to be the
relative difference in experiencing it and the smaller tends to be the relative difference
in avoiding it. By way of example pertinent to item 20 on the list, lowering test cutoffs
(or generally improving test performance) tends to reduce relative differences between
rates at which two groups pass a test while increasing relative differences in rates at
which they fail the test. Among other examples, reducing poverty tends to increase
relative differences in poverty rates while reducing relative differences in rates of
avoiding poverty; improving health tends to increase relative differences in mortality
and other adverse health outcomes while reducing relative differences in survival and
other favorable health outcomes; improving healthcare tends to reduce relative
differences in receipt of appropriate care while increasing relative differences in rates
of failing to receive appropriate care; reducing adverse lending outcomes or school
suspension rates tends to increase relative difference in experiencing those outcomes
while reducing relative differences in avoiding those outcomes. It is not possible to
interpret data on group differences or advise policy makers on such issues without
knowing these things. Yet very few people interpreting data on differences in outcome
rates know that it is even possible for the two relative differences to change in
opposite directions, much less that they tend to do so systematically. A number of
references explaining these and related patterns and their implications in varied
contexts are listed below.
Reference 1 provides a fairly succinct explanation of the pattern of relative differences
described above and does so in the course of explaining that, as a result of the failure
to understand the pattern, the US government encourages lenders and schools to take
actions that make it more likely that the government will sue them for discrimination.
Reference 8 explains some of the clinical implications of the failure to understand the
pattern, explaining as well that the rate ratio mentioned as a measure of effect in item
20 of the list is not merely a flawed measure of effect, but an illogical one. References
9 and 10 are discussions of the above-described pattern of relative differences by other
persons. The latter reference observes that governments that ignore the pattern “run
the risk of guaranteeing failure, largely for conceptual and methodological reasons
rather than social welfare reasons.” The observation was focused on the meeting of
health inequalities reduction goals cast in relative terms. But, as reflected in references
1 through 8, failure to understand the pattern, and relative patterns by which measures
tend to be affected by the prevalence of an outcome, undermines society’s
understanding of a great many things.
1. “Misunderstanding of Statistics Leads to Misguided Law Enforcement Policies”
(Amstat News, Dec. 2012):
http://magazine.amstat.org/blog/2012/12/01/misguided-law-enforcement/
2. “Measuring Health and Healthcare Disparities,” Federal Committee on Statistical
Methodology 2013 Research Conference, Nov. 5-7, 2013.
http://jpscanlan.com/images/2013_Fed_Comm_on_Stat_Meth_paper.pdf
http://jpscanlan.com/images/2013_FCSM_Presentation.ppt
3. “The Mismeasure of Discrimination,” Faculty Workshop at the University of
Kansas School of Law, Sept. 20, 2013:
http://jpscanlan.com/images/Univ_Kansas_School_of_Law_Faculty_Workshop_Paper.pdf
http://jpscanlan.com/images/University_of_Kansas_School_of_Law_Workshop.pdf
4. “The Mismeasure of Group Differences in the Law and the Social and Medical
Sciences,” Applied Statistics Workshop at the Institute for Quantitative Social Science
at Harvard University, Oct. 17, 2012:
http://jpscanlan.com/images/Harvard_Applied_Statistic_Workshop.ppt
5. “Can We Actually Measure Health Disparities?” (Chance, Spring 2006):
http://www.jpscanlan.com/images/Can_We_Actually_Measure_Health_Disparities.pdf
6. “The Misinterpretation of Health Inequalities in the United Kingdom,” British
Society for Populations Studies Conference 2006, Sept. 18-20, 2006:
http://www.jpscanlan.com/images/BSPS_2006_Complete_Paper.pdf
7. “Race and Mortality” (Society, Jan./Feb. 2000):
http://www.jpscanlan.com/images/Race_and_Mortality.pdf
8. Subgroup Effects subpage of the Scanlan’s Rule page of jpscanlan.com
http://www.jpscanlan.com/scanlansrule/subgroupeffects.html
9. Lambert PJ, Subramanian S. Disparities in Socio-Economic outcomes: Some
positive propositions and their normative implications. Society for the Study of
Economic Inequality Working Paper Series, ECINEQ WP 2012 – 281:
http://www.ecineq.org/milano/WP/ECINEQ2012-281.pdf
10. Bauld L, Day P, Judge K. Off target: A critical review of setting goals for reducing
health inequalities in the United Kingdom. Int J Health Serv 2008;38(3):439-454:
http://baywood.metapress.com/app/home/contribution.asp?referrer=parent&backto=issue,4,11;journal,7,157;linkingpublicationresults,1:300313,1
10. Benjamin Allen • 2013-11-26, 11:04 PM
An excellent article - well done! Translation from science into policy is made even
more difficult when scientists themselves produce and promote unreliable science.
Sutherland et al's 20 tips are well worth the read. For an ecological example that
epitomises many of the 20 points they raise, see
'http://www.sciencedirect.com/science/article/pii/S0006320712005022'.
11. Douglas Duncan • 2013-11-26, 06:38 PM
Almost all those politicians have been to university, but these aspects of science are
rarely discussed with non-science majors there. They need to be! I have been doing so
for about 6 years, and have published the effects it has on students. Remember, these
people have probably never been to a scientific meeting; they have little idea how
scientists interact, and they sure don't learn that in a classroom, because most of us
behave entirely differently there than we do with colleagues. The way scientists
interact, how they arrive at consensus, what causes them to doubt (even the basic
distinction between peer reviewed and non-peer reviewed publication) are things they
have never encountered (according to our interviews of non-science majors). Anyone
is welcome to use my curriculum, which is here (along with reference to the published
results): http://casa.colorado.edu/~dduncan/pseudoscience/
12. Bart Penders • 2013-11-26, 09:44 AM
The public credibility of science and the trust non-scientists place in scientific claims
and the institution of science is not primarily derived from the methodological quality
of scientific inquiry. The list above is worthwhile to determine that quality (or any
threats to it), but not to help establish the public credibility (or trust). Let me quote
Harvard historian of science Steven Shapin who wrote, eloquently: "When King Lear
decided to take early retirement, he announced his intention to divide up the kingdom
among his three daughters, each to get a share proportioned to the genuine love she
bore him. Each is asked to testify to her love. For Goneril and Regan that presents no
problem, and both use the oily art of rhetoric to good effect. Cordelia, however, trusts
to the authenticity of her love and says nothing more than the simple truth. For Lear
this will not do. Truth is her dower but credibility has she none. Cordelia, we should
understand, was a modernist methodologist. The credibility and the validity of a
proposition ought to be one and the same. Truth shines by its own light. And those
claims that need lubrication by the oily art thereby give warning that they are not
true". (Shapin 1995). Cordelia acts from the conviction that contained in her claims is
the truth and that it can be accessed by those who hear it. The credibility of her claim
is evident to herself - but only to herself. For a scientist to convince non-scientists,
persuading them to see the truth as (s)he does, requires - next to that truth - a
construction of credibility. Scientists have very strict and shared rules to establish the
credibility of their claims; these include notions of peer review, citations, transparent
methodologies and much more. They are part of the well-known academic
credibilising strategy in which the work scientists do to make their data, arguments,
theories or claims become stable and uncontested truths (i.e. to achieve absolute
credibility). "The same work enables them to conduct further research by
strengthening their reputation and attracting new funding. The process of gathering
credibility is never-ending and cyclical: it drives the credibility cycle. Important in this
process are moments of conversion, or translation. Latour and Woolgar show how
time, money and effort are translated into data; data is translated into arguments,
which are subsequently written down in publications. Peer scientists will read or cite
them, resulting in recognition for their claim or argument. This recognition can be
mobilised to support new funding requests. The new funding, translated into new
personnel, projects or machineries, will produce new data, continuing the cycle. Every
translation that takes place, contributes to the credibility of a given scientific claim,
which extends to the researchers, laboratory or institute making the claim. If the
‘scientific wheel of credibility’ continues to turn, it will uphold or heighten the
credibility of claims and claimants. Academic conventions, such as scientific
publications, citations and peer review are part of this credibility cycle. The credibility
cycle as a strategy to construct credibility is powerful only within the scientific
community: outside the university walls, peer review and citation cultures are
relatively unknown and cannot accrue similar credibility." (Penders 2013). Non-scientists (as well as scientists, when it comes to non-scientific claims) use other
criteria, standards and strategies to evaluate the credibility of claims. What matters
most, in real life, are the shape or form (visually, or rhetorically) of a message, the
medium or location of a claim (e.g. quality newspaper vs. casual water-cooler chat),
the degree in which a claim specifically addresses a problem (or question) the
audience identifies with, the source of the claim (a trusted colleague or friend vs. an
unknown online forum) and the relationship between the claim and the audience. The
route a scientific claim takes before it reaches any given policy maker, may matter
more than its content. Who whispers it into her ear may matter more than its content
and the dominant political climate may matter more than the content of the claim. The
authors wisely acknowledge that they "are not so naive as to believe that improved
policy decisions will automatically follow". The reason for that is that any scientific
claim wildly varies in its credibility over time, location and much more. That
credibility is not derived from its content, but it is the reason why a claim is taken
seriously, acted upon, inserted into or excluded from policy.
13. Gavin Cawley • 2013-11-25, 11:35 AM
For policy related issues, the tip I would add would be: Always interpret a scientific
finding in the way that provides the least support for one's current position. This
helps to guard against the sort of confirmation biases etc. to which we are all
susceptible. If the finding genuinely supports your position, it will still do so under its
least favourable interpretation, and adopting this interpretation will demonstrate your
self-skepticism.
14. Anand Ramanathan • 2013-11-21, 07:11 PM
The article is good about warning policy makers about new scientific claims.
However, it is a little worrisome that the same logic can be applied to any science to
reject established results such as the warming over the last few decades or the effect of
smoking. I would add one more point. Science typically does not change overnight
and results that have been tested and confirmed over years (or decades) are unlikely to
be wrong. Any new result that seeks to overthrow some bit of well-established science
should be backed by a much higher standard of proof, such as independent
replications, or carefully controlled studies with large sample sizes.
15. Gavin Cawley • 2013-11-25, 11:42 AM
In the case of the warming of the last few decades, I would say this is addressed by the
points "Separate no effect from non-significance" and "Data can be dredged or cherry
picked" as such arguments are normally based on the lack of a statistically significant
warming trend since some cherry-picked start (e.g. the 1998 El Niño) and ignore the
statistical power of the test. A test for the existence of a change in the underlying rate
of warming would likely also give a non-significant result.
16. Christopher Tong • 2013-11-21, 06:37 AM
A few other tips might include an awareness of Simpson's Paradox, the Curse of
Dimensionality, and the ease with which data models can overfit one data set and fail
to generalize to others. The distinction between exploratory and confirmatory findings
should also be emphasized. Exploratory research seeks to generate new hypotheses,
and confirmatory research aims to evaluate pre-specified hypotheses. Sometimes a
single study will have both confirmatory and exploratory findings, as follows. The
confirmatory claims are pre-specified before the data are collected, and their
consistency with the subsequent data is then evaluated. Exploratory findings are
tentative findings of other, unanticipated patterns in the same data. The
epistemological status of both kinds of findings must be kept distinct. Finally, data
analysis methods must be chosen for fitness for purpose, and not by pattern matching,
as is too often the case.
17. Brad Louis • 2013-11-21, 03:40 AM
Great list. Can I suggest a few more? We don't know everything (Science is never
settled. Any scientist can be wrong.) - that's why we keep investing in scientific
research. We will know more in another hundred years, by which time a lot of what
we think we know now will be proven to be nonsense. Proper experiments produce the
most robust scientific evidence - good experimental design can eliminate many of the
problems mentioned in earlier tips. Observational studies and computer models are not
a patch on proper experiments. Science funding supports a process, not an outcome: well-done science doesn't pretend to know the outcome of research, and diversity of
research perspectives is a good thing.
18. Ian Campbell • 2013-11-20, 11:00 PM
This article provides a valuable summary of issues that need to be appreciated in
interpreting science. However, in the environmental decision making arena, the
science is never the sole factor to be considered - there is always a value judgement as
well. Science tells us the consequences of greenhouse gas emissions, or the likelihood
that logging a particular area of forest will cause the extinction of a particular species,
but then we have to make a value judgement. Do I value the profits I make from my
power station more than the contribution it makes to global warming? Do I value the
timber products I can get by logging this forest more than the possum that will become
extinct? As value judgements these decisions should be informed by the science, but
are made by politicians who are expected to reflect the values of their society.
Politicians are always averse to making value judgement calls that will be unpopular
with a substantial proportion of their community, or an influential group within the
community, so they often try to deflect that responsibility. Two common deflection
techniques are appeals to (often dodgy) short term economics (it would be too
expensive, cost too many jobs), and raising doubts about the science either by citing
poor science, or by suggesting that there is a lack of scientific consensus on the issue.
It is difficult to tell whether a true lack of scientific understanding, as addressed in the
article, or a misunderstanding of convenience to deflect criticism is a greater problem.
19. Fred Sachs • 2013-11-21, 05:16 PM
The article does emphasize some important issues but at the core is how one might
evaluate the contributions of all these factors. The probability of success in a project
cannot be calculated and a legislator would have to assign a weighting factor to each
of the 20 criteria, let alone the societal consequences, to decide on the level of support
and it can't be done. Real answers like this depend on numbers and we don't have the
information to generate the numbers. I suspect that we are going to have to put up with
intuition on the part of the policy makers. I have a few specific issues. One is that 5%
significance is an entirely subjective choice by the researcher and has no significance
if the distribution is not Gaussian. That should be the first test. Even at 5% there is a 1
in 20 chance that the hypothesis is wrong. Does this data represent that time?
Especially in clinical data one should know the probability of getting better compared
to the probability of getting worse. You should not do a one sided test to ask if
subjects got better and ignore the times they got worse. The outcome should also be
weighted by the payoff; a number not easily evaluated. Did the positive patients feel
better while the negative ones died? Finally, for the cause and effect issue I like to
bring up something seen a billion times a day: the roosters crowed and then the sun
came up; clearly the roosters caused sunrise.