Selected Problems of Teaching Probability and Statistics

WDS'10 Proceedings of Contributed Papers, Part I, 116–120, 2010.
ISBN 978-80-7378-139-2 © MATFYZPRESS
V. Línek
Charles University Prague, Faculty of Mathematics and Physics, Prague, Czech Republic.
Abstract. Although courses in statistics and probability run at many universities
and colleges in the Czech Republic, the results of this education are often regarded
as insufficient. In this article, we focus on some particular aspects of this problem:
various examples, such as diagnostic tests in medicine and probabilistic thinking in
criminology, are discussed, as well as ways of cultivating statistical thinking.
Introduction
The outcomes of education in statistics and probability, and the level of statistical
thinking among the non-mathematical public, appear to be very poor both in the Czech Republic
[Špinková, 2007] and abroad [Gigerenzer, 2002]. Apart from a general lack of mathematical
literacy, some other reasons for this situation in our country can be pointed out. First of all, there is
very little teaching of statistics and probability at primary and secondary schools. At primary schools,
the basic terms of descriptive statistics should be explained, but this seldom happens because
teachers lack interest and time. At secondary schools, the basics of probability theory are studied;
however, the subject is treated as a combinatorial exercise rather than as something connected with the “real world”.
As a result, statistics courses at colleges and universities are too difficult for students and
lead them to learn nothing but formal algorithms. Thus, even if laymen know how to use particular
statistical methods, they are often unaware of the main purpose of inductive statistics: to
analyze statistical data objectively [Zvárová, 2007]. Instead, statistics is regarded as a kind of
mystery that can produce any result we want and has nothing to do with common sense. This
attitude leads not only to incorrect use of statistics, but also to distrust of its results on the one hand and
overestimation of them on the other.
As this state of affairs has lasted for many decades, an easy and simple solution cannot be expected.
The attention of the mathematical and pedagogical public should nevertheless be drawn to it, which is the modest
aim of this article.
Examples from scientific practice
As we occasionally cooperate with physicians and surgeons on the statistical analysis of their data,
we can present some examples of flaws in their statistical thinking at various levels. Our
intention is neither to ridicule them nor to claim that physicians are unable to use statistics correctly; we
just want to show how statistical thinking can be distorted and what the possible sources
of this distortion are.
Case A
A student of psychology brought 180 questionnaires filled in by participants in some kind of
therapy. She wished to prove from these questionnaires that the participants were “happier than other
people”. She was surprised to be told that questionnaires from those “other people” were needed
for this purpose. As obtaining them was impossible, we analyzed the data by means of descriptive statistics only. Three
months after the work had been finished (and paid for), she discovered results from another researcher
analyzing the same questionnaires, together with the formula used for the analysis. Thus all the work
previously performed was found useless and was done (and paid for) again.
In this case, the data were collected without any idea of how statistics is used. A single two-sample
t-test performed by the student during her statistics course could have saved her
money and time.
Case B
A physician needed to find a difference between two groups of patients divided by the depth of a
tumor. The boundary depth between the groups was set at 5 mm. After we had delivered the results,
“new” data were sent for analysis with a boundary depth of 4 mm “so that the results were better”.
We can see a typical example of statistical dishonesty here: statistics is used not as an objective
way to analyze the data, but as a way to get the results we want. Unfortunately, this use of statistics is much more
common than the correct one: future authors of articles are more interested
in results they can publish than in being objective. This is not surprising and can hardly be changed.
However, the sinners are rarely aware that they sin, and teachers of statistics might be able to
improve the situation at least partly by warning strongly against this attitude.
Case C
A surgeon needed to show that the duration of a particular kind of surgery depends on the obesity
of the patient. He had divided the patients into two groups, fat and thin, and brought the average
duration of the operation for each group. Unfortunately, he did not have the primary data,
just the averages. We were asked to simulate data that had the same averages and demonstrated, via a
two-sample t-test, the fact he wanted to show. We were assured that such a deception was quite harmless,
as he had been performing the surgery for many years and he “knows” that he is right.
This situation shows more than simple dishonesty. Thinking it over carefully, we can find
two paradoxes here. Firstly, using statistics could be regarded as needless even if the
primary data were available, as the validity of the hypothesis in question is taken to be obvious. Secondly, the
surgeon uses his authority and experience to persuade the statisticians that he is right so that they
can persuade other surgeons of it. Why such a waste of energy? His common sense and
experience are convincing for him; why does he think they are not enough for the other surgeons? It seems
that statistics has completely lost its original purpose here and is used as a kind of certificate that
must be purchased whenever a piece of knowledge is to be published or taken seriously.
Case D
A postgraduate student of medicine expressed her despair about
cooperating with a statistician during her research in these words: “I don’t understand the statistics at
all. I don’t even know the form I should give him the data in!”
This statement shows a substantial failure of statistical education. It is quite easy to explain
to students what to do with their data if the data are clearly arranged; if they are not, a serious problem arises.
There is much work for teachers at grammar schools in this area.
Diagnostic tests
Diagnostic tests are methods used in medicine to determine the state of patients, e.g., whether they
suffer from a particular disease, or some other unknown condition. Since there is nothing certain in
this world, the results of tests are not absolutely reliable, and interesting statistical and didactical
problems arise here.
Two basic quantities characterizing the reliability of a test are its sensitivity (probability that the
result of the test is positive given that the patient has the disease), and its specificity (probability that
the result of the test is negative given that the patient does not have the disease). We present these
characteristics for several tests in Table 1, together with the prevalence of the tested condition (the
proportion of individuals with the tested condition in a particular population; in other words, the
probability that a patient chosen at random has the condition). The prevalence obviously
depends on the selected population.
The trouble is that these numbers are of little direct interest to patients. They are much more
interested in the false positive rate (the probability that the patient does not have the disease given that
he/she has tested positive) and the false negative rate (the probability that the patient has the disease given
that he/she has tested negative). The standard method of calculation is via Bayes' rule, leading to
the formulas:
$$\text{false positive rate} = P(H \mid +) = \frac{P(+ \mid H) \cdot P(H)}{P(+ \mid H) \cdot P(H) + P(+ \mid D) \cdot P(D)},$$

$$\text{false negative rate} = P(D \mid -) = \frac{P(- \mid D) \cdot P(D)}{P(- \mid D) \cdot P(D) + P(- \mid H) \cdot P(H)},$$

where the following abbreviations are used:
H ... the patient is healthy,
D ... the patient has the disease,
+ ... the result of the test is positive,
– ... the result of the test is negative.

Table 1. The sensitivity and the specificity of selected diagnostic tests and the prevalence of the tested
diseases (or some other unknown conditions) in a relevant population.¹

condition                       sensitivity   specificity   prevalence
HIV                             99.9%         99.99%        0.01%
breast cancer (mammogram)       90%           93%           0.8%
colorectal cancer (FOBT test)   50%           97%           0.3%
pregnancy tests                 82%           64%           5.5%
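The two Bayes-rule formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original paper: the helper name `false_rates` is hypothetical, and the inputs are the mammogram values used in the text (sensitivity 90%, specificity 93%, prevalence 0.8%).

```python
def false_rates(sensitivity, specificity, prevalence):
    """Return (false positive rate, false negative rate) as defined in the text.

    false positive rate = P(H | +), false negative rate = P(D | -).
    """
    p_d = prevalence              # P(D)
    p_h = 1.0 - prevalence        # P(H)
    p_pos_d = sensitivity         # P(+ | D)
    p_neg_h = specificity         # P(- | H)
    p_pos_h = 1.0 - specificity   # P(+ | H)
    p_neg_d = 1.0 - sensitivity   # P(- | D)

    fpr = (p_pos_h * p_h) / (p_pos_h * p_h + p_pos_d * p_d)
    fnr = (p_neg_d * p_d) / (p_neg_d * p_d + p_neg_h * p_h)
    return fpr, fnr


# Mammogram values: sensitivity 90%, specificity 93%, prevalence 0.8%.
fpr, fnr = false_rates(0.90, 0.93, 0.008)
print(f"false positive rate: {fpr:.1%}")   # about 90.6%
print(f"false negative rate: {fnr:.2%}")   # about 0.09%
```

Under these inputs, roughly nine out of ten positive mammograms in routine screening come from healthy women, which is the kind of result the sensitivity and specificity alone do not reveal.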
Unfortunately, the idea behind Bayes' rule seems to be too difficult and largely
inaccessible to doctors, who are rarely interested in mathematics. Thus they usually know very
little about the uncertainties connected with the results of these tests. This can occasionally lead to
harmful misunderstandings or even to tragedy [Gigerenzer, 2002].
Diagnostic tests are analyzed in more depth by, e.g., Anděl [2007] and Hacking [2001].
Didactics of diagnostic tests
Gigerenzer [2002] claims that conditional probabilities are the source of these
difficulties, as our minds are not adapted to them. Instead, he suggests formulating the problem in so-called
“natural frequencies”, i.e. absolute numbers of particular results in an idealized population of some
round size, e.g. 10,000 people. To support this approach, he asked 48 physicians to
estimate the chance of breast cancer for a woman aged 40 to 50 given a positive mammogram in
routine screening. The input data were given in two versions, as follows:
Version A (conditional probabilities)
“The probability that one of these women has breast cancer is 0.8%. If a woman has breast
cancer, the probability is 90% that she will have a positive mammogram. If a woman does not have
breast cancer, the probability is 7% that she will still have a positive mammogram. Imagine a woman
who has a positive mammogram. What is the probability that she actually has breast cancer?”
Version B (natural frequencies)
“Eight out of every 1,000 women have breast cancer. Of these 8 women with breast cancer, 7 will have a
positive mammogram. Of the remaining 992 women who don’t have breast cancer, some 70 will
still have a positive mammogram. Imagine a sample of women who have positive mammograms in
screening. How many of these women actually have breast cancer?”
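For reference, the answer that both versions ask for can be worked out as follows. This is only an illustrative calculation from the numbers quoted in the two versions, with D denoting breast cancer and + a positive mammogram; it is not taken from the original study.

```latex
% Version A: Bayes' rule with prevalence 0.8%, sensitivity 90%, P(+ | no cancer) = 7%
\[
P(D \mid +) = \frac{0.90 \cdot 0.008}{0.90 \cdot 0.008 + 0.07 \cdot 0.992}
            = \frac{0.0072}{0.07664} \approx 0.094
\]
% Version B: natural frequencies, 7 true positives and 70 false positives
\[
\frac{7}{7 + 70} = \frac{7}{77} \approx 0.091
\]
```

Both versions thus point to roughly one woman in eleven; the small difference comes from rounding to whole persons in Version B.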
One half of the physicians were given Version A, while the other half received Version B.
We present the results of this experiment in Table 2.
¹ Data from [Gigerenzer, 2002], [Katz, 2009].
Table 2. Results of Gigerenzer’s experiment.

            Incorrect answers   Correct answers   Total
Version A   22 (92 %)           2 (8 %)           24 (100 %)
Version B   13 (54 %)           11 (46 %)         24 (100 %)
The results are convincing enough. The question remains whether physicians would be able
to “translate” the language of conditional probabilities into that of natural frequencies themselves. In the
experiment this was done for them by someone else, but that cannot be arranged every time a new test is used.
A fairly easy way to do this is to distribute an idealized population into natural frequencies according to
the sensitivity, specificity and prevalence in a 2 by 2 contingency table. The data are clearly
arranged there, and the characteristics of the test can even be displayed graphically (Fig. 1).
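Filling in such a table is mechanical, as the following minimal Python sketch suggests. The function name `natural_frequency_table` and the default population of 1000 are assumptions made for illustration; the input values are those of Fig. 1 (sensitivity 90 %, specificity 80 %, prevalence 10 %).

```python
def natural_frequency_table(sensitivity, specificity, prevalence, population=1000):
    """Split an idealized population into the four cells of a 2 by 2 table.

    Counts are rounded to whole people, as in a natural-frequency statement.
    """
    diseased = round(prevalence * population)
    healthy = population - diseased
    true_pos = round(sensitivity * diseased)    # diseased, test positive
    false_neg = diseased - true_pos             # diseased, test negative
    true_neg = round(specificity * healthy)     # healthy, test negative
    false_pos = healthy - true_neg              # healthy, test positive
    return {"true_pos": true_pos, "false_pos": false_pos,
            "false_neg": false_neg, "true_neg": true_neg}


# Values of Fig. 1: sensitivity 90 %, specificity 80 %, prevalence 10 %.
cells = natural_frequency_table(0.90, 0.80, 0.10)

positives = cells["true_pos"] + cells["false_pos"]
negatives = cells["false_neg"] + cells["true_neg"]

print(cells)
print(f"false positive rate: {cells['false_pos'] / positives:.0%}")  # 180 of 270
print(f"false negative rate: {cells['false_neg'] / negatives:.1%}")  # 10 of 730
```

Once the four counts are available, the false rates are simple ratios of cells, and no explicit use of Bayes' rule is needed.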
In fact, such tables are the usual way in which specificity and sensitivity are explained to
students of medicine. Unfortunately, the effect of the prevalence on the false rates can easily be
forgotten here, as the data entered usually come from experiments in which the proportion of patients
with the disease differs from that in the normal population [Linn, 2004]. The effect is essential: the
lower the prevalence, the higher the ratio of false positive to true positive results, and hence the higher the
false positive rate.
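The dependence on prevalence can be made visible with a few lines of Python. This is a hedged sketch only: the helper `false_positive_rate` is hypothetical, the test characteristics are those of Fig. 1, and the list of prevalence values is an assumption chosen for illustration.

```python
def false_positive_rate(sensitivity, specificity, prevalence):
    """P(healthy | positive test), the 'false positive rate' as defined in the text."""
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_pos = sensitivity * prevalence
    return false_pos / (false_pos + true_pos)


# Test characteristics of Fig. 1 (sensitivity 90 %, specificity 80 %);
# the prevalence values below are illustrative assumptions.
for prevalence in (0.5, 0.1, 0.01, 0.001):
    rate = false_positive_rate(0.90, 0.80, prevalence)
    print(f"prevalence {prevalence:>6.1%}: false positive rate {rate:.1%}")
```

With the test itself held fixed, the printed rate grows as the prevalence falls, which is why a table filled in from a study sample with many diseased patients can be misleading for screening.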
Therefore, the ability that students of medicine should gain during their education is to fill in the
table according to the prevalence, the sensitivity and the specificity, just as
in Version B. This didactic approach seems likely to be more effective than memorizing
Bayes' rule, which is usually forgotten anyway.
              INFECTED   HEALTHY      Σ
POSITIVE            90       180    270
NEGATIVE            10       720    730
Σ                  100       900   1000

FALSE POSITIVE RATE 67 %    FALSE NEGATIVE RATE 1.4 %
SENSITIVITY 90 %            SPECIFICITY 80 %

Figure 1. Natural frequencies in a 2 by 2 table and their graphical display. The prevalence is
10 %. The grey fields of the original figure (180 false positives, 10 false negatives) represent false results.

Probability and criminology
Examples of using probability in criminology are interesting for teachers, as they show in an
engaging way how muddled probabilistic thinking can lead to incorrect conclusions. We present
here a case described in [Gigerenzer, 2002]. It again concerns conditional probabilities.
Case E
A woman was murdered and her husband was accused of the murder. The strongest argument
against him was the fact that he had battered her repeatedly in the past; therefore it could be regarded
as probable that he had killed her as well. The defense, having consulted a renowned law professor,
objected that out of 100,000 women battered by their partners, only 40 are killed by their
partners annually. In other words, the defense stated that the conditional probability of a man murdering his
wife, given that he had battered her, was too small to be useful at trial:
P(a man will kill his wife | he batters her) = 40 : 100,000 = 0.0004.
In the end, the jury found the defendant not guilty.² It is not known whether they were
affected by the probability argument. In any case, the argument is not correct. Why?
The woman in question had not only been battered but also murdered. Out of 100,000 women battered
by their partners, 40 are killed by their partners annually, while only 5
are killed each year by someone else (Fig. 2).
Thus the probability that is relevant in this situation is:
P(a man has killed his wife | he battered her and she has been killed) = 40 : 45 = 0.89.
In other words, 40 out of every 45 battered women who are murdered have been killed by their
batterers. Probability theory provides a fairly strong argument against the defendant, not for him.
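The contrast between the two conditional probabilities can be reproduced directly from the counts quoted above. The following short Python sketch is only an illustration; the variable names are assumptions, and the counts are the annual figures given in the text.

```python
# Annual counts quoted in the text, per 100,000 women battered by their partners.
battered = 100_000
killed_by_partner = 40
killed_by_someone_else = 5

# The defense's figure: P(killed by partner | battered).
p_defense = killed_by_partner / battered

# The relevant figure: P(killed by partner | battered and killed).
killed = killed_by_partner + killed_by_someone_else
p_relevant = killed_by_partner / killed

print(f"P(killed by partner | battered)            = {p_defense:.4f}")
print(f"P(killed by partner | battered and killed) = {p_relevant:.2f}")
```

The numerator is the same in both ratios; only the reference group in the denominator changes, from all battered women to battered women who were killed.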
[Figure 2: diagram. Women battered by their partner: 100,000. Of these, women killed by their partner: 40; women killed by someone other than their partner: 5.]
Figure 2. A schema of the probability arguments discussed in the presented case of murder.
Conclusion
We have presented several examples of the practical use and misuse of statistics and probability, which
show why these parts of mathematical education deserve attention. Two notable sources of
problems were suggested: a misconception about the purpose of statistics, resulting in statistical
dishonesty, and conditional probabilities, which are very tricky and whose incorrect use can have
serious consequences. Both of these issues deserve to be addressed carefully. Looking for further practical
examples and better teaching methods is certainly a suitable way to deal with them.
Acknowledgments. The author would like to thank RNDr. Magdalena Hykšová, PhD. for her patient and
kind help with work on this article.
References
Anděl, J., Matematika náhody, Matfyzpress, Praha, 2007.
Gigerenzer, G., Calculated Risks, Simon & Schuster, New York, 2002.
Hacking, I., An Introduction to Probability and Inductive Logic, Cambridge University Press, Cambridge, 2001.
Katz, D., False Positives & False Negatives. Lecture Notes for the Course Introductory Statistics, Preston Ridge
Campus, Frisco, 2009.
Linn, S., A New Conceptual Approach to Teaching the Interpretation of Clinical Tests, Journal of Statistics
Education, Volume 12 (2004), Number 3.
Špinková, M., Statistical Notions and Learning of Statistics, in: WDS'07 Proceedings of Contributed Papers,
Part I, Matfyzpress, Praha, 2007.
Zvárová, J., Základy statistiky pro biomedicínské obory I., Karolinum, Praha, 2007.
² The case took place in the USA in 1995. The accused man was black, the murdered woman was white, and
eight of the twelve jurors were black women. The case was therefore followed closely by the public because of
its gender and racial connotations.