Solomonic decisions

How much would King Solomon not have paid to know a little more about
calculating odds! And he was quite wise. But surely, if he had had a
minimal understanding of statistics, his decisions would have been much
easier. And, of course, he almost certainly would not have had to cut
children in half. It is clear that, in that case, he would not be famous
now. Historical characters are like popular festivities: the wilder, the
better liked.
And to show you what I mean I’m going to imagine, as usual, an example
so stupid that you will end up wanting to keep reading.
Let’s suppose for a delusional moment that I’m a security guard at a
giant candy store. Someone tells me that a child has been caught with a bag
of candy allegedly stolen from the giant barrel full of candy in the store.
The poor kid says he has done nothing wrong and that he bought the bag at
another store but, of course, what else could he say? What can we do?
I know… split the child in half, as King Solomon would.
But anyone can see at once that this is not a good solution.
Who knows, the poor child could be innocent, as he claims. So let’s think
a little about how we can find out whether the candies come from our shop
or from our competitor’s.
The store clerk tells us that 25% of the candies in the barrel are orange
flavored, 20% strawberry, 20% mint, 25% coffee and 10% chocolate. So we look
into the child’s bag and find that it holds 100 candies of the following
flavors: 27 orange, 18 strawberry, 20 mint, 22 coffee and 13 chocolate.
If those candies were from our barrel, the flavor distribution should be
much the same in both the barrel and the bag. From a practical point of
view, we may assume that the robber pulled out 100 candies at random from
the barrel (this reasoning does not hold if he picked the candies by
flavor).
So the question is simple: is the bag’s distribution of flavors
compatible with the candies being a random sample from our barrel? Small
differences would be due to sampling error, so we state our null hypothesis:
that the kid has stolen our candy.
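As a minimal sketch of what that null hypothesis implies, the barrel’s
percentages can be turned into the counts we would expect to find in a bag
of 100 candies. The variable names are illustrative assumptions, not part of
the original example.

```python
# Flavor proportions in the barrel, as reported by the store clerk.
barrel = {"orange": 0.25, "strawberry": 0.20, "mint": 0.20,
          "coffee": 0.25, "chocolate": 0.10}

# Candies actually found in the child's bag.
observed = {"orange": 27, "strawberry": 18, "mint": 20,
            "coffee": 22, "chocolate": 13}

n = sum(observed.values())  # total number of candies in the bag

# Expected count per flavor if the bag is a random sample from the barrel.
expected = {flavor: share * n for flavor, share in barrel.items()}

for flavor in barrel:
    print(f"{flavor:<10} observed={observed[flavor]:>3} "
          f"expected={expected[flavor]:>5.1f}")
```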
First, we work out the theoretical distribution that the candy would have
and compare it with the distribution it actually has, always assuming that
the null hypothesis is true. We want
to know if the difference between the expected and observed distributions
can be explained by chance. But if we add up the differences between them,
they cancel each other out and the end result is zero. As we know this is
always going to happen, what we do is square the differences (to eliminate
negative signs) before adding them. The problem is that expecting 2 and
getting 7 is not the same as expecting 35 and getting 40. Although the
difference equals five in both examples, it seems clear that the discrepancy
is proportionally greater in the first case. This is why we standardize the
squared differences, dividing them by the expected value. And, finally, we
add these results to
obtain a certain value, which in our example is 1.62.
And is 1.62 a lot or a little? It depends: sometimes it will be a lot and
sometimes a little. But we do know that this value approximately follows a
chi-square probability distribution with a number of degrees of freedom
equal to the number of categories (flavors in our example) minus one.
Now we can calculate the probability of getting a chi-square value of 1.62
or more with four degrees of freedom. We can use a computer program, a table
of probabilities or one of the calculators available on the Internet. We
come up with a p value of about 0.81 (81%). As it is greater than 5%, we
cannot reject the null
hypothesis, so we conclude that the child is not only a thief, but also a
liar. His bag of candy is representative of a random sample obtained from
our barrel.
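The whole calculation can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the variable names and the
helper chi2_sf_even_df are hypothetical, and the helper relies on the closed
form of the chi-square tail probability that exists when the number of
degrees of freedom is even (four here) instead of calling a statistics
library.

```python
import math

observed = [27, 18, 20, 22, 13]   # orange, strawberry, mint, coffee, chocolate
expected = [25, 20, 20, 25, 10]   # barrel percentages applied to 100 candies

# Raw differences cancel each other out, which is why they get squared.
raw_sum = sum(o - e for o, e in zip(observed, expected))

# Squared differences, each divided by its expected count, then added up.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1  # number of categories minus one


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with even df.

    Hypothetical helper: uses the closed form
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers even, positive df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


p_value = chi2_sf_even_df(chi2, df)

print(f"sum of raw differences: {raw_sum}")
print(f"chi-square statistic:   {chi2:.2f}")
print(f"degrees of freedom:     {df}")
print(f"p value:                {p_value:.2f}")
```

In practice one would probably let a statistics package do the work. SciPy,
for instance, offers scipy.stats.chisquare, which takes the observed and
expected counts and returns both the statistic and the p value.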
You have seen how easy it is to check the origin of a sample by applying
the chi-square test. But this test is not only good for studying the origin
of a random sample. It can also be used to check if there is any dependence
among qualitative variables. But that’s another story…