Entropy as a Measure of Dispersion

The entropy¹ of a relative frequency distribution is a useful measure of
dispersion for ordinal and nominal data. It is calculated using the following
formula (Shannon, 1948):

    H = -\sum_{i=1}^{k} p_i \log_2(p_i)

where H is the entropy of the distribution, k is the number of possible
outcomes, and p_i is the relative frequency of the ith outcome. For example,
imagine that one flips a coin 10 times and gets the following results:

Possible Outcome     Raw Frequency     Relative Frequency
Heads                4                 0.4
Tails                6                 0.6

The entropy of the observed distribution would be computed as follows:

    H = -0.4\log_2(0.4) - 0.6\log_2(0.6) \approx -0.4(-1.32) - 0.6(-0.74) = 0.53 + 0.44 = 0.97 \text{ bits}

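For readers who want to check this arithmetic by computer, the formula
translates directly into a few lines of Python. This is only an illustrative
sketch added here, not part of the original handout; the function name
entropy and the choice to take relative frequencies as input are assumptions
of this example.

    import math

    def entropy(rel_freqs):
        """Shannon entropy (in bits) of a relative frequency distribution."""
        # Outcomes with zero relative frequency contribute nothing to the sum.
        return -sum(p * math.log2(p) for p in rel_freqs if p > 0)

    # Coin-flip example from the text: 4 heads and 6 tails out of 10 flips.
    print(entropy([0.4, 0.6]))  # ~0.971 bits
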
Note that the actual base of the log is arbitrary. Log base 2 is frequently
used by convention, in which case entropy is reported in units of “bits.”
Minimum entropy (i.e., minimum dispersion) occurs when only one possible
outcome is observed (e.g., you flip a coin multiple times and it only comes
up heads), in which case the entropy of the observed distribution is 0.
Maximum entropy (i.e., maximum dispersion) occurs when each possible
outcome occurs an equal number of times (e.g., you flip a coin 10 times and
get 5 heads and 5 tails). The maximum possible entropy value, Hmax, increases
with the number of possible outcomes; in general, Hmax = log2(k). For example,
if there are only two possible outcomes (e.g., a coin toss), Hmax is 1 bit.
However, if there are four possible outcomes (e.g., counting the number of
Freshmen, Sophomores, Juniors, and Seniors in a class), Hmax is 2 bits.
¹ Entropy is also referred to as the “Shannon-Wiener diversity index” or “Shannon-Weaver index” (Zar, 1999, pg. 41).
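
The claim that Hmax grows with the number of possible outcomes can be
checked numerically. The short Python sketch below is an illustration added
here (not the author's code): it computes the entropy of uniform distributions
over 2 and 4 outcomes and compares the result to log2(k).

    import math

    # The entropy of a uniform distribution over k outcomes equals log2(k),
    # which is the maximum possible entropy, Hmax, for k outcomes.
    for k in (2, 4):
        uniform = [1 / k] * k
        h = -sum(p * math.log2(p) for p in uniform)
        print(k, h, math.log2(k))  # k=2 -> 1.0 bit, k=4 -> 2.0 bits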
Because the maximum possible entropy value depends on the number
of possible outcomes, some researchers prefer to use “relative entropy²,” J,
as a measure of dispersion (Zar, 1999, pg. 41):

    J = \frac{H}{H_{\max}}

For instance, the relative entropy of the above coin flip example is 0.97, as
our observed entropy, H, was 0.97 bits and the maximum possible entropy when
there are two outcomes, Hmax, is 1 bit. J provides a sense of how close a set
of observations is to maximum or minimum dispersion.
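
As a numerical check, J can be computed directly from the relative
frequencies. The sketch below repeats the entropy helper from earlier so that
it is self-contained; it is illustrative only, and the helper name
relative_entropy is an assumption of this example.

    import math

    def entropy(rel_freqs):
        return -sum(p * math.log2(p) for p in rel_freqs if p > 0)

    def relative_entropy(rel_freqs):
        """J = H / Hmax, where Hmax = log2(k) for k possible outcomes."""
        k = len(rel_freqs)
        return entropy(rel_freqs) / math.log2(k)

    print(relative_entropy([0.4, 0.6]))  # ~0.971 for the coin-flip example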
Finally, for those of you with calculators that do not have a log base 2
function, you can compute the log2 of a number, x, using log10 (or any other
base log) from the following formula:

    \log_2 x = \frac{\log_{10} x}{\log_{10} 2}

For example, to compute log2(0.5) you could go through the following steps:

    \log_2(0.5) = \frac{\log_{10}(0.5)}{\log_{10}(2)} = \frac{-0.301}{0.301} = -1
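
The same change-of-base identity is easy to verify with Python's base-10
logarithm; the snippet below is merely an added illustration of the formula,
not part of the original handout.

    import math

    def log2_via_log10(x):
        """Compute log base 2 of x using only base-10 logarithms (change of base)."""
        return math.log10(x) / math.log10(2)

    print(log2_via_log10(0.5))  # -1.0
    print(math.log2(0.5))       # -1.0, built-in log2 for comparison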
² Relative entropy is also called “evenness” or “homogeneity” (Zar, 1999, pg. 41).

References

Shannon, C. E. (1948). A mathematical theory of communication. Bell System
Technical Journal, 27, 379-423, 623-656.

Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ:
Prentice Hall.