HMM Handout - Probability Primer

10 Equations in Biology Series
Seminar #11: Hidden Markov Models
Nov. 28, 2012
1. Basics of Probability
Probability measures the chance that a specific event will occur. By definition, an event’s
probability must lie between 0 (no chance of occurring) and 1 (100% chance of occurring).
The probability of an event E is usually written as P(E). For example, if R represents the
occurrence of rain on a particular day, then P(R) = 0.4 means a 40% chance of rain on that
day. The alternative notations p(E), Pr(E), and Prob(E) are also common: all of these
mean exactly the same as P(E).
A complex event may consist of a combination of several simpler events. Table 1 summarizes useful rules for relating the complex event's probability to that of the simpler events.

Table 1: Rules for working with probabilities

    Event      Probability      Assumptions
    NOT A      1 − P(A)         none
    A OR B     P(A) + P(B)      A and B are mutually exclusive
    A AND B    P(A) · P(B)      A and B are independent

For example, if there is a 40% chance of rain today and a 30% chance of rain tomorrow, and if today's weather has no effect on tomorrow's, then the probability of rain on both days is:

    P(R_today AND R_tomorrow) = P(R_today) · P(R_tomorrow) = 0.40 · 0.30 = 0.12.
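For readers who like to check such calculations numerically, here is a minimal Python sketch of the NOT and AND rules applied to the rain example above (the variable names are illustrative, not part of the handout):

    # Illustrative probabilities from the rain example
    p_rain_today = 0.40
    p_rain_tomorrow = 0.30

    p_no_rain_today = 1 - p_rain_today                   # NOT rule
    p_rain_both_days = p_rain_today * p_rain_tomorrow    # AND rule (independent events)

    print(p_no_rain_today, p_rain_both_days)             # 0.6 0.12

Note that the OR rule from Table 1 is not applied here, because rain today and rain tomorrow are not mutually exclusive events.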
2. Conditional Probability and Bayes’ Theorem
Two events A and B are said to be independent if the outcome of one event has no effect
on the probability of the other. For example, consider two flips of a fair coin. Let event F be
“The 1st flip comes up heads” and S be “The 2nd flip comes up heads.” Define P(S|F) as the
conditional probability of event S given event F; in other words, the probability that the
2nd flip comes up heads given that the first flip has already come up heads. Because each flip
is independent of all other flips, P(S|F) = P(S) = 0.50. In general, two events A
and B are independent if and only if P(A|B) = P(A) and P(B|A) = P(B).
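A short simulation can make this concrete. The Python sketch below (illustrative only, using the standard random module) estimates P(S) and P(S|F) from many pairs of simulated fair-coin flips and shows that the two estimates agree:

    import random

    # Simulate many pairs of fair coin flips (True = heads)
    random.seed(0)
    trials = [(random.random() < 0.5, random.random() < 0.5) for _ in range(100000)]

    # Estimate P(S): fraction of all trials whose 2nd flip is heads
    p_s = sum(second for _, second in trials) / len(trials)

    # Estimate P(S|F): among trials whose 1st flip is heads, fraction whose 2nd flip is heads
    second_given_first = [second for first, second in trials if first]
    p_s_given_f = sum(second_given_first) / len(second_given_first)

    print(round(p_s, 3), round(p_s_given_f, 3))   # both close to 0.50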
Many important events are NOT independent of each other. For example, imagine choosing one student at random from a biology class (see Table 2). Let event T be “The chosen student is at least 6’ tall” and W be “The chosen student is a woman.” Because women are shorter than men on average, P(T|W) < P(T).

Table 2: Height distribution in a biology class

               < 6’    ≥ 6’    Total
    # Women     110      10      120
    # Men        60      20       80
    Total       170      30      200

The probability that a randomly chosen student is a woman who is at least 6’ tall is then given by:

    P(W AND T) = P(W) · P(T|W) = (120/200) · (10/120) = 0.05.
Moreover, two events A and B can be considered in either order, so P(A AND B) = P(B AND A). Therefore, P(A) · P(B|A) = P(B) · P(A|B), which can be rewritten as

    P(A|B) = P(A) · P(B|A) / P(B).    (Bayes’ Theorem)
For example, if a particular student in the biology class is known to be at least 6’ tall, the
likelihood of that student’s being a woman is given by:
    P(W|T) = P(W) · P(T|W) / P(T) = (120/200) · (10/120) / (30/200) = 0.33.
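The same values can be recovered directly from the counts in Table 2; the following Python sketch simply restates the arithmetic above (the variable names are illustrative):

    # Counts taken from Table 2
    women_total, women_tall = 120, 10
    class_total, tall_total = 200, 30

    p_w = women_total / class_total          # P(W)
    p_t = tall_total / class_total           # P(T)
    p_t_given_w = women_tall / women_total   # P(T|W)

    p_w_and_t = p_w * p_t_given_w            # P(W AND T) = 0.05
    p_w_given_t = p_w * p_t_given_w / p_t    # Bayes' Theorem: P(W|T) ≈ 0.33

    print(p_w_and_t, round(p_w_given_t, 2))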
3. Probability vs. Likelihood
The words probability and likelihood may seem synonymous, but there is actually a key
distinction between them: probability refers to potential outcomes of a future event,
whereas likelihood applies to a present state. For example, there is a 50% probability that
flipping a fair coin will yield heads. Once the coin has been flipped and the result noted, the
flip would be described as having had a 50% likelihood of yielding that result. Similarly,
the previous page gave the example of randomly choosing a student from a biology class.
Once a student has been chosen, the experiment no longer involves any degree of
randomness: the student is either male or female, and either short or tall. Therefore, as
soon as any definite information is available about the particular student, any subsequent
inferences about that student are statements about likelihood.
Hidden Markov Models are typically applied to cases in which researchers are attempting
to infer the specific process that produced a given set of observations (e.g., the positions of
introns and exons within a DNA sequence). Each possible process usually has a known
probability of yielding a particular observation. The researchers then use these
probabilities to calculate the likelihood that a specific set of observations (e.g., a known
DNA sequence) resulted from a particular process.
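As a toy illustration only (not the full HMM machinery), suppose two hypothetical processes emit the DNA symbols A, C, G, and T with different probabilities. The emission probabilities and the names exon_like and intron_like below are made-up assumptions; the likelihood of an observed sequence under each process is simply the product of the per-symbol probabilities (AND rule):

    # Hypothetical per-symbol emission probabilities (illustrative values only)
    exon_like   = {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20}
    intron_like = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}

    sequence = "ACGTG"

    def likelihood(probs, seq):
        """Product of per-symbol probabilities, assuming independent emissions."""
        result = 1.0
        for symbol in seq:
            result *= probs[symbol]
        return result

    # Compare how likely the observed sequence is under each candidate process
    print(likelihood(exon_like, sequence), likelihood(intron_like, sequence))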
Logarithms
Many likelihood models, including HMMs, employ logarithms as a convenient way of working with and comparing very small numbers. When multiplying together large numbers of probabilities via the AND rule (previous page), the additive property of logarithms is also very useful:
    log(ab) = log(a) + log(b).
Logarithms are the inverse function of exponentiation: if 10^a = b, then log_10(b) = a. However, although biologists may be most familiar with base-10 logarithms, any base can be used. In mathematics, the most common base is e (2.718…). Logarithms can be converted from one base to another using the formula

    log_b(x) = log_a(x) / log_a(b).
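Both ideas can be checked with a few lines of Python (the probability values are illustrative; math.log with no base argument returns the natural logarithm):

    import math

    # Multiplying many small probabilities vs. adding their logarithms
    probs = [0.01, 0.002, 0.05, 0.001]
    product = 1.0
    log_sum = 0.0
    for p in probs:
        product *= p
        log_sum += math.log(p)            # natural log (base e) by default
    print(product, math.exp(log_sum))     # both ≈ 1e-09

    # Changing base: log_b(x) = log_a(x) / log_a(b)
    x = 1000.0
    print(math.log(x, 10), math.log(x) / math.log(10))   # both ≈ 3.0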
Unfortunately, authors do not always specify which logarithmic base they are using. When in
doubt, it is often best to redo an author’s calculations using several different bases to determine
which one he or she is using.