Combinatorial Mathematics and Discrete Probability

Shuang (Sean) Luan
Department of Computer Science
University of New Mexico
Albuquerque, NM 87131
E-mail: [email protected]

1 Discrete Probability

Definition 1 (Probabilistic Experiment) There are experiments that do not yield the same result when performed repeatedly. Such experiments are called probabilistic experiments. In contrast, a deterministic experiment always produces the same outcome.

Definition 2 The sample space S of a probabilistic experiment is the set of all of its outcomes.

Example 1 The sample space for tossing a 6-sided die is {1, 2, 3, 4, 5, 6}.

Consider the experiment of tossing a 6-sided die. Suppose the outcome is 2. Does the event "the outcome is even" happen? The answer is yes. Now suppose the toss produces the outcome 4. Does the event happen? Again yes. In fact, whenever the outcome is from the set {2, 4, 6}, the event happens. Thus an event in a probabilistic experiment is a subset of the outcomes.

Definition 3 A probability space for a probabilistic experiment is a triple (S, E, P):
• S is the sample space, which includes all possible outcomes of the experiment.
• E is a collection of events, i.e., E ⊆ 2^S.
• P : E → [0, 1] is the probability function, which assigns to each event a probability between 0 and 1, where probability 1 means the event always happens and probability 0 means the event never happens.

There are some special events of interest, which we define here. The sample space S itself is an event. Since every outcome belongs to the sample space, no matter which outcome occurs, the event S happens. Thus the probability of S is 1, and we call S a certain event. The empty set ∅ is also an event. Since ∅ contains no outcomes, it never happens. Thus the probability of ∅ is 0, and we call ∅ an impossible event.
Two events E and F are said to be mutually exclusive or disjoint if E ∩ F = ∅. A set of events E1, E2, ..., En is said to be mutually exclusive if every pair of them is mutually exclusive.

Axiom 1 (Axioms of Probability) Let (S, E, P) be a probability space. Then:
• 0 ≤ P(E) ≤ 1 for any event E ∈ E.
• P(S) = 1 and P(∅) = 0.
• P(E ∪ F) = P(E) + P(F) whenever E ∩ F = ∅.
• E is closed under intersection, union, and complement. In other words, for E, F ∈ E, we have E ∩ F, E ∪ F, Ē ∈ E.

2 Probability and Counting

The definition of a probability space does not tell us how to calculate the probability function, yet in order to use the probability space we must have it. Surprisingly, with Axiom 1 and the counting principles we will develop, we already have all the tools needed to construct a probability function.

Definition 4 A probability space (S, E, P) is said to be discrete if its sample space S is countable, i.e., S = {s1, s2, ...}.

Definition 5 Let (S, E, P) be a discrete probability space. The event {sj} consisting of a single element sj ∈ S is called an elementary event. The probability of an elementary event is called an elementary probability.

Consider an event E ∈ E of a discrete probability space. Observe that E can be viewed as the union of |E| elementary events. Using Axiom 1, we can calculate P(E) by summing the elementary probabilities of the contributing elementary events. Thus, if we know the elementary probabilities, we can calculate the probability of any event.

So how does one figure out the elementary probabilities? If you are given a coin, how do you know whether the coin is fair, i.e., whether the odds of producing a head or a tail are the same? We would toss the coin a large number of times, count the heads and the tails, and track the ratio of the head count to the total number of tosses as we toss.
If the coin is fair, one would expect that as the experiment goes on, this ratio gets closer and closer to 0.5. The point is that if the experiment is governed by some probability function, then when the experiment is repeated a large number of times, one should be able to extract the elementary probabilities.

There is a special situation, however, in which one does not need to resort to repeated experiments to obtain the elementary probabilities.

Definition 6 A sample space S is said to consist of equally likely outcomes if all the outcomes are equally likely to occur.

Consider a discrete probability space (S, E, P). Let S = {s1, s2, ..., sn} be a sample space consisting of equally likely outcomes. (Note that a sample space with equally likely outcomes must be finite. Why?) Since the outcomes are equally likely, we know that P({s1}) = P({s2}) = ... = P({sn}). On the other hand, from Axiom 1 we know that P(S) = 1. Thus P(S) = P({s1}) + P({s2}) + ... + P({sn}) = 1, and therefore P({s1}) = P({s2}) = ... = P({sn}) = 1/n. Hence if the outcomes are equally likely, we can immediately determine each elementary probability.

Now what is the probability of an event E from a sample space with equally likely outcomes? Observe that P(E) is simply the cardinality |E| of E multiplied by the elementary probability 1/n, i.e., P(E) = |E|/|S|. In other words, all we need to do is count how many elements are in E, which is a counting problem.

Definition 7 Combinatorial mathematics is the branch of mathematics that studies finite collections of objects satisfying specified criteria. By "studying" we mean counting. By "specified criteria" we are referring to permutation, combination, etc.

3 Basic Counting Principles

In this section, we introduce some basic counting principles, which will be used as building blocks for more difficult counting problems.
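For a finite sample space of equally likely outcomes, the rule P(E) = |E|/|S| is a one-line computation. The following sketch (the helper name `event_probability` is mine, not from the notes) uses exact rational arithmetic for the die example:

```python
from fractions import Fraction

def event_probability(sample_space, event):
    """P(E) = |E| / |S| for a finite sample space of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
evens = {s for s in die if s % 2 == 0}
print(event_probability(die, evens))  # 1/2
```

Using `Fraction` rather than floating point keeps the probabilities exact, which matches the hand calculations in the notes.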
Definition 8 (The Product Rule) Suppose two tasks T1 and T2 are to be performed in sequence as part of some procedure. If T1 can be performed in n1 ways and T2 can be performed in n2 ways, then the sequence of the two tasks can be performed in n1 · n2 ways.

Example 2 Let A, B be two finite sets. How many elements are in A × B?

Solution: The elements of A × B are ordered pairs whose first element is from A and whose second element is from B. Constructing an ordered pair of A × B can be viewed as accomplishing two tasks T1 and T2 in sequence, where T1 fills in the first element of the pair from A and T2 fills in the second element from B. The number of ways to accomplish T1 is the number of elements in A, i.e., |A|, because any element of A can be chosen. Similarly, the number of ways to accomplish T2 is |B|. Thus the number of elements in A × B is the number of ways to accomplish T1 and T2 in sequence, which is |A| · |B|. □

The product rule generalizes to k tasks T1, T2, ..., Tk performed in sequence, where task Tj can be performed in nj ways. In this situation, the total number of ways to perform the k tasks in sequence is n1 · n2 · ... · nk.

Example 3 How many different SSNs are there?

Solution: An SSN has 9 digits. Constructing an SSN can be viewed as accomplishing 9 tasks T1, ..., T9 in sequence, where task Tj fills in the j-th digit. Observe that each task can be performed in 10 ways, because each digit has 10 choices from {0, 1, ..., 9}. Assuming there are no other restrictions on SSNs, the total number of possible SSNs is the number of ways to accomplish the 9 tasks in sequence, which by the product rule is 10^9. □

Definition 9 (The Sum Rule) Suppose a task can be done either in one of n1 ways or in one of n2 ways, where none of the n1 ways is the same as any of the n2 ways. Then there are n1 + n2 ways to do the task.
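The product rule is easy to sanity-check by brute-force enumeration. The sketch below counts all 3-digit strings over {0, ..., 9} (a scaled-down version of the SSN example, so the enumeration stays small) and compares against 10^3:

```python
from itertools import product

# Product rule: filling 3 positions, each with 10 choices, gives 10**3 sequences.
digits = range(10)
count = sum(1 for _ in product(digits, repeat=3))
print(count)  # 1000
```

The same enumeration with `repeat=9` would reproduce the 10^9 SSN count, but is too large to run comfortably.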
Example 4 Suppose you have decided to get a new laptop computer and have narrowed your choices to either an Apple or a Dell. Apple offers 2 choices of laptop: MacBook Pro and MacBook Air, while Dell offers 3 choices: Inspiron, Latitude, and Precision. Thus the total number of choices you have is the two choices of an Apple plus the three choices of a Dell, which is 5.

The sum rule comes from counting the number of elements in the union of two sets. Let A, B be two sets. If A ∩ B = ∅, then |A ∪ B| = |A| + |B|. For the more general situation, when A ∩ B ≠ ∅, we have the following inclusion-exclusion principle.

Definition 10 (Inclusion-Exclusion Principle) Let A and B be two finite sets. Then |A ∪ B| = |A| + |B| − |A ∩ B|.

Example 5 How many bit strings of length eight either start with a 1 or end with 00?

Solution: Let A be the set of 8-bit strings that start with a 1, and B the set of 8-bit strings that end with 00. The problem is asking for |A ∪ B|. Using the inclusion-exclusion principle, we have |A ∪ B| = |A| + |B| − |A ∩ B|. Using the product rule, we know |A| = 2^7, |B| = 2^6, and |A ∩ B| = 2^5, thus |A ∪ B| = 2^7 + 2^6 − 2^5. □

Definition 11 (Pigeon Hole Principle) When n pigeons are assigned to n − 1 pigeon holes, there exists at least one pigeon hole that contains more than 1 pigeon.

Example 6 In a group of 13 people, there are at least two people born in the same month.

Theorem 1 When n pigeons are assigned to m pigeon holes, n > m, then one of the pigeon holes must contain at least ⌈n/m⌉ = ⌊(n − 1)/m⌋ + 1 pigeons.

Example 7 Thirty students enrolled in discrete math. How many students are guaranteed to be born in the same month?

Solution: View the 30 students as 30 pigeons, and the 12 months as 12 pigeon holes. Applying the general pigeon hole principle, the number of students who are guaranteed to be born in the same month is ⌈30/12⌉ = 3. □
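Example 5 can be verified by enumerating all 2^8 bit strings. The sketch below mirrors the solution's sets A and B and confirms the inclusion-exclusion count:

```python
from itertools import product

# All 256 bit strings of length eight.
strings = ["".join(bits) for bits in product("01", repeat=8)]
A = {s for s in strings if s.startswith("1")}   # start with a 1: 2**7 strings
B = {s for s in strings if s.endswith("00")}    # end with 00:    2**6 strings

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|.
print(len(A | B))  # 2**7 + 2**6 - 2**5 = 160
```
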
Example 8 Show that if 5 numbers are chosen from 1 to 8, then two of them will add to 9.

Solution: We group the numbers 1 to 8 into 4 groups: {1, 8}, {2, 7}, {3, 6}, and {4, 5}. Observe that each group sums to 9. Now if we pick 5 numbers from 1 to 8, then with only 4 groups, two of the numbers must come from the same group, and these two add to 9. □

4 Permutation and Combination

Definition 12 (Permutation) A permutation of a finite set is an ordered arrangement of the elements of the set in which each element appears once and only once.

Definition 13 (r-Permutation) An r-permutation of a finite set of n elements is an ordered arrangement of r distinct elements from the set. The number of r-permutations is denoted by P(n, r) or nPr.

So what is nPr?

Observation 1 An r-permutation is an ordered arrangement of r distinct elements. Thus constructing an r-permutation can be viewed as accomplishing r tasks T1, ..., Tr in sequence, where task Tj fills in the j-th element. Task T1 can be performed in n ways, because any element can be chosen to fill the first position of the r-permutation. Task T2 can be performed in n − 1 ways. This is because the elements of an r-permutation must be distinct, and one element has already been used to fill the first position by task T1, so we can only choose from the remaining n − 1 elements to fill the second position. Similarly, task T3 can be performed in n − 2 ways, ..., and task Tr can be performed in n − r + 1 ways. The number of r-permutations can then be calculated using the product rule, and we obtain nPr = n · (n − 1) · ... · (n − r + 1) = n!/(n − r)!. The number of permutations of a set of n elements is simply nPn = n!.

Definition 14 (r-Combination) An r-combination of a finite set of n elements is a subset of r elements. The number of r-combinations is denoted by C(n, r), nCr, or (n choose r).

So what is nCr?

Observation 2 We shall use the product rule to calculate nCr.
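The formula nPr = n!/(n − r)! can be checked against direct enumeration for small cases. The helper `nPr` below is an illustrative implementation, not part of the notes:

```python
from itertools import permutations
from math import factorial

def nPr(n, r):
    # nPr = n! / (n - r)!
    return factorial(n) // factorial(n - r)

# Brute-force check: itertools.permutations generates exactly the r-permutations.
assert nPr(5, 3) == sum(1 for _ in permutations(range(5), 3))
print(nPr(5, 3))  # 5 * 4 * 3 = 60
```
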
Observe that to obtain an r-permutation one could first pick a subset of r elements, and then generate a permutation of the r elements chosen. Thus constructing an r-permutation can be viewed as accomplishing two tasks T1 and T2 in sequence, where T1 picks a subset of r elements, and T2 constructs a permutation of the elements chosen by T1. The number of ways to perform T1 is nCr. The number of ways to perform T2 is rPr. Thus the number of r-permutations is the product nCr · rPr, i.e., nPr = nCr · rPr. This immediately implies that nCr = nPr/rPr = n!/((n − r)! · r!).

Example 9 Pascal's Identity: C(n + 1, k) = C(n, k − 1) + C(n, k).

Example 10 A binomial is a polynomial with two terms. For example, x + y and x^2 + y^2 are binomials. The algebraic expansion of a power of the binomial x + y is called a binomial expansion. For example, the binomial expansion of (x + y)^3 is x^3 + 3x^2 y + 3xy^2 + y^3. Observe that each term of the binomial expansion of (x + y)^n is of the form c x^k y^(n−k), where c is called the binomial coefficient and the exponent k is an integer with 0 ≤ k ≤ n. The binomial coefficient of x^k y^(n−k) is C(n, k).

5 Conditional Probability

Definition 15 The joint probability of two events E and F is the probability that both events occur, and equals P(E ∩ F).

Definition 16 Two events E and F are said to be independent if P(E ∩ F) = P(E) · P(F).

Definition 17 Let E and F be two events. The probability that E happens given that F happens is called the conditional probability, denoted by P(E|F), and P(E|F) = P(E ∩ F)/P(F).

Theorem 2 (Bayes' Theorem) P(E|F) = P(F|E) · P(E) / (P(F|E) · P(E) + P(F|Ē) · P(Ē)).

Proof: Observe that from P(E|F) = P(E ∩ F)/P(F), we have P(E ∩ F) = P(E|F) · P(F). Thus

P(F|E) · P(E) / (P(F|E) · P(E) + P(F|Ē) · P(Ē)) = P(F ∩ E) / (P(F ∩ E) + P(F ∩ Ē)) = P(E ∩ F)/P(F). □

6 Random Variables

Consider the probability space derived from the experiment of tossing a fair coin.
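Bayes' theorem is often exercised with a diagnostic-test example. The numbers below (1% prevalence, 95% true-positive rate, 5% false-positive rate) are hypothetical, chosen only to illustrate the formula, and the function name `bayes` is mine:

```python
from fractions import Fraction

def bayes(p_e, p_f_given_e, p_f_given_not_e):
    """P(E|F) via Bayes' theorem; the denominator is the law of total probability."""
    num = p_f_given_e * p_e
    return num / (num + p_f_given_not_e * (1 - p_e))

# Hypothetical numbers: E = "has condition", F = "tests positive".
posterior = bayes(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
print(posterior)  # 19/118, about 0.161
```

Even with a fairly accurate test, the posterior probability is only about 16% here, because the condition is rare; this is the classic base-rate effect that Bayes' theorem makes precise.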
The sample space is S = {Head, Tail}, the event set is E = {∅, {Head}, {Tail}, S}, and the elementary probabilities are P({Head}) = 0.5 and P({Tail}) = 0.5. It is cumbersome to use Head and Tail to represent the outcomes. What if we use a function that maps Head and Tail to numbers? For example, let X be a function from the sample space S to the real numbers, defined by X(Head) = 1 and X(Tail) = 0. Then we no longer need to use Head and Tail; we can use their "surrogates" 1 and 0 to represent them. This is the concept of a random variable (RV), which allows us to deal with numbers instead of a concrete probabilistic experiment. Thus a random variable is not really a variable, but rather a function whose domain is the sample space and whose range is a set of real numbers.

Let S be a sample space and X(·) a random variable that maps S to the real numbers R. Let the image of X, denoted Image(X), be defined as Image(X) = {x | ∃s ∈ S, X(s) = x}. We say Image(X) is the set of possible values of the random variable X. A random variable X is discrete if Image(X) is countable. A random variable X is nonnegative if all elements of Image(X) are nonnegative.

For a discrete probability space, a random variable X maps the sample space to a set of discrete numbers Image(X). What about the elementary probabilities of the individual numbers in Image(X)? The elementary probability of a number in Image(X) is the same as the probability of its corresponding elementary event in S. These elementary probabilities define a function from Image(X) to [0, 1], which is called the probability mass function and is denoted by PX.

Definition 18 Let X be a random variable with values {X1, X2, ..., Xn} and probability mass function PX. Let Y be a random variable with values {Y1, Y2, ..., Ym} and probability mass function PY.
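The construction of a probability mass function from elementary probabilities can be sketched directly: map each outcome through X and accumulate the probability of its preimage. The dictionary names below are illustrative:

```python
from fractions import Fraction

# Sample space, random variable (a function on outcomes), elementary probabilities.
S = ["Head", "Tail"]
X = {"Head": 1, "Tail": 0}
P = {"Head": Fraction(1, 2), "Tail": Fraction(1, 2)}

# The pmf on Image(X) inherits the probability of each value's preimage in S.
pmf = {}
for s in S:
    pmf[X[s]] = pmf.get(X[s], Fraction(0)) + P[s]
print(pmf)  # {1: Fraction(1, 2), 0: Fraction(1, 2)}
```

The accumulation (rather than simple assignment) matters when X is not injective, i.e., when several outcomes map to the same number.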
The joint probability distribution of X and Y, denoted (X, Y), is the probability distribution that gives the probability P(X = Xj, Y = Yk) for each pair of values.

Theorem 3 Let X be a random variable with values {X1, X2, ..., Xn} and probability mass function PX. Let Y be a random variable with values {Y1, Y2, ..., Ym} and probability mass function PY. Let (X, Y) be the joint random variable with probability mass function PX,Y. Then Σ_{j=1}^{n} P(X = Xj, Y = Yk) = PY(Y = Yk) and Σ_{k=1}^{m} P(X = Xj, Y = Yk) = PX(X = Xj).

7 Sum of Random Variables

Consider tossing two 6-sided fair dice. Let X be the random variable representing the outcome of the first die, and Y the random variable representing the outcome of the second die. The sum of the values of the two dice is also a random variable. If we use Z to denote the sum, then Z = X + Y. What is the probability distribution of Z? Observe that the possible values of Z are {X + Y | X ∈ {1, 2, 3, 4, 5, 6}, Y ∈ {1, 2, 3, 4, 5, 6}} = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

What is the probability mass function PZ of Z? To obtain PZ, we must know the probability of each individual value of Z. We shall use the calculation of the probability of Z = 9 to illustrate the construction of PZ. Observe that in order for Z to be 9, one of the following situations must occur: (1) X = 3 and Y = 6, (2) X = 4 and Y = 5, (3) X = 5 and Y = 4, or (4) X = 6 and Y = 3. Since both dice are fair, the probability of each of these 4 situations is 1/36. Thus the probability P(Z = 9) = 4/36.

More generally, let X and Y be two random variables, where X assumes the values {X1, X2, ..., Xn} with probability mass function PX and Y assumes the values {Y1, Y2, ..., Ym} with probability mass function PY. Let Z be the random variable Z = X + Y. Then P(Z = Zi) = Σ_{Xj+Yk=Zi} P(X = Xj, Y = Yk).
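The formula P(Z = Zi) = Σ_{Xj+Yk=Zi} P(X = Xj, Y = Yk) is a convolution of the two mass functions. A sketch for the two-dice example, assuming independence so that the joint probability factors as PX · PY:

```python
from fractions import Fraction
from collections import defaultdict

die = {k: Fraction(1, 6) for k in range(1, 7)}

# P(Z = z) = sum over all pairs with x + y = z of P(X = x) * P(Y = y).
pz = defaultdict(Fraction)
for x, px in die.items():
    for y, py in die.items():
        pz[x + y] += px * py

print(pz[9])  # 4/36 = 1/9
```
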
8 Expectation and Variance

Definition 19 (Expectation) Let X be a discrete random variable with values {X1, X2, ..., Xn} and probability mass function P. The expectation of X is defined as E[X] = Σ_{k=1}^{n} P(Xk) · Xk.

Theorem 4 Let X and Y be discrete random variables. Then E[X + Y] = E[X] + E[Y].

Proof: Let X be a random variable with values {X1, X2, ..., Xn} and probability mass function PX. Let Y be a random variable with values {Y1, Y2, ..., Ym} and probability mass function PY. Let (X, Y) be the joint random variable with probability mass function PX,Y. Let Z = X + Y with values {Z1, Z2, ..., Zl} and probability mass function PZ. Then

E[X + Y] = E[Z] = Σ_{i=1}^{l} Zi · P(Z = Zi)
 = Σ_{i=1}^{l} Zi · Σ_{Xj+Yk=Zi} PX,Y(X = Xj, Y = Yk)
 = Σ_{j=1}^{n} Σ_{k=1}^{m} PX,Y(X = Xj, Y = Yk) · (Xj + Yk)
 = Σ_{j=1}^{n} Σ_{k=1}^{m} PX,Y(X = Xj, Y = Yk) · Xj + Σ_{j=1}^{n} Σ_{k=1}^{m} PX,Y(X = Xj, Y = Yk) · Yk
 = Σ_{j=1}^{n} Xj · Σ_{k=1}^{m} PX,Y(X = Xj, Y = Yk) + Σ_{k=1}^{m} Yk · Σ_{j=1}^{n} PX,Y(X = Xj, Y = Yk)
 = Σ_{j=1}^{n} Xj · PX(X = Xj) + Σ_{k=1}^{m} Yk · PY(Y = Yk)
 = E[X] + E[Y]. □

Theorem 5 (Markov's Inequality) Let X be a discrete nonnegative random variable, and δ > 0. Then P(X ≥ δ) ≤ E[X]/δ.

Proof: Let the values of X be 0 ≤ X1 < X2 < ... < Xj−1 < δ ≤ Xj < ... < Xn. Then

E[X] = P(X1) · X1 + P(X2) · X2 + ... + P(Xn) · Xn
 ≥ P(Xj) · Xj + P(Xj+1) · Xj+1 + ... + P(Xn) · Xn
 ≥ P(Xj) · δ + P(Xj+1) · δ + ... + P(Xn) · δ
 = (P(Xj) + P(Xj+1) + ... + P(Xn)) · δ
 = P(X ≥ δ) · δ.

Thus P(X ≥ δ) ≤ E[X]/δ. □

Definition 20 (Variance) Let X be a discrete random variable with values X1, X2, ..., Xn and probability mass function P. The variance of X is defined as Var(X) = Σ_{k=1}^{n} P(Xk) · (Xk − E[X])². The standard deviation of X is defined as σ(X) = √Var(X).

The expectation measures the average of a random variable, while the standard deviation measures its spread.

Theorem 6 Var(X) = E[X²] − (E[X])².
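Both Definition 19 and Markov's inequality can be checked on the fair-die distribution. The `expectation` helper below is illustrative, not from the notes:

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum of P(x) * x over the values of X."""
    return sum(p * x for x, p in pmf.items())

die = {k: Fraction(1, 6) for k in range(1, 7)}
ex = expectation(die)  # 7/2

# Markov: for nonnegative X and delta = 5, P(X >= 5) <= E[X] / 5.
p_ge_5 = sum(p for x, p in die.items() if x >= 5)
print(ex, p_ge_5)  # 7/2 1/3  (and indeed 1/3 <= 7/10)
```

Note that Markov's bound (7/10) is far from tight here (the true probability is 1/3); it is valuable because it holds for every nonnegative random variable.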
Proof: Var(X) = E[(X − E[X])²] = E[X² − 2X·E[X] + (E[X])²] = E[X²] − E[2X·E[X]] + (E[X])² = E[X²] − 2(E[X])² + (E[X])² = E[X²] − (E[X])². □

Theorem 7 (Chebyshev's Inequality) P(|X − E[X]| ≥ δ) ≤ Var(X)/δ².

Proof: Consider the random variable Z = (X − E[X])². Observe that E[Z] = Var(X). Thus P(|X − E[X]| ≥ δ) = P(Z ≥ δ²) ≤ E[Z]/δ² = Var(X)/δ², where the inequality follows from Markov's inequality. □

Theorem 8 Let X1, X2, ..., Xn be a set of independent random variables. Then Var(X1 + X2 + ... + Xn) = Var(X1) + Var(X2) + ... + Var(Xn).

9 Some Important Probability Distributions

Definition 21 A Bernoulli trial is a probabilistic experiment with only two outcomes. Since there are only two outcomes, we usually refer to one of them as "success" and the other as "failure". Usually we use a binary random variable X (a random variable whose value is either 1 for success or 0 for failure) to represent a Bernoulli trial, where P(X = 1) = p and P(X = 0) = 1 − p.

Definition 22 (Binomial Distribution) Consider a Bernoulli trial with probability of success p and probability of failure 1 − p. Let X be the random variable representing the number of successes in a sequence of n independent Bernoulli trials. Then P(X = k) = C(n, k) p^k (1 − p)^(n−k). The probability mass function of X is called a binomial distribution, denoted B(n, p).

Theorem 9 E[B(n, p)] = np and Var(B(n, p)) = np(1 − p).

Proof: Let Xk be the random variable associated with the k-th Bernoulli trial, with P(Xk = 1) = p and P(Xk = 0) = 1 − p. Then E[Xk] = p · 1 + (1 − p) · 0 = p, and Var(Xk) = p · (1 − p)² + (1 − p) · (0 − p)² = p(1 − p). Now observe that X = X1 + X2 + ... + Xn. Using linearity of expectation, we have E[X] = E[X1] + E[X2] + ... + E[Xn] = np. From Theorem 8, we have Var(X) = Var(X1) + Var(X2) + ... + Var(Xn) = np(1 − p). □
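Theorem 9 can be confirmed numerically by computing E[X] and Var(X) directly from the binomial mass function. The helper `binomial_pmf` below is illustrative, with hypothetical parameters n = 10 and p = 3/10:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k) for k = 0, ..., n."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

n, p = 10, Fraction(3, 10)
pmf = binomial_pmf(n, p)
mean = sum(pk * k for k, pk in pmf.items())
var = sum(pk * (k - mean) ** 2 for k, pk in pmf.items())
print(mean, var)  # 3 21/10, matching np and np(1 - p)
```
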
Definition 23 (Geometric Distribution) Consider a Bernoulli trial with probability of success p and probability of failure 1 − p. Let X be the random variable representing the number of trials until the first success occurs. Then P(X = k) = (1 − p)^(k−1) p. The probability distribution of X is called a geometric distribution.

Theorem 10 Let X be a random variable subject to a geometric distribution. Then E[X] = 1/p.

Proof: E[X] = (1 − p)^0 p · 1 + (1 − p)^1 p · 2 + (1 − p)^2 p · 3 + ... Let Sn = Σ_{j=1}^{n} (1 − p)^(j−1) p · j; then E[X] = lim_{n→∞} Sn. Observe that

Sn = (1 − p)^0 p · 1 + (1 − p)^1 p · 2 + ... + (1 − p)^(n−1) p · n
(1 − p)Sn = (1 − p)^1 p · 1 + (1 − p)^2 p · 2 + ... + (1 − p)^n p · n
Sn − (1 − p)Sn = (1 − p)^0 p + (1 − p)^1 p + ... + (1 − p)^(n−1) p − (1 − p)^n p · n
pSn = ((1 − p)^0 + (1 − p)^1 + ... + (1 − p)^(n−1)) · p − (1 − p)^n p · n
pSn = ((1 − p)^n − 1)/((1 − p) − 1) · p − (1 − p)^n p · n
pSn = 1 − (1 − p)^n − (1 − p)^n p · n
Sn = 1/p − (1 − p)^n · (1/p + n)

Thus E[X] = lim_{n→∞} [1/p − (1 − p)^n · (1/p + n)] = 1/p, since (1 − p)^n · (1/p + n) → 0 for 0 < p ≤ 1. □

A similar telescoping computation applied to Tn = (1 − p)^0 p · 1² + (1 − p)^1 p · 2² + ... + (1 − p)^(n−1) p · n², whose limit is E[X²], yields E[X²] = (2 − p)/p², and hence Var(X) = E[X²] − (E[X])² = (1 − p)/p².

Definition 24 (Poisson Distribution) A discrete random variable X is said to be subject to a Poisson distribution if P(X = k) = λ^k e^(−λ)/k! for a fixed λ > 0 and k = 0, 1, ...

Observe that Σ_{k=0}^{∞} λ^k e^(−λ)/k! = e^(−λ) Σ_{k=0}^{∞} λ^k/k! = e^(−λ) · e^λ = 1. Thus the Poisson distribution is well defined. The Poisson distribution describes the probability of a given number of events occurring in a fixed interval of time if these events occur with a known average rate, independently of the time since the last event.

Theorem 11 Let X be a Poisson random variable. Then E[X] = λ.

Proof: E[X] = Σ_{k=0}^{∞} (λ^k e^(−λ)/k!) · k = Σ_{k=1}^{∞} λ^k e^(−λ)/(k − 1)! = λ Σ_{k=1}^{∞} λ^(k−1) e^(−λ)/(k − 1)! = λ Σ_{j=0}^{∞} λ^j e^(−λ)/j! = λ. □