Combinatorial Mathematics and Discrete Probability

Shuang (Sean) Luan
Department of Computer Science
University of New Mexico
Albuquerque, NM 87131
E-mail: [email protected]

1 Discrete Probability

Definition 1 (Probabilistic Experiment) There are experiments that do not yield the same result when performed repeatedly. Such experiments are called probabilistic experiments. In contrast, a deterministic experiment always produces the same outcome.

Definition 2 The sample space S of a probabilistic experiment is the set of all of its outcomes.

Example 1 The sample space for tossing a 6-sided die is {1, 2, 3, 4, 5, 6}.

Consider the experiment of tossing a 6-sided die. Suppose the outcome is 2. Does the event "the outcome is even" happen? The answer is yes. Now suppose the toss produces the outcome 4. Does the event happen? Again yes. In fact, whenever the outcome is from the set {2, 4, 6}, the event happens. Thus an event in a probabilistic experiment is a subset of the outcomes.

Definition 3 A probability space for a probabilistic experiment is a triple (S, E, P):
• S is the sample space, which includes all possible outcomes of the experiment.
• E is a collection of events, i.e., E ⊆ 2^S.
• P : E → [0, 1] is the probability function, which assigns to each event a probability between 0 and 1, where probability 1 means the event always happens and probability 0 means the event never happens.

There are some special events of interest, which we define here. The sample space S itself is an event. Since every outcome belongs to the sample space, no matter which outcome occurs, the event S happens. Thus the probability of S is 1, and we call S a certain event. The empty set ∅ is also an event. Since ∅ contains no outcomes, it never happens. Thus the probability of ∅ is 0, and we call ∅ an impossible event.
Two events E and F are said to be mutually exclusive or disjoint if E ∩ F = ∅. A set of events E1, E2, ..., En is said to be mutually exclusive if every pair of them is mutually exclusive.

Axiom 1 (Axioms of Probability) Let (S, E, P) be a probability space. Then:
• 0 ≤ P(E) ≤ 1 for any event E ∈ E.
• P(S) = 1 and P(∅) = 0.
• P(E ∪ F) = P(E) + P(F) whenever E ∩ F = ∅.
• E is closed under intersection, union, and complement. In other words, for E, F ∈ E, we have E ∩ F, E ∪ F, Ē ∈ E.

2 Probability and Counting

The definition of a probability space does not tell us how to calculate the probability function, yet in order to use the probability space we must have it. Surprisingly, with Axiom 1 and the counting principles we will develop, we already have all the tools needed to construct a probability function.

Definition 4 A probability space (S, E, P) is said to be discrete if its sample space S is countable, i.e., S = {s1, s2, ...}.

Definition 5 Let (S, E, P) be a discrete probability space. The event {sj} consisting of a single element sj ∈ S is called an elementary event. The probability of an elementary event is called an elementary probability.

Consider an event E ∈ E of a discrete probability space. Observe that E can be viewed as the union of |E| elementary events. Using Axiom 1, we can calculate P(E) by summing the elementary probabilities of the contributing elementary events. Thus, if we know the elementary probabilities, we can calculate the probability of any event.

So how does one figure out the elementary probabilities? If you are given a coin, how do you know whether the coin is fair, i.e., whether the odds of producing a head or a tail are the same? We would toss the coin a large number of times, count the heads and the tails, and track the ratio of the head count to the total number of tosses as we toss.
If the coin is fair, one would expect that as the experiment goes on, this ratio gets closer and closer to 0.5. The point is that if the experiment is governed by some probability function, then when the experiment is repeated a large number of times, one should be able to extract the elementary probabilities.

There is a special situation, however, in which one does not need to resort to repeated experiments to obtain the elementary probabilities.

Definition 6 A sample space S is said to consist of equally likely outcomes if all the outcomes are equally likely to occur.

Consider a discrete probability space (S, E, P). Let S = {s1, s2, ..., sn} be a sample space consisting of equally likely outcomes. (Note that a sample space with equally likely outcomes must be finite. Why?) Since the outcomes are equally likely, we know that P({s1}) = P({s2}) = ... = P({sn}). On the other hand, from Axiom 1 we know that P(S) = 1. Thus P(S) = P({s1}) + P({s2}) + ... + P({sn}) = 1, and therefore P({s1}) = P({s2}) = ... = P({sn}) = 1/n. Hence if the outcomes are equally likely, we can immediately determine each elementary probability.

Now what is the probability of an event E from a sample space with equally likely outcomes? Observe that P(E) is simply the cardinality |E| of E multiplied by the elementary probability 1/n, i.e., P(E) = |E|/|S|. In other words, all we need to do is count how many elements are in E, which is a counting problem.

Definition 7 Combinatorial mathematics is the branch of mathematics that studies finite collections of objects satisfying specified criteria. By "studying" we mean counting. By "specified criteria" we are referring to permutation, combination, etc.

3 Basic Counting Principles

In this section, we introduce some basic counting principles, which will be used as building blocks for more difficult counting problems.
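For a finite sample space of equally likely outcomes, the rule P(E) = |E|/|S| is a one-line computation. The following sketch (the helper name `event_probability` is mine, not from the notes) uses exact rational arithmetic for the die example:

```python
from fractions import Fraction

def event_probability(sample_space, event):
    """P(E) = |E| / |S| for a finite sample space of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
evens = {s for s in die if s % 2 == 0}
print(event_probability(die, evens))  # 1/2
```

Using `Fraction` rather than floating point keeps the probabilities exact, which matches the hand calculations in the notes.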
Definition 8 (The Product Rule) Suppose two tasks T1 and T2 are to be performed in sequence as part of some procedure. If T1 can be performed in n1 ways and T2 can be performed in n2 ways, then the sequence of the two tasks can be performed in n1 · n2 ways.

Example 2 Let A, B be two finite sets. How many elements are in A × B?

Solution: The elements of A × B are ordered pairs whose first element is from A and whose second element is from B. Constructing an ordered pair of A × B can be viewed as accomplishing two tasks T1 and T2 in sequence, where T1 fills in the first element of the pair from A and T2 fills in the second element from B. The number of ways to accomplish T1 is the number of elements in A, i.e., |A|, because any element of A can be chosen. Similarly, the number of ways to accomplish T2 is |B|. Thus the number of elements in A × B is the number of ways to accomplish T1 and T2 in sequence, which is |A| · |B|. □

The product rule generalizes to k tasks T1, T2, ..., Tk performed in sequence, where task Tj can be performed in nj ways. In this situation, the total number of ways to perform the k tasks in sequence is n1 · n2 · ... · nk.

Example 3 How many different SSNs are there?

Solution: An SSN has 9 digits. Constructing an SSN can be viewed as accomplishing 9 tasks T1, ..., T9 in sequence, where task Tj fills in the j-th digit. Observe that each task can be performed in 10 ways, because each digit has 10 choices from {0, 1, ..., 9}. Assuming there are no other restrictions on SSNs, the total number of possible SSNs is the number of ways to accomplish the 9 tasks in sequence, which by the product rule is 10^9. □

Definition 9 (The Sum Rule) Suppose a task can be done either in one of n1 ways or in one of n2 ways, where none of the n1 ways is the same as any of the n2 ways. Then there are n1 + n2 ways to do the task.
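The product rule is easy to sanity-check by brute-force enumeration. The sketch below counts all 3-digit strings over {0, ..., 9} (a scaled-down version of the SSN example, so the enumeration stays small) and compares against 10^3:

```python
from itertools import product

# Product rule: filling 3 positions, each with 10 choices, gives 10**3 sequences.
digits = range(10)
count = sum(1 for _ in product(digits, repeat=3))
print(count)  # 1000
```

The same enumeration with `repeat=9` would reproduce the 10^9 SSN count, but is too large to run comfortably.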
Example 4 Suppose you have decided to get a new laptop computer and have narrowed your choices to either an Apple or a Dell. Apple offers 2 choices of laptop: MacBook Pro and MacBook Air, while Dell offers 3 choices: Inspiron, Latitude, and Precision. Thus the total number of choices you have is the two choices of an Apple plus the three choices of a Dell, which is 5.

The sum rule comes from counting the number of elements in the union of two sets. Let A, B be two sets. If A ∩ B = ∅, then |A ∪ B| = |A| + |B|. For the more general situation, when A ∩ B ≠ ∅, we have the following inclusion-exclusion principle.

Definition 10 (Inclusion-Exclusion Principle) Let A and B be two finite sets. Then |A ∪ B| = |A| + |B| − |A ∩ B|.

Example 5 How many bit strings of length eight either start with a 1 or end with 00?

Solution: Let A be the set of 8-bit strings that start with a 1, and B the set of 8-bit strings that end with 00. The problem is asking for |A ∪ B|. Using the inclusion-exclusion principle, we have |A ∪ B| = |A| + |B| − |A ∩ B|. Using the product rule, we know |A| = 2^7, |B| = 2^6, and |A ∩ B| = 2^5, thus |A ∪ B| = 2^7 + 2^6 − 2^5. □

Definition 11 (Pigeon Hole Principle) When n pigeons are assigned to n − 1 pigeon holes, there exists at least one pigeon hole that contains more than 1 pigeon.

Example 6 In a group of 13 people, there are at least two people born in the same month.

Theorem 1 When n pigeons are assigned to m pigeon holes, n > m, then one of the pigeon holes must contain at least ⌈n/m⌉ = ⌊(n − 1)/m⌋ + 1 pigeons.

Example 7 Thirty students enrolled in discrete math. How many students are guaranteed to be born in the same month?

Solution: View the 30 students as 30 pigeons, and the 12 months as 12 pigeon holes. Applying the general pigeon hole principle, the number of students who are guaranteed to be born in the same month is ⌈30/12⌉ = 3. □
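Example 5 can be verified by enumerating all 2^8 bit strings. The sketch below mirrors the solution's sets A and B and confirms the inclusion-exclusion count:

```python
from itertools import product

# All 256 bit strings of length eight.
strings = ["".join(bits) for bits in product("01", repeat=8)]
A = {s for s in strings if s.startswith("1")}   # start with a 1: 2**7 strings
B = {s for s in strings if s.endswith("00")}    # end with 00:    2**6 strings

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|.
print(len(A | B))  # 2**7 + 2**6 - 2**5 = 160
```
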
Example 8 Show that if 5 numbers are chosen from 1 to 8, then two of them will add to 9.

Solution: We group the numbers 1 to 8 into 4 groups: {1, 8}, {2, 7}, {3, 6}, and {4, 5}. Observe that each group sums to 9. Now if we pick 5 numbers from 1 to 8, then with only 4 groups, two of the numbers must come from the same group, and these two add to 9. □

4 Permutation and Combination

Definition 12 (Permutation) A permutation of a finite set is an ordered arrangement of the elements of the set in which each element appears once and only once.

Definition 13 (r-Permutation) An r-permutation of a finite set of n elements is an ordered arrangement of r distinct elements from the set. The number of r-permutations is denoted by P(n, r) or nPr.

So what is nPr?

Observation 1 An r-permutation is an ordered arrangement of r distinct elements. Thus constructing an r-permutation can be viewed as accomplishing r tasks T1, ..., Tr in sequence, where task Tj fills in the j-th element. Task T1 can be performed in n ways, because any element can be chosen to fill the first position of the r-permutation. Task T2 can be performed in n − 1 ways. This is because the elements of an r-permutation must be distinct, and one element has already been used to fill the first position by task T1, so we can only choose from the remaining n − 1 elements to fill the second position. Similarly, task T3 can be performed in n − 2 ways, ..., and task Tr can be performed in n − r + 1 ways. The number of r-permutations can then be calculated using the product rule, and we obtain nPr = n · (n − 1) · ... · (n − r + 1) = n!/(n − r)!. The number of permutations of a set of n elements is simply nPn = n!.

Definition 14 (r-Combination) An r-combination of a finite set of n elements is a subset of r elements. The number of r-combinations is denoted by C(n, r), nCr, or (n choose r).

So what is nCr?

Observation 2 We shall use the product rule to calculate nCr.
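The formula nPr = n!/(n − r)! can be checked against direct enumeration for small cases. The helper `nPr` below is an illustrative implementation, not part of the notes:

```python
from itertools import permutations
from math import factorial

def nPr(n, r):
    # nPr = n! / (n - r)!
    return factorial(n) // factorial(n - r)

# Brute-force check: itertools.permutations generates exactly the r-permutations.
assert nPr(5, 3) == sum(1 for _ in permutations(range(5), 3))
print(nPr(5, 3))  # 5 * 4 * 3 = 60
```
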
Observe that to obtain an r-permutation one could first pick a subset of r elements, and then generate a permutation of the r elements chosen. Thus constructing an r-permutation can be viewed as accomplishing two tasks T1 and T2 in sequence, where T1 picks a subset of r elements, and T2 constructs a permutation of the elements chosen by T1. The number of ways to perform T1 is nCr. The number of ways to perform T2 is rPr. Thus the number of r-permutations is the product nCr · rPr, i.e., nPr = nCr · rPr. This immediately implies that nCr = nPr/rPr = n!/((n − r)! · r!).

Example 9 Pascal's Identity: C(n + 1, k) = C(n, k − 1) + C(n, k).

Example 10 A binomial is a polynomial with two terms. For example, x + y and x^2 + y^2 are binomials. The algebraic expansion of a power of the binomial x + y is called a binomial expansion. For example, the binomial expansion of (x + y)^3 is x^3 + 3x^2 y + 3xy^2 + y^3. Observe that each term of the binomial expansion of (x + y)^n is of the form c x^k y^(n−k), where c is called the binomial coefficient and the exponent k is an integer with 0 ≤ k ≤ n. The binomial coefficient of x^k y^(n−k) is C(n, k).

5 Conditional Probability

Definition 15 The joint probability of two events E and F is the probability that both events occur, and equals P(E ∩ F).

Definition 16 Two events E and F are said to be independent if P(E ∩ F) = P(E) · P(F).

Definition 17 Let E and F be two events. The probability that E happens given that F happens is called the conditional probability, denoted by P(E|F), and P(E|F) = P(E ∩ F)/P(F).

Theorem 2 (Bayes' Theorem) P(E|F) = P(F|E) · P(E) / (P(F|E) · P(E) + P(F|Ē) · P(Ē)).

Proof: Observe that from P(E|F) = P(E ∩ F)/P(F), we have P(E ∩ F) = P(E|F) · P(F). Thus

P(F|E) · P(E) / (P(F|E) · P(E) + P(F|Ē) · P(Ē)) = P(F ∩ E) / (P(F ∩ E) + P(F ∩ Ē)) = P(E ∩ F)/P(F). □

6 Random Variables

Consider the probability space derived from the experiment of tossing a fair coin.
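Bayes' theorem is often exercised with a diagnostic-test example. The numbers below (1% prevalence, 95% true-positive rate, 5% false-positive rate) are hypothetical, chosen only to illustrate the formula, and the function name `bayes` is mine:

```python
from fractions import Fraction

def bayes(p_e, p_f_given_e, p_f_given_not_e):
    """P(E|F) via Bayes' theorem; the denominator is the law of total probability."""
    num = p_f_given_e * p_e
    return num / (num + p_f_given_not_e * (1 - p_e))

# Hypothetical numbers: E = "has condition", F = "tests positive".
posterior = bayes(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
print(posterior)  # 19/118, about 0.161
```

Even with a fairly accurate test, the posterior probability is only about 16% here, because the condition is rare; this is the classic base-rate effect that Bayes' theorem makes precise.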
The sample space is S = {Head, Tail}, the event set is E = {∅, {Head}, {Tail}, S}, and the elementary probabilities are P({Head}) = 0.5 and P({Tail}) = 0.5. It is cumbersome to use Head and Tail to represent the outcomes. What if we use a function that maps Head and Tail to numbers? For example, let X be a function from the sample space S to the real numbers, defined by X(Head) = 1 and X(Tail) = 0. Then we no longer need to use Head and Tail; we can use their "surrogates" 1 and 0 to represent them. This is the concept of a random variable (RV), which allows us to deal with numbers instead of a concrete probabilistic experiment. Thus a random variable is not really a variable, but rather a function whose domain is the sample space and whose range is a set of real numbers.

Let S be a sample space and X(·) a random variable that maps S to the real numbers R. Let the image of X, denoted Image(X), be defined as Image(X) = {x | ∃s ∈ S, X(s) = x}. We say Image(X) is the set of possible values of the random variable X. A random variable X is discrete if Image(X) is countable. A random variable X is nonnegative if all elements of Image(X) are nonnegative.

For a discrete probability space, a random variable X maps the sample space to a set of discrete numbers Image(X). What about the elementary probabilities of the individual numbers in Image(X)? The elementary probability of a number in Image(X) is the same as the probability of its corresponding elementary event in S. These elementary probabilities define a function from Image(X) to [0, 1], which is called the probability mass function and is denoted by PX.

Definition 18 Let X be a random variable with values {X1, X2, ..., Xn} and probability mass function PX. Let Y be a random variable with values {Y1, Y2, ..., Ym} and probability mass function PY.
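The construction of a probability mass function from elementary probabilities can be sketched directly: map each outcome through X and accumulate the probability of its preimage. The dictionary names below are illustrative:

```python
from fractions import Fraction

# Sample space, random variable (a function on outcomes), elementary probabilities.
S = ["Head", "Tail"]
X = {"Head": 1, "Tail": 0}
P = {"Head": Fraction(1, 2), "Tail": Fraction(1, 2)}

# The pmf on Image(X) inherits the probability of each value's preimage in S.
pmf = {}
for s in S:
    pmf[X[s]] = pmf.get(X[s], Fraction(0)) + P[s]
print(pmf)  # {1: Fraction(1, 2), 0: Fraction(1, 2)}
```

The accumulation (rather than simple assignment) matters when X is not injective, i.e., when several outcomes map to the same number.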
The joint probability distribution of X and Y, denoted (X, Y), is the probability distribution that gives the probability P(X = Xj, Y = Yk) for each pair of values.

Theorem 3 Let X be a random variable with values {X1, X2, ..., Xn} and probability mass function PX. Let Y be a random variable with values {Y1, Y2, ..., Ym} and probability mass function PY. Let (X, Y) be the joint random variable with probability mass function PX,Y. Then Σ_{j=1}^{n} P(X = Xj, Y = Yk) = PY(Y = Yk) and Σ_{k=1}^{m} P(X = Xj, Y = Yk) = PX(X = Xj).

7 Sum of Random Variables

Consider tossing two 6-sided fair dice. Let X be the random variable representing the outcome of the first die, and Y the random variable representing the outcome of the second die. The sum of the values of the two dice is also a random variable. If we use Z to denote the sum, then Z = X + Y. What is the probability distribution of Z? Observe that the possible values of Z are {X + Y | X ∈ {1, 2, 3, 4, 5, 6}, Y ∈ {1, 2, 3, 4, 5, 6}} = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

What is the probability mass function PZ of Z? To obtain PZ, we must know the probability of each individual value of Z. We shall use the calculation of the probability of Z = 9 to illustrate the construction of PZ. Observe that in order for Z to be 9, one of the following situations must occur: (1) X = 3 and Y = 6, (2) X = 4 and Y = 5, (3) X = 5 and Y = 4, or (4) X = 6 and Y = 3. Since both dice are fair, the probability of each of these 4 situations is 1/36. Thus the probability P(Z = 9) = 4/36.

More generally, let X and Y be two random variables, where X assumes the values {X1, X2, ..., Xn} with probability mass function PX and Y assumes the values {Y1, Y2, ..., Ym} with probability mass function PY. Let Z be the random variable Z = X + Y. Then P(Z = Zi) = Σ_{Xj+Yk=Zi} P(X = Xj, Y = Yk).
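The formula P(Z = Zi) = Σ_{Xj+Yk=Zi} P(X = Xj, Y = Yk) is a convolution of the two mass functions. A sketch for the two-dice example, assuming independence so that the joint probability factors as PX · PY:

```python
from fractions import Fraction
from collections import defaultdict

die = {k: Fraction(1, 6) for k in range(1, 7)}

# P(Z = z) = sum over all pairs with x + y = z of P(X = x) * P(Y = y).
pz = defaultdict(Fraction)
for x, px in die.items():
    for y, py in die.items():
        pz[x + y] += px * py

print(pz[9])  # 4/36 = 1/9
```
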
8 Expectation and Variance

Definition 19 (Expectation) Let X be a discrete random variable with values {X1, X2, ..., Xn} and probability mass function P. The expectation of X is defined as E[X] = Σ_{k=1}^{n} P(Xk) · Xk.

Theorem 4 Let X and Y be discrete random variables. Then E[X + Y] = E[X] + E[Y].

Proof: Let X be a random variable with values {X1, X2, ..., Xn} and probability mass function PX. Let Y be a random variable with values {Y1, Y2, ..., Ym} and probability mass function PY. Let (X, Y) be the joint random variable with probability mass function PX,Y. Let Z = X + Y with values {Z1, Z2, ..., Zl} and probability mass function PZ. Then

E[X + Y] = E[Z] = Σ_{i=1}^{l} Zi · P(Z = Zi)
 = Σ_{i=1}^{l} Zi · Σ_{Xj+Yk=Zi} PX,Y(X = Xj, Y = Yk)
 = Σ_{j=1}^{n} Σ_{k=1}^{m} PX,Y(X = Xj, Y = Yk) · (Xj + Yk)
 = Σ_{j=1}^{n} Σ_{k=1}^{m} PX,Y(X = Xj, Y = Yk) · Xj + Σ_{j=1}^{n} Σ_{k=1}^{m} PX,Y(X = Xj, Y = Yk) · Yk
 = Σ_{j=1}^{n} Xj · Σ_{k=1}^{m} PX,Y(X = Xj, Y = Yk) + Σ_{k=1}^{m} Yk · Σ_{j=1}^{n} PX,Y(X = Xj, Y = Yk)
 = Σ_{j=1}^{n} Xj · PX(X = Xj) + Σ_{k=1}^{m} Yk · PY(Y = Yk)
 = E[X] + E[Y]. □

Theorem 5 (Markov's Inequality) Let X be a discrete nonnegative random variable, and δ > 0. Then P(X ≥ δ) ≤ E[X]/δ.

Proof: Let the values of X be 0 ≤ X1 < X2 < ... < Xj−1 < δ ≤ Xj < ... < Xn. Then

E[X] = P(X1) · X1 + P(X2) · X2 + ... + P(Xn) · Xn
 ≥ P(Xj) · Xj + P(Xj+1) · Xj+1 + ... + P(Xn) · Xn
 ≥ P(Xj) · δ + P(Xj+1) · δ + ... + P(Xn) · δ
 = (P(Xj) + P(Xj+1) + ... + P(Xn)) · δ
 = P(X ≥ δ) · δ.

Thus P(X ≥ δ) ≤ E[X]/δ. □

Definition 20 (Variance) Let X be a discrete random variable with values X1, X2, ..., Xn and probability mass function P. The variance of X is defined as Var(X) = Σ_{k=1}^{n} P(Xk) · (Xk − E[X])². The standard deviation of X is defined as σ(X) = √Var(X).

The expectation measures the average of a random variable, while the standard deviation measures its spread.

Theorem 6 Var(X) = E[X²] − (E[X])².
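Both Definition 19 and Markov's inequality can be checked on the fair-die distribution. The `expectation` helper below is illustrative, not from the notes:

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum of P(x) * x over the values of X."""
    return sum(p * x for x, p in pmf.items())

die = {k: Fraction(1, 6) for k in range(1, 7)}
ex = expectation(die)  # 7/2

# Markov: for nonnegative X and delta = 5, P(X >= 5) <= E[X] / 5.
p_ge_5 = sum(p for x, p in die.items() if x >= 5)
print(ex, p_ge_5)  # 7/2 1/3  (and indeed 1/3 <= 7/10)
```

Note that Markov's bound (7/10) is far from tight here (the true probability is 1/3); it is valuable because it holds for every nonnegative random variable.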
Proof: Var(X) = E[(X − E[X])²] = E[X² − 2X·E[X] + (E[X])²] = E[X²] − E[2X·E[X]] + (E[X])² = E[X²] − 2(E[X])² + (E[X])² = E[X²] − (E[X])². □

Theorem 7 (Chebyshev's Inequality) P(|X − E[X]| ≥ δ) ≤ Var(X)/δ².

Proof: Consider the random variable Z = (X − E[X])². Observe that E[Z] = Var(X). Thus P(|X − E[X]| ≥ δ) = P(Z ≥ δ²) ≤ E[Z]/δ² = Var(X)/δ², where the inequality follows from Markov's inequality. □

Theorem 8 Let X1, X2, ..., Xn be a set of independent random variables. Then Var(X1 + X2 + ... + Xn) = Var(X1) + Var(X2) + ... + Var(Xn).

9 Some Important Probability Distributions

Definition 21 A Bernoulli trial is a probabilistic experiment with only two outcomes. Since there are only two outcomes, we usually refer to one of them as "success" and the other as "failure". Usually we use a binary random variable X (a random variable whose value is either 1 for success or 0 for failure) to represent a Bernoulli trial, where P(X = 1) = p and P(X = 0) = 1 − p.

Definition 22 (Binomial Distribution) Consider a Bernoulli trial with probability of success p and probability of failure 1 − p. Let X be the random variable representing the number of successes in a sequence of n independent Bernoulli trials. Then P(X = k) = C(n, k) p^k (1 − p)^(n−k). The probability mass function of X is called a binomial distribution, denoted B(n, p).

Theorem 9 E[B(n, p)] = np and Var(B(n, p)) = np(1 − p).

Proof: Let Xk be the random variable associated with the k-th Bernoulli trial, with P(Xk = 1) = p and P(Xk = 0) = 1 − p. Then E[Xk] = p · 1 + (1 − p) · 0 = p, and Var(Xk) = p · (1 − p)² + (1 − p) · (0 − p)² = p(1 − p). Now observe that X = X1 + X2 + ... + Xn. Using linearity of expectation, we have E[X] = E[X1] + E[X2] + ... + E[Xn] = np. From Theorem 8, we have Var(X) = Var(X1) + Var(X2) + ... + Var(Xn) = np(1 − p). □
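Theorem 9 can be confirmed numerically by computing E[X] and Var(X) directly from the binomial mass function. The helper `binomial_pmf` below is illustrative, with hypothetical parameters n = 10 and p = 3/10:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k) for k = 0, ..., n."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

n, p = 10, Fraction(3, 10)
pmf = binomial_pmf(n, p)
mean = sum(pk * k for k, pk in pmf.items())
var = sum(pk * (k - mean) ** 2 for k, pk in pmf.items())
print(mean, var)  # 3 21/10, matching np and np(1 - p)
```
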
Definition 23 (Geometric Distribution) Consider a Bernoulli trial with probability of success p and probability of failure 1 − p. Let X be the random variable representing the number of trials until the first success occurs. Then P(X = k) = (1 − p)^(k−1) p. The probability distribution of X is called a geometric distribution.

Theorem 10 Let X be a random variable subject to a geometric distribution. Then E[X] = 1/p.

Proof: E[X] = (1 − p)^0 p · 1 + (1 − p)^1 p · 2 + (1 − p)^2 p · 3 + ... Let Sn = Σ_{j=1}^{n} (1 − p)^(j−1) p · j; then E[X] = lim_{n→∞} Sn. Observe that

Sn = (1 − p)^0 p · 1 + (1 − p)^1 p · 2 + ... + (1 − p)^(n−1) p · n
(1 − p)Sn = (1 − p)^1 p · 1 + (1 − p)^2 p · 2 + ... + (1 − p)^n p · n
Sn − (1 − p)Sn = (1 − p)^0 p + (1 − p)^1 p + ... + (1 − p)^(n−1) p − (1 − p)^n p · n
pSn = ((1 − p)^0 + (1 − p)^1 + ... + (1 − p)^(n−1)) · p − (1 − p)^n p · n
pSn = ((1 − p)^n − 1)/((1 − p) − 1) · p − (1 − p)^n p · n
pSn = 1 − (1 − p)^n − (1 − p)^n p · n
Sn = 1/p − (1 − p)^n · (1/p + n)

Thus E[X] = lim_{n→∞} [1/p − (1 − p)^n · (1/p + n)] = 1/p, since (1 − p)^n · (1/p + n) → 0 for 0 < p ≤ 1. □

A similar telescoping computation applied to Tn = (1 − p)^0 p · 1² + (1 − p)^1 p · 2² + ... + (1 − p)^(n−1) p · n², whose limit is E[X²], yields E[X²] = (2 − p)/p², and hence Var(X) = E[X²] − (E[X])² = (1 − p)/p².

Definition 24 (Poisson Distribution) A discrete random variable X is said to be subject to a Poisson distribution if P(X = k) = λ^k e^(−λ)/k! for a fixed λ > 0 and k = 0, 1, ...

Observe that Σ_{k=0}^{∞} λ^k e^(−λ)/k! = e^(−λ) Σ_{k=0}^{∞} λ^k/k! = e^(−λ) · e^λ = 1. Thus the Poisson distribution is well defined. The Poisson distribution describes the probability of a given number of events occurring in a fixed interval of time if these events occur with a known average rate, independently of the time since the last event.

Theorem 11 Let X be a Poisson random variable. Then E[X] = λ.

Proof: E[X] = Σ_{k=0}^{∞} (λ^k e^(−λ)/k!) · k = Σ_{k=1}^{∞} λ^k e^(−λ)/(k − 1)! = λ Σ_{k=1}^{∞} λ^(k−1) e^(−λ)/(k − 1)! = λ Σ_{j=0}^{∞} λ^j e^(−λ)/j! = λ. □