INSTITIÚID TEICNEOLAÍOCHTA CHEATHARLACH
INSTITUTE OF TECHNOLOGY CARLOW

BASIC PROBABILITY AND STATISTICS

1 Basic Probability
This work on probability and statistics forms the basis of the study of uncertainty: measuring and
quantifying uncertainty, and making decisions under uncertainty. Loosely speaking, by uncertainty we
mean the condition where results and outcomes are not completely determined – they depend on a
number of factors and, ultimately, on pure chance. Simple examples of uncertainty appear when you
buy a lottery ticket or toss a coin to make a choice.
Uncertainty appears in virtually all areas of computer science and software engineering. Installation
of software requires an uncertain amount of time and often an uncertain amount of disk space. Newly
released software contains an uncertain number of defects. When a computer program is executed, the amount of required memory
may be uncertain. When a job is sent to a printer, it takes an uncertain time to print, and there is
always a different number of jobs in a queue ahead of it. Electronic components fail at uncertain times,
and the order of their failures cannot be predicted exactly. Viruses attack a system at unpredictable
times and affect an unpredictable number of files and directories.
Uncertainty surrounds us in everyday life, at home, at work, in business, and in leisure. We may
find out that a flight was postponed because of weather conditions. We could ask why this was not
known in advance, when the flight was scheduled. Because of uncertainty, forecasting the weather
precisely, with no error, is not a solvable problem. The weather forecaster may predict a 60% chance of
rain. We could again ask why we cannot be told exactly whether it will rain or not, so that we know
whether or not to bring our umbrellas. Again, because of uncertainty. If you drive to college tomorrow, you
will see an unpredictable number of green lights when you approach them, you will find an uncertain
number of vacant parking spaces, you will reach your classroom at an uncertain time and you cannot
be certain how many students you will find in the classroom when you enter it.
Realising that many important ideas around us bear uncertainty, we have to understand how to deal
with it. Most of the time, we are forced to make decisions under uncertainty. For example, we have to
deal with internet and e-mail knowing that we may not be protected against all kinds of viruses. New
software has to be released even if its testing probably did not reveal all the defects. Some memory
or disk space may be allocated for each customer by servers, internet service providers, etc., without
knowing exactly what proportion of users will be satisfied with these limitations. And so on.
We now introduce the language that is used to describe and quantify uncertainty. It is the language
of probability. When outcomes are uncertain, we can identify more likely and less likely ones and
assign, respectively, high and low probabilities to them. Probabilities are numbers between 0 and 1,
with 0 being assigned to an impossible event and 1 being the probability of an event that occurs for
sure.
Probability is the measure of the likelihood that a particular event will occur in any one trial.
1.1 Basic Probability – Definitions
The concept of probability agrees perfectly with our intuition.
Example If a fair coin is tossed, we say that it has a 50-50 (equal) chance of turning up heads or
tails. Hence, the probability of each side equals 1/2. It does not mean that a coin tossed 10 times will
produce exactly 5 heads and 5 tails. However, if you toss a coin one million times, the proportion of
heads is anticipated to be very close to 1/2.
In order to interpret the calculated probability correctly take note of the following.
Remark (The Law of Large Numbers) – The more often a random experiment is performed fairly, the
smaller the percentage deviation from the predicted probability.
A calculated probability may be represented as a proportion or as a percentage – for example, from
the coin example above, as 1/2, or 0.5, or 50% of occasions. In forecasting, it is common to speak about the
probability of an event as a likelihood of this event to happen (say, the company’s profit is likely to rise
during the next quarter). In gambling, for example horse racing or the national lottery, probability is
equivalent to odds. Having winning odds of 1 to 100 (1:100) means that the probability of winning is
about 0.01. It also means that if you play long enough, you will win about 1% of the time.
We begin a formal approach to basic probability with the following definitions.
Definition A random experiment is any act which can be repeated under given conditions.
Definition The sample space S of a random experiment is the set of all possible outcomes of the
experiment, i.e.,

S = {e_1, e_2, e_3, ..., e_n}

where e_i is any one outcome (or sample point) of the experiment.
Definition An event A is a subset of the sample space S, i.e.
A⊂S
An elementary event is a subset consisting of precisely one sample point.
————————o————————–
Since events are sets of outcomes we need to recall some set theory definitions in order to compute
probabilities of events.
Definition Let A and B be sets. The union of A and B, denoted A ∪ B, is the set of all elements that
belong to A or to B (or to both). The intersection of A and B, denoted A ∩ B, is the set consisting
of all elements that belong to both A and B. The difference of A less B, denoted by A\B, is the set
consisting of elements of A that do not belong to B.
If every element of B belongs to A (so that B is a subset of A), then A\B is referred to as the
complement of B in A. We will denote the complement of B as B′.
Example Let Z be the set of all integers, and let 2Z denote the set of all even integers (i.e., all integers
that are divisible by two). Then Z\2Z is the set of all odd integers (i.e., all integers that are not divisible
by two). We see that
2Z ∪ (Z\2Z) = Z
(i.e., the set of integers is the union of the set of even integers and the set of odd integers, or in
other words, every integer is even or odd). Also
2Z ∩ (Z\2Z) = ∅
4
CHAPTER 5. BASIC PROBABILITY AND STATISTICS
4
(i.e., the intersection of the set of even integers and the set of odd integers is empty, or in other
words, no integer is both even and odd).
One may also form unions and intersections of three or more sets. If A, B and C are sets, then
A ∪ B ∪ C denotes the union of the three sets A, B and C, and consists of all elements that belong
either to A or to B or to C. Similarly A ∩ B ∩ C denotes the intersection of three sets A, B and C. An
element x is an element of the intersection A ∩ B ∩ C if and only if it is an element of A and also of B
and of C. This notion may be extended to four or more sets.
Definition Sets A, B, C, ... are disjoint or mutually exclusive if no two of them share an element, i.e.,
the intersection of every pair of them is empty:

A ∩ B = A ∩ C = B ∩ C = ... = ∅

Definition Sets A, B, C, ... are exhaustive if their union represents the universal set, i.e.,

A ∪ B ∪ C ∪ ... = U
Definition Let A and B be sets. We say that the set B is a subset of A if every element of B is an
element of A. If B is a subset of A then we denote this fact by writing B ⊂ A.
The empty set ∅ is a subset of every set. Also any set is a subset of itself (i.e., A ⊂ A for any set A).
Thus a non-empty set A always has at least two subsets, namely ∅ and A itself. Let A and B be sets.
If A ⊂ B and B ⊂ A then A = B. For if A ⊂ B and B ⊂ A then every element of A is an element of
B, and also every element of B is an element of A. But then the sets A and B have the same elements,
and therefore these sets are in fact the same set.
————————o————————–
Now that we have recalled the fundamentals of set theory we can now give a formal definition of
probability.
Definition A probability measure on a sample space S is a function p from the set of all subsets of S
to the set of real numbers, i.e.,

p : P(S) −→ R

where P(S) denotes the set of all subsets of S. This function assigns to each subset A of S a real number
denoted by p(A) such that

i 0 ≤ p(A) ≤ 1
ii p(S) = 1
iii if A ∩ B = ∅, then p(A ∪ B) = p(A) + p(B).
The number p(A) is called the probability of the event A.
The conditions i, ii, iii above are known as the axioms of probability. The function p may be defined
on any subset of the sample space S of the experiment by adding the probabilities of the elementary
events of the subset.
Some properties of the function p are as follows:
1. p(∅) = 0
Proof: Since ∅ ∪ ∅ = ∅ and ∅ ∩ ∅ = ∅, axiom iii gives

p(∅) = p(∅ ∪ ∅) = p(∅) + p(∅)

Cancelling p(∅) we get p(∅) = 0.
2. Let A_1, A_2, ..., A_r be events that are pairwise mutually exclusive, i.e., A_i ∩ A_j = ∅ whenever
i ≠ j. Then

p(A_1 ∪ A_2 ∪ ... ∪ A_r) = p(A_1) + p(A_2) + ... + p(A_r)

Proof: Applying axiom iii repeatedly,

p(A_1 ∪ A_2 ∪ ... ∪ A_r) = p(A_1) + p(A_2 ∪ A_3 ∪ ... ∪ A_r)
= p(A_1) + p(A_2) + p(A_3 ∪ ... ∪ A_r)
= ...
= p(A_1) + p(A_2) + ... + p(A_r)
More specifically, for two mutually exclusive events, i.e., A ∩ B = ∅, we have
p(A ∪ B) = p(A) + p(B)
3. For any two events A and B in S we have
p(A ∪ B) = p(A) + p(B) − p(A ∩ B)
Proof: We can write A ∪ B and B as disjoint unions

A ∪ B = A ∪ [B\(A ∩ B)]
B = [B\(A ∩ B)] ∪ (A ∩ B)

We can illustrate these disjoint unions on Venn diagrams.
Applying axiom iii to the two disjoint unions we get that
p(A ∪ B) = p(A) + p(B\(A ∩ B))
p(B) = p(B\(A ∩ B)) + p(A ∩ B)
From the second equation we have that
p(B\(A ∩ B)) = p(B) − p(A ∩ B)
and substituting this into the first equation yields the required result
p(A ∪ B) = p(A) + p(B) − p(A ∩ B)
4. For complementary events A and A′, i.e., A ∪ A′ = S and A ∩ A′ = ∅, we have

p(A′) = 1 − p(A)

Proof: We have that

A′ ∪ A = S

Applying axiom iii to the disjoint union we get that

p(A′ ∪ A) = p(A′) + p(A) = p(S)

Hence, from axiom ii

p(A′) + p(A) = 1

Therefore

p(A′) = 1 − p(A)
It is worth considering some simple examples at this stage. Before doing so it is
useful to define the term equiprobable sample space. Many probability problems are based on sample
spaces where all outcomes are equally likely. In such cases an associated rule will allow for probabilities
to be easily calculated.
Definition An equiprobable sample space is one where all sample points have the same probability.
(All sample points are equally likely).
If S = {e_1, e_2, e_3, ..., e_n} is an equiprobable sample space, then

p(A) = (Number of favourable outcomes) / (Total possible outcomes)

Mathematically we write

p(A) = |A| / |S|

where |A| denotes the order of the set A, i.e., the number of elements it contains.
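This counting rule translates directly into code. The following short Python sketch is illustrative only (the helper name prob_equiprobable is ours, not part of the notes); it computes |A|/|S| exactly using fractions.

```python
from fractions import Fraction

def prob_equiprobable(event, sample_space):
    """p(A) = |A| / |S| for an event A in an equiprobable sample space S."""
    favourable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favourable, len(sample_space))

# Tossing a fair die: the probability of an odd number.
S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
print(prob_equiprobable(A, S))   # 1/2
```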
1.2 Some Examples
Example A card is selected from a well shuffled pack of cards. To determine the probability of drawing
a king or a club – let A denote the event of selecting a king on a single trial and let B denote the event
of selecting a club on a single trial. Since the sample space is equiprobable we have

p(A) = 4/52 , p(B) = 13/52

However the events are not mutually exclusive since A ∩ B ≠ ∅. The events have an outcome in
common – namely the king of clubs. The required probability is obtained using

p(A ∪ B) = p(A) + p(B) − p(A ∩ B) = 4/52 + 13/52 − 1/52 = 16/52 ≈ 0.3077

Remark Recall how to interpret this probability of 0.3077. If you perform this experiment fairly many
times (replacing the card after each trial), we would expect the favourable outcome of a king or a club
to occur on about 31% of occasions.
Example During some construction, a network blackout occurs on Monday with probability 0.7 and
on Tuesday with probability 0.5. There is a probability of 0.35 that a network blackout will occur
on both Monday and Tuesday. Let A and B denote the events of a blackout on Monday and on Tuesday
respectively. The probability of having a blackout on Monday or Tuesday equals
p(A ∪ B) = p(A) + p(B) − p(A ∩ B) = 0.7 + 0.5 − 0.35 = 0.85
Example If a system appears protected against a new computer virus with probability 0.7, then the
system is exposed to the virus with probability

p(A′) = 1 − p(A) = 1 − 0.7 = 0.3

where A denotes the event that the system is protected.
Example Tossing a die results in six equally likely outcomes, identified by the number of dots from 1
to 6. The probability of an odd number is

p(A) = (Number of favourable outcomes) / (Total possible outcomes) = 3/6 = 1/2
This formula for determining probabilities based on equally likely outcomes is simple as long as the
numerator and denominator can be easily evaluated. This is rarely the case: more often the sample
space consists of a multitude of outcomes. Combinatorics provides special techniques for computing
the favourable and total possible outcomes of a random experiment.
1.3 Combinatorics
Consider a general situation where objects are selected at random from a set of n. The objects may
be selected with replacement or without replacement. We will also need to distinguish whether the order
of the objects in the selection is of importance.
(Figure: a sample of objects drawn at random from a set U.)
Definition Sampling with replacement means that every sampled item is replaced into the initial set,
so that any of the objects can be selected with probability 1/n at any time. In particular the object may
be sampled more than once. Sampling without replacement means that every sampled item is removed
from further sampling, so the set of possibilities reduces by 1 after each selection.
When the order of objects in a selection is important we refer to the objects as being distinguishable.
When order is not important in a selection we refer to the objects as indistinguishable.
1. Permutations with Replacement:
Possible selections of k distinguishable objects from a set of n are called permutations. Sampling
with replacement, the total number of permutations is given as

Pr(n, k) = n.n.n. ... .n = n^k
2. Permutations without Replacement:
During sampling without replacement, the number of possible selections reduces by 1 each time
an object is sampled. Sampling without replacement, the total number of permutations is given as

P(n, k) = n! / (n − k)!

where n! = n(n − 1)(n − 2) ... 3.2.1 and represents the number of permutations of n distinguishable objects.
3. Combinations with Replacement:
Possible selections of k indistinguishable objects from a set of n are called combinations. Sampling
with replacement, the total number of combinations is given as

Cr(n, k) = (k + n − 1)! / (k!(n − 1)!)
4. Combinations without Replacement:
Sampling without replacement, the total number of combinations is given as

C(n, k) = n! / (k!(n − k)!)
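As a quick check on these four formulas, the sketch below evaluates each one with Python's standard library; math.perm and math.comb compute P(n, k) and C(n, k) directly, and the other two are written out from the formulas above (the function names are ours, not part of the notes).

```python
from math import comb, factorial, perm

def permutations_with_replacement(n, k):
    return n ** k                         # Pr(n, k) = n^k

def permutations_without_replacement(n, k):
    return perm(n, k)                     # P(n, k) = n!/(n - k)!

def combinations_with_replacement(n, k):
    return factorial(k + n - 1) // (factorial(k) * factorial(n - 1))   # Cr(n, k)

def combinations_without_replacement(n, k):
    return comb(n, k)                     # C(n, k) = n!/(k!(n - k)!)

# The DUBLIN examples that follow: 6 letters, selections of size 6 and 3.
print(permutations_with_replacement(6, 6))     # 46656
print(permutations_without_replacement(6, 6))  # 720
print(permutations_with_replacement(6, 3))     # 216
print(permutations_without_replacement(6, 3))  # 120
```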
Example How many permutations are possible using all the letters in the word
DUBLIN
Firstly, with replacement we will have Pr(6, 6) = 6.6.6.6.6.6 = 6^6 = 46,656.
However, without replacement we will have

P(6, 6) = 6.5.4.3.2.1 = 6! = 720
Example How many 3-lettered permutations are possible using the letters in the word
DUBLIN
Firstly, with replacement we will have Pr(6, 3) = 6.6.6 = 6^3 = 216.
However, without replacement we will have

P(6, 3) = 6! / (6 − 3)! = 120
Example In how many ways can a basketball team of 5 players be selected from 10 possible players?
Firstly, the 5 selected players are indistinguishable (order is not important), hence without replacement
we have

C(10, 5) = 10! / (5!(10 − 5)!) = (10.9.8.7.6)/(5.4.3.2.1) = 252
Remark (Combinations from two different sets) If we have two different sets, one set with m different
objects and the other with n different objects, the number of selections which can be made with k from
the first set and r from the second set is
C(m, k) × C(n, r)
Example A container contains 8 black counters and 6 white counters. If four counters are selected one
at a time and without replacement, firstly, in how many ways can the four be selected so that exactly
2 black counters are chosen and secondly, in how many ways can the four be selected so that at least 3
black counters are chosen?
(Figure: four counters sampled at random from the container U.)
The sets of 4 counters are indistinguishable (order is not important), hence without replacement
exactly 2 black counters can be chosen in

C(8, 2) × C(6, 2) = 8!/(2!(8 − 2)!) × 6!/(2!(6 − 2)!) = (8.7)/(2.1) × (6.5)/(2.1) = 28 × 15 = 420
At least 3 black counters can be chosen in two possible ways – firstly as 3 black and 1 white and
secondly as all 4 black. So 3 black and 1 white can be chosen in
C(8, 3) × C(6, 1) = 8!/(3!(8 − 3)!) × 6!/(1!(6 − 1)!) = (8.7.6)/(3.2.1) × 6/1 = 56 × 6 = 336
Also all 4 black can be chosen in
C(8, 4) = 8!/(4!(8 − 4)!) = (8.7.6.5)/(4.3.2.1) = 70
Hence, the total is 336 + 70 = 406.
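The counts in this example follow the product rule C(m, k) × C(n, r) for selections from two different sets; a minimal check with math.comb:

```python
from math import comb

# Exactly 2 black (from 8) and 2 white (from 6) counters:
exactly_two_black = comb(8, 2) * comb(6, 2)                  # 28 * 15 = 420

# At least 3 black: either 3 black and 1 white, or all 4 black.
at_least_three_black = comb(8, 3) * comb(6, 1) + comb(8, 4)  # 336 + 70 = 406

print(exactly_two_black, at_least_three_black)               # 420 406
```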
Example How many ways can a committee of 4 men and 3 women be selected from 7 men and 5
women? Firstly, the members within each group are indistinguishable (order is not important), hence
without replacement we have

C(7, 4) × C(5, 3) = 7!/(4!(7 − 4)!) × 5!/(3!(5 − 3)!) = (7.6.5)/(3.2.1) × (5.4)/(2.1) = 35 × 10 = 350
Exercise A jar contains 6 red, 5 white and 4 blue counters. In how many ways can 6 counters be
selected so there are precisely 2 counters of each colour?
1.4 Further Examples
Now with basic counting techniques in place we can return to further probability examples.
Example Say the password to the college computer network is made up of a mix of eight characters
from the following set – 10 digits, 26 lower-case and 26 capital letters. We can create

Pr(62, 8) = 62.62.62.62.62.62.62.62 = 62^8 = 218,340,105,584,896

i.e., over 218 trillion different 8-character passwords. At a speed of 1 million passwords per second,
it would take a computer program almost 7 years to try all these passwords. At this speed the program
can test 604,800,000,000 passwords within 1 week. The probability that it will guess the password in
1 week is

p(A) = |A|/|S| = 604,800,000,000 / 218,340,105,584,896 = 0.00277
However, if capital letters are not used, the number of possible passwords is reduced to

Pr(36, 8) = 36.36.36.36.36.36.36.36 = 36^8 = 2,821,109,907,456

This password will be guessed in about 16 days. The probability that it will guess the password in
1 week is

p(A) = |A|/|S| = 604,800,000,000 / 2,821,109,907,456 = 0.214
It is clear now why it is recommended to use three types of characters in your passwords and to
change them every year.
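The arithmetic in this example is easy to reproduce; the sketch below follows the same assumptions (1 million guesses per second for one week).

```python
GUESSES_PER_WEEK = 1_000_000 * 60 * 60 * 24 * 7   # 604,800,000,000 guesses

full_space = 62 ** 8      # digits + lower-case + capital letters
small_space = 36 ** 8     # digits + lower-case letters only

print(full_space)                          # 218340105584896
print(GUESSES_PER_WEEK / full_space)       # ~0.00277, probability of guessing in a week
print(small_space)                         # 2821109907456
print(GUESSES_PER_WEEK / small_space)      # ~0.214, probability of guessing in a week
print(small_space / 1_000_000 / 86_400)    # ~32.7 days to try every password in the smaller space (about 16 on average)
```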
Example For a lottery, 35 cards numbered 1 to 35 are placed in a drum. Five cards will be chosen at
random from the drum as the winning combination.
i What is the probability of winning this lottery?
ii What is the probability of matching four numbers with the winning combination?
iii What is the probability of matching three numbers with the winning combination?
i Let A denote the event of winning this lottery. The number of different combinations is

C(35, 5) = 35!/(5!(35 − 5)!) = (35.34.33.32.31)/(5.4.3.2.1) = 324,632

therefore

p(A) = |A|/|S| = 1/324,632 = 0.00000308
ii Let B denote the event of matching four numbers with the winning combination. The number of
possible combinations that match exactly four is

C(5, 4) × C(30, 1) = 5!/(4!(5 − 4)!) × 30!/(1!(30 − 1)!) = 5 × 30 = 150

therefore

p(B) = |B|/|S| = 150/324,632 = 0.000462
iii Let C denote the event of matching three numbers with the winning combination. The number of
possible combinations that match exactly three is

C(5, 3) × C(30, 2) = 5!/(3!(5 − 3)!) × 30!/(2!(30 − 2)!) = (5.4)/(2.1) × (30.29)/(2.1) = 10 × 435 = 4350

therefore

p(C) = |C|/|S| = 4350/324,632 = 0.0134
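A quick cross-check of these lottery figures with math.comb:

```python
from math import comb

total = comb(35, 5)                          # 324,632 possible winning combinations

p_win   = 1 / total                          # match all five numbers
p_four  = comb(5, 4) * comb(30, 1) / total   # match exactly four numbers
p_three = comb(5, 3) * comb(30, 2) / total   # match exactly three numbers

print(total)      # 324632
print(p_win)      # ~3.08e-06
print(p_four)     # ~0.000462
print(p_three)    # ~0.0134
```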
Example A container has 6 red counters and 4 white counters. Three counters are selected at random,
one at a time and without replacement. Find the probabilities that
i all three counters are red,
ii two of the three counters are red,
iii one of the three counters is red,
iv all three counters are white.
(Figure: three counters sampled at random from the container U.)
In this example we have an equiprobable sample space, so

p(A) = (Number of favourable outcomes) / (Total possible outcomes) = |A| / |S|

The sets of 3 counters are indistinguishable (order is not important), hence without replacement the
total number of possible outcomes, |S|, is

|S| = C(10, 3) = 10!/(3!(10 − 3)!) = (10.9.8)/(3.2.1) = 120
i Let A denote the event all three counters are red. Now

|A| = C(6, 3) × C(4, 0) = 6!/(3!(6 − 3)!) × 4!/(0!(4 − 0)!) = 20 × 1 = 20

Therefore, we have

p(A) = |A|/|S| = 20/120 = 1/6
ii Let B denote the event that two of the three counters are red. Now

|B| = C(6, 2) × C(4, 1) = 6!/(2!(6 − 2)!) × 4!/(1!(4 − 1)!) = 15 × 4 = 60

Therefore, we have

p(B) = |B|/|S| = 60/120 = 1/2
iii Let C denote the event that one of the three counters is red. Now

|C| = C(6, 1) × C(4, 2) = 6!/(1!(6 − 1)!) × 4!/(2!(4 − 2)!) = 6 × 6 = 36

Therefore, we have

p(C) = |C|/|S| = 36/120 = 3/10
iv Let D denote the event all three counters are white. Now

|D| = C(6, 0) × C(4, 3) = 6!/(0!(6 − 0)!) × 4!/(3!(4 − 3)!) = 1 × 4 = 4

Therefore, we have

p(D) = |D|/|S| = 4/120 = 1/30
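Since the four events are mutually exclusive and exhaustive, their probabilities must sum to 1; the sketch below checks this with exact fractions (the variable names are ours).

```python
from fractions import Fraction
from math import comb

total = comb(10, 3)   # |S| = 120

# r red counters and 3 - r white counters, for r = 3, 2, 1, 0.
probs = [Fraction(comb(6, r) * comb(4, 3 - r), total) for r in (3, 2, 1, 0)]

print([str(p) for p in probs])   # ['1/6', '1/2', '3/10', '1/30']
print(sum(probs))                # 1
```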
Exercise Two disks are, at the same time, taken from a box containing 3 black, 3 red and 3 yellow
disks. Find the probabilities that
i both disks are yellow,
ii neither of the two disks are yellow,
iii at least one of the two disks are yellow.
1.5 Conditional Probability
Suppose that you are meeting someone at an airport. The flight is likely to arrive on time; the probability
of that is 0.8. Suddenly it is announced that the flight departed one hour behind schedule. Now it has
a probability of only 0.05 of arriving on time. New information affected the probability of meeting this
flight on time. The new probability is called a conditional probability, where the new information, that
the flight departed late, is the condition.
Let A and B be events in a sample space S with p(B) > 0. The conditional probability of A, given
B, is denoted by p(A|B) and defined by

p(A|B) = p(A ∩ B) / p(B)
Example A fair die is tossed twice. We can determine the probability that the sum of the throws is
7, 8 or 9 given that the first throw is a 4 using the definition of conditional probability. Let
S = {(i, j) : i, j ∈ {1, 2, 3, 4, 5, 6}}
Since the die is fair, S is an equiprobable sample space.
Let B = {the f irst throw is a 4}, therefore
B = {(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)}
Therefore, we have

p(B) = |B|/|S| = 6/36 = 1/6
Let A = {the sum is 7, 8 or 9}, therefore
A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (2, 6), (3, 5), (4, 4), (5, 3),
(6, 2), (3, 6), (4, 5), (5, 4), (6, 3)}
Therefore, we have

p(A) = |A|/|S| = 15/36 = 5/12
Note that p(A) is not what we are trying to find.
From the explicit lists of the sets A and B, it follows that
A ∩ B = {(4, 3), (4, 4), (4, 5)}
Hence,
p(A ∩ B) = |A ∩ B|/|S| = 3/36 = 1/12
The required probability is
p(A|B) = p(A ∩ B)/p(B) = (1/12)/(1/6) = 1/2
Thus, in this case, the conditional probability of A given B is greater than the original probability of A – the
occurrence of B renders the occurrence of A “more likely”.
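For a finite equiprobable sample space, p(A|B) = |A ∩ B|/|B|, so conditional probabilities can also be found by brute-force enumeration. A minimal sketch for the two-dice example above:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))             # all 36 ordered pairs (i, j)

B = [(i, j) for (i, j) in S if i == 4]                # first throw is a 4
A = [(i, j) for (i, j) in S if i + j in (7, 8, 9)]    # sum is 7, 8 or 9
A_and_B = [outcome for outcome in A if outcome in B]

print(Fraction(len(A_and_B), len(B)))                 # p(A|B) = 3/6 = 1/2
```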
Example Ninety percent of flights depart on time. Eighty percent of flights arrive on time. Seventy-five
percent of flights depart on time and arrive on time.
i You are meeting a flight that departed on time. What is the probability that it will arrive on
time?
ii You have met a flight, and it arrived on time. What is the probability that it departed on time?
Let A = {f light arrived on time} and let D = {f light departed on time}.
We have
p(A) = 0.8 , p(D) = 0.9 , p(A ∩ D) = 0.75
i The required probability is

p(A|D) = p(A ∩ D)/p(D) = 0.75/0.9 = 0.8333

ii The required probability is

p(D|A) = p(D ∩ A)/p(A) = 0.75/0.8 = 0.9375
1.6 Independence
The definition of conditional probability is sometimes restated as a “multiplication theorem” as follows:
for events A and B,
p(A ∩ B) = p(A|B)p(B)
A similar equation holds with A and B interchanged:
p(A ∩ B) = p(B|A)p(A)
We can now give an intuitive and clear definition of independent events. Events A and B are
independent if the occurrence of B does not affect the probability of A, i.e.,
p(A|B) = p(A)
In other words, events A and B are independent if and only if

p(A ∩ B) = p(A)p(B)

otherwise, they are dependent.
Example Three fair coins are tossed. Let A denote the event of all heads or all tails and let B denote
the event of at least two heads. Are the events A and B independent?
We need only investigate if
p(A ∩ B) = p(A)p(B)
Let
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Since the coins are fair, S is an equiprobable sample space.
Let A = {all heads or all tails}, therefore

A = {HHH, TTT}

Therefore, we have

p(A) = |A|/|S| = 2/8 = 1/4

Let B = {at least two heads}, therefore

B = {HHH, HHT, HTH, THH}

Hence

p(B) = |B|/|S| = 4/8 = 1/2

Now A ∩ B = {HHH}, hence

p(A ∩ B) = |A ∩ B|/|S| = 1/8

Since

p(A)p(B) = (1/4).(1/2) = 1/8 = p(A ∩ B)
we conclude that the events A and B are independent.
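The same conclusion can be reached by enumerating the eight equally likely outcomes; a minimal sketch, assuming fair coins:

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))                # the 8 equally likely outcomes

A = [w for w in S if len(set(w)) == 1]           # all heads or all tails
B = [w for w in S if w.count("H") >= 2]          # at least two heads
AB = [w for w in A if w in B]

pA, pB, pAB = (Fraction(len(E), len(S)) for E in (A, B, AB))
print(pA, pB, pAB, pA * pB == pAB)               # 1/4 1/2 1/8 True
```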
Remark In the above example, we use our knowledge of the probabilities of the two events and their
joint occurrence to prove that they are independent. In many other situations, we may assume that
two events are independent, and use this assumption, together with the definition of independence, to
calculate the probability of the joint occurrence from the probability of the two separate events.
Example There are 6 computers, of which 2 are defective, and 10 printers, of which 3 are defective.
Choose at random 1 computer and 1 printer. Find the probabilities that
i both are non-defective,
ii the computer is defective, the printer is non-defective,
iii the computer is non-defective, the printer is defective,
iv both are defective.
We could solve this problem using counting techniques since the sample space is equiprobable,
however, assuming independence outlined in the remark above we can now proceed as follows:
Let A denote the event that the chosen computer is defective, then A′ is the event that the chosen
computer is non-defective. Let B denote the event that the chosen printer is defective, so B′ is the
event that the chosen printer is non-defective. The choices of computer and printer are assumed to be
independent, so the following four pairs of events are independent.
{A, B}, {A, B′}, {A′, B}, {A′, B′}

Hence

i p(A′ ∩ B′) = p(A′)p(B′) = (4/6).(7/10) = 7/15
ii p(A ∩ B′) = p(A)p(B′) = (2/6).(7/10) = 7/30
iii p(A′ ∩ B) = p(A′)p(B) = (4/6).(3/10) = 1/5
iv p(A ∩ B) = p(A)p(B) = (2/6).(3/10) = 1/10
We can now return to a problem considered earlier and consider an alternative approach to the
solution.
Example A container has 6 red counters and 4 white counters. Three counters are selected at random,
one at a time and without replacement. Find the probabilities that
i all three counters are red,
ii two of the three counters are red,
iii one of the three counters is red,
iv all three counters are white.
(Figure: three counters sampled one at a time, without replacement, from the container U.)
i Let A denote the event that all three counters are red. Therefore,

p(A) = (6/10).(5/9).(4/8) = 1/6

ii Let B denote the event that two of the three counters are red. Now,

p(B) = (4/10).(6/9).(5/8) = 1/6
This probability refers to the arrangement WRR (i.e., white, red, red). However, this can occur
in two further ways as RWR and RRW. These events have the same probability of one chance in
six and furthermore the three events are mutually exclusive. Hence the required probability is
p(B) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
iii Let C denote the event that one of the three counters is red. Now,

p(C) = (4/10).(3/9).(6/8) = 1/10

This probability refers to the arrangement WWR (i.e., white, white, red). However, this can occur
in two further ways as RWW and WRW. These events have the same probability of one chance in
ten and furthermore the three events are mutually exclusive. Hence the required probability is

p(C) = 1/10 + 1/10 + 1/10 = 3/10
iv Let D denote the event that all three counters are white. Therefore,

p(D) = (4/10).(3/9).(2/8) = 1/30
Example Out of six computer chips, two are defective. If two chips are randomly chosen for testing
(without replacement), compute the probability that both of them are defective.
p(Defective) = 2/6 = 1/3

p(Both Defective) = (2/6).(1/5) = 2/30 = 1/15
Example A new computer virus can enter the system through e-mail or through the internet. There is
a 30% chance of receiving this virus through the e-mail. There is a 40% chance of receiving it through
the internet. Also, the virus enters the system simultaneously through the e-mail and the internet with
probability 0.15. What is the probability that the virus will not enter the system at all?

p(A) = 0.3 , p(B) = 0.4 , p(A ∩ B) = 0.15
where A denotes the event of receiving the virus through e-mail, B denotes the event of receiving
the virus through the internet, A ∩ B denotes receiving the virus through the e-mail and through the
internet. Now
p(A ∪ B) = p(A) + p(B) − p(A ∩ B) = 0.3 + 0.4 − 0.15 = 0.55
Finally, the probability that the virus will not enter the system at all, denoted p((A ∪ B)′), is

p((A ∪ B)′) = 1 − p(A ∪ B) = 1 − 0.55 = 0.45
Example A computer lab contains 6 computers. At any given time, for each computer, there is a
probability of 0.9 that it is working. What is the probability that
i all the computers are working,
ii at least one computer is working.
Let A denote the event that a computer is working.

p(A) = 0.9

Let A′ denote the event that a computer is not working.

p(A′) = 1 − p(A) = 1 − 0.9 = 0.1

i Assuming the computers work independently of one another,

p(all working) = p(A ∩ A ∩ A ∩ A ∩ A ∩ A) = p(A).p(A).p(A).p(A).p(A).p(A) = (0.9)^6 = 0.5314
ii Now

p("at least one" working) = 1 − p(none are working)

Hence

p(none are working) = p(A′ ∩ A′ ∩ A′ ∩ A′ ∩ A′ ∩ A′) = p(A′).p(A′).p(A′).p(A′).p(A′).p(A′) = (0.1)^6

Therefore

p("at least one" working) = 1 − (0.1)^6 = 0.999999
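Both answers follow from the independence assumption in one line each; a sketch:

```python
p_working = 0.9   # probability that any one computer is working
n = 6             # number of computers in the lab

p_all_working = p_working ** n              # (0.9)^6
p_at_least_one = 1 - (1 - p_working) ** n   # 1 - (0.1)^6

print(round(p_all_working, 4))   # 0.5314
print(p_at_least_one)            # 0.999999
```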
Exercise Two events A and B are independent.
p(A) = 1/5 , p(B) = 1/7
Find
i p(A ∩ B)
ii p(A ∪ B).
Exercise Suppose that after 10 years of service, 32% of computers have problems with motherboards
(MB), 25% have problems with hard drives (HD), and 5% have problems with both MB and HD. What
is the probability that a 10-year old computer still has a functioning MB and HD?
Exercise A box of 25 mobile phones contains 10 phones equipped with WiFi but without a camera, 9
phones equipped with a camera but without WiFi and 6 phones equipped with both. If 5 phones are
chosen at random (without replacement) from this box, what is the probability that
i none of the 5 phones has both WiFi and a camera,
ii all 5 phones have a camera,
iii at least one of the phones has WiFi,
iv exactly two phones have WiFi but no camera, and the other three phones have a camera.
2 Some Basic Statistics
Statistics is the science of collecting, studying and analysing numerical data. The numerical data
could be for example official statistics on employment, or on imports and exports, or monthly meteorological records for a particular region. The subject divides into two branches. Descriptive Statistics is
mainly concerned with collecting, summarising and interpreting data. Inferential Statistics is concerned
with methods for obtaining and analysing data to make inferences applicable in a wider context (e.g.,
from sample to population). It is concerned also with the precision and reliability of such inference
insofar as this involves probabilistic considerations. In this context statistics may be described as a
branch of mathematics based on probability theory.
We introduce basic statistical analysis by considering measures of central tendency and measures
of dispersion. We assume that the data given is in the form of a frequency distribution, or a frequency
distribution has been constructed from the raw data.
Example The following is a record of the percentage marks gained by candidates in an examination:
65  86  30  63  50  16  56  35  93  45
57  39  44  75  25  45  74  93  84  25
57  50  34  55  85  12  50  54  28  77
55  48  78  15  27  79  68  26  66  80
20  83  36  96  75  50  52  67  62  91
54  71  63  51  40  46  61  62  57  67
52  66  67  54  37  46  40  51  45  53
49  54  55  52  46  59  38  52  43  55
58  51  40  53  42  57  57  54  47  51
52  27  56  42  86  50  31  61  33  36
Looking at the figures there are 100 marks ranging from 12 to 96. Say, we set up 9 classes each of
width 10 for our frequency distribution.
Marks Awarded    Number of Candidates
10 − 19          3
20 − 29          7
30 − 39          10
40 − 49          16
50 − 59          34
60 − 69          13
70 − 79          7
80 − 89          6
90 − 99          4
With the data in this form we can represent it pictorially using a histogram.
(Histogram of the marks: frequency on the vertical axis, mark classes 10−19 to 90−99 on the horizontal axis.)
From the histogram we can determine the mode. We can easily construct a cumulative frequency
curve and determine from this graph the median of the data set as well as other measures of location.
Also, with the data in this form we can easily determine the mean and the standard deviation. We
must define each of the terms mentioned.
Firstly, a frequency distribution will take the form

value      x_1   x_2   x_3   ...   x_n
frequency  f_1   f_2   f_3   ...   f_n

A group frequency distribution summarises data into groups of values and takes the form

class      x_1 − x_2   x_2 − x_3   x_3 − x_4   ...
frequency  f_1         f_2         f_3         ...
Statistics under the heading of measures of central tendency include mode, median and mean.
Definition The mode of a dataset is the observation that occurs most frequently in the dataset.
Definition The median of a dataset is the middle number when the observations are arranged in
ascending order.
Definition The mean of a dataset is the sum of the observations in the dataset divided by the number
of observations in the dataset.
x̄ = (1/n) Σ f x

where

n = Σ f
The most common measure of dispersion is the standard deviation.
Definition The standard deviation of a dataset measures the typical spread of the observations about
the mean and is given by the following formula

s = [ (1/(n − 1)) Σ f d² ]^(1/2)

where

d = x − x̄
Remark For raw data, we have

x̄ = (1/n) Σ x

and

s = [ (1/(n − 1)) Σ d² ]^(1/2)

where n is the number of observations and d = x − x̄. These formulae are similar to the above with the
frequency f removed.
Example Consider the following raw data
23 , 44 , 34 , 24 , 61 , 45 , 53 , 39
Firstly, we can get the required totals as follows

x      x − x̄       (x − x̄)²
23     −17.375     301.89
44     3.625       13.14
34     −6.375      40.64
24     −16.375     268.14
61     20.625      425.39
45     4.625       21.39
53     12.625      159.39
39     −1.375      1.89
323                1231.87
Now

x̄ = (1/n) Σ x

Hence

x̄ = (1/8)(323) = 40.375

Also

s = [ (1/(n − 1)) Σ d² ]^(1/2)   where d = x − x̄

Hence

s = [ (1/7)(1231.87) ]^(1/2) = 13.266
Exercise Find the mean x̄ and standard deviation s of the following raw data
150 , 488 , 600 , 125 , 179 , 315 , 208 , 82 , 263 , 859
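Both the worked example above and this exercise can be checked with Python's statistics module; statistics.stdev uses the same n − 1 divisor as the formula. A sketch for the worked example:

```python
import statistics

data = [23, 44, 34, 24, 61, 45, 53, 39]

print(statistics.mean(data))    # 40.375
print(statistics.stdev(data))   # ~13.266 (sample standard deviation, divisor n - 1)
```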
Example The number of defective items produced each day by a production process is recorded in the
table below:
Number of Defectives    Number of Days
40 − 44                 14
45 − 49                 29
50 − 54                 44
55 − 59                 38
60 − 64                 25
65 − 69                 10
i Calculate the mean x̄ and standard deviation s of the distribution.
ii Plot a graph of the cumulative frequency curve.
iii Estimate from this graph the median of the data.
i Let x = mid-interval value.
x     f     fx      x − x̄    (x − x̄)²    f(x − x̄)²
42    14    588     −11.9    141.61      1982.54
47    29    1363    −6.9     47.61       1380.69
52    44    2288    −1.9     3.61        158.84
57    38    2166    3.1      9.61        365.18
62    25    1550    8.1      65.61       1640.25
67    10    670     13.1     171.61      1716.10
      160   8625                         7243.60

Now

x̄ = (1/n) Σ f x   where n = Σ f

Hence

x̄ = (1/160)(8625) = 53.9

Also

s = [ (1/(n − 1)) Σ f d² ]^(1/2)   where d = x − x̄

Hence

s = [ (1/159)(7243.6) ]^(1/2) ≈ 6.75
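For grouped data the same calculation runs over the mid-interval values and their frequencies; the sketch below reproduces the working above (variable names are ours, and each class is represented by its mid-interval value).

```python
from math import sqrt

midpoints   = [42, 47, 52, 57, 62, 67]
frequencies = [14, 29, 44, 38, 25, 10]

n = sum(frequencies)                                              # 160
mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n     # 8625 / 160

ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))
s = sqrt(ss / (n - 1))                                            # Bessel's correction

print(round(mean, 2), round(s, 2))                                # 53.91 6.75
```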
ii To plot a cumulative frequency curve for this example we tabulate as follows

x     f     Cumulative frequency
42    14    14
47    29    43
52    44    87
57    38    125
62    25    150
67    10    160
(Cumulative frequency curve: cumulative frequency plotted against x, rising from 14 at x = 42 to 160 at x = 67.)
iii To estimate the median from a cumulative frequency curve we estimate the middle-most measurement, i.e., the value 50% of the way through the distribution, as follows
(Cumulative frequency curve with the median read off at a cumulative frequency of 80, i.e., half of the total frequency of 160.)
The median is approximately 51.
Remark Some other measures of location can be estimated from the cumulative frequency graph,
namely the first and third quartiles.
The three quartiles of raw data or a frequency distribution are those numbers that lie one-quarter,
one-half and three-quarters of the way through the distribution, and are called the lower, middle and
upper quartiles.
The three quartiles are denoted by Q_1, Q_2 and Q_3. The median is Q_2.
A further measure of dispersion is the inter-quartile range Q_3 − Q_1.
Remark Finally, we can estimate the mode (the observation that occurs most frequently) of any
grouped frequency distribution by firstly constructing its histogram and using the following construction.
(Histogram of the marks with the standard diagonal construction drawn on the tallest bar, the 50−59 class, to locate the mode.)
From this construction the mode is estimated to be 54.
Exercise Expenditure on monthly mobile phone bills is recorded in the table below:
Monthly Bill    Number of People
10 < 20         12
20 < 30         27
30 < 40         47
40 < 50         33
50 < 60         21
60 < 70         10
i Calculate the mean x̄ and standard deviation s of the distribution.
ii Plot a graph of the cumulative frequency curve.
iii Estimate from this graph the median of the data and the first and third quartiles.
Frequency curves of distributions may be relatively symmetric, but more often are skewed to some
extent. The following frequency curves indicate symmetric, moderately left-skewed and moderately
right-skewed distributions.
(Figure: three frequency curves illustrating symmetric, moderately left-skewed and moderately right-skewed distributions, with the positions of the mode, median and mean marked on each.)
Consider the results of the Irish Leaving Certificate Mathematics Examination. Which frequency
curve would indicate that the mathematics paper had been set at a satisfactory standard?
3 Probability Distributions

3.1 The Normal Distribution
From samples of data we now move to large data sets which we refer to as a population. To distinguish
between sample and population we have a change in notation for the mean and standard deviation.
For a sample of size n:
x̄ = mean , s = standard deviation
For a population of size N:
µ = mean , σ = standard deviation
Sometimes it is difficult to calculate the standard deviation of the population. In such cases it is
common for the standard deviation of the population σ to be estimated by examining a random sample
taken from the population. A common estimator of σ is an adjusted version of the formula for the
standard deviation of the sample. This formula is

s = [ (1/(n − 1)) Σ f d² ]^(1/2)

where

d = x − x̄
where we use n − 1 instead of n. This is called Bessel’s correction. Using n instead of n − 1 tends
to underestimate the population standard deviation.
What do we mean when we say that a dataset is normally distributed?
(The normal frequency curve: a symmetric bell-shaped curve centred at the mean µ, with the points µ ± σ, µ ± 2σ and µ ± 3σ marked on the x-axis.)
If a dataset is normally distributed (i.e., conforms to the symmetric bell-shaped frequency curve
shown above), the following characteristics hold:
i approximately 68% of the data lies within 1 standard deviation of the mean, i.e., µ ± σ
ii approximately 95% of the data lies within 2 standard deviations of the mean, i.e., µ ± 2σ
iii approximately 99.7% of the data lies within 3 standard deviations of the mean, i.e., µ ± 3σ
Different datasets will have different means and different standard deviations, so we standardise a
given dataset by moving the mean to the origin 0 (i.e., subtracting µ) and rescaling to a standard
deviation of 1 (i.e., dividing by σ); the total area under the standardised frequency curve is then 1.
We introduce the standard score z as

z = (x − µ) / σ

This quantity is simply the number of standard deviations by which x exceeds the mean. The
standard normal curve has mean µ = 0 and standard deviation σ = 1.
(The standard normal curve: mean µ = 0, standard deviation σ = 1, with the x-axis marked from −3 to 3.)
The equation of this curve is

f(x) = (1/√(2π)) e^(−x²/2)

The area under this curve evaluates to 1. We make a link between this statistical distribution and
probability by recalling the second axiom of basic probability, p(S) = 1. Answering probability questions
based on normally distributed data simply amounts to determining areas under the standard normal curve.
Exercise Describe a normal distribution, paying particular attention to precise definition of its most
important properties. Give examples of statistical data which might be expected to conform closely to
a normal distribution.
If z is a random variable with standard normal distribution, find
i p(z ≤ 1.5)
ii p(z ≥ 1.75)
iii p(1.5 ≤ z ≤ 1.75)
Example The time taken by a postman to deliver letters to a certain apartment block is normally
distributed with mean µ = 12 minutes and standard deviation σ = 2 minutes. Estimate the probability
that he takes
i longer than 17 minutes
Firstly, convert this measurement to a standard score

z = (x − µ)/σ = (17 − 12)/2 = 2.5

The question is now equivalent to determining p(z ≥ 2.5).
(Standard normal curve with the area to the right of z = 2.5 shaded.)
From the normal distribution tables φ(2.5) = 0.4938.
The required proportion (probability) is 0.5 − 0.4938 = 0.0062.
ii less than 10 minutes
Firstly, convert this measurement to a standard score

z = (x − µ)/σ = (10 − 12)/2 = −1.0

The question is now equivalent to determining p(z ≤ −1.0).
(Standard normal curve with the area to the left of z = −1.0 shaded.)
From the normal distribution tables φ(1.0) = 0.3413.
The required proportion (probability) is 0.5 − 0.3413 = 0.1587.
iii between 9 and 13 minutes
Firstly, convert each measurement to a standard score

z1 = (x − µ)/σ = (9 − 12)/2 = −1.5
z2 = (x − µ)/σ = (13 − 12)/2 = 0.5

The question is now equivalent to determining p(−1.5 ≤ z ≤ 0.5).
(Standard normal curve with the area between z = −1.5 and z = 0.5 shaded.)
From the normal distribution tables φ(0.5) = 0.1915 and φ(1.5) = 0.4332.
The required proportion (probability) is 0.1915 + 0.4332 = 0.6247.
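The three table look-ups can be cross-checked with statistics.NormalDist, which returns the cumulative area directly, so no halves of the curve need to be added or subtracted:

```python
from statistics import NormalDist

delivery = NormalDist(mu=12, sigma=2)   # delivery time in minutes

print(round(1 - delivery.cdf(17), 4))                 # p(X > 17)      = 0.0062
print(round(delivery.cdf(10), 4))                     # p(X < 10)      = 0.1587
print(round(delivery.cdf(13) - delivery.cdf(9), 4))   # p(9 < X < 13)  = 0.6247
```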
Exercise The amounts due on monthly mobile phone bills in Ireland are normally distributed with
mean µ = 53 euro and standard deviation σ = 15 euro. If a monthly phone bill is chosen at random,
find the probability that the amount due on the bill is
i greater than 53.3 euro.
ii between 47 euro and 74 euro.
Exercise Assume the speed of vehicles along a stretch of road is normally distributed with a mean µ =
119 km/h and a standard deviation σ = 12 km/h.
i The current speed limit is 100 km/h. What proportion of vehicles are travelling at or below the
speed limit?
ii What proportion of vehicles would be travelling faster than 125 km/h?
iii A new speed limit will be initiated such that approximately 10% of vehicles will be over the speed
limit. What is the new speed limit based on this criterion?
3.2 The Binomial Distribution
Definition A random variable is a function
X : S −→ R
Consider an experiment of tossing 3 fair coins and counting the number of heads. Let X be the
number of heads. Prior to the experiment, its value is not known. All we can say is that X has to be
an integer between 0 and 3. Since X taking each particular value is an event, we can compute probabilities:
P(X = 0) = P(TTT) = (1/2).(1/2).(1/2) = 1/8
P(X = 1) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8
P(X = 2) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
P(X = 3) = P(HHH) = (1/2).(1/2).(1/2) = 1/8
Summarising, we have
x          0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8
Definition This table is called the probability distribution of X. The function

P(x) = P(X = x)

is called the probability mass function.
A discrete random variable is one whose possible values can be listed; in most practical cases the values
taken are non-negative integers. Examples include the number of jobs submitted to a printer, the number
of errors, the number of error-free modules, and the number of failed components. A continuous random
variable is one that can take any value in an interval, finite or infinite. Examples of continuous variables
include various times (software installation time, code execution time, connection time, waiting time, etc.),
and also physical variables like weight, height, temperature, voltage, distance, the number of miles per gallon,
etc. Rounding a continuous random variable to the nearest integer makes it discrete.
The simplest random variable takes just two possible values. Call them 0 and 1.
Definition A random variable with two possible values, 0 and 1, is called a Bernoulli variable. Any
experiment with precisely two outcomes is called a Bernoulli trial.
The Bernoulli variable is named after the Swiss mathematician Jacob Bernoulli (1654–1705). Good or
defective components, parts that pass or fail tests, transmitted or lost signals, working or malfunctioning
hardware, sites that contain or do not contain a keyword, men and women, heads and tails, and so on,
are all examples of Bernoulli trials. All these examples can be categorised by using generic names for the
two outcomes – “successes” and “failures”. These are commonly used generic names; in fact, successes
do not have to be good, and failures do not have to be bad.
Now consider a sequence of independent Bernoulli trials and count the number of successes in it.
This may be the number of defective computers in a shipment, the number of updated files in a folder,
the number of girls in a family, the number of e-mails with attachments, etc.
Definition A variable defined as the number of successes in a sequence of n independent Bernoulli
trials has a Binomial distribution. The binomial probability mass function is

P(r) = C(n, r).p^r.(1 − p)^(n−r)

where p denotes the probability of a success on a single trial and q = (1 − p) denotes the probability
of failure on a single trial.
Example As part of a business strategy, a random selection of 20% of new internet service subscribers
receive a special promotion from the provider. A group of 10 neighbours signs up for the service. What is
the probability that exactly 2 of them get a special promotion?
Let X denote the number of people in the group who receive a special promotion. Each person receives
the promotion with probability 0.2, independently of the others, hence p = 0.2 and n = 10. The required
probability is

P(2) = C(10, 2).(0.2)^2.(0.8)^8 = 0.302
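The binomial probability mass function is one line of Python with math.comb; the sketch below reproduces this example (the function name is ours).

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(r) = C(n, r) * p^r * (1 - p)^(n - r)"""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

print(round(binomial_pmf(2, n=10, p=0.2), 3))   # 0.302
```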
Exercise A machine produces 20% defective components. In a random sample of 6 components,
determine the probability that
i there will be 4 defective items,
ii there will be not more than 3 defective items, and
iii all the items will be non-defective.
Exercise A manufacturer of integrated circuits knows from experience that 5% of the components he
produces are defective. The quality control procedure is to reject an assembly-line run if two or more
components out of a random sample of five components are defective. Use the binomial distribution to
determine the probability of rejecting an assembly-line run.
Exercise A manufacturer of electronic components employs the following quality control plan. Out of
a batch of components, 15 are randomly selected and tested. If 3 or more of the 15 components are
found defective, the entire batch is rejected. Find, using the binomial distribution and correct to three
decimal places, the probability that a batch will be rejected if 8% of its components are defective.
Exercise A company produces re-writable compact disks and sells them in boxes of 10. The production
process gives rise to 2% of the CD’s packed and sold being faulty. Using the binomial distribution
determine the probability that
i exactly one CD is faulty in a box
ii more than one CD is faulty in a box.
In a delivery to the college of 1000 boxes of 10 re-writable CD’s from this company, in how many
boxes would you expect more than one faulty disk in a box?
Contents

1 Basic Probability
  1.1 Basic Probability – Definitions
  1.2 Some Examples
  1.3 Combinatorics
  1.4 Further Examples
  1.5 Conditional Probability
  1.6 Independence
2 Some Basic Statistics
3 Probability Distributions
  3.1 The Normal Distribution
  3.2 The Binomial Distribution