MaMaEuSch
Management Mathematics for European Schools
http://www.mathematik.uni-kl.de/~mamaeusch
An approach to Calculus of Probabilities through real situations
Paula Lagares Barreiro∗
Federico Perea Rojas-Marcos∗
Justo Puerto Albandoz∗
MaMaEuSch†
Management Mathematics for European Schools
94342 - CP - 1 - 2001 - 1 - DE - COMENIUS - C21
∗ University of Seville
† This project has been carried out with the partial support of the European Community in the framework of the Sokrates programme. The content does not necessarily reflect the position of the European Community, nor does it involve any responsibility on the part of the European Community.
Contents

1 Random and probability
  1.1 Objectives
  1.2 The mus game
  1.3 Random experiments
  1.4 Random events and sample space
    1.4.1 Outcomes and random events
    1.4.2 Consistent events and inconsistent events
    1.4.3 Certain event
    1.4.4 Impossible event
    1.4.5 The complement of an event
  1.5 Operations on random events
    1.5.1 Union of events: one event OR another
    1.5.2 Intersection of events: one event AND another
    1.5.3 Difference of events
    1.5.4 Properties of the operations with events

2 Probability
  2.1 Introduction
    2.1.1 Definition of probability from the relative frequencies: empirical probability
    2.1.2 Laplace's rule: theoretical probability
  2.2 Extractions with replacement and extractions without replacement. Tree diagrams
    2.2.1 Extractions with replacement
    2.2.2 Extractions without replacement
  2.3 Axiomatic definition of probability
  2.4 Calculus of probabilities in more complex cases
    2.4.1 Conditional probability
    2.4.2 Independence of random events
    2.4.3 Total probability
    2.4.4 Bayes's rule
  2.5 Answer to the initial question

3 One-Dimensional Probability Distributions
  3.1 Objectives
  3.2 The example
  3.3 Introduction. Discrete Random Variable and Probability Distributions
  3.4 Cumulative Probability Function
  3.5 The Mode
  3.6 The expectation
  3.7 The variance
  3.8 Resume of the initial question

4 An example of discrete random variable: the binomial distribution
  4.1 Objectives
  4.2 The example
  4.3 Introduction
    4.3.1 The expectation
    4.3.2 The variance

5 Continuous distributions: normal distribution
  5.1 Objectives
  5.2 The example
  5.3 Introduction
Chapter 1
Random and probability
Let us play mus. The cards are dealt and it is time to decide how much to bet. We have to take
into account that we do not play alone: we compete with other participants. Nervously, we watch
our cards as we receive them. Which cards will they be? Will they be better than those of the other
players?
Before doing anything, we present the objectives that will be covered in this manual and give
some rules for playing mus.
1.1 Objectives
• Understanding the concept of a random experiment and distinguishing it from a deterministic one.
• Identifying random events after an experiment and telling the difference between an event and an outcome.
• Finding some special events: the impossible event and the certain event.
• Operating with random events and interpreting the events resulting from unions, intersections and differences.
• Assigning probabilities to simple random events in two different ways: from relative frequencies and from Laplace's rule in the case of equally likely outcomes.
• Understanding the notion of conditional probability and its uses.
• Understanding the independence of random events and its use in the calculus of probabilities.
• Working with the rule of total probability and Bayes's rule, their differences and their applicability in the calculus of probabilities.
1.2 The mus game
We are going to play a simple card game, in which two teams of two players each take part. The
winning team will not receive any money; they will only win a good time with their friends. In order
to play a good game, we need a deck of 40 cards, distributed in the following way:
• Eight aces.
• Four fours.
• Four fives.
• Four sixes.
• Four sevens.
• Four jacks.
• Four horses.
• Eight kings.
Deal four cards to each player randomly from the deck. Then, you can have one of the following hands:
• If you have two equal cards, and the other two are different from each other and from the first
ones, you have a pair. For example, this hand is a pair of horses: (five, horse, horse, ace).
• Having three equal cards and a fourth different one is a trio. A hand like (six, king, king, king)
is a trio of kings.
• Two pairs within your four cards is a duplex. They can be two different pairs or two equal pairs.
(ace, king, king, ace) and (ace, ace, ace, ace) are two different duplexes.
In this game, a duplex has more value than a trio and a trio is better than a pair. In the case of
having two pairs, two trios or two duplexes, the winning hand will be the one with the highest cards
in the pair, in the trio or in one of the pairs of the duplex.
The cards are ranked from lowest to highest in the following order: ace, four, five, six, seven,
jack, horse and king.
For example, a duplex of kings and aces will win over a duplex of jacks and horses, because the
highest pair of the first duplex (the pair of kings) is higher than the highest pair of the second duplex
(the pair of horses). A pair of jacks is better than a pair of sixes for the same reason.
In the case of having the same duplex, a trio of the same cards or a pair of the same cards, the
winner will be the one who has the highest cards apart from the pair or the trio, in each case.
If two hands have exactly the same four cards, the winner will be the one who received his cards
first, i.e., the person sitting to the right of the dealer.
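To make the setting concrete, here is a minimal Python sketch of the deck and of the deal described above; the function names make_deck, deal_hand and classify are our own and not part of the game's vocabulary. It builds the 40-card deck, deals a random four-card hand and labels it as a pair, a trio, a duplex or nothing.

```python
import random
from collections import Counter

def make_deck():
    """The mus deck: eight aces, eight kings and four cards of each other kind."""
    deck = ["ace"] * 8 + ["king"] * 8
    for kind in ("four", "five", "six", "seven", "jack", "horse"):
        deck += [kind] * 4
    return deck                                  # 40 cards in total

def deal_hand(deck):
    """Deal four cards at random, without replacement."""
    return random.sample(deck, 4)

def classify(hand):
    """Label a four-card hand as 'duplex', 'trio', 'pair' or 'nothing'."""
    counts = sorted(Counter(hand).values(), reverse=True)
    if counts in ([2, 2], [4]):     # two pairs, or four equal cards
        return "duplex"
    if counts[0] == 3:              # three equal cards and a different one
        return "trio"
    if counts[0] == 2:              # exactly one pair
        return "pair"
    return "nothing"

hand = deal_hand(make_deck())
print(hand, "->", classify(hand))
```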
Suppose that four friends have spent a lot of time playing this game and they have noticed that
a pair of kings, a trio of kings or aces, and a duplex of any kind have come up a similar number of
times. They are discussing which of these hands is the most probable. What do you think about this?
Do not answer this question now; you will be able to do it when you finish this chapter. Good luck.
1.3 Random experiments
Example 1.3.1 Imagine the following situation: the cards of the mus game are dealt. Do we know
which cards we have before seeing them?
As you see, we can not have any certainty about the cards that we receive until we see them. We
can get three kings and an ace, or four jacks. Both possibilities, and many more, can occur in the
distribution of our cards. The fact of not having certainty about the result of the distribution is
called randomness.
In our case, we have an experiment: take four cards out of the pack. Since several results can
occur after performing the experiment, we say that it is a random experiment.
If we knew the result of the experiment before doing it, we would say that it is a deterministic
experiment. For instance, if we drop a stone from our hand, we know that it will fall to the ground.
Here there is no possibility of having different results, only one: the stone will fall to the ground.
So we can say that the biggest difference between a random experiment and a deterministic one
is that the first one can give different results and we can not predict which one will happen, whereas
in the deterministic experiment there is only one possibility, and that is the one that will occur.
Exercise 1.3.1 Describe two random experiments and two deterministic ones.
Definition 1.3.1 (Random experiment) A random experiment is any procedure or situation
that produces a definite outcome that may not be predictable in advance.
1.4 Random events and sample space
Once we understand what a random experiment is, we will naturally ask ourselves questions
about what will occur.
It is possible to see that after the card distribution we can obtain a large number of combinations
containing different cards. We can get hands like (ace, king, ace, jack) or (four, king, five, seven).
Each of these possible hands will be called a random event.
That is to say, in the random experiment described earlier (dealing four cards of the mus deck),
the hand (ace, seven, jack, six) is a random event.
The set of all the possible results of a random experiment is called the sample space, and we
denote it by E. In our experiment, the sample space is the set consisting of all the possible hands.
Exercise 1.4.1 Let us imagine the following random experiment: taking a card randomly out of
the mus deck. Describe the sample space of that experiment by enumerating all its elements.
Definition 1.4.1 (Sample space and event) The set of all the outcomes that we can obtain
after doing a random experiment is called sample space, and we denote it by E.
Any subset of the sample space is an event.
1.4.1 Outcomes and random events
We distinguish the different results we can obtain into two groups: outcomes and random events.
Imagine that you look at the first card of your hand and it is an ace. Now, imagine the following
results:
• the card watched is an ace,
• the card watched is lower than a seven.
You can notice that between these two possible results there is a big difference. In the first one
we specify exactly which kind of card it is, while in the second case we only say that the card can be
an ace, a four, a five or a six.
That is to say, the second result described (the card is lower than a seven) includes several
outcomes within it.
So we can say that the first possible result described is an outcome, while the second one is a
random event.
Definition 1.4.2 We say that a result of a random experiment is an outcome when it consists of
only one element of the sample space. In other cases we will call it a random event.
Example 1.4.1 Consider the following random experiment: take a card randomly out of a Spanish
deck. The elements of the sample space are: "ace of gold", "two of gold", ..., "king of basto".
The result "obtaining the king of gold" is an outcome. But if we choose the result "obtaining a king",
we have to say that it is a random event, because it is made up of four different outcomes: "obtaining
the king of cup", "obtaining the king of gold", "obtaining the king of sword" and "obtaining the
king of basto".
1.4.2 Consistent events and inconsistent events
Let us get back to the dealing of cards. We are going to get four cards each. Let us imagine two
random events:
• Event A = "Two of the four cards are kings",
• Event B = "Two of the four cards are aces".
Is it possible to obtain the event A and the event B at the same time? That is to say, is it possible to
obtain two kings and two aces in the same hand? The answer is yes. We can get a valuable duplex
of kings and aces.
As the event A and the event B can be obtained at the same time, we say that they are consistent
events. However, it is not always like this; there are pairs of events that can not occur at the same
time. As an example, imagine these two new events:
• Event C = "Three out of our four cards are kings",
• Event D = "Two out of our four cards are aces".
And we ask ourselves again, is it possible to get the event C and the event D at the same time?
That is to say, is it possible to obtain three kings and two aces in the same hand? In this case it is
not, because if we had three kings and two aces in the same hand, this would mean that we have
at least five cards, and in this game four cards is the maximum. Therefore, this is impossible.
So, because the event C and the event D can not occur in the same random experiment, we
say that they are inconsistent events.
Exercise 1.4.2 Find a pair of consistent events and a pair of inconsistent events in the same random
experiment; they must be different from the pairs given above.
Definition 1.4.3 If we have two random events of a random experiment, we will say that they are
consistent events if they can occur at the same time, and we will say that they are inconsistent
events if they can not occur at the same time.
1.4.3 Certain event
Let us imagine that we take our deck and divide it into two parts. In one of the parts we have
the aces (eight cards) and in the other part all the other cards (thirty-two cards). We choose the
part with the aces and we take a card out of this group at random. Can we assure anything? We
can. We can assure that the chosen card is an ace. That is the certain event: it is the event
that always occurs.
Exercise 1.4.3 In the dealing of the mus cards, find a certain event in your hand, that is to say,
within your four cards.
Definition 1.4.4 A certain event is the random event of a random experiment that always occurs.
We can also say that the certain event is the event made up of every outcome of the sample space.
1.4.4 Impossible event
However, in the same random experiment we can be sure not to get a six, because there is no six in
the part of the deck that we have chosen. Then we say that the random event "the chosen card is
a six" is an impossible event.
Exercise 1.4.4 In the dealing of mus, describe an impossible event in your hand, that is, within your
four cards.
Definition 1.4.5 We say that a random event is an impossible event when it never occurs.
1.4.5 The complement of an event
Suppose now that the deck has been divided into two parts or groups: in the first part we have
all the kings and all the aces (sixteen cards) and in the second part we have the rest of the cards
(twenty-four cards). Then, we choose a card from the first group. Let us look at the following
events:
• A = "The card chosen is an ace",
• B = "The card chosen is a king".
As a special characteristic of these events, we can say that if the event A does not occur then
the event B will occur, and if the event B does not occur then the event A will occur. That is, one
of them always occurs. You can also notice that they are inconsistent events, that is to say, they
can not occur at the same time.
A pair of random events has to satisfy these two conditions in order to say that they are
complementary events, or that each one is the complement of the other.
Exercise 1.4.5 Find a pair of complementary events in the random experiment of taking a card
out of the complete mus deck at random.
Definition 1.4.6 Two events are said to be complementary events if they are inconsistent (they can
not occur at the same time) and one of them always occurs. That is to say, the complement
of any event made up of some outcomes consists of all the other outcomes. If we denote a random
event by A, then its complement is denoted by Ā or Ac.
1.5 Operations on random events
In the same way as we operate with numbers (sums, differences, multiplications, ...), we can operate
with random events. But now the operations are different from the operations with numbers; we
will talk about unions of events, intersections of events and differences of events.
1.5.1 Union of events: one event OR another
Imagine that we deal the cards again in the mus game, that is, four cards to each player. So
we have the random experiment of taking four cards out of the deck and looking at them. Let
us describe two possible events of this random experiment:
• A = "Taking two kings out",
• B = "Taking an ace out".
Suppose that after the deal our cards are: (ace, jack, seven, seven). After seeing our cards,
has the event A occurred? It has not, because we do not have two kings in our cards.
And has the event B occurred? It has, because we have obtained an ace. In this case, we will say
that the event A or B has occurred, and we will denote it A ∪ B.
So, if we have two random events and one of them, or maybe both, occurs, then we will say
that the event A ∪ B has occurred.
Exercise 1.5.1 Consider again the previous random experiment: we deal four cards. Imagine the
following events:
• A = "We obtain three kings",
• B = "We obtain three aces".
From this, describe two distributions in which the event A ∪ B has occurred.
Definition 1.5.1 Given the event A and the event B, we define the event A OR B, and we denote
it by A ∪ B, as the event consisting in at least one of them occurring. Note that if both events
occur, the event A ∪ B occurs too.
1.5.2 Intersection of events: one event AND another
Let us now describe two new random events after the realization of the random experiment
consisting in dealing four cards of the mus deck:
• A = "Taking two aces out",
• B = "Taking a seven out".
Imagine that after the distribution of the cards we have this hand: (ace, king, seven, ace). After
observing it, has the event A occurred? Yes, because we have two aces within our cards. And
has the event B occurred? It has too, because our third card is a seven. Since both events have
occurred, we say that in this distribution the event A and B has occurred, and we denote this event
by A ∩ B.
Exercise 1.5.2 Think about the following random events after taking four cards out of the deck:
• A = "Taking two kings out",
• B = "Taking two aces out".
Describe a distribution in which the event A ∩ B has occurred and another in which the event A ∩ B
has not occurred.
Definition 1.5.2 Given the event A and the event B, the event A AND B is defined, and we
denote it by A ∩ B, as the random event consisting in both events, A and B, happening
at once.
Notice that if the intersection of two random events is the impossible event (∅), these
events are inconsistent (remember the definition of inconsistent events given in the previous section).
If the intersection of two random events is not the impossible event, then the events are consistent.
1.5.3 Difference of events
Let us now describe two new events after the distribution of cards in the mus game:
• A = "Taking three jacks out",
• B = "Taking a king out".
After the deal we have obtained the following cards: (jack, ace, jack, jack).
Has the event A occurred? Of course it has, because we have exactly three jacks within our four
cards. And has the event B occurred? Now we have to say no, because there are no kings among
our cards. Then we say that the event A minus B has occurred, and we will denote it by A \ B.
So every time one event has occurred and another has not, we will say that the difference event
between the first one and the second one has happened.
Exercise 1.5.3 We are facing the previous experiment again. Think about the following events:
• A = "We take two aces out",
• B = "We take two jacks out".
From this, propose a possible distribution in which the event A \ B occurs, and propose another in
which the event B \ A occurs.
Definition 1.5.3 Given the event A and the event B, the event A \ B is defined as the event
consisting in the occurrence of the event A and the non-occurrence of the event B.
1.5.4 Properties of the operations with events
The most frequently used properties are the following. We take into account that the event E is
the certain event and the event ∅ is the impossible event. The events A, B and C are any three
events, subsets of the sample space, and Ac is the complement of the event A.
Union:
A ∪ B = B ∪ A,  A ∪ E = E,  A ∪ ∅ = A,  A ∪ Ac = E.
Intersection:
A ∩ B = B ∩ A,  A ∩ E = A,  A ∩ ∅ = ∅,  A ∩ Ac = ∅.
Difference:
A \ B = A ∩ Bc.
De Morgan's laws:
(A ∪ B)c = Ac ∩ Bc,  (A ∩ B)c = Ac ∪ Bc.
Other properties:
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),  A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
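These properties can be checked mechanically on any finite sample space. The following Python sketch is our own illustration, not part of the text: it models the experiment "take one card out of the mus deck" with sets, choosing A = "the card is an ace" and B = "the card is a face card" as two concrete events.

```python
# Sample space of "take one card out of the deck": each of the 40 cards is an
# outcome, labelled by its kind and a counter so that all outcomes are distinct.
kinds = ["ace"] * 8 + ["king"] * 8
for kind in ("four", "five", "six", "seven", "jack", "horse"):
    kinds += [kind] * 4

E = {(kind, i) for i, kind in enumerate(kinds)}            # certain event
A = {c for c in E if c[0] == "ace"}                        # "the card is an ace"
B = {c for c in E if c[0] in ("jack", "horse", "king")}    # "the card is a face card"
Ac, Bc = E - A, E - B                                      # complements

assert A | Ac == E and A & Ac == set()          # A ∪ Ac = E,  A ∩ Ac = ∅
assert A - B == A & Bc                          # A \ B = A ∩ Bc
assert E - (A | B) == Ac & Bc                   # (A ∪ B)c = Ac ∩ Bc
assert E - (A & B) == Ac | Bc                   # (A ∩ B)c = Ac ∪ Bc
print("all properties hold on this sample space")
```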
Chapter 2
Probability
2.1 Introduction
In the previous chapter we said that when we perform a random experiment, we have no certainty
about the outcome that we will obtain. In other words, there is uncertainty, and we will 'measure'
this uncertainty with a number that we will associate with each random event; this number will be
called probability.
For instance, if we take a card out of the deck at random, we can not predict with certainty
which card it will be. But we know that there are more aces than sevens, so it is more logical to
think that there is a bigger chance of taking out an ace than a seven. So we will say that, after the
experiment, the random event "the card is an ace" has more chances of occurring than the event
"the card is a seven".
In this chapter we will show several techniques to measure how frequently the different events
occur, that is to say, we will introduce several methods for assigning probabilities.
Exercise 2.1.1 Write down two events after the distribution of the mus cards that you think have
very different probabilities of occurring and explain why.
2.1.1 Definition of probability from the relative frequencies: empirical probability
In the previous sections we said that probability is a number that we assign to each event of
a random experiment, and with this number we want to express how frequently the event occurs.
A straightforward way to obtain the probability of a random event is from the table of relative
frequencies of the experiment. This probability is called empirical probability because it is obtained
after performing the experiment. So, if we have done the experiment n times and, after observing
the results, we see that in k of these times the event that we are studying (let us call it A) has
happened, we say that the probability of the event A to occur, and we denote it by P(A), is
P(A) = k/n.
Example 2.1.1 Suppose that we are taking cards out of the deck one by one and, after seeing each
one, we replace it in the deck before taking the next one out. After 200 extractions we have obtained
the following frequency table:

Card      Relative frequency
ace       38/200
four      17/200
five      21/200
six       24/200
seven     21/200
jack      23/200
horse     18/200
king      38/200

From this table we could say that if we take a card out of the mus deck, we will obtain each card
with the following probabilities:
• P("the card is an ace") = 38/200 = 0.19,
• P("the card is a four") = 17/200 = 0.085,
• P("the card is a five") = 21/200 = 0.105,
• P("the card is a six") = 24/200 = 0.12,
• P("the card is a seven") = 21/200 = 0.105,
• P("the card is a jack") = 23/200 = 0.115,
• P("the card is a horse") = 18/200 = 0.09,
• P("the card is a king") = 38/200 = 0.19.
Imagine now that we do 1000 extractions and obtain the following relative frequencies:

Card      Relative frequency
ace       192/1000
four      111/1000
five      109/1000
six       85/1000
seven     87/1000
jack      116/1000
horse     91/1000
king      209/1000

From this table we could say that if we take a card out of the deck at random, we will obtain the
different cards with the following new probabilities:
• P("the card is an ace") = 192/1000 = 0.192,
• P("the card is a four") = 111/1000 = 0.111,
• P("the card is a five") = 109/1000 = 0.109,
• P("the card is a six") = 85/1000 = 0.085,
• P("the card is a seven") = 87/1000 = 0.087,
• P("the card is a jack") = 116/1000 = 0.116,
• P("the card is a horse") = 91/1000 = 0.091,
• P("the card is a king") = 209/1000 = 0.209.
After seeing these probabilities, and once we understand that the probability is a number
that we assign to each random event to measure how likely it is, we can say that the event "the card
drawn is a king" is more likely, that is to say, it has more chances of occurring, than the event "the
card drawn is a seven", because P(the card is a king) > P(the card is a seven).
Using the same reasoning, we will say that the event "the card is a jack" is more likely than the
event "the card is a horse", and so on.
Remark 2.1.1 This method of assigning probabilities is based on the Law of Large Numbers for
relative frequencies, which essentially says that every event has a special number (called its probability)
such that, if the random experiment is repeated a large number of times, the relative frequency of
the event will be close to the probability of the event. The more times the random experiment is
repeated, the closer the relative frequency will tend to be to this special number.
That is to say, if we took only 100 cards out in the previous example, the probabilities obtained
would be less reliable than the ones obtained taking 200 cards out; and vice versa, if we took 10000
cards out instead of 1000, the probabilities obtained should be closer to the real probabilities of these
events than the ones obtained taking 1000 cards out.
In any case, the table obtained after performing 1000 extractions is more reliable than the one
obtained after 200 extractions.
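As an illustration of the remark above, the following sketch (ours, using the deck composition of the chapter) simulates extractions with replacement and prints the relative frequency of "the card is an ace" for growing numbers of extractions; it should drift towards 8/40 = 0.2.

```python
import random

deck = ["ace"] * 8 + ["king"] * 8
for kind in ("four", "five", "six", "seven", "jack", "horse"):
    deck += [kind] * 4

# Relative frequency of "the card is an ace" after n extractions with replacement.
for n in (200, 1000, 10_000, 100_000):
    k = sum(1 for _ in range(n) if random.choice(deck) == "ace")
    print(f"n = {n:6d}   relative frequency = {k / n:.4f}   (theoretical value 0.2)")
```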
Exercise 2.1.2 Perform the following experiment: carry out twenty distributions of the mus game,
that is to say, deal four cards from the deck without replacement, and write down whether in each
distribution you have obtained a pair, a trio, a duplex or none of them.
Make the table of relative frequencies of this experiment and then assign probabilities to these
random events (pair, trio, duplex and none). Do you think that these probabilities are reliable?
Why?
In the following section we will assign probabilities to events in another way. We will not need
to perform the experiment to know these probabilities, so in some cases this method will be easier
than the method explained before (relative frequencies).
2.1.2 Laplace's rule: theoretical probability
As you could see, assigning probabilities from relative frequencies is rather tedious, because it is
necessary to repeat the experiment many times to get a good approximation of the real probability
of an event and, even so, we will never be sure of getting the real probability with this method.
For this reason it is necessary to introduce an alternative, handier method for the calculus of
probabilities.
Let us go back to the previous example: we have a mus deck and we are going to take a card out
of it. We want to know the probabilities of all the possible events.
Well, it is logical to think that the deck is well made and that we will take any card out with the same
probability as the others. That is to say, no card is bigger than another, there are no folded
cards, ... in other words, the deck has no faults, and so we can take any of the forty cards out
of the deck with equal probability. In this case we say that the outcomes are equally likely. Other
examples of equally likely outcomes are the number that we get after throwing a die (with equal
probability it will be a one, a two, ..., a six) or obtaining heads or tails after throwing a coin (with
equal probability it will be heads or tails), always supposing that the die and the coin have no faults.
Let us return to our example with the mus deck. We have forty cards, all of them with equal weight,
equal shape, ... There are eight aces among them, so it is logical to think that after forty extractions,
in eight of them we will receive an ace. This is something theoretical: you will not
always obtain eight aces after forty extractions (you could obtain three aces or twelve aces,
depending on the randomness). But the fact that there are eight aces in the deck gives us an idea
of how likely it is to obtain an ace in an extraction.
So we will say that the probability of taking an ace out is 8/40. It is a theoretical probability;
we have to insist that, in practice, we do not always obtain eight aces after forty extractions. In this
experiment, as there are forty cards in total, we will say that there are forty
possible outcomes (we can take forty different cards out of the deck) and eight outcomes in the
event that we analyse (taking an ace out), because it has eight chances of occurring.
Now that these concepts have been introduced, we can state Laplace's rule for the
calculus of probabilities:
Definition 2.1.1 If all the outcomes of a random experiment are equally likely, and we are studying a
random event A of this random experiment, it holds that
P(A) = (number of outcomes in A) / (total number of outcomes),
where the "number of outcomes in A" is the number of outcomes that make the event A occur.
This was the first formal definition of probability given in history, and it was given by Pierre
Simon de Laplace at the beginning of the nineteenth century.
After this, you can correctly do the following exercise:
Exercise 2.1.3 Calculate, using Laplace's rule, the probability of receiving each card in the
experiment consisting in taking a card out of the mus deck at random.
2.2 Extractions with replacement and extractions without replacement. Tree diagrams
In this section we are going to propose some new and more complex random experiments: instead
of taking only one card out, we are going to take several cards out. After studying this section
we will be able to analyse better than before all the different hands in the mus game.
2.2.1 Extractions with replacement
Let us start with a simple situation: we take two cards out of the mus deck, one after the other,
replacing each card in the deck after looking at it. This process will be called extraction with
replacement.
If we call A1 the event "The first card is a king" and A2 the event "The second card is a
jack", we could ask ourselves what the probability of these two events occurring at the same time
is, that is to say, the probability that the first card taken is a king and the second card taken is a jack.
To calculate the probability of this event, we sketch the following diagram, which is called a tree
diagram:
[Tree diagram: first extraction "king" (8/40) or "not a king" (32/40); for each branch, second extraction "jack" (4/40) or "not a jack" (36/40).]
The probability of the first card being a king and the second card being a jack is the product of
the probabilities along the path that leads to this result (the product rule), that is to say,
8/40 · 4/40 = 1/50.
However, if the only thing we want is to get a king and a jack, and the order of
occurrence does not matter, we have to take into account that we can receive (king, jack) in this order,
or the same cards in the inverse order, (jack, king).
If we call B1 the event "The first card is a jack" and B2 the event "The second card is a king",
then to obtain the combination (jack, king) it is necessary that the event B1 ∩ B2 occurs.
By the same reasoning as above, using a similar tree diagram, we get that the probability
of receiving (jack, king) is
4/40 · 8/40 = 1/50.
So, to obtain the probability of receiving a jack and a king without taking the order into
account, we add up the probabilities of (king, jack) and of (jack, king) (the adding-up rule), and
we obtain
1/50 + 1/50 = 1/25.
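The same computation can be written down directly with exact fractions; this small sketch (ours) follows the two paths of the tree and then applies the adding-up rule.

```python
from fractions import Fraction

# Two extractions WITH replacement: the product rule along each path of the
# tree, then the adding-up rule over the paths (king, jack) and (jack, king).
p_king = Fraction(8, 40)
p_jack = Fraction(4, 40)

p_king_then_jack = p_king * p_jack                     # 1/50
p_jack_then_king = p_jack * p_king                     # 1/50
p_any_order = p_king_then_jack + p_jack_then_king      # 1/25

print(p_king_then_jack, p_jack_then_king, p_any_order)
```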
Exercise 2.2.1 In the previous random experiment, calculate the probability of both cards being aces,
sketching the corresponding tree diagram.
In the following section we will see another kind of extraction, in which the cards are not replaced
after they have been looked at.
2.2.2 Extractions without replacement
In the random experiments performed above we put the card back after looking at it, but what
would happen if we did not put the card back into the deck? Well, things change, but the
reasoning is similar; the only things that change are the probabilities of the second extraction.
This is logical, because if we have forty cards before the first extraction, just before the second
extraction we have thirty-nine, since we do not put back the first card that we took.
Then, in the experiment above, if we want the first card to be a king and the second one
to be a jack, the tree diagram is the same, but the probabilities of the second extraction change.
Let us calculate again the probability of taking a king and a jack out in this order, but this time
without putting the cards back after looking at them. These extractions are called extractions
without replacement. The tree diagram of this experiment is quite similar to the one before;
let us see it:
[Tree diagram as before, but with the second-extraction probabilities computed out of the 39 remaining cards.]
In this case we have that the probability of taking (king, jack) out is
8/40 · 4/39.
Notice that in this case the second factor is 4/39 because, when we do not put back the first card
that we have taken, only thirty-nine cards remain, and among them there are four jacks.
In a similar way we can calculate the probability of receiving the combination (jack, king) after
two extractions without replacement, and it would be 4/40 · 8/39.
And again, if we want to calculate the probability of receiving a jack and a king after two
extractions, without taking the order into account, we only have to apply the adding-up rule, and we
get that this probability is
8/40 · 4/39 + 4/40 · 8/39 = 2 · 8/40 · 4/39.
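The same sketch as in the previous section, adapted to extractions without replacement (only the second-extraction denominators change):

```python
from fractions import Fraction

# Two extractions WITHOUT replacement: only the second-extraction probabilities
# change, because 39 cards remain in the deck after the first draw.
p_king_then_jack = Fraction(8, 40) * Fraction(4, 39)
p_jack_then_king = Fraction(4, 40) * Fraction(8, 39)
p_any_order = p_king_then_jack + p_jack_then_king      # = 2 · 8/40 · 4/39

print(p_any_order, float(p_any_order))                 # 8/195 ≈ 0.041
```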
Exercise 2.2.2 Do the same as in the exercise of the previous section, but supposing that we do
extractions without replacement.
2.3 Axiomatic definition of probability
Now, in this section, we are going to introduce a more abstract definition of probability. We will
do it according to some principles that we will accept as evident (we will call them axioms). These
axioms are:
1. For each event A, its probability is a number between 0 and 1, that is to say,
0 ≤ P (A) ≤ 1.
2. P (E) = 1, where E is the certain event.
3. If A and B are two inconsistent events, it holds that
P (A ∪ B) = P (A) + P (B).
From these axioms, we can deduce a great number of properties that the probability has to
comply with:
1. If we denote by Ac the complement of the event A, it holds that
P(Ac) = 1 − P(A).
Exercise 2.3.1 Prove this property from the axioms above.
2. If we have a set of events A1, A2, ..., An which are pairwise inconsistent (Ai ∩ Aj = ∅ for all i ≠ j),
it holds that
P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An).
As a particular case we can study the case when the set of events A1, A2, ..., An also satisfies
A1 ∪ A2 ∪ ... ∪ An = E, where E is the certain event. In this case we say that the set of events
A1, A2, ..., An is a complete set of events, and it holds that P(A1) + P(A2) + ... + P(An) = 1.
3. If the sample space can be broken down into n outcomes or single events, E = {x1, ..., xn},
then it holds that
P(x1) + P(x2) + ... + P(xn) = 1.
As a particular case, if the probability of each single event or outcome is the same, that is,
P(xi) = 1/n, and A is an event consisting of k outcomes, it holds that P(A) = k/n, which is
Laplace's rule again.
4. If A and B are any random events, it holds that
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
This property can also be used in the case of three events, and we get that
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
Example 2.3.1 Suppose that we carry out the following experiment: we take four cards out of the
mus deck with replacement. After every draw we note whether the card taken is a face card (jack,
horse or king) or not, and at the end we note the number of face cards taken out.
Consider as the sample space the number of face cards that we have (0, 1, 2, 3 or 4).
a) Describe the outcomes and calculate their probabilities.
The outcomes are
Ai = "we have i face cards", i = 0, 1, 2, 3, 4.
As there are sixteen face cards in total we know, applying Laplace's rule, that the probability of a
card taken out of the mus deck at random being a face card is 16/40 = 2/5 and the probability of the
card not being a face card is 24/40 = 3/5. From this, and making calculations like the ones we made
before in this section, we can see that the probabilities of these events are, respectively:
P(A0) = (3/5)⁴,  P(A1) = 4 · (2/5) · (3/5)³,  P(A2) = 6 · (2/5)² · (3/5)²,  P(A3) = 4 · (2/5)³ · (3/5),  P(A4) = (2/5)⁴.
b) If B = "We have taken at least one face card out of the deck", calculate P(B).
We have that B = A0c, so P(B) = P(A0c) = 1 − P(A0) = 1 − (3/5)⁴ = 1 − 0.1296 = 0.8704.
c) If C = "We have taken three or more face cards out of the deck", calculate P(C).
C = A3 ∪ A4, and also A3 ∩ A4 = ∅. Then we have that P(C) = P(A3 ∪ A4) = P(A3) + P(A4) =
4 · (2/5)³ · (3/5) + (2/5)⁴ = 0.1792.
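The probabilities of Example 2.3.1 can be checked with exact fractions; in the sketch below (ours) the coefficients 1, 4, 6, 4, 1 used above are written as the binomial coefficients comb(4, i), which is equivalent.

```python
from fractions import Fraction
from math import comb

# Number of face cards in four extractions with replacement (Example 2.3.1).
p = Fraction(2, 5)                                  # face card
q = 1 - p                                           # not a face card, 3/5

P = {i: comb(4, i) * p**i * q**(4 - i) for i in range(5)}

assert sum(P.values()) == 1                         # A0, ..., A4 form a complete set
print("P(B) =", float(1 - P[0]))                    # at least one face card: 0.8704
print("P(C) =", float(P[3] + P[4]))                 # three or more face cards: 0.1792
```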
2.4 Calculus of probabilities in more complex cases
2.4.1 Conditional probability
We deal the cards again. They are dealt one by one and we will receive our card in the fourth
turn, that is to say, the last one. The first player has received a king, the second player has received
an ace and the third player has received a jack. What is the probability that we receive an ace? If
we apply Laplace's rule we can say that the probability of receiving an ace is 7/37, because three
cards have already been dealt, so thirty-seven cards remain in the deck, and one of the aces was
given to the second player, so there are seven aces left in the deck. Imagine now that no player has
received an ace; what would the probability of receiving an ace be in this case? There
would be eight aces in the deck, so the probability would be 8/37. And if two of the players have an
ace, what is the probability of receiving an ace? In this case the probability would be 6/37. As you
can see, the probability of receiving an ace changes depending on the cards that
our rivals have. That is to say, the probability of an event can depend on the information that we
have before performing the experiment. In this case, the prior information is the cards that our
rivals have received, that is, we know which cards will not be in the deck when our turn comes.
In these cases it is very simple to calculate the probabilities, but there are other cases in which
we will need a formula to calculate them.
Let us go back to the example explained before. If we call A the event "My card is an ace"
and B the event "The first three players have received a king, an ace and a jack, respectively",
we want to calculate P(A) knowing the cards that the other players have, that is, knowing
that the event B has occurred. We want to calculate the probability of the event A given the event
B, and we will denote this by A/B. To calculate it we can apply the formula of conditional
probability, which says:
P(A/B) = P(A ∩ B) / P(B).
Therefore, we have to calculate P(A ∩ B) and P(B). To calculate P(B) we apply
the concepts that we learned before about extractions without replacement, and with a similar
reasoning we obtain that
P(B) = 8/40 · 8/39 · 4/38,
while for A ∩ B to occur it is necessary that the four players receive a king, an ace, a jack and an
ace, that is to say, the probability of A ∩ B is
P(A ∩ B) = 8/40 · 8/39 · 4/38 · 7/37.
Therefore, applying the formula of conditional probability, we have that
P(A/B) = P(A ∩ B) / P(B) = (8/40 · 8/39 · 4/38 · 7/37) / (8/40 · 8/39 · 4/38) = 7/37,
as we knew before.
In this case we could have solved the question without the conditional probability formula, but
in other cases its use is necessary.
Definition 2.4.1 We denote by A/B the event "A occurs given that B has occurred", and its
probability is given by
P(A/B) = P(A ∩ B) / P(B).
2.4.2 Independence of random events
Let us go back to our explanations and think about the example that we saw in the section
about tree diagrams. If you remember it, we had to do a lot of operations to calculate the
probabilities asked for, and in spite of this it was one of the simplest cases. If, instead of
having two possible outcomes after each extraction (king or not king in the first extraction and jack
or not jack in the second), we had three outcomes, there would be nine possibilities in
total; if we had four possible outcomes after each extraction, we would have sixteen possibilities after
the second extraction. In general, if we have n outcomes in each extraction, after two extractions
we will have to analyse n² cases, which is a lot. And that is only with two extractions; if the
number of extractions is three, we would have n³ possible outcomes, with four there will be
n⁴, ... The tree diagram technique is only usable in very simple cases; when the numbers get
bigger, the tree is almost impossible to sketch.
Is there another, simpler way to calculate the probability of this kind of events? There is, but
first we have to study a new concept: the independence of random events.
Definition 2.4.2 Given a random experiment and any two events of this experiment, let us call
them A and B, we say that these two events are independent if, for one of them to occur, it does not
matter whether the other one has occurred or not.
In other words, two events A and B are independent if the probability of A is equal to the
conditional probability of A given B, and conversely, that is,
P(A/B) = P(A) and P(B/A) = P(B).
Equivalently, two events A and B are said to be independent if the probability of A and B is
equal to the product of the probability of A times the probability of B:
P(A ∩ B) = P(A) · P(B).
This result is of great use for the calculus of probabilities in repetitions of random experiments.
So, if we repeat a random experiment n times, we know that the result of each repetition is
independent of the previous results, and we want to calculate the probability that the event Ai occurs
in repetition i for every i = 1, ..., n, then the probability that all of these events
occur, which is the probability of the intersection A1 ∩ A2 ∩ ... ∩ An, is
P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) · P(A2) · ... · P(An).
Example 2.4.1 If we take two cards out of the mus deck randomly and with replacement, what is
the probability of both cards being face cards? What is the probability of neither card being a face card?
And what is the probability of one of them being a face card and the other not?
If we call A the event "The first card is a face card" and B the event "The second card is a face
card", we get that the event "Both cards are face cards" is the event A ∩ B. Clearly they are
independent events, because we have done extractions with replacement, so the conditions before each
extraction are the same. So, it holds that
P(A ∩ B) = P(A) · P(B) = 16/40 · 16/40.
The event "Neither of the two cards is a face card" is represented in terms of the events A and
B as Ac ∩ Bc. These are also independent events. Therefore, the probability of this
event can be calculated like this:
P(Ac ∩ Bc) = P(Ac) · P(Bc) = (1 − P(A)) · (1 − P(B)) = 24/40 · 24/40.
The event "One card is a face card and the other is not" is represented in terms of the
events A and B by Ac ∩ B and A ∩ Bc, because it is possible that the first card
is a face card and the second one is not, or the opposite. So the event that we want to study is the
union of both events, (Ac ∩ B) ∪ (A ∩ Bc). As they are inconsistent (disjoint) events, since
(Ac ∩ B) ∩ (A ∩ Bc) ⊂ A ∩ Ac = ∅, the probability of this event can be
calculated in this way:
P((Ac ∩ B) ∪ (A ∩ Bc)) = P(Ac ∩ B) + P(A ∩ Bc) = P(Ac) · P(B) + P(A) · P(Bc) = 24/40 · 16/40 + 16/40 · 24/40.
You can check which of these three events is the most likely to occur.
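The three probabilities of Example 2.4.1 can be evaluated directly; the sketch below (ours) does so with exact fractions and shows, incidentally, which of the three events is the most likely.

```python
from fractions import Fraction

# Example 2.4.1: two draws WITH replacement, A = "first card is a face card",
# B = "second card is a face card"; A and B are independent.
p = Fraction(16, 40)          # P(A) = P(B)
q = 1 - p                     # P(Ac) = P(Bc) = 24/40

both      = p * p             # P(A ∩ B)
neither   = q * q             # P(Ac ∩ Bc)
exactly_1 = q * p + p * q     # P((Ac ∩ B) ∪ (A ∩ Bc))

print(float(both), float(neither), float(exactly_1))   # 0.16, 0.36, 0.48
```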
Exercise 2.4.1 Consider the following experiment: take two cards out of our deck randomly and
with replacement, that is to say, putting the card back into the deck after looking at it and before
taking the next card out of the deck. Then add up the numeric values that they have in the mus
game. Answer the following questions:
a) Describe the sample space, the certain event and an impossible event of the experiment.
b) Calculate the probability of the sum of the card values being twenty.
c) Calculate the probability of the sum of the card values being six or less.
Exercise 2.4.2 Do the same as in the previous exercise but supposing that we take the cards
randomly but without replacement, that is to say, without putting the card back into the deck.
2.4.3 Total probability
Imagine that we draw two cards out of the deck without replacement. We look at the first
card and then at the second card. What is the probability that the second card is a king? With
the knowledge that we already have, we can easily say that if the first card is a king, then the
probability that the second card is also a king is 7/39.
However, if the first card is not a king, then the probability of the second card being a
king is 8/39. As you can see, depending on which kind of card the first one was, we can say
something different about the second card. In the section about conditional probability we started with an
advantage: we knew which card the first one was. But now we do not have this advantage. How
can we solve this problem? We can do it by taking both possibilities into account: the first card can
be a king or something other than a king. Let us see how we solve this problem.
Consider the following random events:
1. A1 = "The first card is a king".
2. A2 = "The second card is a king".
We want to calculate P(A2). Well, we will take into account whether the event A1 happens or not. How
can we do it? We divide P(A2) into several probabilities which are easier to calculate. To
do this we resort to the properties of the operations with random events.
If we denote by A1c the complement of the event A1, clearly we get that
A1 ∪ A1c = E,  A1 ∩ A1c = ∅.
So, it holds that
A2 = A2 ∩ E, and therefore A2 = (A2 ∩ A1) ∪ (A2 ∩ A1c).
Besides, as
(A2 ∩ A1) ∩ (A2 ∩ A1c) = ∅,
we have that
P(A2) = P(A2 ∩ A1) + P(A2 ∩ A1c).
And applying the formula of conditional probability, we obtain
P(A2 ∩ A1) = P(A2/A1) · P(A1)  and  P(A2 ∩ A1c) = P(A2/A1c) · P(A1c).
We can calculate these two probabilities easily, applying Laplace's rule and the techniques
seen for extractions without replacement. So, we have that
P(A2/A1) · P(A1) = 7/39 · 8/40  and  P(A2/A1c) · P(A1c) = 8/39 · 32/40,
so that
P(A2) = 7/39 · 8/40 + 8/39 · 32/40 = 0.2.
As you see, we have divided the probability into two different addends: in the first one we suppose
that the first card is a king (A1), and in the other one we suppose that the first card is something other
than a king (A1c).
The events A1 and A1c have two special characteristics:
A1 ∩ A1c = ∅  and  A1 ∪ A1c = E.
This technique can be used as a general rule:
If we have a set of events A1, A2, ..., An which are pairwise inconsistent (Ai ∩ Aj = ∅ for all i ≠ j)
and which satisfy A1 ∪ A2 ∪ ... ∪ An = E (if they comply with these two conditions, we say that
this set is a complete set of events), then the probability of any event S ⊂ E is equal to
P(S) = P(A1) · P(S/A1) + P(A2) · P(S/A2) + ... + P(An) · P(S/An),
and this formula is called the formula of total probability.
The most difficult part when applying the formula of total probability is to choose an appropriate
complete set of events, because an inappropriate complete set of events only creates more
difficulties in solving the problem. It will be necessary to study which events are convenient, because
a bad choice of the complete set of events will not help us to solve the problem.
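As a small check of the formula, the following sketch (ours) applies it to the example above, with the complete set of events {A1, A1c}.

```python
from fractions import Fraction

# Total probability for A2 = "the second card is a king", with the complete
# set of events {A1, A1c} = {"the first card is a king", "it is not a king"}.
P_A1, P_A1c = Fraction(8, 40), Fraction(32, 40)
P_A2_given_A1, P_A2_given_A1c = Fraction(7, 39), Fraction(8, 39)

P_A2 = P_A2_given_A1 * P_A1 + P_A2_given_A1c * P_A1c
print(P_A2, float(P_A2))        # 1/5 = 0.2
```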
Exercise 2.4.3 Find the probability, in the same experiment proposed at the beginning of this
section, that the second card is not a face card.
Exercise 2.4.4 Let us propose the following exercise: we deal three cards of the mus deck. Calculate,
by choosing beforehand an appropriate complete set of events and applying the formula of total
probability, the probability of the third card being an ace.
What is the probability that the third card is not an ace?
Suggestion: choose as the complete set of events the number of aces taken out among the first two cards.
2.4.4 Bayes's rule
Let us go back to the previous situation proposed as an example. We took two cards out without
replacement. A new question may be asked: what is the probability that the first card was a
king, knowing that the second card was a king? This question may look like the one we
asked in the previous section, but there is a great difference: in this case we have already performed
the experiment (we have seen the second card) and we ask ourselves which card was the
first one. That is to say, if we call the events A1 and A2 just as before, we will calculate
P(A1/A2).
Applying the conditional probability formula, we obtain that
P(A1/A2) = P(A1 ∩ A2) / P(A2).
Developing the denominator according to the total probability formula and applying the
conditional probability formula in the numerator, we see that
P(A1/A2) = P(A2/A1) · P(A1) / [P(A2/A1) · P(A1) + P(A2/A1c) · P(A1c)].
The calculation is easy using the results of the previous section, so it holds that
P(A1/A2) = (7/39 · 8/40) / (1/5) = 7/39.
In general, Bayes's formula is obtained in the following way:
Given a complete set of events A1, A2, ..., An and any event S, we want to calculate the
probability that the event Ai has occurred, knowing that after doing the experiment the event S
occurred, that is to say, we will calculate P(Ai/S). By the reasoning above it holds that
P(Ai/S) = P(Ai ∩ S) / P(S) = P(S/Ai) · P(Ai) / [P(S/A1) · P(A1) + P(S/A2) · P(A2) + ... + P(S/An) · P(An)],
where P(Ai) is the a priori probability of the event Ai (it is known before performing the experiment)
and P(Ai/S) is its a posteriori probability, because it is calculated once the experiment has been performed.
As we said in the section on conditional probability, in order to use Bayes's rule correctly it is
necessary to choose an appropriate complete set of events (CSE), and this will be
the most difficult step in solving a problem.
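The same example can be verified numerically; the sketch below (ours) evaluates Bayes's formula with exact fractions and recovers 7/39.

```python
from fractions import Fraction

# Bayes's rule for A1 = "the first card is a king" given A2 = "the second card
# is a king", reusing the quantities of the total probability section.
P_A1, P_A1c = Fraction(8, 40), Fraction(32, 40)
P_A2_given_A1, P_A2_given_A1c = Fraction(7, 39), Fraction(8, 39)

numerator = P_A2_given_A1 * P_A1
P_A2 = numerator + P_A2_given_A1c * P_A1c       # the denominator, 1/5

print(numerator / P_A2)                         # 7/39
```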
Exercise 2.4.5 Calculate the probability of having taken out a seven at your first draw, knowing that
the second card is a jack, when we take two cards out of the mus deck without replacement.
Exercise 2.4.6 We take four cards out of the deck randomly. We know the fourth card is an ace.
What is the probability of the first card being an ace too? And what is the probability that the first
card was a jack?
Remark: you have to choose two different CSEs, one for answering each question.
2.5 Answer to the initial question
We said that four friends have been playing and they have received eight pairs of kings, six trios
of aces or kings and five duplexes. They argued about which of these hands is more likely
to occur and they did not come to an agreement. So we are going to help them. This is what we call
the previous events:
• RR = "Receiving a pair of kings",
• M = "Receiving a trio of aces or kings",
• D = "Receiving any duplex".
We are going to calculate the probability of receiving a pair of kings, that is to say, P(RR). To
receive this kind of pair we must have two kings (obviously), and the other two cards can be neither
kings (neither of them) nor of the same kind as each other, because otherwise we would
have a duplex of kings and the other two cards.
We will have to distinguish two cases: in one of them we have an ace and in the other we
do not have any aces, because the number of aces in the deck is different from the number of cards
of each other kind.
So, if we denote by B the event "Taking an ace out", applying the total probability rule we
have that
P(RR) = P(RR/B) · P(B) + P(RR/Bc) · P(Bc) = P(RR ∩ B) + P(RR ∩ Bc).
If we denote an ace by A and by C and C' two cards different from kings and aces and also different
from each other, it holds that a pair of kings can be
(R, R, A, C), if we have an ace, or
(R, R, C, C'), if we do not have any aces; the cards (R, R, A, C) with all their order variations represent the
event RR ∩ B, while the cards (R, R, C, C') with all their order variations represent the event RR ∩ Bc.
Let us start by calculating the probability that in the deal the cards (R, R, A, C) are given in this
order; then we will calculate how many possible order variations there are and, as they are all equally
likely, we will only have to multiply the probability obtained by the number of variations.
Let us calculate P(R, R, A, C). The probability of the first card being a king is 8/40, and the probability
that the second card is a king too is 7/39. The probability of the third card being an ace is 8/38, while
the probability that the fourth card is different from a king and from an ace
is 24/37. So we have that
P(R, R, A, C) = 8/40 · 7/39 · 8/38 · 24/37.
Let us calculate the probability of one of the order variations of these cards, in order to see that
they are all just as likely. For instance,
P(A, R, C, R) = 8/40 · 8/39 · 24/38 · 7/37 = 8/40 · 7/39 · 8/38 · 24/37 = P(R, R, A, C).
We could do the same with all the order variations, but we think it is clear enough that they
are all equally likely.
Once the probability of one of them is found, let us calculate the total number of possible
variations of these cards.
We have a variation of four cards in which two of them are repeated. Let us remember the
general formula: the number of variations with repetition of n elements, where there are n1 elements of one
kind, n2 elements of another kind, ..., and nk of another kind, is
Vn^(n1,...,nk) = n! / (n1! · n2! · ... · nk!).
So, in our case the number of variations is
V4^(2,1,1) = 4! / (2! · 1! · 1!) = 24/2 = 12.
Therefore, we have that
P(RR ∩ B) = 12 · 8/40 · 7/39 · 8/38 · 24/37.
To obtain P(RR ∩ Bc) we proceed in a similar way. The first thing we are going
to do is to calculate P(R, R, C, C'). Applying, as before, Laplace's rule card by card, it
holds that
P(R, R, C, C') = 8/40 · 7/39 · 24/38 · 20/37.
Again all the order variations are equally likely, but in this case the two cards C and C' play
symmetric roles: the factors 24/38 and 20/37 already allow either of the two non-king, non-ace cards
to be drawn first, so only the positions of the two kings matter, and there are 4!/(2! · 2!) = 6 really
different reorganizations. They are all equally likely, and so the probability of receiving a pair of
kings and not receiving any aces is
P(RR ∩ Bc) = 6 · 8/40 · 7/39 · 24/38 · 20/37.
So, applying the formula of total probability, it holds that
P(RR) = P(RR ∩ B) + P(RR ∩ Bc) = 12 · 8/40 · 7/39 · 8/38 · 24/37 + 6 · 8/40 · 7/39 · 24/38 · 20/37 ≈ 0.132.
25
Now let us calculate P (M ). The trio can be made out of kings or aces. That is to say, the first
card can be a king or an ace. The second card will have to be a king if the first card was a king
and an ace if the first card was an ace. With the third card it is the same while the fourth card
can not be equal as the three cards before, because if so we would have a duplex and not a trio.
Therefore, the probability of receiving the cards of a trio of kings or aces in one fixed order is
8/40 · 7/39 · 6/38 · 32/37 + 8/40 · 7/39 · 6/38 · 32/37 = 16/40 · 7/39 · 6/38 · 32/37,
where the first addend is the probability of receiving a trio of kings and the second one the probability of receiving a trio of aces; they are inconsistent (disjoint) events, so their probabilities can be added to obtain the result, and again all the order variations are equally likely.
For calculating the number of possible reorderings, we use again the formula of the variations with repetition. In this case we have four elements, three of them of the same kind and one different. That is, we have V_4^{3,1} = 4!/3! = 4 possible reorderings of the same cards.
As they are equally likely, multiplying the probability of one of them by the number of possible reorderings, it holds that
P(M) = 4 · 16/40 · 7/39 · 6/38 · 32/37 = 0'039.
Finally we are going to calculate the probability of receiving any duplex in the deal of mus. For that we are going to divide the event ”getting any duplex” into the different duplexes that we can receive, depending on whether they contain kings, aces or both.
So we can receive the following duplexes:
1. Kings-Aces (R, R, A, A). P(R, R, A, A) = 8/40 · 7/39 · 8/38 · 7/37. We have variations of four elements that are equal in pairs, that is to say, V_4^{2,2} = 4!/(2!·2!) = 6 possible reorderings. Therefore, the probability of receiving a duplex of kings and aces is
6 · 8/40 · 7/39 · 8/38 · 7/37.
2. A duplex of kings and another pair of equal cards, different from aces and kings, is (R, R, C, C). P(R, R, C, C) = 8/40 · 7/39 · 24/38 · 3/37. In this case we have again V_4^{2,2} = 4!/(2!·2!) = 6 variations, so the probability of receiving this kind of duplex is
6 · 8/40 · 7/39 · 24/38 · 3/37.
3. A duplex of aces and another pair of equal cards that are neither kings nor aces is (A, A, C, C). P(A, A, C, C) = 8/40 · 7/39 · 24/38 · 3/37. We have again V_4^{2,2} = 4!/(2!·2!) = 6 variations, so the probability of receiving this kind of duplex is
6 · 8/40 · 7/39 · 24/38 · 3/37.
4. Kings-Kings (R, R, R, R). We hold that P(R, R, R, R) = 8/40 · 7/39 · 6/38 · 5/37 and, as all the cards are equal, we only have one possible reordering, so the probability of receiving a duplex like this is
8/40 · 7/39 · 6/38 · 5/37.
5. Aces-Aces (A, A, A, A). We have that P(A, A, A, A) = 8/40 · 7/39 · 6/38 · 5/37 and again all the cards are equal, so we only have one possible reordering, and the probability of receiving a duplex of aces-aces is
8/40 · 7/39 · 6/38 · 5/37.
6. The event ”receiving a duplex without any aces or kings, and where the two pairs of the duplex are of different ranks” can be represented as (C, C, C′, C′), and P(C, C, C′, C′) = 24/40 · 3/39 · 20/38 · 3/37. As for the possible reorderings, we have again V_4^{2,2} = 4!/(2!·2!) = 6. Therefore, the probability of receiving this duplex is
6 · 24/40 · 3/39 · 20/38 · 3/37.
7. The last possible duplex is the one made of four equal cards, where the cards are neither kings nor aces, that is to say, a combination such as (C, C, C, C). The probability of this occurring is 24/40 · 3/39 · 2/38 · 1/37 and, since every card is equal, we only have one possible arrangement. So the probability of receiving a duplex just like this is
24/40 · 3/39 · 2/38 · 1/37.
The different duplexes that we have enumerated are all disjoint (inconsistent) and their union gives the event D = ”taking any duplex out”. Therefore, P(D) will be equal to the sum of the probabilities obtained for the previous duplexes. After making the calculations, it holds that
P(D) = 0'035.
In short, it holds that
P(RR) = 0'255,  P(M) = 0'039,  P(D) = 0'035,
that is, the pair of kings is clearly the most likely of the three events, while the events ”taking a trio of kings or aces out” and ”taking any duplex out” have very similar probabilities, with the first of the two being slightly more likely.
With these calculations, we have been able to answer the question that these friends asked themselves when they finished the game. As you see, although it is much more likely to receive a pair of kings than a duplex, this proportion did not show up in reality. This is because when we say that an event is more likely than another one, we do not mean that it will always occur more often than the less likely one, only that theoretically it has a higher probability of occurring. This is the biggest difference between theory and practice in the calculus of probabilities.
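To illustrate this gap between theory and practice, here is a small Monte Carlo sketch (our own illustration, not part of the original text). It deals many random four-card hands from a 40-card mus deck with 8 kings and 8 aces and shows how the empirical frequency of a simple event, ”at least one king in the hand”, fluctuates for a few deals but stabilizes as the number of simulated deals grows.

import random

# 40-card mus deck: 8 kings ("R"), 8 aces ("A"), 24 other cards ("C").
DECK = ["R"] * 8 + ["A"] * 8 + ["C"] * 24

def deal_hand():
    # Deal four cards without replacement.
    return random.sample(DECK, 4)

def empirical_frequency(num_deals):
    # Relative frequency of "at least one king" over num_deals simulated deals.
    hits = sum(1 for _ in range(num_deals) if "R" in deal_hand())
    return hits / num_deals

for n in (10, 100, 1000, 100000):
    print(n, empirical_frequency(n))
# For small n the frequencies jump around; for large n they settle close
# to the theoretical probability of the event.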
Chapter 3
One-Dimensional Probability
Distributions
3.1 Objectives
• Knowing the concept of probability distribution and its motivations.
• Calculating the Density Probability Function (DPF) and the Cumulative Probability Function (CPF) of a Probability Distribution.
• Understanding the concept of mode, mean or expectation and variance of a Discrete Probability Distribution and being able to calculate them from the DPF or from the CPF.
3.2 The example
A stock-exchange investor has 1.000.000 euros to invest in the stock exchange. He is considering two possibilities:
• putting all the money in a bank, which assures him a 16% profit.
• risking it on an investment plan.
The market research made by a stock exchange analyst says that the investment plan above gives the following profits with the corresponding probabilities:
Profit (%)   Probability
30           0'15
25           0'2
20           0'25
15           0'15
10           0'1
5            0'1
0            0'05
The investor has to make a decision about what to do with the money. Two questions can be asked:
The investor trusts the concept of expected profit a lot (the money that he would win on average if he invested many times). What should he decide, taking the expected profit of the investment plan into account, in order to get a better profit?
Another option is: the investor does not want to risk a lot and says he wants to receive at least a 16% profit with a probability of 70%. What should he do in this case?
We will solve this problem using Probability Distributions, a concept that will be introduced first.
3.3 Introduction. Discrete Random Variable and Probability Distributions
We have previously studied how to assign probabilities to different random events, but all of them
in a particular way. In this chapter we will see general techniques that we will later particularize in each case.
As you can see for yourself, in the example they have given us a table with the probabilities of
receiving the different profits after the investment plan. With the help of this table we are going
to decide what to do with the money. Before anything, we have to remember that the conclusions that we are going to obtain are theoretical. It is possible that we decide something and then randomness, on which the questions asked in the example depend, proves us wrong and demolishes our calculations. Even so, we know that the probability will give us a rough idea of how the profit will be.
Let us start to analyze the problem.
Let us define the concept of random variable.
In our problem we have proposed a random experiment, the profit that we receive after carrying
out the investment plan. After the random experiment we can obtain several different profits in
each case. Well, these possible profits will be the results of our random experiment.
So, if we denote
X = ”profit received after having carried out the investment plan”,
we say that X is a discrete random variable and the possible values that this random variable can
take are the outcomes.
In this case the random variable X can take seven outcomes or values (all the different profits)
that we denote by the letter x and a sub-index in each case. That is to say, we have that the
Discrete Random Variable X can take the values x1 , x2 , x3 , x4 , x5 , x6 , x7 where
1. x1 = 0 =⇒ P(X = x1) = 0'05,
2. x2 = 5 =⇒ P(X = x2) = 0'1,
3. x3 = 10 =⇒ P(X = x3) = 0'1,
4. x4 = 15 =⇒ P(X = x4) = 0'15,
5. x5 = 20 =⇒ P(X = x5) = 0'25,
6. x6 = 25 =⇒ P(X = x6) = 0'2,
7. x7 = 30 =⇒ P(X = x7) = 0'15.
The set of all the values that a Discrete Random Variable can take and its own probabilities is called
Discrete Distribution of Probability. In our case we have the following Probability Distribution.
P(X = x1) = 0'05,  P(X = x2) = 0'1,  P(X = x3) = 0'1,  P(X = x4) = 0'15,
P(X = x5) = 0'25,  P(X = x6) = 0'2,  P(X = x7) = 0'15.
Logically, the sum of all these probabilities has to be 1. This comes from the fact that if a Discrete Random Variable takes its values in the set {x1, ..., xn}, then
P(X ∈ {x1, ..., xn}) = 1,
because in any case the random variable will take one of these values, and this is precisely what we defined as a certain event; and as these events are disjoint (inconsistent), that is, the random variable X cannot take two different values at the same time, it holds that
P(X = x1) + ... + P(X = xn) = P({X = x1} ∪ ... ∪ {X = xn}) = P(X ∈ {x1, ..., xn}) = 1.
So, we will define a discrete random variable and its probability distribution as follows:
Definition 3.3.1 A random variable is a numerical outcome (in the abstract) of a random experiment.
We say that X is a Discrete Random Variable if it takes its values inside a countable set x1, x2, ..., xn, ... (finite or infinite), with probabilities P(X = x1), P(X = x2), ..., P(X = xn), ... satisfying 0 ≤ P(X = xn) ≤ 1 for all n and Σ_i P(X = xi) = 1.
The probabilities given above form the Probability Distribution of the variable X, that is, the Probability Distribution of the random variable X is the function that assigns to each of its possible values the probability with which this value is taken.
In our study about Discrete Random Variables, we will look deeply into the variables that can only
take a value inside a finite set, for example the situation that we proposed in the beginning of the
chapter, seven possibilities only.
As a summary of this section, let us remember which random variable we have in the problem and what its Probability Distribution is.
The random variable is X = ”The profit received after carrying out the investment plan” and
its probability distribution is
P(X = x1) = 0'05,  P(X = x2) = 0'1,  P(X = x3) = 0'1,  P(X = x4) = 0'15,
P(X = x5) = 0'25,  P(X = x6) = 0'2,  P(X = x7) = 0'15,
where x1 = 0, x2 = 5, x3 = 10, x4 = 15, x5 = 20, x6 = 25 and x7 = 30.
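As a small illustration (our own sketch, not part of the original text), this distribution can be stored in Python as a dictionary mapping each profit to its probability, and we can check that the probabilities add up to 1:

# Profit (%) -> probability, as given in the table of the example.
distribution = {0: 0.05, 5: 0.1, 10: 0.1, 15: 0.15, 20: 0.25, 25: 0.2, 30: 0.15}

total = sum(distribution.values())
print(total)                       # ~1.0
assert abs(total - 1.0) < 1e-12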
3.4 Cumulative Probability Function
As you may have realized, we have associated the values x1, ..., x7 with the different profits in a special way, in increasing order, that is to say, we associate x1 with the smallest profit, 0, x2 with 5, ... and x7 with the biggest profit, 30.
This has not been done randomly or without reason, but because we will use it to define a new function associated with the random variable X which characterizes X univocally: the Cumulative Probability Function that we will define below.
Before this, we are going to answer the second question of the problem. The investor wants the probability of receiving at least a 16% profit in the investment plan to be higher than 0'7. If it is not so, he will put all his money in the bank that offers him a certain profit of 16%.
That is to say, in terms of our random variable, he wants P(X > 16) ≥ 0'7. How do we calculate this? Very easily: we take the possible values greater than 16 that the discrete random variable X can take and we add up their probabilities. In our problem we have that
P(X > 16) = P(X = 20) + P(X = 25) + P(X = 30) = P(X = x5) + P(X = x6) + P(X = x7) = 0'25 + 0'2 + 0'15 = 0'6.
So, in our problem of investments, we have that the probability of receiving a profit above 16% is only 0'6, against the 0'7 that the investor wanted to be assured of. So we would have to advise him/her not to carry out the investment plan and to put all the money in the bank, where he/she will receive the 16% profit.
We are now going to define the function that we mentioned before and which characterizes our random variable univocally. With that function we will also be able to answer the question that we previously proposed. That function is the Cumulative Probability Function (CPF). Let us define it:
Definition 3.4.1 If we have the possible values that a discrete random variable X can take ordered in an increasing way, x1, ..., xn, we define the function F : R → [0, 1] as follows:
F(x) = P(X ≤ x), ∀x ∈ R.
For a possible value xi of X, we get that P(X ≤ xi) = P(X = x1) + P(X = x2) + ... + P(X = xi), so that for any x in R, P(X ≤ x) will be the sum of the probabilities of every xi less than or equal to x, that is to say,
P(X ≤ x) = Σ_{xi ≤ x} P(X = xi).
So, in the previous example about the investment plan, we hold that F is:
• If x < 0, F(x) = 0, because there is no xi < 0.
• If x ∈ [0, 5), F(x) = P(X = x1) = 0'05.
• If x ∈ [5, 10), F(x) = P(X = x1) + P(X = x2) = 0'05 + 0'1 = 0'15.
• If x ∈ [10, 15), F(x) = P(X = x1) + P(X = x2) + P(X = x3) = 0'05 + 0'1 + 0'1 = 0'25.
• If x ∈ [15, 20), F(x) = P(X = x1) + P(X = x2) + P(X = x3) + P(X = x4) = 0'05 + 0'1 + 0'1 + 0'15 = 0'4.
• If x ∈ [20, 25), F(x) = P(X = x1) + ... + P(X = x5) = 0'05 + 0'1 + 0'1 + 0'15 + 0'25 = 0'65.
• If x ∈ [25, 30), F(x) = P(X = x1) + ... + P(X = x6) = 0'05 + 0'1 + 0'1 + 0'15 + 0'25 + 0'2 = 0'85.
• If x ≥ 30, F(x) = P(X = x1) + ... + P(X = x7) = 0'05 + 0'1 + 0'1 + 0'15 + 0'25 + 0'2 + 0'15 = 1.
In short, we can express the function F as follows:
F(x) = 0 if x < 0,
F(x) = 0'05 if 0 ≤ x < 5,
F(x) = 0'15 if 5 ≤ x < 10,
F(x) = 0'25 if 10 ≤ x < 15,
F(x) = 0'4 if 15 ≤ x < 20,
F(x) = 0'65 if 20 ≤ x < 25,
F(x) = 0'85 if 25 ≤ x < 30,
F(x) = 1 if x ≥ 30.
As you can see, the cumulative probability function of a discrete random variable has to be defined piecewise, and the points where the function is not continuous are exactly the points where the probability is strictly positive.
The function F helps us calculate the probability that a random variable takes values less than or equal to a given value. So we can calculate the probability asked in the problem using this function and the property of the probability of the complement of an event:
P(X > 16) = 1 − P(X ≤ 16) = 1 − F(16) = 1 − 0'4 = 0'6,
exactly as we calculated before.
Notice that we have used that the complement of the event X > 16 is the event X ≤ 16.
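A minimal Python sketch of this cumulative probability function (the helper name cpf is ours, and we reuse the hypothetical dictionary from above); it reproduces F(16) and the probability asked in the problem:

distribution = {0: 0.05, 5: 0.1, 10: 0.1, 15: 0.15, 20: 0.25, 25: 0.2, 30: 0.15}

def cpf(x):
    # Cumulative probability function F(x) = P(X <= x) of the discrete variable.
    return sum(p for value, p in distribution.items() if value <= x)

print(cpf(16))        # ~0.4
print(1 - cpf(16))    # ~0.6 = P(X > 16), below the 0.7 the investor wanted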
The cumulative probability function of a discrete random variable has some special characteristics; they are:
1. lim_{x→−∞} F(x) = 0.
2. lim_{x→+∞} F(x) = 1.
3. F is a monotone increasing function at every point of R.
4. F is continuous from the right on R, that is to say,
lim_{x→x0+} F(x) − F(x0) = 0, ∀x0 ∈ R.
Moreover, the points where the function is not left-continuous are the points where the discrete random variable associated with this function takes its values.
If you look at the cumulative probability function that we have found, it holds every condition; moreover, the non-continuous points are 0, 5, 10, 15, 20, 25 and 30, exactly the points where the discrete random variable X can take its values. Logically this is a theoretical statement, because the actual profit of the investment plan could be a value different from the values above. But these values can give us an approximation of how the profit will be.
On the other hand, if we have a function which holds the conditions described above, we can say that it is the cumulative probability function associated with some discrete random variable.
If, on the contrary, it does not verify any one of the four conditions specified above, it will automatically be rejected as a cumulative probability function.
3.5 The Mode
As in real life, the mode of a random variable is the value that is taken most often. When it is said that a kind of clothes is in fashion, we mean that there are a lot of people wearing those clothes, that is to say, if we take a person randomly, this person will more likely be wearing this kind of clothes than any other. Similarly, if a random variable takes its values in the set {x1, x2, ..., xn}, the mode will be the xi with the largest probability P(X = xi), that is, the value at which the probability distribution reaches its maximum.
In the example of the investor, the profit with the highest probability of occurring after the investment plan is 20%. Then we say that this value of the random variable is the mode of the distribution.
3.6 The expectation
Now we are going to answer the first question of the problem. The investor wanted to know the expected value of the profits of the investment plan in order to decide what to do with the money. In the bank, the expected value of the profit is easy to calculate, because there is only one possibility (winning the 16%) and this is the one that will occur. However, in the investment plan proposed we do not have this security. So we speak about the expectation, expected value, expected profit, mean,...
The expectation or mean is a measure that gives us an approximation of the average of what is expected to occur in a random experiment after many repetitions. In a game, the expectation will be the value that we ”hope” to win (or lose) after a large number of bets.
It is a theoretical measure that tells us the average value of what we will get when we perform the random experiment a large number of times.
If in a frequency distribution we defined the arithmetic mean as
x̄ = Σ_{i=1}^{n} xi·fi,
where the xi are the values that the variable can take and the fi are their relative frequencies, in a probability distribution we substitute the relative frequency of each value by its probability.
Definition 3.6.1 The mean or expectation of a random variable X that takes its values in a set {x1, ..., xn} with probabilities {p1, ..., pn} (that is, P(X = xi) = pi, ∀i ∈ {1, 2, ..., n}), which we will call µ or EX, is calculated through the expression
EX = µ = Σ_{i=1}^{n} xi·pi.
So, in our problem, the expected profit when we carry out the investment plan is:
µ = Σ_{i=1}^{7} xi·P(X = xi) = 0·0'05 + 5·0'1 + 10·0'1 + 15·0'15 + 20·0'25 + 25·0'2 + 30·0'15 = 18'25.
Therefore, the expected profit that we will obtain carrying out the investment plan is 18'25%, higher than the 16% that the bank offers us.
Clearly the investor that trusts the expected value will risk the money and he/she will carry out
the investment plan in order to get a higher profit.
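The same expectation can be computed with a few lines of Python (a sketch reusing the same hypothetical distribution dictionary as above):

distribution = {0: 0.05, 5: 0.1, 10: 0.1, 15: 0.15, 20: 0.25, 25: 0.2, 30: 0.15}

# Expectation: sum of value * probability over all possible values.
mu = sum(value * p for value, p in distribution.items())
print(mu)   # ~18.25, higher than the 16% offered by the bank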
3.7 The variance
Let us suppose that an investor wants to know an interval in which the profits that he will get in the investment plan will lie with a high probability. How can we calculate one of these intervals? In this section we will give tools for constructing an interval with these properties, but we say in advance that this interval will be very simple and for this reason not very reliable. For obtaining more reliable intervals it would be necessary to study statistical inference, but that is a different subject.
Sometimes the possible values of a discrete random variable are very separated and far away
from the mean. It is logical to think that if the values are all more or less close to the mean, then
a great part of the values of the random variable will lie in a small neighbourhood of the expected value. Thanks to this new measure, the variance, we will be able to give an interval in which the profit that we will obtain when we carry out this investment plan will lie with a certain security.
For measuring the concentration rate of the values of a random variable about its mean we will
use the variance. The variance is a measure that tells us how far away from the mean the values
are. So it will be logical to think that we will use expressions as (xi − µ)2 , because this number
shows us the distance that exists between the possible value xi and the mean of the distribution µ.
Then we will add up all these deviations and we will get a measure of the total deviation that the
values of the variable have.
We will have to take into account that, in the variance, each distance is weighted proportionally to the probability of the variable taking this value. To clarify this a bit, let us take a look at the definition of variance:
Definition 3.7.1 Given a discrete random variable X that takes its values in the set {x1, ..., xn} with probabilities p1, ..., pn, the variance of X, which we will denote σ², is defined as
σ² = Σ_{i=1}^{n} (xi − µ)²·pi.
As you can see, when we multiply each one of the squares by the probability pi, we are giving more weight in the formula of the variance to the distances of the values that are more likely to be taken by the variable X.
There is another way of calculating the variance of a probability distribution. Let us see how to obtain this new formula from the definition of variance:
σ² = Σ_{i=1}^{n} (xi − µ)²·pi.
From the previous definition, if we develop the square inside the sum and separate it into three different sums, we hold that:
Σ_{i=1}^{n} (xi² + µ² − 2·xi·µ)·pi = Σ_{i=1}^{n} xi²·pi + Σ_{i=1}^{n} µ²·pi − Σ_{i=1}^{n} 2·xi·µ·pi.
Operating in the second and in the third sums, we can take µ out because it is constant, and it holds that
Σ_{i=1}^{n} xi²·pi + µ²·Σ_{i=1}^{n} pi − 2µ·Σ_{i=1}^{n} xi·pi.
As the pi are the probabilities of the values of X, we get that Σ_{i=1}^{n} pi = 1 and, by the definition of the expectation, we know that Σ_{i=1}^{n} xi·pi = µ; substituting in the expression above, we get
Σ_{i=1}^{n} xi²·pi + µ² − 2µ·µ = Σ_{i=1}^{n} xi²·pi − µ²,
which is the easiest formula for doing calculations by hand.
So, we can say that
σ² = Σ_{i=1}^{n} xi²·pi − µ².
Among the most important properties of the variance, we can emphasize that, from its very definition, the variance is a non-negative measure, that is to say, σ² ≥ 0 always.
We are going to practice calculating the variance of the probability distribution that we will
obtain when we invest the money in the investment plan described.
Let us remember that our probability distribution was:
P(X = x1) = 0'05,  P(X = x2) = 0'1,  P(X = x3) = 0'1,  P(X = x4) = 0'15,
P(X = x5) = 0'25,  P(X = x6) = 0'2,  P(X = x7) = 0'15,
where x1 = 0, x2 = 5, x3 = 10, x4 = 15, x5 = 20, x6 = 25 and x7 = 30.
So, from the formula of the variance we hold that
σ² = Σ_{i=1}^{7} xi²·pi − µ² = 0·0'05 + 25·0'1 + 100·0'1 + 225·0'15 + 400·0'25 + 625·0'2 + 900·0'15 − 333'0625 = 73'1875.
But this value is not directly useful for calculating the interval that we described before, because it is a measure expressed in the square of the units of the mean, so the two cannot be added or subtracted. To solve this problem we define a new measure, the positive square root of the variance, which we will call the standard deviation:
σ = +√(σ²).
In our example we hold that σ = 8'55.
So, we will say that with a certain probability, higher or lower depending on the case, the value that the discrete random variable X takes will be in the interval (µ − σ, µ + σ).
In our example this interval is (9'7, 26'8). As you see, it is quite a big interval and for our example it is not very useful, because we could have guessed a similar interval roughly, without making so many calculations.
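A short Python sketch of these calculations (same hypothetical distribution dictionary as before):

from math import sqrt

distribution = {0: 0.05, 5: 0.1, 10: 0.1, 15: 0.15, 20: 0.25, 25: 0.2, 30: 0.15}

mu = sum(value * p for value, p in distribution.items())
# Variance via the "easy" formula: sum of value^2 * probability, minus mu^2.
variance = sum(value ** 2 * p for value, p in distribution.items()) - mu ** 2
sigma = sqrt(variance)

print(mu, variance, sigma)        # ~18.25, ~73.19, ~8.55
print((mu - sigma, mu + sigma))   # ~(9.7, 26.8)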
We remind you again that this interval is only a first step towards the confidence intervals, which are much more complex and at the same time more reliable and exact.
It is also necessary to say that, just as for statistical variables, the variances of two random variables cannot be compared directly, because the values taken by the two variables do not have to be expressed in the same units. A simple way to compare two variables in this sense is the coefficient of variation, which is defined as CV = σ/µ, where σ = +√(σ²) (the standard deviation) and µ is the mean of the random variable. This measure can be compared between two different random variables because it is dimensionless.
3.8 Summary of the initial question
We will only state the results. After making the study we can say that:
• If we base our decision on the expected profit of the investment plan, we would in this case say that it is better to put the money in the investment plan than to place it in the bank.
• When we are asked to guarantee a profit higher than 16% with a certainty of 70% in the investment plan, we could not do it, so we advise the investor to put his money in the bank.
• As for the interval in which the profit will move with a certain security, we gave a very simple one, but it might give us some ideas for making better and more precise intervals in the future.
Chapter 4
An example of discrete random
variable: the binomial distribution
4.1 Objectives
• Knowing the random experiments with only two possible outcomes: Bernoulli experiments.
• Calculating the density probability function, cumulative probability function, mean and variance of a Bernoulli random variable.
• Using the Binomial random variable and being able to calculate its density probability function, cumulative probability function, mean and variance from those of the Bernoulli random variable.
• Distinguishing random phenomena which are ruled by a binomial random variable and modelling them with this theoretical model.
4.2 The example
Most people have the right hand more developed than the left one for activities that need a certain skill, for example eating, writing,... They are the so-called right-handed people. However, there is a large number of people who use the left hand for tasks like those previously mentioned; they are the left-handed people. As with the hands, left-handed people also use the left leg more frequently for special actions, like shooting at goal in football.
In spite of the many millions of people in the world who are left-handed, they still have problems using some gadgets designed for right-handed people, like can openers, scissors or some pens. But left-handed students have another problem in their classrooms: a lot of high schools use a special kind of tables and chairs which are difficult to use for the non-right-handed students.
So the director of your high school has thought about buying some special tables and chairs of the above kind to put in the classrooms, in order for all the left-handed students to have their own special chair and table. Your director wants to know how many special chairs and tables he will need, and so you must answer these questions:
1. How many chairs, at least, of each kind will it be necessary to have in a class of 50 students so that the expected number of left-handed students and the expected number of right-handed students each have an appropriate chair?
2. How many left-handed chairs, at least, do we have to put so that, with probability 0'9, there is no left-handed student without his own chair? That is to say, in 90% of the cases there is no left-handed student without his/her own chair.
3. What is the percentage of classrooms of 50 students in which there are at least 10 left-handed students?
For the moment do not answer these questions, because you will be able to do it accurately at the end of the chapter.
4.3 Introduction
We will suppose that 10% of the population is left-handed.
For solving our problem we are going to develop the contents of this chapter bit by bit, step by step. The first question that we can ask ourselves is: what is the probability of a student taken randomly being left-handed? Clearly we can answer that this probability is 0'1, because we are supposing that 10% of the population is left-handed.
If we propose the random experiment ”taking a student of a high school randomly and watching whether he is left-handed or not”, we can look deeply into this experiment, because this kind of experiment is very well studied.
After performing this experiment there are only two possible results: being left-handed or not.
In general we call these possible results Success (E) and Failure (F). In our example we consider a success the fact that a student is left-handed, and a failure if he is right-handed. That is to say:
• E = ”The student is left-handed”,
• F = ”The student is not left-handed”.
As a general rule, we will denote p = P(E) and q = P(F). Obviously it holds that p + q = 1, so sometimes q will be written as 1 − p.
This kind of experiment with only two possible results, success and failure, is called a Bernoulli experiment, and it is perfectly determined by the probability of success p. A Bernoulli random experiment with P(E) = p will be denoted Be(p).
Our experiment of choosing a student randomly and seeing whether he is left-handed or not will be an experiment Be(0'1). However, the study of this kind of experiment alone will not help us to answer the question of our example, but the repetition of it does; that is, if we have a classroom of 50 students and we want to know how many of them are left-handed, we can repeat the Bernoulli experiment 50 times. We are going to look at the students one by one to see how many of them are left-handed. At the end of the recount, we will see how many left-handed students there are. The random variable that gives us the number of left-handed students will be called Binomial. After studying this model we will be able to answer the questions asked in the example.
Let us see in general what we understand as a binomial distribution. If we have a situation in
which:
1. n repetitions of the same experiment are always performed under the same conditions, and
in each of them there are only two possible results, success (it will be denoted by E) and
failure, (it will be denoted by F ). These two events are complements of each other also, that
is, P (E ∪F ) = 1 and P (E ∩F ) = 0 (one of them always occurs, and they can not both happen
at the same time).
2. The probability of success P(E) is the same in each test; let us call it p. That is, in each repetition of the experiment it holds that:
P(E) = p,   P(F) = 1 − P(E) = 1 − p = q.
3. If we denote by X = ”number of successes in n repetitions”, it holds that X can take the values 0, 1, ..., n and X is ruled by a binomial distribution.
The probability distribution that holds these conditions will be called binomial distribution of
parameter p(= P (E)) with n repetitions, and we will denote it as B(n, p).
So that, if our problem is to analyze the number of left-handed students in a classroom of 50,
we have to consider a random variable like this:
X = ”number of left-handed students in a classroom of 50”,
which we can suppose is distributed following a binomial distribution of parameters p = 0'1 and n = 50, that is to say, X ∼ B(50, 0'1).
Probability of k successes
In an intuitive way a question can arise: what is the probability that there are 2 left-handed students in the classroom of 50? For this event to occur it is necessary to have 2 left-handed students and 48 right-handed ones. In other words, we could say that the first student has to be left-handed, the second one too, the third student has to be right-handed, the fourth right-handed, ..., the student number 50 right-handed. If we denote by Z a left-handed student and by D a right-handed one, we can write this combination as:
Z Z D D ... D D (with D repeated 48 times).
As the probability of a student being left-handed is 0'1 and of being right-handed is 0'9 (and they are the same in each try), we get that the probability of this combination is:
0'1 · 0'1 · 0'9 · 0'9 · ... · 0'9 · 0'9 = (0'1)^2 · (0'9)^48.
However, the positions in which the left-handed students appear can be different, that is, the left-handed students can be number 30 and number 43, or 12 and 17,... So we will have to multiply the probability obtained before by the total number of possible arrangements. How many arrangements are there? In total there are
C(50, 2) = 50! / (2!·(50 − 2)!) = 1225.
That number is special: it is called a binomial number (binomial coefficient). In general, if we want to order a set of n elements in which there are k of one kind and n − k of another, we can do it in C(n, k) different ways.
So, the probability of having two left-handed students in the classroom of 50, which is the same as saying that there are two successes in our experiment B(50, 0'1), is:
P(X = 2) = C(50, 2) · (0'1)^2 · (0'9)^48 = 0'08.
As you can see, the probability is very small. In a way this is logical, because if you think about it, it is not very likely that there are exactly two left-handed students out of 50.
Let us look in general at the probability of obtaining k successes after n repetitions of a Bernoulli
experiment of parameter p, that is, in a B(n, p).
If X follows a binomial distribution B(n, p), the probability of obtaining k successes after n repetitions is given by the expression
P(X = k) = C(n, k) · p^k · q^(n−k), ∀ k = 0, 1, 2, ..., n,
where
C(n, k) = n! / (k!·(n − k)!),   p = P(E),   q = P(F),   p + q = 1.
The reasoning for getting that expression is the same as we followed in the previous particular
event of obtaining two successes for our binomial variable X.
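A minimal sketch of this formula in Python, using math.comb for the binomial number (the helper name is ours); applied to the left-handed example B(50, 0'1) it reproduces the value P(X = 2) ≈ 0'08 found above:

from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) for X ~ B(n, p).
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binomial_pmf(2, 50, 0.1))   # ~0.078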
Let us take a look at the particular cases of 0 successes and of n successes. Let us start calculating the probability of getting 0 successes, that is, n failures:
P(X = 0) = C(n, 0) · p^0 · q^(n−0) = (n!/(0!·n!)) · q^n = q^n
(remember that 0! = 1), while the probability of obtaining n successes is:
P(X = n) = C(n, n) · p^n · q^(n−n) = (n!/(n!·0!)) · p^n · q^0 = p^n.
Density probability function
The density probability function is given in the following way:
f(k) = C(n, k) · p^k · q^(n−k), ∀ k = 0, 1, 2, ..., n, and 0 in the rest,
which can be expressed as:
f(k) = C(n, k) · p^k · q^(n−k) if k ∈ {0, 1, 2, ..., n},
f(k) = 0 otherwise.
So, in the example that we are studying, we hold that
f(k) = C(50, k) · (0'1)^k · (0'9)^(50−k) if k ∈ {0, 1, 2, ..., 50},
f(k) = 0 otherwise.
Cumulative Probability Function
In the second question they ask us to calculate the number of left-handed chairs needed so that, with probability 0'9, no left-handed student will be lacking his own chair, that is to say, we look for the first k ∈ Z such that P(X ≤ k) ≥ 0'9. How can we calculate that k? With the help of the cumulative probability function of this random variable, which we are going to calculate next, because P(X ≤ k) = F(k), where F is the CPF of the random variable X.
For a discrete random variable X, by the definition of the CPF it holds that F(x) = P(X ≤ x), so in the case X ∼ B(n, p) we get
F(x) = Σ_{k=0}^{x} C(n, k)·p^k·q^(n−k) = C(n, 0)·p^0·q^n + C(n, 1)·p·q^(n−1) + ... + C(n, x)·p^x·q^(n−x), ∀x = 0, 1, 2, ..., n.
As a particular case, we can see that
F(0) = Σ_{k=0}^{0} C(n, k)·p^k·q^(n−k) = f(0) = q^n,
and applying Newton's binomial formula, we can calculate
F(n) = Σ_{k=0}^{n} C(n, k)·p^k·q^(n−k) = C(n, 0)·q^n + C(n, 1)·p·q^(n−1) + ... + C(n, n)·p^n = (p + q)^n = 1^n = 1.
This is logical, because in n repetitions there will certainly be a number of successes less than or equal to n.
In our example, we will get that the Cumulative Probability Function is
F(x) = Σ_{k=0}^{x} C(50, k)·(0'1)^k·(0'9)^(50−k), ∀x ∈ {0, 1, ..., 50}.
That is to say, for answering the second question we look for the first k such that F(k) ≥ 0'9. This k is found by trying different values, and the calculations can be very tedious. We hold that
F(k) = Σ_{i=0}^{k} C(50, i)·(0'1)^i·(0'9)^(50−i).
We try first with k = 10:
F(10) = Σ_{i=0}^{10} C(50, i)·(0'1)^i·(0'9)^(50−i) = 0'99.
This probability is bigger than 0'9, so let us try a lower value, for example k = 7:
F(7) = Σ_{i=0}^{7} C(50, i)·(0'1)^i·(0'9)^(50−i) = 0'88.
We are getting nearer; let us try the value k = 8:
F(8) = Σ_{i=0}^{8} C(50, i)·(0'1)^i·(0'9)^(50−i) = 0'94.
Since F(7) < 0'9, F(8) > 0'9 and F is monotone increasing, we get that the k we are looking for is k = 8, the first one such that F(k) ≥ 0'9.
So, to make sure with a reliability of 90% that there is no left-handed student without his own chair in the classroom, we will have to put 8 chairs for left-handed students.
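The trial-and-error search for k can be automated with a few lines of Python (a sketch, with our own helper name):

from math import comb

def binomial_cpf(k, n, p):
    # F(k) = P(X <= k) for X ~ B(n, p).
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 50, 0.1
k = 0
while binomial_cpf(k, n, p) < 0.9:
    k += 1
print(k, binomial_cpf(k, n, p))   # 8, ~0.94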
How many right-handed chairs will it be necessary to put, at least, so that with a reliability of 90% there are no right-handed students without their own chair?
In this case, we have to look for the smallest k such that P(50 − X ≤ k) ≥ 0'9, because the number of right-handed students is represented by the random variable 50 − X. Let us work out how to calculate k:
P(50 − X ≤ k) ≥ 0'9 ⇔ P(X ≥ 50 − k) ≥ 0'9 ⇔ 1 − P(X < 50 − k) ≥ 0'9
⇔ P(X < 50 − k) ≤ 0'1 ⇔ P(X ≤ 49 − k) ≤ 0'1 ⇔ F(49 − k) ≤ 0'1.
Let us start to try:
• k = 48: F(49 − k) = F(1) = 0'03 ≤ 0'1,
• k = 47: F(49 − k) = F(2) = 0'11 > 0'1.
So the smallest k for which the condition holds is k = 48. Then, so that with probability 0'9 or bigger no right-handed student is without his own chair, it will be necessary to put at least 48 right-handed chairs.
4.3.1 The expectation
Now we are going to answer the first question, based on the expected value; that is, we will say how many chairs of each kind are necessary to have in order for the expected number of left-handed students and the expected number of right-handed students to each have an appropriate chair.
Let us remember that the expectation or mean of a random variable was the average that this
variable will have after a big number of repetitions.
As we saw previously, the expectation of a random variable X that takes its values in a set {x1, ..., xn} with probabilities {p1, ..., pn} (that is to say, P(X = xi) = pi, ∀i ∈ {1, 2, ..., n}), which we call µ, is calculated through the expression
µ = Σ_{i=1}^{n} xi·pi.
Therefore, if X ∼ B(n, p), we will get that its expectation is
µ = Σ_{i=0}^{n} i·P(X = i) = Σ_{i=0}^{n} i·C(n, i)·p^i·q^(n−i),
and after some mathematical calculations, it holds that
µ = n·p.
So, the mean of our random variable X = ”number of left-handed students in the classroom” is
E(X) = µ = n·p = 50 · 0'1 = 5.
According to the expected value of the random variable X, we will have 5 left-handed students and 45 right-handed students. Of course it would be very risky to put exactly 5 left-handed chairs and 45 right-handed ones, because it is not very likely for this exact event to occur. Really, the probability of this event happening is
P(X = 5) = C(50, 5) · (0'1)^5 · (0'9)^45 = 0'18,
so that, although it is an event with a high probability if we compare it to the rest of the values
that the random variable X can take, it does not give us any guarantee. However we have to say
that the expectation can give us an approximation of what will happen, but it is always better to
give an interval in which the value of X will lie with a certain security, and that is what we are going to do in the next section.
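A quick numerical check of µ = np and of P(X = 5) for this example (a sketch, using math.comb):

from math import comb

n, p = 50, 0.1
mu = n * p
p_exactly_5 = comb(n, 5) * p ** 5 * (1 - p) ** 45

print(mu)            # 5.0 expected left-handed students
print(p_exactly_5)   # ~0.18: large compared with other single values, but no guarantee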
4.3.2 The variance
Now we are going to give an interval in which the number of left-handed students will be with a
certain probability. This interval will be given in terms of the mean and the variance. Let us take
a look at the expression of the variance of a binomial distribution:
The formula of the variance of a general random variable is:
σ² = Σ_{i=1}^{n} xi²·pi − µ².
So, if X ∼ B(n, p), it holds that
σ² = Σ_{i=0}^{n} i²·C(n, i)·p^i·q^(n−i) − µ²,
and solving that sum, we get
σ² = n·p·q.
So in our example, we get that
σ² = 50 · 0'1 · 0'9 = 4'5.
Therefore
σ = +√(σ²) = 2'12.
Then we can say that with a high probability the number of left-handed students will be within the interval
(µ − σ, µ + σ) = (2'9, 7'1),
and as X is a discrete random variable, we can say with a certain security that X ∈ {3, 4, 5, 6, 7}, because these are the possible values within the interval. To be exact, this probability is:
Σ_{i=3}^{7} P(X = i) = ... = 0'77,
that is, in 77% of the cases there will be 3, 4, 5, 6 or 7 left-handed students in the classroom.
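The probability of this interval can be checked by summing the binomial probabilities from 3 to 7 (a sketch in Python):

from math import comb

n, p = 50, 0.1
prob_3_to_7 = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(3, 8))
print(prob_3_to_7)   # ~0.77: about 77% of classrooms have 3 to 7 left-handed students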
To complete the chapter we are going to answer the third question: what is the percentage of classrooms of 50 students in which there are 10 or more left-handed students?
We have to calculate
P(X ≥ 10) = Σ_{i=10}^{50} P(X = i) = ... = 0'025.
That is to say, in 2'5% of the classrooms of 50 students there will be 10 or more left-handed students.
Many more questions can arise for this model. All of them can be solved, but only with the help of a computer and the necessary software, because calculations like P(X ≤ k) and P(X ≥ k) are very tedious to do by hand, due to the large number of operations involved. Also, as the number of repetitions n of the Bernoulli experiment gets bigger, the calculations get longer and longer, until they are almost impossible.
For this reason we will give a technique in the next chapter for making these calculations easier, but to understand it well it will be necessary to use the Normal Distribution, which is what we are going to study in the following chapter.
Chapter 5
Continuous distributions: normal
distribution
5.1 Objectives
• Understanding the concept of continuous random variables and their difference from discrete
random variables.
• Identifying a continuous density probability function and knowing how to calculate its associated cumulative probability function.
• Knowing how to calculate probabilities of continuous distributions in intervals using the density probability function, graphically or with methods of calculus.
• Calculating probabilities using the cumulative probability function.
• Understanding the importance of the normal distribution, knowing its density probability
function and being able to interpret its parameters (µ and σ).
• Being able to fit data to a normal distribution and checking if the fit has been good or not.
• Standardizing a normal distribution with any mean µ and any standard deviation σ.
• Using the normal area table.
• Approximating a binomial distribution by a normal distribution.
5.2 The example
We know that the average height of a population is changing. So, in some high schools the chairs and tables are getting too small for the tallest students, because they are built for shorter students.
It is known that the students who are less than 160 cm tall feel well in chairs and tables of the kind
A. The students who are between 160 and 180 cm tall are comfortable in chairs and tables of the
type B and a student who is more than 180 cm tall feels good in a table and chair of the type C. A
principal of a high school wants to know how many chairs and tables are needed in each classroom
in order for most part of the students to feel comfortable in their chairs and tables.
Given that it is a very studied phenomenon, it is known that the height of a population is ruled
by a random continuous variable very well known, the normal distribution, with mean equal to the
average of the population and standard deviation equal to the standard deviation of the sample.
The best studied normal distribution is the standard normal distribution, the one that has mean 0 and standard deviation 1. We have tables to calculate the probabilities of this distribution, so we can study this particular case.
We will introduce all the concepts mentioned above through the following questions about the example:
• Calculate the mean and the variance of the height in your class.
• How many tables and chairs of each type will we need on average in each class?
• Do you think that our estimation is good? Why?
• In the left-handed problem studied for the binomial distribution, give an answer to all the questions proposed using the normal approximation to the binomial.
5.3 Introduction
We are going to measure the height of the students of our class. If we have a machine for measuring the height of a student with a precision of dm, we will have data expressed in meters, such as
1'7, 1'7, 1'6, 1'5, 1'7, 1'4, 1'5, 1'8, ...
If we have a machine with a precision of cm we will have data such as
1'71, 1'75, 1'66, 1'54, 1'69, 1'48, 1'55, ...
Imagine the precision that we could get with a machine that measures even the mm. We would have heights expressed in meters of the following kind:
1'712, 1'748, 1'663, 1'541, 1'689, 1'484, 1'552, ...
If we could measure with a precision of 5 decimals we would do it, but in that case it would be useless to ask if somebody is 1'56053 m tall; it would be more logical to ask if he/she is more than 155'5 cm and less than 156'5 cm tall, and afterwards we would round and say that he/she is 156 cm or 1'56 m tall.
As you can see, depending on the precision of our measurements, perhaps it does not make sense to fit these data to a discrete probability distribution, because although the set of possible values would be finite, it is also true that this set would be very large, too large for the calculations to be made easily.
As we studied before, a probability distribution is a mathematical model that helps us explain a real phenomenon and try to predict it in the future. In the section about discrete distributions, we saw that they are associated with random variables that take values in a finite set. If, on the other hand, these values could be any number in an interval of the real line R, we would say that it is a continuous random variable.
In the example about the heights, we can say that it is a continuous random variable because, although in reality it will never be exactly so, it will be easier to process the data in this way than trying to treat it as a discrete variable.
We are going to try to explain the concept of continuous random variable through the following example:
Suppose that, after the data of the height of every student in the high school are collected and grouped into intervals of 10 cm, we obtain the following table of relative frequencies:
Interval of height   Relative frequency
[140,150)            0.05
[150,160)            0.2
[160,170)            0.4
[170,180)            0.2
[180,190)            0.13
[190,200)            0.02
And if we represent its histogram we obtain:
However, now consider that we divide the heights into groups of 5 cm, that is to say, we refine the division. Then we obtain the following table:
Interval of height   Relative frequency
[140,145)            0.01
[145,150)            0.04
[150,155)            0.06
[155,160)            0.14
[160,165)            0.25
[165,170)            0.15
[170,175)            0.11
[175,180)            0.09
[180,185)            0.08
[185,190)            0.05
[190,195)            0.01
[195,200)            0.01
From this table of relative frequencies, we obtain the following histogram:
As you can see, it is considerably different than the histogram that we obtained making groups
of 10 cm.
But now we can make it more precise and obtain the heights of the students expressed in cm.
In our example we would obtain this table:
Height  Relative frequency    Height  Relative frequency
142     0.003                 169     0.025
143     0.004                 170     0.025
144     0.003                 171     0.024
145     0.005                 172     0.022
146     0.008                 173     0.02
147     0.008                 174     0.019
148     0.009                 175     0.019
149     0.01                  176     0.019
150     0.01                  177     0.018
151     0.011                 178     0.017
152     0.012                 179     0.017
153     0.013                 180     0.02
154     0.014                 181     0.015
155     0.015                 182     0.01
156     0.02                  183     0.02
157     0.03                  184     0.015
158     0.035                 185     0.015
159     0.04                  186     0.011
160     0.045                 187     0.009
161     0.055                 188     0.01
162     0.06                  189     0.005
163     0.05                  190     0.005
164     0.04                  192     0.003
165     0.035                 194     0.002
166     0.033                 195     0.003
167     0.031                 196     0.005
168     0.026                 198     0.002
This gives the corresponding histogram. If we went even further we would obtain histograms with smaller and smaller steps, until finishing in the idealization of continuity, that is, the graph of a continuous function would be described, and from now on we will call it f. You have to realize that to reach the continuity of the density probability function you would need a very big sample size, much bigger than the number of students of your high school.
If we performed a large number of observations, we would get the graph of a function f like that. If we do all the approximations necessary for getting that function, we can calculate the probability of the height of a student taken randomly being, for instance, between 159 cm and 165 cm, only calculating the area of the surface included between the function f and the OX axis between the values x = 159 and x = 165.
In short, we will have that a random variable X is continuous if it holds that:
• It has a probability distribution associated that is continuous and it is defined through a
function f (x), and thanks to this function we can calculate probabilities like P (x1 ≤ X ≤ x2 ),
with x1 , x2 ∈ R, x1 ≤ x2 .
• This probability will be calculated measuring the area enclosed between the graphic of the
function f (x) and the abscissa axis, between the points x1 and x2 .
• In a continuous probability distribution, talking about probabilities at single points makes no sense; it only makes sense to talk about probabilities in intervals. That is because if X is a continuous random variable, then P(X = x) = 0, ∀x ∈ R (for instance, it does not make sense to ask if somebody weighs exactly 63'94738274658482736 kg).
Example 5.3.1 Let us consider the probability distribution associated with the function f(x) of the previous figure:
1. Represent graphically the probabilities
P(60 ≤ X ≤ 72), P(70 ≥ X), P(X ≤ 60), P(X = 81).
2. Give a numerical approximation of these probabilities using the previous table.
Exercise 5.3.1 Do the same for
P (50 ≤ X ≤ 60),
P (61 ≥ X),
P (X ≤ 83),
P (X = 76).
Density probability function
The function that permitted us to calculate the probabilities in the intervals was denoted f, and it is called the density probability function of the random variable X.
With the density probability function we can calculate all the other parameters of the distribution. This function is equivalent to the density probability function of the discrete random
variables.
The density probability function f of a random variable X satisfies:
1. f(x) ≥ 0, ∀x ∈ R.
2. The total area contained between the graph of the function f and the OX axis is equal to 1, that is to say,
P(−∞ ≤ X ≤ +∞) = 1.
This is totally logical. In terms of the integral calculus it can be written as
∫_{−∞}^{+∞} f(x)dx = 1.
In particular, if X only takes values in the segment delimited by a and b, it holds that
P(a ≤ X ≤ b) = S(a, b) = ∫_a^b f(x)dx = 1,
where S(a, b) denotes the area of the surface enclosed between the graph of the curve f(x) and the abscissa axis, between the points a and b.
3. The probability of the continuous random variable X taking its values in the interval delimited by x1 and x2, that is to say P(x1 ≤ X ≤ x2), is the area contained between the function f(x) and the OX axis in the interval (x1, x2), that is to say, S(x1, x2), for any x1 ≤ x2, x1, x2 ∈ R. In other words,
P(x1 ≤ X ≤ x2) = S(x1, x2) = ∫_{x1}^{x2} f(x)dx.
What is more, as a consequence of P(X = x) = 0, ∀x ∈ R, it holds that
P(x1 ≤ X ≤ x2) = P(x1 ≤ X < x2) = P(x1 < X ≤ x2) = P(x1 < X < x2), ∀x1 ≤ x2, x1, x2 ∈ R.
We also have the inverse property, that is, if we have a function f with f(x) ≥ 0 such that the total area between its graph and the OX axis is 1, ∫_{−∞}^{+∞} f(x)dx = 1, we can define a continuous random variable associated with f, simply assigning as probability of each interval the area of the surface contained between f(x) and the abscissa axis in that interval.
Example 5.3.2 Let us see if the following function is a density probability function:
f(x) = 1/3 if 0 < x < 3, and f(x) = 0 otherwise.
f has to hold:
• f(x) ≥ 0, ∀x: it holds clearly.
• ∫_{−∞}^{+∞} f(x)dx = 1? Let us see it.
∫_{−∞}^{+∞} f(x)dx = ∫_{−∞}^{0} 0 dx + ∫_0^3 (1/3) dx + ∫_3^{+∞} 0 dx = 0 + [x/3]_{x=0}^{x=3} + 0 = 3/3 − 0/3 = 1.
Let us see this second condition graphically. The area of the colored surface is the value of the integral above. This area is equal to the base of the rectangle times the height of that rectangle, that is,
3 · (1/3) = 1.
It holds both conditions, so we can associate with it a continuous random variable, let us call it X, with density probability function f.
We are going to use the density probability function (from now on we will denote it by dpf ) to
calculate some probabilities.
P(X ≥ 1) = ∫_1^{+∞} f(x)dx = ∫_1^3 (1/3)dx = [x/3]_1^3 = 3/3 − 1/3 = 2/3.
It is also possible to calculate it through the area of the colored surface of the figure. This area is equal to base times height, that is to say,
2 · (1/3) = 2/3.
Exercise 5.3.2 Calculate the following probabilities, taking as X the random variable associated with the dpf of the previous example:
P(X ≥ 2), P(0'5 ≤ X < 2), P(2 < X < 4).
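For continuous variables these areas can also be approximated numerically. The sketch below (our own illustration, with hypothetical helper names) approximates P(X ≥ 1) for the density of the worked example, f(x) = 1/3 on (0, 3), with a simple Riemann sum; the same idea can be used to check the exercise.

def f(x):
    # Density of the example: 1/3 on the interval (0, 3), 0 otherwise.
    return 1 / 3 if 0 < x < 3 else 0.0

def integrate(g, a, b, steps=100000):
    # Very simple Riemann-sum approximation of the integral of g on [a, b].
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width

print(integrate(f, 1, 3))     # ~0.6667 = P(X >= 1)
print(integrate(f, -10, 10))  # ~1.0: total area under the density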
Cumulative Probability Function
The definition of a Cumulative Probability Function for a continuous random variable is the same
as we gave for the discrete random variables, that is to say
F (x) = P (X ≤ x), ∀x ∈ R.
This function measures the probability of the random variable taking a value less than or equal to x. In the discrete case we calculated it through a finite sum; in the continuous case we cannot do this, so we will do it through the area of the enclosed surface or using the integral calculus, which in the end is the same.
That is to say, we get that
F(x) = S(a, x) = ∫_a^x f(t)dt,
where f is the dpf of the variable X.
The Cumulative Probability Function (from now on we will call it CPF) is useful for the calculus of probabilities in intervals, since it holds that
P(x1 ≤ X ≤ x2) = F(x2) − F(x1).
A function F can be considered as a CP F if it holds that:
1. 0 ≤ F (x) ≤ 1, ∀x ∈ R.
2. If x ≤ a, F (x) = 0. If x ≥ b, F (x) = 1.
3. F (x) is monotone increasing, that is, if x1 ≤ x2 it holds that F (x1 ) ≤ F (x2 ).
We also have that if F is a CPF of a continuous random variable X and f is its dpf, then F′(x) = f(x), and therefore ∫_a^x f(t)dt = F(x).
Example 5.3.3 Let us start with the dpf shown as example in the previous section:
f(x) = 1/3 if 0 < x < 3, and f(x) = 0 otherwise.
We know that ∫_a^x f(t)dt = F(x). Therefore, we get:
1. if x ≤ 0 ⇒ F(x) = ∫_{−∞}^{x} 0 dt = 0.
2. if x ∈ [0, 3] ⇒ F(x) = ∫_{−∞}^{0} 0 dt + ∫_0^x (1/3) dt = x/3.
3. if x ≥ 3 ⇒ F(x) = ∫_{−∞}^{0} 0 dt + ∫_0^3 (1/3) dt + ∫_3^x 0 dt = 1.
That is to say,
F(x) = 0 if x ≤ 0,
F(x) = x/3 if 0 < x < 3,
F(x) = 1 if x ≥ 3.
Let us check that, in fact, this function holds all the conditions necessary for being a Cumulative Probability Function:
1. 0 ≤ F (x) ≤ 1, clearly it holds.
2. If x ≤ 0 ⇒ F (x) = 0.
If x ≥ 3 ⇒ F (x) = 1.
3. F(x) is monotone increasing. We have to prove that if x1 ≤ x2 then F(x1) ≤ F(x2). Let us see the different cases that can occur:
(a) x1 < 0 and x2 < 0. F(x1) = 0 = F(x2). It holds.
(b) x1 < 0 and x2 ∈ [0, 3]. F(x1) = 0 ≤ x2/3 = F(x2). It holds.
(c) x1 < 0 and x2 ≥ 3. F(x1) = 0 ≤ 1 = F(x2). It holds.
(d) x1 ∈ [0, 3] and x2 ∈ [0, 3]. F(x1) = x1/3 ≤ x2/3 = F(x2). It holds.
(e) x1 ∈ [0, 3] and x2 ≥ 3. F(x1) = x1/3 ≤ 1 = F(x2). It holds.
(f) x1 ≥ 3 and x2 ≥ 3. F(x1) = 1 ≤ 1 = F(x2). It holds.
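A minimal Python sketch of this cumulative probability function, together with a check of a probability computed as F(x2) − F(x1):

def F(x):
    # CPF of the example: 0 for x <= 0, x/3 on (0, 3), 1 for x >= 3.
    if x <= 0:
        return 0.0
    if x < 3:
        return x / 3
    return 1.0

# P(1 <= X <= 2) = F(2) - F(1) = 2/3 - 1/3 = 1/3
print(F(2) - F(1))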
Exercise 5.3.3 Consider the function F defined as follows:
F(x) = 0 if x ≤ 0,
F(x) = x²/2 if 0 < x < 1,
F(x) = 1 if x ≥ 1.
Is it admissible as a CPF? If it is, calculate the dpf associated with F and the following probabilities:
P(X ≤ 0'5), P(X ≥ 0'8), P(0'2 ≤ X ≤ 0'5), P(0'3 ≤ X ≤ 1'5),
where X is the continuous random variable associated with F.
Exercise 5.3.4 Check whether the following function is admissible as a CPF or not:
F(x) = 0 if x ≤ −1,
F(x) = x²/4 if −1 < x < 2,
F(x) = 1 if x ≥ 2.
Exercise 5.3.5 Consider the following function:
f(x) = x if 2 < x < 3, and f(x) = 0 otherwise.
Can f be considered as a dpf? Why?
An example: the normal distribution
Because of its theoretical and practical applicability, the most important continuous probability distribution is, without any doubt, the normal distribution. This applicability stems from the fact that in most situations extreme cases are rare and almost all the data are grouped around a central value.
For instance, the normal situation is that an adult man is between 170 and 180 cm tall (non-real data), and it is very strange to find men who are more than 200 cm or less than 150 cm tall. For that reason, and because it is a very well studied example, we will see that the height of a population is distributed following a normal distribution.
The height is one of the many examples of phenomena that are ruled by a normal distribution. Others are the weight, the intelligence quotient, the errors in measurements,...
All of them have a common characteristic: in a collection of data most of them are grouped
around the mean of those data.
In the example about the height of the students of the high school, we are going to suppose that
all those data are ruled through a normal distribution with mean equal to the mean of the data and
standard deviation equal to the standard deviation that we calculated in the relative frequencies
table of the heights.
Normal Density Probability Function: characteristics
The dpf f(x) of a normal random variable with mean µ and standard deviation σ was given by Gauss. It is an exponential function, and its analytical expression is:
f(x) = (1 / (σ·√(2π))) · e^(−(x − µ)² / (2σ²)),
and its graph is the well-known bell-shaped curve.
Is f a dpf? To see that f is a dpf, we note that f(x) ≥ 0 for each x, but we also have to check that the area enclosed between the graph of f and the OX axis is 1, in other words that $\int_{-\infty}^{+\infty} f(x)\,dx = 1$, an integral that we do not need to know how to compute and that Laplace showed to be equal to one.
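The following sketch (ours, not from the text) evaluates the normal dpf with the formula above and checks numerically, on a wide interval, that its area is very close to 1; for simplicity it assumes mean 0 and standard deviation 1.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gauss's density: exp(-0.5*((x-mu)/sigma)**2) / (sigma*sqrt(2*pi))."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Numerically integrate over a wide interval around the mean; beyond +-6 sigma
# the contribution is negligible, so the result should be very close to 1.
mu, sigma = 0.0, 1.0
lo, hi, steps = mu - 6 * sigma, mu + 6 * sigma, 100000
h = (hi - lo) / steps
area = sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma) for i in range(steps)) * h
print(round(area, 6))   # ~ 1.0
```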
We are going to answer the first question of the example. To do so it is not necessary to learn anything new, because we already know how to calculate the mean and the standard deviation of a set of data. In our case, making the appropriate calculations we get that
$$\mu = \sum x_i f_i = 166.59,$$
and
$$\sigma^2 = \sum x_i^2 f_i - \mu^2 = 118.29.$$
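The two sums above can be computed directly from the class marks $x_i$ and relative frequencies $f_i$ of the heights table. The sketch below is ours; the numbers in it are hypothetical placeholders (the real table appears earlier in the document), but the formulas $\mu = \sum x_i f_i$ and $\sigma^2 = \sum x_i^2 f_i - \mu^2$ are the ones used in the text.

```python
# Hypothetical class marks (cm) and relative frequencies; substitute the real
# values from the heights table used in the text.
marks = [150, 160, 170, 180, 190]
freqs = [0.10, 0.25, 0.35, 0.20, 0.10]   # relative frequencies, must sum to 1

mu = sum(x * f for x, f in zip(marks, freqs))               # mean: sum of x_i * f_i
var = sum(x * x * f for x, f in zip(marks, freqs)) - mu**2  # variance: sum of x_i^2 * f_i - mu^2
print("mean =", round(mu, 2), " variance =", round(var, 2), " std dev =", round(var ** 0.5, 2))
```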
We are going to suppose that the data of the heights of the students of the high school follow a normal distribution. The density probability function of this distribution has some special characteristics; among them we have to say that the graph of the dpf is bell-shaped, that is to say:
• The graph has a vertical line of symmetry, and that is where the maximum of the dpf and the mode of the corresponding random variable are.
• The curve has only one maximum; for values of x less than this maximum the curve f(x) is monotone increasing, and for values bigger than this maximum the curve is monotone decreasing.
• Theoretically, the variable can take every value of the real line (OX axis), but the curve has the OX axis as a horizontal asymptote on both sides, and it approaches this asymptote very quickly.
• When the variance of the distribution gets bigger, the curve described by its dpf gets "flatter", and when the variance gets smaller, the curve becomes "more pointed". This is summarised by saying that if X is a random variable governed by a normal distribution with mean µ and standard deviation σ, which we denote by X ∼ N(µ, σ²), where σ² is the variance of X, then it holds that:
1. $P(\mu - \sigma < X < \mu + \sigma) = 0.683$,
2. $P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.954$,
3. $P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.997$.
These last equalities give us an idea of how improbable it is to obtain values of a normal distribution far away from the mean. Note that, according to them, if X is a random variable following a normal distribution with mean 2 and variance 1, X ∼ N(2, 1), then 99.7% of the observations of X will lie inside the interval (−1, 5), even though it is theoretically possible for X to take values higher than 1,000,000!
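These three probabilities can be checked with the CPF of the standard normal, which can be written through the error function; the sketch below (ours) uses Python's math.erf for that purpose.

```python
import math

def Phi(z):
    """CPF of the standard normal, expressed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(mu - k*sigma < X < mu + k*sigma) does not depend on mu or sigma:
# standardizing gives P(-k < Z < k) = Phi(k) - Phi(-k).
for k in (1, 2, 3):
    print(k, round(Phi(k) - Phi(-k), 4))   # ~ 0.6827, 0.9545, 0.9973
```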
We will fit the data obtained to a normal distribution with the mean µ and standard deviation σ that we obtained before; that is to say, the height of the students of the high school is governed by a N(166.59, 10.88) (mean 166.59 cm and standard deviation 10.88 cm).
However, this alone does not make the calculations easy enough to answer the questions of the problem without a computer, so we are going to use a special kind of normal distribution, the standard normal distribution or N(0, 1), which is very well studied and to which we can always resort when we need to calculate something in any other normal distribution.
The N(0, 1)
The normal distribution with mean µ = 0 and standard deviation σ = 1 is known as the standard normal distribution, N(0, 1).
Its interest lies in the fact that the values of its probabilities are perfectly known and, as we will see later, we can resort to it to find probabilities in any other normal distribution.
If Z ∼ N(0, 1), the probabilities P(Z ≤ z), that is, F(z) where F is the CPF of Z, are described in the following table for all z ∈ [0, 3]. This is enough for calculating all of them because, as we will see, the probability of the variable taking values higher than 3 is very small, and it is enough to know the values of F(z) for z ≥ 0 because of the symmetry of the dpf.
Let us suppose that Z ∼ N(0, 1). Let us see how to calculate P(Z ≤ z) from the data of the table above, depending on the different values that z can take. To do so we will use the symmetry of the distribution about zero.
• If z ≥ 0, the probability P(Z ≤ z) is read directly from the table.
• If z < 0, then P(Z ≤ z) = P(Z ≥ −z) = 1 − P(Z ≤ −z), which can be obtained from the table because −z ≥ 0.
• If z1 ≤ z2, we get that P(z1 ≤ Z ≤ z2) = P(Z ≤ z2) − P(Z ≤ z1), which is a property shared by all continuous probability distributions.
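Where no printed table is at hand, the same values F(z) = P(Z ≤ z) can be obtained with the error function; the sketch below (ours) also reproduces the symmetry rules listed above.

```python
import math

def Phi(z):
    """P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# z >= 0: read directly; z < 0: use the symmetry P(Z <= z) = 1 - P(Z <= -z).
print(round(Phi(0.82), 4))          # ~ 0.7939, as in Example 5.3.4
print(round(1.0 - Phi(1.2), 4))     # P(Z <= -1.2) ~ 0.1151, via symmetry
print(round(Phi(-1.2), 4))          # same value, computed directly

# Interval probability: P(z1 <= Z <= z2) = Phi(z2) - Phi(z1).
print(round(Phi(1.5) - Phi(-0.5), 4))
```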
Exercise 5.3.6 From the properties above, deduce that:
1. $P(-z \le Z \le z) = 2 \cdot P(Z \le z) - 1$.
2. $P(-z \le Z \le 0) = P(0 \le Z \le z)$.
Here Z ∼ N(0, 1) again and z is a real number greater than or equal to zero.
Example 5.3.4 Let us calculate from the table, with Z ∼ N(0, 1), the following probabilities:
1. $P(Z \le 0.82) = 0.7939$.
2. $P(Z \le -1.2) = 1 - P(Z \le 1.2) = 1 - 0.8849 = 0.1151$.
Exercise 5.3.7 Calculate from the table, where Z ∼ N(0, 1), the following probabilities:
1. $P(Z \le 0.96)$.
2. $P(Z \le -2.18)$.
3. $P(-2.76 \le Z \le -2.18)$.
4. $P(0.45 \le Z \le 2.31)$.
[Table: values of the cumulative probability function F(z) = P(Z ≤ z) of the standard normal distribution, for z between 0 and 3.]
Standardization
As we said before, there are a great many random phenomena that can be explained by a normal distribution, but the problem is that most of them do not have mean 0 and standard deviation 1. It might seem that what we have studied for the standard normal has no practical use. However, that is not true, because every normal random variable X, X ∼ N(µ, σ²), can be related to a Z ∼ N(0, 1) through the following change of variable:
$$Z = \frac{X - \mu}{\sigma}.$$
This process is called standardization of the random variable X, and it allows us to use the table presented before with the tabulation of the CPF of a N(0, 1).
This result is deduced as follows:
$$P(X \le k) = P\!\left(\frac{X - \mu}{\sigma} \le \frac{k - \mu}{\sigma}\right) = P\!\left(Z \le \frac{k - \mu}{\sigma}\right).$$
So, the value of P(X ≤ k), with X any normal random variable, can be calculated from the table of the standard normal distribution by the formula $P\!\left(Z \le \frac{k - \mu}{\sigma}\right)$.
To clarify these concepts we are going to calculate the number of each different kind of table that we will have to buy for a high school with 1,000 students, from the data of the table of the heights obtained before and supposing that they follow a N(166.59, 10.88).
For calculating the number of tables and chairs of kind A we have to see what percentage of students are less than 160 cm tall. That is, if we denote by X = "height of a student of the high school", we have to calculate
$$P(X \le 160).$$
As we suppose that X ∼ N(166.59, 10.88), we get that
$$P(X \le 160) = P\!\left(Z \le \frac{160 - 166.59}{10.88}\right) = P(Z \le -0.61),$$
where Z ∼ N(0, 1). Looking in the table of the standard normal distribution, we get that $P(Z \le -0.61) = 1 - P(Z \le 0.61) = 1 - 0.7291 = 0.2709$.
So we get that 27.09% of the chairs have to be of kind A.
For calculating the number of tables of kind B we have to calculate P(160 < X < 180), because the students who are between 160 and 180 cm tall feel comfortable with this kind of table. If we standardize the random variable X we get:
$$P(160 < X < 180) = P\!\left(\frac{160 - 166.59}{10.88} < Z < \frac{180 - 166.59}{10.88}\right) = P(-0.61 < Z < 1.23),$$
and splitting that probability into two it holds that:
$$P(-0.61 < Z < 1.23) = P(Z < 1.23) - P(Z \le -0.61) = 0.8907 - 0.2709 = 0.6198,$$
that is to say, we will have to buy 61.98% of the tables of kind B.
And to know how many tables and chairs of kind C we have to buy, we only have to calculate
$$P(X > 180) = P(Z > 1.23) = 1 - P(Z < 1.23) = 0.1093,$$
that is to say, 10.93% of the tables and chairs will be of kind C.
As a definitive answer, we recommend that the high school with 1,000 students buy $1000 \cdot P(X \le 160) \approx 271$ chairs and tables of kind A, $1000 \cdot P(160 < X \le 180) \approx 620$ of kind B and $1000 \cdot P(X > 180) \approx 109$ of kind C.
As you can see, it is very easy to calculate probabilities in any normal distribution; one only needs to know how to use the table of the standard normal and to remember the change of variable that has to be applied.
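The whole table-and-chairs calculation can be reproduced in a few lines; the sketch below (ours) uses an erf-based CPF instead of the printed table, so the figures differ slightly from the rounded table values used in the text.

```python
import math

def Phi(z):
    """Standard normal CPF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma), after standardizing Z = (X - mu) / sigma."""
    return Phi((x - mu) / sigma)

mu, sigma, students = 166.59, 10.88, 1000

p_A = normal_cdf(160, mu, sigma)             # height <= 160 cm
p_B = normal_cdf(180, mu, sigma) - p_A       # 160 cm < height <= 180 cm
p_C = 1.0 - normal_cdf(180, mu, sigma)       # height > 180 cm

for kind, p in (("A", p_A), ("B", p_B), ("C", p_C)):
    print(kind, round(100 * p, 2), "% ->", round(students * p), "students")
```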
How to fit other distributions with a normal
In real life not every random variable is governed by a normal distribution. So, before analysing a data set, we have to check whether these data fit a normal variable well. This check is called a normality test. Although there are very exact tests, they are also very complicated, so we are only going to study one of them, chosen because of its simplicity; the other ones are too advanced for our level.
The normality test that we will study works in the following way:
1. We take a sample of n elements out of a population and we measure the values of X; they will be $x_1, x_2, \ldots, x_n$.
2. The mean and the standard deviation of these data are calculated; call them $\bar{x}$ and s respectively.
3. We count how many of these values fall into the intervals $(\bar{x} - s, \bar{x} + s)$, $(\bar{x} - 2s, \bar{x} + 2s)$ and $(\bar{x} - 3s, \bar{x} + 3s)$.
4. If it holds that approximately
68.3% of the data are in the interval $(\bar{x} - s, \bar{x} + s)$,
95.4% of the data are in the interval $(\bar{x} - 2s, \bar{x} + 2s)$ and
99.7% of the data are in the interval $(\bar{x} - 3s, \bar{x} + 3s)$,
then we can accept that the population from which we obtained the data is governed by a normal distribution with mean $\mu = \bar{x}$ and variance $\sigma^2 = s^2$.
At our level, we will say that the fit is good if the percentages obtained differ by no more than 1% from the percentages given above (a sketch of this check in code is given below).
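The test just described can be written out directly; the sketch below (ours) counts the sample proportions inside the three intervals and compares them with 68.3%, 95.4% and 99.7%, using the 1% tolerance mentioned above. The data list is a hypothetical placeholder.

```python
# Simple normality check: compare the proportions of data within 1, 2 and 3
# sample standard deviations of the mean against 68.3%, 95.4% and 99.7%.

def normality_test(data, tolerance=1.0):
    n = len(data)
    mean = sum(data) / n
    var = sum(x * x for x in data) / n - mean ** 2   # population-style variance
    s = var ** 0.5
    ok = True
    for k, target in zip((1, 2, 3), (68.3, 95.4, 99.7)):
        inside = sum(1 for x in data if mean - k * s < x < mean + k * s)
        pct = 100.0 * inside / n
        print(f"within {k}s: {pct:.1f}% (target {target}%)")
        ok = ok and abs(pct - target) <= tolerance
    return ok

sample = [158, 162, 165, 166, 167, 168, 170, 171, 174, 180]   # hypothetical heights (cm)
print("fit acceptable:", normality_test(sample))
```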
We are going to see whether the data of the heights of the students of the high school are governed by a normal probability distribution, as we supposed in the previous section.
We calculated the mean of these data, which was $\bar{x} = 166.59$, and its standard deviation, which was $s = 10.88$. Let us calculate the intervals required by the normality test:
• $(\bar{x} - s, \bar{x} + s) = (155.71, 177.47)$,
• $(\bar{x} - 2s, \bar{x} + 2s) = (144.83, 188.35)$,
• $(\bar{x} - 3s, \bar{x} + 3s) = (133.95, 199.23)$.
If we look at the relative frequencies table, we get
• $P(X \in (\bar{x} - s, \bar{x} + s)) = P(X \in (155.71, 177.47)) = 0.691$,
• $P(X \in (\bar{x} - 2s, \bar{x} + 2s)) = P(X \in (144.83, 188.35)) = 0.965$,
• $P(X \in (\bar{x} - 3s, \bar{x} + 3s)) = P(X \in (133.95, 199.23)) = 1$.
Therefore 69.1% of the observations belong to the interval $(\bar{x} - s, \bar{x} + s)$, when theoretically it should be 68.3%; 96.5% of the data are in the interval $(\bar{x} - 2s, \bar{x} + 2s)$, when it should be 95.4%; and 100% are in $(\bar{x} - 3s, \bar{x} + 3s)$, when theoretically it should be 99.7%.
Although the requirement that every percentage differ by no more than 1% from the theoretical ones does not quite hold, we can say that the fit is good, because the sum of the errors in the three intervals is small (just over 2%), so we can suppose that the fit we have done is correct. Moreover, the random variable that gives the height of a population is a phenomenon that is very well known and studied, and it is known to follow a normal distribution. However, if we wanted to do an exhaustive study, more rigour would be necessary in the calculations. As our aim is to show the applicability of these tools, we will be satisfied if these tools are understood; absolute rigour is not necessary for covering our objectives.
The Normal distribution as an approximation of the Binomial distribution
When we studied the binomial distribution we had a big problem: the calculations were too tedious. Computing probabilities like P(X ≤ k) when X ∼ B(n, p) was excessively complicated to do by hand, and in practice it was only possible with the help of a computer. So we are going to show an approximation of the binomial distribution by a normal distribution, because it is easier to calculate probabilities for the latter.
So, if X ∼ B(n, p) it holds that its mean is µ = np and its variance is σ² = npq. Then, provided that neither p nor q is close to 0, the binomial distribution can be approximated by the normal distribution N(np, npq), which makes the calculations easier, as we are going to see next. So, if we use that approximation and then we standardize, we get that
$$P(X \le k) \approx P\!\left(Z \le \frac{k - np}{\sqrt{npq}}\right),$$
where Z ∼ N(0, 1), so we can calculate it easily.
That approximation is good when np and nq are both greater than 5, and it gets better as n gets larger and p gets closer to 1/2.
We have to solve another problem in order for this approximation to be valid: while X is a discrete random variable that takes integer values X = 0, 1, 2, ..., n, the normal distribution is continuous. This means that, while the probability for a normal variable to take a value at a single point is zero, this is not the case for X.
For that reason we apply the continuity correction: for calculating probabilities like P(X ≤ k) we really calculate P(X ≤ k + 0.5), so that the value k is included. For finding P(X < k) we calculate P(X ≤ k − 0.5), so that the point k does not fall within the interval. This correction is indispensable for calculating probabilities such as P(X = k), for which we have to calculate P(k − 0.5 ≤ X ≤ k + 0.5).
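These continuity-correction rules can be condensed into a small helper; the sketch below (ours, illustrative only) shows how P(X ≤ k), P(X < k) and P(X = k) translate into probabilities of the approximating normal N(np, npq).

```python
import math

def Phi(z):
    """Standard normal CPF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_normal_approx(n, p):
    """Normal approximations with continuity correction for X ~ B(n, p)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    le = lambda k: Phi((k + 0.5 - mu) / sigma)                                 # P(X <= k)
    lt = lambda k: Phi((k - 0.5 - mu) / sigma)                                 # P(X < k)
    eq = lambda k: Phi((k + 0.5 - mu) / sigma) - Phi((k - 0.5 - mu) / sigma)   # P(X = k)
    return le, lt, eq

le, lt, eq = binom_normal_approx(50, 0.1)
print(round(le(7), 4), round(lt(7), 4), round(eq(7), 4))
```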
With this technique we are going to answer the questions that we asked before, in the chapter about the binomial distribution.
The first question does not need this approximation to be answered easily, but the last two do. So we are going to answer them supposing that our random variable X = "number of left-handed students in a classroom of 50" is governed by a N(np, npq) = N(5, 4.5).
So, to answer the second question, since we have to find the first integer k such that $P(X \le k + 0.5) \ge 0.9$ (because of the continuity correction), in reality we look for the first integer k such that $P\!\left(Z \le \frac{k + 0.5 - 5}{\sqrt{4.5}}\right) \ge 0.9$, where Z ∼ N(0, 1). Note that
$$P\!\left(Z \le \frac{k + 0.5 - 5}{\sqrt{4.5}}\right) = P\!\left(Z \le \frac{k - 4.5}{2.12}\right).$$
The first $x \in \mathbb{R}$ which satisfies $P(Z \le x) \ge 0.9$ is $x = 1.29$. So we have to find the first integer k with $\frac{k - 4.5}{2.12} \ge 1.29$. We get that
$$\frac{k - 4.5}{2.12} \ge 1.29 \iff k \ge 7.24.$$
That is to say, the k we were looking for is k = 8, which is the same result as we obtained in the chapter about the binomial distribution, and this time we have not needed any computer to calculate it; we have only used the table of the standard normal and, at most, a calculator.
To calculate the percentage of groups of 50 students in which there are at least 10 left-handed students, we have to calculate P(X ≥ 9.5) rather than P(X ≥ 10), because of the continuity correction. That is, we have to calculate
$$P\!\left(Z \ge \frac{9.5 - 5}{2.12}\right) = P(Z \ge 2.12) = 1 - P(Z \le 2.12) = 0.0170.$$
So, according to this approximation of the binomial distribution, 1.70% of the classrooms of 50 students will have 10 or more left-handed students, a number very close to the 2.25% that we calculated following the binomial model.
As you can see, the answers are very similar. This is because the approximation is appropriate here: np = 50 · 0.1 = 5 and nq = 50 · 0.9 = 45, both numbers greater than or equal to five, as was suggested in this section for the approximation to be good.
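For readers with a computer at hand, the sketch below (ours) compares the normal approximation used above with the exact binomial computation for X ∼ B(50, 0.1), both for the smallest k with P(X ≤ k) ≥ 0.9 and for P(X ≥ 10).

```python
import math

def Phi(z):
    """Standard normal CPF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 50, 0.1
mu, sigma = n * p, math.sqrt(n * p * (1 - p))   # 5 and sqrt(4.5) ~ 2.12

def binom_cdf(k):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Smallest k with P(X <= k) >= 0.9: exact versus normal approximation.
k_exact = next(k for k in range(n + 1) if binom_cdf(k) >= 0.9)
k_approx = next(k for k in range(n + 1) if Phi((k + 0.5 - mu) / sigma) >= 0.9)
print("smallest k:", k_exact, "(exact) vs", k_approx, "(approximation)")

# P(X >= 10): exact versus approximation with continuity correction.
exact_tail = 1.0 - binom_cdf(9)
approx_tail = 1.0 - Phi((9.5 - mu) / sigma)
print(round(exact_tail, 4), "(exact) vs", round(approx_tail, 4), "(approximation)")
```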