Main concepts about probability
Discrete Random Variable
Probability Distributions
Radboud University Nijmegen
Lecture 8: The concept of a random variable
(Examples from “Chance Encounters: A First Course in Data Analysis
and Inference by C. J. Wild et al.”)
Lejla Batina
Institute for Computing and Information Sciences – Digital Security
Radboud University Nijmegen
Version: autumn 2013
Wiskunde 1
Outline
Main concepts about probability
Discrete Random Variable
Probability Distributions
Recap
• A sample space, S, for a random experiment is the set of all
possible outcomes of the experiment.
• An event is a set of outcomes.
• The following events are often used, for given events A and B:
• Unions A ∪ B,
• Intersections A ∩ B,
• The complement of A, denoted Ā (occurs if A does not occur).
• Mutually exclusive events cannot occur at the same time.
• A partition is a way of splitting up a sample space into
separate parts. Events C1, C2, ..., Ck form a partition of the
sample space S if they are mutually exclusive and ⋃_{i=1}^{k} Ci = S.
Conditional probability and independence
• The conditional probability of A occurring given that B
occurs is: P(A|B) = P(A ∩ B) / P(B).
• Events A and B are independent if knowing whether B has
occurred gives no information about the chances of A
occurring, i.e. P(A|B) = P(A). In this case it follows
P(A ∩ B) = P(A) · P(B).
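The two definitions above can be checked by enumerating a small sample space. The sketch below assumes two fair dice, which is an illustration and not an example from the lecture; the helper names prob and cond_prob are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair dice, 36 equally likely outcomes.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where event is a predicate on the outcomes of S."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B), assuming P(B) > 0."""
    return prob(lambda s: a(s) and b(s)) / prob(b)

def first_is_six(s):
    return s[0] == 6

def second_is_six(s):
    return s[1] == 6

def sum_is_twelve(s):
    return s[0] + s[1] == 12

# Independent events: P(A|B) equals P(A).
print(cond_prob(first_is_six, second_is_six), prob(first_is_six))    # 1/6 1/6
# Dependent events: P(A|B) differs from P(A).
print(cond_prob(sum_is_twelve, first_is_six), prob(sum_is_twelve))   # 1/6 1/36
```

Exact fractions are used so that the equality P(A|B) = P(A) can be tested without rounding.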
Summary of useful concepts and formulas
1. P(S) = 1, P(S̄) = P(∅) = 0.
2. P(Ā) = 1 − P(A).
3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
4. If A and B are mutually exclusive, P(A ∩ B) = 0.
5. P(A) = P(A ∩ B) + P(A ∩ B̄) = P(B)P(A|B) + P(B̄)P(A|B̄).
   If C1, ..., Ck is a partition:
   P(A) = ∑_{i=1}^{k} P(A ∩ Ci) = ∑_{i=1}^{k} P(Ci)P(A|Ci).
6. Multiplication formula:
   P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B).
   P(A ∩ B ∩ C) = P(A ∩ B)P(C|A ∩ B) = P(A)P(B|A)P(C|A ∩ B).
   P(A1 ∩ A2 ∩ ... ∩ An) = P(A1)P(A2|A1) ... P(An|A1 ∩ ... ∩ An−1).
Bayes Theorem
• P(B|A) = P(A ∩ B) / P(A) = P(A|B)P(B) / P(A) = P(A|B)P(B) / (P(A ∩ B) + P(A ∩ B̄)).
• If C1, ..., Ck is a partition of S:
  P(Ci|A) = P(Ci)P(A|Ci) / P(A) = P(Ci)P(A|Ci) / ∑_{j=1}^{k} P(Cj)P(A|Cj).
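The partition form of Bayes' theorem can be sketched as a short function. The function name posterior and the three-class numbers are assumptions made up for illustration; they do not come from the lecture.

```python
def posterior(priors, likelihoods):
    """Return the list of P(Ci|A), given priors P(Ci) and likelihoods P(A|Ci)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(Ci) P(A|Ci)
    p_a = sum(joint)                                       # P(A) by total probability
    return [j / p_a for j in joint]

# Hypothetical partition C1, C2, C3 (invented numbers).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.8]
post = posterior(priors, likelihoods)
print([round(x, 4) for x in post])
```

The denominator is the sum from the formula above, so the returned values always add up to 1.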
Example
The data in the table below come from a telephone poll of 800 adult Americans
carried out in 1993. The question asked was: “Should smoking be
banned from workplaces, should there be special smoking areas, or should
there be no restrictions?”
              Banned   Special areas   No restrictions   Total
Non-smokers   0.3350   0.3975          0.0238            0.7563
Smokers       0.0200   0.1963          0.0274            0.2437
Total         0.3550   0.5938          0.0512            1.0000
What is the probability that a person favors banning, given that the person
is a non-smoker or a smoker (when the person is chosen at random)?
P(banned | non-smoker) = P(banned ∩ non-smoker) / P(non-smoker) = 0.3350 / 0.7563 = 0.4429.
P(banned | smoker) = P(banned ∩ smoker) / P(smoker) = 0.0200 / 0.2437 = 0.0821.
P(non-smoker | banned) = P(banned ∩ non-smoker) / P(banned) = 0.3350 / 0.3550 = 0.9437.
Hence, 94% of people in the survey who favor banning smoking are
non-smokers.
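The three conditional probabilities in this example can be recomputed from the joint proportions in the table. This is a minimal sketch; the dictionary layout and the helper names p_status and p_opinion are assumptions chosen for illustration.

```python
# Joint proportions from the poll table: (status, opinion) -> probability.
joint = {
    ("non-smoker", "banned"): 0.3350,
    ("non-smoker", "special areas"): 0.3975,
    ("non-smoker", "no restrictions"): 0.0238,
    ("smoker", "banned"): 0.0200,
    ("smoker", "special areas"): 0.1963,
    ("smoker", "no restrictions"): 0.0274,
}

def p_status(status):
    """Row total: P(status)."""
    return sum(p for (s, _), p in joint.items() if s == status)

def p_opinion(opinion):
    """Column total: P(opinion)."""
    return sum(p for (_, o), p in joint.items() if o == opinion)

banned_given_non = joint[("non-smoker", "banned")] / p_status("non-smoker")
banned_given_smoker = joint[("smoker", "banned")] / p_status("smoker")
non_given_banned = joint[("non-smoker", "banned")] / p_opinion("banned")

print(round(banned_given_non, 4))     # 0.4429
print(round(banned_given_smoker, 4))  # 0.0821
print(round(non_given_banned, 4))     # 0.9437
```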
Random variables
A random variable is a type of measurement taken on the outcome
of a random experiment.
It is a process of assigning a number to every outcome of an
experiment.
Example
A coin is tossed twice, so the sample space is
S = {HH, HT, TH, TT}. Let X be the number of heads. Then we get:
Outcome   HH   HT   TH   TT
X         2    1    1    0
Definition
Let S be a sample space. A random variable is a real-valued
function defined on S, f : S → R.
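Since a random variable is just a function on the sample space, the coin example can be written down directly. A minimal sketch; the function name X mirrors the slide, the rest is illustrative.

```python
from itertools import product

# Sample space for two coin tosses.
S = ["".join(t) for t in product("HT", repeat=2)]

def X(outcome):
    """Random variable X : S -> R, the number of heads in the outcome."""
    return outcome.count("H")

print(S)                      # ['HH', 'HT', 'TH', 'TT']
print({s: X(s) for s in S})   # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
```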
Definition of probability function
Definition
The probability function for a discrete random variable X gives
P(X = xi) = pi for every value xi that X can take.
A sequence of numbers p1, p2, ... is a probability distribution for a
discrete sample space S = {s1, s2, ...} provided
• pi ≥ 0, ∀i,
• ∑_i pi = 1.
A random variable that takes on a finite or a countably infinite
number of values is called a discrete random variable, otherwise we
have a non-discrete random variable.
Example: tossing a coin twice
Consider again tossing a fair coin twice, with X the number of heads.
x      0     1     2
P(x)   1/4   1/2   1/4
A biased coin, for which the probability of getting a “head” is p, is
tossed twice. In this case, we get the following probability function:
x      0           1           2
P(x)   (1 − p)^2   2p(1 − p)   p^2
Often we will need to use a probability function to compute
probabilities of events that contain more than one value, e.g.
P(X ≥ a), P(X > b), P(X ≤ c) or P(a ≤ X ≤ b).
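Probabilities of such events are obtained by adding P(X = x) over the values x in the event. A minimal sketch for the two-toss probability function; the names two_toss_pmf and prob_event are hypothetical.

```python
def two_toss_pmf(p):
    """Probability function of X = number of heads in two tosses with P(H) = p."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}

def prob_event(pmf, condition):
    """P(X satisfies condition): sum of P(X = x) over the matching values x."""
    return sum(px for x, px in pmf.items() if condition(x))

pmf = two_toss_pmf(0.5)                        # fair coin: 1/4, 1/2, 1/4
print(prob_event(pmf, lambda x: x >= 1))       # P(X ≥ 1) = 0.75
print(prob_event(pmf, lambda x: x <= 1))       # P(X ≤ 1) = 0.75
print(prob_event(pmf, lambda x: 0 <= x <= 2))  # P(0 ≤ X ≤ 2) = 1.0
```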
Geometric distribution
Example
Consider tossing a biased coin with P(H) = p until the first head
appears. Then S = {H, TH, TTH, TTTH, ...}. Let X be the total
number of tosses. Then X ∈ {1, 2, 3, ...}. We get the
following values for the probability function:
P(X = 1) = p,
P(X = 2) = (1 − p)p,
P(X = x) = (1 − p)^(x−1) p, for x = 1, 2, 3, ....
This is called the Geometric distribution and in this case we write
X ∼ Geometric(p). The probabilities P(X = x) form a geometric series. It
holds that P(X ≥ x) = (1 − p)^(x−1), since X ≥ x exactly when the first
x − 1 tosses are all tails.
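The probability function and the tail formula can be compared numerically. This is a sketch under an assumed value p = 0.3, chosen only for illustration; the function names are hypothetical.

```python
def geometric_pmf(x, p):
    """P(X = x) = (1 - p)^(x - 1) * p for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def geometric_tail(x, p):
    """P(X >= x) = (1 - p)^(x - 1)."""
    return (1 - p) ** (x - 1)

p = 0.3   # assumed value, not from the lecture
x = 5
# P(X >= x) must equal 1 - P(X <= x - 1).
below = sum(geometric_pmf(k, p) for k in range(1, x))
print(abs((1 - below) - geometric_tail(x, p)) < 1e-12)   # True
```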
Example: modeling a real-life situation by “coin tossing”
Example
The chances of a successful pregnancy resulting from implanting a frozen
embryo are about 1 in 10. Suppose a couple who are desperate to have
children will continue to try this procedure until they succeed. We can assume
that the process is just like tossing a biased coin until the first success.
Let X be the number of times the couple tries the procedure up to and
including the successful attempt. Then X has a Geometric distribution
with p = 0.1.
• The probability of success on the 4th try is
P(X = 4) = 0.9^3 × 0.1 = 0.0729.
• The probability of success before the 4th try is
P(X < 4) = P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.271.
• The probability of success on the 2nd, 3rd or 4th attempt is
P(2 ≤ X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4) = 0.2439.
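The three answers can be reproduced with a few lines, using p = 0.1 from the example. A minimal sketch; the helper name pmf is an assumption.

```python
p = 0.1   # chance of success per attempt, from the example

def pmf(x):
    """P(X = x) for X ~ Geometric(p): first success on attempt x."""
    return (1 - p) ** (x - 1) * p

on_fourth = pmf(4)
before_fourth = sum(pmf(x) for x in (1, 2, 3))
second_to_fourth = sum(pmf(x) for x in (2, 3, 4))

print(round(on_fourth, 4))         # 0.0729
print(round(before_fourth, 4))     # 0.271
print(round(second_to_fourth, 4))  # 0.2439
```

The second value can also be obtained as 1 − P(X ≥ 4) = 1 − 0.9^3.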
Hypergeometric distribution
Example
Consider a barrel or urn containing N balls of which M are black
and the rest, N − M, are white. We take a simple random sample
(i.e. without replacement), of size n and measure X , the number
of black balls in the sample.
This distribution is called the Hypergeometric distribution of X
and in this case we write X ∼ Hypergeometric(N, M, n) and for
the probability function we get:
P(X = x) = \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n}.
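The probability function can be evaluated with binomial coefficients from the standard library. A sketch, assuming a small urn with 10 balls of which 4 are black; these numbers are invented for illustration.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x) for X ~ Hypergeometric(N, M, n): x black balls in a sample of n."""
    if x < 0 or x > n:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one.
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Assumed urn: N = 10 balls, M = 4 black, sample of n = 3.
probs = [hypergeom_pmf(x, 10, 4, 3) for x in range(4)]
print([round(q, 4) for q in probs])
print(sum(probs))
```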
Example: applications of the Hypergeometric distribution
Example
The two color urn model can be used to model any situation in
which we take a random sample from a finite population and count
the number of objects (or individuals) in the sample that have (or
do not have) the characteristic of interest.
Examples: people who do or don’t smoke, who will or will not vote for a
particular political party, etc.
Here: N is the size of population, M is the number of individuals
with the characteristic of interest and X measures the number with
that characteristic in a sample of n.
Example
Suppose a company has 20 cars, out of which exactly 7 cars do not
meet government standards and are therefore releasing excessive
pollution. Moreover, suppose that a traffic policeman randomly
inspects 5 cars. Find the probability that he does not find more
than 2 polluting cars.
Since N = 20, M = 7 and n = 5, we get:
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.7932.
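The value 0.7932 follows from summing the Hypergeometric probabilities for x = 0, 1, 2. A minimal sketch with the numbers of the example; the variable names are assumptions.

```python
from math import comb

N, M, n = 20, 7, 5   # cars, polluting cars, cars inspected

# Number of samples with x polluting cars, for x = 0, 1, 2.
favourable = [comb(M, x) * comb(N - M, n - x) for x in range(3)]
p_at_most_2 = sum(favourable) / comb(N, n)

print(favourable, comb(N, n))   # [1287, 5005, 6006] 15504
print(round(p_at_most_2, 4))    # 0.7932
```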
Example: the game of Lotto
Example
A player purchases a board and chooses 6 different numbers between 1
and 40. On the night of the draw, a sampling machine draws six balls
(so-called winning numbers) at random without replacement from forty
balls labeled 1 to 40. The machine then chooses a 7th ball from the
remaining 34 giving the so-called bonus number, which is treated
specially. Prizes are awarded according to how many of the winning
numbers the player has picked. Say that the following scheme applies:
• Category 1: All 6 winning numbers.
• Category 2: 5 of the winning numbers plus the bonus number.
• Category 3: 5 of the winning numbers but not the bonus number.
• Category 4: 4 of the winning numbers.
Find the probabilities of winning a prize in Categories 1, 2 and 4.
Solution
Example
Let X be the number of matches between the player’s numbers and
the winning numbers. Then
X ∼ Hypergeometric(N = 40, M = 6, n = 6), so
P(X = x) = \binom{6}{x} \binom{34}{6−x} / \binom{40}{6}.
P(Category 1 prize) = P(X = 6) = 2.605 × 10^−7.
P(Category 4 prize) = P(X = 4) = 0.0022.
P(Category 2 prize) = P(X = 5 ∩ bonus)
= P(X = 5) P(bonus | X = 5) = P(X = 5) × 1/34 = 1.563 × 10^−6.
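The three prize probabilities can be recomputed as follows. A minimal sketch; the names p_matches, cat1, cat2 and cat4 are assumptions.

```python
from math import comb

def p_matches(x):
    """P(X = x) for X ~ Hypergeometric(N = 40, M = 6, n = 6)."""
    return comb(6, x) * comb(34, 6 - x) / comb(40, 6)

cat1 = p_matches(6)
cat4 = p_matches(4)
# Given 5 matches, the player's remaining number must equal the bonus ball,
# which is one of the 34 balls not drawn as winning numbers.
cat2 = p_matches(5) * (1 / 34)

print(f"{cat1:.3e}")   # 2.605e-07
print(f"{cat2:.3e}")   # 1.563e-06
print(f"{cat4:.4f}")   # 0.0022
```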
Binomial distribution
Suppose we again have a biased coin with P(H) = p. A
random experiment consists of making a fixed number, say n, of
tosses, and X counts the number of heads. Then
X ∈ {0, 1, 2, ..., n}, we write X ∼ Bin(n, p) and we call this
distribution the Binomial distribution.
X is said to be a Binomial random variable with parameters n and
p if P(X = x) = \binom{n}{x} p^x (1 − p)^(n−x) for x = 0, 1, ..., n.
Example
Find the probability of 8 heads in 10 flips of a fair coin.
P(X = 8) = \binom{10}{8} 0.5^8 0.5^2 = 45/1024 ≈ 0.0439.
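The Binomial probability function is short enough to write out directly. A minimal sketch; the function name binom_pmf is an assumption.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p8 = binom_pmf(8, 10, 0.5)   # 8 heads in 10 flips of a fair coin
print(p8)                    # 0.0439453125, i.e. 45/1024
```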
Example: applications of the Binomial distribution
Example
The biased coin tossing model can be used in many situations,
provided the following assumptions are valid.
• Each trial (“toss”) has only 2 outcomes.
• The probability of getting a success is the same (say p) for
each trial.
• The outcomes of the trials are mutually independent.
For example, we can even use it when rolling a die if we are only
interested in getting a six.
Relationship to the Hypergeometric distribution
Consider the urn again, but now we sample n balls with
replacement. In this case, “the biased coin model” applies. For
each trial, the probability of a success is constant at p = M/N.
Suppose X measures the number of black balls in a sample of size
n; then X ∼ Bin(n, p = M/N).
In practice we often cannot sample with replacement, e.g. when sampling
people, but we can still use this model if M and N − M are large compared to
the sample size n.
Take N = 1000, M = 200 and n = 5; then after 5 balls have been drawn (out
of which x are black) the proportion of black balls is
(200 − x)/(1000 − 5) ≈ 200/1000.
Summary: If we take a sample of less than 10% from a large population
in which a proportion p have a characteristic of interest, the distribution
of X, i.e. the number in the sample with that characteristic, is
approximately Binomial(n, p), where n is the sample size.
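The quality of the approximation can be inspected for the urn with N = 1000, M = 200 and n = 5 mentioned above. A sketch that prints both probability functions side by side; the variable names are assumptions.

```python
from math import comb

N, M, n = 1000, 200, 5
p = M / N

hyper = [comb(M, x) * comb(N - M, n - x) / comb(N, n) for x in range(n + 1)]
binom = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# Columns: x, sampling without replacement, sampling with replacement.
for x in range(n + 1):
    print(x, round(hyper[x], 4), round(binom[x], 4))
```

Here the sample is 0.5% of the population, and the two columns differ only in the third or fourth decimal place.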
Poisson distribution
Definition
A random variable X, where X ∈ {0, 1, 2, ...}, has a Poisson
distribution if
P(X = x) = e^(−λ) · λ^x / x!, where λ > 0 is a constant. In this case we write
X ∼ Poisson(λ).
Note that P(X = 0) = e^(−λ). It can be shown that the probabilities
P(X = x) all sum to 1.
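Both remarks can be checked numerically. This is a sketch under an assumed rate λ = 2.5, which is not from the lecture; the infinite sum is truncated at 60 terms, which is enough for a rate of this size.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.5   # assumed rate, for illustration only
total = sum(poisson_pmf(x, lam) for x in range(60))

print(abs(poisson_pmf(0, lam) - exp(-lam)) < 1e-15)   # True
print(abs(total - 1.0) < 1e-12)                       # True
```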
Example: applications of the Poisson distribution
Example
We consider a type of event occurring randomly through time, e.g.
earthquakes, errors in accounts, telephone calls in a given time
interval, arrivals at a queue, mistakes in calculations etc.
Let X be the number of occurrences in a unit interval of time. Then
under the following conditions, X can be shown to have a Poisson(λ)
distribution.
• The event occurs at a constant average rate λ per unit time.
• Occurrences are independent of one another.
• More than one occurrence cannot happen at the same time.
Example
While checking the galley proofs of several chapters of a book, the authors found 1.6 printer’s errors per page on
average. We can assume the errors were occurring randomly according to a Poisson process. Let X be the number
of errors on a single page. Then X ∼ Poisson(λ = 1.6). Find the following probabilities:
1. The probability of finding no errors on any particular page: P(X = 0) = e^(−1.6) = 0.2019.
2. The probability of finding 2 errors on any particular page: P(X = 2) = 0.2584.
3. The probability of no more than 2 errors on a page:
   P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.7834.
4. The probability of more than 4 errors on a page: P(X > 4) = 1 − P(X ≤ 4) = 0.0237.
5. The probability of getting a total of 5 errors on 3 consecutive pages:
   Let Y be the number of errors in 3 pages. Then Y ∼ Poisson(λ = 4.8) and P(Y = 5) = 0.1747.
6. What is the probability that in a block of 10 pages, exactly 3 pages have no errors?
   Let W be the number of pages with no errors. Then W ∼ Binomial(n = 10, p = 0.2019) and
   P(W = 3) = \binom{10}{3} 0.2019^3 0.7981^7 = 0.2037.
7. What is the probability that in 4 consecutive pages, there are no errors on the first and third pages, and
   one error on each of the other two? Let Xi be the number of errors on the i-th page. Then we have:
   P(X1 = 0 ∩ X2 = 1 ∩ X3 = 0 ∩ X4 = 1) = P(X1 = 0)P(X2 = 1)P(X3 = 0)P(X4 = 1)
   = 0.2019^2 × 0.3230^2 = 0.0043.
   Here, P(X = 1) = 0.3230.
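The seven answers can be recomputed from the Poisson probability function. A minimal sketch; the variable names are assumptions, and values computed without rounding intermediate results may differ from hand calculations in the last digit.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 1.6                                   # errors per page
p0, p1, p2 = (poisson_pmf(x, lam) for x in range(3))

at_most_2 = p0 + p1 + p2
more_than_4 = 1 - sum(poisson_pmf(x, lam) for x in range(5))
five_on_3_pages = poisson_pmf(5, 3 * lam)             # Y ~ Poisson(4.8)
three_clean_of_10 = comb(10, 3) * p0 ** 3 * (1 - p0) ** 7
pattern_0101 = p0 * p1 * p0 * p1                      # pages are independent

for value in (p0, p2, at_most_2, more_than_4,
              five_on_3_pages, three_clean_of_10, pattern_0101):
    print(round(value, 4))
```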
Poisson approximation to the Binomial
Suppose X ∼ Bin(n = 1000, p = 0.06). Computing the Binomial
probabilities directly is rather complicated.
But in this case, it can be shown that
\binom{n}{x} p^x (1 − p)^(n−x) ≈ e^(−λ) λ^x / x!, with λ = np.
So, if p is small and n is large in a Binomial distribution such that
np = λ, where λ is a constant, then we get:
lim_{n→∞} P(X = k) = e^(−λ) · λ^k / k!, where k = 0, 1, 2, ....
The rule of thumb: we can use the Poisson distribution if p < 0.1 and
λ ≤ 5 or λ ≤ 10. Note: λ is the average number of events per
time unit.
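The approximation can be examined by computing both probability functions side by side. This sketch assumes n = 1000 and p = 0.003, so λ = 3; these values are invented for illustration and chosen to satisfy the rule of thumb.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Exact Binomial probability P(X = k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability with rate lam."""
    return exp(-lam) * lam ** k / factorial(k)

n, p = 1000, 0.003   # assumed values: large n, small p
lam = n * p          # λ = np = 3

diffs = []
for k in range(8):
    exact, approx = binom_pmf(k, n, p), poisson_pmf(k, lam)
    diffs.append(abs(exact - approx))
    print(k, round(exact, 4), round(approx, 4))
```

Python integers have arbitrary precision, so the exact Binomial value is available here for comparison; the Poisson form avoids the large binomial coefficients.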
Example
When rolling 4 dice, the chance of getting four sixes is 1/6^4. If we roll
the 4 dice 1000 times, then λ = 1000 · 1/6^4 ≈ 0.77. The probabilities of
getting four sixes 0, 1, 2, or at least 3 times are computed respectively as:
• P(X = 0) = e^(−λ) ≈ 0.46
• P(X = 1) = e^(−λ) · λ ≈ 0.36
• P(X = 2) = e^(−λ) · λ^2 / 2 ≈ 0.14
• P(X ≥ 3) = 1 − P(X < 3) ≈ 0.043
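The four values can be reproduced directly from the Poisson formula. A minimal sketch; the variable names are assumptions.

```python
from math import exp

lam = 1000 / 6 ** 4          # expected number of four-six rolls in 1000 tries
p0 = exp(-lam)
p1 = p0 * lam
p2 = p0 * lam ** 2 / 2
p3_or_more = 1 - (p0 + p1 + p2)

print(round(lam, 2))         # 0.77
print(round(p0, 2))          # 0.46
print(round(p1, 2))          # 0.36
print(round(p2, 2))          # 0.14
print(round(p3_or_more, 3))  # 0.043
```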