Probabilities. Probability distributions

Trainer: Bondor Cosmina-Ioana, PhD
Random variables and probability distributions
Objectives
• The meaning of the term probability
• Methods of sampling
• Probability distributions
• Examples
Life sciences
• If you want to know about living beings, then you need to use statistics
• Why?
• Living beings are not all the same
• Their characteristics vary
• Populations
Probability
• Basic definitions
– experiment
– trial
– Outcome, event
• Rules for combining the probabilities of events: AND, OR, NOT
• Events: mutually exclusive, impossible, complementary, certain, independent
Probability
An experiment is repeated many times; each repetition is called a trial.
One or more outcomes can result from each trial.
$$P(\text{outcome}) = \frac{\text{number of times the outcome occurs}}{\text{total number of trials}}$$
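A minimal Python sketch of this relative-frequency definition, assuming the result of each trial is stored as one item in a list. The function name `empirical_probability` and the coin-toss data are hypothetical illustrations, not part of the course material.

```python
from collections import Counter

def empirical_probability(trials, outcome):
    """Relative frequency: times `outcome` occurred / total number of trials."""
    if not trials:
        raise ValueError("at least one trial is needed")
    counts = Counter(trials)          # how often each outcome occurred
    return counts[outcome] / len(trials)

# Hypothetical record of 10 coin tosses (H = heads, T = tails)
tosses = ["H", "T", "H", "H", "T", "H", "T", "T", "H", "H"]
print(empirical_probability(tosses, "H"))  # 6 heads / 10 trials = 0.6
```

With more trials, the relative frequency becomes a more stable estimate of the probability.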
Example 1 – Ebola virus cases
• Ebola virus disease (EVD) – acute infectious disease with high mortality risk.
• The 2014 epidemic. Source: CDC. 2014 Ebola Outbreak in West Africa - Case Counts. Nov. 2015. Available at: http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/case-counts.html
Country | Cases | Deaths | Deaths (%) | Death probability | Survival probability
Guinea | 3805 | 2536 | | |
Sierra Leone | 14122 | 3955 | | |
Liberia | 10672 | 4808 | | |
Other states | 36 | 15 | | |
Total | 28635 | 11314 | | |
Example 1 – Ebola virus cases
Country | Cases | Deaths | Survivors | Death probability | Survival probability
Sierra Leone | 14122 | 3955 | =14122-3955 | |
Example 1 – Ebola virus cases
Country | Cases | Deaths | Survivors | Death probability | Survival probability
Sierra Leone | 14122 | 3955 | 10167 | =3955/14122 |
Example 1 – Ebola virus cases
Country | Cases | Deaths | Survivors | Deaths (%) | Death probability | Survival probability
Guinea | 3805 | 2536 | | 66.6 | 0.666 |
Sierra Leone | 14122 | 3955 | 10167 | | 0.28 |
Liberia | 10672 | 4808 | | 45.1 | 0.451 |
Other states | 36 | 15 | | =15/36*100 = 41.7 | =15/36 = 0.417 |
Total | 28635 | 11314 | | 39.5 | 0.395 |
Example 1 – Ebola virus cases
Death and survival are complementary outcomes:
No. of deaths + No. of survivors = Total
Deaths (%) + Survival (%) = 100%
Example 1 – Ebola virus cases
Deaths (%) + Survival (%) = 100 %
Country | Cases | Deaths | Survivors | Deaths (%) | Death probability | Survival probability
Guinea | 3805 | 2536 | | 66.6 | 0.666 | =1-0.666
Sierra Leone | 14122 | 3955 | 10167 | | 0.28 | =1-0.28 = 10167/14122
Liberia | 10672 | 4808 | | 45.1 | 0.451 | =1-0.451
Other states | 36 | 15 | | 41.7 | 0.417 |
Total | 28635 | 11314 | | 39.5 | 0.395 | ?
Example 1 - P(A) and P(nonA)
Deaths (%) + Survival (%) = 100 %
P(A) + P(nonA) = 1, where A = death and nonA = survival
Country | Cases | Deaths | Survivors | Deaths (%) | Death probability | Survival probability
Guinea | 3805 | 2536 | | 66.6 | 0.666 | 0.334
Sierra Leone | 14122 | 3955 | 10167 | | 0.28 | 0.72
Liberia | 10672 | 4808 | | 45.1 | 0.451 | 0.549
Other states | 36 | 15 | | 41.7 | 0.417 | 0.583
Total | 28635 | 11314 | | 39.5 | 0.395 | 0.605
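The table above can be reproduced with a short Python sketch. The counts are copied from the CDC table; the helper name `death_survival` is a hypothetical illustration. The survival probability is obtained with the complement rule, P(nonA) = 1 - P(A).

```python
# Case and death counts from the Ebola table (CDC, 2014 outbreak)
cases_deaths = {
    "Guinea": (3805, 2536),
    "Sierra Leone": (14122, 3955),
    "Liberia": (10672, 4808),
    "Other states": (36, 15),
}

def death_survival(cases, deaths):
    """Return (survivors, P(death), P(survival)) for one row of the table."""
    p_death = deaths / cases
    p_survival = 1 - p_death          # complement rule: P(nonA) = 1 - P(A)
    return cases - deaths, p_death, p_survival

total_cases = sum(c for c, _ in cases_deaths.values())
total_deaths = sum(d for _, d in cases_deaths.values())
rows = dict(cases_deaths, Total=(total_cases, total_deaths))

for country, (cases, deaths) in rows.items():
    survivors, p_d, p_s = death_survival(cases, deaths)
    print(f"{country:13} {survivors:6d} {p_d:.3f} {p_s:.3f}")
```

The printed probabilities, rounded to three decimals, match the Death probability and Survival probability columns.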
Example 1 - P(A) and P(nonA)
Deaths (%) + Survival (%) = 100 %
P(A) + P(nonA) = 1
Country | Cases | Deaths | Survivors | Death probability | Survival probability
Guinea | 3805 | 2536 | ? | ? | ?
Sierra Leone | 14122 | 3955 | 10167 | 0.28 | 0.72
Liberia | 10672 | 4808 | ? | ? | ?
Other states | 36 | 15 | ? | ? | ?
Total | 28635 | 11314 | ? | ? | ?
Example 2 – P(A and B) – Blood type by gender for 1000 people
Blood type | Frequency | P(Blood type) | Women | P(Women) within type | P(Women and blood type)
O | 400 | | 200 | |
A | 450 | | 200 | |
B | 142 | | 96 | |
AB | 8 | | 4 | |
Total | 1000 | | 500 | |
Example 2 – P(A and B) – Blood type by gender
Blood type | Frequency | P(Blood type) | Women | P(Women) within type | P(Women and blood type)
O | 400 | =400/1000 | 200 | |
A | 450 | =450/1000 | 200 | |
B | 142 | =142/1000 | 96 | |
AB | 8 | ? | 4 | |
Total | 1000 | | 500 | |
P(A and B) = P(A)*P(B) – for independent events
Blood type | Frequency | P(Blood type) | Women | P(Women) within type | P(Women and blood type)
O | 400 | 0.40 | 200 | |
A | 450 | 0.45 | 200 | =200/450 |
B | 142 | 0.142 | 96 | |
AB | 8 | 0.008 | 4 | |
Total | 1000 | 1.0 | 500 | |
P(A and B) = P(A)*P(B) – for independent events
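Blood type | Frequency | P(Blood type) | Women | P(Women) within type | P(Women and blood type)
O | 400 | =400/1000 | 200 | |
A | 450 | 0.45 | 200 | 0.44 |
B | 142 | =142/1000 | 96 | |
AB | 8 | ? | 4 | |
Total | 1000 | | 500 | |
$$P(\text{Women and Type A}) = P(\text{Women}) \cdot P(\text{Type A}) = 0.44 \cdot 0.45 \approx 0.20$$
Here 0.44 = 200/450 is the proportion of women among the type A subjects, so the product equals 200/1000 = 0.20.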
Example 2 – P(A and B) – Blood type distribution
Blood type | Frequency | P(Blood type) | Women | P(Women) within type | P(Women and blood type)
O | 400 | ? | 200 | ? | ?
A | 450 | 0.45 | 200 | 0.44 | 0.20
B | 142 | ? | 96 | ? | ?
AB | 8 | ? | 4 | ? | ?
Total | 1000 | ? | 500 | ? | ?
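A short Python sketch for checking the blank cells, using only the counts from Example 2. The helper `joint_table` is a hypothetical name. Because the second factor is the share of women within each blood type, the product always simplifies to women / 1000 (for example 200/1000 = 0.20 for type A).

```python
# Counts from Example 2: blood type -> (number of subjects, number of women)
counts = {"O": (400, 200), "A": (450, 200), "B": (142, 96), "AB": (8, 4)}

def joint_table(counts):
    """For each type: P(type), share of women within the type, and their product."""
    n = sum(total for total, _ in counts.values())      # 1000 subjects
    table = {}
    for blood_type, (total, women) in counts.items():
        p_type = total / n                  # P(blood type), e.g. 450/1000
        p_women_in_type = women / total     # women within the type, e.g. 200/450
        table[blood_type] = (p_type, p_women_in_type, p_type * p_women_in_type)
    return table

for blood_type, (p_type, p_w, p_joint) in joint_table(counts).items():
    print(f"{blood_type:2} {p_type:.3f} {p_w:.2f} {p_joint:.3f}")
```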
P(A or B) = P(A) + P(B) – P(A and B)
Blood type | Frequency | P(Blood type)
O | 400 | 0.40
A | 450 | 0.45
B | 142 | 0.142
AB | 8 | 0.008
Total | 1000 | 1.0
P(Type O or Type A) = ?
$$P(\text{O or A}) = P(\text{O}) + P(\text{A}) - P(\text{O and A}) = \frac{400}{1000} + \frac{450}{1000} - 0 = \frac{850}{1000} = 0.85$$
Type O and Type A are mutually exclusive:
P(Type O and Type A) = 0
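A minimal Python sketch of the addition rule, using exact fractions so that no rounding appears. The helper `p_or` is a hypothetical name. The second case, type A or woman, uses counts from the same table (450 type A, 500 women, 200 both) to show why the overlap must be subtracted when events are not mutually exclusive.

```python
from fractions import Fraction

def p_or(p_a, p_b, p_a_and_b=Fraction(0)):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

n = 1000
# Mutually exclusive: nobody is both type O and type A, so P(O and A) = 0
p_o_or_a = p_or(Fraction(400, n), Fraction(450, n))
# Overlapping events: type A (450) or woman (500), of whom 200 are both
p_a_or_woman = p_or(Fraction(450, n), Fraction(500, n), Fraction(200, n))
print(float(p_o_or_a), float(p_a_or_woman))  # 0.85 0.75
```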
Summary
A, B any two events:
P(A or B) = P(A) + P(B) - P(A and B)
A, B two independent events:
P(A and B) = P(A) * P(B)
A, B two dependent events:
P(A and B) = P(A|B) * P(B), where P(A|B) is the probability of A given B
Summary
• A,B mutually exclusive events
P(A and B)=0
• B the complementary event of A
P(complementary event of A) = 1 - P(A)
• A the certain event
P(certain event) = 1
• A the impossible event
P(impossible event) = 0
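The rules in both summaries can be checked exhaustively on a small sample space. This Python sketch assumes a hypothetical experiment (two tosses of a fair coin, 4 equally likely outcomes) and uses exact fractions; events are sets of outcomes, so "and" is set intersection and "or" is set union.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two tosses of a fair coin, 4 equally likely outcomes
omega = frozenset(product("HT", repeat=2))

def P(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event), len(omega))

def P_given(a, b):
    """Conditional probability P(a | b) = P(a and b) / P(b)."""
    return P(a & b) / P(b)

A = {w for w in omega if w[0] == "H"}   # first toss heads
B = {w for w in omega if w[1] == "H"}   # second toss heads (independent of A)
C = {w for w in omega if "H" in w}      # at least one head (depends on A)
D = {w for w in omega if w[0] == "T"}   # first toss tails (excludes A)

assert P(A | B) == P(A) + P(B) - P(A & B)    # A or B
assert P(A & B) == P(A) * P(B)               # independent events
assert P(A & C) == P_given(A, C) * P(C)      # dependent events: P(A|C) * P(C)
assert P(A & C) != P(A) * P(C)               # ...where the plain product fails
assert P(A & D) == 0                         # mutually exclusive events
assert P(omega - A) == 1 - P(A)              # complementary event
assert P(omega) == 1 and P(set()) == 0       # certain / impossible event
print("All summary rules hold on this sample space")
```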
Bayes' theorem
Dependent events
Conditional probability: the probability of an outcome depending on an earlier outcome.
$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)}$$
Used in medical decision making: the reasoning process for interpreting diagnostic procedures.
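A minimal Python sketch of Bayes' theorem applied to a diagnostic test. All numbers are hypothetical (not from the course): A = disease, B = positive test. P(B) is obtained with the law of total probability, P(B) = P(B|A)P(A) + P(B|nonA)P(nonA).

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical values: A = disease, B = positive test result
prevalence = 0.01       # P(A)
sensitivity = 0.90      # P(B|A)
false_positive = 0.05   # P(B|nonA) = 1 - specificity

# Law of total probability: P(B) = P(B|A)P(A) + P(B|nonA)P(nonA)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
ppv = bayes(sensitivity, prevalence, p_positive)
print(f"P(disease | positive test) = {ppv:.3f}")
```

Even with a fairly accurate test, a low prior probability P(A) keeps P(A|B) low (about 0.15 here), which is why the prior matters when interpreting diagnostic results.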
Why study samples instead of the whole population?
Researchers study samples instead of populations:
• More quickly
• Less expensive
• Less dangerous
• More accurate conclusions
Clinical research
Generalizing results:
• group of patients → similar subjects
• sample → population
Example
100 patients with dental infections → treatment with erythromycin → 75% success
When can we generalize?
Generalization: the probability that a dental infection is successfully treated with erythromycin is 75%.
Experiment
The probability that a newborn child is male is ≈ 0.5 (50% of cases).
A family with 4 children can have 0, 1, 2, 3 or 4 boys.
In 100 families with 4 children?
In 100 families with 100 children?
Select 100 families with 4 children:
Number of boys | No. of families
0 | 4
1 | 29
2 | 40
3 | 24
4 | 9
Total | 100
• The number of boys in a family is a variable.
Probability distribution
The frequency distribution of a variable X gives the number of times each possible value of X occurs.
Number of boys | No. of families | Probability
0 | 4 | 0.04
1 | 29 | 0.29
2 | 40 | 0.40
3 | 24 | 0.24
4 | 9 | 0.09
Total | 100 | 1
If we select families from an infertility treatment clinic:
• Families with identical twins:
Number of boys | No. of families
0 | 50
1 | 0
2 | 0
3 | 0
4 | 50
Total | 100
• Selection influences the results!
Inferential statistics: condition
To generalize from sample → population, select a random sample.
What is the distribution of the number of boys in families with 4 children?
[Figure: Population → sampling → Sample → inference → Population]
Why random selection?
• To reduce / eliminate experimental errors, i.e. to decrease / eliminate selection bias
• The sample should be representative of the population, i.e. have the same distribution of important characteristics as the population,
where important characteristics = characteristics related to the one being studied
Sampling methods
• Random sampling – each subject has the same probability of being selected
• Systematic sampling – every kth subject is selected
• Stratified sampling – the population is divided into subgroups (strata) and a random sample is selected from each subgroup
Sampling methods
• Cluster sampling – the population is divided into clusters (e.g. geographic zones) and a random sample of clusters is selected
• Nonprobabilistic sampling – the probability that a subject is selected is unknown
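The four probabilistic sampling methods can be sketched in Python with the standard `random` module. The subject IDs, strata and "zones" below are hypothetical. The cluster function implements one-stage cluster sampling: whole clusters are chosen at random and all of their members are kept. A seeded `random.Random` makes the runs reproducible.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: every subject has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sampling: random start within the first k, then every kth subject."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: a random sample of n_per_stratum from each subgroup."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """One-stage cluster sampling: pick whole clusters at random, keep all members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(2024)                          # fixed seed for reproducibility
patients = [f"P{i:03d}" for i in range(100)]       # hypothetical subject IDs
print(simple_random_sample(patients, 5, rng))
print(systematic_sample(patients, 20, rng))        # 5 subjects, 20 apart
strata = {"women": patients[:50], "men": patients[50:]}                # hypothetical subgroups
print(stratified_sample(strata, 3, rng))
zones = {f"zone{z}": patients[z * 10:(z + 1) * 10] for z in range(10)}  # hypothetical zones
print(cluster_sample(zones, 2, rng))
```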
Sampling methods
• When researchers select random samples and then make measurements, the result is a random variable.
Probability distribution
• The values of a random variable can be summarized in a frequency distribution, called a probability distribution.
Probability distribution in families with 4 children?
Number of boys | Probability
0 | 0.0625
1 | 0.25
2 | 0.375
3 | 0.25
4 | 0.0625
Total | 1.00
How do we compute a probability distribution?
• Formula
• Rule
• Match with a theoretical probability distribution
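One way to obtain the table for families with 4 children is a formula: assuming each birth is independently a boy with probability 0.5, the number of boys follows a binomial distribution. A minimal Python sketch; `binomial_pmf` is a hypothetical helper name and `math.comb` requires Python 3.8+.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Number of boys among 4 children, assuming P(boy) = 0.5 independently per birth
for k in range(5):
    print(k, binomial_pmf(k, 4, 0.5))
```

This reproduces the table above: 0.0625, 0.25, 0.375, 0.25, 0.0625.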
Random variable
The result of randomly selecting subjects and measuring some characteristic is a random variable.
The values of a random variable form a frequency distribution = probability distribution.
Distribution types
Commonly used symbols in inference and statistics
Characteristic | Population parameter | Sample statistic
Mean | μ | X̄
Standard deviation | σ | SD
Proportion | π | p
Probability distributions
• Binomial distribution is used to determine the probability of yes/no events: the number of times a given outcome occurs in a given number of attempts.
• Poisson distribution is used to determine the probability of rare events.
• Normal distribution is used to find the probability that an outcome occurs when the measured numerical observations have a bell-shaped distribution.
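The Poisson distribution is described above only in words. As a sketch, its probability formula is P(X = k) = λ^k e^(-λ) / k!, where λ is the mean number of events. The rate λ = 2 below is a hypothetical value chosen for illustration, and `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam: lam**k * e**(-lam) / k!."""
    return lam ** k * exp(-lam) / factorial(k)

# Hypothetical rare event: on average 2 cases per month in a region
for k in range(5):
    print(k, round(poisson_pmf(k, 2), 4))
```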
Normal distribution
A random variable X is normally distributed, N(μ, σ), if its distribution depends on two parameters: the mean μ and the standard deviation σ.
Formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
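The density formula of the normal distribution translates directly into Python. This sketch uses the SBP parameters from the exercise (μ = 120, σ = 10 mmHg) as sample inputs and cross-checks the result against `statistics.NormalDist.pdf` from the standard library (Python 3.8+). The function name `normal_pdf` is a hypothetical choice.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x: exp(-0.5*((x-mu)/sigma)**2) / (sigma*sqrt(2*pi))."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Cross-check with the standard library, e.g. SBP ~ N(120, 10) at x = 130 mmHg
print(normal_pdf(130, 120, 10), NormalDist(120, 10).pdf(130))
```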
The Standard Normal Distribution
A normal distribution with μ = 0 and σ = 1. We change the variable with:
$$U = \frac{X-\mu}{\sigma}$$
Formula:
$$f(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^{2}}{2}}$$
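A Python sketch of the change of variable U = (X - μ)/σ. It checks that the standard normal density, divided by σ, gives the N(μ, σ) density, and that areas (cumulative probabilities) are the same before and after standardizing. The values μ = 120, σ = 10 come from the SBP exercise; x = 135 is an arbitrary example and the function names are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def standardize(x, mu, sigma):
    """U = (X - mu) / sigma: distance from the mean in standard deviations."""
    return (x - mu) / sigma

def std_normal_pdf(u):
    """Density of N(0, 1): exp(-u**2 / 2) / sqrt(2 * pi)."""
    return exp(-u * u / 2) / sqrt(2 * pi)

mu, sigma = 120, 10          # SBP parameters from the exercise (mmHg)
x = 135                      # arbitrary example value
u = standardize(x, mu, sigma)
# The N(mu, sigma) density equals the standard density at u divided by sigma
print(u, std_normal_pdf(u) / sigma, NormalDist(mu, sigma).pdf(x))
# Areas are preserved: P(X <= x) = P(U <= u)
print(NormalDist(mu, sigma).cdf(x), NormalDist().cdf(u))
```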
Questions
• Which value of Z divides the area under the curve into 95% and 5%? Zα = 1.645
• Which value of Z divides the area under the curve into 97.5% and 2.5%? Zα = 1.96
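Both Z values can be checked with the inverse cumulative distribution function of the standard normal. A sketch using `statistics.NormalDist` (Python 3.8+): `inv_cdf(p)` returns the value with area p under the curve to its left.

```python
from statistics import NormalDist

z = NormalDist()            # standard normal: mu = 0, sigma = 1
z_95 = z.inv_cdf(0.95)      # 95% of the area lies below, 5% above
z_975 = z.inv_cdf(0.975)    # 97.5% of the area lies below, 2.5% above
print(round(z_95, 3), round(z_975, 2))  # 1.645 1.96
```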
Exercise
Age | Men SBP (mmHg) | Women SBP (mmHg)
16 | 112 | 115
19 | 114 | 119
24 | 115 | 122
29 | 116 | 122
39 | 118 | 123
49 | 123 | 125
59 | 128 | 128
69 | 134 | 132
SBP is normally distributed with μ = 120 mmHg and σ = 10 mmHg.
1. Which value of SBP divides the area under the curve into 95% and 5%?
2. Which value of SBP divides the area under the curve into 97.5% and 2.5%?
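To check your answers: going back from the standard normal to SBP uses x = μ + Zα · σ. A sketch using `statistics.NormalDist` with the exercise's parameters; `inv_cdf` does the conversion directly.

```python
from statistics import NormalDist

sbp = NormalDist(mu=120, sigma=10)   # SBP in mmHg, parameters from the exercise

cut_95 = sbp.inv_cdf(0.95)     # 95% below, 5% above: about 120 + 1.645 * 10
cut_975 = sbp.inv_cdf(0.975)   # 97.5% below, 2.5% above: about 120 + 1.96 * 10
print(f"{cut_95:.1f} mmHg, {cut_975:.1f} mmHg")
```

This gives about 136.4 mmHg (136.45 when Zα = 1.645 is used) and 139.6 mmHg.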