Community Medicine Dept. 1 out of 9 Probability: The concept of

Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
Probability:
The concept of probability is not a foreign to health workers and is
frequently encountered in every day activities, it is a familial concept as
example when we hear a physician giving the chance of surviving from
certain operation to a patient such as 50 - 50 chance of surviving (meaning
50% to survive and 50% chances to death out of 100%).
Thus we are accustomed to measure the probability of the
occurrence of some event by a number between zero and one. The more
likely the event, the closer the number is to one; and the more unlikely the
event, the closer the number is to zero.
- An event that can not occur has a probability of zero,
- An event that less likely to occur has a probability near to zero,
- An event that is certain to occur has a probability of one, and
- An event that is most likely to occur has a probability near to one.
Probability; it is an outcome out of the total possible outcomes.
Outcome
Probability (P) = -----------------------------Total possible outcomes
The concept of objective probability that is derived from objective
processes is of two types:
1- Classical probability (Priori Probability):
Developed out of the attempts to solve problems related to games of
chance. For example;
Community Medicine Dept. 1 out of 9
Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
Coin: the rolling of coin will give only two results (Head, Tail) with
equally likely chances of each one to occur and that there is no
reason to favour the drawing of anyone.
P H= 1/2= 0.5
P T= 1/2= 0.5
Total P.= 1
Dice: the rolling of dice will give only 6 results (1,2,3,4,5 & 6) with
probability of 1/6 for the appearance of any number and the total
probability of 1.
Cards: we have 52 cards (of two colours red and black, of 4 types) so
probability of selecting of one card is 1/52.
The probability here depend on whether the events are mutually
exclusive or not (the two events does not occur at the same time).
The probability is calculated by the process of abstract reasoning. I t
is calculated by many methods; addition, multiplication, factorial, or
combination.
2- Relative frequency probability (Posteriori Probability):
It depends on the repeatability of some process and the ability to
count the number of repetitions, as well as the number of times that some
event of interest occurs.
Outcome
Probability (P) = -----------------------------Total possible outcomes
Frequency of occurrence
Probability (P) = ---------------------------------Repetitions number
Community Medicine Dept. 2 out of 9
Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
f
P = -----n
The probability distribution of a discrete random variable will
specify all possible values along with their respective probability.
The following' table represents probability distribution of number
of children with their haemoglobin level (g/dl).
Hb (g/ dL)
F
Prob.
Cumulative Prob.
10
1
0.02
0.02
11
4
0.08
0.10
12
6
0.12
0.22
13
7
0.14
0.36
14
9
0.18
0.54
15
10
0.20
0.74
16
7
0.14
0.88
17
4
0.08
0.96
18
2
0.04
1.00
Total
50
1.00
-
Frequncy
12
10
9
10
8
6
6
7
7
4
4
4
2
2
1
0
10
11
12
13
14
15
16
Community Medicine Dept. 3 out of 9
17
18
Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
From this data that show the probability distribution of this
continuous random variable (and even for the discrete type), it is clear that
the column of relative frequency, it show corresponding positive values,
they are all less than 1, and their sum is equal to 1. These are the
characteristics of probability distribution, which also represent the relative
frequency of occurrence of such values, also we can obtain the cumulative
probability by adding the probability through the relative frequency from
cumulating it, and it gives a good determination of probability.
There are different theoretical types of probability distribution;
1- Binomial Prob. Distribution.
2- Poisson Prob. Distribution.
3- Normal Prob. Distribution.
4- Skewed Prob. Distribution. (Negatively skewed  to left
Positively skewed  to right)
When we consider the distribution of a continuous random variable,
it is presented as a specified class intervals values by their frequency and
probability as in repetition occurrence, it will be presented in histogram as
follows
Fre quncy
30
25
20
15
10
5
0
10
12
14
16
18
20
Community Medicine Dept. 4 out of 9
22
24
Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
And also in frequency polygon
30
Frequncy
25
20
15
10
5
0
10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25
In line graph as the number of observations [n] approaches the
infinity and the width of the class interval approaches zero, then the
frequency polygon approaches a smooth curve.
The line when deal with probability distribution has some
characteristics;
- Smooth curve
- Total area under the curve is equal to one.
- Relative frequency (f/n) of occurrence of value between any two points
on x-axis is equal to the area bounded by the curve and the two points.
Community Medicine Dept. 5 out of 9
Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
Normal distribution curve:
The graph of normal distribution produces the bell-shaped curve. I t
is the most important type of distribution in all of statistics. It is also called
Gaussian distribution. It has the following characteristics;
1-It is symmetrical about its mean (The curve on either side of the mean is
a mirror image of the other side).
2-The mean, mode and median are equal (the measures of central tendency
coincide to each other).
3-The total area under the curve above the x-axis is one unit (the normal
distribution is a probability distribution).
4-The normal distribution curve is completely determined by two
parameters (mean & standard deviation) (it is tall & narrow for small
SD, and short & wide for large SD)
5-68% of the total area under the curve is enclosed by mean + 1 SD. While
95% of the total area under the curve is enclosed by mean + 2 SD.
99.7% enclosed under mean + 3 SD.
-2SD
-lSD
Mean
+lSD
Community Medicine Dept. 6 out of 9
+2SD
Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
The importance of normal distribution:
1- It is a good empirical description of many variables and especially the
biological variables of human being, all laboratory investigations and
body measurements.
2- It represents the bases of statistical analysis and testing hypothesis, so
any data to be tested should be normally distributed or assumed to be
normally distributed.
For a group of 75 persons with their age (mean= 35.23 year,
Median= 35.6 year, SD= +12.96 year);
1-Within which range 68% of observations in the sample lie?
68% of observation lie in a range of =
+ 1 SD =35.23 + 12.96
Upper limit =
+ 1SD = 35.23 + 12.96 = 48.19
Lower limit =
- 1SD = 35.23 - 12.96 = 22.27
So 68% of observations lie in range of ( 22.27 --- 48.19)
2-Within which range 95% of observations in the sample lie?
95% of observation lie in a range of =
+2 SD=35.23+(2x12.96)
=35.23 + 25.92
Upper limit =
+ 2SD = 35.23 + 25.92 = 61.15
Lower limit =
- 2SD = 35.23 - 25.92 = 9.31
So 95% of observations lie in range of ( 9.31 --- 61.15)
Community Medicine Dept. 7 out of 9
Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
Sampling variation & Standard error:
The sample mean is unlikely to be exactly equal to the population
mean () (mu) as a different sample would give a different estimate, the
difference being due to sampling variation.
If we collect many independent samples of the same size and
calculate the sample mean of each of them, then a frequency distribution of
those means could be formed. The mean of this distribution would be the
population mean (), and it can be shown that its standard deviation would
be equal to SD /  n .
This is called the standard error of the sample mean and it measures
how precisely the population mean is estimated by the sample mean.
Sampling variation; is the degree of variability of sample mean
from the population mean from which we the sample were taken.
SE; is the measurement of the degree of variability of sample mean
from population mean from which the sample were taken.
The size of SE depends on:
1- Variation in the population (SD,..)
2- Size of the sample (n).
We use sample SD to estimate the SE by:
SD
SE = --------n
For example the mean of blood glucose level of 9 health persons is
95 mg/100 ml, the SD is 10.5 mg/100 ml, then the SE estimated to be;
SD
10.5
SE = --------- = --------- = +3.5 mg/l00 ml
n
9
Community Medicine Dept. 8 out of 9
Biostatistics
Assist. Prof. Dr. Waleed Arif Tawfiq
Lecture 6
11/11/2014
Approximately 95% of the sample means obtained by repeated
sampling would lie within 2 SE above or below the population mean. This
fact can be used to construct a range of likely value of population mean
based on the observed sample mean and its SE. This range is called
confidence interval in which we are 95% confident that the population
mean lie in this range which is called population mean.
As 95% of observations lie in the range of
So 95% C.I. or  =
+ 2 SD
+ 1.96 SE
In which we are 95% confident that the population mean lie in this range
(which is sample mean + 1.96 SE)
Community Medicine Dept. 9 out of 9