Introduction to Biostatistics

Faculty of Medicine
Introduction to Community Medicine Course
(31505201)
Introduction to
Statistics and Demography
By
Hatim Jaber
MD MPH JBCM PhD
27+29 - 11- 2016
World AIDS Day 2016: end AIDS by 2030
• People living with HIV: 36.7 million
• People on antiretroviral therapy: 18.2 million
• Mother-to-child transmission: 7 out of 10
Presentation outline
Topic                                                          Time
Introduction and Definitions of Statistics and Biostatistics   12:00 to 12:10
Role of Statistics in Clinical Medicine                        12:10 to 12:20
Basic concepts                                                 12:20 to 12:30
Methods of presentation of data                                12:30 to 12:40
                                                               12:40 to 12:50
Introduction to
Biostatistics
Definition of Statistics
• Different authors have defined statistics differently. The best
definition of statistics is given by Croxton and Cowden, according to
whom statistics may be defined as the science which deals with the
collection, presentation, analysis and interpretation of numerical data.
• The science and art of dealing with variation in data through collection,
classification, and analysis in such a way as to obtain reliable
results. —(John M. Last, A Dictionary of Epidemiology )
• Branch of mathematics that deals with the collection, organization,
and analysis of numerical data and with such problems as
experiment design and decision making. —(Microsoft
Encarta Premium 2009)
Definition of Biostatistics = Medical statistics
• Biostatistics may be defined as the application of statistical methods
to medical, biological and public health related problems.
• It is the scientific treatment given to medical data derived from a
group of individuals or patients. It covers:
Collection of data.
Presentation of the collected data.
Analysis and interpretation of the results.
Making decisions on the basis of such analysis.
Role of Statistics in Clinical Medicine
The main theory of statistics lies in the term variability.
No two individuals are the same. For example, the blood pressure of a
person may vary from time to time as well as from person to person.
We can also have instrumental variability as well as observer variability.
Methods of statistical inference provide largely objective means for
drawing conclusions from the data about the issue under study.
Medical science is full of uncertainties and statistics deals with
uncertainties. Statistical methods try to quantify the uncertainties
present in medical science.
It helps the researcher to arrive at a scientific judgment about
a hypothesis. It has been argued that decision making is an
integral part of a physician’s work.
Frequently, decision making is probability based.
Role of Statistics in
Public Health and Community Medicine
Statistics finds an extensive use in Public Health and Community Medicine.
Statistical methods are foundations for public health administrators to understand
what is happening to the population under their care at community level as well as
individual level. If reliable information regarding the disease is available, the public
health administrator is in a position to:
• Assess community needs
• Understand socio-economic determinants of health
• Plan experiments in health research
• Analyze their results
• Study diagnosis and prognosis of the disease for taking effective action
• Scientifically test the efficacy of new medicines and methods of treatment.
Why do we need to study Medical Statistics?
Three reasons:
(1) Basic requirement of medical research.
(2) Update your medical knowledge.
(3) Data management and treatment.
Role of statisticians
• To guide the design of an experiment or survey prior to data collection
• To analyze data using proper statistical procedures and techniques
• To present and interpret the results to researchers and other decision makers
I. Basic concepts
• Homogeneity: All individuals have similar values or
belong to same category.
Example: all individuals are Chinese, women, middle age (30~40
years old), work in a computer factory ---- homogeneity in nationality,
gender, age and occupation.
• Variation: the differences between individuals in features, voice, …
• Toss a coin: the marked face may land up or down ---- variation!
• Treat patients suffering from pneumonia with the same antibiotic:
some of them recover and others do not ---- variation!
• If there is no variation, there is no need for statistics.
• Many examples of variation in medical field: height, weight, pulse,
blood pressure, … …
2. Population and Sample
• Population: The whole collection of individuals that
one intends to study.
• Sample: A representative part of the population.
• Randomization: An important way to make the
sample representative.
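As a minimal sketch of randomization, the example below draws a simple random sample from a list of patient ID numbers. The population of 1,000 IDs, the sample size of 50 and the seed are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical population: ID numbers of 1,000 registered patients.
population = list(range(1, 1001))

# A fixed seed keeps the illustration reproducible.
rng = random.Random(2016)

# Simple random sampling without replacement: every individual
# has the same chance of being selected.
sample = rng.sample(population, k=50)

print(sorted(sample)[:10])  # first ten selected IDs, in ascending order
```

Because selection is left to chance, no investigator preference can influence who enters the sample.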
Limited population and limitless population
• All the cases of hepatitis B collected in a hospital in Amman. (limited)
• All the deaths recorded among the permanent residents of a city. (limited)
• All the rats for testing the toxicity of a medicine. (limitless)
• All the patients (hypertensive, diabetic, …) for testing the effect of
a medicine. (limitless)
Random
By chance!
• Random event: the event may occur or may not
occur in one experiment.
Before one experiment, nobody is sure whether
the event occurs or not.
Example: weather, traffic accident, …
Over a large number of experiments, however, some regularity appears.
3. Probability
• Measure the possibility of occurrence of a random
event.
• A : random event
• P(A) : Probability of the random event A
P(A)=1, if an event always occurs.
P(A)=0, if an event never occurs.
Estimation of Probability----Frequency
• Number of observations: n (large enough)
Number of occurrences of random event A: m
f(A) = m/n
(Frequency or relative frequency)
Example: tossing a coin
n = 100, m (number of times the marked face came up) = 46
m/n = 46%, this is the frequency; P(A) = 1/2 = 50%,
this is the probability.
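The relation between frequency and probability can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency`, the seed and the chosen values of n are assumptions for illustration, and the coin is taken to be fair (p = 0.5).

```python
import random

def relative_frequency(n, p=0.5, seed=1):
    """Toss a coin n times and return m/n, the relative frequency
    of the marked face (which has probability p on each toss)."""
    rng = random.Random(seed)
    m = sum(1 for _ in range(n) if rng.random() < p)
    return m / n

# The frequency m/n tends to settle near P(A) = 0.5 as n grows.
for n in (100, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With small n the frequency can be noticeably off 50%, as in the slide's 46%; with large n it tends to lie close to the probability.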
4. Parameter and Statistic
• Parameter: a measure of the population, or of the distribution of
the population.
A parameter is usually represented by a Greek letter, such as μ, π, σ.
-- Parameters are usually unknown.
To estimate the parameter of a population, we need a sample.
• Statistic: a measure of the sample, or of the distribution of the sample.
A statistic is usually represented by a Latin letter, such as s, p, t.
5. Sampling Error
Error: the difference between the observed value and the true value.
Three kinds of error:
(1) Systematic error (fixed)
(2) Measurement error (random) (Observational error)
(3) Sampling error (random)
Sampling error
• The statistics of different samples from the same population differ
from each other!
• The statistics differ from the parameter!
Sampling error exists in any sampling research.
It cannot be avoided, but it can be estimated.
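Sampling error can be seen by drawing several samples from one simulated population. The sketch below is illustrative only: the population (blood pressure values with mean 120 and SD 15 mmHg), the sample size and the seed are assumptions, not data from the course.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 adults.
population = [rng.gauss(120, 15) for _ in range(10_000)]

# The parameter is known here only because the population is simulated.
mu = statistics.mean(population)

# Draw five samples of 50 individuals; each sample mean is a statistic.
sample_means = [statistics.mean(rng.sample(population, 50)) for _ in range(5)]

for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {xbar:.1f}, sampling error = {xbar - mu:+.1f}")
```

The five sample means differ from each other and from the population mean, although all were drawn from the same population in the same way.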
II. Types of data
1. Numerical Data ( Quantitative Data )
• The variable describes the characteristic of individuals quantitatively
-- Numerical Data
• The data of a numerical variable
-- Quantitative Data
2. Categorical Data ( Enumeration Data )
• The variable describes the category of individuals according to a
characteristic of individuals
-- Categorical Data
• The number of individuals in each category
-- Enumeration Data
Special case of categorical data :
Ordinal Data ( rank data )
• There exists an order among all possible categories (level of measurement)
-- Ordinal Data
• The data of an ordinal variable, which represent the order of
individuals only
-- Rank data
Examples
Which type of data do they belong to?
• RBC (4.58 × 10^6/mcL)
• Diastolic/systolic blood pressure
(8/12 kPa) or ( 80/100 mmHg)
• Percentage of individuals with blood type A (20%)
(A, B, AB, O)
• Protein in urine (++) (-, ±, +, ++, +++)
• Incidence rate of breast cancer ( 35/100,000)
III. The Basic Steps of Statistical Work
1. Design of study
• Professional design:
Research aim
Subjects,
Measures, etc.
• Statistical design:
Sampling or allocation method,
Sample size,
Randomization,
Data processing, etc.
2. Collection of data
• Source of data
Government report system such as: cholera,
plague (black death) …
Registration system such as: birth/death
certificate …
Routine records such as: patient case report …
Ad hoc survey such as: influenza A (H1N1) …
• Data collection – accurate, complete, timely
Protocol: Place, subjects, timing; training; pilot;
questionnaire; instruments; sampling method and
sample size; budget…
Procedure: observation, interview, filling
form, letter, telephone, web.
3. Data Sorting
• Checking
Hand, computer software
• Amend
• Missing data?
• Grouping
According to categorical variables (sex, occupation, disease…)
According to numerical variables (age, income, blood pressure …)
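Grouping by a numerical variable can be sketched as follows. The ten ages and the helper name `age_group` are hypothetical; the class intervals (20-<30, 30-<40, 40-<50, 50+) are an assumed choice in which the lower bound is included and the upper bound excluded.

```python
from collections import Counter

# Hypothetical ages (years) of ten patients.
ages = [23, 27, 31, 35, 38, 42, 47, 51, 58, 64]

def age_group(age):
    """Assign an age to a class interval (lower bound included,
    upper bound excluded); ages of 50 and over share one open group."""
    if age < 20:
        return "<20"
    if age >= 50:
        return "50+"
    lower = (age // 10) * 10
    return f"{lower}-<{lower + 10}"

# Count how many individuals fall into each group.
grouped = Counter(age_group(a) for a in ages)
print(grouped)
```

Stating which bound is included avoids ambiguity for values that fall exactly on a boundary, such as age 30.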
4. Data Analysis
• Descriptive statistics (show the sample)
mean, incidence rate …
-- Table and plot
• Inferential statistics (towards the population)
-- Estimation
-- Hypothesis testing (comparison)
About Teaching and Learning
• Aim:
Training statistical thinking
Skill of dealing with medical data.
• Emphasize:
Essential concepts and statistical thinking
-- lectures and practice session
Skill of computer and statistical software
-- practice session ( Excel and SPSS )
Sources of data
• Records
• Surveys: comprehensive or sample
• Experiments
Types of data
Constant
Variables
Types of variables
• Quantitative variables: quantitative continuous, quantitative discrete
• Qualitative variables: qualitative nominal, qualitative ordinal
Methods of presentation of data
Numerical presentation
Graphical presentation
Mathematical presentation
1- Numerical presentation
Tabular presentation (simple – complex)
Simple frequency distribution Table (S.F.D.T.)
Title
Name of variable (units of variable)   Frequency   %
- Categories
Total
Table (I): Distribution of 50 patients at the surgical department of
AAAAA hospital in May 2008 according to their ABO blood groups

Blood group   Frequency   %
A             12          24
B             18          36
AB            5           10
O             15          30
Total         50          100
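A simple frequency distribution table such as Table (I) can be built from raw observations as sketched below. The raw list `blood_groups` is constructed here only to reproduce the counts in the table; it is not the original patient data.

```python
from collections import Counter

# Hypothetical raw data reproducing the counts in Table (I).
blood_groups = ["A"] * 12 + ["B"] * 18 + ["AB"] * 5 + ["O"] * 15

counts = Counter(blood_groups)
total = sum(counts.values())

# Percentage = frequency / total * 100
table = {group: (n, 100 * n / total) for group, n in counts.items()}

print(f"{'Group':<6}{'Freq':>5}{'%':>6}")
for group in ("A", "B", "AB", "O"):
    n, pct = table[group]
    print(f"{group:<6}{n:>5}{pct:>6.0f}")
print(f"{'Total':<6}{total:>5}{100:>6}")
```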
Table (II): Distribution of 50 patients at the surgical department of
AAAAA hospital in May 2008 according to their age

Age (years)   Frequency   %
20-<30        12          24
30-<40        18          36
40-<50        5           10
50+           15          30
Total         50          100
Complex frequency distribution Table
Table (III): Distribution of 20 lung cancer patients at the chest
department of AAAAA hospital and 40 controls in May 2008 according
to smoking

              Lung cancer
Smoking       Cases           Control         Total
              No.    %        No.    %        No.    %
Smoker        15     75%      8      20%      23     38.33
Non smoker    5      25%      32     80%      37     61.67
Total         20     100      40     100      60     100
Complex frequency distribution Table
Table (IV): Distribution of 60 patients at the chest department of
AAAAA hospital in May 2008 according to smoking & lung cancer

              Lung cancer
Smoking       Positive        Negative        Total
              No.    %        No.    %        No.    %
Smoker        15     65.2     8      34.8     23     100
Non smoker    5      13.5     32     86.5     37     100
Total         20     33.3     40     66.7     60     100
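Tables (III) and (IV) use the same counts but different denominators: Table (III) gives percentages of each column total, Table (IV) percentages of each row total. A minimal sketch of both calculations follows; the function names and the dictionary layout are illustrative choices.

```python
# Counts shared by Tables (III) and (IV): rows are smoking status,
# columns are lung cancer status (cases/positive, controls/negative).
counts = {
    "Smoker":     {"cases": 15, "controls": 8},
    "Non smoker": {"cases": 5,  "controls": 32},
}

def column_percentages(table):
    """Percentage of each column total, as in Table (III)."""
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {r: {c: round(100 * n / col_totals[c], 1) for c, n in row.items()}
            for r, row in table.items()}

def row_percentages(table):
    """Percentage of each row total, as in Table (IV)."""
    return {r: {c: round(100 * n / sum(row.values()), 1) for c, n in row.items()}
            for r, row in table.items()}

print(column_percentages(counts))
print(row_percentages(counts))
```

The choice of denominator changes the question answered: column percentages describe how common smoking is among cases and among controls, while row percentages describe how common lung cancer is among smokers and among non-smokers.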
Line Graph

Year   MMR/1000
1960   50
1970   45
1980   26
1990   15
2000   12

[Figure: line graph of MMR/1000 (vertical axis, 0 to 60) against year
(horizontal axis, 1960 to 2000)]
Figure (1): Maternal mortality rate of (country), 1960-2000
Frequency polygon

              Sex
Age (years)   Males       Females     Mid-point of interval
20 -          3 (12%)     2 (10%)     (20+30) / 2 = 25
30 -          9 (36%)     6 (30%)     (30+40) / 2 = 35
40 -          7 (28%)     5 (25%)     (40+50) / 2 = 45
50 -          4 (16%)     3 (15%)     (50+60) / 2 = 55
60 - 70       2 (8%)      4 (20%)     (60+70) / 2 = 65
Total         25 (100%)   20 (100%)
Frequency polygon
[Figure: percentage (%) of males and females plotted against the age
mid-points (M-P) 25, 35, 45, 55 and 65 years]
Figure (2): Distribution of 45 patients at (place), in (time) by age and sex
Frequency curve
[Figure: frequency (vertical axis, 0 to 9) against age in years
(20-, 30-, 40-, 50-, 60-69), with separate curves for females and males]
Histogram
Distribution of a group of cholera patients by age

Age (years)   Frequency   %
25-           3           14.3
30-           5           23.8
40-           7           33.3
45-           4           19.0
60-65         2           9.5
Total         21          100

[Figure: histogram of % (vertical axis, 0 to 35) against age in years
(horizontal axis, marked at 25, 30, 40, 45, 60 and 65)]
Figure (2): Distribution of 100 cholera patients at (place), in (time) by age
Bar chart
[Figure: bar chart of percentage (%) by marital status
(Single, Married, Divorced, Widowed)]
Bar chart
[Figure: bar chart of percentage (%) by marital status
(Single, Married, Divorced, Widowed), with separate bars for males and females]
Pie chart
[Figure: pie chart with three sectors: Translocation 79%, Inversion 18%,
Deletion 3%]
Doughnut chart
[Figure: doughnut chart comparing Hospital A and Hospital B by DM, IHD
and Renal]
3-Mathematical presentation
Summary statistics
Measures of location
1- Measures of central tendency
2- Measures of non-central location (quartiles, percentiles)
Measures of dispersion
Summary statistics
1- Measures of central tendency (averages)
Midrange = (Smallest observation + Largest observation) / 2
Mode: the value which occurs with the greatest frequency, i.e. the most
common value
Summary statistics
1- Measures of central tendency (cont.)
Median: the observation which lies in the middle of the ordered
observations.
Arithmetic mean (mean) = Sum of all observations / Number of observations
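The four measures of central tendency can be computed as sketched below. The seven observations are hypothetical (for example, lengths of hospital stay in days) and were chosen so that the measures differ from each other.

```python
import statistics

# Hypothetical observations, e.g. lengths of hospital stay in days.
data = [3, 5, 5, 6, 7, 9, 14]

midrange = (min(data) + max(data)) / 2   # (smallest + largest) / 2
mode = statistics.mode(data)             # most frequent value
median = statistics.median(data)         # middle of the ordered observations
mean = sum(data) / len(data)             # sum / number of observations

print(midrange, mode, median, mean)      # prints: 8.5 5 6 7.0
```

The midrange depends only on the two extreme values, so the single large observation (14) pulls it further than it pulls the mean, while the median and mode are unaffected.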
Measures of dispersion
Range
Variance
Standard deviation
Semi-interquartile range
Coefficient of variation
“Standard error”
Standard deviation (SD)

Observations          Mean       SD
7, 7, 7, 7, 7, 7      Mean = 7   SD = 0
7, 8, 7, 7, 7, 6      Mean = 7   SD = 0.63
3, 2, 7, 8, 13, 9     Mean = 7   SD = 4.05
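The three examples on this slide can be checked with a short calculation. The sketch assumes each group consists of six observations (a reading of the slide layout) and that the SD uses the n - 1 denominator, which is the choice that reproduces the slide's 0.63; the function name `sample_sd` is illustrative.

```python
import statistics

# Assumed reading of the three groups of observations on the slide.
datasets = {
    "no spread":    [7, 7, 7, 7, 7, 7],
    "small spread": [7, 8, 7, 7, 7, 6],
    "large spread": [3, 2, 7, 8, 13, 9],
}

def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return variance ** 0.5

for name, values in datasets.items():
    print(name, statistics.mean(values), round(sample_sd(values), 2))
```

All three groups have the same mean, so the mean alone cannot distinguish them; the SD captures how widely the observations are scattered around it.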
Standard error of mean (SE)
A measure of variability among the means of samples selected from a
certain population.
SE (Mean) = S / √n
where S is the sample standard deviation and n is the sample size.
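A minimal sketch of the standard error calculation, taking S as the sample standard deviation (n - 1 denominator) and n as the sample size. The six observations and the function name `standard_error` are illustrative assumptions.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: S / sqrt(n), where S is the
    sample standard deviation and n the number of observations."""
    s = statistics.stdev(values)
    return s / math.sqrt(len(values))

values = [3, 2, 7, 8, 13, 9]   # illustrative observations
print(round(standard_error(values), 2))
```

Since n appears under a square root in the denominator, quadrupling the sample size is needed to halve the standard error.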