
5. Schedule of Assessment Tasks for Students During the Semester
No. | Assessment task (tutorials, test, group discussion and presentation, examination) | Week due | Proportion of total assessment
1 | Midterm written exam | Week 7 and week 12 | 20%
2 | Participation and attendance | All along | 10%
3 | Assignment and presentation | All along | 30%
4 | Final written exam | End of term | 40%
1. Summary of the main learning outcomes for students enrolled in the course.
The course will introduce elementary methods for presenting biomedical and sociodemographic data in summary form, analyzing data, statistical inference, and methods of data collection and sampling techniques. It is not a mathematics course, so it will not stress derivations of formulae; rather, it will emphasize the application of statistical ideas and methods to the design of biomedical studies and the interpretation of biomedical data.
Numerical & graphical summarization of data

Dr. Omer Alhaj







Contents of the presentation:
Definition of Data
Types of Data
Graphical presentation of Data
Numerical presentation of Data
Measures of central tendency
Normal distribution curve
Data
Def.: data are the basic building blocks of statistics and refer to the individual values that are presented, measured or observed.
Types of data
Grouped versus ungrouped
Primary versus secondary (data sources)
Quantitative versus qualitative
Tools of data collection
1. Observation
2. Questionnaire
3. Interviews
4. Record analysis

Ungrouped versus grouped data
Ungrouped data: presented or observed individually.
e.g., a list of the weights of 6 men: 80, 70, 70, 70, 95, 95 kg
Grouped data: presented in groups of identical values, each with its frequency.
See the table:
Weight (kg) | No. of men
80 | 1
70 | 3
95 | 2
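Turning ungrouped data into grouped data amounts to counting how often each identical value occurs. The minimal Python sketch below does this with collections.Counter for the weights listed above; the function name frequency_table is ours, for illustration only.

```python
from collections import Counter

def frequency_table(values):
    """Group raw (ungrouped) observations into (value, frequency) pairs."""
    counts = Counter(values)
    # Counter keeps values in order of first appearance, matching the table above.
    return list(counts.items())

weights_kg = [80, 70, 70, 70, 95, 95]
for weight, n_men in frequency_table(weights_kg):
    print(f"{weight} kg: {n_men} men")
```

This prints one line per distinct weight: 80 kg (1 man), 70 kg (3 men) and 95 kg (2 men).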
Sources of data
Primary:
- Survey
- Studies
Secondary (recorded data):
- Census
- Vital statistics report
- International publication (WHO) data
- Scientific journals data
- Hospital and outpatient clinics data
Data presentation
Usefulness of data presentation: to organize and summarize raw data in easily comprehensible forms.
Methods of data presentation:
1. Tabular
2. Diagrammatic
3. Numerical
I-Tabular presentation of data
It is a basic method of data presentation.
Characteristics of a good table:
1. Simple
2. Self-explanatory
- Abbreviations explained
- Columns and rows clearly labeled
- Units of measurement stated
- Title: clear, concise and separated from the head of the table
3. The source of the data should be stated (when the data are not original)
Types of table:
- Univariate table (simple frequency distribution table)
- Bivariate table
- Multivariate table
Age distribution of the studied cases by sex (bivariate table)
Age in years | No. of cases: male | No. of cases: female
Up to 10 | 1 | 2
11-20 | 2 | 2
21-30 | 4 | 1

Age distribution of the studied cases (univariate table)
Age in years | No. of cases
Up to 10 | 3
11-20 | 4
21-30 | 5
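The univariate table is what remains when the bivariate table is summed across sex. As a small consistency check, the Python sketch below collapses the age-by-sex counts into totals per age group; the dictionary layout and the name collapse_sex are illustrative assumptions, not part of the slides.

```python
# Counts from the bivariate table: age group -> (male, female)
by_sex = {
    "Up to 10": (1, 2),
    "11-20": (2, 2),
    "21-30": (4, 1),
}

def collapse_sex(table):
    """Sum over sex to turn a bivariate (age x sex) table into a univariate one."""
    return {age: male + female for age, (male, female) in table.items()}

univariate = collapse_sex(by_sex)
print(univariate)  # {'Up to 10': 3, '11-20': 4, '21-30': 5}
```

The totals (3, 4, 5) match the univariate table, which is a quick way to confirm that two tables built from the same cases agree.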
II-Diagrammatic presentation of Data
It includes presentation of data in the form of:
1. Graphs
2. Charts
- A graph or chart is used to present facts in visual form.
- A graphical presentation usually conveys patterns and comparisons more quickly than a table of the same data.
Graphs:
1. Histogram
2. Frequency polygon
3. Scatter diagram
Charts:
1. Simple bar chart
2. Component bar chart
3. Pie chart
Histogram (1)
- A histogram is composed of columns with no spaces between them; it is suitable for presenting continuous data measured on an interval or ratio scale.
- It has 2 axes: the x axis ("abscissa") and the y axis ("ordinate"). The continuous data are presented on the x axis and their frequency on the y axis.
- A histogram is similar to a bar chart; the only difference in presentation is that the bars of a histogram are joined together.
- The histogram evolved to meet the need for evaluating how often data occur within each interval (class) of values.
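Behind every histogram is a count of observations per class interval. The Python sketch below shows one way to get those counts for equal-width, contiguous intervals; the ages are hypothetical values chosen for illustration (they are not the data behind the figure), and histogram_counts is our own helper name.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in each class interval [lower, lower + width).

    The intervals are contiguous (no gaps), which is why histogram bars touch.
    Values outside [start, start + n_bins * width) are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the class interval containing v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical children's ages in years (illustrative only, not from the slides)
ages = [7.2, 7.9, 8.1, 8.4, 8.8, 9.0, 9.5, 9.7, 10.3, 11.6]
counts = histogram_counts(ages, start=7.0, width=1.0, n_bins=5)
for i, c in enumerate(counts):
    lower = 7.0 + i * 1.0
    print(f"{lower:.1f} - <{lower + 1.0:.1f}: {'#' * c}")
```

Each printed row is one bar on its side: the class interval goes on the x axis and its count on the y axis.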
Histogram (2)
[Figure: histogram; x axis: child's age (years), 7.0 to 12.0; y axis: frequency.]
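Frequency Polygon (1)
- If we plot a point above the midpoint of each class interval, at a height equal to the frequency of that class (i.e., the midpoint of the top of each histogram bar), and connect the points with straight lines, a frequency polygon is formed.
- The frequency polygon describes the shape of the distribution of the data.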
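The vertices of a frequency polygon are simply (class midpoint, frequency) pairs. The Python sketch below builds them from the pulse-rate classes used in the grouped-mean example of these notes. Closing the polygon with zero-frequency points one class width beyond each end is a common convention that we assume here; it is not stated on the slide.

```python
def polygon_points(midpoints, freqs, close=True):
    """Return the (x, y) vertices of a frequency polygon.

    Each vertex sits at a class midpoint, with height equal to that class's frequency.
    If close is True, a zero-frequency point is added one class width before the first
    and after the last midpoint so that the polygon meets the x axis.
    """
    points = list(zip(midpoints, freqs))
    if close and len(midpoints) >= 2:
        width = midpoints[1] - midpoints[0]  # assumes equal class widths
        points = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]
    return points

# Pulse-rate classes: midpoints 45..85 with frequencies 3, 10, 12, 4, 1
print(polygon_points([45, 55, 65, 75, 85], [3, 10, 12, 4, 1]))
# [(35, 0), (45, 3), (55, 10), (65, 12), (75, 4), (85, 1), (95, 0)]
```

Joining these points with straight lines in order gives the polygon.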
Frequency Polygon (2)
Scatter Diagram
Scatter graphs are widely used in science to present measurements of two (or more) continuous variables that are expected to be related. One variable is plotted on the y axis (the dependent variable, e.g., weight) and the other on the x axis (e.g., height); the latter is called the independent variable.
Scatter plots are useful for illustrating the relationship between continuous variables.
Interpreting the plot:
1- If the points tend to form a straight line, there is a (linear) relationship, which may be positive or negative.
2- If the points are just scattered with no pattern, there is no (linear) relationship (as the figure below demonstrates).
Heights and weights of 6 students
Student | Height (cm) | Weight (kg)
1 | 167 | 60
2 | 170 | 64
3 | 160 | 57
4 | 152 | 46
5 | 157 | 55
6 | 160 | 50
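The slides judge a relationship by eye from whether the points tend to form a straight line. One common way to put a number on that tendency, which is not covered in these slides, is Pearson's correlation coefficient r: values near +1 or -1 mean the points lie close to a rising or falling line, and values near 0 mean no linear relation. The Python sketch below applies it to the six students above; pearson_r is our own helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

height_cm = [167, 170, 160, 152, 157, 160]  # x axis (independent variable)
weight_kg = [60, 64, 57, 46, 55, 50]        # y axis (dependent variable)
r = pearson_r(height_cm, weight_kg)
print(f"r = {r:.2f}")
```

For these data r is about 0.90, which agrees with what the scatter plot would show: taller students in this small sample tend to be heavier.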
[Figure: Scatter plot of Age versus IQ; x axis: age (years).]
[Figure: Scatter plot of Income versus Age.]
[Figure: Scatter plot of Income versus Age classified by sex.]
Bar chart (1)
1. A bar chart is composed of columns, all of the same width, with spaces between them; it is ideally suited to comparing categories of mutually exclusive discrete data.
2. A bar chart is similar to a histogram, except that the bar chart has spaces between the bars, whereas the bars of a histogram are contiguous. A bar chart should not be called a histogram, because the bar chart illustrates categorical data and the histogram shows the distribution of continuous data.
Bar chart (2)
Types of bar chart:
1. Simple bar chart
2. Component or segmented bar chart: a bar chart in which the bars are divided into portions that are colored or shaded to denote their classifications
3. Grouped bar chart
Frequency of STDs in Cairo (simple bar chart)
[Figure: frequency distribution of STDs; categories: AIDS, Gonorrhoea, Syphilis, Free, Other STDs; y axis: frequency.]
Frequency of ethnic group in the studied subjects
[Figure: simple bar chart; x axis: ethnic group (White, Black, Asian, African); y axis: frequency.]
Frequency distribution of chronic diseases in 3 governorates (component bar chart)
[Figure: distribution of chronic diseases in Cairo, Alex and Ismailia; diseases: hypertension, IHD, DM, cancer, others; y axis: frequency.]
Frequency distribution of chronic diseases in 3 governorates (grouped bar chart)
[Figure: distribution of chronic diseases in Cairo, Alex and Ismailia; x axis: disease (hypertension, IHD, DM, cancer, others); y axis: frequency.]
Pie Chart (1)
• It should be used only where the values add up to a constant total (usually 100%).
• It should be used where the individual values differ noticeably; a pie chart of equal values is of no use.
• It should be used when the number of categories ("slices") is reasonably small; as a rule of thumb, between 3 and 10.
• It can be used to display quantitative (discrete) data and qualitative (categorical) data.
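Because a pie chart shows parts of a constant total, each slice's angle is its share of the total times 360 degrees (so 1% corresponds to 3.6 degrees). The Python sketch below converts frequencies into percentages and angles, using the age groups from the univariate table earlier in these notes (12 cases in total). The helper name pie_slices is ours.

```python
def pie_slices(freqs):
    """Convert category frequencies into (percentage, angle in degrees) for a pie chart."""
    total = sum(freqs.values())
    return {
        category: (100 * f / total, 360 * f / total)  # 1% of the total = 3.6 degrees
        for category, f in freqs.items()
    }

slices = pie_slices({"Up to 10": 3, "11-20": 4, "21-30": 5})
for category, (pct, deg) in slices.items():
    print(f"{category}: {pct:.1f}% -> {deg:.0f} degrees")
```

The angles come out as 90, 120 and 150 degrees and always sum to 360, which is exactly why the values must add up to a constant total.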
Pie Chart (2)
[Figure: % distribution of diseases among the studied persons; slices: Free, Asthmatics, Hypertensive, Diabetics.]
III. Numerical presentation of data
A- MEASURE OF CENTRAL TENDENCY
MEAN
MEDIAN
MODE
B- Measures of Dispersion (= Variability)
Range
Variance and SD
Mean Deviation
Coefficient of variation
MEASURE OF CENTRAL TENDENCY
MODE (1)
- Mode is defined as the most frequently
occurring number in a distribution.
- The advantage of the mode as a measure
of central tendency is that its meaning is
obvious.
- Further, it is the only measure of central
tendency that can be used with nominal
data.
MEASURE OF CENTRAL TENDENCY
MODE (2)
Examples:
ex.1 (23, 34, 35, 36, 36, 37, 40, 45, 50): Mode = 36
ex.2 (4, 10, 10, 15, 18, 20, 20, 24, 26): Modes = 10 and 20 (bimodal)
ex.3 (44, 47, 50, 56, 58, 60, 65, 75): no mode (every value occurs only once)
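The three mode examples can be reproduced with a short Python sketch: count each value and keep those with the highest count. A series in which no value repeats is reported as having no mode (an empty list). Python's standard statistics.multimode is similar, but it returns every value when all values occur once, so the "no mode" case is handled explicitly here. The helper name modes is ours.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency; [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:  # every value occurs once, so the series has no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([23, 34, 35, 36, 36, 37, 40, 45, 50]))  # [36]
print(modes([4, 10, 10, 15, 18, 20, 20, 24, 26]))   # [10, 20] (bimodal)
print(modes([44, 47, 50, 56, 58, 60, 65, 75]))      # [] (no mode)
```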
Mode (4)
Advantages
- Quick and easy to calculate
- Unaffected by extreme values
Disadvantages
- May not be representative of the whole sample, because it does not use all values
- Seldom used in tests of statistical significance
MEASURE OF CENTRAL TENDENCY
Median (1)
Median: the central value that divides the data into 2 equal parts after the data are arranged in ascending or descending order.
i.e., it is the value that divides a series of observations into 2 equal halves when all observations are listed from lowest to highest or from highest to lowest.
- In an odd-numbered series, the median is the observation at position (n+1)/2.
- In an even-numbered series, the median is the average of the observations at positions n/2 and n/2 + 1.
MEASURE OF CENTRAL TENDENCY
Median (2)
Characteristics of the median:
1. The median is less sensitive to extreme scores than the mean, which makes it a better measure than the mean for highly skewed distributions.
2. It is commonly used in survival analysis (e.g., median survival time).
MEASURE OF CENTRAL TENDENCY
Median (3)
Examples:
"Do not forget to rearrange the data, if needed."
ex.1 (odd series) (2, 4, 5, 7, 8, 10, 11): position = (7+1)/2 = 4, i.e., observation No. 4, so Median = 7
ex.2 (even series) (2, 4, 5, 7, 8, 10, 11, 12): position = (8+1)/2 = 4.5, i.e., midway between observations No. 4 and 5, so Median = (7+8)/2 = 7.5
ex.3 (7, 11, 5, 2, 8, 10, 4): rearranged as (2, 4, 5, 7, 8, 10, 11), so Median = 7
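The median rule (sort first, then take the middle observation, or the average of the two middle ones when n is even) is sketched below in Python and checked against the three examples. Python's statistics.median follows the same rule; the explicit version is shown only to mirror the position formulas above.

```python
def median(values):
    """Middle value after sorting; mean of the two middle values when n is even."""
    xs = sorted(values)  # "do not forget to rearrange the data"
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2  # 0-based index
    if n % 2 == 1:
        return xs[mid]  # odd n: observation number (n + 1) / 2
    return (xs[mid - 1] + xs[mid]) / 2  # even n: observations n/2 and n/2 + 1

print(median([2, 4, 5, 7, 8, 10, 11]))      # 7
print(median([2, 4, 5, 7, 8, 10, 11, 12]))  # 7.5
print(median([7, 11, 5, 2, 8, 10, 4]))      # 7
```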
Median: Advantages
- Fairly easy to calculate and always exists
- Relatively easy to interpret: half of the sample (normally) lies above and half below the median
- Not affected by extreme data values
- Used when the distribution of data is skewed
- Uses only the ranks of the observations, not their actual values
- Can be used with ordinal observations, because the calculation does not use the actual values of the observations
- Does not always need a complete data set: it can be found as long as the middle observation(s) are known
Median: Disadvantages
- Manually tedious to find for a large sample that is not in order (requires ordering)
- Does not utilize all data values
MEASURE OF CENTRAL TENDENCY
Mean (1)
Mean is the most common and useful measure of the central tendency of a distribution of values for any group of individuals, objects or events; it is also called the arithmetic average.
Def.: the sum of the values of a series of observations divided by the number of observations.
MEASURE OF CENTRAL TENDENCY
Mean (2)
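Calculation and examples
• Ungrouped data:
$\bar{x} = \frac{\sum x_i}{n}$
5, 8, 12, 15, 40: Mean = 80 ÷ 5 = 16
2, 4, 6, 8, 10: Mean = 30 ÷ 5 = 6
• Grouped data:
$\bar{x} = \frac{\sum f_j x_j}{n}$, where $f_j$ is the frequency and $x_j$ the midpoint of class $j$, and $n = \sum f_j$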
Mean (grouped data) (3)
Pulse rate | Freq (f_j) | Class midpoint (x_j) | f_j x_j
40-49 | 3 | 45 | 135
50-59 | 10 | 55 | 550
60-69 | 12 | 65 | 780
70-79 | 4 | 75 | 300
80-90 | 1 | 85 | 85
Total | 30 | 325 | 1850
$\bar{x} = \frac{\sum f_j x_j}{n} = \frac{1850}{30} = 61.67$
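Both mean formulas translate directly into code. The Python sketch below reproduces the ungrouped examples and the pulse-rate result. Note that the grouped mean treats every observation in a class as if it sat at the class midpoint, so it approximates the mean of the original raw values. The function names are ours.

```python
def mean(values):
    """Ungrouped data: sum of the observations divided by n."""
    return sum(values) / len(values)

def grouped_mean(freqs, midpoints):
    """Grouped data: sum of f_j * x_j divided by n, where n = sum of f_j."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n

print(mean([5, 8, 12, 15, 40]))  # 16.0
print(mean([2, 4, 6, 8, 10]))    # 6.0

# Pulse-rate table: class frequencies and class midpoints
print(round(grouped_mean([3, 10, 12, 4, 1], [45, 55, 65, 75, 85]), 2))  # 61.67
```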
Mean: Advantages
- It is familiar to most people
- It reflects the inclusion of every item in the data set (it utilizes all values)
- It always exists
- It is unique
- It is easily used with other statistical measurements
- It is the center of gravity of the data, and is easy to understand and to calculate
- Compared with the median, it helps to judge whether a distribution is symmetrical
- It is important for statistical analyses and their applications
Mean: Disadvantages
- It can be affected by extreme values in the data set, called outliers, and therefore be biased
- It loses accuracy as a summary when the distribution is skewed
- Including or excluding a single value changes the mean
- Manually, it is more tedious to calculate
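The effect of an outlier is easy to see by changing a single value. In the Python sketch below, the last value of the ungrouped example (40) is replaced by a hypothetical extreme value of 400. The mean jumps from 16 to 88, while the median stays at 12. The sketch uses the standard statistics.mean and statistics.median functions; the helper name centre is ours.

```python
import statistics

def centre(values):
    """Return (mean, median) of a list of numbers as floats."""
    return float(statistics.mean(values)), float(statistics.median(values))

original = [5, 8, 12, 15, 40]       # ungrouped example from the mean slide
with_outlier = [5, 8, 12, 15, 400]  # hypothetical: last value made extreme

print("original:     mean = %.1f, median = %.1f" % centre(original))
print("with outlier: mean = %.1f, median = %.1f" % centre(with_outlier))
```

This is why the median is often preferred when a distribution is skewed.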
Classification of BP in wards A & B into 3 categories
Systolic BP (mmHg) | Ward A: n | Ward A: % | Ward B: n | Ward B: %
100 - <120 | 4 | 40.0 | 3 | 30.0
120 - <140 | 4 | 40.0 | 3 | 30.0
140+ | 2 | 20.0 | 4 | 40.0
Frequency distribution of systolic BP in Ward A & B
[Figure: grouped bar chart; x axis: BP categories (100 - <120, 120 - <140, 140+); y axis: frequency; series: Ward A, Ward B.]
[Figure: the same data as a component bar chart, with Ward A and Ward B stacked in each BP category.]
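A grouped bar chart places one bar per series (here, per ward) side by side within each category. As a rough text-only sketch of that layout, the Python code below draws the BP counts from the table above using '#' characters; the helper grouped_bar_lines and its text layout are our own illustration, not a plotting library.

```python
bp_counts = {  # systolic BP category -> (Ward A, Ward B)
    "100 - <120": (4, 3),
    "120 - <140": (4, 3),
    "140+": (2, 4),
}

def grouped_bar_lines(table, series=("Ward A", "Ward B")):
    """Render a grouped bar chart as text: one bar per ward inside each BP category."""
    lines = []
    for category, counts in table.items():
        lines.append(category)
        for name, n in zip(series, counts):
            lines.append(f"  {name:<7}{'#' * n} {n}")
    return lines

print("\n".join(grouped_bar_lines(bp_counts)))
```

Stacking each ward's bar on top of the other within a category, instead of placing them side by side, would give the component version of the same chart.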
The Normal Distribution Curve
(Gaussian curve) (1)
Definition:
It is a mathematical model that adequately describes many types of measurement in medicine.
The Normal Distribution Curve
(Gaussian curve)
Idea:
- When scientists first began constructing histograms, a particular shape occurred so often that people began to expect it. Hence, it was given the name normal distribution.
- The normal distribution is symmetric (you can fold it in half and the two halves will match) and unimodal (single peaked).
- It is what psychologists call the bell-shaped curve.
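The two properties above can be checked numerically. The Python sketch below uses the standard-library statistics.NormalDist. The standard normal curve (mean 0, SD 1) is an arbitrary choice for illustration. The code compares the curve's height at equal distances on either side of the mean and locates the single peak.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, SD 1), chosen only for illustration.
curve = NormalDist(mu=0.0, sigma=1.0)

# Symmetry: the height of the curve is the same at equal distances
# below and above the mean.
for d in (0.5, 1.0, 2.0):
    print(f"d = {d}: left {curve.pdf(-d):.4f}, right {curve.pdf(d):.4f}")

# Unimodal: scanning x from -4 to 4 in steps of 0.1, the highest point is at the mean.
xs = [k / 10 for k in range(-40, 41)]
peak = max(xs, key=curve.pdf)
print("peak at x =", peak)
```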
Thank you