Data Types

Journal of The Association of Physicians of India ■ Vol. 64 ■ June 2016
Statistics for Researchers
Data Types
Siddharth Deshpande, Nithya Gogtay, Urmila Thatte
‘Why do I need to understand
statistics? Let the
statistician take care of it and just
tell me whether p<0.05. That is all
I need to know’.
This is an oft-expressed thought
by busy clinicians. In actual fact,
although we do not have to be
software engineers or code writers
to be able to use computers and
now even phones, we still need to
know the tool we use to be able
to use it optimally. And so also
for statistics. Statistical methods
are but tools that we all need to
understand to be able to either use
in our own research or understand
what is reported in literature. After
all, we do not want to do as Vin
Scully once said “Statistics are used
much like a drunk uses a lamppost:
for support, not illumination”!
This series of articles is an attempt
to simplify and discuss some
statistical methods needed to
describe and analyse clinical
research findings. At the end of
the series, we hope that readers
are better able to understand and
interpret results of either their own
research or that which is published
and therefore draw appropriate
conclusions to encourage best-practice,
evidence-based medicine.
The first in this series deals with
the most important component of
any research, that is, “DATA”. In
this article, we describe the various
types of data that are generated
during any study, as well as why
it is important to understand
what data has been generated.
What, then, is data? In statistics
we use the term ‘variable’ to mean
a quality or quantity which varies
from one member of a sample or
population to another, e.g. height
of children, number of deaths in a
clinical trial, blood glucose levels.
Data is the information gathered
about a variable, to arrive at a
conclusion. Just as cotton is raw
material for making clothes, data
is the raw material to generate
results of a study. Data has been
categorized into different types
and it is important to understand
these types because the type of
data decides how it is presented
and analysed statistically.
Data Types
Data can be broadly divided
into two types: Quantitative and
Qualitative. Quantitative data
answers the question “how much?”,
for example, height, weight, blood
pressure, serum phenytoin levels,
blood sugar levels, intraocular
pressure, etc. Qualitative data,
on the other hand, as the name
suggests describes a “character”
or “quality”, for example, colour
of hair or eyes, blood groups,
gender, ethnic groups, etc. and
answers the question “what type”.
Whereas the former is measurable
or quantifiable as a number (and
hence also called measurement
or numerical data), the latter can
be put into categories, and is also
called categorical data.
Quantitative data is further
divided into ‘continuous data’ and
‘discrete data’. Continuous data as
the name suggests is continuous
i.e. there are no gaps between two
measurements. Serum cholesterol
is a type of continuous data, since a
person may have serum cholesterol
value of 188, 188.1, 188.12 and so
on. The continuousness of the data
will depend on the sensitivity of
the instrument used to measure
the variable. Discrete data on the
other hand cannot take any value in
between two integers, like number
of malaria cases, number of deaths
due to myocardial infarction,
etc., where there can be two or
three deaths due to myocardial
infarction, but not 2.5 or 2.9
deaths. Hence, the variable cannot
take a value other than a whole
number.
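
The distinction can be illustrated with a minimal Python sketch. The helper names `record_continuous` and `is_valid_count`, and the "true" cholesterol value, are hypothetical and chosen only for illustration: a continuous measurement is recorded to whatever resolution the instrument allows, whereas a count is acceptable only as a whole number.

```python
def record_continuous(true_value: float, decimals: int) -> float:
    """Record a continuous measurement at the instrument's resolution
    (number of decimal places it can report)."""
    return round(true_value, decimals)


def is_valid_count(value: float) -> bool:
    """A discrete count must be a non-negative whole number."""
    return value >= 0 and float(value).is_integer()


# Hypothetical underlying serum cholesterol value.
cholesterol = 188.1234

coarse_reading = record_continuous(cholesterol, 0)  # instrument reports whole units
fine_reading = record_continuous(cholesterol, 2)    # instrument reports two decimals

two_deaths_ok = is_valid_count(2)       # a count of 2 deaths is possible
half_death_ok = is_valid_count(2.5)     # a count of 2.5 deaths is not
```

The same underlying value yields different recorded readings depending on the instrument, while the count check rejects anything that is not a whole number.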
Qualitative data is divided into
two types; ‘nominal’ and ‘ordinal’.
The name ‘Nominal’ comes from
the Latin nomen, meaning ‘name’
and nominal data indicates a
particular character or trait of
the variable being studied. Items
differentiated by a simple naming
system fall into this category and
the only thing a nominal scale does
is to say that items being measured
have something in common, e.g.
colour of iris (black, grey, blue
etc.) or type of cancer (esophageal,
stomach, lung etc.). There is no
order or hierarchy among the
categories of nominal data, e.g.
gender (male/female), where one
cannot say that males are better
than females or vice versa; blood
groups (A/B/O/AB), where it cannot
be said that group A is superior
to group O; or religion (Hindu/
Muslim/Christian/Buddhist, etc.),
where there is no logical order to
the categories but they all have
something in common.
Ordinal data, on the other
hand, has a specific order to it,
e.g. headache (mild, moderate,
severe) or status of blood pressure
(hypotensive, normal, high normal,
Grade I hypertension, Grade II
hypertension). Items on
an ordinal scale are set into some
kind of order by their position on
the scale. You cannot do arithmetic
with ordinal data; the categories
show sequence only.

Table 1: Types of data

Type of data   Subtype      Example
Quantitative   Continuous   Serum cholesterol
                            HbA1C
                            BMI
               Discrete     Number of malaria cases
                            Number of deaths due to myocardial infarction
                            Number of adverse drug reactions
Qualitative    Nominal      Colour of iris (black, grey, blue etc.)
                            Gender (male/ female/other)
                            Blood groups (A/B/O/AB)
                            Religion (Hindu/ Muslim/ Christian/ Buddhist, etc.)
               Ordinal      Headache (mild, moderate, severe)
                            Status of blood pressure (hypotensive, normal, high
                            normal, Grade I hypertension, Grade II hypertension)

Authors' affiliation: Dept. of Clinical Pharmacology, Seth GS Medical College, Mumbai, Maharashtra
Both nominal and ordinal data
are further classified into binary
and non-binary data. When the
variable can take up only two
possible values e.g. yes/ no, dead/
alive, and there is no alternative
other than the two it is called
binary data.
Table 1 summarises types of data
with examples.
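
As a rough illustration of this classification, the Python sketch below tags a few of the example variables from Table 1 with their data type. The names `DataType`, `VARIABLE_TYPES`, `is_more_severe` and `is_binary` are hypothetical and not from the article. The sketch also shows the two properties described above: ordinal categories can be compared by position, and a variable is binary when it has exactly two possible values.

```python
from enum import Enum


class DataType(Enum):
    CONTINUOUS = "quantitative, continuous"
    DISCRETE = "quantitative, discrete"
    NOMINAL = "qualitative, nominal"
    ORDINAL = "qualitative, ordinal"


# A few example variables taken from Table 1.
VARIABLE_TYPES = {
    "serum cholesterol": DataType.CONTINUOUS,
    "number of malaria cases": DataType.DISCRETE,
    "blood group": DataType.NOMINAL,
    "headache severity": DataType.ORDINAL,
}

# For ordinal data the order of the categories carries meaning.
HEADACHE_LEVELS = ["mild", "moderate", "severe"]


def is_more_severe(a: str, b: str) -> bool:
    """Compare two ordinal categories by their position on the scale."""
    return HEADACHE_LEVELS.index(a) > HEADACHE_LEVELS.index(b)


def is_binary(categories) -> bool:
    """A variable is binary when it has exactly two possible values."""
    return len(set(categories)) == 2
```

For instance, `is_more_severe("severe", "mild")` holds, but no such comparison is meaningful for blood groups, which is why no ordering is defined for them here.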
Conversion of Quantitative
Data to Qualitative Data
It is possible to convert
quantitative data to qualitative data
to ease presentation and analysis.
For example, the quantitative data
of height (181 cm, 134 cm, etc.) can
be converted to qualitative data
(short, average, tall, very tall) by
pre-defining cut-offs for categories.
Whether quantitative or qualitative
data is to be collected has to
be specified in the study protocol
a priori.
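
The conversion described above can be sketched in Python as follows. The cut-off values (150, 170 and 185 cm) are assumptions made purely for illustration, since the article does not specify any, and the names `CUT_OFFS_CM`, `LABELS` and `height_category` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut-offs in cm, fixed in advance as a study protocol would require.
CUT_OFFS_CM = [150, 170, 185]
LABELS = ["short", "average", "tall", "very tall"]


def height_category(height_cm: float) -> str:
    """Map a measured height to a pre-defined category.
    A height equal to a cut-off falls into the higher category."""
    return LABELS[bisect_right(CUT_OFFS_CM, height_cm)]


# The two example heights from the text.
heights = [181, 134]
categories = [height_category(h) for h in heights]
```

Because the cut-offs are defined once, before any data is seen, every height is categorised by the same rule.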
Collecting what is inherently
quantitative data like systolic BP
values (120, 140, 190 mm Hg) in
the form of qualitative data like
proportions of patients (20%,
40%, 40%) in normal, borderline
hypertension and hypertension
groups tends to weaken the data,
because information is lost.
Thus, if you decide that
hypertension will be defined
as every individual with blood
pressure above 140 mm Hg systolic,
patients with blood pressures of
150 mm Hg and 210 mm Hg will be
treated similarly, although there is
a substantial difference between the
two values. The clinical outcomes
will also differ between the two
individuals. Therefore, although
such conversion can be done, it is
to be resorted to only if necessary
and applicable.
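
The loss of information can be seen in a short Python sketch. It uses the definition given in the text (hypertension as systolic pressure above 140 mm Hg); the names `THRESHOLD_MM_HG` and `is_hypertensive`, and the three-reading sample, are assumptions for illustration only.

```python
THRESHOLD_MM_HG = 140  # definition used in the text: above 140 mm Hg systolic


def is_hypertensive(systolic: float) -> bool:
    """Dichotomise a systolic reading using the single cut-off."""
    return systolic > THRESHOLD_MM_HG


# Illustrative systolic readings in mm Hg.
readings = [120, 150, 210]
labels = ["hypertensive" if is_hypertensive(x) else "not hypertensive"
          for x in readings]

# The 150 and 210 readings differ by 60 mm Hg but receive the same label.
gap = readings[2] - readings[1]
same_label = labels[1] == labels[2]

# Only this proportion survives once the data is stored as categories.
proportion_hypertensive = sum(is_hypertensive(x) for x in readings) / len(readings)
```

Once only the labels are stored, the 60 mm Hg gap between the two hypertensive patients can no longer be recovered.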
Why is it Important to
Study Types of Data?
The ‘type’ of data (whether
quantitative or qualitative) will
decide which type of statistical
test (parametric or non-parametric)
will be used for analysis.
Qualitative data is always
analysed by non-parametric tests,
while quantitative data can be
analysed by parametric tests ONLY
if it is normally distributed.
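
The rule stated above can be written as a small decision function. This is only a sketch of the rule of thumb as worded in this article; the function name `choose_test_family` and its parameters are hypothetical, and how normality is assessed is deliberately left outside the sketch.

```python
def choose_test_family(data_kind: str, normally_distributed: bool = False) -> str:
    """Return the family of statistical test suggested by the article's rule.

    data_kind: "qualitative" or "quantitative".
    normally_distributed: only relevant for quantitative data.
    """
    if data_kind == "qualitative":
        return "non-parametric"
    if data_kind == "quantitative":
        return "parametric" if normally_distributed else "non-parametric"
    raise ValueError(f"unknown data kind: {data_kind!r}")


# Blood groups are qualitative (nominal) data.
blood_group_family = choose_test_family("qualitative")

# Serum cholesterol is quantitative; the choice depends on its distribution.
cholesterol_if_normal = choose_test_family("quantitative", normally_distributed=True)
cholesterol_if_skewed = choose_test_family("quantitative", normally_distributed=False)
```

The function returns only the family of test; which specific test applies within that family is not covered by the rule.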
In conclusion, data, which is the
basic unit for any research program
and is analysed using statistical
tests to reach conclusions, is
broadly of two types: qualitative
(also called categorical data) and
quantitative (also called numerical
data). Qualitative data can be
nominal or ordinal, while
quantitative data can be
continuous or discrete. The type
of data determines the statistical
test applied and the mode of
presentation of the data.
Although Benjamin Disraeli
said, “There are three types of lies
-- lies, damn lies, and statistics”
we hope that this series will help
readers sift the wheat from the
chaff and remove the “lies” from
statistical analyses.