Journal of The Association of Physicians of India ■ Vol. 64 ■ June 2016

Statistics for Researchers

Data Types

Siddharth Deshpande, Nithya Gogtay, Urmila Thatte

‘Why do I need to understand statistics? Let the statistician take care of it and just tell me whether p<0.05. That is all I need to know.’ This is an oft-expressed thought among busy clinicians. In fact, although we do not have to be software engineers or code writers to use computers, or now even phones, we still need to know the tool we use in order to use it optimally. The same holds for statistics. Statistical methods are tools that we all need to understand, either to use them in our own research or to understand what is reported in the literature. After all, we do not want to do as Vin Scully once said: “Statistics are used much like a drunk uses a lamppost: for support, not illumination”!

This series of articles is an attempt to simplify and discuss some statistical methods needed to describe and analyse clinical research findings. At the end of the series, we hope that readers are better able to understand and interpret the results of their own research or of published research, and therefore draw appropriate conclusions that encourage best-practice, evidence-based medicine.

The first article in this series deals with the most important component of any research, that is, “DATA”. In this article, we describe the various types of data that are generated during any study, as well as why it is important to understand what data has been generated.

What, then, is data? In statistics we use the term ‘variable’ to mean a quality or quantity which varies from one member of a sample or population to another, e.g. height of children, number of deaths in a clinical trial, blood glucose levels. Data is the information gathered about a variable, to arrive at a conclusion.
Just as cotton is the raw material for making clothes, data is the raw material from which the results of a study are generated. Data has been categorized into different types, and it is important to understand these types because the type decides how the data is presented and analysed statistically.

Data Types

Data can be broadly divided into two types: quantitative and qualitative. Quantitative data answers the question “how much?”, for example, height, weight, blood pressure, serum phenytoin levels, blood sugar levels, intraocular pressure, etc. Qualitative data, on the other hand, as the name suggests, describes a “character” or “quality”, for example, colour of hair or eyes, blood groups, gender, ethnic groups, etc., and answers the question “what type?”. Whereas the former is measurable or quantifiable as a number (and hence is also called measurement or numerical data), the latter can be put into categories and is also called categorical data.

Quantitative data is further divided into ‘continuous data’ and ‘discrete data’. Continuous data, as the name suggests, is continuous, i.e. there are no gaps between two measurements. Serum cholesterol is a type of continuous data, since a person may have a serum cholesterol value of 188, 188.1, 188.12 and so on. In practice, how finely the values are recorded depends on the sensitivity of the instrument used to measure the variable. Discrete data, on the other hand, cannot take any value in between two integers. Examples are the number of malaria cases or the number of deaths due to myocardial infarction: there can be two or three deaths due to myocardial infarction, but not 2.5 or 2.9 deaths. Hence, the variable cannot take a value other than a whole number.

Qualitative data is divided into two types: ‘nominal’ and ‘ordinal’. The name ‘nominal’ comes from the Latin nomen, meaning ‘name’, and nominal data indicates a particular character or trait of the variable being studied.
Items differentiated by a simple naming system fall into this category, and the only thing a nominal scale does is to say that the items being measured have something in common, e.g. colour of iris (black, grey, blue, etc.) or type of cancer (esophageal, stomach, lung, etc.). There is no order or hierarchy among the categories of nominal data. Examples are gender (male/female), where one cannot say that males are better than females or vice versa; blood groups (A/B/O/AB), where it cannot be said that group A is superior to group O; and religion (Hindu/Muslim/Christian/Buddhist, etc.), where there is no logical order to the categories but they all have something in common.

Ordinal data, on the other hand, has a specific order to it, e.g. headache (mild, moderate, severe) or status of blood pressure (hypotensive, normal, high normal, Grade I hypertension, Grade II hypertension). Items on an ordinal scale are set into some kind of order by their position on the scale. You cannot do arithmetic with ordinal numbers; they show sequence only.

Both nominal and ordinal data are further classified into binary and non-binary data. When the variable can take only two possible values, e.g. yes/no or dead/alive, and there is no alternative other than the two, it is called binary data.

Dept. of Clinical Pharmacology, Seth GS Medical College, Mumbai, Maharashtra

Table 1: Types of data

Type of data | Subtype | Examples
Quantitative | Continuous | Serum cholesterol; HbA1C; BMI
Quantitative | Discrete | Number of malaria cases; number of deaths due to myocardial infarction; number of adverse drug reactions
Qualitative | Nominal | Colour of iris (black, grey, blue, etc.); gender (male/female/other); blood groups (A/B/O/AB); religion (Hindu/Muslim/Christian/Buddhist, etc.)
Qualitative | Ordinal | Headache (mild, moderate, severe); status of blood pressure (hypotensive, normal, high normal, Grade I hypertension, Grade II hypertension)
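As a rough illustration of the four data types described above, the Python sketch below pairs each type with a summary that respects its scale. The observations, the names `HEADACHE_ORDER`, `ordinal_median` and `is_binary`, and the choice of summaries are assumptions made for this example and are not taken from the article.

```python
from statistics import mean, median_low, mode

# Hypothetical observations, one list per data type.
serum_cholesterol = [188.0, 188.1, 188.12, 201.5]  # quantitative, continuous
malaria_cases = [2, 3, 0, 5]                       # quantitative, discrete (whole counts)
blood_group = ["A", "O", "B", "O", "AB"]           # qualitative, nominal (no order)
headache = ["mild", "severe", "moderate", "mild", "moderate"]  # qualitative, ordinal

# The order of an ordinal scale has to be stated explicitly (assumed here).
HEADACHE_ORDER = ["mild", "moderate", "severe"]


def ordinal_median(values, order):
    """Return the middle category using rank positions only, not arithmetic on labels."""
    ranks = [order.index(v) for v in values]
    return order[median_low(ranks)]


def is_binary(values):
    """A variable is binary when exactly two distinct values occur."""
    return len(set(values)) == 2


mean_cholesterol = mean(serum_cholesterol)                   # arithmetic is meaningful
total_cases = sum(malaria_cases)                             # counts can be added
commonest_group = mode(blood_group)                          # nominal: only frequencies
typical_headache = ordinal_median(headache, HEADACHE_ORDER)  # ordinal: order-based summary

print(mean_cholesterol, total_cases, commonest_group, typical_headache)
```

The mean is computed only for the quantitative lists. The nominal list is summarised by its most frequent category, and the ordinal list by its middle category, found from rank positions rather than from arithmetic on the labels.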
Table 1 summarises the types of data with examples.

Conversion of Quantitative Data to Qualitative Data

It is possible to convert quantitative data to qualitative data to ease presentation and analysis. For example, the quantitative data of height (181 cm, 134 cm, etc.) can be converted to qualitative data (short, average, tall, very tall) by pre-defining cut-offs for the categories. Whether quantitative or qualitative data is to be collected has to be specified a priori in the study protocol. Collecting what is inherently quantitative data, like systolic BP values (120, 140, 190 mm Hg), in the form of qualitative data, like proportions of patients (20%, 40%, 60%) in normal, borderline hypertension and hypertension groups, tends to weaken the data. Thus, if you decide that hypertension will be defined as a systolic blood pressure above 140 mm Hg, patients with blood pressures of 150 mm Hg and 210 mm Hg will be treated similarly, although there is a substantial difference between the two values. The clinical outcomes will also differ between the two individuals. Therefore, although such conversion can be done, it should be resorted to only if necessary and applicable.

Why is it Important to Study Types of Data?

The ‘type’ of data (whether quantitative or qualitative) will decide which type of statistical test (parametric or non-parametric) will be used for analysis. Qualitative data is always analysed by non-parametric tests, while quantitative data can be analysed by parametric tests ONLY if it is normally distributed.

In conclusion, data, which is the basic unit of any research program and is analysed using statistical tests to reach conclusions, is broadly of two types: qualitative (also called categorical data) and quantitative (also called numerical data). Qualitative data can be nominal or ordinal, while quantitative data can be continuous or discrete. The type of data determines the statistical test applied and the mode of representation of the data.
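To make the earlier point about converting quantitative data to qualitative data concrete, here is a minimal Python sketch of the 140 mm Hg systolic cut-off mentioned above. The function name `classify_systolic`, the label "no hypertension" and the list of readings are hypothetical and chosen only for illustration.

```python
def classify_systolic(systolic_mm_hg, cutoff_mm_hg=140):
    """Return a qualitative label for a systolic blood pressure reading."""
    if systolic_mm_hg > cutoff_mm_hg:
        return "hypertension"
    return "no hypertension"  # hypothetical label for readings at or below the cut-off


readings_mm_hg = [120, 140, 150, 190, 210]  # hypothetical patients
labels = [classify_systolic(r) for r in readings_mm_hg]

# The qualitative summary: proportion of patients labelled hypertensive.
proportion_hypertensive = labels.count("hypertension") / len(labels)

# 150 and 210 mm Hg receive the same label, so the gap between them is lost.
same_label = classify_systolic(150) == classify_systolic(210)
difference_lost_mm_hg = 210 - 150

print(labels)
print(proportion_hypertensive, same_label, difference_lost_mm_hg)
```

Once only the labels are stored, the 60 mm Hg gap between the readings of 150 and 210 cannot be recovered. This is the weakening of the data described in the section on conversion.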
Although Benjamin Disraeli said, “There are three types of lies: lies, damn lies, and statistics”, we hope that this series will help readers sift the wheat from the chaff and remove the “lies” from statistical analyses.