BRIDGING THE BIOSTATISTICS-EPIDEMIOLOGY GAP: THE BANGLADESH TASK

by Pranab Kumar Sen
Department of Biostatistics, University of North Carolina at Chapel Hill, NC, USA

Institute of Statistics Mimeo Series No. 2131, May 1994

In spite of having a concordant scientific base, biostatistics and epidemiology may not always have harmony in their philosophical as well as operational stands. Part of this dissension is due to a less than full appreciation of each other's basic foundation and objectives, and there is ample room for better interaction and coordination. Incorporation of 'local' factors (and covariates) in epidemiological modelling and statistical analysis is a vital component in this respect. With reference to some of the vital epidemiological issues in Bangladesh, the role of biostatistics in their effective resolution is discussed here in a broader perspective.

1. INTRODUCTION. Among the major disciplines pertaining to the public health sciences, the triplet of biostatistics, environmental sciences and epidemiology constitutes the so-called quantitative (or measurement) sciences, while health administration and health policy, health behavior, health education, health promotion and disease prevention, laboratory practice, maternal and child health, nutrition, parasitology, public health nursing and other disciplines form the public health practice (or clinical) sector. Nevertheless, there is no rigid demarcation of the boundaries of these territories, and there is ample room for effective interactions not only between the branches within the same sector but also between the sectors themselves.
For example, without biostatistics, perhaps, nutrition and, to a certain extent, maternal and child health may lose their scientific foundations. Likewise, biostatistics cannot have its full development with respect to the integrated field of public health without the input from each of the other fields referred to earlier. As a matter of fact, the impact of biostatistics is by no means confined to the field of public health alone. There is hardly any area left in medical and health sciences where biostatistics has not made an appearance as an indispensable tool. More specifically, in medical investigations, multicenter clinical trials, biotechnology, immunology, medical diagnostics and image processing, dentistry, pharmacology, mutagenesis, chemometrics and a variety of areas of research and practice within the greater domain of health affairs, biostatistics is emerging as the binding force. It is an indispensable quantitative tool for the planning (or designing) of any objective investigation, a mentor for data collection and data management, and a passport for invoking appropriate statistical methodology pertaining to proper formulation of analysis schemes providing valid and efficient statistical conclusions from the experimental outcome. Such conclusions need to be relayed to the investigators and others who may not have a profound statistical background, and the interpretational role of biostatistics is equally important.

Footnote 1: Keywords and phrases: AIDS; biomathematics; biotechnology; cancer; cardiovascular disease; chemometry; clinical epidemiology; cholera; demography; drug research; ecology; environmental science; etiology; infectious disease; immunology; inhalation toxicology; kala-azar; malaria; occupational health; pharmacoepidemiology; pollution; reproductive epidemiology; statistical modelling; stochastic processes; toxicology.
In a broad sense, biostatistics is a hybrid of biometry and statistics, inheriting the basic emphasis on biological applications from the former and the affection for sound methodology from the latter. The advent of modern biomedical and health sciences has indeed stimulated the veins of biostatistics, and, without any reservation, it may be said that the emergence of biostatistics as a discipline marks the most significant development in statistical sciences, qualifying as a key technology as well as a refined art for decision making in every walk of life, science and society. This unique feature is bound to continue beyond the turn of this century. In this perspective, however, it should not be forgotten that long before the emergence of biostatistics, epidemiology paved the way for a genuine quantitative approach to the study of epidemics, infectious diseases, occupational hazards, and public health as a whole; the grim environmental impacts on our life have caught the attention of concerned people only in the recent past, and in these assessments, both epidemiology and biostatistics are indispensable. For centuries, different parts of this planet have periodically experienced devastating epidemics (such as cholera, dengue, jaundice, yellow fever, plague, pox and others). Infectious diseases (such as malaria, typhoid, hepatitis, tuberculosis and others) have showered catastrophic effects on human health and population dynamics. It is epidemiology that emerged as the basic quantitative approach to unfold the intricate relationship between quality of life (i.e., living styles and security) and susceptibility to such epidemics and communicable diseases. Sexually transmitted diseases and immunological epidemiology are both at the top of the research agenda at the current time.
Modern epidemiology owes a lot to those pioneers whose deep foresight and penetrating line of objective thinking opened the doors for this genuinely important branch of public health. Epidemiology without the active and matching collaboration of biostatistics is incapable of meeting the challenge of today in public health, and equally, without the epidemiological impacts, biostatistics by itself will be dehydrated of public spirit, and cannot resolve the vital issues in this area of serious human concern. Thus, they are complementary to each other, so that there is a genuine need to nurture a healthy partnership between these two vital wings of public health. This can be implemented effectively only by having a comprehensive view of the integrated biostatistics-epidemiology discipline, examining the foundations of each field, their strengths and weaknesses, and then attempting to bridge the gap, if any, between the two approaches, which share common objectives to a great extent. In other words, we need to find suitable avenues for their fruitful integration.

With this objective in mind, in Section 2, we proceed to examine the interface of biostatistics and epidemiology, with due emphasis on their individual foundations, domains of application and scope for further augmentation. Section 3 deals with their basic differences (in philosophy/concepts/operations) which, often, force a dissension between biostatistical perspectives and epidemiological objectives. Section 4 is devoted to the basic aspects of possible dissensions and the means of bridging the gap. In epidemiological studies, geographical, cultural, socio-economic, religious and other factors generally have profound impacts, and biostatistics has the right ingredients for formulating suitable models allowing for the role of such factors and for drawing meaningful conclusions and interpretations from such studies.
For example, AIDS (or HIV) is on the march almost everywhere in the world, and yet, there is distinct geographical variation in the incidence rate, and, in this respect, socio-economic, cultural and religious factors are, often, important contributors towards the propagation of such an epidemic. For this reason, in the concluding section, the most prevalent epidemiological issues in the greater Bangladesh region (and in adjacent parts of India, Burma and some other territories) are highlighted with a view to emphasizing the need for a Bangladesh task force to resolve these problems to some satisfactory extent. The main emphasis, in this quest, is, of course, on the need for developing more appropriate and adequate biostatistical concepts and tools in order that valid and efficient conclusions can be drawn. The basic drawbacks of the conventional statistical inference (and planning) procedures in dealing with such nonstandard problems are also presented side by side, so that the need for novel methodology may be better appreciated.

2. THE INTERFACE. Almost one hundred years ago, in the quest for quantitative models in heredity and anthropometry, the need for statistical methodology cropped up, and biometry emerged as a vital branch to deal with this specialization. Soon afterwards, the need for development of statistical methodology emerged in the field of agricultural sciences, and later on, in industrial sectors too. Public health sciences were themselves (mostly) in rudimentary forms, and statistics gradually found its way into this novel field as other areas started having considerable developments. No wonder that in most places, biostatistics and epidemiology were put in a common slot, although it was not uncommon to house biostatistics in the so-called department of preventive medicine (which is, often, in the school of medicine rather than public health). One of the interesting points in
this genesis of biostatistics is that not only did public health recognize the need for sound biostatistical theory and methodology for its various tributaries, but various branches of medical sciences also realized the need for implementation of biostatistics, not only for statistical analysis of their experimental outcomes but also for sound planning of their (medical) studies. Government health departments and health agencies, in their quest for improving the various vital statistics records, laid emphasis on the need for training of biostatisticians with supporting programs in population studies or demography. Central agencies, like the National Institutes of Health (NIH) and Food and Drug Administration (FDA) in USA, the British Council of Medical Research (in UK) and others elsewhere, realized a far greater need for statistical methodology for their regulatory purposes. Not only does control of the spread of various infectious diseases top the list of their objectives, there is also a genuine need to scrutinize the marketability of new drugs being pushed by various pharmaceutical research groups (all over the world). At the same time, the pharmaceutical moguls started recruiting biostatisticians to conduct the needed statistical analysis of their studies with a view to getting approval from appropriate agencies for the marketability of their products. In all these ventures, biostatisticians appear as indispensable personnel.

Awareness of environmental impacts has become a "household word" in the recent past. Smoking habits are going through a lot of basic changes, and environmental epidemiology has emerged as a vital area of public health sciences. Biostatistics is an essential component in this sector too. While some of these features of biostatistics have been discussed in detail in Sen (1993; see footnote 2), we would like to emphasize here mainly the interactive features of the triplet: biostatistics, epidemiology and environmental sciences.
They are not the same; they differ in their approaches, coverage and basic philosophy. Nevertheless, they all aim at a common goal: How to improve the quality of our lives by controlling our environment (before it becomes unmanageable)?

Footnote 2: Sen (1991) [Statistical perspectives in clinical and health sciences: The broadway to modern applied statistics. J. Appl. Statist. Sci. 1:1-50].

Environment is not simply what we get from the sun (and other planets), air, water and other resources, but more what we are contributing towards making our own lives unsafe! The thinning of the ozone layer is a concern for the entire planet, not least for the industrialized nations! Inhalation toxicology is a hot topic of study. Exhaust from industrial plants, gasoline and diesel combustion by automobiles (and airplanes too), continuing use of natural resources for energy producing plants, and the use of various chemical agents have raised the level of pollution (and radiation too) to an unsafe grade, almost all over the world. Chemical dumping and nuclear wastes are causing serious water and subsoil contamination. Can we drink the natural water any more? Can we breathe the air comfortably? Where are we heading? Epidemiology has taken up this challenge (in cooperation with biostatistics and environmental sciences) to put a halt to such disasters. Let me present a very brief outline of the basic sectors in epidemiology with a view to providing more information on the current developments.

Epidemiology:
   Epidemics and infectious diseases;
   Ecological impacts;
   Clinical approaches;
   Demographical shades;
   Toxicology.

The early developments related mostly to epidemics and infectious diseases, and demographical approaches provided the usual tools. In the course of this traditional progress, a striking feature emerged. This led to the formal separation of the two philosophical slants: ecology and toxicology.
In ecology, not much emphasis is usually placed on the etiological issues, but more on the means of studying the nature of the development with a view to preserving our environmental treasures. In toxicology, on the contrary, emphasis is primarily on cause and effect type studies, and the environmental engineering impacts are more predominant in this respect. Nevertheless, for both the ecology and toxicology approaches, biostatistics is indispensable!

To illustrate the coverage of modern epidemiology, let me mention a few of the most important areas:

(i) Infectious diseases epidemiology
(ii) Epidemiology of immunizations
(iii) Biochemical epidemiology
(iv) Epidemiology of sexually transmitted diseases
(v) Reproductive epidemiology
(vi) Cardiovascular diseases epidemiology
(vii) Cancer epidemiology
(viii) Pharmacoepidemiology
(ix) Occupational health epidemiology
(x) Environmental epidemiology
(xi) Clinical trials in epidemiology

There are many other tributaries. The important feature common to all these branches is that epidemiology deals primarily with human life. Bearing in mind the complexities of modern life, we may need to appraise the scope of epidemiology from a much broader perspective, and in this assessment biostatistics is the main tool.

To relate the two wings of public health in a harmonious blending, it is also necessary to examine the current state of the art of biostatistics. Although biostatistics literally means statistics relating to bio, or life sciences, its growth has not been restricted only to biological sciences. Of the variety of areas enriched with biostatistics (logic and modelling), we may refer to the following: Biomathematics/biomedical engineering; Statistical methodology; Sampling methodology; Biometry; Demography; Genetics/mutagenesis; Environmental sciences; Epidemiology; Clinical trials;
Stochastic modelling; Neural networks (neurophysiology); Medical diagnostics; Tomography (image processing); Health sciences (and management); Medical studies; Data monitoring/management; Biotechnology; Chemometry; Drug research and information.

There is virtually no limit to the scope of applicability of modern biostatistics theory and methodology. The question may therefore arise naturally: Why does biostatistics have to be so attentive to epidemiology? The answer to this query is simple: In order to foster the healthy growth of research on quantitative (or measurement sciences) analysis in public health, biostatistics has the natural obligation to side with both epidemiology and environmental sciences. Only then can the scientific as well as practical implications of public health problems be fully assessed and the problems resolved effectively. For example, in an environmental or epidemiological problem (say, inhalation toxicology), it may be quite tempting to incorporate biomathematics or biomedical engineering methodology and formulate a suitable quantitative model to depict the nature of the development. However, to make it applicable in a particular context, the socio-economic, demographic and other concomitant information may be vital for an effective resolution. Moreover, keeping in mind the stochastic elements embedded in such a formulation, there is a genuine need to introduce stochastic components along with the deterministic ones. Then, one may need to set suitable margins of fluctuation for the stochastics, so that the deterministic factors can be quantitatively assessed, and the objectives of the study may therefore be met adequately. Epidemiological concepts, demographic measures and environmental interpretations are by no means adequate to address this problem fully. Likewise, (mathematical) statistics, by itself, is grossly inappropriate to handle such problems. It may be essential that the statistician be fully conversant with the basic epidemiological/environmental aspects of the problem before planning such a study, and subsequent statistical analysis schemes may have to be innovated in the light of the specificities of such problems. The interface of public health measurement sciences is therefore of fundamental importance for any objective scientific investigation and effective resolution of public health problems.

Mathematical statistics is highly valuable in this respect. However, the common assumptions which are generally made in statistical modelling and inference may not be tenable in the public health context. For example, equal probability sampling (with or without replacement) and independence of sample observations may not generally hold in public health studies. The nature of sampling may depend very much on the particular problems under study (viz., air pollution, water contamination, follow-up studies, retrospective studies, case-control studies, etc.), and typically, in such a case, standard statistical tools may not be usable. Biostatistics has been designated as the branch of statistics to deal with such more general, more practical models, to develop novel statistical modelling and analysis tools, and to provide meaningful interpretations of the study outcomes not only to statisticians but also to public health scientists as well as the general public for the betterment of their lives.

3. THE DISSENSION. (Bio-)statistics is more of an art (than a science) for providing meaningful interpretations and drawing valid and objective conclusions from experimental or observational studies, and is guided by the basic principles of statistical methodology in carrying out this delicate task with a controllable margin of error.
It may hardly be characterized as a bona fide member of the club of mathematical, biological, physical or engineering sciences, and there should not be any attempt to coin the term statistical sciences to classify (bio-)statistics as a science discipline. On the contrary, compared to the social sciences (including economics/management and political sciences), statistics has a far more visible scientific base, although, in principle, it differs from the basic philosophy of mathematical sciences. Nevertheless, biostatistics combines the objectivity of a scientific study, allowing the flexibility of an uncontrolled experimental setup, and retains its adaptability as a handy tool for scientific assessment in a wide range of applications. Accommodation of this broader domain of aims and objectives, within the spectrum of biological, medical and public health sciences, naturally calls for a close examination of the basic (nonstatistical) problems from scientific as well as socio-economic perspectives. This is generally accomplished by the rule of three in biostatistics: (i) statistical planning (or modelling, design) of the experimental scheme, (ii) data collection and monitoring, and (iii) drawing of statistical conclusions from the experimental outcome. This combined approach also enables biostatistics to provide meaningful interpretations to the experimenters as to the general outcomes of the study. In this delicate task, statistical planning, modelling and inference procedures are to be blended harmoniously in a flexible and yet efficient enough manner. Therefore, there remains room for more thoughtful provocations of statistical theory, methodology and basic principles in the general development of public health sciences.

As has already been pointed out in the previous section, in a majority of public health investigations, epidemiology and/or environmental sciences occupy the focal stand, and thereby dominate the scenario.
Since the basic problems generally crop up in this area, it is quite natural to appeal to their general principles in seeking interpretations, motivations and justifications of the general features underlying such experimentation. Frankly, without this patronage, there would be hardly any ground for a statistical planning/analysis of public health studies. The main point is therefore proper coordination and mutual understanding between biostatisticians and other public health researchers, so that a genuine public health problem can be statistically well formulated, and then appropriate statistical tools can be incorporated for proper planning of the study, safe and efficient data collection, and useful statistical analysis. This objective is not beyond reach!

Environmental science has a strong engineering component where chemistry and physical sciences play a basic role. As such, it is not unexpected to observe an engineering overtone in the formulation of environmental problems. Ecology is a dissident in this conventional setup, and it also serves as the liaison between environmental and epidemiological approaches; nevertheless, (bio-)statistics is a very vital component of ecology too. Epidemiology deals more directly with human populations. It started with the direct etiological issues of epidemics and infectious diseases and then with the assessment of their aftermath. In this setup, often, ecological considerations dominate over etiological issues, and the prevalent stochastic elements in the investigations make biostatistics indispensable in epidemiology. Toxicological approaches in epidemiological studies are more etiology-oriented, while demographic approaches, often, lay less emphasis on etiology and more on descriptive demographic factors. Clinical epidemiology is a relatively new branch where a marriage of (bio-)statistical principles with epidemiological objectives is usually aimed at in a (reasonably) controlled experimental setup. Randomized clinical trials have been incorporated, during the past twenty-five years, in many vital epidemiological investigations. The wealth of information acquired in this manner has greatly advanced the state of our knowledge regarding many of these medical, environmental and epidemiological factors.

From public health perspectives, it is quite clear that there is a genuine need to nurture healthy growth of each of its vital components, and yet, there has to be a coordinated effort in fostering interdisciplinary developments so as to place the composite field in an effectively adaptable standing. The dissension has its roots in the intricate philosophical as well as manual channels of the individual disciplines. It may not be improper to say that, often, the epidemiologists are themselves divided by their basic ideological (or philosophical) slants: Whither etiological, clinical or ecological foundations! Environmental, cardiovascular and reproductive epidemiology, toxicology, mutagenesis and some other areas are, by formulation, more etiologically oriented, while in cancer epidemiology, AIDS and even in respiratory disease epidemiology, there is a considerable overtone of ecological conceptions. Clinical epidemiology comes in between the two extremes, and it attempts to provide meaningful interpretations through a clinical approach which preserves the basic objectives of an etiological study, and yet accommodates ecological foundations to a greater extent. It is quite clear that the role of biostatistics (in modelling, monitoring data and statistical analysis) cannot be presented in a single framework for these wings of epidemiological approaches. It is therefore imperative for biostatisticians to identify first the basic epidemiological foundation before even planning a study and analyzing the observational data.
It is equally imperative for the epidemiologists to convey clearly the etiological, clinical, ecological and demographic basis of a study, so that a proper communication channel can be established with the biostatisticians and a healthy resolution of their common problem can be made. In practice, this may not, often, be the case. Both the disciplines (i.e., biostatistics and epidemiology) aim to acquire as much information as possible from observational studies (viz., case-control studies, field trials, follow-up studies, retrospective studies, clinical trials, demographic surveys etc.), and yet, because of their basic philosophies, they differ considerably (and, often, incomprehensibly) in their operational manuals for drawing conclusions from acquired data banks. This is, by far, the most notable cause of their dissension. Let us illustrate this basic phenomenon with a concrete clinical epidemiological model; a similar problem crops up in almost all other areas in epidemiology and public health in general.

Suppose that a clinical trial is to be planned for studying the effect of a high-fat diet on heart diseases. In a quantitative approach, the first and foremost job is to define the "risk" of heart attacks in a clear-cut manner. Angina pectoris, strokes (in the ascending aorta), arteriosclerosis, Parkinson's disease, and heart attacks are all members of the general class of cardiovascular diseases. The diet may have differential effects on different members in this class, and hence, it needs to be clearly formulated: What is the primary end point of the trial? Are there any secondary end points? Is it then a multiple end-point study? What are the etiological considerations underlying such a study? Has there been any pilot study to collect some preliminary information relating to such formulations? As is often the case with Phase I trials, there might have been some preliminary studies with subhuman primates: How far can the response characteristics of such animals be projected to human subjects? Or, is accelerated life testing valid in the current setup? Above all, the clinical trial is planned for human subjects, and therefore there are a number of basic queries:

(i) What sector of the population is the focal point of such a study?
(ii) How are they to be recruited?
(iii) Retrospective or follow-up study?
(iv) How to define high-fat diet properly?
(v) Racial/socio-economic/ethnic considerations?
(vi) Besides high-fat diet, are there other important factors which are very relevant (viz., smoking, drinking, lack of physical exercise, tension at working place/home, family history etc.)?
(vii) Whether to choose some of these factors as treatment variates or as concomitant ones?
(viii) How much "control" may the trial have on the follow-up scheme with such "outpatients"?
(ix) What to do with "noncompliance" due to dropouts/withdrawals/failure due to other causes?
(x) How to cope with the medical ethics?

There may be one thousand and one other considerations! There is a lot of arbitrariness in the formulation of primary endpoint(s) in clinical trials. Medical and epidemiological considerations, often, do not match very harmoniously, and, as a result, statistical formulations are, often, not so precise. In some other areas in statistical sciences (viz., agricultural experiments/animal studies), statistical hypotheses are generally quite precise, and because such experiments can be conducted with a greater amount of control, statistical formulations are usually quite simple. In epidemiological/environmental studies, this is not generally the case.
Because of possible multiple end points and a large number of concomitant variates, the stochastic aspects, often, dominate the deterministic ones, and hence, extra care is needed to identify a suitable model, to plan the study in such a way that information pertaining to such a model can be properly extracted from the experimental outcome, and to formulate valid and efficient statistical tools for drawing conclusions from the acquired data set. With respect to each of these three basic considerations, epidemiological approaches are, often, at cross-roads with biostatistical ones. From an epidemiological point of view, it may be quite natural to assume that the greater the number of factors and response variables included in the study, the greater would be the amount of information, and hence, the better would be the quality of conclusions to be drawn from the experimental outcome! If there were no stochastic elements in this formulation, such an expectation would have been very reasonable. On the contrary, in a stochastic setup, the larger the number of factors/response variables/concomitant variates, the greater will be the amount of variability from the true pattern, and hence, the greater will be the risk of drawing incorrect/imprecise conclusions from the acquired data set. There is therefore a genuine need for reconciliation of the deterministic vs. stochastic undercurrents, and the integrated biostatistics-epidemiological study has to be founded on this mutual standing. Any development in isolation is inappropriate and inadequate for an integrated study.

Let me illustrate the last point with the following salient points. Suppose that in the planned clinical trial, it is aimed to compare the cardiovascular problems of a control group (of low-fat diet people) and a treatment group (of high-fat diet people).
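If such a two-group comparison were carried out separately for several cardiovascular end points, the risk noted above could be given a rough numerical form. The following is only a minimal sketch under assumptions that are not part of the text: there are k end points, they are tested independently at a common level alpha, and the diet has no true effect on any of them. The function names are hypothetical, chosen for illustration.

```python
import random


def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent
    tests, each carried out at level alpha (assumed setup)."""
    return 1.0 - (1.0 - alpha) ** k


def simulated_familywise_error(alpha: float, k: int,
                               n_trials: int = 20000,
                               seed: int = 0) -> float:
    """Monte Carlo check of the formula above.

    Under the null hypothesis each p-value is Uniform(0, 1), so a
    single test rejects falsely with probability alpha.
    """
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_trials):
        # One simulated trial with k end points and no true effect.
        if any(rng.random() < alpha for _ in range(k)):
            false_alarms += 1
    return false_alarms / n_trials


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, round(familywise_error(0.05, k), 3),
              round(simulated_familywise_error(0.05, k), 3))
```

Under these assumptions the chance of at least one spurious finding grows steadily with the number of end points, which is the stochastic side of the "more variables, more information" expectation. Correlated end points, as would be typical in practice, would change the numbers but not the direction of the effect.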
If the control group really has a lower risk, then medical ethics would prompt us to curtail the study as early as possible (with due evidence of this significance) and to switch all the subjects to the low-fat diet group so that they have a greater chance of survival. It is not, therefore, uncommon to have an interim analysis scheme where the accumulating data set is periodically examined with this possible early termination in mind: Does it matter how many times you look at the data? Indeed, biostatisticians and epidemiologists, often, do not agree on the format of such interim analysis! Fortunately, the academics as well as the agencies are aware of this basic methodological issue, and the past two decades have witnessed a phenomenal growth of research literature on interim analysis. Randomized clinical trials have been proposed and successfully conducted to impose a greater amount of control through the so-called "double blind" studies, and yet, from a practical point of view, there remains the concern: How to make sure that randomization works? Statistical analyses are usually made to draw valid and efficient statistical conclusions based on the trial outcome. Interim analysis has led to time-sequential, progressive censoring and/or repeated (as well as group sequential) significance testing schemes. Although these are being advocated more and more, the intricate stochastic base of the trial data set is often misunderstood by the epidemiologists and often by biostatisticians as well. This is unfortunate. Routine statistical packages (SPSS/BMDP/SAS/S-plus, etc.) are often mechanically adapted for such statistical analysis, and the Cox (1972; see footnote 3) proportional hazard model (PHM) has become a household word to epidemiologists and biostatisticians as well! There is a genuine need to examine the appropriateness of any specific model and/or program in a specific case, and this calls for a lot of coordination between epidemiologists and biostatisticians.

4. THE BRIDGE.
Biostatistics, having its genesis in biomedical, clinical, health and statistical sciences, has been catering to their needs very well. Likewise, epidemiology initiated a quantitative approach to a general class of problems in health sciences and is very much aligned to the modern public health sciences. Both the disciplines deal, in an objective manner, with quantitative (or measurement) aspects of health problems, and they share a common ground in their foundations too. This concordance of their basic concepts and objectives is indeed the bridge between the two seemingly less related wings of public health. There is, however, a basic need to fortify this link in such a way as to allow free passage of ideas, concepts and operational manuals from one camp to the other.

³ Cox (1972). Regression models and life tables (with discussion). J. Roy. Statist. Soc. B 34: 187-220.

The mathematical (or theoretical) counterpart of biostatistics (which is more popularly known as the theoretical or mathematical statistics) has an inherent tendency to rely heavily on probability theory, measure theory, real and functional analysis as well as other areas of (pure and applied) mathematics, and may, at times, be quite obscured in refined abstractions. This generally limits the access of biostatistics to theoretical statistics, although this is certainly quite healthy for development of theory and methodology which can be incorporated in biostatistics as well. For the latter task, it may be desired to have strong interactive research in biostatistics (theory and methodology) wherein the mathematical sophistications can be decoded to a greater extent for making room for fruitful applications.
Nevertheless, in a majority of practical problems arising in biomedical, clinical and health studies, direct adoptions from theoretical statistics may stumble into road blocks: The basic setups may differ considerably, leading to possibly different sets of regularity assumptions, so that an adoption, without checking the validity, may, often, be disastrous in biostatistical applications. Fortunately, at the present time, through the intervention of competent statisticians having sound theoretical background and keen interest in biostatistical applications, there has been a steady growth of research work on the interface of theoretical statistics and biostatistics. Epidemiology comes from the other corner, and putting theoretical statistics on the same table with epidemiology may not always be wise. Biomathematics, often, serves as a liaison between mathematics and biological sciences, and there are some subtle differences between biomathematical and biostatistical approaches. Let us examine these basic points so as to prepare the way for a proper bridging of the biostatistics-epidemiology gap.

(i) Whither biomathematics? In a sense biostatistics combines the mathematical objectivity of biomathematics and the statistical concepts of theoretical statistics, and hence, serves as a better bartender in health sciences where stochastic elements play no less significant a role than the deterministic factors. Biomathematics places somewhat more emphasis on the modelling part by incorporating all the deterministic factors, and these models work out well when the stochastic components are not so dominant. It is indeed possible to include stochastics in such a setup, but the resulting picture may depend very much on the distributional properties of such stochastic elements. Often, stochastic differential equations are imported to describe biological systems.
But their adoptability in a given context may depend very much on the regularity assumptions on the stochastic components. In most of the health sciences problems, these regularity conditions are generally more complex so that such stochastic differential equations may not lead to simple solutions. Theoretical statistics may have some problems too, and these will be discussed later on. From this perspective, biostatistics offers a better choice.

(ii) Beyond the i.i.d. sampling. Conventionally, in (mathematical) statistics, it is assumed that the sample (on which statistical theory has to be developed) consists of independent and identically distributed (i.i.d.) random elements. In sample survey methodology this relates to the so called simple random sampling with replacement (SRSWR). In practice, often, the population size (N) is finite and sampling is made without replacement (WOR). The first step towards this is the equal probability SRSWOR. In socio-economic, agricultural and demographic surveys, often, a stratification of the population is incorporated along with SRSW(O)R to reduce further the sampling error. Objective sampling methodology has gone through an evolutionary growth in the past 4 decades. In many epidemiological investigations, (stratified or not) SRSW(O)R are not the appropriate ones, and, usually, more complex sampling schemes are adopted to suit the practicality. For example, in case-control studies, field trials, retrospective or follow-up studies, matching and other covariate adjustments often eliminate the possibility of using i.i.d. sampling or some minor modifications of the same. Length-biased sampling or weighted sampling is becoming quite popular in environmental studies. In theoretical statistics, there have been, too, some developments to deal with more complex situations than in i.i.d. sampling, although the main emphasis is on long-range dependence which typically arises in time-series models.
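The simplest departure from i.i.d. sampling mentioned above, namely the passage from SRSWR to SRSWOR in a finite population, can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the population (the integers 1 to 50), the sample size and the function names are hypothetical choices made here, not part of the original discussion. It uses the standard results that the variance of the sample mean is σ²/n under SRSWR and (σ²/n)(N − n)/(N − 1) under SRSWOR, where σ² is the population variance with divisor N, and checks them by simulation.

```python
import random
import statistics


def theoretical_variances(population, n):
    """Variance of the sample mean under SRSWR and SRSWOR (finite population)."""
    N = len(population)
    sigma2 = statistics.pvariance(population)  # population variance, divisor N
    var_wr = sigma2 / n                        # with replacement: i.i.d. draws
    var_wor = var_wr * (N - n) / (N - 1)       # without replacement: finite population correction
    return var_wr, var_wor


def simulated_variance(population, n, with_replacement, reps=20000, seed=1):
    """Monte Carlo estimate of the variance of the sample mean under the chosen scheme."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        if with_replacement:
            sample = rng.choices(population, k=n)  # SRSWR
        else:
            sample = rng.sample(population, n)     # SRSWOR
        means.append(sum(sample) / n)
    return statistics.pvariance(means)


if __name__ == "__main__":
    # Hypothetical finite population and sample size, chosen only for illustration.
    population = list(range(1, 51))
    n = 10
    var_wr, var_wor = theoretical_variances(population, n)
    print("theory     SRSWR:", round(var_wr, 3), " SRSWOR:", round(var_wor, 3))
    print("simulation SRSWR:", round(simulated_variance(population, n, True), 3),
          " SRSWOR:", round(simulated_variance(population, n, False), 3))
```

Even in this elementary case an analysis that treats SRSWOR data as i.i.d. overstates the sampling variance; for the matched, length-biased or otherwise complex schemes discussed in the text, no such simple correction factor is available, which is the point being made.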
Spatial statistics is another area of theoretical interest, but it has good scope for applications in a variety of models where the so called i.i.d. structure may not be appropriate. Naturally, there is a genuine need to look into the diverse type of sampling schemes that are appropriate for epidemiological (as well as environmental) studies, and to develop more statistical methodology so as to make them usable in the field of public health.

(iii) Beyond the Parametrics. There is an abundance of rates, ratios and proportions in epidemiological measures and interpretations. Thus, it may be quite intuitive to incorporate some simple parametric models (resting mostly on binomial, hypergeometric, negative binomial, Poisson, exponential or normal distributions) in statistical analysis of such studies. However, most of these parametric models are tied down to SRSW(O)R, while (as explained in (ii)) in epidemiological studies, such simple sampling schemes may not be that relevant, and hence, more complex parametric models may crop up in such studies. In this vein, the binomial (hypergeometric) law has been extended to the beta-binomial and the Poisson to the negative binomial laws. Even then, the situation is not totally satisfactory. The bottom line is that in most of the epidemiological models, because of the complexities, even if one adopts a parametric approach, the number of parameters may become large, and in terms of robustness, the statistical procedures become more vulnerable. [Refer to a general birth and death process allowing nonstationarity in the rates and also immigration in a general form.] It is therefore desirable to consider more complex statistical models which are built on the epidemiological axioms, and, with due emphasis on validity and robustness considerations, to formulate efficient and yet practically adoptable statistical procedures. Nonparametric methods generally fare well in this respect.

(iv) Whither Linear Models.
Linearity of the regression function, homoscedasticity and normality of the error component(s) form the basis of linear statistical inference. In many biological, epidemiological and medical studies, the error distribution may be highly skewed, and hence, often, a transformation (viz., Box-Cox or others) is made on the response variable to induce "more normality" into the system! Such transformations may not only affect the error distribution but also the underlying linear model, if any. As a result, such linear statistical inference has been characterized as highly nonrobust in many practical applications, and epidemiology is no exception. In many epidemiological studies, the response variable may be binary (or ordered categorical), and hence, the usual linear model may not be very appropriate. There has been a steady growth of research literature on such models, and logistic regression, and more generally, generalized linear models have evolved to eliminate some of the basic inapplicability aspects of the classical linear models. Nevertheless, because of the basic differences in the sampling schemes, one needs to pay close attention to the scope of such generalized linear models in complex sampling schemes (other than SRS).

(v) Misclassification in Data Synthesis. In epidemiological studies, often, the response or independent variables are misclassified (due to latent effects or other reasons). Such a misclassification can have serious effects on the validity of statistical conclusions to be made from acquired data sets, and in a majority of cases, severe bias crops up due to such misclassifications. In epidemiology, the terms sensitivity and specificity refer to the effects of misclassification, and there remains a lot to be accomplished to make them usable in more complex sampling schemes.

(vi) Measurement (or observational) errors.
This phenomenon is related to misclassification, although this is somewhat more specialized to error in measurement rather than classification of states etc. The scope of measurement errors is not only on the primary (response) variate (whether it is continuous, discrete or categorical) but also on other auxiliary or concomitant variates. Fortunately, the past few years have witnessed a steady growth of statistical research literature on this vital topic, albeit mostly relating to SRS and some other simple models. There is a genuine need to include more complex epidemiological models in such studies.

(vii) Interim Analysis. Medical, epidemiological and many health studies are often based on a follow-up scheme which pertains to accumulating data sets over a period of time. There is a general consensus among epidemiologists, biostatisticians and medical researchers that (statistical) monitoring of such a follow-up study can not only lead to an early termination of the study, saving time and cost (as well as human lives too), but also provide greater control on ethical constraints as well as other (viz., side-) effects. On the contrary, in order to do this statistical monitoring in a valid and yet efficient manner, there are certain basic factors which are to be attended to properly. Sometimes, these create the basic source of dissension between statisticians and others! Epidemiologists may argue: Does it matter how many times you look into the accumulating data set? An uncomfortable biostatistician may try to nod in protest: How are you going to control the level of significance of the test or coverage probability of the estimates you want to base on the accumulating data set? Fortunately, the situation is far clearer now than twenty years ago. Interim analysis has been accepted as a valid statistical tool for studying statistical properties (and making decisions too) for accumulating data sets. Repeated significance testing, group sequential tests, time-sequential procedures and progressive censoring schemes have been systematically developed to handle this basic problem in a class of situations. Nevertheless, there is a genuine need to incorporate more complex epidemiological models in this statistical methodology.

(viii) Clinical Epidemiology: Whose Child is it After All? It is generally claimed that it is a hybrid of medical and epidemiological concepts and practices! Nevertheless, the most significant component in this venture is biostatistics. The conventional methods of collecting information for epidemiological research may not always work out that well when dealing with chronic diseases or with other epidemiological studies without having a strong etiology, so that a clinical trial with an adequate number of subjects may be planned to gather the pertinent information. Usually, subjects are enrolled into the project from a pool of volunteers, and it is imperative that they agree to go through the clinical trial protocol and satisfy the basic need of randomization effectively. The very planning of such a study demands considerable statistical expertise. The scope of such a study is limited to the population for which the volunteered subjects form a (random) sample! Issues of noncompliance, elimination of bias, effectiveness of randomization and validity of standard analysis tools are the most pertinent ones. They call for a sound and thorough interaction between epidemiological objectives and statistical principles. There is light at the other end of the tunnel, and we hope for the best to emerge in the near future.

(ix) Control of Extraneous Factors. In many epidemiological studies, control of extraneous factors (especially from etiological perspectives) is essential for elimination of relatively less important and unrelated causes or factors, so that more precise statistical conclusions can be drawn. Since human subjects are typically involved in such studies, it may not be possible to have a completely controlled experimental setup (as in agricultural/laboratory experiments). For this reason, matching, analysis of covariance and other means are adopted to enhance compatibility. Therefore, the real challenge for biostatisticians is to fathom out the intricacies of the sampling design [see (ii)], and, in view of such complexities, to develop proper statistical methodology for possible incorporation in a variety of epidemiological investigations. Better not to pass on the blame to epidemiologists for not using a sophisticated statistical package, but to put more emphasis on interactive research which would permit easy access to such complex statistical models.

(x) Beyond Epidemic Theory. Although epidemiology has its roots in the quantitative assessment of epidemics-etiology, its current branching includes a far wider spectrum of objectives; ecology and toxicology are both integrated components of the same. The (bio-)mathematical models for the common epidemics (known as the epidemic theory), in spite of their elegance, may not be very pertinent for modern epidemiology. There is a genuine need for more complex, more flexible, stochastic modelling in modern epidemiology. This approach should be capable of incorporating the etiological factors along with the ecological aspects in a way to have a natural evolution. On both counts, regional, cultural, religious and other factors are important ones, and hence, in the concluding section, I will touch on some of these issues with special reference to Bangladesh.

5. THE BANGLADESH TASK.
It is almost impossible to track down the entire set of epidemiological issues in a densely populated country like Bangladesh. I shall only consider a few important issues, and discuss the related biostatistical problems.
With a geographical area not larger than the smaller states (in USA), Bangladesh has a population more than half that of the USA. Many of the epidemiological issues revolve around this enormous population, its relatively low profile in income, health and education, and its low standing with respect to industrialization. Natural calamities (e.g., tidal floods, tornadoes, torrential rains) invade this country very regularly, and recurring famines are not unexpected. The rate of population growth is one of the highest ones and the per capita income is in the lowest category. Yet, Bangladesh is making remarkable progress in various directions, and in this venture, biostatistics plays a vital role. Let me mention the following:

(i) Reproductive Epidemiology
(ii) Child and Maternal Health Epidemiology
(iii) AIDS and Venereal Diseases
(iv) Depression and Mental Illness
(v) Cholera, Diarrheal Diseases, Kalazar
(vi) Malnutrition
(vii) Pollution and water contamination
(viii) Smoking, Cancer and tuberculosis
(ix) Cardiovascular Disease Epidemiology
(x) Chronic Disease Epidemiology

The sheer weight of population has made demography the custodian of these studies on Bangladesh, and it is no wonder that among the Bangladesh statisticians, there are more demographers than in any other branch! The wealth of demographic background and insights may very well be utilized in all of the areas of epidemiology referred to above. I understand that the International Center for Diarrheal Disease Research (ICDDR), located in Dhaka (Bangladesh), is also a prominent center for the study of cholera and other intestinal diseases, although primarily from epidemiological and demographical points of view. Like the demography group, the ICDDR has a journal (of Diarrheal Disease Research) of their own, which collects and disseminates the pertinent epidemiological findings. There are occasional biostatistical sparks, although more in line with demography.
There is a department of Nutrition (in the PG Hospital at Dhaka) and the National Institute of Preventive Medicine (also in Dhaka) which have some biostatistics expertise too. On the other hand, the main thrust of statistical activities (research as well as applications in various fields) rests with the department of Statistics, University of Dhaka. This is primarily an academic institution with the provision of attracting bright students and faculty members from the academia. Also, it has an established statistics training program, and this journal (of Statistical Research) is a convenient outlet of their creative research. Therefore, this journal, amidst its silver jubilee celebrations, should be charged to enlarge its area of jurisdiction. There is no need to transfer the framework to epidemiology or demography, but it is imperative to include all the vital components of modern statistics in "Statistical Research". This research should bridge the gap not only between biostatistics and epidemiology (with especial emphasis on the Bangladesh issues) but also between operational biostatistics and methodological issues arising thereof. The epidemiological issues referred to above are some of clinical nature, some etiological, while others are more ecologically oriented. Only biostatistics can bring all of these apparently diverse approaches into a common stream, and for this unification, the main task is the development of appropriate methodology. This development, in turn, depends heavily on active collaboration of researchers from these diverse fields with competent statisticians who are interested in extending standard methodology to such nonstandard situations. Unrestricted use of some standard packages (such as logistic regression/proportional hazards model etc.) without checking their validity and appropriateness in such nonstandard situations is dangerous.
It is the (bio-)statisticians' responsibility to develop tools for model specification, cross-validation and statistical analysis, and only then the bridging of the biostatistics-epidemiology gap will be complete.

The legendary physician, Dr. Bidhan Chandra Roy, made the remark (in the context of control of Kalazar) that for every disease there ought to be a local clue (solution) for the remedy [Presidential address at the Indian Science Congress Association Meeting, Calcutta, 1957]. It may not be an exaggeration to extend his legendary remark further in saying that for every epidemiological problem there ought to be certain local/regional factors which provide key information (on etiology as well as ecology) which should form the base of plausible biostatistical resolutions (regarding planning of the study, data collection and monitoring, model selection and drawing statistical conclusions). Therefore, we may propose biostatistics as the key technology in extracting local/regional information to the maximum extent possible and in incorporating the same in a valid and efficient analysis of epidemiological investigations. For the assessment of cholera epidemiology, for example, the water contamination problem, the infectious nature of the disease and local socio-economic patterns are all vital components, and their study depends a lot on regional as well as social conditions. As such, any model one wants to consider must take into account such factors on a local/regional basis. Therefore, a specific model pertaining to the Indus-delta (Karachi area) in Pakistan, or the Hooghly-delta (India) region may not have the same etiology or ecological factors as in the Padma-delta in Bangladesh. In smoking/respiratory disease epidemiology, similar regional factors are very pertinent. (Cigar, cigarettes and bidi may not have the same impact, and how about chewing tobacco?)
In cardiovascular epidemiology, the diet pattern, physical exercise, smoking/not and many other factors have distinct regional differentials, and hence, the model for Pakistan or even India may not suit Bangladesh very well. Chronic diseases epidemiology may contain a significant genetic effect, and, again, regional/cultural/socio-economic impacts are overwhelming! AIDS/venereal diseases and other sexually transmitted ones have distinct regional effects: The models for Kenya may not be appropriate for Ethiopia! Industrialized nations may have different contours than developing ones. Malnutrition is a somewhat imprecisely defined measure. It has been observed (through the pioneering efforts of the FAO (UN) under the leadership of Professor P. V. Sukhatme) that the amount of calorie intake with our food not only depends on the physical characteristics (such as age, height, weight etc.) but also on the climatic factors. For example, in northern Europe (or Canada), a daily average of 3,000 calories is ideal (especially in wintertime), whereas in the Indian subcontinent (especially in the coastal regions) a thousand calories would suffice. The intake of protein is similarly a matter of debate: Different religious and cultural sectors have different patterns, not to speak of socio-economic factors within each sector! As such, a first and foremost task in the study of malnutrition is to define the norm taking into account all such pertinent factors. Again, the resolution has to be highly dependent on the regional/cultural/religious factors. Reproductive epidemiology is another area where social customs, economic conditions and other regional factors have profound impacts. Can we expect that in Bangladesh the model of India/Pakistan or any other nation will be totally appropriate? It is a complex study involving demographic undercurrents, government policies, birth control and contraceptive measures and religious/cultural factors.
In the Indian subcontinent with evident emphasis on male births, the number of children in a family may also depend (at least, stochastically) on the outcome of a boy or girl! Maternal health (and hence, child health too) is not in commendable shape in Bangladesh (India too). What are the most important measures to improve the situation? Can biostatistics be kept at bay in this study too?

To sum up, I would strongly urge the Bangladesh statisticians to take up this challenge of augmenting biostatistics in their "statistical research", and to pay special attention to developing appropriate statistical models, sound statistical methodology and efficient (and yet simple) statistical analysis schemes with direct and in-depth collaboration with scientists from epidemiology, public health and clinical sciences, in general. To eliminate poverty, to eradicate malaria as well as illiteracy, to eliminate malnutrition and to provide a balanced diet to population of all ages, to combat epidemics and infectious diseases, to survive the AIDS episode, to be able to breathe fresh air, drink natural water and to live in peace and good health, we need public health awareness and advancements, and biostatistics is the binding force for all disciplines in this greater domain. Let statistics in Bangladesh embrace biostatistics and bring the much needed monsoon of applicable methodological research, and let this be reflected in the Journal of Statistical Research in the years to follow.